
In the face of a global pandemic and the rising global threat of antimicrobial resistance (AMR), the PHA4GE Data Structures Working Group (DS WG) has developed data specifications for standardizing SARS-CoV-2 contextual data as well as the variable outputs from many widely used AMR gene detection tools. Data standards are important for creating interoperability between databases and systems, and for more efficient data exchange. Members of the DS WG wrapped up the last quarter of 2020 on a busy note, presenting our work at five different international conferences focused on FAIR, interoperable data standards, and public health microbial genomics tools and resources for COVID-19 and AMR.
The DS WG presented the SARS-CoV-2 standard to public health experts at the Grand Challenges Annual Meeting 2020, hosted by the Bill and Melinda Gates Foundation from 19 to 21 October. The Grand Challenges Annual Meeting aims to catalyze collaboration among researchers, funders, and other partners to accelerate innovation for impact in solving the world’s most urgent global health and development problems. The DS WG’s presentation was part of a PHA4GE panel discussion on pathogen genomics. The panel included Dr. Emma Griffiths, Dr. Emma Hodcroft, Dr. Daniel Park, Dr. Anders Gonçalves da Silva, and Dr. Nicola Mulder, and was chaired by PHA4GE Director Dr. Alan Christoffels, Professor and Director of the South African National Bioinformatics Institute (SANBI), University of the Western Cape.
The COVID-19 pandemic has brought a number of “big data” integration and analytical challenges, not just in terms of scale, but also in the variety of the data that is being generated and shared. Integration at this massive scale requires a coordinated effort by the clinical, research, government, informatics and other communities. Ontologies are hierarchies of controlled vocabulary where the fields and terms are linked using logical relationships, and the meanings of terms are disambiguated through the use of unique identifiers. Annotating data with ontology terms helps to standardize fields and terms across systems and sectors, better enables data to be understood and used by both humans and computers, and helps to prepare information for complex querying and analytical approaches such as machine learning and natural language processing. Dr. Emma Griffiths presented the SARS-CoV-2 data standard at the CIDO/WCO-2020 Workshop on COVID-19 Ontologies held on the 23rd and 30th of October 2020. The conference brought together experts in a number of biological and biomedical domains from around the world to discuss how ontologies and data standards can help provide solutions for some of these big data challenges.
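To make the idea of ontology annotation concrete, here is a minimal sketch of how free-text values might be mapped to a controlled term with a unique identifier. The vocabulary, the `EX:` identifiers, and the function names are all hypothetical placeholders chosen for illustration; they are not taken from the PHA4GE specification or from any real ontology.

```python
# Hypothetical controlled vocabulary. The identifiers (EX:...) and labels are
# illustrative placeholders, not terms from a real ontology.
VOCABULARY = {
    "EX:0000001": {
        "label": "nasopharyngeal swab",
        "synonyms": {"np swab", "nasopharyngeal (np) swab"},
    },
    "EX:0000002": {
        "label": "oropharyngeal swab",
        "synonyms": {"op swab", "throat swab"},
    },
}


def normalize(text):
    """Lowercase the text and collapse runs of whitespace."""
    return " ".join(text.lower().split())


def build_lookup(vocabulary):
    """Index every label and synonym by its normalized form."""
    lookup = {}
    for term_id, term in vocabulary.items():
        for name in {term["label"], *term["synonyms"]}:
            lookup[normalize(name)] = term_id
    return lookup


def annotate(value, lookup, vocabulary):
    """Return the matching term's id and preferred label, or None if unknown."""
    term_id = lookup.get(normalize(value))
    if term_id is None:
        return None
    return {"id": term_id, "label": vocabulary[term_id]["label"]}


LOOKUP = build_lookup(VOCABULARY)

# Two differently written values resolve to the same identifier.
first = annotate("NP  Swab", LOOKUP, VOCABULARY)
second = annotate("Nasopharyngeal swab", LOOKUP, VOCABULARY)
```

Because both spellings resolve to the same identifier, downstream systems can compare or query records without having to reconcile the free text themselves. Values that do not match any term return `None`, so they can be flagged for manual curation rather than silently guessed.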
A set of key rules for increasing the longevity, interoperability and utility of digital data is known as the “FAIR principles” (where FAIR stands for Findable, Accessible, Interoperable, Reusable). From 30 November to 4 December 2020, the CODATA International FAIR Convergence Symposium focused on challenges and consensus-building around the implementation of FAIR approaches to data for crisis reduction and response (e.g. networks and data exchange, disaster reduction), data specifications (semantic interoperability, metadata, data objects, FAIR workflows), policy (Sustainable Development Goals, traditional knowledge and ethical implications), and data stewardship (data federation, training). At the symposium, Dr. Duncan MacCannell, Dr. Emma Griffiths, Dr. Finlay Maguire, Inês Mendes, and Dr. Ruth Timme hosted a hands-on training workshop showing participants how to use the PHA4GE SARS-CoV-2 contextual data template, as well as the accompanying SOP and public repository submission protocols (GISAID, INSDC). The workshop also demonstrated real-world implementations of the specification, putting data standards into practice. The demos, performed by Rhiannon Cameron, Dr. Dominique Anderson, and Dr. Anders Gonçalves da Silva, included:
1) The DataHarmonizer (https://github.com/Public-Health-Bioinformatics/DataHarmonizer/releases), a spreadsheet-based data management and validation application used by the CanCOGeN sequencing network. This tool also enables users to enter data and transform it into public health and public repository-ready formats at the click of a button.
2) The Austrakka data sharing platform (https://austrakka.net.au) – a central, secure, and private online location to share, store, analyse and view aggregated national and jurisdictional data, permitting access to real-time analysis of integrated pathogen genomic data for public health across Australia.
3) Baobab LIMS – an open-source Laboratory Information Management System (LIMS) for biobanking developed by African and European researchers (https://baobablims.org).
The World Health Organization has declared that AMR is one of the top ten global public health threats facing humanity, and tackling it will take the combined resources and effort of researchers working across different disciplines. PHA4GE’s standardized AMR gene detection output specification was presented by Inês Mendes at the ASM Conference on Rapid Applied Microbial Next-Generation Sequencing and Bioinformatic Pipelines (ASM NGS), which took place from 7 to 11 December 2020. The “Drugs & Thugs: NGS to Combat AMR” stream explored new approaches for antimicrobial genotyping and the characterization of antimicrobial-resistant microbial populations, and the development of new tools to improve their impact. At the Wellcome Trust AMR conference “Antimicrobial Resistance – Genomes, Big Data and Emerging Technologies” in November, Dr. Josefina Campos also described how the AMR output specification is enabling better harmonization and reporting of AMR determinants between health agencies and laboratories in Latin America. That conference brought together basic researchers, computer scientists, clinicians and policy makers interested in pathogen and human/host genomics, epidemiology and surveillance, machine learning, and the development of novel diagnostic tools.
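As a rough illustration of what standardizing variable tool outputs involves, the sketch below renames tool-specific columns to one shared set of field names. The tool names (`tool_a`, `tool_b`), their column headers, and the common field names are all assumptions made up for this example; they do not reproduce the actual PHA4GE AMR output specification or the output of any real detection tool.

```python
# Hypothetical column mappings: tool-specific header -> common field name.
# None of these names are taken from a real tool or from the PHA4GE spec.
COLUMN_MAPS = {
    "tool_a": {
        "Gene": "gene_symbol",
        "%Identity": "sequence_identity",
        "Contig": "contig_id",
    },
    "tool_b": {
        "gene_name": "gene_symbol",
        "identity": "sequence_identity",
        "sequence": "contig_id",
    },
}


def harmonize(row, tool):
    """Rename a tool-specific result row to the common field names."""
    mapping = COLUMN_MAPS[tool]
    record = {
        common: row[source]
        for source, common in mapping.items()
        if source in row
    }
    # Store identity as a number so results from different tools compare cleanly.
    if "sequence_identity" in record:
        record["sequence_identity"] = float(record["sequence_identity"])
    # Keep track of which tool produced the result.
    record["analysis_software_name"] = tool
    return record


row_a = {"Gene": "geneX", "%Identity": "99.5", "Contig": "contig_1"}
row_b = {"gene_name": "geneX", "identity": "99.5", "sequence": "contig_1"}

record_a = harmonize(row_a, "tool_a")
record_b = harmonize(row_b, "tool_b")
```

After harmonization the two records differ only in the field that records their source tool, which is the property that lets agencies and laboratories compare and report results from different pipelines side by side.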
To find out more about our tools and resources, or to learn where we’ll be presenting our work next, contact us at [email protected] and watch the PHA4GE Data Structures Working Group space.