Data Structures
Inception: April 2020
# of Members: 30+
Chair: Emma Griffiths
University of British Columbia, Canada
Vice-Chair: Finlay Maguire
Contact: [email protected]
How data is structured, organized, managed and stored greatly affects how it can be used and integrated with existing data. Standardized data structures and interchange formats are critical to developing an open software ecosystem that empowers the microbial genomics community to analyze and govern its own data. The working group comprises researchers, bioinformaticians and domain experts from around the globe, representing public health agencies, research institutions and several large public databases. As one of PHA4GE’s Technical Working Groups, the Data Structures team focuses on developing, adapting and standardizing data models for microbial sequence data, contextual metadata, analytical results and workflow metrics. By promoting the adoption of these data models, we hope to improve the transparency, interoperability and reproducibility of public health sequencing workflows.
Antimicrobial resistance (AMR) is a global health problem that contributes to tens of thousands of deaths per year worldwide. A number of widely used AMR gene detection tools are currently available, and they differ in their inputs, functionality (including parameters and reference databases) and outputs. Differences in the meaning, structure and range of values of these outputs can make it difficult for public health practitioners and researchers to compare and interpret results. To address these issues, we are developing a standardized AMR gene detection output specification that harmonizes AMR detection results across tools and resources and improves interoperability. To support the specification, we have mapped the outputs of different tools to the standard and are developing Biopython-compatible parsers that transform each tool's output into the PHA4GE standard format. We have also created a fully automated pipeline that runs arbitrary microbial genome datasets through almost all currently available species-agnostic AMR gene detection tools. Used together with the parsers and the standard, this pipeline enables data comparison across tools and benchmarking.
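To illustrate what such a parser does, here is a minimal sketch that maps the tab-separated reports of two hypothetical tools onto one common record type. The tool names (`tool_a`, `tool_b`), their column names, the `AMRRecord` fields and the `parse` function are all assumptions made for illustration; they are not the published PHA4GE field names or the working group's actual parsers. The pattern is what matters: a per-tool column mapping plus unit normalization (here, fractions rescaled to percentages) produces records that can be compared directly.

```python
import csv
import io
from dataclasses import asdict, dataclass


@dataclass
class AMRRecord:
    """Hypothetical standardized AMR hit; field names are illustrative only."""
    input_file_name: str
    gene_symbol: str
    reference_accession: str
    sequence_identity: float    # percent, 0-100
    coverage_percentage: float  # percent, 0-100
    analysis_software_name: str


# Hypothetical per-tool mappings: standard field -> that tool's own column name.
FIELD_MAPS = {
    "tool_a": {
        "input_file_name": "FILE",
        "gene_symbol": "GENE",
        "reference_accession": "ACCESSION",
        "sequence_identity": "%IDENTITY",
        "coverage_percentage": "%COVERAGE",
    },
    "tool_b": {
        "input_file_name": "sample",
        "gene_symbol": "gene_name",
        "reference_accession": "ref_id",
        "sequence_identity": "pct_id",
        "coverage_percentage": "pct_cov",
    },
}

# Assumption: tool_b reports identity and coverage as fractions (0-1), not percentages.
FRACTION_TOOLS = {"tool_b"}


def parse(tool: str, tsv_text: str) -> list[AMRRecord]:
    """Convert one tool's tab-separated report into standardized records."""
    mapping = FIELD_MAPS[tool]
    scale = 100.0 if tool in FRACTION_TOOLS else 1.0
    records = []
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        records.append(AMRRecord(
            input_file_name=row[mapping["input_file_name"]],
            gene_symbol=row[mapping["gene_symbol"]],
            reference_accession=row[mapping["reference_accession"]],
            # Rescale so every tool reports identity/coverage on the same 0-100 scale.
            sequence_identity=float(row[mapping["sequence_identity"]]) * scale,
            coverage_percentage=float(row[mapping["coverage_percentage"]]) * scale,
            analysis_software_name=tool,
        ))
    return records


# Made-up example reports describing the same hit in two different layouts.
TOOL_A_REPORT = (
    "FILE\tGENE\tACCESSION\t%IDENTITY\t%COVERAGE\n"
    "sample1.fa\tblaTEM-1\tACC0001\t100.00\t100.00\n"
)
TOOL_B_REPORT = (
    "sample\tgene_name\tref_id\tpct_id\tpct_cov\n"
    "sample1.fa\tblaTEM-1\tACC0001\t0.75\t1.0\n"
)

if __name__ == "__main__":
    for record in parse("tool_a", TOOL_A_REPORT) + parse("tool_b", TOOL_B_REPORT):
        print(asdict(record))
```

Keeping the tool-specific knowledge in a plain mapping table means supporting a new tool is mostly a matter of adding one dictionary entry rather than writing a new parser from scratch.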
Genome sequencing of SARS-CoV-2 has been a key tool for understanding the epidemiological spread of the virus at global, national and local scales. In the face of the current pandemic, we identified a clear and present need for a fit-for-purpose, open-source SARS-CoV-2 contextual data (metadata) standard. We have therefore developed a SARS-CoV-2 contextual data specification that incorporates publicly available community standards, along with additional fields and guidance appropriate for public health surveillance and analysis. The specification is implemented as a collection template, supported by an array of protocols and tools for harmonizing sequence data and contextual information and submitting them to public repositories. Well-structured, rich contextual data adds value, promotes reuse and enables the aggregation and integration of disparate datasets. Adopting the proposed standard and practices will improve interoperability between datasets and systems, improve the consistency and utility of generated data and ultimately facilitate novel insights into SARS-CoV-2 and COVID-19.
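Checking records against a collection template before submission is one way such a specification can be enforced in practice. The minimal sketch below assumes a drastically simplified template of required fields and controlled vocabularies; the field names, allowed values and the `validate` function are hypothetical and are not taken from the published specification.

```python
from datetime import date

# Hypothetical, heavily simplified template: field -> (required?, allowed values or None).
# Field names and vocabularies are illustrative only.
TEMPLATE = {
    "specimen_collector_sample_id": (True, None),
    "sample_collection_date": (True, None),
    "geo_loc_name_country": (True, {"Canada", "Kenya", "Nigeria"}),
    "host_common_name": (False, {"Human", "Cat", "Dog", "Mink"}),
}


def validate(record: dict[str, str]) -> list[str]:
    """Return human-readable problems; an empty list means the record passes."""
    problems = []
    for field, (required, allowed) in TEMPLATE.items():
        value = record.get(field, "").strip()
        if not value:
            if required:
                problems.append(f"missing required field: {field}")
            continue
        # Controlled vocabularies keep values comparable across labs and datasets.
        if allowed is not None and value not in allowed:
            problems.append(f"{field}: '{value}' is not an allowed value")
    # Dates must parse as ISO 8601 calendar dates so they aggregate cleanly.
    raw_date = record.get("sample_collection_date", "").strip()
    if raw_date:
        try:
            date.fromisoformat(raw_date)
        except ValueError:
            problems.append(f"sample_collection_date: '{raw_date}' is not an ISO 8601 date")
    # Flag fields the template does not define instead of silently passing them through.
    for field in sorted(set(record) - set(TEMPLATE)):
        problems.append(f"unexpected field: {field}")
    return problems


if __name__ == "__main__":
    print(validate({
        "sample_collection_date": "15/03/2021",
        "geo_loc_name_country": "Atlantis",
    }))
```

A real template would also carry formats, units and field-level guidance; this sketch only checks presence, vocabulary membership and date syntax.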
St George's, University of London
SIB Swiss Institute of Bioinformatics
Washington State Department of Health
Institut Pasteur Dakar
National University of Sciences and Technology
McMaster University
Laboratório Nacional de Computação Científica
Government
Theiagen Genomics
McMaster University
University of Antwerp
University of Canterbury
National Center for Biotechnology Information
APHL, Global Health
University of Oxford
McMaster University
McMaster University
US CDC
EMBL-EBI
Theiagen Genomics
Simon Fraser University
BCCDC
National Malaria Elimination Centre, Lusaka, Zambia; Malaria Control & Elimination Partnership for Africa
University of Oxford / Wellcome Sanger Institute
US CDC
BCCDC
Swiss TPH
The Arctic University of Norway
Association of Public Health Laboratories
Dalhousie University
U.S. Food and Drug Administration
University of Western Ontario
National Center for Biotechnology Information
Monash University
Public Health Ontario
Harvard University, Broad Institute / Gates Foundation
Imperial College London
WHO - IPSN
UKHSA
Oregon State Public Health Laboratory
Landmark University, Omu-Aran, Kwara State
US CDC
Imperial College London
University of Monastir
NIH
Universiteit Antwerpen
Thermo Fisher Scientific
United Arab Emirates University-Al Ain (UAEU)
Nigerian Institute of Medical Research, Yaba
University of Lagos
Instituto de Investigaciones en Bacteriología y Virología Molecular, Universidad de Buenos Aires
US CDC
Big Data Institute, University of Oxford
German Federal Institute for Risk Assessment
Bill and Melinda Gates Foundation
Chan Zuckerberg Biohub
Chan Zuckerberg Initiative
Centre for Infectious Disease Genomics and One Health, Simon Fraser University
Institute for Medical Research (IMR), Ministry of Health Malaysia
Ashoka University
FDA
Tribhuvan University
The Arctic University of Norway
Utrecht University / WHO - Integrated Pathogen Surveillance Network
Robert Koch Institute
SANBI
Centre Pasteur du Cameroun
New Jersey Department of Health
Pwani University
University of Yaoundé I
Ibnou Zohr University
University of Virginia
Universidad de Buenos Aires
SANBI
University of Melbourne
ARTPARK, Indian Institute of Science
ARTPARK, Indian Institute of Science
ARTPARK, Indian Institute of Science
Public Health Virology, Queensland Health
University of Oxford, Pandemics Science Institute, CGPS
Scripps Research
St George's, University of London
University of Florida
NIH/NLM/NCBI
Michigan State University
Queensland Health
Gujarat Biotechnology University
University of Ibadan
University of Texas Medical Branch
Consultant
Nigerian Institute of Medical Research
Livestock and Fisheries Development Program
Simon Fraser University
In collaboration with 17 public health laboratories across 10 countries, the PHA4GE Data Structures working group has developed and piloted a standardized output specification for the bioinformatic detection of AMR from microbial genomes.