Data Structures
Inception: April 2020
# of Members: 30+
Chair: Emma Griffiths
University of British Columbia, Canada
Vice-Chair: Finlay Maguire
How data is structured, organized, managed and stored greatly affects how it can be used and integrated with existing data. Standardized data structures and interchange formats are critical to the development of an open software ecosystem that will empower the microbial genomics community to analyze and govern its own data. The working group comprises researchers, bioinformaticians and domain experts from around the globe, representing public health agencies, research institutions and several large public databases. As one of PHA4GE’s Technical Working Groups, the Data Structures team focuses on developing, adapting and standardizing data models for microbial sequence data, contextual metadata, analytical results, and workflow metrics. Through the adoption of these data models, we hope to improve the transparency, interoperability, and reproducibility of public health sequencing workflows.
Antimicrobial resistance (AMR) is a global health problem that contributes to tens of thousands of deaths per year. A number of widely used AMR gene detection tools are currently available, and they differ in their inputs, functionality (including parameters and reference databases), and outputs. Differences in the meaning, structure and range of values across these tools' outputs can make comparing and interpreting results difficult for public health practitioners and researchers. To address these issues, we are developing a standardized AMR gene detection output specification to harmonize AMR detection results across tools and resources and to improve interoperability. To support the specification, we have mapped the outputs of different tools to the standard and are developing Biopython-compatible parsers that transform these variable outputs into the PHA4GE standard. We have also created a fully automated pipeline that can run arbitrary microbial genome datasets through almost all currently available species-agnostic AMR gene detection tools. Used in tandem with the parsers and the standard, this pipeline enables cross-tool comparison of results and benchmarking.
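To illustrate the idea of mapping tool outputs onto a common standard, here is a minimal sketch under stated assumptions. The tool names (`tool_a`, `tool_b`), their column headers, the harmonized field names in `AMRResult`, and the example reports are all hypothetical; they are not the published PHA4GE specification or the real output of any particular tool, and the sketch uses only the Python standard library rather than Biopython. It shows the general pattern: a per-tool column mapping feeds one shared record type, after which comparing tools reduces to simple set operations.

```python
import csv
import io
from dataclasses import dataclass


# Hypothetical harmonized record. Field names are illustrative assumptions,
# not the published PHA4GE AMR output specification.
@dataclass(frozen=True)
class AMRResult:
    input_file: str
    gene_symbol: str
    sequence_identity: float  # percent identity to the reference gene
    coverage: float           # percent of the reference gene covered
    analysis_software: str


# Hypothetical tools and column headers: standard field -> tool column.
TOOL_MAPPINGS = {
    "tool_a": {"gene_symbol": "gene", "sequence_identity": "pct_identity",
               "coverage": "pct_coverage"},
    "tool_b": {"gene_symbol": "Gene name", "sequence_identity": "Identity",
               "coverage": "Coverage"},
}


def parse_report(text: str, tool: str, input_file: str) -> list[AMRResult]:
    """Map one tool's tab-separated report onto the harmonized record."""
    mapping = TOOL_MAPPINGS[tool]
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return [
        AMRResult(
            input_file=input_file,
            gene_symbol=row[mapping["gene_symbol"]],
            sequence_identity=float(row[mapping["sequence_identity"]]),
            coverage=float(row[mapping["coverage"]]),
            analysis_software=tool,
        )
        for row in reader
    ]


def compare_genes(a: list[AMRResult], b: list[AMRResult]) -> dict[str, set[str]]:
    """Once outputs share one schema, cross-tool comparison is set arithmetic."""
    genes_a = {r.gene_symbol for r in a}
    genes_b = {r.gene_symbol for r in b}
    return {"both": genes_a & genes_b,
            "only_a": genes_a - genes_b,
            "only_b": genes_b - genes_a}


# Made-up example reports for a single genome assembly.
report_a = "gene\tpct_identity\tpct_coverage\nblaTEM-1\t100.0\t100.0\ntet(A)\t99.5\t98.0\n"
report_b = "Gene name\tIdentity\tCoverage\nblaTEM-1\t100\t100\nsul1\t100\t100\n"

results_a = parse_report(report_a, "tool_a", "sample1.fasta")
results_b = parse_report(report_b, "tool_b", "sample1.fasta")
print(compare_genes(results_a, results_b))
```

Keeping the per-tool knowledge in a mapping table, rather than in separate parsing functions, means supporting another tool is mostly a matter of adding one dictionary entry; real tool outputs would likely also need unit conversions and vocabulary normalization that this sketch omits.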
Genome sequencing of SARS-CoV-2 has been a key tool for understanding the epidemiological spread of the virus at global, national and local scales. In the face of the current pandemic, we identified a clear and present need for a fit-for-purpose, open-source SARS-CoV-2 contextual data (metadata) standard. We have therefore developed a SARS-CoV-2 contextual data specification that incorporates publicly available community standards, as well as additional fields and guidance appropriate for public health surveillance and analyses. The specification is implemented as a collection template, along with an array of protocols and tools that support the harmonization and submission of sequence data and contextual information to public repositories. Well-structured, rich contextual data adds value, promotes reuse, and enables the aggregation and integration of disparate datasets. Adoption of the proposed standard and practices will improve interoperability between datasets and systems, improve the consistency and utility of generated data, and ultimately facilitate novel insights and discoveries about SARS-CoV-2 and COVID-19.
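One way a collection template supports harmonization is by letting records be checked automatically before submission. The sketch below assumes a hypothetical template: the field names, which fields are required, the date rule, and the allowed values in `TEMPLATE` are illustrative only and are not taken from the actual PHA4GE SARS-CoV-2 specification. It flags missing required values, badly formatted dates, values outside a controlled vocabulary, and fields the template does not define.

```python
from datetime import date

# Hypothetical template. Field names, required flags and allowed values are
# illustrative assumptions, not the actual PHA4GE SARS-CoV-2 specification.
TEMPLATE = {
    "specimen_collector_sample_id": {"required": True},
    "sample_collection_date": {"required": True, "format": "iso_date"},
    "geo_loc_name_country": {"required": True},
    "host_common_name": {"required": False,
                         "allowed": {"Human", "Cat", "Dog", "Mink"}},
}


def validate(record: dict[str, str]) -> list[str]:
    """Return a list of problems; an empty list means the record passes."""
    errors = []
    for field, rules in TEMPLATE.items():
        value = record.get(field, "").strip()
        if not value:
            if rules.get("required"):
                errors.append(f"{field}: required value missing")
            continue
        # Dates are expected in ISO 8601 form (YYYY-MM-DD).
        if rules.get("format") == "iso_date":
            try:
                date.fromisoformat(value)
            except ValueError:
                errors.append(f"{field}: expected YYYY-MM-DD, got {value!r}")
        # Controlled-vocabulary check for fields that define allowed values.
        allowed = rules.get("allowed")
        if allowed is not None and value not in allowed:
            errors.append(f"{field}: {value!r} is not an allowed value")
    # Report fields that the template does not define, in a stable order.
    for field in sorted(record.keys() - TEMPLATE.keys()):
        errors.append(f"{field}: not defined in the template")
    return errors


good = {
    "specimen_collector_sample_id": "ABC-123",
    "sample_collection_date": "2020-03-15",
    "geo_loc_name_country": "Canada",
    "host_common_name": "Human",
}
bad = {
    "specimen_collector_sample_id": "ABC-124",
    "sample_collection_date": "15/03/2020",
    "host_common_name": "Bat",
    "lab_notes": "rerun",
}
print(validate(good))  # → []
for problem in validate(bad):
    print(problem)
```

Returning a list of messages instead of raising on the first error lets a submitter fix every problem in a record in one pass.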

Member affiliations:
St George's, University of London

SIB Swiss Institute of Bioinformatics

Washington State Department of Health

Institut Pasteur Dakar

National University of Sciences and Technology

McMaster University

Laboratório Nacional de Computação Científica

Government

Theiagen Genomics

McMaster University

University of Antwerp

University of Canterbury

National Center for Biotechnology Information

APHL, Global Health

University of Oxford

McMaster University

McMaster University

US CDC

EMBL-EBI

Theiagen Genomics

Simon Fraser University

BCCDC

National Malaria Elimination Centre, Lusaka, Zambia; Malaria Control & Elimination Partnership for Africa

University of Oxford / Wellcome Sanger Institute

US CDC

BCCDC

Swiss TPH

The Arctic University of Norway

Association of Public Health Laboratories

Dalhousie University

U.S. Food and Drug Administration

University of Western Ontario

National Center for Biotechnology Information

Monash University

Public Health Ontario

Harvard University, Broad Institute / Gates Foundation

Imperial College London

WHO - IPSN

UKHSA

Oregon State Public Health Laboratory

Landmark University, Omu-Aran, Kwara State

US CDC

Imperial College London

University of Monastir

NIH

Universiteit Antwerpen

Thermo Fisher Scientific

United Arab Emirates University-Al Ain (UAEU)

Nigerian Institute of Medical Research, Yaba

University of Lagos

Instituto de Investigaciones en Bacteriología y Virología Molecular, Universidad de Buenos Aires

US CDC

Big Data Institute, University of Oxford

German Federal Institute for Risk Assessment

Bill and Melinda Gates Foundation

Chan Zuckerberg Biohub

Chan Zuckerberg Initiative

Centre for Infectious Disease Genomics and One Health, Simon Fraser University

Institute for Medical Research (IMR), Ministry of Health Malaysia

Ashoka University

FDA

Tribhuvan University

The Arctic University of Norway

Utrecht University / WHO - Integrated Pathogen Surveillance Network

Robert Koch Institute

SANBI

Centre Pasteur du Cameroun

New Jersey Department of Health

Pwani University

University of Yaoundé I

Ibnou Zohr University

University of Virginia

Universidad de Buenos Aires

SANBI

University of Melbourne

ARTPARK, Indian Institute of Science

ARTPARK, Indian Institute of Science

ARTPARK, Indian Institute of Science

Public Health Virology, Queensland Health

University of Oxford, Pandemics Science Institute, CGPS

Scripps Research

St George's, University of London

University of Florida

NIH/NLM/NCBI

Michigan State University

Queensland Health

Gujarat Biotechnology University

University of Ibadan

University of Texas Medical Branch

Consultant

Nigerian Institute of Medical Research

Livestock and Fisheries Development Program

Simon Fraser University

University of Pretoria