
Data Structures: Major updates hot off the presses!


On March 30th, 2021, PHA4GE hosted the PHA4GE Open Meeting, during which members of different working groups showcased various aspects of PHA4GE’s work. Data Structures Working Group lead Emma Griffiths discussed the rationale for adopting a metadata standard for SARS-CoV-2 and presented the metadata specification developed by the Data Structures Working Group (slides and presentation links here). The specification was originally published as a preprint in the summer of 2020, and Emma presented further updates that have been added as epidemiologists expand the types of data we would like to collect and standardize as part of our response to COVID-19. In particular, the updated specification now has consistent fields and formats for capturing sampling strategy, exposures that may have led to transmission events, vaccination status, whether a case represents a reinfection event, and variant information about the infecting virus. These types of data can help epidemiologists and public health professionals monitor for the emergence of variants of interest, evaluate whether particular lineages of the virus might be evading immunity, better understand SARS-CoV-2’s epidemiology, and respond to outbreaks. Even broader benefits are realized when these data are collected in standardized ways, which can help ensure the completeness and accuracy of datasets internally and facilitate data sharing across multiple agencies. Importantly, through a collaboration with the National Center for Biotechnology Information (NCBI), the PHA4GE metadata specification for SARS-CoV-2 and the NCBI SARS-CoV-2 submission template have been aligned, making it easier to track and submit structured metadata and genomic data to NCBI’s public repositories.


Beyond COVID-19, the Data Structures Working Group has also been focusing on data consistency in antimicrobial resistance (AMR) monitoring. Members of the working group developed hAMRonization, a tool that harmonizes the outputs from different AMR detection tools to improve AMR genomic surveillance comparisons and communication. Working group member Ines Mendes presented the hAMRonization tool on May 5, 2021 at the Applied Bioinformatics and Public Health Microbiology conference, which was held virtually this year. Furthermore, we have begun piloting the tool in AMR surveillance networks such as the PAHO Latin American Network for AMR Surveillance (ReLAVRA) and PulseNet Latin America and the Caribbean. If you’d like to learn more about this work, you can find the presentation by Data Structures Working Group member Josefina Campos and Marcelo Galas at the PHA4GE Open Meeting here (https://pha4ge.org/open-meeting-2021/).
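To give a feel for what "harmonizing" tool outputs means, here is a minimal Python sketch of the general idea. It is not hAMRonization's actual code or schema: the tool names (`tool_a`, `tool_b`), the column names, and the common field names are all hypothetical, chosen only to show how differently labelled outputs can be mapped onto one shared set of fields.

```python
from typing import Dict, List

# Hypothetical mappings from each tool's native column names to a shared
# set of field names. None of these names come from a real AMR tool.
FIELD_MAPS: Dict[str, Dict[str, str]] = {
    "tool_a": {
        "Gene": "gene_symbol",
        "Identity": "sequence_identity",
        "Contig": "contig_id",
    },
    "tool_b": {
        "gene_name": "gene_symbol",
        "pct_id": "sequence_identity",
        "sequence": "contig_id",
    },
}


def harmonize(tool: str, rows: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Rename one tool's columns to the shared field names."""
    mapping = FIELD_MAPS[tool]
    harmonized = []
    for row in rows:
        # Copy each native column into its shared field name.
        record = {common: row[native] for native, common in mapping.items()}
        # Record which tool produced the result so it stays traceable.
        record["analysis_software_name"] = tool
        harmonized.append(record)
    return harmonized


# The same detection reported by two tools with different column names.
from_a = harmonize("tool_a", [{"Gene": "blaTEM-1", "Identity": "99.8", "Contig": "contig_1"}])
from_b = harmonize("tool_b", [{"gene_name": "blaTEM-1", "pct_id": "99.8", "sequence": "contig_1"}])
```

Once both results share the same field names, they can be placed in one table and compared directly, which is the kind of comparison that is awkward when every tool reports in its own format.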


Alli Black

Data Structures Working Group Member

