
Data Structures: Major updates hot off the presses!


On March 30th, 2021, PHA4GE hosted the PHA4GE Open Meeting, during which members of different working groups showcased various aspects of PHA4GE’s work. Data Structures Working Group lead Emma Griffiths discussed the rationale for adopting a metadata standard for SARS-CoV-2 and presented the metadata specification developed by the Data Structures Working Group (slides and presentation links here). The specification was originally published as a preprint in the summer of 2020, and Emma presented further updates that have been added as epidemiologists expand the types of data we would like to collect and standardize as part of our response to COVID-19. In particular, the updated specification now has consistent fields and formats for capturing information on sampling strategy, exposures that may have led to transmission events, vaccination status, whether a case represents a reinfection event, and variant information about the infecting virus. These types of data can help epidemiologists and public health professionals monitor for the emergence of variants of interest, evaluate whether particular lineages of the virus might be evading immunity, better understand SARS-CoV-2’s epidemiology, and respond to outbreaks. Even broader benefits are realized when these data are collected in standardized ways, which helps ensure the completeness and accuracy of datasets internally and facilitates data sharing across multiple agencies. Importantly, through a collaboration with the National Center for Biotechnology Information (NCBI), the PHA4GE metadata specification for SARS-CoV-2 and the NCBI SARS-CoV-2 submission template have been aligned to make it easier to track and submit structured metadata and genomic data to NCBI’s public repositories.


Beyond COVID-19, the Data Structures Working Group has also been focusing on data consistency in antimicrobial resistance (AMR) monitoring. Members of the working group developed hAMRonization, a tool that harmonizes the outputs from different AMR detection tools to improve AMR genomic surveillance comparisons and communication. Working group member Ines Mendes presented the hAMRonization tool on May 5, 2021 at the Applied Bioinformatics and Public Health Microbiology conference, which was held virtually this year. Furthermore, we have begun piloting the tool in AMR surveillance networks such as the PAHO Latin American Network for AMR Surveillance (ReLAVRA) and PulseNet Latin America and the Caribbean. If you’d like to learn more about this work, you can find the presentation by Data Structures Working Group members Josefina Campos and Marcelo Galas at the PHA4GE Open Meeting here (https://pha4ge.org/open-meeting-2021/).
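To give a rough sense of what harmonizing outputs means in practice, here is a minimal Python sketch. It is not hAMRonization's actual code, interface, or schema: the tool names (`tool_a`, `tool_b`), the input column names, and the common field names are all hypothetical and chosen only for illustration. The idea is simply that each tool reports the same finding under different headers, and a per-tool mapping renames them to one shared set of fields so that results can be compared side by side.

```python
# Illustrative sketch only: NOT hAMRonization's real API or schema.
# Tool names, native column names, and common field names are hypothetical.
FIELD_MAPS = {
    "tool_a": {"Gene": "gene_symbol", "%Identity": "sequence_identity", "Contig": "contig_id"},
    "tool_b": {"best_hit": "gene_symbol", "pct_id": "sequence_identity", "sequence": "contig_id"},
}


def harmonize(record, tool):
    """Rename one tool-specific result row to the shared field names."""
    mapping = FIELD_MAPS[tool]
    out = {common: record[native] for native, common in mapping.items()}
    # Normalize the type as well as the name: some tools emit numbers as text.
    out["sequence_identity"] = float(out["sequence_identity"])
    # Record which tool produced the row so provenance is not lost.
    out["analysis_software_name"] = tool
    return out


# The same (made-up) finding as two different tools might report it.
tool_a_hit = {"Gene": "blaTEM-1", "%Identity": "99.8", "Contig": "contig_12"}
tool_b_hit = {"best_hit": "blaTEM-1", "pct_id": 99.8, "sequence": "contig_12"}

rows = [harmonize(tool_a_hit, "tool_a"), harmonize(tool_b_hit, "tool_b")]
```

After harmonization, both rows carry the same keys, so they can be loaded into one table and compared directly regardless of which tool produced them.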


Alli Black

Data Structures Working Group Member
