Research

Publications

The publications on this page showcase the core scientific and implementation work of the Public Health Alliance for Genomic Epidemiology (PHA4GE). Together, they highlight how PHA4GE develops and shares open standards, tools, data models, and guidance to support genomic epidemiology and public health practice around the world.

Wastewater metagenomics is transforming how we monitor microbial communities, offering a powerful and cost-effective approach for pathogen surveillance, antimicrobial resistance (AMR) tracking, and environmental health insights. In Africa, high-throughput sequencing (HTS) of wastewater can serve as an early warning system for infectious disease outbreaks, especially in densely populated areas with limited sanitation, while also supporting the discovery of emerging pathogens and novel microbial functions. This overview highlights both the major opportunities and the key challenges, including infrastructure limitations, gaps in bioinformatics capacity, variable wastewater composition, and data governance considerations. It also outlines the collaborations and investments needed to unlock equitable public health and economic benefits.

During the 2022 and 2024 global Mpox outbreaks, a standardized contextual data specification was developed to support public health genomic surveillance of MPXV. The specification defines ontology-based fields and controlled vocabularies for the harmonized capture of sample metadata and of epidemiological, clinical, laboratory, and methodological information, with emphasis on geo-temporal context, data provenance, and sampling strategy. Implemented within the open-source DataHarmonizer platform, the MPXV specification enables structured curation, validation, and transformation of surveillance data. It is currently in use in Canada, is applicable internationally, and can be extended to other pathogens.

This publication presents the PHA4GE wastewater contextual data specification, an ISO-compatible, ontology-based standard developed with global partners to support interoperable wastewater genomic surveillance. Implemented through open-source tools and shared frameworks, the specification enables harmonised data integration and serves as a model for broader environmental and metagenomic surveillance standards.

This publication introduces the PHA4GE Microbial Data-Sharing Accord, a set of consensus principles to guide the responsible secondary use of microbial data. The Accord provides clear, accessible guidance to promote trust, protect against misuse, and support evidence-based public health research and surveillance.

This publication describes the development of standardized contextual data quality-control (QC) tags by PHA4GE to support the responsible sharing and reuse of lower-quality or purpose-specific genomic datasets. Implemented using ontologies and adopted by public health networks such as FDA’s GenomeTrakr, the tags improve dataset discoverability, interpretation, and transparency across public repositories.

This publication describes the development of a standardized output specification and the hAMRonization tool to harmonise antimicrobial resistance (AMR) detection results across diverse bioinformatic tools. Developed with international public health laboratories, hAMRonization enables interoperable, unified AMR reporting and supports scalable integration into genomic surveillance workflows.

This publication describes the development of an open, harmonised SARS-CoV-2 contextual data specification extending the INSDC pathogen package. The specification supports interoperable metadata collection, data submission to public repositories, and improved reuse and integration of genomic data for COVID-19 surveillance and research.

This publication describes the development of an open, harmonised SARS-CoV-2 contextual data specification created by PHA4GE to support interoperable metadata collection and submission to public repositories. Implemented through standardised templates, protocols, and tools, the specification improves data consistency, reuse, and integration, and is now supported by NCBI’s BioSample database to enhance global COVID-19 genomic surveillance.

This publication presents a practical framework to support the identification and implementation of benefit sharing in research programmes. Using a socioecological model and case studies from genomics research during the SARS-CoV-2 pandemic, the framework helps researchers intentionally incorporate equitable benefit sharing from project inception.