Wastewater metagenomics in Africa: Opportunities and challenges

Wastewater metagenomics is transforming how microbial communities are monitored, offering a powerful and cost-effective approach to pathogen surveillance, antimicrobial resistance (AMR) tracking, and insight into environmental health. In Africa, high-throughput sequencing (HTS) of wastewater can serve as an early warning system for infectious disease outbreaks, especially in densely populated areas with limited sanitation, while also supporting the discovery of emerging pathogens and novel microbial functions. This overview highlights the major opportunities alongside the key challenges, which include infrastructure limitations, bioinformatics capacity gaps, variable wastewater composition, and data governance considerations. It also outlines the collaborations and investments needed to unlock equitable public health and economic benefits.

The Mpox contextual data specification package: a data curation toolkit to support collaborative pathogen genomic surveillance

During the 2022 and 2024 global Mpox outbreaks, a standardized contextual data specification was developed to support public health genomic surveillance of monkeypox virus (MPXV). The specification defines ontology-based fields and controlled vocabularies for the harmonized capture of sample metadata and of epidemiological, clinical, laboratory, and methodological information, with emphasis on geo-temporal context, data provenance, and sampling strategy. Implemented within the open-source DataHarmonizer platform, the MPXV specification enables structured curation, validation, and transformation of surveillance data. It is currently in use in Canada, and it is applicable internationally and extensible to other pathogens.

Fixing the plumbing: Building interoperability between wastewater genomic surveillance datasets and systems using the PHA4GE contextual data specification

This publication presents the PHA4GE wastewater contextual data specification, an ISO-compatible, ontology-based standard developed with global partners to support interoperable wastewater genomic surveillance. Implemented through open-source tools and shared frameworks, the specification enables harmonized data integration and serves as a model for broader environmental and metagenomic surveillance standards.

PHA4GE quality control contextual data tags: standardized annotations for sharing public health sequence datasets with known quality issues to facilitate testing and training

This publication describes PHA4GE's development of standardized quality-control (QC) contextual data tags to support the responsible sharing and reuse of lower-quality or purpose-specific genomic datasets. Implemented using ontologies and adopted by public health networks such as the FDA’s GenomeTrakr, the tags improve dataset discoverability, interpretation, and transparency across public repositories.

hAMRonization: Enhancing antimicrobial resistance prediction using the PHA4GE AMR detection specification and tooling

This publication describes the development of a standardized output specification and the hAMRonization tool, which together harmonize antimicrobial resistance (AMR) detection results across diverse bioinformatic tools. Developed with international public health laboratories, hAMRonization enables interoperable, unified AMR reporting and supports scalable integration into genomic surveillance workflows.

The PHA4GE SARS-CoV-2 Contextual Data Specification for Open Genomic Epidemiology

This publication describes the development of an open, harmonized SARS-CoV-2 contextual data specification that extends the INSDC pathogen package. The specification supports interoperable metadata collection, data submission to public repositories, and improved reuse and integration of genomic data for COVID-19 surveillance and research.

Future-proofing and maximizing the utility of metadata: The PHA4GE SARS-CoV-2 contextual data specification package

This publication describes the development of an open, harmonized SARS-CoV-2 contextual data specification created by PHA4GE to support interoperable metadata collection and submission to public repositories. Implemented through standardized templates, protocols, and tools, the specification improves data consistency, reuse, and integration, and is now supported by NCBI’s BioSample database to enhance global COVID-19 genomic surveillance.

A framework for the promotion of ethical benefit sharing in health research

This publication presents a practical framework for identifying and implementing benefit sharing in research programmes. Using a socioecological model and case studies from genomics research during the SARS-CoV-2 pandemic, the framework helps researchers intentionally incorporate equitable benefit sharing from project inception.