The PHA4GE Genotyping Contextual Data Specification

The PHA4GE Genotyping Contextual Data Specification is a draft, ontology-based framework designed to standardize how microbial genotyping methods and results are reported and shared. Developed by the PHA4GE Data Structures Working Group, it provides machine-readable, FAIR-aligned attributes that improve the consistency, comparability, and interoperability of genotype data across laboratories, public repositories, and public health platforms. By harmonizing genotyping metadata (including methods, databases, software, and confidence values), the specification reduces duplication of effort and makes genomic surveillance data easier to search and reuse.
Primer Schemes Repository

The Primer Schemes repository is a versioned, community-driven resource for standardized tiled amplicon primer scheme definitions used in pathogen sequencing. Designed to eliminate ambiguity in naming and versioning, it promotes FAIR data principles by improving the findability, accessibility, interoperability, and reusability of primer schemes and associated sequencing data. With machine-readable indexing, structured scheme specifications, and validation via the Primaschema tool, this repository supports consistent, transparent, and reproducible genomic surveillance workflows across pathogens including SARS-CoV-2, MPXV, and Nipah virus.
The HPAI Contextual Data Specification

The HPAI Contextual Data Specification is a draft, ontology-based framework designed to standardize and harmonize contextual data for Highly Pathogenic Avian Influenza (HPAI) virus surveillance. Developed in collaboration with PHA4GE, it supports consistent, interoperable, and FAIR data collection through DataHarmonizer templates, field and term reference guides, and curation SOPs. This evolving specification enables improved data quality, integration, and global collaboration across public health, food, environmental, and host-specific surveillance efforts.
The Mpox contextual data specification package: a data curation toolkit to support collaborative pathogen genomic surveillance

During the 2022 and 2024 global Mpox outbreaks, a standardized contextual data specification was developed to support public health genomic surveillance of MPXV. The specification defines ontology-based fields and controlled vocabularies for harmonized capture of sample metadata and of epidemiological, clinical, laboratory, and methodological information, with emphasis on geo-temporal context, data provenance, and sampling strategy. Implemented within the open-source DataHarmonizer platform, the MPXV specification enables structured curation, validation, and transformation of surveillance data. It is currently in use in Canada and is designed to be applicable internationally and extensible to other pathogens.
Infrastructure Working Group GitHub Repository

PHA4GE’s Infrastructure Working Group provides guidance and resources to address common challenges in designing and maintaining bioinformatics infrastructure for public health. These materials support standardisation, portability, and reproducibility of workflows across diverse institutional and resource settings.
Pathoplexus

Pathoplexus is an open-source viral genomics database designed to support transparent data sharing, flexible attribution, and advanced analysis of pathogens of public health importance. Built as a community-driven platform with open governance and interoperable data access, Pathoplexus aims to strengthen global research and public health responses to infectious diseases.
PHA4GE Training

PHA4GE Training provides practical, public health–focused genomics training, covering bioinformatics, data standards, and pathogen surveillance. Our courses help learners build real-world skills to analyse, interpret, and apply genomic data in routine surveillance and outbreak response.
Pathoplexus: towards fair and transparent sequence sharing

This piece highlights Pathoplexus, a community-driven viral genomics database developed with input from PHA4GE to advance open data sharing, ethical use, and transparent governance in public health genomics.
Fixing the plumbing: Building interoperability between wastewater genomic surveillance datasets and systems using the PHA4GE contextual data specification

This publication presents the PHA4GE wastewater contextual data specification, an ISO-compatible, ontology-based standard developed with global partners to support interoperable wastewater genomic surveillance. Implemented through open-source tools and shared frameworks, the specification enables harmonised data integration and serves as a model for broader environmental and metagenomic surveillance standards.
Wastewater surveillance reveals disease trends in South Africa

From COVID-19 to measles, scientists are showing how wastewater surveillance can expose underreported infections and strengthen national health monitoring.