Mpox Contextual Data Specification

The Mpox Contextual Data Specification is an ontology-based, FAIR-aligned framework designed to standardize metadata collection for mpox genomic surveillance. Implemented through the DataHarmonizer platform, the package includes structured collection templates, field and term reference guides, and SOPs for curation and new term requests to support consistent, interoperable data sharing. Supporting both Canadian and international use cases, the specification enhances data quality, comparability, and collaborative pathogen surveillance across laboratories and public health agencies.

Wastewater metagenomics in Africa: Opportunities and challenges

Wastewater metagenomics is transforming how we monitor microbial communities, offering a powerful and cost-effective approach for pathogen surveillance, antimicrobial resistance (AMR) tracking, and environmental health insights. In Africa, high-throughput sequencing (HTS) of wastewater can serve as an early warning system for infectious disease outbreaks—especially in densely populated areas with limited sanitation—while also supporting discovery of emerging pathogens and novel microbial functions. This overview highlights both the major opportunities and the key challenges, including infrastructure limitations, bioinformatics capacity gaps, variable wastewater composition, and data governance considerations, and outlines the collaborations and investments needed to unlock equitable public health and economic benefits.

Influenza Guidance Document

The Influenza Guidance Document provides a comprehensive public health resource for standardized influenza genomic analysis and surveillance. Covering viral genome structure, seasonal and pandemic influenza, zoonotic transmission, drug resistance, and antigenic drift and shift, this guide integrates bioinformatics workflows with practical tools and open-access platforms such as INSaFLU, IRMA, Nextstrain, BV-BRC, and FluSurver. Designed for both beginners and experienced bioinformaticians, the document supports global genomic surveillance, vaccine strain selection, outbreak investigation, and pandemic preparedness across human and animal health sectors.

HIV Bioinformatics Solutions Document

HIV Bioinformatics Solutions is a practical guidance resource that helps bioinformaticians navigate HIV genomics for public health and research applications. Covering HIV genome structure, evolution, and subtypes, it outlines clear analysis pathways for genomic characterization and subtyping, drug resistance surveillance, resistance prediction and drug development, and genomic epidemiology. The document pairs real-world case studies with recommended sequencing strategies, curated tool lists, and key reference databases such as Stanford HIVdb and the Los Alamos HIV Sequence Database, supporting more standardized, reproducible, and accessible HIV bioinformatics workflows worldwide.

Minimal Pathogen Agnostic Contextual Data Specification

The Minimal Pathogen Agnostic Contextual Data Specification defines an international, ontology-based minimal metadata standard for public health and One Health genomic surveillance. Designed for use across pathogens, sequencing types, and global initiatives, the framework supports interoperable, timely, and privacy-conscious data sharing for both single-isolate and metagenomic sequencing. By standardizing essential fields such as sample identifiers, geographic location, collection date, organism, and sequencing purpose, this specification enhances traceability, comparability, and decision-making in public health emergencies while promoting FAIR data principles.
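As an illustration only, the check below sketches how a submitting laboratory might confirm that a record carries the essential fields listed above before sharing it. The field names (sample_id, geo_loc_name, collection_date, organism, purpose_of_sequencing) and the rule that dates must parse with Python's date.fromisoformat are assumptions made for this sketch, not the specification's own terms; the published field reference defines the actual names and the allowed date precision.

```python
from datetime import date

# Hypothetical field names standing in for the essential fields described
# above; the specification defines its own names and allowed values.
REQUIRED_FIELDS = (
    "sample_id",
    "geo_loc_name",
    "collection_date",
    "organism",
    "purpose_of_sequencing",
)


def missing_or_invalid(record: dict) -> list[str]:
    """Return human-readable problems found in one metadata record."""
    problems = []
    for field in REQUIRED_FIELDS:
        value = record.get(field)
        # Treat absent, None, and whitespace-only values as missing.
        if value is None or str(value).strip() == "":
            problems.append(f"missing: {field}")
    # Assumption for this sketch: the date must be parseable by
    # date.fromisoformat (for example YYYY-MM-DD).
    raw_date = record.get("collection_date")
    if raw_date and str(raw_date).strip():
        try:
            date.fromisoformat(str(raw_date).strip())
        except ValueError:
            problems.append(f"invalid date: {raw_date}")
    return problems


if __name__ == "__main__":
    record = {
        "sample_id": "LAB-0001",
        "geo_loc_name": "Canada",
        "collection_date": "2023-07-14",
        "organism": "Monkeypox virus",
        "purpose_of_sequencing": "Baseline surveillance",
    }
    print(missing_or_invalid(record))
    print(missing_or_invalid({"sample_id": "LAB-0002", "collection_date": "14/07/2023"}))
```

Returning a list of problems rather than raising on the first one lets a curator see every gap in a record at once.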

The PHA4GE QC Contextual Data Tags Specification

The PHA4GE QC Contextual Data Tags Specification provides standardized, ontology-based annotations for labeling public health pathogen sequence datasets with known quality control (QC) issues. Designed to improve transparency, discoverability, and responsible reuse of lower-quality genomic data, the specification defines five structured QC fields, including controlled vocabularies for quality determinations and issues, that can be included in public repository submissions such as those to NCBI SRA. The tags are FAIR-aligned and agnostic to both organism and sequencing technique; they support training, validation, and optimization of public health genomics workflows while improving communication between data submitters and users.

convAST Tool

convAST is a command-line tool that converts antibiotic susceptibility test (AST) results exported from laboratory instruments, electronic medical records (EMRs), and laboratory information management systems (LIMS) into a standardized, INSDC-compatible format. Supporting major platforms such as VITEK, MicroScan, Phoenix, and Sensititre, convAST applies structured mappings to transform tabular AST outputs into harmonized, submission-ready data. Built using LinkML schemas and a modular object model, convAST streamlines interoperability, improves data consistency, and facilitates integration of antimicrobial resistance results into public sequence repositories.
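The kind of structured mapping described above can be sketched as follows. This is not convAST's code or interface: the instrument column headers, the standardized field names, and the S/I/R lookup are hypothetical stand-ins chosen to show the general idea of renaming vendor columns, splitting a minimum inhibitory concentration (MIC) value from its comparison sign, and translating interpretation codes into controlled terms.

```python
import csv
import io

# Hypothetical mapping from one instrument's export headers to standardized
# field names. convAST defines its real mappings with LinkML schemas; these
# names are illustrative only.
COLUMN_MAP = {
    "Antibiotic": "antibiotic_name",
    "MIC": "mic",
    "Interpretation": "interpretation",
}

# Assumed single-letter interpretation codes and their spelled-out terms.
PHENOTYPE_MAP = {"S": "susceptible", "I": "intermediate", "R": "resistant"}

# Comparison prefixes an instrument might put in front of an MIC value.
# Two-character signs come first so "<=" is not mistaken for "<".
SIGNS = ("<=", ">=", "<", ">", "=")


def split_mic(raw: str) -> tuple[str, str]:
    """Split an MIC string such as '<=0.25' into ('<=', '0.25')."""
    raw = raw.strip()
    for sign in SIGNS:
        if raw.startswith(sign):
            return sign, raw[len(sign):].strip()
    return "=", raw


def harmonize(rows):
    """Yield standardized records from vendor-specific rows."""
    for row in rows:
        # Keep only the columns we know how to map, under their new names.
        renamed = {COLUMN_MAP[k]: v for k, v in row.items() if k in COLUMN_MAP}
        sign, value = split_mic(renamed["mic"])
        yield {
            "antibiotic_name": renamed["antibiotic_name"].strip().lower(),
            "measurement_sign": sign,
            "measurement": value,
            # Unknown codes fall back to a placeholder term (an assumption).
            "resistance_phenotype": PHENOTYPE_MAP.get(
                renamed["interpretation"].strip().upper(), "not defined"
            ),
        }


if __name__ == "__main__":
    export = "Antibiotic,MIC,Interpretation\nAmpicillin,>=32,R\nCiprofloxacin,<=0.25,S\n"
    for record in harmonize(csv.DictReader(io.StringIO(export))):
        print(record)
```

Keeping the vendor-specific knowledge in lookup tables means supporting another instrument only requires a new mapping, not new parsing logic.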

SARS-CoV-2 Contextual Data Specification

The PHA4GE SARS-CoV-2 Contextual Data Specification provides a standardized, open-source framework for collecting, structuring, and sharing high-quality metadata to support COVID-19 genomic surveillance. Developed by the Public Health Alliance for Genomic Epidemiology (PHA4GE), the package includes a color-coded Excel collection template, ontology-mapped controlled vocabularies, JSON schema, reference guides, and detailed submission protocols for GISAID, NCBI, and ENA. Designed to promote FAIR data principles, interoperability, and global data sharing, this specification enhances the consistency, usability, and public health impact of SARS-CoV-2 sequence metadata.

hAMRonization Workflow

The hAMRonization Workflow is a proof-of-concept pipeline designed to harmonize antimicrobial resistance (AMR) detection outputs from multiple bioinformatics tools into a single, standardized report. By running leading AMR gene detection tools—including AMRFinderPlus, RGI, ResFinder, SRST2, DeepARG, and more—against genomic assemblies or sequencing reads, the workflow uses hAMRonization parsers to collate results into a unified, interoperable format. Built with Snakemake and available via Conda or containerized environments (Docker/Podman), this workflow streamlines comparative AMR analysis and supports reproducible, FAIR-aligned genomic surveillance.
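A minimal sketch of the collation idea, assuming two generic tools (toolA and toolB) whose output column names are invented for illustration. It does not use the hAMRonization package or its parsers, and the shared field names are hypothetical; it only shows how per-tool rows can be mapped onto one record type and merged into a single tab-separated report.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass
class HarmonizedHit:
    """One AMR gene detection in a tool-independent shape (hypothetical fields)."""
    input_file_name: str
    gene_symbol: str
    sequence_identity: float
    coverage_percentage: float
    analysis_software_name: str


# Per-tool column mappings. The source column names are assumptions made for
# illustration; real parsers must handle each tool's actual output format.
TOOL_COLUMNS = {
    "toolA": {"gene": "Gene symbol", "identity": "% Identity", "coverage": "% Coverage"},
    "toolB": {"gene": "Best_Hit_ARO", "identity": "Best_Identities", "coverage": "Coverage"},
}


def to_harmonized(tool, input_file, rows):
    """Map one tool's rows onto the shared HarmonizedHit record."""
    cols = TOOL_COLUMNS[tool]
    return [
        HarmonizedHit(
            input_file_name=input_file,
            gene_symbol=row[cols["gene"]],
            sequence_identity=float(row[cols["identity"]]),
            coverage_percentage=float(row[cols["coverage"]]),
            analysis_software_name=tool,
        )
        for row in rows
    ]


def collate(*hit_lists):
    """Merge per-tool results into one list, sorted by gene symbol then tool."""
    merged = [hit for hits in hit_lists for hit in hits]
    return sorted(merged, key=lambda h: (h.gene_symbol, h.analysis_software_name))


def to_tsv(hits):
    """Render harmonized hits as a tab-separated table with a header row."""
    buffer = io.StringIO()
    names = [f.name for f in fields(HarmonizedHit)]
    writer = csv.DictWriter(buffer, fieldnames=names, delimiter="\t")
    writer.writeheader()
    for hit in hits:
        writer.writerow(asdict(hit))
    return buffer.getvalue()


if __name__ == "__main__":
    a = to_harmonized("toolA", "sample1.fasta",
                      [{"Gene symbol": "blaTEM-1", "% Identity": "100.0", "% Coverage": "100.0"}])
    b = to_harmonized("toolB", "sample1.fasta",
                      [{"Best_Hit_ARO": "TEM-1", "Best_Identities": "99.8", "Coverage": "98.5"}])
    print(to_tsv(collate(a, b)))
```

The sketch aligns columns only; it does not reconcile gene nomenclature, so blaTEM-1 and TEM-1 in the example remain separate rows.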