Driver Project

SARS-CoV-2

During the COVID-19 pandemic, SARS-CoV-2 became the primary global focus of pathogen genomics. Rapid sequencing and open data sharing transformed surveillance and response, but also revealed major challenges around standards, interoperability, and quality assurance. PHA4GE played a central role in developing practical, community-driven bioinformatics standards, tools, and guidance that continue to support public health laboratories worldwide.

Problem Statement

The explosion of SARS-CoV-2 genomic data highlighted key obstacles:

  • Lack of harmonised metadata specifications for sample, clinical, and epidemiological information.
  • No bioinformatics best practices for data analysis and interpretation.
  • Difficulty identifying and classifying recombinants and emerging variants such as Omicron.
  • Fragmented approaches to software development, lacking agreed principles for reproducibility and reliability.

Without coordinated standards and guidance, these issues reduced the comparability and actionability of genomic data across national and international systems.

Implementation Framework

PHA4GE, through its Bioinformatics Pipelines and Visualization Working Group, is coordinating efforts in the following areas:

  • Bioinformatics Guidance
  • Recombinant Detection
  • Variant Resources
  • Software Standards
  • QC Solutions
  • Metadata Specifications

Together, these outputs provided the foundation for interoperable data structures, robust pipelines, and reproducible analyses. These contributions not only advanced the COVID-19 response but also shaped PHA4GE’s ongoing cross-pathogen standards work.

Resources

This publication describes the development of standardized contextual data quality-control (QC) tags by PHA4GE to support the responsible sharing and reuse of lower-quality or purpose-specific genomic datasets. Implemented using ontologies and adopted by public health networks such as FDA’s GenomeTrakr, the tags improve dataset discoverability, interpretation, and transparency across public repositories.

PHA4GE provides guidance on identifying and characterising recombinant SARS-CoV-2 genomes, addressing challenges in lineage assignment and breakpoint detection. The document highlights accessible bioinformatics resources to support consistent and systematic recombination surveillance.

PHA4GE developed a SARS-CoV-2 contextual data specification package, implemented through a structured collection template and supporting protocols, to harmonise and support submission of genomic and contextual data to public repositories. Adoption of the standard improves data interoperability, reuse, and integration, and is supported by NCBI’s BioSample database.

This publication describes the development of an open, harmonised SARS-CoV-2 contextual data specification extending the INSDC pathogen package. The specification supports interoperable metadata collection, data submission to public repositories, and improved reuse and integration of genomic data for COVID-19 surveillance and research.

PHA4GE provides a community-driven, living guide to SARS-CoV-2 genomic analysis, identifying common bioinformatics challenges and open-source resources for public health use. The document supports ongoing collaboration and continuous improvement.

PHA4GE provides a resource-focused guide to support genomic analysis of the SARS-CoV-2 Omicron variant, addressing challenges posed by its mutation profile. The document highlights open-access bioinformatics tools to support public health surveillance and research.

This publication describes the development of an open, harmonised SARS-CoV-2 contextual data specification created by PHA4GE to support interoperable metadata collection and submission to public repositories. Implemented through standardised templates, protocols, and tools, the specification improves data consistency, reuse, and integration, and is now supported by NCBI’s BioSample database to enhance global COVID-19 genomic surveillance.

PHA4GE provides QC guidance for SARS-CoV-2 genomic sequencing, addressing common sources of variability and data quality issues in NGS workflows. The document highlights practical bioinformatics approaches to support reliable genomic surveillance.

PHA4GE developed an open-source SARS-CoV-2 contextual data specification to support consistent, interoperable genomic surveillance. The standard enables harmonised data submission, improved reuse, and integrated analysis across public health systems.

Related Links

During a thought-provoking session at the recent ABPHM conference, organised and funded by Wellcome Connecting Science, several esteemed researchers, led by Jennifer Gardy from the Bill & Melinda Gates Foundation, USA, shared their perspectives on the COVID-19 pandemic.

The Africa CDC conducts these trainings to strengthen disease control and prevention in Africa by integrating pathogen genomics and bioinformatics into public health surveillance and outbreak investigations. The trainings also create a platform for networking and collaboration among Member States and help maintain a support system for current and future public health emergencies.