
Bioinformatics Solutions for Mpox Genomic Analysis

PHA4GE Bioinformatics Pipelines & Visualization Working Group
Libuit KG, Southgate J, Ünal G, Maguire F, Smith E, Kapsak S, van Heusden P, Wright S, Neher R, Diallo A

Current Version

Overview

Genomic analysis of Mpox virus (MPXV) samples by public health laboratories is a critical component in understanding the global outbreak. The integration and awareness of appropriate bioinformatics tools to support these endeavors are potential challenges.

In an attempt to assist this integration process, the Bioinformatics Pipelines and Visualization Working Group of the Public Health Alliance for Genomic Epidemiology (PHA4GE) has drafted this living document to help define the major bioinformatics challenges for MPXV genomic analysis and suggest various open-source and freely available bioinformatics resources to address them.

Please note that the bioinformatics resources listed in this document are simply an attempt to highlight the most accessible solutions as per the opinions of our working group and in no way represent a comprehensive list of all available MPXV bioinformatics resources. If this document fails to include a valuable public health resource or in some way mischaracterizes a resource mentioned, we encourage community collaboration through pull-requests and/or raised GitHub issues.


Background

Mpox is a viral zoonosis caused by the monkeypox virus (MPXV), which belongs to the genus Orthopoxvirus in the family Poxviridae. The virus can be transmitted to humans from animals. After the eradication of smallpox in 1980, mpox emerged as the most important orthopoxvirus infection for public health. MPXV is an enveloped double-stranded DNA virus with two distinct genetic clades: the Central African (Congo Basin) clade and the West African clade. Historically, the Congo Basin clade has caused more severe disease and was thought to be more transmissible (WHO). The clinical presentation resembles that of smallpox, and prior smallpox vaccination can provide some cross-immunity. The case fatality rate varies from 1% to 10%, and transmission between humans occurs mainly through direct contact, body fluids, and respiratory droplets (Berthet, N. et al.).

MPXV has a linear DNA genome of ≈197 kb. As in other orthopoxviruses, the central coding region sequence (CRS) of MPXV, located at nucleotide positions ≈56000–120000, is highly conserved. The genes at the terminal ends of the MPXV genome are responsible for immunomodulation, host range and pathogenicity, and the inverted terminal repeat (ITR) region contains at least 4 ORFs (Kugelman, JR et al.).


Public Mpox Case Databases

This repository contains dated records of curated Mpox cases from the 2022 outbreak (April – ), a data dictionary, and a script used to pull contents from a spreadsheet into JSON and CSV files.

The downloadable data file contains information on the number of mpox cases reported by EU/EEA countries or collected through epidemic intelligence at ECDC. Each row contains the data for one country: day of reporting, number of cases and source of information (the data are in long format). The file is updated twice a week. You may use the data in line with ECDC’s copyright and data usage policy.
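As a rough illustration of working with a long-format case file like the one described above, the following Python sketch sums reported cases per country. The column names (`country`, `date_reported`, `cases`, `source`) and the sample rows are hypothetical and are not taken from the actual ECDC file; the sketch also assumes each row holds new cases for that reporting day rather than a cumulative count.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample in long format: one row per country and reporting day.
# The column names and values are assumptions, not the real ECDC headers or data.
SAMPLE = """country,date_reported,cases,source
Spain,2022-08-02,120,TESSy
Spain,2022-08-05,95,TESSy
Germany,2022-08-02,80,EI
Germany,2022-08-05,60,TESSy
"""


def total_cases_by_country(handle):
    """Sum the reported cases per country from long-format rows."""
    totals = defaultdict(int)
    for row in csv.DictReader(handle):
        totals[row["country"]] += int(row["cases"])
    return dict(totals)


totals = total_cases_by_country(io.StringIO(SAMPLE))
print(totals)
```

To use this with a downloaded file, pass an open file handle instead of the `io.StringIO` sample and adjust the column names to match the file's header row.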

This report provides an overview of the total number of cases of mpox identified by ECDC and the WHO Regional Office for Europe through IHR mechanisms and official public resources and case-based data through The European Surveillance System (TESSy) up to 9 August 2022. The first summary table and maps (first two tabs) describe the number of cases identified through the different platforms. The following figures and tables describe national case-based data for surveillance of mpox reported in TESSy from all the countries and areas of the WHO European Region, including the 27 countries of the European Union (EU) and the additional three countries of the European Economic Area (EEA).


Bioinformatics Challenges for Public Health

The PHA4GE Bioinformatics Pipelines and Visualization Working Group has defined four key public health bioinformatics challenges for genomic analysis of MPXV samples:

  1. Generating consensus assemblies
  2. Submission of sequence data to internationally accessible databases
  3. Screening for Variants of Concern
  4. Performing Phylogenetic analysis of MPXV datasets


Open-Access/Source Bioinformatics Solutions & Resources


Video resources


Sequencing resources


Generating consensus assemblies

  • TheiaCoV workflows (for Illumina SE/PE, ONT, and fasta files) with MPXV input variables
    • Supports amplicon and metagenomic data
  • GalaxyProject MPXV analysis effort
    • Only supports Illumina PE metagenomic data
  • Nextflow workflow from the Utah PHL
    • Supports amplicon and metagenomic data
  • Epi2Me
    • Only supports metagenomic data
  • Viral-Recon
    • Workflow for raw read quality control, de-hosting, assembly, variant calling, and consensus generation for Illumina and Nanopore mpox data. It does not currently include pre-built support for mpox (e.g. reference genome, reference annotations, Nextclade dataset, and amplicon schemes), but these can be supplied by the user on the command line and should be appropriate to the sequencing method. For amplicon sequencing, use the reference that was used to create the amplicon scheme; for metagenomic sequencing, to be consistent with Nextstrain, you can use NC_063383.1.fasta and NC_063383.1.gff with the Nextclade dataset nextclade_hMPXV_B1_pseudo_ON563414_XXXXXXX.
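Whichever workflow produces the consensus, a quick completeness check of the resulting FASTA file can be useful before downstream analysis. The Python sketch below is not part of any of the workflows listed above: it reads a consensus FASTA and reports the fraction of positions that are not unambiguous A, C, G or T calls. The function names and the example sequence are hypothetical.

```python
import io


def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA file handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def fraction_ambiguous(sequence):
    """Fraction of positions that are not an unambiguous A, C, G or T."""
    if not sequence:
        return 0.0
    seq = sequence.upper()
    called = sum(seq.count(base) for base in "ACGT")
    return (len(seq) - called) / len(seq)


# Hypothetical consensus record, used only for illustration.
example = io.StringIO(">sample_1\nACGTNNNNAC\nGTACGTACGT\n")
results = {name: fraction_ambiguous(seq) for name, seq in read_fasta(example)}
print(results)
```

What counts as an acceptable fraction of ambiguous bases depends on the sequencing method and the intended analysis, so no threshold is suggested here.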


Submission of sequence data to internationally accessible databases


Screening for Variants of Concern

  • Nextclade
    • Assignment of consensus sequences to Nextstrain clades, quality control, and mutation effect annotation. Pre-built references are available for inferred ancestral mpox, the human mpox clade, and the specific B.1 human mpox clade.
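To give a sense of what listing mutations relative to a reference involves, here is a minimal Python sketch that reports substitutions between two sequences that have already been aligned to the same length. It is an illustration only and is not Nextclade's algorithm: Nextclade performs its own alignment and also handles insertions, deletions, and amino acid changes. The function name and example sequences are hypothetical.

```python
def list_substitutions(reference, query):
    """List substitutions between two aligned, equal-length sequences.

    Positions are 1-based reference coordinates. Columns containing a
    gap ('-') or an uncalled base ('N') in either sequence are skipped.
    """
    if len(reference) != len(query):
        raise ValueError("sequences must be aligned to the same length")
    substitutions = []
    ref_pos = 0
    for ref_base, query_base in zip(reference.upper(), query.upper()):
        # Only advance the coordinate when the reference has a base here.
        if ref_base != "-":
            ref_pos += 1
        if "-" in (ref_base, query_base) or "N" in (ref_base, query_base):
            continue
        if ref_base != query_base:
            substitutions.append(f"{ref_base}{ref_pos}{query_base}")
    return substitutions


# Toy aligned pair: the reference has a gap where the query carries an insertion.
print(list_substitutions("ACGT-ACGT", "ACTTGAC-A"))
```

Because gap columns in the reference do not advance the coordinate, the reported positions stay in reference numbering even when the query carries insertions.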


Performing Phylogenetic analysis of MPXV datasets

  • Augur
    • A bioinformatics toolkit for phylogenetic analysis that constructs phylogenetic trees that can be visualized in Nextstrain
  • Nextstrain Mpox build workflow
    • Workflow to perform contextualized phylogenetic analysis of mpox consensus sequences (by default using the human mpox reference genome NC_063383.1)
  • Taxonium
    • Tool for exploring large phylogenetic trees
    • Mpox sequences from GenBank


Publicly available data

To help you get started with phylogenetic analysis, Nextstrain provides the MPXV data available on NCBI in aggregated form:

Pairwise alignments with Nextclade against the reference sequence MPXV-M5312_HM12_Rivers, insertions relative to the reference, and translated ORFs are available:
