PHA4GE Bioinformatics Pipelines & Visualization Working Group
Libuit KG, Spinler JK, Southgate J, Black A, Nekrutenko A, Neuhaus B, O’Cathail C, Lemmer D, Jones D, Smith E, Gnimpieba E, Guthrie J, Maturure P, Monsierurs P, Maier W, Langhorst B, Page A, & Niewiadomska AM
Overview
The World Health Organization (WHO) has classified the SARS-CoV-2 B.1.1.529 variant as a Variant of Concern (VOC) on the advice of the Technical Advisory Group on SARS-CoV-2 Virus Evolution (TAG-VE), an independent group of experts that periodically monitors and evaluates the evolution of SARS-CoV-2 and assesses whether specific mutations and combinations of mutations alter the behavior of the virus. The WHO has assigned the B.1.1.529 VOC the label Omicron per its Greek-letter naming system for key variants. The elevation of Omicron to a WHO-designated VOC was based on the TAG-VE’s assessment of the variant’s large number of genomic mutations and plausible impact on COVID-19 epidemiology.
The PHA4GE Pipelines and Visualization Working Group has created this document to highlight critical open-source and open-access resources to aid in the understanding and further analysis of the Omicron variant.
This document is not a comprehensive list of all available SARS-CoV-2 (SC2) bioinformatics resources. If it fails to include a valuable public health resource or mischaracterizes a resource mentioned, we encourage community collaboration through pull requests and/or GitHub issues.
Contents
- General Information on the Omicron Variant
- Potential Impacts of Spike Protein Mutations
- Bioinformatics Resources and Considerations
General Information on the Omicron Variant
Below is a list of educational material, public health announcements and publications, technical details and global trackers, phylogenetic visualizations, and resources to assist in data sharing and reporting of the Omicron variant.
Omicron Lineage and Clade Nomenclature
- The Omicron Variant is the WHO SARS-CoV-2 VOC label for the Pango lineage B.1.1.529 (Nextstrain clade 21M) and all descendant lineages: BA.1 (Nextstrain clade 21K), BA.2 (Nextstrain clade 21L), and BA.3 (Nextstrain clade 21M)
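The lineage-to-clade relationships above can be written as a small lookup table. This is a minimal sketch, assuming only the four lineages named in this document; the helper names `is_omicron` and `nextstrain_clade` are illustrative and not part of any Pango or Nextstrain tool. It does not resolve Pango aliases (for example, it does not know that B.1.1.529.1 is BA.1).

```python
from typing import Optional

# Pango lineage -> Nextstrain clade, as listed in this document.
OMICRON_CLADES = {
    "B.1.1.529": "21M",
    "BA.1": "21K",
    "BA.2": "21L",
    "BA.3": "21M",
}


def _matching_parent(lineage: str) -> Optional[str]:
    """Return the listed lineage that equals, or is an ancestor of, `lineage`."""
    lineage = lineage.strip().upper()
    for parent in OMICRON_CLADES:
        # "BA.1.1" descends from "BA.1"; "BA.10" would not match "BA.1".
        if lineage == parent or lineage.startswith(parent + "."):
            return parent
    return None


def is_omicron(lineage: str) -> bool:
    """True if the lineage is one of the listed Omicron lineages or a descendant."""
    return _matching_parent(lineage) is not None


def nextstrain_clade(lineage: str) -> Optional[str]:
    """Clade of the closest listed parent lineage, or None if not Omicron."""
    parent = _matching_parent(lineage)
    return OMICRON_CLADES[parent] if parent else None


if __name__ == "__main__":
    for name in ["BA.1.1", "BA.2", "B.1.617.2"]:
        print(name, is_omicron(name), nextstrain_clade(name))
```

Sublineages designated after this document was written may be assigned to other Nextstrain clades, so the table should be treated as a snapshot.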
Educational Material
- Nature News Article – Heavily mutated Omicron variant puts scientists on alert: Overview of the identified variant and its potential public health impacts.
- Theiagen Genomics Primer on the Omicron Variant (Video): To assist public health scientists’ understanding of the Omicron variant, Frank Ambrosio recorded a short primer that includes an overview of the Nature news article by Ewen Callaway, visual depictions of key Omicron mutations, and a look at the genetic diversity of Omicron relative to other SARS-CoV-2 variants using MicrobeTrace.
Public Health Announcements and Publications
- Classification of Omicron (B.1.1.529): SARS-CoV-2 Variant of Concern (World Health Organization)
- CDC Statement on B.1.1.529 (Omicron variant)
- CDC Science Brief: Omicron (B.1.1.529) Variant
- SARS-CoV-2 variants of concern as of 3 December 2021 (ECDC)
- Implications of the further emergence and spread of the SARS-CoV-2 B.1.1.529 variant of concern (Omicron) for the EU/EEA (ECDC, 2021-12-02)
- SARS-CoV-2 variants of concern and variants under investigation in England (UK Health Security Agency)
- Genomic surveillance of SARS-CoV-2 in Belgium (National Reference Laboratory, UZ Leuven & KU Leuven)
- SARS-CoV-2 variants of concern and variants under investigation in England Variant of concern: Omicron, VOC21NOV-01 (B.1.1.529); Technical briefing 30 (2021-12-03)
Technical Details and Global Trackers
- Pango-designation proposed new lineage and the associated Twitter thread (Tom Peacock)
- Proposal for third sublineage in B.1.1.529 (BA.3) (Andrew Rambaut)
  - Includes a table of shared and unique mutations across B.1.1.529, BA.1, BA.2, and BA.3
- Various resources for genomic information (e.g. defining mutations), visualizations, and global case counts over time:
  - COV-Lineage Variant Summary Pages: B.1.1.529, BA.1, BA.2, and BA.3
  - BV-BRC Lineage Profiles: BA.1, BA.2, & BA.3
  - Outbreak.info Omicron Variant Report
  - CoVariants 21K (Omicron) Profile
  - CNCB RCoV19 Lineage Browser
- Galaxy EU Omicron Public Analysis: View of the Omicron lineage’s mutational pattern derived transparently and fully reproducibly from raw sequencing reads using the Galaxy Project bioinformatics platform
- Omicron Data Round Up: Summary of the Omicron variant and what can be inferred based on publicly-accessible data presented 2021-12-01 by Anna Niewiadomska
- COVID-19 Scenario Modeling Hub: Synthesis of over 30 COVID-19 models for public health forecasting
Phylogenetic Visualizations
Data Reporting and Sharing
- PHA4GE Resource on Data Sharing: Sharing of sample read and assembly data through internationally accessible databases allows insights to be drawn about how the virus is spreading and mutating across the globe; the more freely available these data are to international researchers and public health scientists, the stronger our decision making can be.
- PHA4GE Resource on Data Submission: Resources developed to assist in the preparation and submission of raw NGS read data (fastq files), SC2 consensus assemblies (fasta files), and contextual sample metadata to internationally-accessible databases such as NCBI, ENA, and GISAID
Potential Impacts of Spike Protein Mutations
The spike protein of the SARS-CoV-2 Omicron variant contains approximately 32 mutations, many of which have not been observed in previous VOCs. However, based on their location, several of these mutations have the potential to impact immune escape, transmissibility, and detection. Spike mutations found in the Omicron VOC can be analyzed in detail using the Stanford University Coronavirus Antiviral & Resistance Database.
- Up to 15 mutations have been observed within the receptor binding domain (RBD). The RBD region of the Spike protein interacts directly with the human receptor ACE2 and mutations in this region may have a direct impact on how well SARS-CoV-2 viral particles attach to a host cell.
- Approximately 8 mutations have been observed within the N-terminal domain (NTD). The NTD of the Spike protein aids in virus attachment and mutations in this region could also impact virus infectivity.
- Both the RBD and NTD are surface exposed areas of the Spike protein that are targeted by antibodies. Mutations in these regions have the potential to evade immunity by antibodies acquired through previous infection or vaccination.
- Three mutations occur near the furin cleavage site, the region of the Spike protein responsible for viral-host membrane fusion. Mutations in this region have the potential to affect viral entry into host cells.
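To make the grouping above concrete, the sketch below sorts spike mutations by residue position into the regions just described. The residue boundaries (NTD 14-305, RBD 319-541, furin cleavage motif at 682-685) are approximate values assumed for illustration, since published domain definitions differ by a few residues, and the 30-residue window used for "near" the furin site is an arbitrary choice. The function names are hypothetical.

```python
import re

# Approximate residue ranges on the Wuhan-Hu-1 spike protein.
# ASSUMPTION: boundaries are illustrative; definitions vary between sources.
DOMAINS = {
    "NTD": (14, 305),
    "RBD": (319, 541),
}
FURIN_SITE = (682, 685)   # RRAR motif at the S1/S2 junction
FURIN_WINDOW = 30         # ASSUMPTION: arbitrary cutoff for "near" the site


def position(mutation: str) -> int:
    """Extract the first residue number from labels like 'N501Y' or 'del69-70'."""
    match = re.search(r"\d+", mutation)
    if match is None:
        raise ValueError(f"no residue position found in {mutation!r}")
    return int(match.group())


def classify(mutation: str) -> str:
    """Assign a spike mutation to NTD, RBD, near the furin site, or other."""
    pos = position(mutation)
    for name, (start, end) in DOMAINS.items():
        if start <= pos <= end:
            return name
    if FURIN_SITE[0] - FURIN_WINDOW <= pos <= FURIN_SITE[1] + FURIN_WINDOW:
        return "near furin cleavage site"
    return "other"


if __name__ == "__main__":
    # A handful of spike mutations reported for Omicron (BA.1); not the full set.
    for mut in ["A67V", "del69-70", "G339D", "N501Y", "D614G", "H655Y", "N679K", "P681H"]:
        print(f"{mut:10s} {classify(mut)}")
```

For curated per-mutation annotations, the Stanford database mentioned above remains the better source; this only illustrates the positional grouping.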
Diagnostic and Sequencing Assays
Mutations in the SARS-CoV-2 genome can affect PCR-based diagnostic assays and genomic sequencing. For example, the ThermoFisher TaqPath probe targeting the Spike gene is known to result in S-gene target failure (SGTF) when amplifying nucleic acid preparations from VOC Alpha. This occurs when the SARS-CoV-2 genome contains a deletion resulting in the loss of amino acids 69-70 of the NTD. When coupled with the positive amplification of other SARS-CoV-2 genetic regions, SGTF has been used as a diagnostic indicator of VOC presence (see SGF Deletion Assay).
- Thermo Fisher Scientific Confirms Detection of SARS-CoV-2 in Samples Containing the Omicron Variant with its TaqPath COVID-19 Tests: The Omicron variant contains the NTD deletion at amino acids 69/70 and results in SGTF by the TaqPath PCR assay.
- NEB’s Primer Monitor Tool: Monitor registered primer sets for overlapping sequence variants in Omicron.
- SARS-CoV-2 Artic V4.1 update for Omicron variant: Ten mutations in the Omicron VOC affect the Artic V4 primer scheme for whole genome sequencing. The Artic Network has designed 11 new primers to account for these mutations.
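The SGTF logic described in this section (S target fails while the other SARS-CoV-2 targets amplify) can be sketched as a simple rule over per-target Ct values. The Ct cutoff of 37 and the requirement that both non-S targets be positive are assumptions made for illustration; a laboratory would substitute the validated interpretation rules for its own assay. SGTF is a screening signal and not a lineage assignment, since both Alpha and Omicron carry the 69/70 deletion.

```python
from typing import Optional

# ASSUMPTION: hypothetical positivity cutoff; use the assay's validated value.
CT_CUTOFF = 37.0


def is_positive(ct: Optional[float], cutoff: float = CT_CUTOFF) -> bool:
    """A target counts as detected if it has a Ct value at or below the cutoff."""
    return ct is not None and ct <= cutoff


def call_sgtf(orf1ab_ct: Optional[float],
              n_ct: Optional[float],
              s_ct: Optional[float]) -> str:
    """Classify one sample from its ORF1ab, N, and S target Ct values.

    Returns 'SGTF' when ORF1ab and N are detected but S is not,
    'S detected' when all three amplify, and 'inconclusive' otherwise.
    """
    if not (is_positive(orf1ab_ct) and is_positive(n_ct)):
        return "inconclusive"
    return "S detected" if is_positive(s_ct) else "SGTF"


if __name__ == "__main__":
    print(call_sgtf(22.1, 23.4, None))   # S target did not amplify
    print(call_sgtf(22.1, 23.4, 22.8))   # all three targets amplified
    print(call_sgtf(None, 39.5, None))   # too weak to interpret
```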
Bioinformatics Resources and Considerations
Genome assembly as well as clade and lineage assignment of Omicron variants should follow the same bioinformatics workflow recommendations outlined in this working group’s Bioinformatics Solutions for SARS-CoV-2 Genomic Analysis guidance document. Briefly, raw amplicon read data should be mapped to the Wuhan-1 reference genome and primer trimming performed before a consensus genome is called. Clade and lineage assignment can then be made by analyzing the resulting consensus genome assemblies with the NextClade and Pangolin software, respectively.
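The order of steps described above can be sketched as a list of shell commands. This is one possible arrangement and not the working group's prescribed pipeline: the choice of minimap2, samtools, and iVar for mapping, trimming, and consensus calling is an assumption, as are all file names and the consensus thresholds. The NextClade flags shown follow the 1.x command line and differ in later releases. The function only builds the command strings; it does not run them.

```python
from typing import List


def build_commands(sample: str,
                   reads_1: str,
                   reads_2: str,
                   reference: str = "MN908947.3.fasta",      # Wuhan-1 reference (assumed file name)
                   primer_bed: str = "primers.bed",          # primer scheme BED (assumed file name)
                   nextclade_dataset: str = "nextclade_dataset") -> List[str]:
    """Return the shell commands for one paired-end amplicon sample, in order."""
    sorted_bam = f"{sample}.sorted.bam"
    trimmed = f"{sample}.trimmed"
    trimmed_sorted = f"{sample}.trimmed.sorted.bam"
    consensus = f"{sample}.consensus"
    return [
        # 1. Map reads to the reference and sort the alignments.
        f"minimap2 -a -x sr {reference} {reads_1} {reads_2} | samtools sort -o {sorted_bam} -",
        # 2. Trim primer sequences from the mapped reads.
        f"ivar trim -e -i {sorted_bam} -b {primer_bed} -p {trimmed}",
        f"samtools sort -o {trimmed_sorted} {trimmed}.bam",
        # 3. Call a consensus genome (thresholds are illustrative).
        f"samtools mpileup -aa -A -d 0 -Q 0 -f {reference} {trimmed_sorted}"
        f" | ivar consensus -p {consensus} -t 0.6 -m 10",
        # 4. Clade assignment with NextClade (1.x flag names).
        f"nextclade --input-fasta {consensus}.fa --input-dataset {nextclade_dataset}"
        f" --output-csv {sample}.nextclade.csv",
        # 5. Lineage assignment with Pangolin.
        f"pangolin {consensus}.fa --outfile {sample}.lineage_report.csv",
    ]


if __name__ == "__main__":
    for command in build_commands("sample01", "sample01_R1.fastq.gz", "sample01_R2.fastq.gz"):
        print(command)
```

Printing the commands before running them makes it easy to review each step against the guidance document and to swap in the tools a laboratory has already validated.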
Software Version Minimums
For laboratories making clade and lineage assignments outside of the NextClade and Pangolin web applications, e.g. through a custom workflow run on the CLI, Terra.Bio, or Galaxy Project, please use NextClade and Pangolin versions capable of making an accurate Omicron clade and lineage designation:
- NextClade Software Version 1.7.0 (Dataset Tag >=2021-12-16T20:15:53Z)
- Pangolin Software Version 3.1.17 (Constellations >=0.1.0)
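A workflow can guard against outdated software by comparing reported version strings against the minimums listed above. The sketch below is a hypothetical helper, not part of either tool; it checks only the software version and does not verify the NextClade dataset tag or the Constellations version, which would need their own checks.

```python
import re
from typing import Tuple

# Minimum software versions listed in this document.
MINIMUMS = {
    "nextclade": (1, 7, 0),
    "pangolin": (3, 1, 17),
}


def parse_version(text: str) -> Tuple[int, int, int]:
    """Pull a MAJOR.MINOR.PATCH triple out of strings like 'pangolin 3.1.17' or 'v1.7.0'."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    major, minor, patch = (int(part) for part in match.groups())
    return (major, minor, patch)


def meets_minimum(tool: str, version_text: str) -> bool:
    """True if the reported version is at least the documented minimum."""
    # Tuples compare numerically, so 1.10.0 correctly ranks above 1.7.0.
    return parse_version(version_text) >= MINIMUMS[tool.lower()]


if __name__ == "__main__":
    print(meets_minimum("pangolin", "pangolin 3.1.16"))
    print(meets_minimum("NextClade", "1.10.0"))
```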
Reference Sequences and Assemblies
- KRISP CERI NCBI BioProject of Omicron Data: Sequencing of the Omicron variant in South Africa by the Kwazulu-Natal Research Innovation and Sequencing Platform (KRISP) and the Centre for Epidemic Response and Innovation (CERI).
- NCBI SAMN23572360: Raw read and assembly data for the first Omicron sample identified in Minnesota, USA
- NCBI SAMN23637602: Raw read and assembly data for the first Omicron sample identified in Massachusetts, USA
- ENA Assemblies: ERZ4210179, ERZ4209688, ERZ4211168, ERZ4210738
- NCBI SAMN23998005: Raw read data for an Omicron variant sequenced with the ONT Midnight 1200 primers
SARS-CoV-2 Multiple Sequence Alignments
Primer dropouts in Omicron sequence data may lead to errant evolutionary inferences when performing phylogenetic analysis of SARS-CoV-2 genomes. A proposed workaround for these dropout regions is to mask the spike region and adjust the molecular clock rate accordingly, as performed by Trevor Bedford in a recent phylodynamic analysis.
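The masking half of this workaround can be sketched as replacing the spike columns of each aligned genome with N. The sketch assumes the sequences are already aligned to Wuhan-Hu-1 (MN908947.3) coordinates with insertions removed, so that the S gene occupies positions 21563-25384; it is not the procedure used in the cited analysis, and the accompanying clock rate adjustment is not covered here.

```python
# S gene coordinates on Wuhan-Hu-1 (MN908947.3), 1-based and inclusive.
SPIKE_START = 21563
SPIKE_END = 25384


def mask_region(seq: str,
                start: int = SPIKE_START,
                end: int = SPIKE_END,
                mask_char: str = "N") -> str:
    """Replace positions start..end (1-based, inclusive) with mask_char.

    ASSUMPTION: `seq` is in reference coordinates, e.g. one row of a
    reference-anchored multiple sequence alignment.
    """
    if not 1 <= start <= end:
        raise ValueError("expected 1 <= start <= end")
    if start > len(seq):
        return seq                      # region lies beyond this sequence
    end = min(end, len(seq))            # clip to the sequence length
    return seq[:start - 1] + mask_char * (end - start + 1) + seq[end:]


if __name__ == "__main__":
    # Toy example on a short sequence: positions 3-5 become N.
    print(mask_region("ACGTACGT", start=3, end=5))
```

Masking keeps the alignment length unchanged, so downstream tree-building tools see the same column coordinates with the spike columns treated as missing data.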