
Genomic analysis of SARS-CoV-2 has had a profound impact on the global response to the pandemic. The ability to analyze sequenced samples is increasingly considered a crucial capability of public health labs, and the benefits these technologies bring to public health are invaluable. However, many labs face the daunting task of starting an entire bioinformatics program and integrating the appropriate tools and software from the ground up.
In an effort to bridge the gap in integrating these technologies for SARS-CoV-2 (SC2) analysis, PHA4GE Pipelines and Visualization Working Group members collaborated on a guidance document to define the major challenges and to highlight various open source resources that have emerged from the public health community. Framing these challenges helped to identify and highlight the major open access and open source resources available in the public health space.
One of the major challenges identified by the Working Group is the generation of consensus genome assemblies from PCR tiling next-generation sequencing (NGS) data. Tiled amplification sequencing, for example through the ARTIC protocols, is one of the most commonly adopted approaches for generating SC2 sequencing data. These sequencing experiments generate thousands of amplicon reads that represent fragments of the original SC2 genome present in the sample, so the raw data that labs work with are often amplicon reads. As a result, one of the first bioinformatics demands that laboratories face is the assembly of PCR tiling NGS data into a contiguous SC2 genome. Generating consensus genomes is powerful because it enables downstream analyses, such as lineage typing and genomic epidemiology studies, that help to inform public health decision making.
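To make the idea of a consensus genome concrete, here is a minimal Python sketch of per-position majority voting over reads that have already been aligned to a reference. It is a toy illustration only, not how ARTIC or any production pipeline works: the function name `call_consensus`, the `(start, sequence)` read representation, and the `min_depth` threshold are all assumptions made for this example, and real workflows also handle primer trimming, indels, and base quality.

```python
from collections import Counter


def call_consensus(reference_length, aligned_reads, min_depth=2):
    """Build a toy consensus sequence by majority vote at each position.

    aligned_reads is a list of (start, sequence) tuples, where start is the
    0-based reference position of the first base. Indels are not modelled
    (an assumption of this sketch).
    """
    # One base counter per reference position.
    columns = [Counter() for _ in range(reference_length)]
    for start, sequence in aligned_reads:
        for offset, base in enumerate(sequence):
            position = start + offset
            if 0 <= position < reference_length:
                columns[position][base] += 1

    consensus = []
    for counts in columns:
        depth = sum(counts.values())
        if depth < min_depth:
            # Too little coverage to trust a call: mask the position.
            consensus.append("N")
        else:
            consensus.append(counts.most_common(1)[0][0])
    return "".join(consensus)


# Two overlapping "amplicons" covering positions 0-3 and 2-5 of an 8-base reference.
reads = [(0, "ACGT"), (0, "ACGT"), (2, "GTTA"), (2, "GTTA"), (2, "GATA")]
consensus = call_consensus(8, reads)
```

Positions with fewer than `min_depth` supporting reads are masked with `N`, which mirrors the common practice of not calling bases in amplicon dropout regions.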
Another challenge that Working Group members noted across multiple public health laboratories was the sharing of SARS-CoV-2 sequence data. Sharing sample read and assembly data through internationally accessible databases allows insights to be drawn about how the virus is spreading and mutating across the globe. Making the data available to international researchers supports well-informed public health decisions; nonetheless, preparing and submitting data to these repositories can be a challenge in itself.
Screening for Variants of Concern (VOCs), in essence making lineage and clade assignments for sequenced SC2 samples, has a crucial influence on the decisions made by public health officials. The ability to accurately and reliably screen for VOCs such as B.1.617.2 (Delta) is therefore a critical component of the bioinformatics analysis of SC2 genomes carried out by public health laboratories. Genetic relatedness, as inferred through phylogenetic analysis of collections of SC2 samples, can be a powerful proxy for epidemiological associations that can help resolve transmission networks, enable real-time surveillance, provide insights into genetic variation over time and support local outbreak interventions. Gaining access to bioinformatics solutions for SARS-CoV-2 phylogenetic analysis can greatly benefit public health efforts.
Particular attention was paid to demonstrating open access and open source solutions to these tasks, which lower the barrier to access and can be assessed thoroughly by the greater community of public health bioinformatics practitioners. We knew that as soon as we made this resource document available, modifications would be needed to account for new software, resources and other approaches released by the public health bioinformatics community. With this in mind, the document is hosted on a public GitHub page to enable timely updates from the Working Group as well as continuous community contributions. A special thank you to all Working Group members and external persons who participated in the creation of the document. It is worth noting that the document reflects the opinions of our Working Group; to enhance its value, we gladly welcome and encourage external collaboration. Additions to the document can be made by raising issues or pull requests, or by emailing the Working Group ([email protected]).
On the 24th of June, the Working Group hosted Dawn Roellig and Jillann Hagey from the US CDC’s Technical Outreach and Assistance for States Team (TOAST), who presented their Menu to assist labs starting out with SC2 sequencing. The Menus cover everything from wet-lab work (library preparation) to submitting data to public repositories, whilst highlighting the various workflow options available for Illumina and Oxford Nanopore.
Looking ahead, the Working Group is making great progress on a collaborative document, Quality Control (QC) Solutions for SARS-CoV-2 genomic analysis, which will be released in the coming weeks. NGS has expanded the approach of genomic analysis for pathogen surveillance systems. While the demand for NGS continues to grow, the quality of NGS data can be affected by library preparation and sequencing processes, systematic variation in quality scores across sequence reads, biases in sequencing due to base composition, and less-than-optimal library fragment sizes and indexes. The collaborative document aims to help define the QC challenges for SC2 genomic analysis and to suggest QC solutions that address them. Keep a look out on the PHA4GE social media pages and website for further announcements.
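As a small illustration of the kind of read-level check such QC work involves, the Python sketch below decodes a FASTQ quality string (assuming the standard Phred+33 encoding) and applies a simple mean-quality and length filter. The function names and the thresholds (`min_mean_quality=20`, `min_length=50`) are hypothetical choices for this example, not recommendations from the Working Group.

```python
def phred_scores(quality_line, offset=33):
    """Decode a FASTQ quality string into integer Phred scores (Phred+33 assumed)."""
    return [ord(character) - offset for character in quality_line]


def mean_quality(quality_line):
    """Return the arithmetic mean Phred score of a read, or 0.0 for an empty read."""
    scores = phred_scores(quality_line)
    return sum(scores) / len(scores) if scores else 0.0


def passes_qc(sequence, quality_line, min_mean_quality=20, min_length=50):
    """Keep a read only if it is long enough and its mean quality is high enough.

    Both thresholds are illustrative assumptions, not validated cut-offs.
    """
    return (
        len(sequence) >= min_length
        and mean_quality(quality_line) >= min_mean_quality
    )


# A 60-base read where every base has Phred score 40 ('I' in Phred+33).
good_read_ok = passes_qc("A" * 60, "I" * 60)
# A 60-base read where every base has Phred score 2 ('#' in Phred+33).
poor_read_ok = passes_qc("A" * 60, "#" * 60)
```

A check like this addresses only per-read quality; issues such as base composition bias or fragment size distribution need separate metrics.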
In the upcoming quarter, the Working Group has set its sights on discussing validation set solutions for SC2 genomic analysis; informing public health action through genomic visualization and dashboarding; and supporting the research software developer community with collaboration, containerization and the showcasing of best practices.
In conclusion, if any of the topics mentioned in this update are of particular interest to you, please feel free to join the Working Group and take part in discussions at our meetings or on the PHA4GE Slack Channel. New contributions are very welcome.
by Jamie Southgate, on behalf of the Bioinformatics Pipelines and Data Visualization Working Group