
In the second half of 2020, the PHA4GE Infrastructure Working Group carried out a survey of SARS-CoV-2 sequencing around the globe. While that survey noted material shortages of all kinds, from reagents to power, a consistent theme was a shortage of skilled workers and ready-to-use bioinformatics solutions. Faced with a novel pathogen and a fast-evolving global situation, many public health and research labs were left scrambling to respond while also carrying an immense diagnostic burden.
The detection of novel variants of concern (VOCs) in South Africa, the UK and Brazil at the end of 2020 led to renewed interest in pathogen genomic sequencing. While the first half of 2020 saw a substantial amount of “first genome” work, with labs aiming to produce the first few genomes for a region, the discovery of these VOCs was rooted in substantial investment in genomic surveillance, building on work going back years. The alarm that resulted from these discoveries, however, led to a fresh wave of interest in virus sequencing and genomic epidemiology.
Some PHA4GE Infrastructure Working Group members have been involved in supporting this drive and, in the process, have had to face questions of infrastructure. Many public health labs have limited on-site computing and, even when they do have computing environments, have limited skills in configuring software environments and other systems administration tasks. In this context, Platform as a Service (PaaS) offerings like Terra.Bio have proved useful, as they allow off-site use of pre-configured scientific workflows.
On the other hand, cloud resources require a network connection that is stable and provides enough capacity to transfer the data that requires analysis. While SARS-CoV-2 has a small genome, and thus sequencing files for the virus tend to be smaller than for some other pathogens (e.g. Plasmodium, the parasite that causes malaria), whether the network available to public health labs in lower-income settings is good enough to work on the cloud is still a question under investigation.
For those who do require self-hosted environments, container technology such as Docker simplifies the process of software installation. Software environments and workflows deployed using containers offer a middle ground between the ease of use of cloud and the skills required to run an entire computing infrastructure locally.
In the coming months, the Infrastructure Working Group will complete its analysis of last year’s survey and forge ahead with pilot research projects on the use of both off-site and on-site options for SARS-CoV-2 genome analysis. We welcome collaborators on this journey and hope that our insights will ease the path for the many labs making the transition into bioinformatics in support of public health.
To learn more about other activities of the PHA4GE Infrastructure Working Group, view their webpage or contact [email protected]