
The ability to compare how SARS-CoV-2 lineages are evolving around the world in different contexts depends on contextual data that can be harmonized across labs and datasets. Contextual data comprises the sample metadata, methods information, and the lab, clinical and epidemiological data that enable the interpretation of sequence data.
In August 2020, the Data Structures Working Group (DSWG) responded to the need for a contextual data standard designed for public health genomic surveillance and pandemic response by releasing its first contextual data specification. The specification contains standardized fields and terms for critical information such as sampling strategies, information about samples and hosts, as well as software and sequencing tools. Over the past two years, the specification has continued to evolve according to user data needs and requests.
In December 2021, the DSWG released a major update to the PHA4GE SARS-CoV-2 contextual data specification. Version 3.0 contains updated vocabulary, improved mappings to public repositories as well as contextual data recommendations made by the World Health Organization, and includes terms and identifiers from 24 different OBO Foundry ontologies to improve interoperability and to implement the FAIR principles (Findable, Accessible, Interoperable, Reusable) for scientific data management. The specification was also published in GigaScience in February 2022, and is supported by a number of tools, reference materials and protocols for data curation and submission.
Read more about the specification package in GigaScience
In 2021, members of the DSWG worked with 10 teams across Africa and Southeast Asia to implement data standards for antimicrobial resistance (AMR) and SARS-CoV-2, through seed funding from the Bill & Melinda Gates Foundation. The goals of these partnerships included piloting the standards and resources developed by PHA4GE in real-world settings, learning from partners in a wide variety of contexts about how those standards and resources should be improved, and building lasting relationships with public health bioinformatics practitioners in the community.
One such partnership was with a team led by researchers at the National University of Malaysia (UKM). The team piloted PHA4GE’s “hAMRonization” (a specification and command-line parsing tool used to harmonize the outputs of widely used gene and mutation detection software into a standardized report) for sharing data about clinically relevant methicillin-resistant Staphylococcus aureus isolates between labs in Malaysia and Argentina. In the past month, the DSWG met with the team, which included researchers Dr. Hui-min Neoh, Dr. Su Datt Lam, Dr. Sabrina Di Gregorio, Mr. Mia Yang Ang, Dr. Tengku Zetty Maztura Tengku Jamaluddin and Prof. Dr. Sheila Nathan, to discuss further improvements to the tool and its supporting materials.
These discussions included ways the Malaysia team could increase the usability of the tool for non-bioinformatician colleagues working in hospitals. The team created a “Google Colaboratory” notebook (known as a Google Colab), which enables users to execute Python code through a browser without any software installation. Google Colabs are Jupyter notebooks that run in the cloud and are highly integrated with Google Drive, making them easy to set up, access, and share.
The team hopes that the simplicity of the Google Colab version of hAMRonization will better enable their colleagues (who include clinicians and microbiologists less familiar with the command line) to quickly compare antimicrobial resistance in hospital settings. The DSWG is now working with the Malaysia team to make the Google Colab publicly available on GitHub. The Malaysia team will be hosting a workshop in April that will include training on using hAMRonization via the Google Colab.
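
For readers less familiar with what "harmonizing" tool outputs means in practice, the idea can be sketched in a few lines of Python of the kind that could be run in a notebook. This is only an illustration and not the hAMRonization tool or its API: the tool names (`tool_a`, `tool_b`), their column headers, the standardized field names and the sample rows are all hypothetical, chosen to show how differently labelled outputs can be mapped onto one shared set of fields.

```python
import csv
import io

# Hypothetical mappings from each tool's own column names to shared field names.
# These names are illustrative assumptions, not taken from the specification.
FIELD_MAPS = {
    "tool_a": {"Sample": "input_file_name", "Gene": "gene_symbol", "Identity": "sequence_identity"},
    "tool_b": {"file": "input_file_name", "gene_name": "gene_symbol", "pct_id": "sequence_identity"},
}

# Made-up example outputs: two tools reporting the same kind of result
# under different column headers.
TOOL_A_OUTPUT = "Sample\tGene\tIdentity\nisolate1.fasta\tmecA\t100.0\n"
TOOL_B_OUTPUT = "file\tgene_name\tpct_id\nisolate2.fasta\tmecA\t99.8\n"


def read_tsv(text):
    """Parse tab-separated text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def harmonize(rows, tool):
    """Rename each row's columns to the shared field names and record the source tool."""
    mapping = FIELD_MAPS[tool]
    records = []
    for row in rows:
        record = {shared: row[original] for original, shared in mapping.items()}
        record["analysis_software_name"] = tool
        records.append(record)
    return records


def build_report():
    """Combine harmonized records from both hypothetical tools into one report."""
    report = harmonize(read_tsv(TOOL_A_OUTPUT), "tool_a")
    report += harmonize(read_tsv(TOOL_B_OUTPUT), "tool_b")
    return report


if __name__ == "__main__":
    for record in build_report():
        print(record)
```

Once every record uses the same field names, results from different tools and different labs can be placed side by side in a single table, which is the kind of comparison the partnership described above was aiming for.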