
Responding to community needs: The Data Structures Working Group is updating its tools for AMR and SARS-CoV-2

The ability to compare how SARS-CoV-2 lineages are evolving around the world in different contexts depends on contextual data that can be harmonized across labs and datasets. Contextual data comprises the sample metadata, methods information, and lab, clinical and epidemiological data that enable the interpretation of sequence data.

In August 2020, the Data Structures Working Group (DSWG) responded to the need for a contextual data standard designed for public health genomic surveillance and pandemic response by releasing its first contextual data specification. The specification contains standardized fields and terms for critical information such as sampling strategies, sample and host details, and software and sequencing tools. Over the past two years, the specification has continued to evolve according to user data needs and requests.

In December 2021, the DSWG released a major update to the PHA4GE SARS-CoV-2 contextual data specification. Version 3.0 contains updated vocabulary, improved mappings to public repositories, and contextual data recommendations made by the World Health Organization. It also includes terms and identifiers from 24 different OBO Foundry ontologies to improve interoperability and to implement the FAIR principles (Findable, Accessible, Interoperable, Reusable) for scientific data management. The specification was published in GigaScience in February 2022 and is supported by a number of tools, reference materials and protocols for data curation and submission.

Read more about the specification package in GigaScience

In 2021, members of the DSWG worked with 10 teams across Africa and Southeast Asia to implement data standards for antimicrobial resistance (AMR) and SARS-CoV-2, through seed funding from the Bill & Melinda Gates Foundation. The goals of these partnerships included piloting the standards and resources developed by PHA4GE in real-world settings, learning from partners in a wide variety of contexts about how those resources could be improved, and building lasting relationships with public health bioinformatics practitioners in the community.

One such partnership included a team led by researchers at the National University of Malaysia (UKM). The team piloted PHA4GE’s “hAMRonization” (a specification and command-line parsing tool that harmonizes the outputs of widely used gene and mutation detection software into a standardized report) for sharing data about clinically relevant methicillin-resistant Staphylococcus aureus isolates between labs in Malaysia and Argentina. In the past month, the DSWG met with the team, which included researchers Dr. Hui-min Neoh, Dr. Su Datt Lam, Dr. Sabrina Di Gregorio, Mr. Mia Yang Ang, Dr. Tengku Zetty Maztura Tengku Jamaluddin and Prof. Dr. Sheila Nathan, to discuss further improvements to the tool and its supporting materials.
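To make the idea of harmonization concrete, here is a minimal Python sketch of mapping differently named columns from two detection tools onto one shared set of field names. It is an illustration only and does not use the hAMRonization package: the tool names (`tool_a`, `tool_b`), the column headers, and the common field names are all hypothetical and are not taken from the actual specification.

```python
import csv
import io

# Hypothetical column mappings. These names are illustrative assumptions and
# are NOT the real hAMRonization fields or the real headers of any tool.
FIELD_MAPS = {
    "tool_a": {"Sample": "sample_id", "Gene": "gene_symbol", "Identity": "sequence_identity"},
    "tool_b": {"isolate": "sample_id", "gene_name": "gene_symbol", "pct_id": "sequence_identity"},
}


def harmonize(report_text, tool, delimiter="\t"):
    """Convert one tool's tab-separated report into rows with common field names."""
    mapping = FIELD_MAPS[tool]
    reader = csv.DictReader(io.StringIO(report_text), delimiter=delimiter)
    rows = []
    for raw in reader:
        # Rename each tool-specific column to its common field name.
        row = {common: raw[original] for original, common in mapping.items()}
        # Store identity as a number so values from different tools are comparable.
        row["sequence_identity"] = float(row["sequence_identity"])
        # Record which tool produced the row.
        row["analysis_software"] = tool
        rows.append(row)
    return rows


# Two made-up reports describing the same kind of result with different headers.
tool_a_report = "Sample\tGene\tIdentity\nisolate1\tmecA\t99.8\n"
tool_b_report = "isolate\tgene_name\tpct_id\nisolate2\tmecA\t100.0\n"

combined = harmonize(tool_a_report, "tool_a") + harmonize(tool_b_report, "tool_b")
for row in combined:
    print(row)
```

Once every row shares the same keys, results from different labs and tools can be placed in a single table and compared directly, which is the practical benefit the pilot was exploring.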

These discussions included ways the Malaysia team could increase the usability of the tool for non-bioinformatician colleagues working in hospitals. The team created a “Google Colaboratory” notebook (known as a Google Colab), which enables users to execute Python code through a browser without installing any software. Google Colabs are Jupyter notebooks that run in the cloud and are closely integrated with Google Drive, making them easy to set up, access, and share.

The team hopes that the simplicity of the Google Colab version of hAMRonization will better enable their colleagues (who include clinicians and microbiologists less familiar with the command line) to quickly compare antimicrobial resistance in hospital settings. The DSWG is now working with the Malaysia team to make the Google Colab publicly available on GitHub. The Malaysia team will be hosting a workshop in April that will include training on using hAMRonization via the Google Colab.

