Call for Community Input: Exciting Updates from the Data Structures Working Group!

Projects Requesting Community Input

Wastewater Contextual Data Specification

Wastewater genomics data is increasingly being used to support a wide variety of surveillance programs and research that inform public health decision making and responses, e.g. detection and monitoring of emerging and existing pathogens, detection and characterization of antimicrobial resistance (AMR) determinants, and quantification of infectious disease genotypes and assessment of their prevalence. Wastewater genomic surveillance integrates diverse data from various sources, agencies, and systems, presenting challenges in harmonization, integration, and interpretation. Structuring contextual data using data standards aids accessibility and usability by both humans and computers, supporting reuse.

The PHA4GE Wastewater Contextual Data Specification Package is scoped for data collection and sharing (within organizations, within networks, and, if desired, with public repositories) of both pathogen-agnostic genomics contextual data and genotypic attributes (such as antimicrobial resistance genes) derived from amplicon-based, whole genome sequencing (WGS), and metagenomic sequencing approaches. The goal of the specification is to create a data interoperability framework that enables exchange and communication between data generators and consumers using wastewater for surveillance, that is extensible to other types of water-based surveillance (e.g. agriculture-based water monitoring, freshwater and marine environmental studies, and wildlife vector investigations), and that is compatible with existing clinical and One Health standards. The specification was designed through consultation with partners involved in different wastewater projects in low- and middle-income countries (LMICs) and high-income countries (HICs).

PHA4GE is currently seeking community input on the draft specification package, and will shortly be launching a new round of subgrants to better enable LMICs to participate in the co-creation, testing and feedback process. 

The specification package can be found at: https://github.com/pha4ge/Wastewater_Contextual_Data_Specification


New Projects

Data Standards Registry

Developing new contextual data standards is cumbersome and error-prone, even for experts in the field. General knowledge of data standards development principles and practices is often localized within data standards communities of practice, limiting the ability of public health institutions to develop their own interoperable specifications. To democratize standards development, and to ensure that best principles and practices for data standards development are entrenched in the public health community, PHA4GE is developing a data standards registry to which members of the community may submit and share specifications that conform to best practices. As part of the project, PHA4GE is developing a set of tools and development guides, as well as a database of existing standards with searchable modules of attributes, enabling the assembly of fit-for-purpose specifications and collection instruments for different public health use cases (i.e. enabling guided, DIY standards development). The goal of this project is to enable labs to identify community-vetted standards, or to build their own use case-specific standards by re-using existing fields and terms, increasing interoperability across systems and datasets.
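
To make the idea of assembling a specification from searchable modules of attributes more concrete, here is a minimal Python sketch. It is an illustration only and not the registry's design: the project is still being scoped, and the module names, field names, and the functions search_fields, assemble_specification, and missing_required are all invented for this example.

```python
from typing import Dict, List

# Hypothetical registry content: module name -> reusable field definitions.
# None of these names are taken from a published PHA4GE specification.
REGISTRY: Dict[str, List[dict]] = {
    "sample_collection": [
        {"name": "sample_collection_date", "required": True},
        {"name": "geo_loc_country", "required": True},
    ],
    "sequencing": [
        {"name": "sequencing_instrument", "required": False},
        {"name": "library_strategy", "required": False},
    ],
    "wastewater_site": [
        {"name": "collection_site_type", "required": False},
        {"name": "sample_collection_date", "required": True},
    ],
}


def search_fields(keyword: str) -> List[str]:
    """Return the sorted names of modules that contain a field matching keyword."""
    return sorted(
        module
        for module, fields in REGISTRY.items()
        if any(keyword in f["name"] for f in fields)
    )


def assemble_specification(module_names: List[str]) -> List[dict]:
    """Merge the chosen modules into one field list, keeping each field once."""
    spec: List[dict] = []
    seen = set()
    for module in module_names:
        for f in REGISTRY[module]:  # raises KeyError for an unknown module
            if f["name"] not in seen:
                seen.add(f["name"])
                spec.append(f)
    return spec


def missing_required(spec: List[dict], record: dict) -> List[str]:
    """List required fields that are absent or empty in a data record."""
    return [f["name"] for f in spec if f["required"] and not record.get(f["name"])]


# Build a use case-specific specification by re-using existing modules.
spec = assemble_specification(["sample_collection", "wastewater_site"])
field_names = [f["name"] for f in spec]
gaps = missing_required(spec, {"sample_collection_date": "2024-01-15"})
```

Because shared fields such as the collection date are defined once and re-used across modules, two labs that assemble different specifications from the same modules still end up with identically named fields, which is the interoperability benefit described above.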

The project is in its initial scoping stage, and the Data Structures Working Group welcomes community input and participation. 


What Is A Sample?

The idea of a “sample” is defined in different ways in different lab information management systems, public repositories, and data models. Samples obtained in the field can be subsampled, and multiple organisms can be isolated from samples. Many colony picks can be obtained from isolates, and different sequencing libraries can be prepared from different colonies. Multiple sequences can be obtained from different library preparations and sequencing platforms. Different sequencing strategies (e.g. amplicon sequencing, single isolate sequencing, metagenomics) also create complexity. In short, the continuum from sampling to sequence is nuanced and hierarchical, and often involves one-to-many relationships. The heterogeneity in language used to describe samples and downstream entities can be confusing for data entry, tracking, and sharing. The DSWG is exploring the concept of a “sample” and its derivations in a public health context, and will work toward developing standardized language that can be mapped to existing databases and repositories. It is hoped that this work will help inform the development of future data management systems, and provide guidance for data generators sharing data within different systems.
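
The hierarchy described above can be sketched as a simple tree in which every entity records what it was derived from. This is only one possible way to picture the problem, not the DSWG's model: the Entity class, the "kind" labels, and the identifiers are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Entity:
    """One node in the sample-to-sequence continuum (hypothetical model)."""
    identifier: str
    kind: str  # e.g. "field_sample", "subsample", "isolate", "library"
    # Excluded from repr and comparison to avoid parent/child recursion.
    parent: Optional["Entity"] = field(default=None, repr=False, compare=False)
    children: List["Entity"] = field(default_factory=list)

    def derive(self, identifier: str, kind: str) -> "Entity":
        """Create a downstream entity and link it to this one (one-to-many)."""
        child = Entity(identifier, kind, parent=self)
        self.children.append(child)
        return child

    def lineage(self) -> List[str]:
        """Trace this entity back to the original field sample."""
        node: Optional["Entity"] = self
        path: List[str] = []
        while node is not None:
            path.append(f"{node.kind}:{node.identifier}")
            node = node.parent
        return list(reversed(path))

    def descendants(self, kind: str) -> List["Entity"]:
        """Collect all downstream entities of a given kind."""
        found: List["Entity"] = []
        for child in self.children:
            if child.kind == kind:
                found.append(child)
            found.extend(child.descendants(kind))
        return found


# One field sample yields two isolates; one isolate yields two sequences.
sample = Entity("S1", "field_sample")
sub = sample.derive("S1-a", "subsample")
iso1 = sub.derive("I1", "isolate")
iso2 = sub.derive("I2", "isolate")
pick = iso1.derive("C1", "colony_pick")
lib = pick.derive("L1", "library")
seq1 = lib.derive("R1", "sequence")
seq2 = lib.derive("R2", "sequence")

print(" -> ".join(seq1.lineage()))
```

Keeping an explicit parent link at every level means that the question "which sample does this sequence belong to?" has an unambiguous answer even when systems apply the word "sample" to different levels of the tree.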

The project is in its initial scoping stage, and the Data Structures Working Group welcomes community input and participation.  

