Working Group

Data Repositories

Improving how public health sequence data are submitted, accessed, and reused.

The PHA4GE Data Repositories Working Group works with the global community and major sequence repositories to improve community-driven standards for sequence submission, query, retrieval, and metadata management. Our aim is to make public health sequence data easier to share, discover, and use while supporting equitable data sharing and FAIR (Findable, Accessible, Interoperable, Reusable) principles.

A current focus of the Working Group is Pathoplexus, a community-driven platform designed to support rapid and equitable sequence data sharing for public health.

What you’ll work on

This Working Group brings together public health users, repository developers, and data stewards to improve repository workflows and tools.

Key focus areas include:

  • Sequence submission, query, and retrieval workflows

  • Repository-based tools, databases, and APIs (Application Programming Interfaces)

  • Management and standards for sequence-associated metadata

  • Community-driven repository development, including Pathoplexus

  • SRA (Sequence Read Archive) management for community projects (e.g. umbrella BioProjects, organizational standards)

  • Link-outs to sensitive epidemiologic, clinical, and commercial data resources

  • Data sharing and governance, including equitable use and reuse

  • Engagement with major public repositories (the INSDC members NCBI, ENA, and DDBJ)

  • Cloud-based repository interfaces and integrations

Key deliverables include:

  • Consensus recommendations for public health–relevant submission and metadata practices

  • Community feedback on repository features, APIs, and user interfaces

  • Guidance to improve documentation, usability, and discoverability of repository tools

  • Support for training, workshops, and hackathons focused on repository use

  • Recommendations that help repositories better support pathogens of public health interest

Why join

If you submit data to public repositories, build tools that depend on them, or rely on repository data for surveillance and research, your experience can help shape systems used worldwide.

By joining, you can:

  • Influence how repositories support public health needs

  • Help improve usability and accessibility of sequence data

  • Contribute to more equitable and FAIR data sharing practices

  • Collaborate directly with repository developers and global stakeholders

Overview

Inception: August 2024

# of Members: 20+

Chairs

Emma Hodcroft

Swiss TPH

Arthur Shem Kasambula

Independent Consultant

Projects

Resources

Pathoplexus is an open-source viral genomics database designed to support transparent data sharing, flexible attribution, and advanced analysis of pathogens of public health importance. Built as a community-driven platform with open governance and interoperable data access, Pathoplexus aims to strengthen global research and public health responses to infectious diseases.

Members

Remigio Arteaga | Laboratorio Para Investigaciones Biomédicas, Facultad de Ciencias de la Vida, Escuela Superior Politécnica del Litoral (ESPOL), Guayaquil, Ecuador | Ecuador

Sunday Ayuba Buru | Genomic Research Lab, College of Health Sciences, Kaduna State University, Kaduna | Nigeria

Swapan Bhuiyan | NYC Public Health Laboratory | United States

Anderson Brito | Instituto Todos pela Saúde (ITpS) | Brazil

Chaoran Chen | ETH Zurich | Switzerland

Nyasha Chin’Ombe | University of Zimbabwe | Zimbabwe

Alan Christoffels | SANBI-UWC | South Africa

Matheus de Andrade Silva | Instituto Todos pela Saúde | Brazil

Karthikeyan Govindan | Wellcome Trust Research Laboratory, Christian Medical College and Hospital | India

Eneida Hatcher | NIH/NLM/NCBI | United States

Emma Hodcroft | Institute for Social and Preventive Medicine at the University of Bern | Switzerland

Kathryn Holt | LSHTM | United Kingdom

Arthur Shem Kasambula | Makerere University | Uganda

Noémie Lefrancq | ETH Zürich | Switzerland

Duncan MacCannell | CDC | United States

Francis Mubigalo | National Institute of Biomedical Research | Democratic Republic of the Congo

Kingsley Achilike Njoku | Georgetown Global Health Nigeria | Nigeria

Purity Oreng’ | Kenya Institute of Primate Research | Kenya

Theo Sanderson | Francis Crick Institute | United Kingdom

Kin-Ming (Clement) Tsui | University of British Columbia | Canada

Tesfahun Wubetu | Addis Ababa University | Ethiopia

Related Research

This piece highlights Pathoplexus, a community-driven viral genomics database developed with input from PHA4GE to advance open data sharing, ethical use, and transparent governance in public health genomics.