Well-structured, rich contextual data adds value, promotes reuse, and enables aggregation and integration of disparate datasets. We are delighted to announce that the PHA4GE Data Structures working group has released its first preprint, on a SARS-CoV-2 contextual data specification for open genomic epidemiology.
The preprint sets out a clear data standard that extends the INSDC pathogen package, providing a contextual data specification that is both harmonisable and publicly available.
Development of the specification was led by Dr. Emma Griffiths, together with members of the PHA4GE Data Structures working group, who span five continents and multiple time zones.
The specification can be implemented using a collection template, as well as an array of protocols and tools which support the harmonisation and submission of sequence data and contextual information to public repositories.
Adopting the proposed standard and practices will improve interoperability between datasets and systems, improve the consistency and utility of generated data, and ultimately facilitate novel insights and discoveries about SARS-CoV-2 and COVID-19.
Link to preprint: here