Sharing Data
In our work comparing different approaches to pathogen detection and piloting a biosurveillance system, we are collecting a range of data, primarily metagenomic sequencing data. We are excited to share it with other groups, both to support progress on problems that are key to our mission and to allow scientists exploring a wide range of unrelated questions to work with this rich data. We aim to share as much as possible, but some of it is subject to access restrictions from our partners.
We’re currently able to share data from:
- Boston Swab Sampling: we've been collecting a small number of nasal swab samples at busy public places around greater Boston. Sequences we identify as potentially human-infecting viruses are linked from our sample log in FASTQ format. In the future we plan to make all non-human-genome sequences public, but need to improve our filtering first to ensure we don't accidentally share human DNA.
- Los Angeles Wastewater Sequencing: we collaborated with Jason Rothman, formerly of Katrine Whiteson’s lab at the University of California, Irvine, and now at his own lab at the University of California, Riverside, to sequence and analyze wastewater. The sequencing is complete and available on SRA (PRJNA1198001), with a total of 45B read pairs. We’re aiming to make the metadata and other details public with a data paper in early 2025, but in the meantime please contact us if you have questions.
- Ongoing Wastewater Sequencing: we're collaborating with Marc Johnson's group at the University of Missouri to sequence wastewater from multiple metropolitan areas. As of 2025-01-03 this totals 136B read pairs, going back to samples collected in December 2023, with about 16B more added every other week. Marc intends to submit a manuscript and make these public in early 2025, after which we'll make future sequencing runs public on an ongoing basis. In the meantime, if you'd like access to this data please send us a description of your planned research and we may be able to share it.
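
If you'd like to take a quick look at the FASTQ files mentioned above before setting up a full pipeline, here is a minimal sketch in Python using only the standard library. It assumes the common four-lines-per-record FASTQ layout, and the function names `read_fastq` and `open_fastq` are illustrative rather than part of any tooling we provide. For real analysis an established parser is likely a better choice.

```python
import gzip
import io
from typing import Iterator, TextIO, Tuple


def read_fastq(handle: TextIO) -> Iterator[Tuple[str, str, str]]:
    """Yield (read_id, sequence, quality) tuples from a FASTQ text stream.

    Assumes four lines per record: '@' header, sequence, '+' separator,
    quality string. Multi-line records are not handled.
    """
    while True:
        header = handle.readline()
        if not header:
            return  # end of stream
        if not header.strip():
            continue  # skip blank lines, e.g. at the end of the file
        seq = handle.readline().rstrip("\n")
        plus = handle.readline()
        qual = handle.readline().rstrip("\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        yield header[1:].strip(), seq, qual


def open_fastq(path: str) -> TextIO:
    """Open a plain or gzip-compressed FASTQ file in text mode."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "rt")


# Small in-memory example, so the sketch runs without any downloaded file.
example = io.StringIO("@read1\nACGT\n+\nIIII\n@read2\nGGCC\n+\n!!!!\n")
records = list(read_fastq(example))
total_bases = sum(len(seq) for _, seq, _ in records)
print(len(records), total_bases)  # prints "2 8"
```

To use this on a downloaded file, pass the result of `open_fastq` (with your own local path) to `read_fastq` in place of the in-memory example.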