Sharing Data
In our work comparing different approaches to pathogen detection and piloting a biosurveillance system, we are collecting a range of data, primarily metagenomic sequencing data. We are excited to share it with other groups, both to support progress on problems that are key to our mission and to allow scientists exploring a wide range of unrelated questions to work with this rich data. We aim to share as much as possible, but some of it is subject to access restrictions from our partners.
We’re currently able to share data from:
- Boston Swab Sampling: we've been collecting a small number of nasal swab samples at busy public places around greater Boston. Sequences we identify as potentially human-infecting viruses are linked from our sample log in FASTQ format. In the future we plan to make all non-human-genome sequences public, but need to improve our filtering first to ensure we don't accidentally share human DNA.
- Los Angeles Wastewater Sequencing: we collaborated with Jason Rothman, formerly of Katrine Whiteson’s lab at the University of California, Irvine, and now leading his own lab at the University of California, Riverside, to sequence and analyze wastewater. A preprint is available on SSRN, and the raw data (45B read pairs) is available on SRA (PRJNA1198001).
- Ongoing Wastewater Sequencing: we're collaborating with Marc Johnson's group at the University of Missouri to sequence wastewater from multiple metropolitan areas. As of 2025-05-16 this dataset totals 368B read pairs, going back to samples collected in December 2023, and is growing by about 35B read pairs per week. Of these, 95B read pairs from Boston MA, Chicago IL, and Riverside CA are available on SRA (PRJNA1247874). We are working on approvals and automation to submit data to SRA on an ongoing basis. In the meantime, if you'd like access to non-public data, please send us a description of your planned research and we may be able to share it.
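
For readers who have not worked with FASTQ before, here is a minimal Python sketch of reading such a file. It is an illustration rather than part of our pipeline: it assumes the common four-line-per-record layout with no line wrapping, and the names `FastqRecord`, `read_fastq`, and `open_fastq`, as well as the example read, are hypothetical. For real analyses, an established library such as Biopython is likely a better choice.

```python
import gzip
import io
from typing import Iterator, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    """One sequencing read (hypothetical helper type for this sketch)."""
    name: str
    sequence: str
    quality: str


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records from a FASTQ text stream.

    Assumes four lines per record: '@name', sequence, '+', quality.
    """
    while True:
        header = handle.readline()
        if not header:
            return  # end of file
        header = header.rstrip("\r\n")
        if not header:
            continue  # tolerate blank lines between records
        sequence = handle.readline().rstrip("\r\n")
        separator = handle.readline().rstrip("\r\n")
        quality = handle.readline().rstrip("\r\n")
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        if len(sequence) != len(quality):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        yield FastqRecord(header[1:], sequence, quality)


def open_fastq(path: str) -> TextIO:
    """Open a plain or gzip-compressed FASTQ file in text mode."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


# Made-up example data, not taken from any of the datasets above.
example = "@read1\nACGTACGT\n+\nIIIIIIII\n@read2\nGGCC\n+\n!!!!\n"
records = list(read_fastq(io.StringIO(example)))
total_bases = sum(len(r.sequence) for r in records)
```

With a downloaded file, the same generator could be used as `for record in read_fastq(open_fastq(path)): ...`, where `path` is wherever you saved the file.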