Contributing Institution:
University of California, San Francisco Archives & Special Collections
Title: COVID Tracking Project records
Creator:
COVID Tracking Project
Identifier/Call Number: MSS 2022-74
Physical Description:
439.74 Gigabytes
640,295 digital files
Date (inclusive): 2020-2022
Abstract: The COVID Tracking Project was a volunteer organization launched from The Atlantic and dedicated to collecting and publishing
the data required to understand the COVID-19 outbreak in the United States. Its records include data products and sources,
blog and social media posts, correspondence, internal communications, and team documents. Records of technical infrastructure,
such as data entry and quality tools, public code repositories, and internal databases, are also present.
Language of Material: English.
Biographical / Historical
The COVID Tracking Project was a network of volunteers that compiled, managed, and published state- and territory-level COVID-19
testing, hospitalization, and death data from March 7, 2020 through March 7, 2021. The Project began when two informal COVID-19
tracking initiatives combined their efforts: Alexis Madrigal and Robinson Meyer's research for The Atlantic, and Jeff Hammerbacher's
independent data collection. Erin Kissane joined soon after as a co-founder. The Project grew to include hundreds of volunteers,
who contributed through data entry, data quality review, web and infrastructure development, community development, and
science and government communication. In addition, the Project employed up to 30 contractors, organized into teams, to manage operations.
Because automatically aggregating and maintaining data from dozens of health departments was difficult, the Project enlisted
volunteers to monitor, enter, and review COVID-19 tracking data. It also worked with government agencies to advocate for data
transparency and standardization. In addition, the Project collected and maintained data on COVID-19's racial impact (in partnership
with Boston University's Center for Antiracist Research) and on long-term care centers. Owing to its focus on data quality
and transparency, the Project became one of the leading sources of COVID-19 data in the United States.
The COVID Tracking Project received administrative support from The Atlantic magazine, but was otherwise an independent organization.
The Project received funding through donations from foundations.
The final daily data publication was on March 7, 2021. The Project continued through the summer of 2021, producing reports,
updating existing data, and winding down its activities.
Processing Information
Processed by Alexander Duryee and Kevin Miller in 2022-2023.
Scope and Contents
The COVID Tracking Project records consist of COVID-19 data products, data creation and quality records, organizational records,
correspondence, and code repositories.
The Project existed entirely online, with no physical headquarters, and the nature of its records reflects this. Most files
came from Google Drives created under the covidtracking.com account, which were used to create, manage, and share documents,
spreadsheets, presentations, and other files. Other sources of records included GitHub for code, Amazon S3 for screenshot
storage, Airtable for project management, and Front for correspondence.
The Teams series documents the activities of the Project's teams, which managed tasks such as data entry, data quality, data
and web infrastructure, and community development. Because all of the Project's work was carried out by teams, these records
represent the organization's internal operations and encompass the full breadth of its activities. Where possible, team leads'
files are present as Google Takeout exports, which include each lead's Google Drive and Gmail records.
The Archive Files series contains files selected by Project contractors and volunteers for preservation in the collection. It
includes data spreadsheets, training materials, community records, data quality notes, and personnel records.
The Datasets series contains the final published data products created by the Project. These datasets were the Project's main
output and primary function; the best known is the Testing and Outcomes (TACO) dataset. Also included are the final COVID Racial
Data Tracker (CRDT) and Long-Term Care (LTC) datasets, as well as annotations, which contain "per-state, per-metric structured notes
on state reporting practices".
The Screenshots series contains records of the data sources used by Project teams. Most items in this series are screenshots
of government websites, but other formats, such as Excel spreadsheets and PDFs, are also included. Each primary data product
had its own collection of screenshots; sources for unpublished and secondary research, such as work on variants and vaccinations,
were captured as well. The screenshots were designed to provide a record of provenance for the Project's data and a stable
backup of source data that might later change or disappear.
The Slack series includes an export of internally public discussions from covidtracking.slack.com, the Project's primary hub
for communication and internal activity. The archivists generated access copies of these discussions, which include the files
and images associated with Slack messages.
The Social Media series contains records from the Project's Twitter and Instagram accounts (including Twitter Direct Messages),
along with newsletters emailed to a mailing list.
The GitHub Repositories series includes public code and data repositories published by the Project. The Project used GitHub to
maintain active code for its website, API, and other technical infrastructure, and also published certain research data there.
One repository, issues, used GitHub's issues feature as a public forum for discussing errors, questions, and proposed features.
Arrangement
The collection has been subdivided into seven series: Teams, Archive Files, Datasets, Screenshots, Slack, Social Media, and
GitHub Repositories.
Immediate Source of Acquisition
This collection was donated to the UCSF Archives and Special Collections by The Atlantic magazine in 2022.
Conditions Governing Access
The UCSF Archives and Special Collections policy places access restrictions on material with privacy issues for a specific
time period from the date of creation. Restrictions are noted at the series level. This collection will be reviewed for sensitive
content upon request. Contact the UCSF Archivist for information on access to restricted files.
Preferred Citation
Conditions Governing Use
Copyright has not been assigned to the Library and Center for Knowledge Management. All requests for permission to publish
or quote from material must be submitted in writing to the UCSF Archivist. Permission for publication is given on behalf of
the Library and Center for Knowledge Management as the owner of the physical items and is not intended to include or imply
permission of the copyright holder, which must also be obtained by the researcher.
Accruals
No future additions are expected.
Existence and Location of Copies
Subjects and Indexing Terms
COVID-19 Testing
Public health
Data Science
Pandemics and COVID-19