Collection Guide
Collection Title: COVID Tracking Project Records
Collection Number: MSS 2022-74
Collection Details
Table of contents
  • Biographical / Historical
  • Processing Information
  • Scope and Contents
  • Arrangement
  • Immediate Source of Acquisition
  • Conditions Governing Access
  • Preferred Citation
  • Conditions Governing Use
  • Accruals
  • Existence and Location of Copies

  • Contributing Institution: University of California, San Francisco Archives & Special Collections
    Title: COVID Tracking Project records
    Creator: COVID Tracking Project
    Identifier/Call Number: MSS 2022-74
    Physical Description: 439.74 gigabytes (640,295 digital files)
    Date (inclusive): 2020-2022
    Abstract: The COVID Tracking Project was a volunteer organization launched from The Atlantic and dedicated to collecting and publishing the data required to understand the COVID-19 outbreak in the United States. Its records include data products and sources, blog and social media posts, correspondence, internal communications, and team documents. Technical infrastructure, such as data entry and quality tools, public code repositories, and internal databases, is also present.
    Language of Material: English.

    Biographical / Historical

    The COVID Tracking Project was a network of volunteers that compiled, managed, and published state- and territory-level COVID-19 testing, hospitalization, and death data from March 7, 2020, through 2022. The Project began when two informal COVID-19 tracking initiatives - Alexis Madrigal and Robinson Meyer's research for The Atlantic, and Jeff Hammerbacher's independent data collection - combined their efforts; Erin Kissane joined soon after as co-founder. The Project grew to include hundreds of volunteers, who contributed through data entry, data quality review, web and infrastructure development, community development, and science/government communication. In addition, the Project employed up to 30 contractors to manage operations, which were divided into teams.
    Because automatically aggregating and maintaining data from dozens of health departments was difficult, the Project enlisted volunteers to monitor, enter, and review COVID-19 tracking data. It also worked with government agencies to advocate for data transparency and standardization. In addition, the Project collected and maintained data on COVID-19's racial impact (in partnership with Boston University's Center for Antiracist Research) and on long-term care centers. As a result of its focus on data quality and transparency, the Project was one of the leading sources for COVID-19 data in the United States.
    The COVID Tracking Project received administrative support from The Atlantic magazine, but was otherwise an independent organization. The Project received funding through donations from foundations.
    The final daily data publication was on March 7, 2021. The Project continued through the summer of 2021, during which time it produced reports, updated existing data, and wound down the Project's activities.

    Processing Information

    Processed by Alexander Duryee and Kevin Miller in 2022-2023.

    Scope and Contents

    The COVID Tracking Project records consist of COVID-19 data products, data creation and quality records, organizational records, correspondence, and code repositories.
    The Project existed entirely online, with no physical headquarters, as reflected by the nature of the records. Files primarily came from Google Drives created under the covidtracking.com account, which were used to create, manage, and share documents, spreadsheets, presentations, and other files. Other sources of records included GitHub for code, Amazon S3 for screenshot storage, Airtable for project management, and Front for correspondence.
    The Teams series documents the activities of the Project's teams, which managed tasks such as data entry, data quality, data and web infrastructure, and community development. As all of the Project's work was done by a team, these records represent the organization's internal operations and encompass the full breadth of its activities. When possible, team lead files are present as Google Takeout exports, which include the lead's Google Drive and Gmail records.
    The Archive Files series contains files selected by Project contractors and volunteers for preservation in the collection. This series includes data spreadsheets, training materials, community records, data quality notes, and personnel records.
    The Datasets represent the final published data products created by the Project. These were the Project's main output and its primary function, with the Testing and Outcomes (TACO) dataset being the best known. Included are the final COVID Racial Data Tracker (CRDT) and Long-Term Care (LTC) datasets, and annotations, which contain "per-state, per-metric structured notes on state reporting practices".
    Screenshots contain records of the data sources used by Project teams. The majority of items in this series are screenshots of government websites; however, other data formats, such as Excel spreadsheets and PDFs, are included. Each primary data product had its own collection of screenshots; in addition, unpublished and secondary research, such as variants and vaccinations, had their sources captured. The screenshots were designed to provide a record of provenance for the Project's data and a stable backup for potentially unstable data.
    The Slack series includes an export of internally public discussions from covidtracking.slack.com, which was the Project's primary hub for communication and internal activity. The archivists generated access copies of the discussions, which include files and images associated with Slack messages.
    Social Media contains records from the Project's Twitter and Instagram accounts (including Twitter Direct Messages), along with newsletters emailed to a mailing list.
    The GitHub Repositories series includes public code and data repositories published by the Project. GitHub was used to maintain active code for the website, API, and other technical infrastructure for the Project; in addition, certain research data was published on GitHub. One repository, issues, used GitHub's issues feature as a public forum to discuss errors, questions, and features.


    Arrangement

    The collection has been subdivided into seven series: Teams, Archive Files, Datasets, Screenshots, Slack, Social Media, and GitHub Repositories.

    Immediate Source of Acquisition

    This collection was donated to the UCSF Archives and Special Collections by The Atlantic magazine in 2022.

    Conditions Governing Access

    The UCSF Archives and Special Collections policy places access restrictions on material with privacy issues for a specific time period from the date of creation. Restrictions are noted at the series level. This collection will be reviewed for sensitive content upon request. Contact the UCSF Archivist for information on access to restricted files.

    Preferred Citation

    [Identification of item], COVID Tracking Project records, MSS 2022-74, University of California, San Francisco Archives & Special Collections.

    Conditions Governing Use

    Copyright has not been assigned to the Library and Center for Knowledge Management. All requests for permission to publish or quote from material must be submitted in writing to the UCSF Archivist. Permission for publication is given on behalf of the Library and Center for Knowledge Management as the owner of the physical items and is not intended to include or imply permission of the copyright holder, which must also be obtained by the researcher.


    Accruals

    No future additions are expected.

    Existence and Location of Copies

    Data Explorer is available at: https://explore.covidtracking.com/
    Oral histories are available on Calisphere: https://calisphere.org/collections/28036/

    Subjects and Indexing Terms

    COVID-19 Testing
    Public health
    Data Science
    Pandemics and COVID-19