Mobility network models of COVID-19 explain inequities and inform reopening
If you are writing about or interested in our work, we highly recommend reading our FAQ for the main takeaways and for non-technical answers to common questions!
We model the spread of SARS-CoV-2 within 10 of the largest metropolitan statistical areas in the United States using dynamic mobility networks that encode the hourly movements of 98 million people between 56,945 neighborhoods and 552,758 points of interest (like restaurants, gyms, and grocery stores) with 5.4 billion hourly edges. A video of our model in Chicago, starting from March 1, is shown below: from left, the plots show the total number of visits to points of interest in the mobility data; the model's predicted fraction of the population in the Susceptible, Exposed, Infectious, and Removed states; and the model's predicted geographic distribution of infections. After showing that our model accurately fits case counts, we use it to study the equity and efficiency of fine-grained reopening strategies.
What data do you use?
We use anonymized, aggregated data from SafeGraph, a company that tracks human movement patterns using cell phone data. Our data records how many people go to points of interest (POIs) like restaurants and grocery stores at every hour, and also records the neighborhoods they come from.
What time period do you analyze?
We analyze data from the first wave of infections in the spring, from March to May 2020, in 10 large US metro areas. Please be careful to avoid overgeneralizing from that time period, because mobility patterns, infection rates, and the precautions that people take (like mask-wearing) have changed since then. We are working on analysis with more recent data.
How does your model predict infections?
Every neighborhood starts our simulation with some low level of infection; this represents the beginning of March 2020. As the simulation progresses hour-by-hour, people move around based on the mobility data. If there are multiple people visiting the same POI in the same hour, and some are infectious while others are susceptible, then our model predicts that there is some probability of new infections occurring. How large that probability is depends on the area of the POI, how long visitors stay there, and how many of the current visitors are infectious. Our model also gives people a chance of getting infected at home from household transmission. So, for every hour, we move people around, and we simulate the number of new infections happening at each POI and in each neighborhood at home. This is how we are able to model who is infected, where they are infected, and when they are infected.
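To make the three ingredients above concrete, here is a minimal Python sketch of one POI in one hour. It is an illustration under stated assumptions, not the formula from our paper: the functional form (probability proportional to infectious visitors per square foot times dwell time), the function names, and the value of `transmission_constant` are all hypothetical. In the actual model, free parameters are calibrated against case counts for each metro area.

```python
def infection_probability(infectious_visitors, area_sqft, dwell_hours,
                          transmission_constant):
    """Per-susceptible-visitor infection probability at one POI in one hour.

    Assumed illustrative form: proportional to the density of infectious
    visitors (people per square foot) and to how long visitors stay,
    capped at 1. transmission_constant is a hypothetical free parameter.
    """
    if area_sqft <= 0:
        raise ValueError("area_sqft must be positive")
    density = infectious_visitors / area_sqft
    return min(1.0, transmission_constant * dwell_hours * density)


def expected_new_infections(susceptible_visitors, infectious_visitors,
                            area_sqft, dwell_hours, transmission_constant):
    """Expected number of new infections among susceptible visitors."""
    p = infection_probability(infectious_visitors, area_sqft, dwell_hours,
                              transmission_constant)
    return susceptible_visitors * p


# Made-up numbers: same visitors and dwell time, different floor area.
small_cafe = expected_new_infections(10, 2, 1000.0, 0.5, 50.0)
large_store = expected_new_infections(10, 2, 20000.0, 0.5, 50.0)
print(small_cafe, large_store)
```

With identical visitors, the smaller venue yields more expected infections because the same number of infectious people are packed into less space.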
Do you use data on how many people were infected at different types of places?
No. While we use mobility data on how many people visited each different place, we calibrate our model for each metro area against the overall case counts for that metro area. While data on where people were infected might in principle come from contact tracing efforts, unfortunately, that kind of data was not available at a large scale in the areas that we studied.
What does your model say about the risks of different categories of places, like restaurants or gyms?
We model the risk of fully reopening a POI category: that is, how much would predicted infections increase if we returned this category to pre-pandemic levels of mobility without taking additional precautions like increased mask-wearing or occupancy caps. (Because we model the risk of reopening a category, we can find that a category is risky to reopen even if it was closed during most of the time period we study.) We find that on average across metro areas, reopening full-service restaurants, gyms, hotels, and cafes produces the largest predicted increase in infections. This is because 1) there are many POIs in these categories (especially restaurants), and 2) when fully reopened, these places tend to be relatively crowded with people spending long times there. Our analysis agrees with prior work about which categories of business are risky to reopen.
Importantly, the risk to society of fully reopening a category is not equivalent to how risky it is for you, as an individual, to visit a POI in that category now. There are a number of differences: 1) we study the risk of reopening the entire category, not the risk of one person visiting one of these places; 2) POIs within the same category vary a lot in how risky they are; 3) we study data from the spring, but nowadays many places have modified their levels of mobility and may also be taking additional precautions like mask-wearing.
What does your model say about "superspreader POIs"?
The model predicts that a small fraction of POIs accounted for a large fraction of infections at POIs during the time range we study. For example, in Chicago, the model predicts that 10% of POIs accounted for 85% of infections at POIs. These riskier places come from multiple categories (e.g., they are not all restaurants or gyms), but tend to have higher densities of visitors, and visitors who stay longer. Importantly, not all infections occur at POIs, because the model also allows people to get infected in their homes. Across metro areas, our model predicts that infections at POIs account for 70% of all infections on average.
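A statistic like "X% of POIs accounted for Y% of infections at POIs" can be computed by ranking POIs by predicted infections and summing the top slice. The Python sketch below shows one way to do that; the function name and the per-POI counts are hypothetical and are not model output.

```python
import math


def share_from_top_pois(infections_per_poi, top_fraction):
    """Fraction of all POI infections that occur at the top-ranked POIs.

    infections_per_poi: predicted infection counts, one entry per POI.
    top_fraction: fraction of POIs to keep, e.g. 0.1 for the top 10%.
    """
    if not infections_per_poi:
        return 0.0
    ranked = sorted(infections_per_poi, reverse=True)
    # Keep at least one POI, rounding the slice size up.
    k = max(1, math.ceil(top_fraction * len(ranked)))
    total = sum(ranked)
    return sum(ranked[:k]) / total if total else 0.0


# Made-up counts for ten POIs (not model output).
example_counts = [40, 20, 10, 10, 5, 5, 4, 3, 2, 1]
example_share = share_from_top_pois(example_counts, 0.2)
print(example_share)
```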
How do you model reduced occupancy reopening? Why does a 20% maximum occupancy cap result in only a 42% reduction in visits in Chicago?
We assume that mobility patterns to all POIs return to pre-pandemic levels, except that hourly visits to each POI are capped so they cannot exceed a fraction of the POI’s maximum occupancy. For example, if a POI’s original maximum occupancy was 100 people, a 20% cap would mean that the business could not have more than 20 visits per hour. This would not affect visit patterns during hours where the POI already had fewer than 20 visits. This means that even stringent occupancy caps can result in relatively small reductions in the total number of visits because they only affect businesses during their most crowded hours, and leave visit patterns during less crowded hours unchanged. We have not tested other types of partial reopening, like curfews, only opening on some days of the week, or assuming that people revisit at another time if they are prevented from entering a business because it is too crowded, although our model is equipped to simulate these things.
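The capping rule described above can be sketched in a few lines of Python. The function names and the hourly visit counts are hypothetical, chosen only to show why a stringent cap removes a smaller share of total visits than the cap percentage might suggest; `max_occupancy` is taken as a given input here.

```python
def apply_occupancy_cap(hourly_visits, max_occupancy, cap_fraction):
    """Clip each hour's visits at cap_fraction * max_occupancy.

    Hours already below the limit are left unchanged.
    """
    limit = cap_fraction * max_occupancy
    return [min(visits, limit) for visits in hourly_visits]


def fraction_of_visits_lost(original, capped):
    """Share of total visits removed by the cap."""
    total = sum(original)
    return 0.0 if total == 0 else 1 - sum(capped) / total


# Made-up visit counts for six hours at one POI with max occupancy 100.
visits = [5, 10, 15, 20, 60, 10]
capped_visits = apply_occupancy_cap(visits, max_occupancy=100, cap_fraction=0.2)
lost = fraction_of_visits_lost(visits, capped_visits)
print(capped_visits, lost)
```

In this example only the single busiest hour is clipped (from 60 visits to 20), so a 20% cap removes one third of visits rather than 80%.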
What does your model say about socioeconomic and racial disparities in COVID infection rates?
Our model predicts that lower income and less white neighborhoods will have higher infection rates, which is consistent with what actually happened during the time period we model. It predicts this for two reasons: 1) people from these neighborhoods were not able to reduce their mobility by as much (in part because they were more likely to be essential workers), and 2) when they went out, they worked in or visited POIs that were more crowded and more dangerous. Consequently, reopening strategies can have a different impact on disadvantaged groups than on the population as a whole, and it’s important to take this into account.
What are the takeaways of your findings for policy-makers?
Reopening does not have to be “all-or-nothing”: strategies like reducing maximum occupancy can enable us to reopen more efficiently by providing a large reduction in infections for a relatively small reduction in visits. Our model also suggests that racial and socioeconomic disparities are driven in part by mobility: they’re not inevitable, but can be influenced by short-term policy decisions. Therefore, in evaluating reopening strategies, it’s important not just to consider the impact on the population as a whole, but also the impact on disadvantaged groups. This supports steps being taken by California and the Biden-Harris transition team to specifically consider the impact of reopening policies on disadvantaged populations.
What are the takeaways of your findings for individuals?
Our results show how mobility can have a real impact on infection rates: social distancing matters, and your daily choices and sacrifices make a difference! For example, our model predicts that if people had not reduced their mobility in March, the Chicago metro area would’ve seen 6x the number of infections by the beginning of May, and the San Francisco metro area would’ve seen 10x the number of infections. We do not recommend using our findings about risky POIs to plan your daily life, because our analysis is designed for policymakers, not individuals (see our answer above to “What does your model say about the risks of different categories of places, like restaurants or gyms?”).
Do you model changes in mask-wearing?
No. The goal of our analysis was to model the effects of changes in mobility. However, abundant scientific evidence demonstrates that mask-wearing is an essential part of reducing infections, in combination with the mobility reductions that we measure.
Do you study the impact of schools? What about public transportation?
We do not specifically examine the impact of school reopenings because children under 13 are not well-tracked by our cell-phone mobility data, so we are not sure we can fully capture the risk of these places. We do not specifically study public transportation because we are similarly concerned that the data does not allow us to properly model disease transmission there. In general, we report results for the categories of places where we are most confident we can adequately model risk.
Can you use the model to predict what will happen in the next weeks/months?
In principle we can, but we need to provide the model with updated data on mobility and COVID-19 cases, since the analysis in our paper uses data from the spring. We are working on doing this now.
Citation: Serina Y Chang*, Emma Pierson*, Pang Wei Koh*, Jaline Gerardin, Beth Redbird, David Grusky, and Jure Leskovec. Mobility network models of COVID-19 explain inequities and inform reopening. Nature, 2020.
Abstract: The COVID-19 pandemic dramatically changed human mobility patterns, necessitating epidemiological models which capture the effects of changes in mobility on virus spread. We introduce a metapopulation SEIR model that integrates fine-grained, dynamic mobility networks to simulate the spread of SARS-CoV-2 in 10 of the largest US metropolitan statistical areas. Derived from cell phone data, our mobility networks map the hourly movements of 98 million people from neighborhoods (census block groups, or CBGs) to points of interest (POIs) such as restaurants and religious establishments, connecting 57k CBGs to 553k POIs with 5.4 billion hourly edges. We show that by integrating these networks, a relatively simple SEIR model can accurately fit the real case trajectory, despite substantial changes in population behavior over time. Our model predicts that a small minority of "superspreader" POIs account for a large majority of infections and that restricting maximum occupancy at each POI is more effective than uniformly reducing mobility. Our model also correctly predicts higher infection rates among disadvantaged racial and socioeconomic groups solely from differences in mobility: we find that disadvantaged groups have not been able to reduce mobility as sharply, and that the POIs they visit are more crowded and therefore higher-risk. By capturing who is infected at which locations, our model supports detailed analyses that can inform more effective and equitable policy responses to COVID-19.
Our mobility networks are available for download through the SafeGraph Data Consortium. All SafeGraph data is anonymized and aggregated. We obtained an IRB exemption for SafeGraph data from Northwestern University.
Our code is available on GitHub.
What if we hadn’t socially distanced? What was the impact of mobility to different types of places on the overall epidemic curve? Find out in our interactive simulation below!
This graph illustrates the amount of mobility you would have had under your scenario (red), compared to what actually happened (black). Mobility is represented as daily total number of visits to points of interest (any non-residential place), based on aggregated geolocation data from SafeGraph.
This graph depicts the cumulative COVID-19 case counts predicted by our model under your scenario (red), compared to our model predictions when run with actual mobility data (green), which closely track real case counts (as reported by The New York Times).
Interactive simulation frontend produced in collaboration with J.D. Zamfirescu-Pereira, Mark Whiting, Jacob Ritchie, and Michael Bernstein.
The New York Times; The Washington Post; MIT Technology Review; Huffington Post; The Telegraph; Bloomberg; CNN; STAT; Nature News and Nature Accompanying News and Views; Stanford Press article and video
This research was supported by the US National Science Foundation under OAC-1835598 (CINES), OAC-1934578 (HDR), CCF-1918940 (Expeditions), IIS-2030477 (RAPID), Stanford Data Science Initiative, Wu Tsai Neurosciences Institute, and Chan Zuckerberg Biohub. S.C. was supported by an NSF Fellowship. E.P. was supported by a Hertz Fellowship. P.W.K. was supported by the Facebook Fellowship Program. J.L. is a Chan Zuckerberg Biohub investigator.