During the longest government shutdown in US history, many industries were affected, but perhaps none more than data science.
[UPDATE 1/28/19: The US government shutdown has ended, and online databases have been restored. The following article documents how the Dec 2018 - Jan 2019 shutdown affected data science pursuits while federally funded resources were unavailable.]
In the fourth week of the longest shutdown in US history, the impact of suspended services is hitting more than just federal employees. Data scientists across the world have depended on public databases fueled by US federal funding and research. With the loss of funding to all services deemed non-essential, the productivity of data science research may also be threatened.
Public databases are dropping like flies – literally.
Although this is by no means the first shutdown rodeo for US agencies, by being the longest it is testing the limits of how long certain services can ‘take care of themselves.’ Unfortunately, it seems that publicly available data sets, even those which should be automatically reporting independently collected data, don’t last much longer than a common housefly.
Data.gov, the federal government’s central catalog for open data sets, went offline sometime during the morning of January 9th. All visits to a data.gov/ address now redirect to a message explaining how their services will be unavailable until further notice. The landing page also links to a JSONL file with a monthly snapshot of the Data.gov metadata.
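For readers who want to work with that snapshot, here is a minimal sketch of how a JSONL metadata file could be summarized. It assumes each line is one JSON object describing a data set, and that the publishing agency sits under an `organization` object with a `title` field. Those field names are an assumption for illustration, not something confirmed by the landing page, so adjust them to match the actual file.

```python
import json
import sys
from collections import Counter
from typing import Iterable, Iterator


def iter_records(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per non-empty JSONL line, skipping malformed lines."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # tolerate a truncated or corrupt line
        if isinstance(record, dict):
            yield record


def count_by_publisher(lines: Iterable[str]) -> Counter:
    """Count data sets per publishing organization.

    The "organization" -> "title" layout is an assumed schema.
    Records without it are counted under "unknown".
    """
    counts: Counter = Counter()
    for record in iter_records(lines):
        org = record.get("organization") or {}
        name = org.get("title", "unknown") if isinstance(org, dict) else str(org)
        counts[name] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python summarize.py <path-to-snapshot.jsonl>
    with open(sys.argv[1], encoding="utf-8") as fh:
        for publisher, n in count_by_publisher(fh).most_common(10):
            print(f"{n:6d}  {publisher}")
```

Reading the file line by line keeps memory use low even for a large catalog, which is the main reason to prefer JSONL over a single JSON document here.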
Captured from https://data.gov on 1/14/2019
Data.gov may be the largest, but it is not the only source of data sets to go down in the past weeks. The websites for research services such as the National Oceanic and Atmospheric Administration (NOAA), the US Census Bureau, and the US Geological Survey remain up, but they warn users that their databases are not being updated and may not display accurate information. In other cases, trying to access specific data sets, such as NOAA’s Drought Risks prediction data, redirects to yet another shutdown message.
Captured from https://governmentshutdown.noaa.gov/ on 1/14/2019
An article from Pew Research is keeping track of several databases that have been affected by the shutdown thus far. The trend seems to be that while certain branches are deemed generally essential, such as the Justice Department, their corresponding statistics or research divisions have been suspended. Others report that they have a limited time frame of remaining “carryover funding” for operations such as planning the 2020 census count.
Luckily, some data and research services are funded by previously enacted appropriations. These include the Centers for Disease Control and Prevention, the National Center for Health Statistics, the National Center for Education Statistics, and the Energy Information Administration. The Federal Reserve and the Federal Housing Finance Agency also operate independently of congressional funding.
So how is this affecting the data scientists who use public data in their work?
An obvious sector of data science to take a hit is environmental research. Pictures of the effects of suspended services at national parks have gone viral, but they only hint at how many services related to environmental conservation, health and research are considered non-essential. The only exceptions are services that relate to immediate health emergencies, such as natural disaster reporting. However, many researchers argue that, on a longer time scale, crucial life-saving services are still threatened.
Andrew Caballero-Reynolds, AFP/Getty Images
Angeline Pendergrass told Science News that she needs data even as simple as weather reports to inform predictive models of the impacts of climate change. Accurate climate models matter now because they can help anticipate disasters such as droughts and floods, which should be prepared for as early as possible. Her work for the National Center for Atmospheric Research in Boulder, Colorado, has been delayed by several days simply by the search for workarounds and new sources of rainfall data.
The data Pendergrass uses to verify her predictions is still being gathered, but she can no longer access it through NOAA, and it is unclear how accurate the information that can still be found really is. This is a major unknown for any data scientist who gathers data from stations that may be malfunctioning, with no staff on hand to repair them or even announce what has failed. Reports on social media from concerned researchers, such as Robert Rohde, have been the only alerts when stations have apparently gone offline.
According to the Washington Post, the National Weather Service’s forecast reports are still available, but the prediction model has actually deteriorated since the shutdown. There were some known flaws in the current Global Forecast System (GFS), which is why researchers were working to launch a new and improved version by February. However, with no one employed to fix the model or actively compensate for bugs, the GFS will continue to display inaccurate data, which poses a risk to all the services that rely on its weather predictions.
Inside Climate News reported that even federally employed researchers who are considered essential – and are thus working without pay – are prohibited from some aspects of data science work. January is typically the time of year when workers at the National Hurricane Center in Miami, Florida, would improve their own prediction models. However, specialist Eric Blake told ICN the center’s employees are now “limited to only essential lifesaving activities, which means current weather.”
Despite the current setbacks, researchers remain optimistic that data catalogs and research will recover in the long term. Yet it seems that hundreds, if not thousands, of federally reliant data sets will have permanent errors and holes in them that could throw off the work of analysts. The longer the shutdown lasts, the more drastic the long-term setbacks may be.
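Analysts who return to these data sets later may want to check for such holes before trusting a series. The following is a minimal sketch, assuming a station is expected to report once per calendar day. The function name `find_gaps` and the daily cadence are illustrative assumptions, not part of any agency's tooling.

```python
from datetime import date, timedelta
from typing import Iterable, List, Tuple


def find_gaps(
    observed: Iterable[date], start: date, end: date
) -> List[Tuple[date, date]]:
    """Return inclusive (first_missing, last_missing) ranges of days
    between start and end that have no observation."""
    seen = set(observed)
    gaps: List[Tuple[date, date]] = []
    gap_start = None
    day = start
    while day <= end:
        if day not in seen:
            # Open a new gap if we are not already inside one.
            if gap_start is None:
                gap_start = day
        elif gap_start is not None:
            # An observation closes the gap at the previous day.
            gaps.append((gap_start, day - timedelta(days=1)))
            gap_start = None
        day += timedelta(days=1)
    if gap_start is not None:
        # The series ends while still inside a gap.
        gaps.append((gap_start, end))
    return gaps
```

Called with the dates actually present in a download and the period you expected it to cover, it returns the missing ranges, which can then be flagged, interpolated, or excluded depending on the analysis.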
What’s going to happen now?
Despite our proficiency with predictive models and autonomous programs, the impact this will have on industry is largely unknown to data scientists. Regardless of when these services resume, the shutdown may encourage the privatization of data catalogs, or greater dependence on research and reporting from NGOs.
[UPDATE 1/15/19] With somewhat ironic timing, on January 14th the president signed into law the OPEN Government Data Act, which would ensure federal data be made more accessible to the public. To the joy of data scientists, the law also specifies that data sets should be made machine-readable. This law does not appear to have any effect on the current data shutdown, but the implications are interesting, and it is at least one reason to be optimistic about the future.
Research libraries at institutions such as UC Berkeley and Brandeis University have begun publishing guides on where to find research resources during a government shutdown. Others have taken to Twitter to suggest ways to find backlogs of federal data in other locations.
While the shutdown continues, we will do our best to keep this article updated with relevant information and any major resources data scientists may use to replace those which have been suspended. Comment below if you have any helpful contributions!