It was a particularly Boulder chapter to an unfolding national story.
More than 50 volunteers from the community came together to save government data from the digital memory hole many fear will swallow it during the administration of President Donald Trump.
So-called “data refuge” events have been taking place around the country in the months since Election Day, with major meetups or hackathons in cities such as Philadelphia, New York and Berkeley. But those events have been about archiving websites.
The Boulder group, led by some of the city’s typically tech-savvy, science-minded and politically conscious residents, sought instead to harvest the data that lies behind many government websites. It’s “spreadsheet stuff,” as one organizer put it, in the form of graphs, charts and encyclopedic rows and rows of numbers. The volunteers targeted information treasure troves compiled mostly by the Environmental Protection Agency and the National Institutes of Health: data reflecting climate change patterns, carbon monoxide readings, water-level changes and so on. They found and downloaded terabytes of information from government servers.
“For pointy-headed people like me, we know the thing is to save the raw data,” said Joan Saez, an executive at data company Cloud BIRST in Lyons and the woman who spearheaded the event. “The thing is to download it and land it in a shared distributed cloud repository where it can live and be accessible forever.”
In other words, some place safe from shifting political, ideological and budgetary priorities.
The Boulder event, which ran last Saturday and Sunday and was hosted by the University of Colorado law library, came together in just three weeks. The enthusiasm and willingness to volunteer for hours of what can be tedious screen time reflect the heightened political engagement evident in the charged and sometimes enormous protests that have marked each week of the Trump presidency, both downtown at the Capitol and on streets across the state.
Saez’s hackathon was a particularly Boulder manifestation. The presence of the state’s flagship university, not to mention the government atmospheric and climate research labs and clean-energy startup businesses that dot the city, makes Boulderites especially aware of the threat posed by climate change-denying government leaders and by the increased influence of the fossil fuel lobby on a Republican-controlled Capitol Hill.
Saez told The Independent she began to worry last summer that raw public data would go missing – maybe not as a result of censorship and deletion, but simply for being de-prioritized under the administration’s planned steep budget cuts. The feeling is that Trump-era agency heads will be looking to free up the resources it takes to house and manage the huge amounts of publicly accessible data the government produces, especially the kind of data that may undermine preferred policy positions.
Before the event, Saez targeted hundreds of datasets to harvest, some of them enormous.
By the time the hackathon ended Sunday, she was bowled over by the work the group completed. She said volunteers harvested 1.5 million NIH publications and 1,000 EPA datasets, noting that she and other volunteers were still taking the measure of what they had collected.
“We archived the entire StreamCat dataset!” she rushed to add.
StreamCat hosts data on 2.65 million small waterways in the United States. It summarizes “a suite of 203 natural and anthropogenic landscape features” tied to the nation’s streams, as an EPA index puts it. The data is “distributed within state and hydro regions.” It concerns catchments, watersheds, internally draining basins, riparian buffers, off-network water and so on.
In other words, StreamCat contains exactly the kind of raw data that could help make cases for and against new oil and gas drilling techniques, for example, or demonstrate in incontrovertible detail the local or regional or national effects of a warming planet. The data has taken years to collect, and now that it has been harvested in Boulder, its existence and public accessibility are more secure.
Nevelow Mart studies “legal informatics” and writes about information policy, information retrieval systems and the relationship between national security and the country’s libraries.
The data rescue event was right in the library’s bailiwick, she said. People don’t often think about the fragility of data, but librarians do. “There are actually very few laws that require data be preserved,” she said.
Whenever the White House changes hands, for example, librarians kick into action and conduct what they call an “end-of-term harvest” in which libraries work with the Government Publishing Office to copy administration websites.
“There’s always the feeling that it’s best to be proactive,” said Nevelow Mart. “This time it was the environmental and scientific data in particular that people were focused on.”
As part of the harvest at the end of the Obama era, the law library copied that administration’s websites, printed them out, catalogued them, and put them in binders available for patrons to use.
National politics was the big driver for last weekend’s community data rescue event, said organizer Stephanie Minutillo, who is studying environmental law at CU.
“It’s hard to imagine there would have been this much energy and motivation in a different political environment,” she said.
The event saw a good turnout despite the presence of only five or six law students and graduate students. “It was mostly concerned citizens of Boulder,” Minutillo said. “It was an outlet, I think, for people looking around and seeing a problem and wanting to be part of a solution — to do something more than just writing a Facebook post.”
Photo credit: Andrés Monroy-Hernández, Creative Commons, Flickr