The Mystery of Missing Metadata

By Tawheeda Wahabzada (Open Data Watch) and Nada Zohdy (Open Gov Hub) 

Given the recent People’s Climate Movement and the administration’s steps to roll back action on climate change, from reversing Obama’s efforts to reduce carbon emissions to proposing a 31 percent cut to the Environmental Protection Agency’s budget, there is clearly deep concern for the future of our planet.

But programs to prevent climate change are not all that’s at risk. Open access to basic environmental and climate data is now in jeopardy. Previously published, government-generated data on the environment and other public issues may be vulnerable to a politicized backlash against evidence-based policies.

Before considering the implications of disappearing public information, what exactly is open data and why does it matter?

According to the Open Data Charter, “open data is digital data that is made available with the technical and legal characteristics necessary for it to be freely used, reused, and redistributed by anyone, anytime, anywhere.” Government-generated data should be open: it is a public good that many can benefit from. And open data is vital for promoting transparent and accountable governance.

Here at the OpenGov Hub, a network of global organizations focused on transparency, accountability, citizen engagement, and open data, one of our roles is to advocate for governments worldwide to make information public about their activities, expenditures, and official statistics.

There are many types of open data, but environmental data are some of the most compelling examples of open data’s relevance to our everyday lives. Consider NOAA’s National Centers for Environmental Information (NCEI), which “hosts and provides public access to one of the most significant archives for environmental data on Earth,” including data on weather and climate, coasts, oceans, and geophysics.

There are countless examples of re-purposing open climate data. These are only possible because the format and terms of use are non-proprietary and open.

The NOAA Data Catalog is a centralized metadata portal of existing NOAA datasets. Metadata is descriptive data about the data: it provides critical context and technical characteristics that help users understand and use the data appropriately. For example, if you are seeking data on precipitation frequency in California, the metadata tells you where the data was collected, how it was collected, and who collected it.

So, for open data to be properly understood in context and effectively used, information about the data (i.e., metadata) must also be open to the public.

As of April 2017, the NOAA Data Catalog had 66,266 metadata sets. While this may appear high, the Wayback Machine, which allows internet users to archive web pages, produces some alarming results. In December 2016, the NOAA Data Catalog provided 71,425 metadata sets. As of January 26, 2017, under the new administration, that number had fallen to 70,833, and the number of metadata sets has steadily decreased over the past few months.

Source: Internet Archive’s Wayback Machine and NOAA Data Catalog

With the recent climate change rollback, it is even more critical and urgent to ask: why is the NOAA Data Catalog missing over 5,000 metadata sets? Were duplicated metadata sets removed? Or rather, does the decrease in metadata signal a parallel decrease in the datasets themselves?
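For readers who want to check the arithmetic behind the "over 5,000" figure, here is a minimal sketch in Python. It uses only the three counts quoted above; the variable and function names are our own, chosen for illustration, and are not part of any NOAA or Wayback Machine tool.

```python
# Metadata set counts quoted in this article, as seen in the NOAA Data Catalog
# and its Wayback Machine snapshots. The date labels are illustrative strings.
counts = [
    ("December 2016", 71425),
    ("January 26, 2017", 70833),
    ("April 2017", 66266),
]


def decline(earlier, later):
    """Return (absolute drop, percentage drop relative to the earlier count)."""
    drop = earlier - later
    return drop, 100 * drop / earlier


# Compare each later snapshot against the December 2016 baseline.
baseline = counts[0][1]
for label, count in counts[1:]:
    drop, pct = decline(baseline, count)
    print(f"{label}: {drop:,} fewer metadata sets than December 2016 ({pct:.1f}%)")
```

By this calculation, the catalog listed 592 fewer metadata sets by January 26, 2017 and 5,159 fewer by April 2017, a decline of roughly 7 percent from December 2016.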

Information (data) is indeed power, and that is even more true when it comes to scientific data that validates climate change. The open data movement is about a decade old, and while some may argue that open data has yet to fully deliver on its promise (or even that it is currently in the ‘trough of disillusionment’ of a government hype cycle), we see the risk of disappearing climate data as a prime example of how the very existence of certain data in the public domain can significantly impact political dynamics, even if this data is only used by a relatively small number of people.

As open data advocates, we find ourselves at a crossroads. Several leading organizations signed on to a letter in support of the OPEN Government Data Act, which was reintroduced in late March and has bipartisan support in the House and Senate. This legislation could make information across federal agencies open by default and, in doing so, protect climate and other data that may be vulnerable to politicization.

We see an opportunity today to prove the potential of open data: first, by protecting information already made public (both data and its corresponding metadata); and second, by encouraging actors within, and especially outside of, government to make innovative use of these vast information sets about public matters and to develop new tools, techniques, policies, and ideas that improve quality of life for us all.

Open data isn’t just for geeks. It’s for anyone who believes in the power of transparent information to shape how our public policies are formed, our resources are invested, and our governments are held to account.

Tawheeda Wahabzada is a researcher at Open Data Watch, an international nonprofit that tracks the coverage and openness of national data and statistics in 173 countries around the world.

Nada Zohdy manages the OpenGov Hub, a community and network of 40 global organizations in Washington, DC promoting transparency, accountability, citizen engagement, and open data worldwide.