Better uses for open spending data – thoughts on the GIST beta
The new Government Interrogating Spending Tool - or GIST - says it represents "the next stage in the government's transparency agenda". It provides some nice-looking graphics that allow a couple of layers of government departmental spending to be examined. It is in beta, so one should be cautious about criticising it in its current form.
Based on previous work by the GDS team, there will be many iterations and improvements to come. In the spirit of its beta status, I have a couple of suggestions for how it might be improved and developed: one concerns the data and its representation, the other the stated ambitions for the data.
Despite the lofty press release claims, and as the Open Knowledge Foundation have pointed out, this is scarcely the first attempt to make public spending data easier to use. The Guardian's public spending diagram, the Open Knowledge Foundation project Where does my money go?, and others have tried to make these figures more open to interrogation.
The problem with GIST, and with some of the open data sets that precede it, is that it uses an existing government hierarchy which doesn't make much sense to the outside world. I can find out that BIS spent £75 million via the MRC on "Current Grants to Private Sector - NPISH" in Quarter 1 of 2012-13. But it's not obvious how I find out what NPISH might stand for, or which area of work this might relate to. It's scarcely more obvious when you discover NPISH stands for Non-Profit Institutions Serving Households. The hierarchical data set also means that comparing data from two quarters, or spending in two different departments, is hard to do.
If the goal is really public scrutiny of the data, then the presentation needs to do a better job of answering questions such as 'how much does the government spend every year on health?' The hierarchical nature of the data, combined with amounts represented by area (which humans find hard to compare), makes it difficult to make meaningful comparisons, or even to understand what should be compared. [Edit: they also offer bar and doughnut charts]
To make the data better able to answer public questions, it will need additional semantic information attached to each item of spending. To be useful for transparency, this data should ideally be linked to lots of other sources to make something that is worth interrogating.
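As a rough illustration of what "additional semantic information" could mean in practice, here is a minimal Python sketch. It assumes each spending line has been flattened out of the hierarchy into one record and given a `topic` tag. Only the BIS/MRC figure comes from the example above; the other rows, the field names, the `topic` values (including tagging the MRC grant as "research") and the `total_by` helper are all hypothetical.

```python
from collections import defaultdict

# One flat record per spending line. Only the first row's department, body,
# line name, quarter and amount come from the post; everything else here,
# including every "topic" tag, is invented for illustration.
records = [
    {"department": "BIS", "body": "MRC",
     "line": "Current Grants to Private Sector - NPISH",
     "quarter": "2012-13 Q1", "amount_m": 75.0, "topic": "research"},
    {"department": "Dept A", "body": "Body X", "line": "Line 1",
     "quarter": "2012-13 Q1", "amount_m": 10.0, "topic": "health"},
    {"department": "Dept B", "body": "Body Y", "line": "Line 2",
     "quarter": "2012-13 Q1", "amount_m": 5.0, "topic": "health"},
    {"department": "Dept A", "body": "Body X", "line": "Line 1",
     "quarter": "2012-13 Q2", "amount_m": 12.0, "topic": "health"},
]


def total_by(rows, key):
    """Sum amounts (in £ millions) grouped by any field, not just the hierarchy."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[key]] += row["amount_m"]
    return dict(totals)


by_topic = total_by(records, "topic")      # cuts across departments
by_quarter = total_by(records, "quarter")  # compares quarters directly
```

Once a tag like this exists, a question such as "how much is spent on health?" becomes a grouping over `topic` rather than a walk through departmental layers. The hard part, which the sketch skips entirely, is agreeing on the tags and applying them consistently.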
One way to think about this problem is in terms of two types of data:
- official data, which might not always be classified in useful ways, will need to form a complete hierarchy, but can provide quality assurance and consistency of measures over time.
- and new data, alternative and novel ways of measuring things that policy and the public care about, which might not offer complete coverage, or consistency over time, but which can be calibrated and linked with official information to provide more detail.
In order to be able to meaningfully link one with the other, we need data at a granular level that can be linked with common identifiers and core reference data (what Jeni Tennison at the Open Data Institute describes as a National Information Infrastructure).
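To make the idea of linking through common identifiers a little more concrete, here is a small Python sketch under stated assumptions. The identifiers (`ORG-001`, `ORG-002`), the field names, the figures and the `link` function are all hypothetical; no real identifier scheme or dataset is implied.

```python
# Official data: complete coverage, keyed by a shared organisation identifier.
# All identifiers, names and figures below are invented for illustration.
official = {
    "ORG-001": {"name": "Hypothetical body A", "spend_m": 40.0},
    "ORG-002": {"name": "Hypothetical body B", "spend_m": 25.0},
}

# "New" data: richer detail, but it only covers some organisations.
new_data = {
    "ORG-001": {"projects_funded": 120},
}


def link(official_rows, extra_rows):
    """Attach new data to each official row where the identifier matches.

    Every official row is kept, so the complete hierarchy survives;
    rows with no match get extra=None rather than being dropped.
    """
    linked = []
    for org_id, row in official_rows.items():
        linked.append({"id": org_id, **row, "extra": extra_rows.get(org_id)})
    return linked


linked = link(official, new_data)
```

The design choice worth noting is that the official data drives the join: the new data adds detail where it exists, and gaps stay visible instead of silently shrinking the totals.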
This sort of connected data would also help realise greater value from government spending information, and this is another area where I hope the tool will develop. The press release for GIST suggests that one of the main uses for this data is to encourage transparency and eliminate waste from spending. Although using open data for this sort of transparency is an important first step, it only scratches the surface of the potential for data in this realm. It's only when you move on to using that data to scrutinise how you do things, and to measure improvements, that you can unlock some value, and only when you try using the data to approach things from a completely new angle that you can create real change.