Finding technology innovators using big data from the web
As previous Nesta-funded research has shown, innovative companies, industrial clusters, and high-growth companies are critical to the UK economy. Policymakers and investors aim to accelerate and encourage such companies, and to push forward greater innovation and growth.
There is particular interest at present in software and technology innovators and startups. However, it’s hard for organisations from outside the field to navigate this foreign world well; the official data is about companies, not people, and tends to be out of date since the most innovative firms are often small and fast-changing. For large organisations and investors to invest in or engage with new and innovative technologies, they need much better information about the networks of innovative people, what they work on, and which people and firms they work with.
The good news is that there is a large amount of raw material which can help close this information gap: implicit and informal information about high-technology firms, influencers, and topics is shared all the time, through both professional resources (e.g. GitHub and Stack Overflow, which are heavily used by innovative software developers) and social networks (e.g. Twitter), as well as in company data.
This project collected and joined raw data from multiple sources about software developers and what they work on. We used the structures and relationships within these data to spot innovative people and innovative companies, and to understand the technology innovation landscape in more detail than we can with official information alone.
Using data in this way is one of the most interesting and ubiquitous changes to how we can do research and analysis in the ‘big data’ era. Now that large streams of activity data of many kinds are available, and we have technologies which allow us to reshape and analyse them, we can repurpose raw data in many new ways.
The final site gives full details. The analysis is based on data from three main sources: GitHub, a popular code sharing service which many programmers use to store, track, and share their projects, both public and private; Twitter, where a lot of social chat goes on; and OpenCorporates, the open database of the corporate world, which pulls together the official public data available on companies.
By joining these together, identifying the UK-based individuals who are most influential and central within these networks, and then looking at the companies and locations of these individuals, we can start to escape the ‘filter bubble’ of known innovators and spot those people and companies who are slightly under the official radar at present.
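To make the idea of "influential and central" individuals more concrete, here is a minimal sketch in Python. It is not the project's actual method: the records, field names, and follow edges are made-up toy data, and PageRank is used only as one common example of a centrality measure (the project's own measure is described on the site). The sketch ranks people by centrality in a joined follow network, then counts the locations of the top-ranked individuals.

```python
from collections import Counter

# Hypothetical toy records; names and fields are illustrative only,
# not the project's real schema or data.
developers = {
    "alice": {"location": "London"},
    "bob": {"location": "London"},
    "carol": {"location": "Cambridge"},
    "dave": {"location": "Manchester"},
}

# Directed edges (follower, followed), imagined as already merged
# from several sources such as code-sharing and social follows.
edges = [
    ("bob", "alice"),
    ("carol", "alice"),
    ("dave", "alice"),
    ("alice", "bob"),
    ("dave", "bob"),
    ("dave", "carol"),
]


def pagerank(nodes, edges, damping=0.85, iterations=50):
    """Plain power-iteration PageRank over a directed edge list."""
    nodes = list(nodes)
    n = len(nodes)
    out_links = {node: [] for node in nodes}
    for src, dst in edges:
        out_links[src].append(dst)

    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes with no outgoing links is spread evenly.
        dangling = sum(rank[node] for node in nodes if not out_links[node])
        new_rank = {
            node: (1 - damping) / n + damping * dangling / n for node in nodes
        }
        # Each node shares its rank equally among the nodes it follows.
        for src in nodes:
            for dst in out_links[src]:
                new_rank[dst] += damping * rank[src] / len(out_links[src])
        rank = new_rank
    return rank


scores = pagerank(developers, edges)

# Take the most central individuals, then look at where they are based.
top = sorted(scores, key=scores.get, reverse=True)[:3]
location_counts = Counter(developers[name]["location"] for name in top)

print(top)
print(location_counts.most_common())
```

The same counting step could be applied to an employer field to see which organisations the most central individuals work for.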
We found that the most influential individuals broke down into two main groups: "builders", who have created their own useful and usable libraries and components which others reuse a lot, and "teachers" who are more explicitly aiming to create demos and learning materials for others. We used these people as a starting point to gain more insight into the wider social network of innovative software developers.
Within employers of those individuals, there was a strong public sector presence - the biggest organisations included the BBC, the Government Digital Service, and the University of Cambridge. There was also a strong American presence, from UK developers working for US-owned companies including Google and Twitter. Finally, there were a number of open source companies, including Mozilla and Red Hat, which we would particularly expect to see working 'in the open' in this data.
The location count was very heavily dominated by London, to an even greater extent than we might intuitively expect, given that these are virtual and not in-person networks.
See the site for much more detail, and to explore the networks of individual innovators.