Innovation policymakers are interested in finding new stuff first: new technologies, new industries, new clusters and new businesses. Traditionally, this has involved lots of networking, sleuthing and hard graft - going to events, talking to lots of people, reading the trade press. How could new data sources and data science methods help them in their job?
Innovation policymakers are constantly on the lookout for emerging technologies and sectors. This has two main functions:
Innovation detection: Many technology races have a first-mover advantage element, which means that the countries that first bet on a future growth sector are better positioned to benefit from its expansion. This is a consequence of ‘network effects’, as with web platforms that become more attractive to new users the more users they have already, or ‘learning by doing’, where businesses become more productive over time, as they gain experience about customer needs, and iron out glitches in their processes. The sooner you get started, the better you can do.
Innovator engagement: Businesses in emerging sectors may have completely different skills, finance or infrastructure needs from existing ones - for example, the R&D expenses that a pharma company incurs are very different from those of a video games studio, so R&D tax credits designed for the former might not support the latter. If innovation policymakers want to put in place programmes that work for emerging sectors, they need to find them first, and talk to them.
Traditionally, both activities have been informed by 'local' knowledge, personal contacts and ‘snowballing’ from existing networks into new ones. Innovation policymakers go to tech conferences, read the trade press, and meet start-ups and researchers to find ‘what’s hot’. This doesn't tend to be a data-driven process because, in general, official data sources are not very good for identifying and measuring new activities that don’t fit with existing industrial codes.
Web data - including company websites, and websites used by innovative companies and workers - offers new opportunities to identify emerging technologies, industries and businesses at a higher level of granularity, and closer to real time. For example, big data B2B marketing companies like GrowthIntel are analysing company websites in order to offer their clients more relevant leads. There is no reason why innovation policymakers couldn’t benefit from similar information.
In the second data pilot for Arloesiadur, our innovation data analytics project for Welsh Government, we have explored some web data sources in order to understand their potential for tracking emerging technology trends for innovation detection, and innovator engagement. Here is what we did, and here is what we found:
Meetup.com is an interesting platform to analyse emerging tech topics: this is a website used by technologists, coders and entrepreneurs to organise events. If these people are getting interested in a new tech trend, perhaps we will find an uptick in the number of meetup groups and events around it.
In order to test this idea, we used the platform’s API to extract information about UK tech meetups, events and their attendees, covering the period from the platform’s beginning until March 2016. We then tracked levels of activity in three ‘hot’ tech topics that we know are of interest to policymakers: Bitcoin (distributed digital currencies and online transaction ledgers), Deep Learning (a sophisticated method for building Artificial Intelligences) and Virtual Reality (technologies to generate highly immersive digital environments, often used in video games). We identified tech groups active in these areas based on the keywords that their organisers use to label them.
Why did we focus on these three?
While these three topics are now relatively well established, during the period of research they fit with the most recent definition of 'emerging technology' in the innovation study literature, by Rotolo, Hicks and Martin, which characterises emerging technologies in terms of:
As in previous Arloesiadur pilots, we have uploaded the code we used to download the data to GitHub. You can get it here.
We have identified 58 groups with relevant keywords (25 in VR, 18 in Bitcoin, and 15 in Deep Learning). These groups have set up 449 events since September 2012, when the first groups of interest were formed (Augmenting Reality and Coinscrum, in August and September 2012, respectively). The first Deep Learning meetup group - Deep Learning London - didn’t appear until April 2014. It’s worth noting that all these groups were formed in London, consistent with high levels of meetup activity in the city, including in cutting-edge fintech, creative industries and analytics areas relevant for the three technologies we’re looking at.
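To make the keyword-based selection concrete, here is a minimal sketch of how groups could be matched to topics through the labels their organisers choose. It is not the code from the pilot: the keyword lists, the record layout (`name`, `keywords`) and the toy groups are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical keyword lists per topic (assumption: the labels actually
# used in the pilot are not reproduced here).
TOPIC_KEYWORDS = {
    "Bitcoin": {"bitcoin", "blockchain"},
    "Deep Learning": {"deep learning"},
    "Virtual Reality": {"virtual reality", "vr"},
}


def tag_groups(groups, topic_keywords=TOPIC_KEYWORDS):
    """Map each topic to the names of groups labelled with one of its keywords."""
    matches = defaultdict(list)
    for group in groups:
        # Normalise organiser labels so that "Bitcoin" and "bitcoin " match.
        labels = {kw.strip().lower() for kw in group["keywords"]}
        for topic, wanted in topic_keywords.items():
            if labels & wanted:
                matches[topic].append(group["name"])
    return dict(matches)


# Toy records standing in for group data downloaded from the API.
toy_groups = [
    {"name": "Group A", "keywords": ["Bitcoin", "Cryptocurrency"]},
    {"name": "Group B", "keywords": ["Deep Learning", "Machine Learning"]},
    {"name": "Group C", "keywords": ["VR", "Game Development"]},
    {"name": "Group D", "keywords": ["Gardening"]},
]

tagged = tag_groups(toy_groups)
```

Exact matching on organiser labels is simple and transparent, but it only finds groups that already use the agreed name for a technology, which is the limitation discussed later in this blog.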
The two charts below show a timeline of activity in the three tech topics we picked, smoothed in three-month periods to remove some of the noise. The first chart considers total levels of activity according to different metrics (number of events, number of attendees and average attendees per event), while the second one normalises by levels of activity in a random sample of 200 meetup groups with the aim of controlling for Meetup’s growing popularity.
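The smoothing and normalisation steps can be sketched as follows. This is an illustration under stated assumptions rather than the method used for the charts: we assume monthly event counts, a trailing three-month moving average as the smoother, and a simple ratio of topic activity to activity in the baseline sample. The numbers are made up.

```python
def rolling_mean(values, window=3):
    """Trailing moving average; early months average over the months available."""
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


def normalised_activity(topic_counts, baseline_counts, window=3):
    """Smooth both series, then express topic activity relative to the baseline."""
    topic = rolling_mean(topic_counts, window)
    base = rolling_mean(baseline_counts, window)
    # None marks months where the baseline sample had no activity at all.
    return [t / b if b else None for t, b in zip(topic, base)]


# Made-up monthly event counts: one tech topic and a random baseline sample.
topic_events = [0, 3, 6, 3, 9, 6]
baseline_events = [10, 10, 10, 20, 20, 20]

smoothed_topic = rolling_mean(topic_events)
relative = normalised_activity(topic_events, baseline_events)
```

In this toy series the topic's raw counts roughly double over the period, but so does the baseline, so the normalised series stays close to flat. That is the effect the baseline is meant to control for.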
Even after smoothing the data and normalising by baseline levels of activity in Meetup, we see spikes followed by slower periods. What explains them?
Interestingly, some of the spikes appear to be linked to milestone tech moments ‘in real life’ (we have represented those with vertical lines, using the same colour to identify each tech topic).
If our interpretation of the data is correct, this means that activity around new technology in Meetup isn’t so much focused on its development as on its diffusion or popularisation among wider communities of practitioners. The initial Research & Development seeding these new technologies is probably happening elsewhere, in universities and businesses.
Does this mean that Meetup.com is more suitable for innovator engagement than innovation detection? Not necessarily. First, the data generated by this analysis is still quite timely, and could give those organisations that react to it fast a significant lead over the competition. Second, the analysis we report is constrained by the fact that we are searching for meetup groups and events using agreed-upon terms for ‘emerged’ technologies. Could one use meetup data to analyse interesting combinations of disciplines giving rise to new topics before those even get a name? This is a topic we will continue exploring as part of Arloesiadur - check the conclusion for some ideas.
So far, we have looked at ‘emerged’ tech areas. What about the present and the future, when we don’t know what keywords to focus on? We have explored this by measuring the levels of Meetup activity in groups labelled with novel, popular keywords (that is, the 20 most popular keywords among those that didn’t appear in the platform until after March 2015). The charts below show what these keywords are, and their levels of activity.
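One way to pick out novel, popular keywords is sketched below. The details are assumptions for illustration: a keyword's first appearance is taken to be the creation date of the earliest group labelled with it, popularity is the number of groups using it, and the cutoff is set to 31 March 2015. The group records are invented.

```python
from collections import Counter
from datetime import date


def novel_popular_keywords(groups, cutoff, top_n=20):
    """Return the top_n most used keywords that first appear after the cutoff date."""
    first_seen = {}
    counts = Counter()
    for group in groups:
        # Count each keyword once per group, ignoring case and stray spaces.
        for kw in {k.strip().lower() for k in group["keywords"]}:
            counts[kw] += 1
            if kw not in first_seen or group["created"] < first_seen[kw]:
                first_seen[kw] = group["created"]
    novel = [kw for kw, seen in first_seen.items() if seen > cutoff]
    # Most popular first; ties are broken alphabetically for a stable order.
    return sorted(novel, key=lambda kw: (-counts[kw], kw))[:top_n]


# Invented group records (creation date plus organiser keywords).
toy_groups = [
    {"created": date(2014, 6, 1), "keywords": ["Python", "Data Science"]},
    {"created": date(2015, 5, 1), "keywords": ["Ethical Hacking", "Python"]},
    {"created": date(2015, 9, 1), "keywords": ["Ethical Hacking", "Visual Effects"]},
    {"created": date(2016, 1, 1), "keywords": ["ethical hacking"]},
]

novel = novel_popular_keywords(toy_groups, cutoff=date(2015, 3, 31))
```

Here "python" is excluded because a group was already using it in 2014, even though it remains popular after the cutoff.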
The first noticeable thing is that few of these keywords seem particularly 'new'. Rather, they appear to capture growing specialisation inside existing domains. It also seems, in some cases, that we are picking up existing industries (e.g. in the cybersecurity community - ethical hackers - and in the creative industries - visual effects, as well as e-commerce businesses) that are perhaps becoming more active on Meetup, or more active networkers.
All this is again consistent with the idea that most meetups are about diffusion and dissemination rather than development. This means that, at least when using the simple keyword-based methods we adopted in this blog, Meetup may be a more suitable platform for innovator engagement than for innovation detection. This is no mean feat - a recent analysis of the latest UK Innovation Survey suggests that innovative businesses rarely benefit from government support. New sources of data, like Meetup, that help UK innovation agencies promote and target their programmes more effectively could be used to address this.
Our analysis in this blog was descriptive rather than predictive, and based on user-specified keywords, rather than coherent collections of keywords defining a tech topic. As we have seen, this could make it difficult to discover emerging tech areas that one isn’t already aware of - an area of interest to policymakers and other new tech ‘cool-hunters’ such as Venture Capitalists and open innovation teams in large corporations. Our next goal is to use more sophisticated methods for this innovation detection. There are at least three ways to pursue this:
We’ll keep you posted about what we find. Drop us a line if you are interested in any of this.
This blog benefitted from helpful feedback and comments from Katja Bego, Gail Dawes, John Davies, Antonio Lima and Jen Rae.
 One important exception to this is patent data, which is often analysed to identify new and interesting science and technology areas - one limitation with this is that patent data comes with a big time lag.
 Before using keywords, we explored the possibility of extracting ‘topics’ of interrelated keywords from the data using text mining methods. The results we were able to obtain within the timetable for the pilot were insufficiently robust. When extracting a low number of topics, the categories were too coarse to capture emerging areas of activity; when extracting a high number of topics, some of the categories became noisy, and generated many false positives. Fine tuning our topic modelling algorithms to capture new and interrelated keywords is an important follow-up we come back to in the conclusion.