Big data, messy data, fast data or simply data: how do we secure the right skills to create value from it?

Anecdata

It has almost become a cliché to use mind-boggling figures to illustrate how data is transforming the economy and society: billions of transactions on popular websites, years of content flowing through the internet, millions of devices generating streams of data day in and day out. But big as they may be, these numbers don't really give a sense of the actual applications of data, or of the effort and skills required to create value from it. They also hide the humans who (often if not always) generate the data and analyse it.

Maybe it's more fruitful to think about the way in which our lives at Nesta have been transformed by data. What are we doing at our jobs today that we couldn't have done, say, five years ago, when there was less data around?

  • Before, we used to almost exclusively work with official, well-structured data or expensive survey data. Nowadays, we scrape data from the web or from open APIs. This data often has relational features: it captures networks and communities instead of “atomistic” units.
  • We used to publish static outputs: reports, presentations, and recommendations. We still do that, but we increasingly also create interactive data visualisations, interactive tools for data exploration, and are learning how to build interactive data applications.

Big data big schmdata

Is it big data? Not really. Up until now, we haven’t had to run any Hadoop jobs (volume), store our data in NoSQL databases (variety) or work with streaming APIs (velocity), although we are getting there.

But that’s the wrong question anyway. Thinking of data in terms of its size, or of the technology infrastructure required to create value from it, can distract us from the truly important question: can we use this data to create value?

We believe so. Our business is to understand the dynamics of innovation and propose actions to support it in a way that creates economic and social value. Data is helping us do that much more effectively, by measuring innovation phenomena in ways that would have been uneconomical or impossible before: for example, tracking connectivity between participants at innovation events, mapping informal communities of innovators, or identifying companies operating in sectors that are poorly captured by standard industry codes, such as video games. We are even using data to understand innovation in areas of the arts and culture that have traditionally seen little quantitative analysis, such as theatre and literature.

Innovation in data outputs helps us communicate research findings in a more compelling and enlightening way, and engage new audiences. Going forward, we want to create “self-service” applications that policymakers and other agents in the ecosystem (investors, entrepreneurs, etc.) can use to answer their burning questions, instead of having to rely on us to do it.

Doing all of the above requires new skills: skills to get online data, clean it and wrangle it; skills to put data into a shape that can be analysed; skills to analyse and visualise it; and skills to understand the limitations and pitfalls of these new datasets.

Without those skills, all that new data would be for nought, and might even be detrimental if, for example, it led us to make the wrong decisions or recommend the wrong actions. As Nate Silver put it in the introduction to The Signal and the Noise, “Big data will produce progress – eventually. How quickly it does, and whether we regress in the meantime, will depend on us.”

The bigger picture: Skills of the Datavores

Nesta’s new report in partnership with Creative Skillset, Skills of the Datavores, suggests that our personal and organisational experience with data, both the opportunities it opens up and the skills it demands, reflects the wider situation of other organisations in the UK. This has policy implications that we explore in Analytic Britain, a policy briefing that we have developed jointly with Universities UK.[1]

For starters, Skills of the Datavores shows that there isn’t a “one size fits all” approach to data: different “data active” businesses are taking different approaches to creating value from it. Among the businesses we surveyed, we find that:

  • 16% are Datavores, defined by their strong use of data and analysis to drive decisions across the business.
  • 21% are Data Builders, which are indeed working with “big” volumes of data.
  • 31% are Data Mixers, distinguished by the way in which they combine data coming from many different sources.

We also find that 30% are Dataphobes, who aren’t doing much with their data: apparently they have decided to give the data revolution a pass.

Our analysis of the impact of data on performance suggests that this is a big mistake: data active companies, and particularly Datavores and Data Builders, are significantly more productive than the Dataphobes, even after we control for other important firm-level factors such as their sector, their age, their size and their self-reported levels of innovation.

Another striking difference between data active companies and Dataphobes is that the former don't just use data to save costs, but also to discover new opportunities and develop new products and services. This is consistent with our experience at Nesta: data doesn’t just create value by allowing us to do the same things better, but also by allowing us to do completely new things.

What methods are data active companies using to do this?

As the figure below shows, they use a mix of data management and analytics methodologies drawn from a variety of disciplines, such as statistics, computer science and software engineering. As one would expect, data active companies tend to rely on more innovative and sophisticated methods, ranging from advanced statistics (e.g. non-parametric methods and time-series analysis) to unstructured data analysis (e.g. social network analysis and text mining) and machine learning.

Applying these methods requires talent with the right skills (i.e. “data scientists”). In the 12 months before we surveyed them, the data active companies in our sample were much more likely than Dataphobes to have sought to hire such people: 59% of Datavores tried to recruit at least one analyst over that period, compared with around a quarter of Dataphobes.

Worryingly, data active companies are struggling to find the analytical talent they need to create value from their data: for example, two thirds of the Datavores that sought to recruit had difficulties filling at least one analytical vacancy.

The three hardest-to-find skills were:

  • Domain knowledge, or an understanding of the function of data inside a business's industry: what are the “real world” processes that generate it, what are its limitations, and what important questions can it help answer?
  • The right mix of skills, or the combination of coding skills (to get and work with the data) and analytical skills (to extract insight from it) one finds in data scientists.
  • Experience working with big and/or messy data, and with the tools and technologies to do this.

 

We were very interested to find that the businesses that we surveyed are using innovative types of training to keep the skills of their analysts up to date. In addition to conventional internal and external training, they were also:

  • Participating in peer-to-peer learning networks such as meetups and hackathons. Almost two-thirds of businesses are doing this. In a different project looking at Meetup data, we found 178 meetup groups specialising in data analytics spread all over the UK (see map below, where the size of the circles captures the logarithm of the number of data analytics meetup groups in a given location).
  • Training online: there are many online training options for people who want to pick up data science and data analytics skills, such as Johns Hopkins University’s Data Science specialisation track on Coursera, or DataCamp’s R courses. Around 50% of businesses are using these.
  • Getting involved in web communities and competitions such as Stack Exchange or Kaggle. Around a third of businesses are doing this.

This innovation reflects the speed with which the data field is moving, in terms of flows of knowledge and talent, and the opportunities for collaboration and networking between data analysts working in different industries.
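
As an aside on the meetup map mentioned above: sizing circles by the logarithm of a count keeps the largest hubs from swamping smaller locations. The sketch below illustrates the idea only. The function name, the scale factor and the example counts are assumptions made for illustration, not figures or code from the Meetup project, and it uses log(1 + n) rather than a plain logarithm so that a location with a single group still gets a visible circle.

```python
import math


def circle_radius(group_count, scale=10.0):
    """Return a circle radius proportional to log(1 + group_count).

    Both the log(1 + n) form and the scale factor are illustrative
    assumptions; the source only says circle size follows a logarithm.
    """
    if group_count < 0:
        raise ValueError("group_count must be non-negative")
    return scale * math.log1p(group_count)


# Hypothetical counts, for illustration only (not data from the project).
example_counts = {"Location A": 100, "Location B": 10, "Location C": 1}

for place, count in example_counts.items():
    print(f"{place}: {count} groups -> radius {circle_radius(count):.1f}")
```

With this kind of scaling, a location with ten times as many groups gets a circle that is larger, but far less than ten times larger, which keeps smaller communities legible on the same map.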

What we need to do: Analytic Britain

Our findings show that data is enhancing innovation and productivity in UK businesses, and can help close the UK's productivity gap with other G7 countries, a priority for the government. Removing analytical skills shortages such as those identified in Skills of the Datavores, as well as in allied reports by the British Academy, the Tech Partnership and Universities UK, is therefore vital.

Analytic Britain, a policy briefing that we have developed together with Universities UK, makes recommendations on how to do this. These recommendations cover the whole “analytic talent pipeline”: schools, universities, the labour market and industry networking. We need to:

  • Strengthen the teaching of data skills in schools, increasing the number of young people who study maths and statistics, and their awareness of the relevance of these subjects for an expanding number of industries, from fashion to biotechnology.
  • Encourage disciplinary crossover in higher education by embedding the teaching of quantitative skills in a broader number of disciplines, supporting interdisciplinary research and training to prepare tomorrow’s data scientists, and increasing the visibility of high-quality courses among the deluge of data science courses that universities have started to offer.
  • Develop innovative training solutions to upskill UK communities of analysts.

Analytic Britain sets out in detail who we think should be doing what, covering a “broad church” of stakeholders which mirrors the disciplines and industries being transformed by the data revolution.

We believe that acting on our recommendations will greatly strengthen the supply of analytical talent in the UK, and our ability to create value from data, regardless of whether it is big, messy, fast...or simply data.

(Voronoi treemap of the Mammals via Anders Sandberg)

 


[1] Skills of the Datavores is based on a telephone survey of 404 medium and large businesses for whom data plays some role in their operations, working in six sectors (Creative Media, Finance, ICT, Manufacturing, Pharmaceuticals and Retail).

Author

Juan Mateos-Garcia

Director of Data Analytics Practice

Juan Mateos-Garcia was the Director of Data Analytics at Nesta.
