Data science work is creative work (Part 1)
Let’s define data scientists as experts who use analytical techniques from statistics, computing and other disciplines to create value from new (‘big’) data. It’s fair to say that most people wouldn’t think of data science work as ‘creative’ in the ordinary sense. In fact, many would consider it to be the opposite of creative: routine, predictable, impersonal…boring.
In Model Workers: how leading companies are recruiting and managing their data talent, we show that this view is misguided. It seems that data scientists are in fact highly creative, and not just because you see a lot of them in creative industries like video games, fashion and advertising, or because some of them use graphic design skills for visualising complex data. Rather, data scientists are creative because the defining features of their work are those of a creative occupation.
We develop this argument over two posts. In this one, we score data science work against five heuristic criteria that we developed in previous Nesta research to identify creative occupations in the economy. In our next post, we will show that the practices leading firms adopt to manage data talent, as identified in Model Workers, overlap greatly with the conclusions of academic research on how to manage creative teams.
We are not writing these posts to make a conceptual point. Our motivation is practical. We think that acknowledging the creative aspects of data science will help managers build and organise their data capabilities more effectively, inform better policies to prepare and train future data scientists, and help communicate the excitement and dynamism of this occupation to young people who might otherwise be put off if it is – inaccurately – perceived to be un-creative and dull.
The creative aspects of data science work
In A Dynamic Mapping of the UK’s Creative Economy, we defined a creative occupation as a role “that brings cognitive skills to bear to bring about differentiation to yield either novel, or significantly enhanced products whose final form is not fully specified in advance.” For the purposes of classification and measurement, we proposed five criteria against which the creative component of an occupation can be assessed, and applied these to the Office for National Statistics’ Standard Occupation Codes.
Here, we consider data science roles, as described by the industry experts we interviewed, against these criteria.
1. Novel process
The first criterion is that a creative role uses novel approaches to solve a problem, or exhibits ‘creativity’ at many stages of the process.
Is this true for data science work? We believe so. A key element of data science work is exploratory data analysis, where the data scientist ‘gets a feel for the data’ by undertaking basic analyses and generating visualisations, before deciding on the most promising approach to model the data. This echoes the process of divergent thinking (idea generation) and convergent thinking (idea selection) we see at different stages of the creative process. Lateral thinking to transfer ideas across domains (e.g. applying survival models from epidemiology to the analysis of innovation diffusion or systemic risks in banking ecosystems) is another example of creativity at play in data science work.
A striking consequence of this process novelty is the high level of user innovation we see among data scientists – many established tools for data management and analysis (e.g. Hadoop, Cassandra, Hive, R) are open source: they originated as data scientists resolved novel problems in innovative ways.
2. Mechanisation resistant
The idea here is that if it is possible to wholly replace an occupation with an algorithm or machine, then that occupation isn’t creative. “Creatives adopt, adapt and absorb new technologies in pursuit of creative excellence. They are seldom made redundant by it.”
Will it be possible to replace data scientists with ‘analytical robots’? Here our research reveals divergent views. Some of the people we interviewed think that it is becoming easier to use analytics tools to automate many data science tasks. Others are more sceptical, pointing out that even if such automation were possible, human expertise would still be required to decide which questions to ask, and to sensibly interpret analytical outputs.
3. Non-repetitiveness or non-uniform function
This criterion refers to whether the outputs of a role vary with each project because of the interplay of factors, skills, creative impulse and learning – in other words, it captures the fact that the exact outputs of a creative role are hard to predict.
Our interviews strongly suggest that data science work fulfils this criterion: data science projects often involve new datasets, new questions, and new areas of application (e.g. using data insights to develop new products, services or strategies). As a consequence, project outcomes can be uncertain, and managers need to live with (and manage) the risk of failure. Several practices that we identified in our research – such as piloting and ‘timeboxing’ data science projects – seek to address this risk.
4. Creative contribution to the value chain
The question here is whether the work performed in a role is novel or creative independently of the context (e.g. industry, department) where it happens. This criterion seeks to avoid a situation where a job is defined as creative on the basis of its industry, rather than the intrinsic nature of its processes.
Again, we’d argue that data science is creative according to this criterion. We find data science activities across a variety of sectors, from the creative industries to manufacturing, pharmaceuticals and retail. Regardless of the context, data science work involves the same kind of novelty in processes and unpredictability in outcomes identified under criteria 1 and 3. One result is that techniques (and talent) transfer across industries. As one of our interviewees put it: “Someone in finance uses data as predictive models of how someone is likely to default on their mortgage. We are doing predictive models based on patient attributes – how likely they will be hospitalised in the next six months. The topic is very different, but the techniques are very similar.”
5. Interpretation, not mere transformation
In the fifth criterion, we are considering whether a role involves the direct translation of an idea or artefact from one form to another or if, in contrast, it leaves room for discretion and creativity by the person undertaking it.
We think the latter is true for data science. Even when a data scientist is responding to a specific question generated elsewhere in the business, rather than exploring a dataset in an open-ended way, there is substantial room for creative interpretation: in translating that question into a research design (identifying and acquiring data to answer the question, choosing the variables and modelling approach, and so on), and in generating and communicating the answer to others in the business.
A provisional conclusion
In the baseline scenario analysed in A Dynamic Mapping of the UK’s Creative Economy, any occupation that scored 4 or more in the creative grid was treated as creative. On the basis of the discussion above, we might award data scientists a score of 4.5/5 (one point each for criteria 1, 3, 4 and 5, and 0.5 for criterion 2, given the divergence of views about mechanisation). Hence our argument that data science work is creative work.
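The arithmetic behind this provisional score can be sketched in a few lines of Python. This is only an illustration of the tally described above, not the method used in the Dynamic Mapping study: the dictionary keys, function names and the `CREATIVE_THRESHOLD` constant are labels we have made up for the example, and the scores are the subjective ones proposed in this post.

```python
# Illustrative tally of the five criteria discussed in this post.
# Key names are hypothetical labels; scores are the provisional,
# subjective ones suggested in the text (0.5 for criterion 2).
scores = {
    "novel_process": 1.0,               # criterion 1
    "mechanisation_resistant": 0.5,     # criterion 2 (views diverge)
    "non_repetitive": 1.0,              # criterion 3
    "creative_contribution": 1.0,       # criterion 4
    "interpretation": 1.0,              # criterion 5
}

# Baseline rule quoted in the text: a score of 4 or more counts as creative.
CREATIVE_THRESHOLD = 4.0


def total_score(criteria_scores):
    """Sum the per-criterion scores."""
    return sum(criteria_scores.values())


def is_creative(criteria_scores, threshold=CREATIVE_THRESHOLD):
    """Apply the 'scored 4 or more' baseline rule."""
    return total_score(criteria_scores) >= threshold


print(f"Total: {total_score(scores)}/5, creative: {is_creative(scores)}")

# A simple robustness check: score criterion 2 as 0 instead of 0.5.
pessimistic = {**scores, "mechanisation_resistant": 0.0}
print(f"Pessimistic total: {total_score(pessimistic)}/5, "
      f"creative: {is_creative(pessimistic)}")
```

Under these assumed scores, the classification does not hinge on how criterion 2 is judged: even if data science were given no credit at all for mechanisation resistance, the total would still meet the baseline threshold of 4.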
Data scientists uncover new patterns in data, and build and develop innovative products, services and business models. The novelty in the processes that frame their work, the unpredictability of outcomes, and the space for interpretation in project scope and outputs that we have described also generate significant challenges for managers, who need to put in place practices to motivate their data scientists, de-risk projects, learn from failure, ensure smooth communication between data teams and other parts of the organisation, and so on.
These are all challenges that will be familiar to the managers in creative businesses, and have been studied in the management literature on creative teams and organisations. We will explore the connections between our findings and this literature in our next post.
 This attitude manifests itself in the tendency to contrast ‘geeks’ and ‘luvvies’, bean counters and creatives, economists and artists...
 And which now underpin the methodology used by DCMS to produce its creative industries economic estimates.
 The ultimate goal of this exercise was to determine the ‘creative intensity’ of different industries (i.e. their propensity to employ creative labour), in order to identify and classify the UK’s creative industries.
 As an illustration, at the time of writing the R statistical programming application offers 5,748 user-contributed packages.
 We’ll have more to say about this in our follow-up post.
 This was a baseline insofar as the scoring of occupations against these criteria was subjective and therefore it was crucial to explore the robustness of the study’s findings to different scores. See Dynamic Mapping for more details.