HackSTIR 2019: Deep learning for humans

The mission of Nesta’s innovation mapping team is to inform better innovation policy by detecting, measuring and visualising innovation activity wherever it is happening. It’s no small task, so we are always seeking new ways to grow and strengthen the community that has chosen to take it on. Here’s what we learned from our most ambitious attempt to date.

Modern innovation policy challenges go beyond knowing about volumes of activity and spending. The idea of a rising tide of science and technology lifting us all is giving way to more mission-oriented innovation policies that aim to direct resources towards societal and technological changes that will yield specific societal improvements. Increasingly, we need to be able to answer questions about where innovation is happening, who is taking part and benefiting, what problems are being tackled, and how inclusive and equitable innovation ecosystems can be nurtured.

Sankey diagram showing the flow of SMEs through EU research and innovation funding programmes. Created by James Paterson at HackSTIR with Holoviews in Python.

A holistic approach to technical learning

Our team has built up a toolkit of knowledge and methods for collecting and analysing new data sources that can help to answer these kinds of questions, and we have delivered workshops at universities and conferences for others working on the same problems. In doing so, we noticed strong demand for these skills among many other researchers. In 2019 we took a new step in our knowledge-sharing and community-building efforts in the field of science, technology and innovation research by hosting our first hack week.

HackSTIR was a five-day immersive data science workshop for learning, collaboration and prototyping innovation research projects, produced in partnership with the Alan Turing Institute (which has also written a separate article about the hack week), and with support from SAGE and the Intellectual Property Office. It was based on the hack week model developed at the eScience Institute, incorporating interactive tutorials, seminars and participant-led project work. This breaks with conventional workshop or hackathon models, which usually focus on either taught or experiential learning only and often don’t allow sufficient time to explore a domain and develop knowledge.

We brought together researchers from academia, the public sector, innovation agencies and more for a programme of data science tutorials and open project time. Mornings were given over to tutorials, while afternoons were reserved for participants to work in teams on self-driven projects, supported by Nesta innovation mapping researchers. This kind of schedule allows taught components to be directly complemented by peer learning and hands-on experience.

Participant experiences

Over the course of the hack week, participants at all skill levels produced prototype projects, including a text search method for research projects, an interactive data visualisation for tracking organisations through funding programmes, an investigation into whether machine learning could replicate peer-review decisions, and an analysis of Twitter data about digital social innovation.

The following sections are three short contributions from participants, talking about what they learned and produced over the week:

HackSTIR was one of the best learning experiences I have had in years! It was a perfect mixture of hands-on tutorials, inspirational talks and project work.

I started the week with a fair amount of programming experience, but only a very basic knowledge of Python. While this proved a little challenging at times, it was always easy to find help by asking around.

It was something of a relief to see that everyone faced similar challenges in getting meaningful results for their projects. As easy as it may be to apply algorithms written by someone else, it takes a lot more to get a sound outcome. A big part of the process is cleaning the data, not only beforehand but also during the analysis, because it is then that you really get to know your data and its pitfalls.

What did I take home from HackSTIR? I left full of inspiration and motivation to integrate some ideas into my day-to-day work as a data analyst (e.g. version control for code to ensure reproducibility). The week also sparked enough ideas to keep on polishing the project I was working on – and if I run into any difficulties, I have made some friends to ask for help.

Marvin Herzan, FFG

My project at HackSTIR was about exploring different methods for classifying EU-funded research projects, in order to kick-start the actual project at FFG-Austria. The basic idea is to build a text-mining-based model to classify research projects with subject index codes (SIC). These codes are openly available in CORDIS (a database of EU-funded research and innovation projects) for research funded under the FP6 programme (2002–2006), but not for the later and current programmes: FP7, H2020 and Horizon Europe. The intention is to use the FP6 data to train and validate multi-label classification models that can apply SIC classifications to the newer programmes.
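For readers unfamiliar with multi-label classification, the idea can be sketched in a few lines of Python. This is an illustration only and not the model developed at FFG: the project texts and label names below are invented stand-ins for CORDIS descriptions and SIC codes, and the method (one bag-of-words centroid per label, with an arbitrary cosine-similarity threshold of 0.4) is a deliberately simple substitute for the kind of trained classifier a real project would use.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase a text and return its alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def train_centroids(labelled_texts):
    """Sum token counts per label, so each label gets its own word profile."""
    centroids = defaultdict(Counter)
    for text, labels in labelled_texts:
        counts = Counter(tokenize(text))
        for label in labels:  # a text may carry several labels
            centroids[label].update(counts)
    return dict(centroids)


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[tok] * b[tok] for tok in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def predict_labels(text, centroids, threshold=0.4):
    """Return every label whose profile is similar enough to the text."""
    counts = Counter(tokenize(text))
    return sorted(
        label
        for label, centroid in centroids.items()
        if cosine(counts, centroid) >= threshold
    )


# Invented training examples standing in for labelled FP6 projects.
# The label names are hypothetical, not real SIC codes.
TRAINING = [
    ("solar energy storage for smart grids", {"ENERGY", "ICT"}),
    ("wind turbine energy efficiency", {"ENERGY"}),
    ("software platform for hospital patient records", {"ICT", "HEALTH"}),
    ("clinical trial of a cancer vaccine", {"HEALTH"}),
]

centroids = train_centroids(TRAINING)

# An unlabelled description, standing in for a project from a later programme.
print(predict_labels("energy storage software for smart grids", centroids))
```

Each label is decided independently, so a single project can receive zero, one or several labels. That independence is what distinguishes multi-label classification from ordinary single-label classification.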

At FFG-Austria we have been using SIC labelling for our funded projects since 2012. The ultimate goal is to have SIC classifications for all EU projects since 2002 and all nationally funded projects since 2012. This would give us a synoptic view of Austrian research activity in the areas defined by the SIC classification, at both national and EU level. Besides forward-looking, ex-ante views such as "emerging topics and fields", our results could be used to identify pathways along SICs over time, to compare countries in general and, in our particular use case, to compare EU and national activity in each subject over time.

We plan to reuse the methods that I began to develop at HackSTIR for classifications other than SIC because there are many use-cases for other actors like governments, industrial associations, and universities, which are partly stakeholders and partly clients of the Data Analysis department of FFG. If you would like to talk about these ideas, please get in touch.

Doga Ince, FFG

HackSTIR was well organised, and covered a lot of content in a manageable way. Each module was well paced, and despite being a relative beginner to programming, I was able to work at an appropriate level, and progress through the week.

My main takeaway from the event was realising the potential of the data we have at Social Investment Business (SIB). The event inspired me to think more creatively about utilising the data we already have, to ultimately better serve the charities and social enterprises with whom we partner.

Despite being new to coding languages, I found the sessions on machine learning and network science truly eye-opening; these sessions provided useful insight into a number of exciting avenues.

Following the event, I intend to continue improving SIB’s data maturity and to support better data-led decision-making. There is also a significant opportunity to continue developing data infrastructure and maturity in the social sector more broadly, by applying some of these techniques within the Social-Economy Data Lab (SEDL) – a sector-wide initiative led by SIB to create data standards, promote cross-partner data sharing, and ultimately support improved data use across all partners.

James Paterson, Social Investment Business

A network visualisation of Twitter users who have tweeted about digital social innovation. Created by Nur Gizem Yalcin at HackSTIR 2019 with Bokeh in Python.

What next?

HackSTIR has brought data science to a new cohort of researchers and analysts who are now ready to start addressing 21st-century questions about science, technology and innovation. We also learned a lot from running HackSTIR ourselves, including how to balance teaching with project work, the importance of a distraction-free environment for effective learning, and how to create engaging technical tutorials for mixed-ability audiences. We will be working hard to apply this new knowledge to create even more effective learning experiences in the future, so stay tuned.

If you are interested in attending, hosting or co-producing future innovation mapping learning events like HackSTIR, please get in touch with George Richardson.

Notes

All of the interactive data science learning materials from HackSTIR 2019 can be found on our Innovation Mapping Tutorials GitHub repository.

Thanks to hack week veterans Anthony Arendt and Daniela Huppenkothen, who provided useful guidance and advice for organising the hack week.

Author

George Richardson

Head of Data Science, Data Analytics Practice

George is Head of Data Science in Nesta’s Data Analytics Practice.
