About Nesta

Nesta is an innovation foundation. For us, innovation means turning bold ideas into reality and changing lives for the better. We use our expertise, skills and funding in areas where there are big challenges facing society.


Bringing arXiv data to life

Insights into innovation trends from open publication data.

02 July 2019

In Innovation policy


Russell Winch

Junior Data Engineer, Innovation Mapping

Russ was a Junior Data Engineer in the Innovation Mapping Team and worked on the development of data products and the implementation of a data production system.


Joel Klinger

Data Engineering Senior Lead, Data Analytics

Joel is Nesta’s Data Engineering Senior Lead.


In the past two decades, it has become the norm for entirely new industries to arise in just a few years. Traditional means of monitoring industrial and academic activity are relatively slow, which delays policy decisions and means that the full benefits of these industries will not be distributed evenly. In response, we are developing 'arXlive', an open-source web application underpinned by Nesta's data analysis and production system, to monitor innovation trends from publications data in real time.

Just a few years ago, countries around the world began clamouring to stake their claims amidst an artificial intelligence (AI) gold rush, with national strategies for investment and development arriving thick and fast. In economist-speak, AI is now commonly described as a “general purpose technology” (GPT) which, like other GPTs such as the transistor or the combustion engine, can dramatically transform countless industries across the global economy. In principle, this kind of system-wide shock could level the playing field, allowing relative outsiders to leapfrog strong economies and become market leaders in various industries. In practice, this is only likely to happen if an economy is already equipped with the talent and infrastructure to compete.

For all economies to truly be on an equal footing, they would need some insider information. If a region or country were able to identify emerging industries or technologies in real time, it could proactively build the talent and infrastructure needed to compete in them.

We began trying to understand this ecosystem some time ago, with our work analysing data from arXiv (pronounced ‘archive’), a popular pre-prints website where scientists share their findings before submitting them to journals and conferences. The resulting paper later became one of the top 10% of most downloaded papers on SSRN within the last 12 months, and we have presented it to economics of innovation audiences at research institutes such as SPRU and ZEW. The final stage in the evolution of this work is the arXlive project.

arXlive will be an open-source platform for live monitoring of innovation activity in arXiv publications. Underpinning arXlive is a data analysis and production system, which orchestrates a stable pipeline of data collection, enrichment and machine learning. Initially, arXlive will have two main web apps. The first will effectively be a live version of our paper. The second will apply the Rhodonite algorithm, which we have developed at Nesta to identify emerging industries or technologies. By applying this algorithm live to the latest arXiv data, we aim to give business leaders and policymakers around the world access to the insider information required to prepare for the next big tech disruptor.

We’ve set ourselves a soft deadline of September to go live with these first two apps. From there, we are considering several possible extensions, such as:

A service for powerful "search engine" exploration of arXiv data (for example, with intelligent ranking and synonym matching).
Automatic identification of key funding bodies or informal collaborations from paper acknowledgements.
Paragraph-level topic tagging.
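
To give a flavour of what the first of these extensions could involve, here is a minimal sketch of synonym-aware ranking over paper titles. It is purely illustrative and not how arXlive is or will be implemented: the `SYNONYMS` table, the function names and the simple TF-IDF-style score are all assumptions made for this example.

```python
import math
import re
from collections import Counter

# Hypothetical, hand-written synonym table (an assumption for this sketch).
# A real service would more likely curate or learn these relationships.
SYNONYMS = {
    "image": {"picture", "visual"},
    "network": {"net"},
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def expand(terms):
    """Return the query terms plus any synonyms listed for them."""
    expanded = set(terms)
    for term in terms:
        expanded |= SYNONYMS.get(term, set())
    return expanded


def rank(query, docs):
    """Rank docs against the query with a simple TF-IDF-style score.

    Returns (document, score) pairs for documents with a positive score,
    best first. Ties are broken by the original document order.
    """
    tokenized = [tokenize(doc) for doc in docs]
    n_docs = len(docs)
    # Document frequency: in how many documents does each token appear?
    doc_freq = Counter(tok for toks in tokenized for tok in set(toks))
    query_terms = expand(tokenize(query))

    scored = []
    for index, toks in enumerate(tokenized):
        term_freq = Counter(toks)
        score = sum(
            term_freq[t] * math.log((1 + n_docs) / (1 + doc_freq[t]))
            for t in query_terms
            if t in term_freq
        )
        if score > 0:
            scored.append((score, index))

    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(docs[index], score) for score, index in scored]


if __name__ == "__main__":
    titles = [
        "Deep neural net for picture classification",
        "Graph theory bounds",
        "Image segmentation with a convolutional network",
    ]
    for title, score in rank("image network", titles):
        print(f"{score:.3f}  {title}")
```

In this toy example, the first title shares no words with the query "image network" and is matched only through the synonym table, which is the kind of behaviour a plain keyword search would miss.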

Please do get in touch if you’re interested in the project, or would like to get involved in the future!
