Monitoring innovation in real-time

People are accountable for their decisions, and so it is natural to push back against black-box tools. arXlive keeps humans in the loop, to help people make smarter decisions about technical research.

arXlive (pronounced "arc-live") is an open-source platform for live monitoring of innovation activity in arXiv articles. arXiv is full of ground-breaking physical, quantitative and computational research, from the first report on the discovery of graphene, to seminal advances in AI research and the discovery of new building blocks of matter. It is a rich source of research data, and we have already carried out two studies based on it: "Deep learning, deep change?" and "Gender diversity in AI". arXlive originally grew out of this research, as we developed the tools to fully automate one of these papers so that its results would stay up to date. Since then, arXlive has grown into a more general-purpose tool that puts humans centre-stage.

arXlive: a human-in-the-loop tool

In the Innovation Mapping team we develop tools and infrastructure that enable people to make better decisions, and to do so with up-to-date data. We deliver tools to local, national and international policymakers and funders who rely on being kept abreast of the latest innovations in science, technology and society. Since these people are accountable for their decisions, this generally rules out tools which adopt "black-box" methods. Our way of addressing this is to develop ‘human-in-the-loop’ tools. Our latest example is arXlive, which is underpinned by a data analysis and production system that orchestrates a stable pipeline of data collection, enrichment and machine learning.

HierarXy: the contextual search engine

Everyone knows that when you use a search engine, you'll get back results containing an exact, partial or near match to your search query. These kinds of near-exact matches are perfect when you're searching for something very specific, but when you want to do a broad contextual search, you'll have to use complex 'advanced' searches whilst also having an expert on hand to give you a list of keywords. Let's say you wanted to find recent, interesting work on 5G systems: it's hard to see how you might find a broad set of relevant results without being an expert in the field. HierarXy vastly simplifies this procedure by explicitly performing contextual searches, so that you don't get stuck searching for rare gems. Let me give two examples.

Example 1) Sustainable Development and AI

Imagine that you're using a regular search engine. How would you find academic articles at the intersection of "AI" and "Sustainable Development"? One way would be to search for any of the terms in {AI, Machine Learning, Big Data} plus any of the terms in {Sustainable Development, Poverty, Natural Resources}. The problem here is that "AI" and "Big Data" (or "Sustainable Development" and "Poverty") are related terms, but they're definitely not the same thing. So by following this strategy you'll end up with a lot of junk results. On top of this, a search for "Sustainable Development" and "AI" is in any case contextually quite different from a search for "Natural Resources" and "Big Data". HierarXy's contextual search deals with this by ranking highest the results that are most contextually similar to "Sustainable Development" and "AI".
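To make the idea of ranking by contextual similarity a little more concrete, here is a minimal sketch. It is not HierarXy's actual code: the three-dimensional "context" vectors, the article titles and the function names are all made up for illustration, and we simply assume that each term and article has already been mapped to a vector by some model.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def mean_vector(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]

# Hypothetical context vectors: (AI-ness, sustainability-ness, everything else).
TERM_VECTORS = {
    "AI": [1.0, 0.0, 0.0],
    "Sustainable Development": [0.0, 1.0, 0.0],
}

# Hypothetical articles, each with an assumed precomputed context vector.
ARTICLE_VECTORS = {
    "Machine learning for crop yield forecasting": [0.7, 0.7, 0.1],
    "Faster training of image classifiers": [0.9, 0.0, 0.3],
    "Water policy in river basins": [0.1, 0.9, 0.3],
    "Lattice QCD at finite temperature": [0.0, 0.0, 1.0],
}

def contextual_search(terms, articles):
    """Rank articles by similarity to the combined context of all terms."""
    query = mean_vector([TERM_VECTORS[term] for term in terms])
    scored = [(cosine(query, vector), title) for title, vector in articles.items()]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)

ranked = contextual_search(["Sustainable Development", "AI"], ARTICLE_VECTORS)
for score, title in ranked:
    print(f"{score:.2f}  {title}")
```

In this toy data the crop forecasting article comes out on top even though its title contains neither search term, because its vector sits between the two contexts. Articles that are close to only one of the two terms rank lower.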

Example 2) What was like BERT before BERT?

One of the most significant advances in Machine Learning is Google's BERT which, among many other applications, is the current state-of-the-art system for automated Question Answering and Sentence Completion. But what was around before BERT, which emerged in 2018? With HierarXy, you can simply search for "BERT" and then select a date before 2018. The fact that the term "BERT" doesn't appear before 2018 doesn't matter, since context is the name of the game.
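One way to picture how a term can be searched for in years when it didn't exist is sketched below. This is only an illustration under our own assumptions, not a description of HierarXy's internals: the `Article` record, the toy vectors and the titles are all hypothetical. The query context is built from articles that do mention the term, and is then compared against articles from before the cutoff year.

```python
import math
from dataclasses import dataclass

@dataclass
class Article:
    title: str
    year: int
    vector: list  # hypothetical precomputed context vector

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Made-up articles for illustration only.
ARTICLES = [
    Article("An introduction to BERT-style pre-training", 2018, [0.9, 0.4, 0.0]),
    Article("Fine-tuning BERT for question answering", 2019, [0.8, 0.6, 0.0]),
    Article("Contextual word embeddings from language models", 2017, [0.8, 0.5, 0.1]),
    Article("Recurrent networks for sentence completion", 2016, [0.5, 0.6, 0.4]),
    Article("Graphene nanoribbon transport", 2015, [0.0, 0.0, 1.0]),
]

def search_before(term, cutoff_year, articles):
    """Rank articles published before cutoff_year by similarity to the term's context."""
    # The term's context is the mean vector of every article that mentions it.
    mentions = [a.vector for a in articles if term.lower() in a.title.lower()]
    if not mentions:
        return []
    query = [sum(column) / len(mentions) for column in zip(*mentions)]
    # Only older articles are candidates, whether or not they mention the term.
    older = [a for a in articles if a.year < cutoff_year]
    return sorted(older, key=lambda a: cosine(query, a.vector), reverse=True)

results = search_before("BERT", 2018, ARTICLES)
for article in results:
    print(article.year, article.title)
```

None of the returned articles mention "BERT", but the ones closest to its context are listed first.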

A note on "Novelty"

Novelty can't be fully represented by any single number, since it doesn't have a straightforward definition. In general, novelty could be defined as any of {new, original, unusual} (and more), and the procedure we follow best captures the 'unusual' concept (or, more formally, "how different are you from your nearest neighbours?"). We describe this, along with our entire procedure for HierarXy, in this more technical blog (and you can look at our codebase here). Other, more sophisticated definitions of novelty are being developed at Nesta, made possible by the Rhodonite Python package.
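As a rough illustration of the "how different are you from your nearest neighbours?" idea, the sketch below scores each point by its mean distance to its k nearest neighbours. The choice of Euclidean distance, the value of k and the toy points are our assumptions for the example. The exact procedure used by arXlive is the one described in the technical blog, not this snippet.

```python
import math

def novelty_scores(vectors, k=2):
    """Mean Euclidean distance from each vector to its k nearest neighbours.

    A larger score means the vector is further from its neighbours,
    which is read here as 'more unusual'.
    """
    scores = []
    for i, vector in enumerate(vectors):
        # Distances to every other vector, smallest first.
        distances = sorted(
            math.dist(vector, other)
            for j, other in enumerate(vectors)
            if j != i
        )
        nearest = distances[:k]
        scores.append(sum(nearest) / len(nearest))
    return scores

# Toy data: four points in a tight cluster and one far away.
points = [[0, 0], [0, 1], [1, 0], [1, 1], [5, 5]]
scores = novelty_scores(points, k=2)
for point, score in zip(points, scores):
    print(point, round(score, 2))
```

The isolated point gets by far the highest score, while the four clustered points all score the same. Note that the function needs at least two vectors, since a single vector has no neighbours to compare against.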

The Keyword Factory: what else should I be searching for?

You might still be thinking that HierarXy is a little too black-box for your liking, or that the articles you're looking for simply aren't on arXiv. The Keyword Factory allows you to expand your technical vocabulary, based entirely on arXiv data. For example, you might be searching for the latest research relating to blockchain, but did you know that you should probably also take a look at smart contracts? How about, in addition to searching for graphene, you also consider gnrs, or rather, graphene nanoribbons?
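This post doesn't go into how the Keyword Factory finds related terms, so the snippet below is only one simple way such a suggestion could work, not the method arXlive uses. It counts which keywords most often appear alongside your search term. The keyword sets and the `related_keywords` function are hypothetical.

```python
from collections import Counter

# Hypothetical keyword sets, one per article.
ARTICLE_KEYWORDS = [
    {"blockchain", "smart contracts", "consensus"},
    {"blockchain", "smart contracts", "ethereum"},
    {"blockchain", "smart contracts"},
    {"blockchain", "consensus"},
    {"graphene", "graphene nanoribbons", "transport"},
    {"graphene", "graphene nanoribbons"},
]

def related_keywords(term, documents, top_n=3):
    """Suggest the keywords that co-occur most often with the given term."""
    counts = Counter()
    for keywords in documents:
        if term in keywords:
            # Count every other keyword that shares an article with the term.
            counts.update(keywords - {term})
    return [keyword for keyword, _ in counts.most_common(top_n)]

print(related_keywords("blockchain", ARTICLE_KEYWORDS))
print(related_keywords("graphene", ARTICLE_KEYWORDS))
```

With this toy data, "smart contracts" is the top suggestion for "blockchain" and "graphene nanoribbons" is the top suggestion for "graphene", mirroring the examples above.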

How are you using arXlive, and what would you like to see?

At Nesta, we're looking forward to using this tool to guide our research into innovation systems, and we see arXlive as a template for some of our future work. We'd be delighted to hear about your use cases, problems, discoveries, and issues with arXlive, and whether something similar would be useful for you. We're humans too, so keep us in-the-loop at [email protected]

Author

Joel Klinger

Data Developer, Innovation Mapping

Joel is a Data Developer in the Innovation Mapping Team, which researches international innovation systems by drawing together new data sources, data science methods and state-of-the...

Russell Winch

Junior Data Engineer, Innovation Mapping

Russ was a Junior Data Engineer in the Innovation Mapping Team and worked on the development of data products and the implementation of a data production system.
