The world of research is a complex system. 21st century science policy needs open, transparent and reproducible data science tools to help make sense of it.
The International Conference on Computational Social Science (IC2S2) is an annual meeting of scientists who work at the intersection of big data, computational analysis and social science, a field that has grown rapidly in the last decade. This year, I was lucky enough to attend the conference with my colleague Kostas (who has written an excellent summary of the event), where I presented a paper using network science to analyse novelty in scientific research.
In this post, I outline how social scientists are using new data sources and computational methods to analyse science, explain why this matters, and highlight other examples of this research presented at IC2S2. I also argue that, to have a greater impact, computational social scientists studying science need to make their methods and indicators more open, so that everyone can validate and use them: an open science of science.
In 2010, Claudio Cioffi-Revilla drew a comparison between Galileo’s revolutionary use of the telescope to study the stars and the potential for computational social science (CSS) to deepen our understanding of society. As the programme at IC2S2 demonstrated, it hasn’t taken long for the CSS community to turn that telescope on itself and its contemporaries; the conference hosted 23 talks on ‘Science Studies’, a field that analyses the production and proliferation of scientific knowledge.
Researchers studying themselves and their peers is no exercise in narcissism. CSS practitioners who investigate research communities and outputs understand that generating knowledge about their structures, dynamics and underlying mechanisms could inform better science policies and yield greater benefits for society.
As a 2018 article in Science, Toward a more scientific science, points out, “many notions about how people and practices, policies, and resources influence the course of science are still more rooted in traditions and intuitions than in evidence”. The article goes on to cite the works and words of several leading computational social scientists, who highlight the potential for new data-oriented methods to equip us with a more nuanced and holistic picture that truly captures the complexity of the scientific enterprise.
Traditional methods for measuring progress in research still rely largely on straightforward variations on counting the citations generated by publications. We are awash with metrics that create perverse incentives, are open to gaming, and fail to describe reality. At the same time, productivity in science appears to be declining, and some critics suggest a disconnect between research activity and societal benefits. If we want to create the right conditions for a research system that is sustainable and can help tackle some of the world’s biggest challenges, we need to harness the potential of new data sources and replace simplistic metrics. This is precisely what the research presented in the Science Studies sessions at IC2S2 set out to do.
One speaker highlighted a link between research impact and the gender and ethnic diversity of the team, while another demonstrated how the recipients of Nobel Prizes go through both phases of exploration and tight focus around the time they produce their prize winning works. The work that I presented investigated how we can look for novel combinations of topics mentioned in publications to find emerging fields of research, while another presenter suggested that these new combinations are driven by the skills mix within research teams.
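To make the idea of “novel combinations of topics” concrete, here is a minimal sketch of one way it could work: treat each publication as a set of topic tags and flag topic pairs the first year they co-occur. This is an illustrative toy, not the method from the presented paper; the function name and the example data are invented for this post.

```python
from itertools import combinations

# Hypothetical toy corpus: each record is (publication year, topics tagged on a paper).
papers = [
    (2016, ["deep learning", "image recognition"]),
    (2016, ["network science", "bibliometrics"]),
    (2018, ["deep learning", "bibliometrics"]),  # a first-time pairing of topics
    (2019, ["deep learning", "bibliometrics"]),  # repeat pairing: no longer novel
]

def novel_pairs_by_year(papers):
    """For each year, return the topic pairs that co-occur for the first time."""
    seen = set()
    novel = {}
    for year, topics in sorted(papers):  # process papers in chronological order
        for pair in combinations(sorted(set(topics)), 2):
            if pair not in seen:
                seen.add(pair)
                novel.setdefault(year, set()).add(pair)
    return novel

print(novel_pairs_by_year(papers))
```

In this toy example, “deep learning” and “bibliometrics” first appear together in 2018, so that pair is flagged as novel for 2018 and ignored thereafter. Real analyses would work at a much larger scale and weight novelty by how surprising a combination is, rather than treating all first co-occurrences equally.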
This sample of work points to the variety of new analytical capabilities that could be made available to everyone with a stake in the research process. There is a great opportunity to take them beyond the walls of conferences and the pages of journal articles, to be used in further research and policy settings. However, there is also a risk that they are developed in silos and deployed without being triangulated, resulting in a jumble of subtly varying methods that are hard to reproduce.
Scientific computing more broadly has faced and tackled a similar challenge over recent years. Programming languages such as Python and R have efficient, well-documented, and community-maintained open source software packages that consolidate reproducible methods for quantitative analysis and open them to a wider audience. One example is scikit-learn, a tool that brings together many different machine learning algorithms in one package - a blessing for data scientists like those of us in the Innovation Mapping team.
Now imagine an open source software package that implemented the latest research and innovation indicators, algorithms and analysis methods, built by the community that uses them. Academics and practitioners would gain access to next generation tools and would be able to triangulate results and compare methods, deepening our understanding of research and propelling advances in the field.
A tool such as this has the potential to rapidly transition us away from the intuitions and traditions that guide research policy today, to a world where we are able to intelligently cultivate 21st century research systems.
The Innovation Mapping team is already developing one component - Rhodonite - a set of network science algorithms for measuring combinatorial changes in research topics over time. We know that some of you reading this are building the other blocks. All we need to do is bring them together.
This blog sets out the vision for building the toolkit for an open science of science. If you would like to take the conversation further, please leave feedback here, or get in touch via email or on Twitter.