Public service coding: the BBC as an open software developer

On Monday, the BBC published British, Bold, Creative, a paper where it put forward a vision for its future based on openness and collaboration with its audiences and the UK’s wider creative industries.

In this blog post, we focus on an area where the BBC is already using an open and collaborative model for innovation: software development.

The value of software

Although less visible to the public than its TV, radio and online content programming, the BBC’s software development activities may create value and drive innovation beyond the BBC, providing an example of how the corporation can put its “technology and digital capabilities at the service of the wider industry”.

Software is an important form of innovation investment that helps the BBC deliver new products and services, and become more efficient. One might expect that much of the software developed by the BBC would also be of value to other media and digital organisations. Such beneficial “spillovers” are encouraged by the BBC’s use of open source licensing, which enables other organisations to download its software for free, change it as they see fit, and share the results.[1]

Current debates about the future of the BBC - including the questions about its role in influencing the future technology landscape in the Government’s Charter Review Consultation - need to be informed by robust evidence about how it develops software, and the impact that this has.

In this blog post, we use data from the world’s biggest collaborative software development platform, GitHub, to study the BBC as an open software developer.[2]

GitHub gives organisations and individuals hosting space to store their projects (referred to as “repos”), and tools to coordinate development. This includes the option to “fork” (copy) other users’ software, change it and redistribute the improvements. Our key questions are:

  • How active is the BBC on GitHub?
  • How has its presence on GitHub changed over time?
  • What is the level of adoption (forking) of BBC projects on GitHub?
  • What types of open source projects is the BBC developing?
  • Where in the UK and in the rest of the world are the people interested in BBC projects based?

But before tackling these questions, it is important to address a question often raised in relation to open source software:

Why might an organisation like the BBC want to share its valuable code on a platform like GitHub?

There are several possible reasons:

  • Quality: Opening up a software project attracts help from other developers, making it better
  • Adoption: Releasing software openly can help turn it into a widely adopted standard
  • Signalling: It signals the organisation as an interesting place to work and partner with
  • Public value: Some organisations release their code openly with the explicit goal of creating public value

The webpage introducing TAL (Television Application Layer), a BBC project on GitHub, is a case in point: “Sharing TAL should make building applications on TV easier for others, helping to drive the uptake of this nascent technology. The BBC has a history of doing this and we are always looking at new ways to reach our audience.”

The BBC has an important presence on GitHub in three main areas: News, R&D and Services

We have identified 18 BBC organisations on GitHub, with 115 unique members - a level of activity that is far from insignificant. To put these numbers into context, the Government Digital Service has 41 members on GitHub, and Google 514.[3] Other UK broadcasters are also much less active on GitHub than the BBC: ITV has 11 GitHub members, Channel 4 has 7, and Sky UK has 6.

Further analysis reveals six main BBC software development areas on GitHub.[4] News, R&D and Services are where most activity is concentrated. They are followed by Platforms, Archives and Mixed.[5]

We provide some examples of digital innovation in these different areas later, but before doing that, we look at how the levels of BBC activity in GitHub have changed over time.

BBC development activity on GitHub has greatly expanded in recent years

There are currently 380 projects associated with BBC organisations – 298 of these are “original” (that is, not forks of other projects). We have also found 817 forks of BBC projects – instances where other users have copied BBC code to continue working on it.[6] The two charts below show the recent evolution in BBC projects and forks (by development area).

BBC projects in GitHub

They show three things:

  • BBC development activity on GitHub has grown rapidly. Between 2012 and today, the number of BBC software projects on GitHub has grown ten-fold.
  • More parts of the BBC are getting involved in open software development: Until around 2013, only BBC R&D was present on GitHub. This has changed since then: News and Services projects started appearing in 2013, and Archives soon after.[7] This presumably reflects the increasing importance of digital technology in more areas of work at the BBC.[8]
  • Interest in BBC code among GitHub users has also grown: the number of forks of BBC projects has multiplied by almost 25 between 2012 and 2015. These are instances where another user copies BBC code for their own purposes. Such actions are prima facie evidence of interest in the “parent” code developed by the BBC, and an indication of its value.[9]

Some examples of the BBC’s open digital innovations on GitHub

The graph below illustrates the variety of areas where the BBC is attracting interest from other users on GitHub, measured by fork numbers.

BBC innovations on GitHub

  • News: BBC News software projects are mostly focused on web development technologies for the BBC News site. In particular, we find two ‘blockbuster’ projects for responsive web design, Imager.js (for loading images responsively in websites) and Wraith (to compare webpage screenshots), each of which has almost 200 forks. Also in the News area, BBC News Lab has developed Datastringer, a tool that helps journalists integrate different data feeds and sets up alerts if interesting patterns arise. BBC Visual Journalism hosts code for interactive data visualisations and data applications on BBC websites (the fact that these are “finished projects” might explain why they are not being forked).
  • R&D: BBC R&D attracts interest with a family of projects for audio waveform visualisation (a tool that helps audio editors in BBC Radio to manipulate audio files visually), and Similarity, a text-mining tool used to compare documents in data journalism (this was used in a project to visualise the text in Iraq war logs).
  • Services: In this area, FMTVP (BBC Digital TV Platforms) has created TAL (Television Application Layer), the open source library for Connected TV applications that we mentioned earlier.

The community of developers interested in BBC projects is spread across the UK, and globally

We have also looked at the location of GitHub users who have forked BBC projects (see map below).[10]

We find these users in 53 different countries. A third of them are based in the UK, and a fifth in the US.

London is the city with the largest number of people forking BBC projects (16%). Other active UK cities include Manchester, Leeds, Glasgow, and Edinburgh.

Internationally, the ‘hotspots’ of interest in BBC development include Paris, New York, San Francisco, Berlin, and Amsterdam – interestingly, more than half of the cities in this “top 10” also show up in Compass’s recent ranking of global tech start-up ecosystems. A possible interpretation of this is that some software development activities at the BBC are highly innovative, attracting the interest of entrepreneurs in reputed tech clusters.

Implications

We have shown that the BBC has an important presence on GitHub, covering an expanding number of technology areas, such as web design, data journalism, data visualisation and content standards. These activities are garnering interest from significant numbers of GitHub users, not least developers in thriving tech start-up ecosystems in the UK, Continental Europe and the US.

But what does this mean for ongoing debates about the future of the BBC?

In its Public Consultation for the Charter Review, the Department for Culture, Media and Sport highlights how the UK has benefited from R&D at the BBC, while also mentioning concerns about “crowding out” technology investment in the private sector, and high costs.

In some ways, the open software development activities we analyse in this post appear to increase the public benefits from the BBC’s R&D while removing some of the risks:

  • Crowding in, not crowding out: These projects generate open outputs that other organisations – commercial or not – can build on or use for their own purposes. A cursory glance at the digital economy shows that open source tools and infrastructures have created manifold opportunities for private sector innovation and growth. More generally, early concerns about the impact of open source on the commercial sustainability of the software industry have proven unfounded.[11]
  • Collective intelligence: By inviting collaboration from other developers, openness can boost the productivity of the BBC’s R&D investment, increasing the quality of its outputs and decreasing production costs (our analysis of contributors to BBC software projects on GitHub suggests that there is substantial involvement from developers outside the corporation – which has the potential added benefit of strengthening the networks that underpin innovation).

Given all of this, and consistent with the open vision for the future of the BBC set out in British, Bold, Creative, our question is: how can the BBC use its considerable technological capabilities to maximise its impact on innovation, by making even greater use of the open source model we have studied in this post?

Appendix: Limitations and issues for further research

  • What is the control group? In our exploratory analysis, we have not controlled for the fact that some of the trends that we observe might be partly driven by the growing popularity of GitHub. We could address this by comparing the BBC with other broadcasters or media companies – a challenge here is that, as we said before, the BBC is much more active on GitHub than other UK broadcasters. Another option would be to use a larger public organisation (e.g. the Government Digital Service), or a random sample of GitHub organisations as controls.
  • What is the impact? How do we quantify the value of the software that the BBC shares on GitHub? Although a fork can be seen as a proxy for a user’s interest in a project, that is still a long way away from measuring the economic impact of the software being downloaded. One option would be to adapt the methodologies that researchers have used to quantify the economic value of other open source software projects such as Linux.[12]
  • What about the BBC’s contribution to other organisations? This blog has focused on the creation of open source software by BBC organisations (in some cases in collaboration with others), and its diffusion via “forks”. What we have not looked at here is the participation of BBC developers in projects outside the BBC – such activities might be creating value both for the BBC and the wider tech and creative ecosystem (e.g. by creating better standards, stronger infrastructures and more powerful tools). Looking at the effort spent by BBC developers on such public value projects is a natural follow-up to our analysis.

Note: Data collection and analysis for this blog was done with R. Social network graphs were produced with Gephi. The scripts and data are available on GitHub.

Endnotes

[1] Several key components of modern ICT systems, such as the Apache server or the Linux operating system (which, for example, underpins Google’s Android mobile OS), and a multitude of programming languages and applications (including R and Gephi, the tools used for data collection, analysis and visualisation in this blog post) have been developed using an open source model.

[2] GitHub was founded in 2008 and, as of today, it has a community of 10 million users working on over 26 million projects. On GitHub, users can share their software code in “repos” (repositories for software projects), “fork” other users’ repos (create copies of the code that they can work on) and give back their improvements through “pull requests”. They can also subscribe to interesting repos, or “star” them (the equivalent of a bookmark). GitHub’s open Application Programming Interface (API) provides easy access to data about GitHub users and their organisational affiliations, repos and their contributors and forks, among other things.
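
As a rough illustration of the kind of data the API returns, the sketch below summarises a list of repository records for one organisation. It assumes each record is a dictionary carrying the `fork` and `forks_count` fields that the GitHub REST API returns for repositories. The function name and the sample records are hypothetical, and this is not the R code used for the analysis in this post.

```python
from typing import Dict, Iterable


def summarise_repos(repos: Iterable[dict]) -> Dict[str, int]:
    """Count original repos, forked-in repos, and forks received by originals."""
    summary = {"total": 0, "original": 0, "forks_received": 0}
    for repo in repos:
        summary["total"] += 1
        # `fork` is True when the repo is itself a copy of another project.
        if not repo.get("fork", False):
            summary["original"] += 1
            # `forks_count` is how many times other users have forked this repo.
            summary["forks_received"] += repo.get("forks_count", 0)
    return summary


# Hypothetical records shaped like GitHub API repo objects (not real data).
sample_repos = [
    {"name": "project-a", "fork": False, "forks_count": 12},
    {"name": "project-b", "fork": True, "forks_count": 0},
    {"name": "project-c", "fork": False, "forks_count": 3},
]
print(summarise_repos(sample_repos))
```

Raw `forks_count` values include forks made by an organisation's own members, so a filtered count such as the one reported in this post would need a further step.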

[3] There are in fact 192 members affiliated to BBC organisations including duplicates (members affiliated to more than one BBC organisation). One thing to remember here is that it is always possible for an employee of an organisation to participate in GitHub without being “officially” affiliated with the organisation in GitHub.

[4] In practice, this meant producing a social network where BBC members were connected if they were members of the same organisation. We then used a community detection algorithm to find distinct “components” in that network. One can use different algorithms to do this, and we opted for the one that broke up the network in the cleanest (most modular) way, the “leading eigenvector” method (giving us five communities or development areas). Having done this, we allocated BBC organisations to the development area that contained most of their members. If a BBC organisation did not have a “majority” development area, we allocated it to a “Mixed” category.
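
The final allocation step can be sketched as follows. This is a minimal illustration, not the code used for the analysis: it takes the community labels as given rather than running the leading eigenvector method, and it reads “majority” as strictly more than half of an organisation's members, which is an assumption. The function name, organisation names and member names are all hypothetical.

```python
from collections import Counter
from typing import Dict, List


def allocate_organisations(
    org_members: Dict[str, List[str]],
    member_community: Dict[str, str],
) -> Dict[str, str]:
    """Assign each organisation to the community holding most of its members."""
    allocation = {}
    for org, members in org_members.items():
        counts = Counter(
            member_community[m] for m in members if m in member_community
        )
        if not counts:
            allocation[org] = "Mixed"
            continue
        community, size = counts.most_common(1)[0]
        # Assumption: a "majority" means strictly more than half of the members.
        allocation[org] = community if size * 2 > len(members) else "Mixed"
    return allocation


# Hypothetical inputs: community labels would come from a detection algorithm.
example_orgs = {
    "org-news": ["ann", "bob", "cat"],
    "org-crossover": ["ann", "dan"],
}
example_communities = {"ann": "News", "bob": "News", "cat": "R&D", "dan": "Services"}
print(allocate_organisations(example_orgs, example_communities))
```

In this example the first organisation has two of its three members in one community and is allocated there, while the second is split evenly and falls into “Mixed”.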

[5] The Mixed area includes BBC “crossover” organisations like iPlayer, which intuitively sits between BBC Services and Platforms, and BBC Connected Studio, which, because of its crosscutting nature, includes developers from a variety of BBC communities.

[6] This list of forks excludes forks by BBC members, and forks by a single individual, no longer active on GitHub, who had forked 87 BBC projects (including 74 forks in a single day).
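
A filter of this kind might look like the sketch below. It assumes each fork record has been flattened into a dictionary with an `owner` login and an ISO 8601 `created_at` timestamp. The helper names, the threshold and the sample records are hypothetical, and flagging bulk forkers by a daily threshold is an illustrative choice rather than the procedure used for this post.

```python
from collections import Counter
from typing import Iterable, List, Set


def bulk_forkers(forks: Iterable[dict], threshold: int) -> Set[str]:
    """Return owners who created at least `threshold` forks on a single day."""
    # The first ten characters of an ISO 8601 timestamp are the date (YYYY-MM-DD).
    per_day = Counter((f["owner"], f["created_at"][:10]) for f in forks)
    return {owner for (owner, _day), count in per_day.items() if count >= threshold}


def filter_external_forks(
    forks: Iterable[dict],
    bbc_members: Set[str],
    excluded_users: Set[str],
) -> List[dict]:
    """Keep forks whose owner is neither a BBC member nor an excluded account."""
    blocked = bbc_members | excluded_users
    return [fork for fork in forks if fork["owner"] not in blocked]


# Hypothetical fork records (not real data).
sample_forks = [
    {"repo": "project-a", "owner": "bbc-dev", "created_at": "2015-01-05T09:00:00Z"},
    {"repo": "project-a", "owner": "outsider", "created_at": "2015-02-01T12:00:00Z"},
    {"repo": "project-a", "owner": "bulk-user", "created_at": "2015-03-01T10:00:00Z"},
    {"repo": "project-b", "owner": "bulk-user", "created_at": "2015-03-01T10:05:00Z"},
]
flagged = bulk_forkers(sample_forks, threshold=2)
kept = filter_external_forks(sample_forks, {"bbc-dev"}, flagged)
print([fork["owner"] for fork in kept])
```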

[7] It is worth noting that one of the development areas (“Platforms”) that we identified previously is missing from these charts because it has no original repos or forks in GitHub. One potential explanation for this is that BBC developers in that area are using GitHub on a personal basis, or that they are collaborating in projects that are not being shared openly (premium GitHub users are given the option to keep their repos private).

[8] Of course, it could also be that development activities that were previously “private” (i.e. not open) or taking place in other platforms were relocated to GitHub.

[9] Forks are an imperfect proxy for interest in BBC code for two reasons. On the one hand, developers can download code from a GitHub repo without forking it (this would lead us to underestimate interest in the repo if we only look at forks). On the other hand, forking a repo is easy: there is always the possibility that the individual who forked it did not carry out any subsequent development, or use it in any meaningful way. One way to address this would be to look at levels of development activity in forks of BBC repos – this is an issue for further research.
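
One possible starting point for that follow-up is sketched below. It relies on a heuristic, stated here as an assumption rather than a guarantee: a fork whose `pushed_at` timestamp is later than its `created_at` timestamp has received at least one push since it was created. Both fields appear in GitHub API repository records, but the function names and sample records are hypothetical.

```python
from datetime import datetime
from typing import Iterable

TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def parse_timestamp(value: str) -> datetime:
    """Parse a timestamp such as '2015-03-01T10:00:00Z'."""
    return datetime.strptime(value, TIMESTAMP_FORMAT)


def has_later_activity(fork: dict) -> bool:
    """Heuristic: True if something was pushed after the fork was created."""
    return parse_timestamp(fork["pushed_at"]) > parse_timestamp(fork["created_at"])


def active_share(forks: Iterable[dict]) -> float:
    """Share of forks showing activity after creation (0.0 for an empty list)."""
    forks = list(forks)
    if not forks:
        return 0.0
    return sum(has_later_activity(f) for f in forks) / len(forks)


# Hypothetical fork records (not real data).
sample_fork_activity = [
    {"created_at": "2015-03-01T10:00:00Z", "pushed_at": "2015-02-20T08:00:00Z"},
    {"created_at": "2015-03-01T10:00:00Z", "pushed_at": "2015-04-11T16:30:00Z"},
]
print(active_share(sample_fork_activity))
```

A push is only a crude sign of development, so a fuller analysis would probably also look at the commits themselves.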

[10] Just over two-thirds of GitHub users who have forked BBC projects provide information about their location (this captures 489 unique GitHub users, excluding members of BBC organisations). Where possible, we have used the Google Maps geo-coding API to identify their country and extract the geographical coordinates of their location for mapping. This allows us to deal, to some degree, with inconsistencies in the way that users provide their location. The geo-coding process has helped us to identify the countries for 423 forkers, and the localities for 316 forkers (we have excluded instances where the geo-coding process generated a large number of matches for a single location).

[11] See for example, Moody (2001).

[12] http://www.scirp.org/journal/PaperDownload.aspx?paperID=53076

Author

Juan Mateos-Garcia

Director of Data Analytics Practice

Juan Mateos-Garcia was the Director of Data Analytics at Nesta.