Our research on the effect of new technology on the popularity of artists in the music industry
A handful of superstars and a vast number of unknown artists: inequalities of outcome in the creative industries can be huge.
There is, however, a belief that these inequalities are decreasing thanks to technological and industry innovations brought about by the internet, accessible data storage technology and online sharing platforms. These innovations make it easier for lesser-known artists to share their work, and for consumers to find it quickly and at low cost.
Power is thus being shifted away from large distribution channels and mainstream media, such as record companies and television, towards sharing platforms and, more directly, towards the producers and consumers. But are these innovations changing the inequalities in popularity for creative goods?
Our research suggests reasons to be sceptical: apart from the way in which the creative industries are organised, it is our social nature and herd behaviour that drive the inequalities.
Inequalities in popularity can be examined by looking at how outcomes are distributed across people. Many things in the natural world follow the ubiquitous ‘Normal distribution’, recognisable by its bell-shaped curve (see Figure 1 if it doesn’t ring a bell); popularity, as we will see, does not. For Normal distributions, the average is a very useful measure. Take people’s height: there is an average height around which people’s heights are distributed, and the further someone is from that average, the more exceptional they are.
Distributions of popularity, including those in the creative industries, can look very different. Figure 2 shows a stylised example of such a popularity distribution: the shape of the curve illustrates that many products or individuals are barely known (the top left of the curve), while a small number of products are highly popular (the bottom right). A common rule of thumb for such distributions is that the 20 per cent most popular products together account for 80 per cent of total sales, a relationship known as the ‘Pareto principle’. Different versions of such ‘long tail’ distributions exist, but the most commonly observed one is the power law, or Pareto, distribution.
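To make the 80/20 reading concrete: given a list of popularity counts, the share of the total held by the most popular 20 per cent takes only a few lines to compute. Below is a minimal Python sketch; the function name `top_share` and the toy counts are illustrative assumptions, not data from our study.

```python
def top_share(counts, fraction=0.2):
    """Return the share of the total held by the top `fraction` of items."""
    ranked = sorted(counts, reverse=True)          # most popular first
    k = max(1, round(fraction * len(ranked)))      # size of the top group
    total = sum(ranked)
    return sum(ranked[:k]) / total if total else 0.0


# Toy, made-up popularity counts for ten products.
bell_shaped = [8, 9, 10, 10, 10, 10, 10, 10, 11, 12]  # clustered around an average
long_tailed = [50, 20, 10, 5, 5, 3, 3, 2, 1, 1]       # a few hits, many barely known

print(top_share(bell_shaped))  # 0.23
print(top_share(long_tailed))  # 0.7
```

For the bell-shaped counts the top 20 per cent hold little more than 20 per cent of the total; for the long-tailed counts they hold most of it.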
Figure 1: The curve in this figure is a Normal distribution. In this example, the x-axis shows levels of popularity, which might be approximated by sales data or the number of ‘likes’ on a website, and the y-axis shows the number of (artistic) products with the popularity on the x-axis. The curve thus illustrates a hypothetical situation in which products are centred on an average level of popularity, with some deviation above and below it.
Figure 2: The curve in this figure is a stylised example of a distribution with a ‘long tail’. As in Figure 1, the x-axis shows levels of popularity and the y-axis shows the number of (artistic) products with the popularity on the x-axis. The curve thus illustrates that a few products are extremely popular, while most products are not. This kind of distribution is often found in empirical data on popularity.
One could argue that these uneven distributions in popularity result from the limited variety and quantity of products that the average consumer is aware of: book and music stores have limited shelf space, and television and radio have limited airtime. To optimise profits, businesses inevitably focus on bestsellers, and variety suffers as a result.
Digitisation, however, has changed this picture dramatically. Online, consumers have quick and low-cost access to an almost unlimited range of cultural products. This might lead us to expect more equal opportunities for artists, now and increasingly in the future, with underdogs having much more potential to grow big.
An alternative explanation for the observed inequality lies in social network influences, peer pressure and consumers’ herd behaviour, which drive the already popular to become even more popular. The idea is simple: if consumers tend to favour products that are already favoured by others, any existing inequality will be amplified over time. Research finds that precisely this mechanism, known as cumulative advantage or preferential attachment, in which the probability of receiving an additional ‘like’ depends on current popularity, gives rise to the same kind of ‘long tail’ distributions found in empirical data. Computer simulations of diffusion that explicitly model social influence and peer pressure have been shown to produce such ‘long tail’ distributions as well.
This makes us doubt whether the digital innovations in the creative industries will indeed lead to a more equal distribution of popularity among creative goods and artists, or whether the inequalities will remain as a result of our social nature and system effects. Using recent empirical data on popularity from an online music platform in a small case study, we find support for the latter.
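Cumulative advantage is also easy to simulate. The sketch below is a variant of Herbert Simon's classic growth model, written in Python under our own assumptions: likes arrive one at a time; with probability `p_new` a like goes to a newly arrived track, and otherwise it goes to an existing track with probability proportional to the likes that track already has (implemented by copying a uniformly random earlier like). The function name and parameter values are illustrative and not fitted to any data.

```python
import random


def simulate_cumulative_advantage(n_likes, p_new=0.1, seed=None):
    """Hand out n_likes likes one at a time; return the like count of every track."""
    rng = random.Random(seed)
    likes = [1]      # one track that already has one like
    history = [0]    # which track received each like so far
    for _ in range(n_likes - 1):
        if rng.random() < p_new:
            # A new track appears and receives its first like.
            likes.append(1)
            history.append(len(likes) - 1)
        else:
            # Copy a uniformly random past like: a track is picked with
            # probability proportional to the likes it already has.
            track = rng.choice(history)
            likes[track] += 1
            history.append(track)
    return likes


likes = simulate_cumulative_advantage(20_000, p_new=0.1, seed=1)
ranked = sorted(likes, reverse=True)
top = ranked[: max(1, len(ranked) // 5)]  # the 20 per cent most-liked tracks
print(len(likes), "tracks; top 20% share of likes:", round(sum(top) / sum(ranked), 2))
```

Although tracks differ only in when they arrive and in luck, the resulting like counts come out heavily skewed, with a small group of tracks collecting most of the likes.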
We used data from the ‘Hype Machine’, an online music platform where users can listen to the latest music and can choose to ‘like’ (or ‘love’, in Hype Machine’s terminology) specific tracks. A distinctive feature of the Hype Machine is that users can read directly what other people have been blogging about the music they are interested in, which points them to the relevant sites in the enormous online conversation about music taking place on blogs. The site was launched in 2005 and has grown to over one million users.
The Hype Machine aims to make it easier for people to discover new music, and therefore has several features that help lesser-known tracks get found. The homepage, for example, lists tracks by the date of their most recent blog posts, irrespective of the number of likes a track has received. When searching for tracks with a particular tag, users can choose to display only tracks with fewer than 25 likes. Features like these promote the visibility of less popular tracks and counteract the cumulative advantage effect mentioned earlier, which should translate into a more equal popularity distribution.
As the Hype Machine is designed to make it easier to discover new music, we may expect its typical user to be interested in discovering new music and to look out for tracks that have not (yet) been ‘liked’ much. Such a user is presumably also less susceptible to peer pressure than the average music consumer. Again, this works against the cumulative advantage effect and points towards a more equal distribution of popularity.
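A toy extension of the cumulative advantage idea shows why discovery features and less herd-prone users should flatten the distribution. In the Python sketch below, likes go either to a new track or to an existing one in proportion to its current likes, except that a fraction `p_discover` of likes to existing tracks ignores popularity and picks a track uniformly at random, loosely mimicking a recency-sorted homepage or a ‘fewer than 25 likes’ filter. Uniform discovery, the function names and all parameter values are our assumptions; this is not a model of how the Hype Machine actually ranks tracks.

```python
import random


def simulate_with_discovery(n_likes, p_new=0.1, p_discover=0.0, seed=None):
    """Return like counts per track; a share p_discover of likes is popularity-blind."""
    rng = random.Random(seed)
    likes = [1]      # one track that already has one like
    history = [0]    # which track received each like so far
    for _ in range(n_likes - 1):
        if rng.random() < p_new:
            likes.append(1)                         # a new track enters with its first like
            track = len(likes) - 1
        else:
            if rng.random() < p_discover:
                track = rng.randrange(len(likes))   # discovery: every track equally likely
            else:
                track = rng.choice(history)         # herd: proportional to current likes
            likes[track] += 1
        history.append(track)
    return likes


def top20_share(likes):
    """Share of all likes held by the 20 per cent most-liked tracks."""
    ranked = sorted(likes, reverse=True)
    top = ranked[: max(1, len(ranked) // 5)]
    return sum(top) / sum(ranked)


for p in (0.0, 0.5, 1.0):
    likes = simulate_with_discovery(20_000, p_discover=p, seed=1)
    print(f"p_discover={p}: {len(likes)} tracks, top 20% share = {top20_share(likes):.2f}")
```

In this toy model, raising `p_discover` should lower the top-20% share, which is the more equal outcome one would expect on a discovery-oriented platform.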
We extracted data on all 1 248 tracks on the Hype Machine that were tagged as ‘Witch House’, a dark ambient house genre that emerged around 2007. See this blogpost, or this other blogpost, to get a sense of the music. Witch House received enough attention from the music scene to be called a (sub)genre, but never became mainstream. As a result, we can expect the influence of mainstream channels (e.g. radio, TV) on our data to be smaller than for more popular genres. This makes our data a good case study for testing whether inequality in popularity remains among creative goods that are shared online and subject to relatively little influence from mainstream distribution channels. We now look at the popularity distribution of our Witch House tracks. What do we find?
Figure 3: The popularity distribution of our 1 248 Witch House tracks. As in Figures 1 and 2, the x-axis shows popularity (the number of likes); the y-axis shows the percentage of tracks with that popularity. The x-axis is on a log scale, as is common for this kind of distribution because it makes the shape easier to see; on a linear scale, the curve would be hard to distinguish from an L-shape lying almost on the axes. The distribution is extremely unequal: the 20 per cent most popular tracks account for 84 per cent of the total likes, close to the 80/20 split known as the ‘Pareto principle’.
This popularity distribution is extremely unequal: the 20 per cent most popular tracks account for 84 per cent of the total likes of the genre. This runs counter to the expectation of more equal popularity distributions for online creative goods, especially since we are looking at a very niche genre that received little attention from the large media channels. Instead, the finding suggests that some other process, such as the cumulative advantage effect, is creating the inequality. In fact, recall that our data comes from a music platform that promotes the visibility of less popular tracks and whose users are potentially less influenced by the opinions of others: two forces that work against the cumulative advantage effect. On other platforms, we would therefore expect the inequalities to be even more extreme.
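One simple way to build a histogram like Figure 3 on a log x-axis is to use bins of equal width in log10, for example one bin per order of magnitude of likes. A minimal Python sketch; the helper name `decade_shares`, the decade-wide bins and the toy input are our assumptions, not necessarily how Figure 3 was drawn. Tracks with zero likes cannot be placed on a log axis (log 0 is undefined), so the sketch leaves them out.

```python
import math
from collections import Counter

# log10 turns multiplying by 10 into adding 1, so 10, 100 and 1000 become evenly spaced.
for n in (10, 100, 1000):
    print(n, "->", math.log10(n))


def decade_shares(likes):
    """Percentage of tracks with 1-9, 10-99, 100-999, ... likes (log-scale bins)."""
    positive = [n for n in likes if n > 0]  # zero-like tracks are skipped (assumption)
    # For a positive integer, (number of digits - 1) equals floor(log10(n)),
    # computed here without floating-point rounding.
    decades = Counter(len(str(n)) - 1 for n in positive)
    return {
        f"{10**d}-{10**(d + 1) - 1}": 100 * count / len(positive)
        for d, count in sorted(decades.items())
    }


print(decade_shares([0, 1, 5, 12, 40, 150, 2000]))
```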
In sum, we find reason to be sceptical about whether innovations in the creative industries will create more equal opportunities for artists, as it may well be our own social nature, driving a system-wide effect, that creates the inequality.
See for example Hartley’s Key Concepts in Creative Industries, Cox et al.’s ‘The concentration of commercial success in popular music: An analysis of the distribution of gold records’, Davies’ ‘The individual success of musicians, like that of physicists, follows a stretched exponential distribution’, or De Vany’s Hollywood Economics: How Extreme Uncertainty Shapes the Film Industry.
See de Solla Price (1976): ‘A general theory of bibliometric and other cumulative advantage processes’.
 See for example Watts (2001) ‘A simple model of global cascades on random networks’.
The Hype Machine tracks approximately 800 music blogs and lists links to the relevant blog posts below each track on its website.
Data was extracted from the Hype Machine with a self-written scraper, with the permission of its founder, Anthony Volodkin.
Track tags are assigned by users of the well-known streaming service Last.fm, where every user can create their own tags and apply them to any track they like. The Hype Machine then lists the ten most popular tags for each track. To identify the ‘Witch House’ tracks, we selected all tracks for which ‘Witch House’ was among the first five of these tags on the Hype Machine.
This is important for the validity of our results. We want to see whether the inequalities in popularity could also be caused by something other than the power structure of the industry. In other words: if there are few influential channels (e.g. radio, TV) distributing the tracks in this genre, do we still see extreme inequality?
Taking a log scale means that all the numbers are transformed by the logarithm function (here in base 10), i.e. x becomes log x. For example, 10 becomes log 10 = 1, 100 becomes log 100 = 2, and 1000 becomes log 1000 = 3. As these examples show, the log transformation shrinks the distance between numbers of different magnitudes: the difference between 10 and 1000 on a log scale is only 2. Because of this property, extremely unequal distributions (such as the power law) are easier to visualise on log scales. In Figure 3, we applied a log scale only to the x-axis, not to the y-axis.
Here we define the genre as the 1 248 Witch House tracks in our data, and the total likes as all the likes these 1 248 tracks had received on the Hype Machine up to 01/01/2015.
Another intuitive explanation for the inequalities is that some tracks are simply ‘better’ than others. To test this ‘objective beauty’ hypothesis against the cumulative advantage hypothesis, D.J. Watts and his colleagues ran experiments with parallel worlds of two kinds: in one, participants were shown the choices of other participants; in the other, they had no such information. Inequalities turned out to be much more extreme in the ‘social influence’ world. What is more, some tracks that were very unpopular in the ‘no influence’ world became number-one hits in the social influence world. These results clearly favour the cumulative advantage hypothesis, and Watts and his colleagues therefore explicitly question whether products of cultural markets (and, they argue, of other markets too) can be judged ‘good’ or ‘bad’, whether objectively or in the democratic sense of a competitive market. Read more here.
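The track selection described in the notes, keeping tracks whose ‘Witch House’ tag ranks among the first five tags the Hype Machine lists, can be sketched as a short filter. A Python sketch, assuming each scraped track is stored as a dictionary with a hypothetical `tags` field holding its (up to ten) tags in order of popularity; the field names, the helper `select_by_tag` and the case-insensitive match are assumptions, not a description of our actual scraper.

```python
def select_by_tag(tracks, tag="Witch House", top_n=5):
    """Keep tracks whose tag list has `tag` among its first `top_n` entries."""
    wanted = tag.casefold()  # compare case-insensitively, e.g. 'witch house'
    return [
        track for track in tracks
        if wanted in (t.casefold() for t in track["tags"][:top_n])
    ]


# Hypothetical example records.
tracks = [
    {"title": "Track A", "tags": ["witch house", "dark", "electronic"]},
    {"title": "Track B", "tags": ["electronic", "ambient", "drone", "dark", "lo-fi", "witch house"]},
    {"title": "Track C", "tags": ["Witch House", "synth"]},
]
print([t["title"] for t in select_by_tag(tracks)])  # ['Track A', 'Track C']
```

Track B is left out because its ‘witch house’ tag sits in sixth place, outside the top five.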