Tracking startups using domain name registration data
We believe that startups are a vital force for innovation. As we have written elsewhere, startups are not only a crucial mechanism for bringing new ideas to life in their own right, but they are also an important means of exerting pressure on established firms.
Knowing where startups are being formed, and in what sector, is very useful for researchers, policymakers and businesses alike.
For instance, there is evidence that technology clusters - such as East London Tech City, also called ‘Silicon Roundabout’ - create a synergistic environment which is beneficial for startups. However, since opinion is mixed on whether national or local government can create such clusters from scratch, a better approach may be to identify clusters that are already forming and to stimulate these with additional support.
Similarly, researchers are interested in understanding the effectiveness of policies to promote entrepreneurship in many countries around the globe. Understanding which policies work and which ones don’t helps to shape innovation policy going forward.
However, tracking startups can be very difficult.
We often don’t know when they come to be, where they are based or what they do. Especially in their early days, startups are by definition very small entities and can leave little footprint. They often begin in garages or dorm rooms and may exist for months or years before they are formally registered or hire their first employee.
This is a pretty fundamental problem in entrepreneurship policy and research. Without direct measures of how many startups are being formed in different parts of the world, what they’re working on and how they’re doing, evaluating policies in support of startups is difficult.
Company registration data is one obvious source, though it often lags activity by many months. In addition, company descriptions based on SIC codes are - as we’ve previously lamented - not really fit for purpose. Comparison between countries can also be difficult: many countries simply lack such registries, or do not make them openly available to researchers and policymakers.
For this reason, Nesta is interested in alternative metrics - including job adverts, meetups, social media chatter and code commits - which may help to track the formation of firms quickly and accurately while they are still small and hard to detect.
One potentially useful alternative metric, which has not been much explored to date, is that of domain name registration.
Increasingly, one of the first common acts of innovative, young companies is to register a domain name and launch a website. Given the simplicity and low cost, even startups that wish to stay in ‘stealth mode’ for a period will often still reserve a new domain name for future use.
What makes such analysis particularly attractive for entrepreneurship research is the fact that domain name registration regulations are universal, making it possible to detect startups across the globe and over time using a single centralised database and consistent methodology. This may be especially useful for understanding entrepreneurship in the developing world, where other registration data can be hard to access.
The main challenge with domain name analysis is one of filtering.
Although most new companies have websites, most websites do not involve companies. Thus one must develop methods to filter out domain names that are unrelated to startups.
One approach is to use machine learning and text-analysis techniques, with keywords helping to determine whether a website belongs to a relevant company or to something irrelevant, such as a personal blog.
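To make the filtering idea concrete, here is a minimal sketch in Python. It is not the method used in the linked paper: it swaps the machine learning step for a simple keyword count, and the term lists, the threshold, the function names and the example domains are all hypothetical, chosen purely for illustration.

```python
import re
from collections import Counter

# Hypothetical keyword lists, for illustration only. A real study would
# derive its terms (or a trained classifier) from labelled websites.
BUSINESS_TERMS = {"ltd", "limited", "customers", "clients", "products",
                  "services", "pricing", "careers", "investors", "team"}
PERSONAL_TERMS = {"blog", "diary", "hobby", "family", "holiday", "recipes"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def business_score(text):
    """Return the number of business-term hits minus personal-term hits."""
    counts = Counter(tokenize(text))
    business_hits = sum(counts[term] for term in BUSINESS_TERMS)
    personal_hits = sum(counts[term] for term in PERSONAL_TERMS)
    return business_hits - personal_hits


def looks_like_company(text, threshold=2):
    """Flag a page as company-related when its score reaches the threshold."""
    return business_score(text) >= threshold


# Made-up page text keyed by made-up domain names.
pages = {
    "example-biotech.co.uk": "Our team develops products and services "
                             "for clients. Careers. Investors.",
    "example-diary.co.uk": "My family holiday blog, with recipes and "
                           "hobby notes.",
}

for domain, text in pages.items():
    print(domain, looks_like_company(text))
```

A rule this crude would misclassify many real sites, which is why a trained classifier validated against other datasets is the more plausible route in practice.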
To test whether this approach could work, it was applied to measure new firm formation in two UK cities, Oxford and Cambridge, and then compared with alternative datasets - as described in the linked paper. In our view, it shows how domain name registration data could be used to count new firm formation in fine detail, both over time and at the micro-geographic level.
This work highlights the promise of using domain name registration data to measure entrepreneurial activity across many regions of the world, especially when systematic business registration data are unavailable.
Image: Visualization of the Internet from the Opte Project (2005), licensed under Creative Commons CC BY 2.5. The image represents less than 30 per cent of the Class C networks reachable by the data collection program in early 2005.