Big Data Resources
NESTA hosted an event, The Power and Possibilities of Big Data, on Wednesday 17 November 2010.
This page holds a collection of examples, links and videos that relate to the topics discussed at the event. We are looking for your ideas and suggestions. If you have a good article, video or case study that should be listed here, please let us know by email, or on Twitter with the hashtag #nestahottopics.
Hal Varian, Google’s Chief Economist, has said:
“The sexy job in the next ten years will be statisticians… The ability to take data—to be able to understand it, to process it, to extract value from it, to visualize it, to communicate it”
Access to large data sets is no longer the preserve of insurance companies and giant retailers. With cheap technology that makes it easier than ever to capture and store this data, a wide range of organisations can now tap into the power of ‘big data’.
This event will explore this data as a tool for innovation, and the technical and social challenges involved. We will discuss examples of companies that are collecting, processing and interpreting data in order to change the way they operate. How is the proliferation of data being used as a platform for innovative ideas, and how can you avoid being paralysed by data?
Using data in this way is not new – insurance companies have analysed risk factors to determine premiums for decades. Capital One was founded on the concept of using detailed customer data to create a range of customised offers for credit cards, and analysing the results of each test. Wal-Mart has collected huge volumes of customer data since the 1980s – by 2004, the New York Times reckoned its databases contained more data than the entire internet (460 Terabytes).
There are three key changes that are bringing the issue of data onto many more agendas. Firstly, advances in data storage, processing power and cloud services continue to make large-scale data analysis more and more accessible. Secondly, it is now possible to analyse unstructured data, such as natural-language text or photographs, rather than only structured, coded information. Thirdly, there are many more opportunities to capture data, from sensors in phones to RFID tags in products, alongside a greater social acceptance of contributing manually entered data to social services.
A good summary of some of the issues was published in the Economist in February 2010 (subscription required), and there is also a recent report on Smart Systems. The McKinsey Quarterly featured Big Data as one of its 10 tech-enabled business trends to watch.
Topics for discussion
Innovation from data
Once you have figured out how to capture and store your data, you need to extract meaning in a way that allows you to act upon it. The following examples give some ideas of what you can do, and how you can address the challenges that arise.
Find correlations and trends
- Wal-Mart discovered in 2004 that, along with flashlights, batteries and other emergency supplies, Pop-Tart sales increased before a predicted hurricane.
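As a rough illustration of what finding a correlation means in practice, the sketch below computes a Pearson correlation coefficient between a weekly hurricane-warning flag and weekly unit sales. The figures are made up for illustration and the names `pearson`, `hurricane_warning` and `snack_sales` are hypothetical; this is not Wal-Mart's method, only a minimal example of the general idea.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations, and the two sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Made-up illustrative figures, not real sales data.
hurricane_warning = [0, 0, 1, 0, 1, 0, 0, 1]          # 1 = warning issued that week
snack_sales = [100, 95, 180, 105, 170, 98, 102, 175]  # units sold that week

r = pearson(hurricane_warning, snack_sales)
print(f"correlation between warnings and sales: {r:.2f}")
```

A value close to 1 suggests the two series rise together, but a strong correlation on its own does not explain why, so a finding like this would normally be checked against further data before acting on it.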
Customise marketing to your customers
- Capital One’s history is one of test marketing, and refining products based on customer responses.
- Tesco and Dunnhumby put together the Clubcard. Whilst it provides a way to collect data, the value comes from acting upon this by providing customised offers, and recognising more complex patterns of purchasing.
Change your business model
- Wal-Mart’s Retail Link system, which carries shelf inventory and real-time sales data, means it can pass stock management to suppliers in some cases.
- Data marketplaces such as Microsoft’s Project Dallas and start-ups InfoChimps and Pachube aim to make it easier to find and use structured datasets, by acting as brokers.
Sifting large amounts of data can make it possible to identify key warning signals that help to predict what will happen next:
- Cablecom identified that the decision to leave a network was made well before the end of a contract – around month 9. This allowed them to offer special deals and incentives to stay at the right time.
- Sickness in premature babies can be identified before any visible symptoms appear by monitoring subtle changes in seven kinds of real-time data, such as respiration, heart rate and blood pressure.
- Li & Fung, the large Chinese supply-chain operator, got advance warning of the economic crisis, and of the recovery, by looking at customer ordering patterns.
- FlightCaster uses 10 years of data on flights to predict whether a flight will be delayed before the airline confirms it (U.S. only).
- By monitoring manufacturing machinery for changes in heat or vibration, breakdowns can be anticipated before they impact production.
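The machinery example above can be sketched as a simple rolling check: compare each new reading with the mean and standard deviation of the readings just before it, and flag any reading that falls far outside that range. This is a minimal sketch under stated assumptions, not how any of the organisations above do it; the function name `flag_anomalies`, the window size, the threshold and the vibration figures are all hypothetical.

```python
from statistics import mean, stdev


def flag_anomalies(readings, window=5, threshold=3.0):
    """Return indices of readings that lie more than `threshold` standard
    deviations from the mean of the `window` readings before them."""
    flagged = []
    for i in range(window, len(readings)):
        baseline = readings[i - window:i]
        mu = mean(baseline)
        sigma = stdev(baseline)
        if sigma == 0:
            # Perfectly flat baseline: treat any change at all as unusual.
            if readings[i] != mu:
                flagged.append(i)
        elif abs(readings[i] - mu) > threshold * sigma:
            flagged.append(i)
    return flagged


# Made-up vibration readings with one obvious spike.
vibration = [1.0, 1.1, 0.9, 1.0, 1.05, 1.0, 0.95, 2.4, 1.0, 1.1]
print(flag_anomalies(vibration))
```

Real monitoring systems usually combine several signals and tune the window and threshold to the equipment, but the principle of comparing new readings with a recent baseline is the same.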
Improve your products
- Google is the master of this, using feedback from search behaviours to create and refine its translation and spell check tools, as well as creating many other products, such as Google News.
- New York City has run its 311 information service since 2003 to connect callers with government information and services. The service has taken over 100 million calls to date, and the city has used call data to pinpoint the origin of a strange maple syrup smell experienced in Manhattan on certain days.
- Open innovation projects often involve making data widely available, in order for external partners to work on improvements. Both the Netflix prize, to improve their movie recommendation algorithm, and the Goldcorp Challenge, looking for the best places for gold mining, made large datasets available.
- OpenStreetMap is an open source map that has gathered data automatically (from courier routes) and manually (asking contributors to use satellite images to create maps), and also allows new services to be built with its data. OpenStreetMap mobilised its contributors to create a detailed map of Port au Prince in Haiti in the days after the earthquake.
What are the technical challenges?
- Accuracy – ensuring that the data you store is of good quality, without duplicates, misspellings and other errors.
- Linking different types of data – to amplify the power of your data, you need to connect different sources together, which can be very challenging to do.
- Visualisation – creating meaningful views of the data that allow you to make decisions.
- Cloud-based services and open source software such as Hadoop aim to make these technical challenges manageable for smaller companies.
- Scraperwiki aims to create structured data sets out of public documents, and runs Hacks and Hackers days for journalists and programmers to experiment with what can be achieved with this data.
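To make the accuracy challenge above more concrete, the sketch below looks for likely duplicate records by normalising names and comparing them with `difflib.SequenceMatcher` from the Python standard library. The helper names `normalise` and `likely_duplicates`, the 0.85 similarity cut-off and the sample names are assumptions for illustration only; real record linkage normally uses more fields than a name.

```python
from difflib import SequenceMatcher


def normalise(name):
    """Lower-case a name and collapse repeated whitespace."""
    return " ".join(name.lower().split())


def likely_duplicates(names, min_ratio=0.85):
    """Return pairs of names whose normalised forms are at least
    `min_ratio` similar according to SequenceMatcher."""
    cleaned = [normalise(n) for n in names]
    pairs = []
    for i in range(len(cleaned)):
        for j in range(i + 1, len(cleaned)):
            ratio = SequenceMatcher(None, cleaned[i], cleaned[j]).ratio()
            if ratio >= min_ratio:
                pairs.append((names[i], names[j]))
    return pairs


# Made-up customer names containing a formatting variant and a misspelling.
customers = ["Jane Smith", "jane  smith", "Jane Smyth", "Robert Jones"]
for a, b in likely_duplicates(customers):
    print(f"possible duplicate: {a!r} / {b!r}")
```

Comparing every pair is only practical for small lists. The flagged pairs are candidates for review rather than confirmed duplicates, since two different people can have very similar names.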
What are the social challenges?
- Privacy – can individuals be identified from their data, and have they consented to storage and re-use?
- Ownership – who owns data collected for one purpose that is then used for a different purpose?
- Contribution – how to encourage users to contribute and maintain up-to-date information.
NESTA’s work on big data
- NESTA Investments has interests in several data companies. Acunu is building a new type of storage for managing Big Data problems, based on work on dictionary algorithms and using today's storage hierarchy. They are entering public beta in February 2011. Gnodal has developed technology that gives an order of magnitude improvement in data centre performance and power efficiency.
- The Reboot Britain programme of work uses data in a variety of different ways. Safeguarding 2.0 with FutureGov, Headshift, ThinkPublic and the Local Government Information Unit (LGIU) looked at ways to use better data visualisation and connections to improve social work. It uses online co-production of self-development plans with families in crisis, to involve citizens in the service.
- Two other projects within Reboot Britain also rely on data. Buddi pairs miniaturised GPS tagging with a successful rehabilitation programme for repeat young offenders, to support the rehabilitation in real time. Data will feed into mentoring and drug rehab programmes to help keep people on track. Another project, Buddy the social radio, is piloting a radio that connects mental health patients to their professional and social networks to help manage and monitor their mood. Built into the Buddy system is a software platform that collects and presents the data back to health professionals, and allows users and their various communities of carers to view, analyse, comment on and share their mood data.
- Make it Local is a programme of work to show how local authorities can work with digital agencies to unlock their data. The projects are due to launch in January 2011, and you can follow the work in progress on the project blog.