Talking data like a pro: A plain English guide to data analytics
Like any area of tech (or any specialised discipline, for that matter), data analytics is rife with jargon. This can make the topic difficult, even intimidating, for non-specialists to approach. At our City Data Analytics event on May 17, we brought together people with deep knowledge of data science, as well as those interested in applying it to real-world problems for the first time. To support richer discussion and encourage collaboration, we’ve developed this short 'jargon buster' on 15 essential terms and concepts.
First, a bit about the event itself:
- It was an opportunity to rub shoulders with data scientists, city chief execs and many local government innovators working on trailblazing data projects across the UK.
- We held workshops with seasoned experts from ASI Data Science, Behavioural Insights Team, and Nesta on the tools of the trade, evaluation, and information sharing, to help people design better data-informed projects from the ground up.
Data Analytics Jargon Buster:
- API: stands for Application Programming Interface, a set of rules that lets two software programmes communicate and exchange data with each other.
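To make this concrete, here is a minimal sketch of how a programme might talk to a (hypothetical) open-data API. The URL and field names below are invented for illustration; a real API would publish its own:

```python
import json
import urllib.request

# Build a request to a hypothetical open-data endpoint (illustrative URL only).
request = urllib.request.Request(
    "https://data.example.gov.uk/api/air-quality?city=london",
    headers={"Accept": "application/json"},  # ask the API to reply with JSON
)

# A real call would be: response = urllib.request.urlopen(request)
# Here we simply parse a sample JSON payload such an API might return.
sample_response = '{"city": "london", "no2_ugm3": 38.2}'
data = json.loads(sample_response)
print(data["no2_ugm3"])
```

The point is that the two programmes never share screens or files; they exchange structured data over an agreed interface.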
- Big Data: According to the UK Cabinet Office, refers to both "large volumes of data with high levels of complexity" and the "more advanced techniques and technologies required to gain meaningful information and insights in real time".
- Cognitive Computing: refers to systems that learn at scale (i.e. have the ability to process very large volumes of all types of data), in real time, and interact with humans naturally.
- Data Analytics: the extraction of insights and meaning from raw data using specialised tools and techniques.
- Data Analyst vs Data Scientist: In general, a data analyst will help you query, summarise, and process data, and a data scientist will apply analytic tools and techniques to solve specific problems.
- Data Lake: a shared data environment that provides long‑term storage and management of all types of data. Compared to a highly structured data warehouse, a data lake supports both structured and unstructured data and allows easy access to, and analysis of, raw data.
- Data Visualisation: the art of communicating and making sense of data using images.
- ETL: refers to the process of Extracting, Transforming, and Loading files from siloed applications into an index or data warehouse.
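The three ETL steps can be sketched in a few lines of Python. The sample data is invented, and the in-memory SQLite table stands in for a real warehouse:

```python
import csv
import io
import sqlite3

# Extract: read raw CSV exported from a siloed application (sample data).
raw = "name,amount\nAlice, 120 \nBob,80\n"
rows = list(csv.DictReader(io.StringIO(raw)))

# Transform: trim stray whitespace and convert amounts to numbers.
cleaned = [(r["name"].strip(), float(r["amount"])) for r in rows]

# Load: insert the cleaned rows into a warehouse table (in-memory here).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE payments (name TEXT, amount REAL)")
db.executemany("INSERT INTO payments VALUES (?, ?)", cleaned)
total = db.execute("SELECT SUM(amount) FROM payments").fetchone()[0]
print(total)  # 200.0
```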
- Machine Learning: a branch of Artificial Intelligence that enables systems to learn and improve from experience, without being explicitly programmed. Great at spotting patterns and generalising to other cases based on the data inputs and outputs. (Here’s a short guide from Harvard on how Machine Learning can work for local government.)
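To illustrate "learning without being explicitly programmed", here is a toy nearest-neighbour classifier in plain Python. It is never told the rule separating "low" from "high"; it generalises purely from the labelled examples (which are made up for this sketch):

```python
# Labelled training examples: (point, label). The system learns from these.
examples = [
    ((1.0, 1.0), "low"),
    ((2.0, 1.5), "low"),
    ((8.0, 9.0), "high"),
    ((9.0, 8.5), "high"),
]

def predict(point):
    # Classify a new point by copying the label of its closest known example.
    def distance(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5
    nearest = min(examples, key=lambda ex: distance(point, ex[0]))
    return nearest[1]

print(predict((1.5, 1.2)))  # low
print(predict((8.5, 9.2)))  # high
```

Real machine learning systems use far more sophisticated models, but the principle is the same: patterns come from the data, not from hand-written rules.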
- Random Forest: a Machine Learning technique that combines the votes of many simple decision trees, each trained on a random sample of the data. Robust on large, noisy datasets, it is frequently used to identify cases of credit card fraud. Good for ‘finding needles in haystacks’ kinds of problems.
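A real random forest grows full decision trees, but the core idea, many simple models trained on random resamples, each getting a vote, fits in a short sketch. The transaction data below is invented, and depth-one "stumps" stand in for full trees:

```python
import random
from collections import Counter

random.seed(0)  # make the sketch reproducible

# Transaction amounts labelled fraud / ok (invented sample data).
data = [(50, "ok"), (60, "ok"), (70, "ok"), (80, "ok"),
        (900, "fraud"), (950, "fraud"), (80, "ok"), (1000, "fraud")]

def train_stump(sample):
    # A "tree" of depth one: pick the threshold that best splits the sample.
    best = None
    for threshold, _ in sample:
        errors = sum(1 for x, label in sample
                     if (label == "fraud") != (x >= threshold))
        if best is None or errors < best[0]:
            best = (errors, threshold)
    return best[1]

# Grow a small forest: each stump sees a random bootstrap resample.
forest = [train_stump(random.choices(data, k=len(data))) for _ in range(25)]

def predict(amount):
    # Majority vote across the forest.
    votes = Counter("fraud" if amount >= t else "ok" for t in forest)
    return votes.most_common(1)[0][0]

print(predict(1100), predict(45))  # fraud ok
```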
- Small Data: refers to data which is small enough to be processed inside a single computer, using simple tools such as spreadsheet applications.
- Spatial Analysis: a technique to understand the relationship between variables linked to a location and patterns in a space. Spatial analysis underpins Geographic Information Systems (GIS), such as those used for satellite navigation in cars or maps on smartphones.
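One of the most common spatial calculations is the great-circle distance between two latitude/longitude points, which underpins "nearest service" and routing queries. A sketch using the standard haversine formula (coordinates are approximate city centres):

```python
from math import asin, cos, radians, sin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    # Great-circle distance between two lat/lon points, in kilometres,
    # assuming a spherical Earth of radius 6371 km.
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371 * asin(sqrt(a))

# Approximate distance from central London to central Paris.
print(round(haversine_km(51.5074, -0.1278, 48.8566, 2.3522)), "km")
```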
- Structured Data: data which is in a traditional row-column tabular format.
- Unstructured Data: data that needs to be cleaned and processed before analysis, or whose structure is not tabular. Text is an example of the first kind; a social network of the second.
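A common first step with unstructured text is to clean it and count things, turning it into structured data you can put in a table. A small sketch, using an invented snippet of resident feedback:

```python
import re
from collections import Counter

# Free text is unstructured; word counts give it a structured, tabular form.
feedback = "The bins were collected late. Late collection was reported twice."

# Clean: lowercase everything and strip punctuation before counting.
words = re.findall(r"[a-z]+", feedback.lower())
counts = Counter(words)

print(counts["late"])  # 2
```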
- Web Scraping: an alternative to using APIs, web scraping simulates human web browsing in order to extract data from a website’s pages.
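The extraction half of scraping can be sketched with Python’s built-in HTML parser. The page below is a made-up snippet; a real scraper would first download the page (for example with urllib) and should respect the site’s terms of use:

```python
from html.parser import HTMLParser

# A sample HTML page (invented; a real scraper would download this).
page = ('<html><body><a href="/reports/2017">Reports</a> '
        '<a href="/data">Data</a></body></html>')

class LinkExtractor(HTMLParser):
    """Collects the target of every link (<a href="...">) on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.links.extend(value for name, value in attrs if name == "href")

parser = LinkExtractor()
parser.feed(page)
print(parser.links)  # ['/reports/2017', '/data']
```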
Did we miss anything? What else would you add to this list? Join the discussion.