Using crowdsourcing & AI to generate new archaeological discoveries

There’s history beneath our feet—and finding and recording it is a constant challenge for archaeologists. However, in the age of Google Earth, it’s never been easier for anyone to spot the telltale remnants of Stone Age roundhouses and Roman forts from home. In effect, anyone can now be an archaeologist, and the collective impact of that change is what the Deep Time project sought to harness.

This project’s genesis came during the first Covid-19 lockdowns of 2020, when the social outreach archaeology organisation DigVentures started running “virtual digs”. Not only did these give 8,000 participants from 81 countries a badly needed way to socialise remotely, they also produced “real” archaeology, as some participants spotted previously unidentified sites in satellite imagery.

“I knew that if we could just harness that energy in a productive way we could generate new insights for the discipline,” says DigVentures projects director Brendon Wilkins. “There are huge challenges facing archaeology due to the construction of new homes and infrastructure, as well as climate change. Coastal erosion and flood events can damage archaeological sites, but human responses such as reforestation or new peat bogs can also have an impact. Entire sites can be taken away before being identified. So we wondered what would happen if we could combine our crowd with the efficiencies of AI.”

He reached out over Twitter to Iris Kramer—the founder and CEO of ArchAI, which uses machine learning to automatically detect new archaeological sites from laser scanning (LIDAR) and satellite data—to collaborate. Together, they drew up a plan to explore the potential for mixing collective intelligence and AI in archaeology, as well as to address what they saw as a systemic failure within the British planning permission system, where new construction can blast through historical context and ignore the chance to give people a deeper connection with their local landscape.

“Many market-based models of archaeology hugely undervalue participation, when it should be a huge asset,” says Wilkins. “The closest most people get to archaeology is by peering between the cracks of a building site hoarding, or seeing a news report long after construction has begun.” The Deep Time project would be the first to explore collective intelligence within this context of “participatory planning”, which encourages people to become more involved in and aware of the heritage impacts of new construction.

What we did

The Deep Time project was centred around the Historic Environment Records (HERs). These are spatial databases of archaeological sites across the UK, usually managed by local governments and consulted when planning new construction. However, they are often only a “best guess,” says Wilkins. “About 50% of the data has been improved in recent years, but some of it originates from a century ago and could be misleading. A point or polygon in a field might mark a Bronze Age artefact, or it could mean a whole settlement.”

The project had two aims:

  1. to explore how collective intelligence, working with AI, could improve both the quantity and quality of Historic Environment Record data
  2. to see if participants in such a project would feel a more profound connection with their local area’s geography and history after it ended

To test these aims, a six-stage “crowd-in-the-loop” model was devised:

Step one: DigVentures’ team cleans up the Historic Environment Record data
Using existing geographic information system (GIS) software, HER data covering the 220km² Brightwater area in County Durham was audited in order to create consistent metadata. This generated a baseline dataset of 3,577 known sites, split into 66 different types of monument across 13 time periods.

This was further broken down into a list of 12 different site types for the purposes of the project, representing the most common feature categories across the Brightwater area (2,441 of the 3,577 sites identified above): Agricultural, Bank, C20th, Ditch, Enclosure, Industrial, Military, Mound, Pit/Hollow, Railways, Routeway, and Settlement.
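For readers who want a sense of what this audit step involves in practice, the sketch below uses the open-source geopandas library to normalise a hypothetical HER extract into the project’s 12 site classes. The file name, column names and type-to-class mapping are illustrative assumptions, not the project’s actual schema.

```python
# Illustrative sketch only: audits a hypothetical HER extract with geopandas,
# normalising free-text monument types into the project's 12 site classes.
# The file name, column names and mapping below are assumptions.
import geopandas as gpd

SITE_CLASSES = {
    "ridge and furrow": "Agricultural",
    "field system": "Agricultural",
    "bank": "Bank",
    "air raid shelter": "C20th",
    "ditch": "Ditch",
    "enclosure": "Enclosure",
    "lime kiln": "Industrial",
    "pillbox": "Military",
    "barrow": "Mound",
    "quarry": "Pit/Hollow",
    "railway": "Railways",
    "trackway": "Routeway",
    "deserted medieval village": "Settlement",
}

her = gpd.read_file("brightwater_her.geojson")            # points and polygons
her["monument_type"] = her["monument_type"].str.strip().str.lower()
her["site_class"] = her["monument_type"].map(SITE_CLASSES)

print(f"{len(her)} records, {her['monument_type'].nunique()} monument types")
print(her["site_class"].value_counts(dropna=False))        # coverage of the 12 classes
```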

Step two: The data is layered with further data from other sources
Further datasets – for example, LIDAR data from the Environment Agency along with Ordnance Survey maps – were also collected for the same geographic area. These layers provided the imagery and metadata for the main experiment, the results of which would be compared with the dataset in step one to measure success.
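A minimal sketch of this layering step, assuming the LIDAR and map tiles are ordinary GeoTIFFs already in the British National Grid projection, might look like the following; the file names are placeholders rather than the project’s real data products.

```python
# Illustrative sketch: clips assumed LIDAR and Ordnance Survey raster tiles to the
# same study-area boundary so they can be served as aligned layers in a GIS.
# File names are placeholders; rasters are assumed to already be in EPSG:27700.
import geopandas as gpd
import rasterio
from rasterio.mask import mask

study_area = gpd.read_file("brightwater_boundary.geojson").to_crs("EPSG:27700")
geoms = study_area.geometry  # boundary polygon(s) of the 220km² area

for layer in ["ea_lidar_dtm_1m.tif", "os_basemap.tif"]:
    with rasterio.open(layer) as src:
        clipped, transform = mask(src, geoms, crop=True)
        profile = src.profile
        profile.update(height=clipped.shape[1], width=clipped.shape[2],
                       transform=transform)
    # Save each clipped layer so they all share the same extent
    with rasterio.open(f"clipped_{layer}", "w", **profile) as dst:
        dst.write(clipped)
```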

Step three: DigVentures trains participants to spot and label archaeological features in the composite dataset
DigVentures recruited 100 participants (called “Pastronauts”) via its social channels from a field of 970 applicants, and gave them a five-week online archaeology class. At the same time, they built what they called a “participatory GIS”—an in-browser app where users spent two weeks drawing and labelling polygons around possible archaeological sites in the imagery, while also talking to each other via a chat system. “You wouldn’t have a competition on-site to see who could dig the fastest, so we didn’t want to do that online either,” says Wilkins. “Many crowdsourcing endeavours are gamified for competition, but this was the opposite.”

Step four: DigVentures verifies the crowd labels, and ArchAI uses them to train a neural network
After four weeks of labelling and validating, the crowd data was exported and cleaned for obvious mistakes and that dataset was then used to train an AI to spot the same patterns that the crowd had seen. For this stage, only a subset of six specific types of feature – the ones most commonly labelled – were used for training: ridge & furrow fields, enclosures, mounds, pits, quarries, and deserted mediaeval villages. The trained AI was then run over the unlabelled Brightwater LIDAR dataset to identify entirely new features that the crowd hadn’t labelled.
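One common way to frame this kind of task is semantic segmentation: LIDAR-derived raster tiles as input, with target masks rasterised from the crowd’s cleaned polygons. The sketch below shows that framing in PyTorch with a deliberately tiny stand-in network and synthetic tiles; it is an illustrative sketch under those assumptions, not ArchAI’s actual model or data pipeline.

```python
# Illustrative sketch, not ArchAI's pipeline: trains a small semantic-segmentation
# network to predict the six feature classes from LIDAR tiles, using masks
# rasterised from the crowd's (cleaned) polygons. Tile size, dataset and model
# choice are all assumptions.
import numpy as np
import torch
from torch import nn
from torch.utils.data import DataLoader, Dataset

CLASSES = ["background", "ridge_and_furrow", "enclosure", "mound",
           "pit", "quarry", "deserted_medieval_village"]

class LidarTiles(Dataset):
    """Pairs of LIDAR-derived rasters (e.g. hillshade) and crowd-label masks."""
    def __init__(self, tiles, masks):
        self.tiles, self.masks = tiles, masks
    def __len__(self):
        return len(self.tiles)
    def __getitem__(self, i):
        x = torch.from_numpy(self.tiles[i]).unsqueeze(0).float()  # 1 x H x W
        y = torch.from_numpy(self.masks[i]).long()                # H x W class ids
        return x, y

# Synthetic placeholder data so the sketch runs end to end;
# real tiles and masks would be loaded from disk.
tiles = [np.random.rand(256, 256).astype(np.float32) for _ in range(8)]
masks = [np.random.randint(0, len(CLASSES), (256, 256)) for _ in range(8)]
loader = DataLoader(LidarTiles(tiles, masks), batch_size=4, shuffle=True)

# A real pipeline would use a U-Net-style encoder-decoder; a tiny
# fully-convolutional stand-in keeps the sketch short.
model = nn.Sequential(
    nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, len(CLASSES), 1),
)
optimiser = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for epoch in range(3):
    for x, y in loader:
        optimiser.zero_grad()
        loss = loss_fn(model(x), y)   # per-pixel classification loss
        loss.backward()
        optimiser.step()
    print(f"epoch {epoch}: loss {loss.item():.3f}")
```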

Step five: The crowd verifies the AI’s labels
The new sites were imported back to the participatory GIS as another map layer. Participants were given a further class on how to validate the quality of the AI’s data—in particular, spotting false positives—and then given the task of actually doing it.
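The project’s write-up doesn’t prescribe how individual verification judgements were combined, but a simple majority vote per detection is one plausible scheme. The sketch below is purely illustrative, with a made-up structure for the exported votes.

```python
# Illustrative aggregation of crowd verification votes on AI-detected features:
# a simple majority vote keeps a detection. The data structure is an assumption.
from collections import defaultdict

# (feature_id, participant_vote) pairs exported from the participatory GIS
votes = [
    ("feat_001", True), ("feat_001", True), ("feat_001", False),
    ("feat_002", False), ("feat_002", False), ("feat_002", True),
]

tally = defaultdict(list)
for feature_id, is_real in votes:
    tally[feature_id].append(is_real)

kept = {f for f, v in tally.items() if sum(v) > len(v) / 2}
print(f"kept {len(kept)} of {len(tally)} AI detections: {sorted(kept)}")
```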

Step six: The final, verified results are added back to the HER
The crucial step. The hope was that not only would the overall six-stage loop lead to better Historic Environment Record data, but the AI would also be further improved with more rounds of training on the crowd’s verified data from the previous step. This would create an automated site detection system that iteratively improved.
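In GIS terms, the merge itself might look something like the geopandas sketch below, in which verified features that don’t intersect any existing HER record are appended as new sites. File names, column names and the intersection rule are assumptions made for illustration.

```python
# Illustrative sketch of step six: verified crowd/AI features that do not
# intersect an existing HER record are treated as new sites and appended.
# File names and the overlap rule are assumptions.
import geopandas as gpd
import pandas as pd

her = gpd.read_file("brightwater_her.geojson")
verified = gpd.read_file("verified_features.geojson").to_crs(her.crs)

# Features overlapping an existing record refine it; the remainder are new sites.
joined = gpd.sjoin(verified, her[["geometry"]], how="left", predicate="intersects")
new_sites = joined[joined["index_right"].isna()].drop(columns="index_right")

updated_her = pd.concat([her, new_sites], ignore_index=True)
updated_her.to_file("brightwater_her_updated.geojson", driver="GeoJSON")
print(f"{len(new_sites)} new records added; HER now holds {len(updated_her)} features")
```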

What we learned

The crowd found a total of 3,670 features during step three of the model, of which only 1,309 were represented in the original Historic Environment Record dataset. “That’s a 60% uplift compared to before the project,” says Wilkins. The “accuracy” for each identification was measured by both the shape/size of its polygon and the quality of its metadata. They estimated that the HER’s accuracy was 88% overall (based on a random sample of five percent of labels), which meant that adding the crowd labels back into the original poorer-quality HER dataset actually increased its overall accuracy to 94%—“a huge win.”

However, their hopes for crowd-AI collaboration weren’t met to the same degree in the later stages of the six. “For most classes of sites it didn’t work at all,” says Kramer. The key issue was the “lopsided” crowd-generated dataset: although the crowd identified a wide range of site types, more than 90% of the labels were ridge and furrow fields (so-called because of how ancient ploughing has shaped farmed land). That meant the training data for most other types of site was fairly limited, and there were significant inconsistencies in the sizes and shapes of many features: circular burial mounds, for example, were sometimes outlined with polygons more than 100m across, when experts know they are rarely larger than 10-15m wide.
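A sanity check of this kind could be automated before training. The sketch below flags crowd polygons whose extent falls outside a plausible range for their class; the thresholds, file name and column names are chosen purely for illustration and are not values from the project.

```python
# Illustrative QC check for the polygon-size problem described above: flag crowd
# labels whose bounding extent is implausibly large for their class.
# Thresholds, file name and column names are assumptions.
import geopandas as gpd
import numpy as np

MAX_WIDTH_M = {"mound": 15, "pit": 30, "enclosure": 300}   # rough, illustrative

labels = gpd.read_file("crowd_labels.geojson").to_crs("EPSG:27700")  # metres

bounds = labels.bounds
labels["extent_m"] = np.maximum(bounds.maxx - bounds.minx, bounds.maxy - bounds.miny)
labels["flagged"] = [
    cls in MAX_WIDTH_M and extent > MAX_WIDTH_M[cls]
    for cls, extent in zip(labels["site_class"], labels["extent_m"])
]

print(labels.groupby("site_class")["flagged"].sum())   # flagged labels per class
```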

While the crowd was still “excellent” at weeding out many of the false positives when verifying AI data – such as recognising that a “burial mound” was really just a modern roundabout – there was still “a lot of margin of error”. The crowd data, by not adhering to the kind of consistent, uniform “rules” ArchAI had planned for, trained the AI to generate false positives. “That doesn’t suggest that we can’t solve that problem in the future, but it is still there,” says Kramer.

Conclusions

Despite the “shaky” AI results, both Wilkins and Kramer feel that the experiment delivered valuable insights. Kramer suggests adding more experts to the mix, to train both the “newbies” and the AI, and having stricter definitions for each type of feature in order to limit crowd-generated false positives. She also points out that this experiment only tested British sites, when other regions may be easier for AI to handle, such as places where people have been living for fewer generations, leaving a less complex and diverse range of site types in the landscape. There are also other ways that AI and the crowd can work together, such as an “AI change detection model” that automatically alerts the local community to the adverse effects of climate change, like cliff erosion, so that any archaeological damage can be assessed quickly.

Wilkins considers the overall project to be a collaborative success, even if the crowd-in-the-loop model didn’t gel with AI collaboration as hoped. Before the experiment, 75 percent of participants described themselves as “basically indifferent” to the landscape; afterwards, 75 percent said the opposite, with 32 percent reporting a “strong” connection to it. He’s also proud that their participants were “a group of people who looked like the local high street—we’ve seen in the last couple of years, like with the Edward Colston statue in Bristol, that not everybody has an equal lever on heritage narrative.”

“It’s a huge way forward, both for our community and for the challenges we see facing archaeology—and we may be able to bring AI back in down the line,” he says. “This model still created both quality and quantity, and a connection between people and the landscape they live in.”

“I came into this excited, and I still am—but I’m also more realistic,” says Kramer. “While we didn’t have the high accuracy we hoped for, the exciting outcome of this experiment has been these new ways of bringing diverse crowds into archaeology. We need to bring local people into planning processes, with or without AI, but there’s a place for both within the system.”

For more information about this experiment please contact [email protected] or [email protected].

Author

Ian Steadman