"They universally hate the process. It's the least favourite part of their job and they're grinding their teeth every time they have to do it. They don't like doing computer work. They want to be installing stuff."
This is what we heard when we spoke to heat pump installers about the admin involved in their work: dozens of emails to local electricity network companies, known as distribution network operators (DNOs), different templates for every region, responses that are difficult to understand and delays that stretch to months. If it’s winter and a boiler fails, households often can’t afford to wait; they just get a replacement boiler instead of a heat pump.
This kind of problem turned out to be a good fit for an agentic AI solution. But how do you know when that’s the case?
An agent is an LLM that can use tools to achieve a goal, rather than just generate text, images, or videos. It can open up documents, send or receive emails, query databases, or take a range of other possible actions until the task is done.
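The loop described above can be sketched in a few lines. This is a minimal illustration, not a real implementation: the "LLM" is a stand-in policy so the code runs without an API key, and the tool names (`lookup_reference`, `send_email`) are hypothetical.

```python
# Minimal sketch of an agent loop: a model repeatedly chooses a tool
# (or declares the task done) and each tool result is fed back in.

def lookup_reference(ref_id: str) -> str:
    """Hypothetical tool: fetch a record from a database."""
    return f"record for {ref_id}"

def send_email(to: str, body: str) -> str:
    """Hypothetical tool: send an email."""
    return f"sent to {to}"

TOOLS = {"lookup_reference": lookup_reference, "send_email": send_email}

def fake_llm(history: list) -> dict:
    """Stand-in for the model: picks the next action from the history.
    A real agent would call an LLM that returns structured output."""
    if not history:
        return {"tool": "lookup_reference", "args": {"ref_id": "DNO-42"}}
    if len(history) == 1:
        return {"tool": "send_email",
                "args": {"to": "installer@example.com", "body": history[-1]}}
    return {"tool": None, "answer": "application submitted"}

def run_agent(max_steps: int = 10) -> str:
    history = []
    for _ in range(max_steps):
        action = fake_llm(history)
        if action["tool"] is None:       # the model says the task is done
            return action["answer"]
        result = TOOLS[action["tool"]](**action["args"])
        history.append(result)           # tool result goes back to the model
    return "gave up"

print(run_agent())  # → "application submitted"
```

The key difference from a plain chatbot is the loop: the model keeps acting on tool results until it decides the goal is reached (or hits a step limit).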
At Nesta, we’ve been testing agentic AI and developed a framework of five questions to help you decide if building an agent makes sense. The first three are about whether your problem is right for an agent. The last two are about whether your organisation is ready.
Agents can use a wide range of tools and also produce a wide range of outcomes. To make sure an AI agent is working well, you need to check the final outcome, not just the steps it took to get there. Traditional models are usually easier to check because they produce a simple prediction or estimate. For agents, you must carefully plan how you will confirm that the agent's final output is correct.
Think about the problem you’re trying to solve - does it have a verifiable outcome? Can you check that “the correct action happened”?
Sometimes the outcome is straightforward. We built a ‘nutrition auditor agent’ whose job was to look at potentially incorrect nutrition information and determine if the data was suspicious. The outcome was a classification: ‘suspicious’ or ‘benign’. There were various combinations of tools it could use to get there, not one correct path, but the outcome is still checkable: did it correctly flag something as suspicious or not?
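Verifying a classification outcome like this can be as simple as scoring the agent's final labels against hand-checked ground truth. A small sketch, with made-up product names and labels standing in for real audit data:

```python
# Outcome verification for a classifier-style agent: however the agent
# got there, only the final label is compared against ground truth.

def evaluate(agent_labels: dict, ground_truth: dict) -> float:
    """Fraction of products where the agent's final call was correct."""
    correct = sum(agent_labels[p] == ground_truth[p] for p in ground_truth)
    return correct / len(ground_truth)

ground_truth = {"cereal_a": "suspicious", "yoghurt_b": "benign", "snack_c": "benign"}
agent_labels = {"cereal_a": "suspicious", "yoghurt_b": "benign", "snack_c": "suspicious"}

print(evaluate(agent_labels, ground_truth))  # 2 of 3 correct
```

Because the check ignores *which* tools the agent used, it tolerates many valid paths to the same answer, which is exactly what you want for agents.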
But the outcome isn’t always so easy to pin down. With the DNO heat pump applications, the end goal, a successful application to the DNO, sometimes involved weeks of back and forth between the homeowner, the installer and the DNO. We couldn’t easily verify the entire process in one go. So we broke it into steps - at each stage, there was a correct action the agent should take. We hand-labelled 111 real email threads with the right action at each stage, and used that as our test set. If your end-to-end outcome is hard to verify, break the process into steps and verify those instead.
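Step-level verification can be sketched as scoring the agent's chosen action at each stage of a thread against the hand-labelled correct action. The stage and action names below are illustrative, not the real labels from our test set:

```python
# Step-level evaluation: when the end-to-end outcome is hard to verify,
# label the correct action at each stage and score the agent per step.

labelled_thread = [
    {"stage": "new_enquiry",      "correct_action": "request_property_details"},
    {"stage": "details_received", "correct_action": "draft_dno_application"},
    {"stage": "dno_reply",        "correct_action": "forward_to_installer"},
]

def agent_decide(stage: str) -> str:
    """Stand-in for the agent's per-stage decision."""
    return {"new_enquiry": "request_property_details",
            "details_received": "draft_dno_application",
            "dno_reply": "ask_clarifying_question"}[stage]

def score_thread(thread: list) -> float:
    """Fraction of stages where the agent chose the labelled action."""
    hits = sum(agent_decide(step["stage"]) == step["correct_action"]
               for step in thread)
    return hits / len(thread)

print(score_thread(labelled_thread))  # agent got 2 of 3 steps right
```

Run across a labelled set of real threads, this gives you a per-step accuracy you can track as you iterate on the agent.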
Beyond understanding the outcome, you need to know what’s involved in getting there. Many tasks mix routine, repeatable work (process) with subjective assessment (judgement, or human expertise). An agent will attempt both, but generally isn’t as good at the judgement part.
While building the ‘nutrition auditor agent’, we shadowed Nesta’s in-house nutritionist while she manually audited products. During that session, we identified both process and judgement. The agent could do most of what the nutritionist did, including checking the names of products, looking up reference numbers, doing calculations and comparisons to known databases. But it sometimes got too focused on one piece of data without being able to take a step back. It would return an irrelevant product lookup, not realise it was irrelevant, and continue comparing as if it were relevant. A human would easily catch that.
Before building, map out your task step-by-step. Which parts are mechanical and repeatable? Which require human expertise and judgement? An agent can handle the first kind pretty well; the second is where you’ll need guardrails or human involvement.
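One lightweight way to do this mapping is to write the steps down as data and tag each one. The audit steps below are illustrative, loosely based on the nutrition example rather than our actual task breakdown:

```python
# Sketch of a step-by-step task map: steps tagged "process" go to the
# agent, steps tagged "judgement" are routed to a human.

AUDIT_STEPS = [
    ("check product name matches record", "process"),
    ("look up reference number",          "process"),
    ("recalculate nutrition values",      "process"),
    ("decide if a mismatch is plausible", "judgement"),
]

def route(steps):
    """Split steps into agent work and human-review work."""
    agent_tasks = [name for name, kind in steps if kind == "process"]
    human_tasks = [name for name, kind in steps if kind == "judgement"]
    return agent_tasks, human_tasks

agent_tasks, human_tasks = route(AUDIT_STEPS)
print(len(agent_tasks), len(human_tasks))  # 3 process steps, 1 judgement step
```

Writing the split down explicitly, before building anything, makes it much harder to accidentally hand judgement calls to the agent.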
In the above example, we were looking at emulating a nutritionist. It is not a good use of a nutritionist’s time to be auditing thousands of rows of product data. Realistically, it wouldn’t be done. This is a bottleneck. Not just a slow process, but one where the alternative is not doing it at all.
Keep an eye out for these kinds of bottlenecks. In the case of the DNO heat pump applications, we identified delays of weeks and months in email follow-ups, and there was no good alternative. If a household’s boiler fails in winter, they need heat urgently. If the process isn't fast, they will just get a boiler replacement instead. So fixing communications is a real bottleneck to the installation of more heat pumps.
It’s also worth asking, is anyone else solving this? Some problems are neglected - not because they’re unimportant, but because the people affected don’t have the resources to fix them. For us, this is part of what makes the work worthwhile. Ask yourself where you are uniquely positioned to act on a problem that others aren’t addressing.
It’s tempting to assume that putting a human in the loop solves your safety problem. If the agent does something wrong, a person will catch it, right?
During user testing of our DNO communications manager, we discovered this logic doesn’t hold up. We wanted a human to review the agent’s actions before they were executed, but the reviewers didn’t have the time, knowledge or interest to monitor the agent. We tested an early version of our prototype with several heat pump installers in December. It was the end of the year, so they were very busy and had little time or patience for another tool. During user testing, they furiously clicked "next, next, okay, what?" and barely read the output.
This was a necessary wake-up call. A human in the loop isn’t a guardrail if the human isn’t really looking. When designing these systems, you need to think about how to keep them on track even when actions get auto-approved - because they will. The goal is to remove obstacles so that human experts can focus on the high-value work they do best, rather than supervising the AI.
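One way to keep an agent on track when humans rubber-stamp everything is to enforce hard rules in code, checked before any action executes, regardless of what the reviewer clicked. A minimal sketch, with a hypothetical allow-list rule:

```python
# Sketch of a guardrail that holds even under auto-approval: hard rules
# run on every action before execution, independent of the human's click.

ALLOWED_RECIPIENT_DOMAINS = {"dno.example.com", "installer.example.com"}

def guardrail_check(action: dict) -> bool:
    """Block actions that break hard rules, even if a human approved them."""
    if action["type"] == "send_email":
        domain = action["to"].split("@")[-1]
        return domain in ALLOWED_RECIPIENT_DOMAINS
    return True

# A human hastily clicked "approve" on this, but the rule still blocks it.
approved = {"type": "send_email", "to": "someone@random.example.org"}
print(guardrail_check(approved))  # False
```

The rules themselves will depend on the domain (recipient allow-lists, spend limits, rate limits); the design point is that they sit outside both the agent and the distracted reviewer.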
All of the above is moot if there isn’t buy-in from the people who have the final say. And leaders might be cautious - agents equipped with tools are powerful, and data privacy, security and hallucinations all carry serious consequences if agents aren’t deployed thoughtfully. It’s worth looping in senior leadership early to discuss the risks and trade-offs.
But buy-in isn’t a ‘yes or no’ binary. It’s trust you build over time. Our external partner on the DNO communications manager was initially sceptical, going as far as to say the AI we were proposing just wasn’t a good fit for their platform. A lot of terminology from the agent world was unfamiliar, which made them wary; they thought it was going to be some kind of auto-complete.
We reframed agentic AI in terms of familiar software development concepts, which eased their hesitation. We built a working demo and connected it to their system early, so it felt like part of their platform rather than a separate thing. We put wrong outputs in front of their experts, let them tell us what needed to change, and tweaked the agent prompts accordingly. Each iteration made the tool more credible.
A few months later, they took over development and will soon deploy it to their customers.
To determine if a task is ready for agentic AI, first assess its technical suitability: is the output verifiable, is the process clearly defined, and is there a specific bottleneck to solve? By targeting routine, repeatable work rather than subjective judgement, you ensure the AI addresses real friction points rather than creating new ones.
Success then shifts to organisational readiness, which requires designing for actual human behaviour and securing stakeholder trust. When you solve these human and technical hurdles, you can successfully automate low-value administrative 'grind', freeing your experts to focus on the high-impact work they do best.
This framework won’t give you the answer, but it will force you to ask the right questions before you start building.