"They universally hate the process. It's the least favourite part of their job and they're grinding their teeth every time they have to do it. They don't like doing computer work. They want to be installing stuff."
This is what we heard when we spoke to heat pump installers about the admin involved in their work: dozens of emails to local electricity network companies, known as distribution network operators (DNOs), different templates for every region, responses that are difficult to understand and delays that stretch to months. If it’s winter and a boiler fails, households often can’t afford to wait; they just get a replacement boiler instead of a heat pump.
This kind of problem turned out to be a good fit for an agentic AI solution. But how do you know when that’s the case?
An agent is an LLM that can use tools to achieve a goal, rather than just generate text, images, or videos. It can open up documents, send or receive emails, query databases, or take a range of other possible actions until the task is done.
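The loop described above can be sketched in a few lines. This is a minimal illustration, not a real implementation: the "LLM" is a stand-in policy so the code runs without an API key, and the tool names (`lookup_reference`, `send_email`) are hypothetical.

```python
# Minimal sketch of an agent loop: a model repeatedly chooses a tool
# (or declares the task done) and each tool result is fed back in.

def lookup_reference(ref_id: str) -> str:
    """Hypothetical tool: fetch a record from a database."""
    return f"record for {ref_id}"

def send_email(to: str, body: str) -> str:
    """Hypothetical tool: send an email."""
    return f"sent to {to}"

TOOLS = {"lookup_reference": lookup_reference, "send_email": send_email}

def fake_llm(history: list) -> dict:
    """Stand-in for the model: picks the next action from the history.
    A real agent would call an LLM that returns structured output."""
    if not history:
        return {"tool": "lookup_reference", "args": {"ref_id": "DNO-42"}}
    if len(history) == 1:
        return {"tool": "send_email",
                "args": {"to": "installer@example.com", "body": history[-1]}}
    return {"tool": None, "answer": "application submitted"}

def run_agent(max_steps: int = 10) -> str:
    history = []
    for _ in range(max_steps):
        action = fake_llm(history)
        if action["tool"] is None:       # the model says the task is done
            return action["answer"]
        result = TOOLS[action["tool"]](**action["args"])
        history.append(result)           # tool result goes back to the model
    return "gave up"

print(run_agent())  # → "application submitted"
```

The key difference from a plain chatbot is the loop: the model keeps acting on tool results until it decides the goal is reached (or hits a step limit).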
At Nesta, we’ve been testing agentic AI and developed a framework of five questions to help you decide if building an agent makes sense. The first three are about whether your problem is right for an agent. The last two are about whether your organisation is ready.
Agents can use a wide range of tools and also produce a wide range of outcomes. To make sure an AI agent is working well, you need to check the final outcome, not just the steps it took to get there. Traditional models are usually easier to check because they produce a simple prediction or estimate. For agents, you must carefully plan how you will confirm that the agent's final output is correct.
Think about the problem you’re trying to solve - does it have a verifiable outcome? Can you check that “the correct action happened”?
Sometimes the outcome is straightforward. We built a ‘nutrition auditor agent’ whose job was to look at potentially incorrect nutrition information and determine if the data was suspicious. The outcome was a classification: ‘suspicious’ or ‘benign’. There were various combinations of tools it could use to get there, not one correct path, but the outcome is still checkable: did it correctly flag something as suspicious or not?
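Verifying a classification outcome like this can be as simple as scoring the agent's final labels against hand-checked ground truth. A small sketch, with made-up product names and labels standing in for real audit data:

```python
# Outcome verification for a classifier-style agent: however the agent
# got there, only the final label is compared against ground truth.

def evaluate(agent_labels: dict, ground_truth: dict) -> float:
    """Fraction of products where the agent's final call was correct."""
    correct = sum(agent_labels[p] == ground_truth[p] for p in ground_truth)
    return correct / len(ground_truth)

ground_truth = {"cereal_a": "suspicious", "yoghurt_b": "benign", "snack_c": "benign"}
agent_labels = {"cereal_a": "suspicious", "yoghurt_b": "benign", "snack_c": "suspicious"}

print(evaluate(agent_labels, ground_truth))  # 2 of 3 correct
```

Because the check ignores *which* tools the agent used, it tolerates many valid paths to the same answer, which is exactly what you want for agents.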
But the outcome isn’t always so easy to pin down. With the DNO heat pump applications, the end goal, a successful application to the DNO, sometimes involved weeks of back and forth between the homeowner, the installer and the DNO. We couldn’t easily verify the entire process in one go. So we broke it into steps - at each stage, there was a correct action the agent should take. We hand-labelled 111 real email threads with the right action at each stage, and used that as our test set. If your end-to-end outcome is hard to verify, break the process into steps and verify those instead.
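Step-level verification can be sketched as scoring the agent's chosen action at each stage of a thread against the hand-labelled correct action. The stage and action names below are illustrative, not the real labels from our test set:

```python
# Step-level evaluation: when the end-to-end outcome is hard to verify,
# label the correct action at each stage and score the agent per step.

labelled_thread = [
    {"stage": "new_enquiry",      "correct_action": "request_property_details"},
    {"stage": "details_received", "correct_action": "draft_dno_application"},
    {"stage": "dno_reply",        "correct_action": "forward_to_installer"},
]

def agent_decide(stage: str) -> str:
    """Stand-in for the agent's per-stage decision."""
    return {"new_enquiry": "request_property_details",
            "details_received": "draft_dno_application",
            "dno_reply": "ask_clarifying_question"}[stage]

def score_thread(thread: list) -> float:
    """Fraction of stages where the agent chose the labelled action."""
    hits = sum(agent_decide(step["stage"]) == step["correct_action"]
               for step in thread)
    return hits / len(thread)

print(score_thread(labelled_thread))  # agent got 2 of 3 steps right
```

Run across a labelled set of real threads, this gives you a per-step accuracy you can track as you iterate on the agent.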
Beyond understanding the outcome, you need to know what’s involved in getting there. Many tasks mix routine, repeatable work (process) with subjective assessment (judgement, or human expertise). An agent will attempt both, but generally isn’t as good at the judgement part.
While building the ‘nutrition auditor agent’, we shadowed Nesta’s in-house nutritionist while she manually audited products. During that session, we identified both process and judgement. The agent could do most of what the nutritionist did, including checking the names of products, looking up reference numbers, doing calculations and comparisons to known databases. But it sometimes got too focused on one piece of data without being able to take a step back. It would return an irrelevant product lookup, not realise it was irrelevant, and continue comparing as if it were relevant. A human would easily catch that.
Before building, map out your task step-by-step. Which parts are mechanical and repeatable? Which require human expertise and judgement? An agent can handle the first kind pretty well; the second is where you’ll need guardrails or human involvement.
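One lightweight way to do this mapping is to write the steps down as data and tag each one. The audit steps below are illustrative, loosely based on the nutrition example rather than our actual task breakdown:

```python
# Sketch of a step-by-step task map: steps tagged "process" go to the
# agent, steps tagged "judgement" are routed to a human.

AUDIT_STEPS = [
    ("check product name matches record", "process"),
    ("look up reference number",          "process"),
    ("recalculate nutrition values",      "process"),
    ("decide if a mismatch is plausible", "judgement"),
]

def route(steps):
    """Split steps into agent work and human-review work."""
    agent_tasks = [name for name, kind in steps if kind == "process"]
    human_tasks = [name for name, kind in steps if kind == "judgement"]
    return agent_tasks, human_tasks

agent_tasks, human_tasks = route(AUDIT_STEPS)
print(len(agent_tasks), len(human_tasks))  # 3 process steps, 1 judgement step
```

Writing the split down explicitly, before building anything, makes it much harder to accidentally hand judgement calls to the agent.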
In the above example, we were looking at emulating a nutritionist. It is not a good use of a nutritionist’s time to be auditing thousands of rows of product data. Realistically, it wouldn’t be done. This is a bottleneck. Not just a slow process, but one where the alternative is not doing it at all.
Keep an eye out for these kinds of bottlenecks. In the case of the DNO heat pump applications, we identified delays of weeks and months in email follow-ups, and there was no good alternative. If a household’s boiler fails in winter, they need heat urgently. If the process isn't fast, they will just get a boiler replacement instead. So fixing communications is a real bottleneck to the installation of more heat pumps.
It’s also worth asking, is anyone else solving this? Some problems are neglected - not because they’re unimportant, but because the people affected don’t have the resources to fix them. For us, this is part of what makes the work worthwhile. Ask yourself where you are uniquely positioned to act on a problem that others aren’t addressing.
It’s tempting to assume that putting a human in the loop solves your safety problem. If the agent does something wrong, a person will catch it, right?
During user testing of our DNO communications manager, we discovered this logic doesn’t hold up. We wanted a human to review the agent’s actions before they were executed, but the reviewers didn’t have the time, knowledge or interest to monitor the agent. We tested an early version of our prototype with several heat pump installers in December. It was the end of the year, so they were very busy and had little time or patience for another tool. During user testing, they furiously clicked "next, next, okay, what?" and barely read the output.
This was a necessary wake-up call. A human in the loop isn’t a guardrail if the human isn’t really looking. When designing these systems, you need to think about how to keep them on track even when actions get auto-approved - because they will. The goal is to remove obstacles so that human experts can focus on the high-value work they do best, rather than supervising the AI.
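One way to keep an agent on track when humans rubber-stamp everything is to enforce hard rules in code, checked before any action executes, regardless of what the reviewer clicked. A minimal sketch, with a hypothetical allow-list rule:

```python
# Sketch of a guardrail that holds even under auto-approval: hard rules
# run on every action before execution, independent of the human's click.

ALLOWED_RECIPIENT_DOMAINS = {"dno.example.com", "installer.example.com"}

def guardrail_check(action: dict) -> bool:
    """Block actions that break hard rules, even if a human approved them."""
    if action["type"] == "send_email":
        domain = action["to"].split("@")[-1]
        return domain in ALLOWED_RECIPIENT_DOMAINS
    return True

# A human hastily clicked "approve" on this, but the rule still blocks it.
approved = {"type": "send_email", "to": "someone@random.example.org"}
print(guardrail_check(approved))  # False
```

The rules themselves will depend on the domain (recipient allow-lists, spend limits, rate limits); the design point is that they sit outside both the agent and the distracted reviewer.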
All of the above is moot if there isn’t buy-in from the people who have the final say. And leaders might be cautious - agents equipped with tools are powerful, and data privacy, security and hallucinations all carry serious consequences if agents aren’t deployed thoughtfully. It’s worth looping in senior leadership early to discuss the risks and trade-offs.
But buy-in isn’t a ‘yes or no’ binary. It’s trust you build over time. Our external partner on the DNO communications manager was initially sceptical, going as far as to say the AI we were proposing just wasn’t a good fit for their platform. A lot of terminology from the agent world was unfamiliar, which made them wary; they thought it was going to be some kind of auto-complete.
We reframed agentic AI in terms of familiar software development concepts, which eased their hesitation. We built a working demo and connected it to their system early, so it felt like part of their platform rather than a separate thing. We put wrong outputs in front of their experts, let them tell us what needed to change, and tweaked the agent prompts accordingly. Each iteration made the tool more credible.
A few months later, they took over development and will soon deploy it to their customers.
To determine if a task is ready for agentic AI, first assess its technical suitability: is the output verifiable, is the process clearly defined, and is there a specific bottleneck to solve? By targeting routine, repeatable work rather than subjective judgement, you ensure the AI addresses real friction points rather than creating new ones.
Success then shifts to organisational readiness, which requires designing for actual human behaviour and securing stakeholder trust. When you solve these human and technical hurdles, you can successfully automate low-value administrative 'grind', freeing your experts to focus on the high-impact work they do best.
This framework won’t give you the answer, but it will force you to ask the right questions before you start building.