Evidence is essential stuff. It is objective. It answers questions and helps us to solve problems. It helps us to predict. It puts decisions on the right track and makes them safer. Evidence can turn guesswork into certainty. Evidence tells us what works. It explains why people think and act as they do. It alerts us to likely consequences and implications. It shows us where and when to intervene. We have robust methods for using evidence. Evidence is information, and information is abundant. It is the most reliable basis for making policy and for improving practice. There has never been a better time for getting hold of evidence.
Evidence is dangerous stuff. Used unscrupulously, it can do harm. It is easily misinterpreted and misrepresented. It is often inconclusive, insufficient or unsuitable for our needs. We act on it even when it is inadequate, contradictory or biased. We ignore or explain away evidence that doesn’t suit our prejudices. We may not spot where evidence has flaws. It can conceal rather than reveal, confuse rather than clarify. It can exaggerate or understate what is actually known. Evidence can be manipulated politically. We can be persuaded to accept false correlations. A forceful advocate can distort what the evidence actually says.
The answer is that each statement in each cluster is sometimes true, in particular circumstances.
This is not much help to busy people with serious responsibilities who need to act decisively and to deliver benefits, value for money, impacts, targets and outcomes. Nor is it much help to busy people who want to be re-elected, reappointed or promoted: ambitious people who want to make a mark, leave a legacy, bolster their reputation and earn recognition. Similarly, there are plenty of people who toil away loyally and conscientiously behind the scenes, giving no thought to headlines or sound bites, and who are dedicated to using evidence to improve the quality and fairness of their efforts and those of their service, enterprise or community. What help are the statements to them? Should we conclude that it is impossible to deal wisely with evidence, because evidence is so contingent on circumstances?
On the contrary, this characteristic of evidence is helpful: it encourages us to learn how to be decisive and at the same time to keep an open mind. It obliges us always to be vigilant and sceptical. It reminds us of the hazards of putting too much trust in any one piece of evidence in isolation from other evidence. It counsels us to strive for accuracy in our explanations of what the evidence is telling us.
Recognising three more characteristics of evidence, in addition to contingency, also helps policymakers and practitioners to deal wisely with evidence to improve an existing policy or practice, or to introduce a new initiative to fill a gap. The three are: attributing causality, time lag and good practice.
An idea, a practical activity, a policy, a piece of legislation and an international treaty are among the many elements that may contribute to a specific social or economic change that is valuable (or harmful). Which element or elements caused that change? We want to know so that we can repeat (or prevent) similar effects in future. A claim of causality has to be backed by evidence if it is to convince others, typically evidence of outputs or outcomes.
For example, the Freedom of Information (FOI) Act 2000 gave the UK public a new statutory right of access to information held by public authorities and obliged those authorities to publish certain information about their activities. The resulting disclosure and publication of evidence may encourage, or sometimes embarrass, public authorities into changing particular policies and practices, especially if the media report the story.
One high-profile instance occurred in 2009, when The Daily Telegraph obtained and published evidence of MPs’ expenses claims. This led to repayments totalling over £1 million, some resignations, a few criminal convictions, and new rules and supervision of claims. The initial evidence came to the newspaper via a leak, which the paper followed up with FOI enquiries.
The line of causality seems obvious at first glance: the leak led to FOI requests, which led to disclosures of previously unpublished information. This embarrassed the authorities; the media coverage generated intense interest among the general public, emboldening journalists and other observers to call for punishment of the wrongdoers and reform of the expenses system, both of which happened.
However, is this explanation the only one, or the most accurate one? Could an equivalent result have been achieved without The Daily Telegraph’s actions, or without FOI legislation? Probably: the authorities might have responded to relentless and increasing media pressure by disclosing the details of the expenses claims anyway. It was not The Daily Telegraph’s actions that caused the punishments, resignations and reforms to happen. It was the actions of MPs, the parliamentary authorities and the law, against a backdrop of popular interest, one part of which was The Daily Telegraph’s campaign. Searching for variant explanations that expose more of the texture and interplay of factors helps us to avoid jumping to over-simple judgements or seizing on superficial prescriptions. In most instances of social and economic policy and practice, attributing causality is not straightforward. This is the case irrespective of the specific field or sector, and whether the evidence is quantitative (from, say, large statistical data sets) or qualitative (from, say, service users’ views). Yet attributing causality is tempting, and it leads some people to claim stronger links than are actually in place, especially where the causality seems, to them, so plausible. Consider the example of a policy for culling badgers to reduce bovine TB, a problem that involves many aspects of social and economic policy, scientific inquiry and analysis, and highly contested interpretations of evidence.
Defra statistics record a recent rising trend in the incidence of TB in cattle in Great Britain, with over 34,000 animals slaughtered in 2011, although the total number of cattle is declining.1 Cattle get TB from infected cattle and from infected badgers. Randomised controlled trial evidence suggests that (reactive) culling of badgers in and around a farm where cattle had TB increases the level of TB in that area, while (proactive) culling reduces the incidence of TB within the cull area but increases it in the immediately surrounding area. This is interpreted as indicating that culling changes badgers’ territorial behaviour. Other evidence indicates that any benefits of culling are not sustained after the culling ceases.2 Alternatives to culling badgers include vaccinating badgers and increasing biosecurity controls on farms, measures whose efficacy is also contested.
In early 2012 the Government concluded that badger culling trials were justified and set the preparations in motion, but later that year postponed the commencement in the light of further advice that the trials would be ineffective. The President of the Zoological Society and 30 scientists criticised the design of the cull in an open letter, arguing that:
"...the government predicts only limited benefits, insufficient to offset the costs for either farmers or taxpayers. Unfortunately, the imminent pilot culls are too small and too short term to measure the impacts of licensed culling on cattle TB before a wider roll-out of the approach. The necessarily stringent licensing conditions mean that many TB-affected areas of England will remain ineligible for such culling."
Defra’s Chief Scientific Adviser and Chief Veterinary Officer responded:
"Government policy is based on sound analysis of 15 years of intensive research. Critics are not able to cite new scientific evidence or suggest an alternative workable solution for dealing quickly with this rising epidemic. Culling is just one of a range of measures the Government is taking to arrest the increase in new bovine TB cases, including intensifying testing to remove infected cattle, tighter cattle movement controls, guidance to farmers on stopping badgers on contacting cattle and further research into vaccination."
Many other interested parties participate vigorously in this continuing contest for evidence and its interpretations, including many established campaigning and lobbying groups as well as ad hoc coalitions of farmers, countryside and wildlife organisations, the media and individual citizens. This and the FOI example remind us that ‘what works’ is always a reflection of the context in which policies are constructed, their content and methods, and the needs, motives and perceptions of the people claiming the attribution. A few words on impact, proxies and failure may help at this point, before looking at time lag and good practice.
Impact, as an indicator of benefit (or harm) associated with an action, policy or research project, has become an increasingly prominent measure of individual and organisational performance, much favoured nowadays by governments, funding agencies and the public services. In some respects this is an understandable (and somewhat late) response to the recognition that value for money matters. But what exactly are impacts? How are impacts best measured? What sort of evidence is relevant to impacts? These seemingly simple questions are not so simple to answer, and they give rise to much friction and disagreement. Accountability for delivering value from public money means identifying what ‘value’ different interests are seeking. Target regimes (for hospital waiting lists or re-offending rates, for example) are meant to focus planners, policymakers and front-line staff on specific priorities, and may use incentives or penalties to encourage compliance. We know that they also prompt gaming and displacement.
As the ‘value’ question can be so complex, assessments of impact often turn to proxies. One proxy for the quality and effectiveness of academic research is publication and citation of articles in peer-reviewed journals; this proxy is widely used by research councils and higher education funding agencies when allocating money to universities. Money is another proxy for impact. For example, the British Library, like many other publicly funded bodies and services, felt it needed harder evidence to support its claim that its existence and activities were generating valuable economic benefits for the UK’s economy. In 2003 it published evidence from a study it had commissioned, which calculated that for every £1 of public funding the Library received each year, it generated £4.40 for the economy; and that if public funding of the Library were to end, the UK would lose £280 million per annum. School students’ test results are a further example of a proxy, taken to indicate the effectiveness of teaching in schools.
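To make the arithmetic behind such money proxies concrete, the minimal sketch below shows how a headline ‘return per £1 of public funding’ figure can be computed. The benefit categories and all amounts are hypothetical placeholders, not the data or method of the British Library study.

```python
# Illustrative sketch only: deriving a 'return per £1 of public funding' figure.
# All categories and amounts are hypothetical, not from the British Library study.

def benefit_cost_ratio(annual_benefits_gbp_m: float, annual_funding_gbp_m: float) -> float:
    """Estimated benefit generated per £1 of public funding."""
    return annual_benefits_gbp_m / annual_funding_gbp_m

# Hypothetical annual economic benefits, in £ millions.
estimated_benefits_gbp_m = {
    "direct services to users": 150.0,
    "indirect and induced effects": 200.0,
}
annual_funding_gbp_m = 80.0  # hypothetical annual public funding, £ millions

ratio = benefit_cost_ratio(sum(estimated_benefits_gbp_m.values()), annual_funding_gbp_m)
print(f"Estimated return per £1 of public funding: £{ratio:.2f}")
```

The division itself is trivial; what makes such figures contestable is how the benefit estimates feeding into it are defined and valued, which is precisely where the ‘value’ question resurfaces.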
In other words, the mere existence of a role, team, department or service, is no longer sufficient justification for automatically continuing former levels of investment in those people and organisations. Nor are their outputs alone regarded as sufficient guarantee of worth or success in delivering value, whether measured in, say, numbers of FOI requests or exam grades or articles published or administrative actions or policy processes. Impact evidence is also demanded.
Because of the negative associations of failure, much more evidence is available claiming that something works than about what underperforms or fails. Yet failure and success are parts of a spectrum of performance. Some proxies for impact are used to argue that policies or activities are failing or underperforming. For example, the percentage of women and of members of black and minority ethnic groups recruited to some work roles has been increasing, but not to anything like the extent that many regard as essential.
Here too, the use of evidence to attribute failure or underperformance to a policy or practice depends on the vantage points of the interested parties making the assessment. Re-evaluation of past policies invokes hindsight and brings different knowledge, expectations and capabilities into the judgement. This can sometimes be a deliberate ‘versioning’ of the past to promote a current objective. Analysing policy failures or underperformance can provide opportunities to learn lessons about how to do things differently in future, although some of the cautions mentioned below, concerning the likelihood that good practice will be adopted, may apply.
Impact has become a favoured element among paymasters who want evidence that will reassure them that they can answer positively the question: ‘What difference did your investment in that particular policy/public service/practical action/research make?’ If they can point to evidence that impact is happening, they feel their investment is legitimised. Similarly, critics want evidence that will fuel the accusations of failure they seek to make.
This elevation of the role of evidence of impact is embodied, for example, in the Alliance for Useful Evidence’s own title and in its aim to ensure that “...high-quality evidence has a stronger impact on the design and delivery of our own public services.” The impact of social science research can be categorised as follows: instrumental (for example, influencing the development of policy, practice or service provision, shaping legislation, altering behaviour); conceptual (for example, contributing to the understanding of these and related issues, reframing debates); capacity building (for example, through technical/personal skill development).
A coalition of seven third sector organisations, calling themselves ‘Inspiring Impact’, got together in 2012 to develop a programme that will “ensure every pound spent makes the biggest possible difference to beneficiaries.” It aims over the next ten years to:
There can be a significant time lag between making a decision about or acting on a particular policy or practice, idea or research finding, and seeing the evidence that the result (desired or unwanted) has followed as a direct consequence. Evidence about the effects of some decisions and actions can only be seen (or appreciated) years or generations after the decisions or actions were taken. There are very many examples of this, including taxation and benefits policies and their effects on unemployment; using custodial sentences to reduce crime; changing the classification and penalties for using certain illegal drugs to influence drug-taking behaviours; altering land use planning to encourage specific types of urban or rural development; or introducing new curriculum and examination regimes for schools to improve educational attainment.
The attribution of causality in such cases is complicated for two reasons. First, many factors commonly contribute to a (desired or unwanted) change; not all of them can be identified at the time or later, let alone reliably measured. Second, the time lag introduces new circumstances and conditions, possibly unforeseen originally, which may play a significant part in the subsequent course of events and thus affect the accuracy of attributing cause and effect. The new factors may, for example, conceal previous effects, reverse them or multiply them. An example is the claim that the creation of the European Union, a huge social, economic and political initiative, has prevented wars in Western Europe since 1945.
One kind of evidence that many people consider relevant is ‘good practice’ (sometimes ‘best practice’), which often originates elsewhere. Good practice appeals to those who want to avoid re-inventing the wheel locally, saving time, money and effort. They hope that the good practice will equip them to spot potential pitfalls in advance and to identify efficient ways of resolving problems that might well occur. They may also feel more confident about adopting the good practice because they believe the inventor’s own experience with it has already supplied some kind of test of roadworthiness.
Inventors of good practices are very numerous. Individuals and organisations, whether service users or providers, policymakers or practitioners, researchers, teachers or commentators, proudly claim improvements have flowed from their own particular innovations and approaches. If they are keen advocates, articulate enthusiasts, happy to explain what they did and how successful it has been, their proselytising may generate impact points for them too. They may be officially rewarded for spreading the word.
For example, the Northern Rock Foundation was established in 1998 when the Northern Rock Building Society demutualised and became a plc. It has developed its own impact evaluation method and applied it to regional projects it has funded within its ‘Safety and Justice Programme’. It obtained the necessary academic expertise through a Knowledge Transfer Partnership (funded by ESRC and the Technology Strategy Board) with the University of Bristol. It is currently promulgating its approach to others as it believes this will have wide applicability to other trusts and foundations and to those who apply for their funding. It has published a number of resources online and actively presents the information at meetings.
Yet the record of good practice uptake continues to be rather limited. Why do so few of the eligible beneficiaries adopt it? The answer is a mixture of three things: the ‘not invented here’ attitude, opportunity costs, and ownership.
Facts are usually regarded as the best kind of evidence. Morgan defines facts as ‘shared pieces of knowledge’ that are ‘short, specific (non-abstract), and reliable. They are non-conjectural: they are not hypotheses, theories, fictions, etc. Nor are they matters of mere belief or opinion. They are established according to criteria of evidence existing in a community at a given time.’ Facts and other kinds of evidence can be used in new contexts.
Some people and organisations consider, for example, that evidence generated by university researchers and published in academic journals is more trustworthy and reliable than evidence from, say, newspapers or consultants’ reports, or that randomised controlled trials are the ‘gold standard’ method for generating high-quality research evidence. Others, however, take evidence presented in the mass media or in blogs just as seriously.
Even where some facts are available, gaps or inconsistencies in a collection of evidence can persist: reliable knowledge may not yet have been created, understood, sufficiently developed or found, or access to it may be restricted, as some of the examples above illustrate. Other forms of evidence may be all that is available, even if these too contain gaps or inconsistencies. A decision or action may nevertheless be regarded as valuable by some of the interested parties, perhaps ‘better than nothing’, even where the shortcomings are known; others may regard the same policy as unsafe or even damaging. Methodological limitations, too, can hamper evaluations of what works.
According to Gilligan, a sociologist, “the problem with evidence-based policy is not the existence of evidence, or expertise, but the way in which evidence is employed as a substitute for debate.” Gilligan has examined this in relation to migration policy, arguing that current policy discussion treats migration as an issue of management rather than as a matter involving principles or goals, which could and should be publicly debated. He proposes four principles to inform that debate, but he also acknowledges that it is difficult to prescribe actions or arrangements that would ensure the relevant institutions take note and change their ways.
To sum up, evidence is bound to disappoint those who want conclusive proof from it. Evidence alone does not ensure wisdom or deliver something called ‘objectivity’ or ‘the truth’. Evidence alone cannot quickly silence doubts (about climate change and the role of renewable energy sources, for example). Nor does evidence settle once and for all the value of a specific activity or policy (such as support for SMEs to drive economic growth). Evidence is always contingent on context, sources, perceptions and timing. Good evidence may be ignored; bad evidence may be used misleadingly. Knowing all this helps us to use evidence wisely.
This is a paper for discussion.
The author would welcome comments, which should be emailed to: [email protected] or [email protected]
The paper presents the views of the author and these do not necessarily reflect the views of the Alliance for Useful Evidence or its constituent partners.
The Alliance champions the use of evidence in social policy and practice. We are an open–access network of individuals from across government, universities, charities, business and local authorities in the UK and internationally. The Alliance provides a focal point for advancing the evidence agenda, developing a collective voice, whilst aiding collaboration and knowledge sharing, through debate and discussion. We are funded by the BIG Lottery Fund, the Economic and Social Research Council and Nesta. Membership is free. To sign up please visit: www.alliance4usefulevidence.org