How can government make the most of machine learning systems and avoid the pitfalls?
Machine learning algorithms are transforming many aspects of our lives, but the biggest changes are yet to come. Greater computing power, larger datasets and improved algorithms are opening up many new opportunities. Personalised medicine, AI assistants and driverless cars are some of the most exciting developments that may soon be possible. Until now, government's use of these systems has been fairly limited. There is a lot of untapped potential, but a number of challenges must be overcome before it can be realised.
The Government Digital Service (GDS) has been experimenting with applications such as predicting pageviews in order to detect anomalies, and is so far focusing on demonstrating the capabilities of machine learning algorithms in a number of products and prototype services. HMRC has started to use clustering techniques to segment VAT customers. The UK Government has the opportunity to be a world leader in this area and to benefit from the rich academic and industry expertise across the country.
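To make the idea of anomaly detection by prediction a little more concrete, here is a minimal sketch under stated assumptions. The "prediction" is simply the mean of the previous seven days' pageviews, and a day is flagged when it lies more than three standard deviations from that forecast. GDS's actual models are not described here, so the function name `flag_anomalies`, the window, the threshold and the pageview figures are all hypothetical choices for illustration.

```python
from statistics import fmean, stdev


def flag_anomalies(pageviews, window=7, threshold=3.0):
    """Return indices of days whose pageviews are far from a simple forecast.

    Hypothetical sketch: the forecast for day i is the mean of the previous
    `window` days, and day i is flagged when it differs from that forecast by
    more than `threshold` standard deviations of the same window.
    """
    anomalies = []
    for i in range(window, len(pageviews)):
        history = pageviews[i - window:i]
        predicted = fmean(history)
        spread = stdev(history)
        # Skip perfectly flat windows, where the spread gives no scale to compare against.
        if spread > 0 and abs(pageviews[i] - predicted) > threshold * spread:
            anomalies.append(i)
    return anomalies


# Made-up daily pageviews with a single spike on day 8.
daily_views = [1000, 1040, 980, 1010, 995, 1025, 1005, 1015, 4200, 1000, 990]
print(flag_anomalies(daily_views))  # → [8]
```

A production system would more likely use a forecasting model that captures weekly seasonality, and would exclude flagged days from later windows so that one spike does not inflate the spread used for the following week.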
As for many other organisations, one of government's main challenges is developing internal capacity. There are pockets of ‘data scientists’ around government who have the necessary skills, but there is still a large gap to fill. Beyond hiring external expertise, there is a great deal of potential capacity within the civil service's analytical professions, particularly among statisticians. There are some differences in skills between statisticians and data scientists, such as familiarity with different programming languages, but the main difference is cultural: the way they approach problems. These skills can be fostered relatively easily, but creating the right environment is key.
This is why the ONS Innovation Labs are such an exciting idea. The Labs were set up as a resource for learning, research and innovation in big data and machine learning tools. They use open source technologies that are not available on the standard ONS secure network, overcoming existing infrastructure barriers. This approach has allowed teams to be more experimental and to think of new ways of answering questions. Innovation Labs in each government department could be a fantastic way to develop internal expertise and explore new approaches to public policy issues.
However, the challenge of skills and understanding doesn't stop there. The field is moving so quickly that everyone needs to keep learning to stay ahead. This is well recognised at Baidu, where Andrew Ng is particularly keen on employee development so that staff can keep generating the best ideas. The company puts a lot of effort into making sure everyone has the space to keep learning, and this has become a big part of its culture. If the UK is to stay ahead, it must embed this kind of culture across data science and machine learning. This may be difficult in government and the civil service, where time is at a premium. Government could make more of closer ties with academia and industry, where state-of-the-art research is taking place. Strong networks with both have already been built through the Cabinet Office's development of the Data Science Ethical Framework.
Government could bring external experts in to work on a particular policy problem alongside civil servants, developing solutions in the form of purpose-built products or services. This would have a number of benefits: first, civil servants could develop their data science and machine learning skills through the project; second, government would get the services or products it needs; and third, the external developers would benefit from creating a product that government is likely to procure.
Finally, there is one more consideration that is often overlooked but is fundamental. Machine learning, and data science for that matter, is not a silver bullet. Even with good-quality, unbiased data and a well-defined problem, different machine learning approaches can reach widely different conclusions. The differences can be large enough to determine whether a significant correlation is found at all. The variation arises in part because algorithms make different assumptions about the function they are modelling (see the recent report for more detail), but no matter what machine learning algorithm is used, analysis of data will always require some difficult subjective decisions about which models or variables to use.
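A toy illustration of how a single modelling choice can flip the conclusion, assuming two analysts share the same small, made-up dataset. Analyst A regresses an outcome `y` on a variable of interest `x` alone; Analyst B also controls for a second variable `z`. The data, variable names and helper functions are hypothetical and chosen only to make the effect visible. Both analyses are ordinary least squares; this is not the method used in the research discussed here.

```python
from statistics import fmean


def centred(values):
    """Subtract the mean, so the regressions below implicitly include an intercept."""
    m = fmean(values)
    return [v - m for v in values]


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def slope_simple(x, y):
    """Analyst A (hypothetical): OLS slope of y on x alone."""
    xc, yc = centred(x), centred(y)
    return dot(xc, yc) / dot(xc, xc)


def slope_adjusted(x, z, y):
    """Analyst B (hypothetical): OLS coefficient on x when y is regressed on x and z.

    Solves the 2x2 normal equations for the centred data by Cramer's rule.
    """
    xc, zc, yc = centred(x), centred(z), centred(y)
    sxx, szz, sxz = dot(xc, xc), dot(zc, zc), dot(xc, zc)
    sxy, szy = dot(xc, yc), dot(zc, yc)
    return (sxy * szz - szy * sxz) / (sxx * szz - sxz ** 2)


# Made-up data: y is driven entirely by z, and x happens to track z closely.
z = [0, 0, 0, 0, 1, 1, 1, 1]
noise = [0.1, 0.3, -0.2, 0.0, -0.1, 0.2, 0.0, -0.3]
x = [zi + ei for zi, ei in zip(z, noise)]
y = [2 * zi for zi in z]

print(f"Analyst A (y ~ x):     slope on x = {slope_simple(x, y):.2f}")
print(f"Analyst B (y ~ x + z): slope on x = {slope_adjusted(x, z, y):.2f}")
```

On this data Analyst A finds a strong positive slope (about 1.9), while Analyst B finds that `x` has essentially no effect once `z` is included. Neither choice is wrong in itself; which is appropriate depends on subject knowledge about how `x`, `z` and `y` are related.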
So how can this be overcome? First, it is important to recognise that it happens and to identify where those decisions are made. The research paper that highlighted these issues also explored a new methodology to help resolve this variation. The ‘crowdsourcing analytics’ approach began with multiple teams devising their own approaches to model the same problem from the same data, before coming together to discuss and question each other's approaches. Even after this process of calibration there was still variation in the outputs, but a much stronger consensus on the answer.
This is an exciting time, and the UK Government is in a key position to become a world leader in this area. Developing and experimenting with approaches like ‘crowdsourcing analytics’ will help build the capacity needed internally and provide invaluable knowledge of which methods work. This could prove hugely beneficial as the need for regulation in this area grows: the more in-house knowledge government has and the stronger its external networks, the better its regulation is likely to be.
A recent Nesta report discusses in more depth what machine learning is, what it can do, where it is used and some further challenges.