Designing trustworthy machine learning systems
The journey towards understanding user trust and how to design trustworthy systems
Many methods have been developed to promote fairness, transparency, and accountability in the predictions made by artificial intelligence (AI) and machine learning (ML) systems. These methodologies often focus on technical approaches; however, to develop truly ethical machine learning systems that users can trust, the technical work must be supplemented with a human-centred approach.
A deep understanding of the people, contexts and environments connected to these systems provides the foundation for designing and presenting information in a way that lets the people who use the systems make balanced decisions about whether or not to trust them.
In the complex and very technical world of machine learning, User Experience Designers within the Product & Design group and a Social Scientist in the Responsible Innovation Future Science Platform at CSIRO’s Data61 established the key considerations for taking a human-centred approach to building machine learning systems.
Here, Product & Design Lead Georgina Ibarra explores the process they took to understand the components of trust and the development of best practice guidelines for designing trustworthy machine learning systems using the example of a recent project involving Australian law enforcement agencies.
Connecting technology with the people that will use it
As a product and design team working for CSIRO’s Data61, we take a human-centred design approach to developing technology. This means researching the users of the technology we are creating to ensure we understand their context – their problems, needs and jobs to be done. We collaborate with social scientists to further examine the social identities and dynamics associated with these environments, contexts and people, together establishing the key insights needed to create an inclusive system.
This process lifts our focus from deliverables, enabling us to consider the social and ethical consequences of the technology we are creating. We consider it our responsibility to create responsible technology.
Working in partnership with Australian law enforcement agencies, our brief was to extend the agencies’ graph data capabilities and the maturity of their graph data infrastructure by developing a graph analytics software system driven by machine learning.
High security restrictions placed many limitations on working with and talking to experts in law enforcement, restricting the team’s ability to piece together a full picture of the problem and its context.
What we could establish is that domain and data experts building criminal investigation cases use data to create intelligence insights with the goal of identifying connections and patterns within the data that can form the basis of further action by investigators.
We also learned that many of their data-assisted decisions have high consequences.
A human-centred approach to user trust
Important frameworks established in recent years outline the risks and weaknesses of artificial intelligence and machine learning, contributing to the dialogue around the ethical implications of this technology.
The user experience (UX) component of this case study takes a domain-focused and human-centred design approach to facilitating human interaction with a specific machine learning system.
This enables the dissection of critical components that a particular user needs to build their trust in the model outcomes and ultimately the trustworthiness of a particular system.
The Product and Design team at Data61 work closely with stakeholders, users and beneficiaries of the system to design these components so they can be interpreted, evaluated and trusted by the people that use the system.
This lens is what led us to our key research question: For experts making high-risk and high-consequence decisions, what are the conditions required for them to accept and use predicted data from a machine learning model to assist decision-making in their work?
Exploring the social identities and dynamics of our users
To build on our knowledge regarding the social context of the humans that needed to interact with our machine learning system for this project, we collaborated with the Responsible Innovation Future Science Platform to explore concepts in machine learning, trust in automation and criminal investigation. This culminated in the recently published report “Machine Learning and Responsibility in Criminal Investigations”.
The report explains the responsibilities of criminal investigators and how these responsibilities could be affected when using ML. It identifies the factors that influence the level of trust users place in automated systems and how this trust can be influenced by different social and environmental factors.
Through a combination of user experience and social science approaches, we examined how the level of trust investigators place in the findings of ML systems can be calibrated to reflect the trustworthiness of those systems.
Three interwoven concepts of trust
The concepts of trust, trustworthiness, and calibrated trust are crucial to the human use of ML systems for high-risk decision making. Because these concepts are interdependent, it is vital to clarify each term:
- User Trust – ‘I trust the system to perform this goal.’
This trust is embodied by the user and can be triggered by multiple factors: their general willingness to trust automation, their training and experience, and situational factors such as mood, workload, and the complexity of the task.
- System Trustworthiness – ‘The system is trustworthy enough to perform this function.’
The onus is on the system to demonstrate how worthy it is of a user’s trust, usually for a particular function in a specific context.
- Calibrated Trust – ‘The user’s trust is calibrated to the trustworthiness of the system.’
Ideally, the user’s trust in an ML system corresponds with the system’s capabilities. Regardless of how trustworthy the system is, the user is able to make a judgement on the best use of its predictions.
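One practical way a design can support calibrated trust is to pair each model prediction with its confidence and a plain-language reliability cue, so the user can weigh the output rather than accept it blindly. The sketch below is purely illustrative – the function name, thresholds and wording are our own assumptions, not part of any system described here:

```python
# Hypothetical sketch: surface model uncertainty alongside a prediction
# so a user can calibrate their trust in it. Thresholds and phrasing
# are illustrative assumptions only.

def describe_prediction(label: str, confidence: float) -> str:
    """Return a user-facing summary that makes model uncertainty visible."""
    if confidence >= 0.9:
        band = "high confidence - still verify before acting"
    elif confidence >= 0.6:
        band = "moderate confidence - corroborate with other sources"
    else:
        band = "low confidence - treat as a lead only"
    return f"Predicted link: {label} ({confidence:.0%}, {band})"
```

The design choice here is that no prediction is presented without an attached cue about how much weight to give it, nudging users away from both over-trust and blanket distrust.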
Like any good design challenge, the issue of trust in machine learning is much easier to comprehend when it is in context. Who needs to trust the outcomes from the machine learning system and why do they need to trust it? What do they need to do with the information, how well can they evaluate it, and what goal are they seeking to achieve by using it?
The data users
There are two primary users that employ data to understand networks as part of criminal investigations:
- Intelligence analysts are experts in the business requirements of the data, but are non-technical and rely on data scientists to work with big data.
- Data scientists are experts in creating data solutions but rely on domain experts for business requirements.
The main reason these people use data is to build intelligence that can influence investigations and ultimately lead to prosecution. Many data-assisted decisions have high consequences, making the trustworthiness of the system critical for user trust.
There is also a secondary user whom we call the ‘intermediary’: usually a project manager or business analyst who manages data projects and processes, and often champions new tools and methodologies.
The data ecosystem
The following diagram is a visual representation of where data commonly originates in criminal investigations, the transformation it undergoes when applied to our machine learning system, and some of the pathways that may result from the data insights the system produces.
Overlaying the data ecosystem with some of the issues that surfaced during our research (e.g. the information needed to accurately communicate the prediction data, and the need for machine learning literacy in the industry’s workforce) gave us a way to promote dialogue around developing user trust, system trustworthiness and calibrated trust.
They also provided a starting point for understanding who is responsible for each part of the system and the role they play in mitigating challenges that arise.
To understand more about trust in machine learning, we undertook a literature review of the methods and practices currently used to build trust in machine learning algorithms. The Partnership on AI’s “Report on Algorithmic Risk Assessment Tools in the U.S. Criminal Justice System” provided a robust assessment of the ‘paradigmatic example of the potential social and ethical consequences of automated AI decision-making.’ It highlights three key challenges:
- The ability to evaluate the accuracy, bias and validity of ML tools.
- Accepted standards in governance, transparency and accountability.
- The interface between the tools and the humans that interact with them.
These challenge areas reflect many of the important issues surfaced in our review of other literature, which are detailed in the report mentioned earlier in this article. As user researchers and designers focused on the interaction component of ML systems, the third challenge area was our cornerstone. However, technology teams building ML systems that are safe for production deployment must address all three areas.
Guiding best practice
After completing our research on visualising predicted data and our users’ needs, context, and social dynamics, the team intended to dive straight into designing data scenarios and visualisations to test how well users understood and trusted the properties of predicted data in different situations.
But our research uncovered a gap: best practice did not yet appear to have been established. As a result, the team designed a set of practical guidelines (see footnote) to enable informed decisions at each stage of the design and development process and to ensure a human-centred approach was always front of mind.
Unfortunately, high security restrictions prevented the application of our guidelines to designing realistic scenarios with real data used for criminal investigations.
During this project, we progressed our knowledge of this complex area significantly, developing our understanding of key issues and considerations in this emerging space of designing for trust.
It is now time for the team at Data61 to put this knowledge to the test in our next research project and apply it in practice. We’ll be constructing realistic scenarios using data from a specific domain, designing the critical components users of a machine learning system need to build their trust in the model outcomes and, ultimately, to judge the trustworthiness of a particular system.
Stay tuned for our next blog post, where we will share the guidelines we developed, “Guidelines for the Creation of AI Tools for Humans”, followed by a case study of how we have applied them in practice.
CSIRO’s Data61 Investigative Analytics program was a three-year technology program funded by the Australian Government’s Modernisation Fund with the mission of “increasing the graph analytics capabilities of Australia”. Over the course of the program, an interdisciplinary team of data scientists, engineers, user experience (UX) designers and product managers developed a graph analytics software system driven by machine learning.