neural networks, bias, interactive visualization
Artificial Intelligence (AI) methods are used in an increasing range of aspects of our daily lives. As the adoption of these AI applications widens, the need for transparency and accountability becomes more pressing. Governments have started to require companies to be transparent about AI applications with social significance. For example, profiling models, a widely used method to model certain aspects of a person [15] (e.g., financial creditworthiness), have raised public concerns. Reports show that biased profiling models can have devastating consequences for the people they unfairly model, for example when deciding the risk of recidivism for parole [12,28] or which patients receive extra care [18].
Currently, the general public relies on AI experts to discover these biases. However, this approach cannot easily scale with the rapid adoption of AI applications. More importantly, technical experts may not be fully aware of the needs of the communities about which AI algorithms are making decisions. As a result, there have been recent calls to empower non-experts to open the black box of AI and better understand its decision-making process [4,31,33].
Table 1: The application features in our dataset and used in CEB’s NN.
In this paper, we present an interactive visualization for AI non-experts to explore a semantic Neural Network’s (NN) decisions, in the context of profiling models for loan applications, to reveal potential bias. Among AI methods, NN-based ones are notorious for their low interpretability [36]. Our tool, Counterfactual Examples for Bias (CEB), is designed for non-experts to discover potential biases. It explores the approach of 1) visualizing activation patterns of the NN to increase its interpretability, and 2) using counterfactual examples to help non-experts discover biases in the algorithm. By visualizing how counterfactual examples may impact the decision made by an NN, CEB aims to help non-experts decide whether bias is present. To our knowledge, this work is among the first tools that support non-experts in finding bias in NN algorithms.
Employing an iterative and human-centered approach, we have built a prototype of CEB and reviewed its design through interviews with AI, HCI, UX, and Sociology experts. In the rest of the paper, we present CEB and the results of our expert panel with six experts. Overall, we found the experts believed CEB would be an intuitive tool for non-experts. Experts believed the counterfactual examples highlighted bias, while the abstraction of datapoints into clusters kept users from being overwhelmed by the sample size.
Bias occurs in a variety of domains such as emotion recognition [17], word embeddings [6, 14], and object classification [35]. Research addressing algorithmic bias typically alters or supplements algorithms to correct bias [3,5,10,17, 19, 20]. These automated approaches can reduce the unfairness of algorithms; however, they can trade off accuracy and still do not guarantee complete fairness [20].
We argue that supporting users, rather than algorithms, in detecting bias is an alternative approach when automation is not available or feasible. This approach supports the General Data Protection Regulation (GDPR), highlighting people’s right to algorithmic explanations and non-discriminating algorithms1. Research on Interactive Machine Learning (IML) and eXplainable AI (XAI) often designs for algorithmic explanations as well. When designing CEB, we first looked to these fields’ findings. Research on IML has developed techniques that help both experts and non-experts understand a Machine Learning (ML) model. Interacting with an ML model in an IML system can assist non-experts in learning the data requirements of a model and in developing more realistic expectations of its capabilities [13]. Research on AI education for non-experts focuses on increasing their understanding of how specific models work to empower their use as a design material [11, 32, 34]. We argue that educating non-experts on the social implications of AI, specifically bias, is another pressing issue that is currently under-researched. Projects in XAI develop various UI explanations to aid experts, and occasionally non-experts, in understanding AI decisions [1,2,36]. UI techniques such as natural language explanations [25] and comparative and normative examples for image classification [7] have been found to help non-experts understand an NN’s decision. Similar to IML, interactive explanations have been found to increase non-experts’ objective and self-reported understanding of the profiling model but require more of a user’s time [9]. Tools similar to CEB aim to explain ML models to children [16] or game designers [31], but do not focus on potential bias.
We emphasize abstraction and counterfactual examples to facilitate the discovery of NN bias. Abstracting the ML process has been found to assist the understanding of non-experts [11, 33]. Our abstraction is based on reducing and plotting the hidden node activations of an NN, a technique used in tools to reveal the “black box” of image classification NNs [8, 24]. We further abstract these activations by clustering them, a technique commonly used in data visualization to improve interpretability [21–23]. Counterfactual examples are also used in developing fairer models [20, 27, 30]. CEB focuses on illustrating the potential bias of an NN through counterfactual examples since they have been shown to improve non-experts’ understanding of AI concepts [26,30].
Figure 1: Example of the natural language description and score
Figure 2: Example of path score from original Purple Group to flipped Pink Group.
Figure 3: Still frame of animated datapoints switching from original to flipped groups.
We selected a pre-existing loan application dataset2 since this data already suffered from sampling bias (with a disproportionately higher number of men represented than women). It contains datapoints about features of each loan application (Tab. 1) and its outcome (accept or reject). To prepare the training data, we first removed datapoints with missing information, reducing the dataset from 614 to 480 datapoints. We then randomly divided the data into two thirds for training and one third for testing. The employed NN is a Fully Connected Neural Network (FCNN) with three hidden layers. The network has seven inputs (Tab. 1) and one output neuron (a loan application is recommended for approval if the neuron’s output is higher than a threshold of 0.5). The activation function used for each layer of the network is the ReLU function, except for the final output, which employs a sigmoid function. The NN was trained and modified until it reached an accuracy of 79%. This accuracy is competitive with other public models working with the same dataset3.
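The architecture described above can be illustrated with a small forward pass. This is a minimal sketch using only the Python standard library, not the network used in CEB: the hidden-layer widths (16, 8, 4), the random untrained weights, and the example feature vector are all assumptions made for illustration, since the paper specifies only the number of layers, the number of inputs, and the activation functions.

```python
import math
import random

def relu(x):
    return max(0.0, x)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def dense(inputs, weights, biases, activation):
    """One fully connected layer: activation(w . x + b) for each neuron."""
    return [activation(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

def init_layer(n_in, n_out, rng):
    # Untrained placeholder weights; a real model would learn these.
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases

def build_network(rng, n_inputs=7, hidden_sizes=(16, 8, 4)):
    # hidden_sizes is an assumption; the paper only states three hidden layers.
    sizes = [n_inputs, *hidden_sizes, 1]
    return [init_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]

def forward(network, features):
    """Return the sigmoid output score and all hidden-layer activations."""
    hidden = []
    x = features
    for weights, biases in network[:-1]:
        x = dense(x, weights, biases, relu)   # ReLU on every hidden layer
        hidden.extend(x)
    weights, biases = network[-1]
    score = dense(x, weights, biases, sigmoid)[0]  # single sigmoid output neuron
    return score, hidden

def recommend_approval(score, threshold=0.5):
    # Approval is recommended when the output exceeds the threshold.
    return score > threshold

rng = random.Random(0)
network = build_network(rng)
applicant = [1.0, 0.0, 0.3, 0.5, 0.2, 0.7, 1.0]  # hypothetical, already-scaled features
score, hidden = forward(network, applicant)
print(f"score={score:.3f}, approve={recommend_approval(score)}")
```

The `hidden` list returned by `forward` is the kind of activation vector that the clustering step operates on.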
To avoid overburdening a non-expert user’s cognitive load, CEB abstracts the dataset into clusters instead of directly visualizing individual datapoints (Fig. 4(b)) 4. To do this, we first reduce the dimensionality of the NN activations to two dimensions with t-distributed Stochastic Neighbor Embedding (t-SNE) [29] and then cluster the activations with k-means. By activations we mean the output that each neuron in the NN calculates, which is obtained by applying a non-linear function to the weighted sum of that neuron’s inputs. It is worth highlighting that we cluster based on the activations instead of the data itself, to gain insights into how similarly or dissimilarly the NN treats different groups of applicants. Since an NN’s activation patterns are correlated with the decisions it makes, clustering activations can reveal how the NN treats different applications. For instance, if women with high income are grouped with men with low income based on the NN’s activation pattern, we can expect these two groups to have similar loan acceptance rates and hence investigate potential bias of the NN.
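The clustering step can be sketched as follows. This is an illustrative plain-Python k-means, not CEB’s implementation: it assumes the hidden activations have already been reduced to two dimensions (for example by a t-SNE library), and the points, the value of k, and the random initialization are hypothetical.

```python
import random

def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def nearest(point, centroids):
    """Index of the centroid closest to the given point."""
    return min(range(len(centroids)),
               key=lambda i: squared_distance(point, centroids[i]))

def mean_point(points):
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

def kmeans(points, k, iterations=50, seed=0):
    """Lloyd's algorithm: alternate centroid updates and reassignment."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)          # initialize from the data
    labels = [nearest(p, centroids) for p in points]
    for _ in range(iterations):
        new_centroids = []
        for i in range(k):
            members = [p for p, lab in zip(points, labels) if lab == i]
            # Keep the old centroid if a cluster happens to be empty.
            new_centroids.append(mean_point(members) if members else centroids[i])
        new_labels = [nearest(p, new_centroids) for p in points]
        centroids = new_centroids
        if new_labels == labels:               # assignments stable: converged
            break
        labels = new_labels
    return labels, centroids

# Hypothetical 2-D embeddings of hidden activations (one point per application).
embedded = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2),
            (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
labels, centroids = kmeans(embedded, k=2)
print(labels, centroids)
```

Because the points are embeddings of activations rather than of raw features, two applications land in the same cluster when the NN responds to them similarly, even if their feature values differ.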
The NN assigns each application a score on a scale of 0-100%. A score of 50% or higher means the loan application is accepted. Due to the low interpretability of NNs, we used counterfactual examples to help non-experts compare how similar applications may be treated by the algorithm. In particular, we explored counterfactual examples created by changing one feature of an existing datapoint in the dataset. This allows users to ask questions such as “If the same application were from someone of a different race or gender, would the NN make the same decision about the loan?” For our early prototype, we used binary gender as our focus. A user can compare clusters of applicants with their equivalent counterfactual examples where the genders of applicants are “flipped.”
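A counterfactual example of this kind can be sketched in a few lines. The code below is a hedged illustration only: the position of the gender feature (`GENDER_INDEX`), the 0/1 encoding, and the deliberately gender-sensitive `toy_score` function are assumptions standing in for CEB’s trained NN.

```python
import math

GENDER_INDEX = 0  # assumption: gender is the first feature, encoded as 0.0 or 1.0

def flip_gender(application, index=GENDER_INDEX):
    """Return a copy of the application with only the binary gender flipped."""
    flipped = list(application)
    flipped[index] = 1.0 - flipped[index]
    return flipped

def toy_score(application):
    """Hypothetical stand-in for the NN: a logistic score that depends on gender."""
    gender, income = application[0], application[1]
    return 1.0 / (1.0 + math.exp(-(4.0 * income - 2.0 + 1.0 * gender)))

def compare_with_counterfactual(application, score_fn, threshold=0.5):
    """Score an application and its gender-flipped counterfactual."""
    original = score_fn(application)
    counterfactual = score_fn(flip_gender(application))
    return {
        "original_score": original,
        "flipped_score": counterfactual,
        "score_change": counterfactual - original,
        # A score at or above the threshold counts as an accepted application.
        "decision_changed": (original >= threshold) != (counterfactual >= threshold),
    }

applicant = [1.0, 0.4]  # hypothetical: gender code 1.0, scaled income 0.4
result = compare_with_counterfactual(applicant, toy_score)
print(result)
```

If the model ignored gender, `score_change` would be zero for every application; a systematic non-zero change across a cluster is the signal that the visualization is meant to surface.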
Using CEB: Users are guided through four views of CEB: Total, Groups, Compare, and Single. Total (Fig. 4(a)) shows users a summary of all datapoints in the original dataset and their gender breakdown. We break down gender since this is the focus of the counterfactual example in this prototype of CEB. Groups (Fig. 4(b)) visualizes these datapoints splitting into clusters that the NN considers similar. Users can hover over the clusters to see a summary of their prototypical datapoints and the cluster’s average score (Fig. 1). The clusters’ y-coordinates correspond to the average score of all applications in the cluster. Compare (Fig. 4(c)) presents the counterfactual example that flips the gender feature of all datapoints in a cluster. This side-by-side view shows the original dataset clusters (seen in the Groups view) and the flipped dataset clusters. Users can compare the clusters’ average scores and see whether flipping the gender feature impacts the score or the cluster size. Finally, in Single (Fig. 4(d)), users can click on an original cluster and see which clusters its datapoints move to after the gender feature is flipped. To highlight the movement, we animate the datapoints to show how they move from the original to the new clusters (Fig. 3). Users can hover over the original and flipped clusters to read their descriptions, or hover over the arrows to see datapoint path scores (Fig. 2): how many datapoints moved, their genders, and their average score.
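The path scores shown in the Single view can be thought of as a grouped summary over (original cluster, flipped cluster) pairs. The sketch below is one possible way to compute such a summary, under stated assumptions: the record fields, the cluster names, and all score values are hypothetical, and reporting both the original and the flipped average is an illustrative choice rather than a description of CEB’s exact output.

```python
from collections import defaultdict

def path_scores(records):
    """Summarize datapoints by the path from original to flipped cluster."""
    paths = defaultdict(list)
    for record in records:
        key = (record["original_cluster"], record["flipped_cluster"])
        paths[key].append(record)

    summary = {}
    for path, members in paths.items():
        genders = defaultdict(int)
        for record in members:
            genders[record["gender"]] += 1   # gender before flipping
        n = len(members)
        summary[path] = {
            "count": n,
            "genders": dict(genders),
            "avg_original_score": sum(r["original_score"] for r in members) / n,
            "avg_flipped_score": sum(r["flipped_score"] for r in members) / n,
        }
    return summary

# Hypothetical datapoints; scores are on a 0-1 scale.
records = [
    {"gender": "male", "original_cluster": "Purple", "flipped_cluster": "Pink",
     "original_score": 0.80, "flipped_score": 0.60},
    {"gender": "male", "original_cluster": "Purple", "flipped_cluster": "Pink",
     "original_score": 0.70, "flipped_score": 0.50},
    {"gender": "female", "original_cluster": "Purple", "flipped_cluster": "Purple",
     "original_score": 0.40, "flipped_score": 0.45},
]
summary = path_scores(records)
for path, stats in summary.items():
    print(path, stats)
```

In this made-up data, the two men who move from the Purple to the Pink cluster score lower once their gender is flipped, which is the kind of per-path difference a user could cite as evidence of bias.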
We invited experts from related domains to interact with a prototype of CEB and to determine whether the NN was biased. At the beginning of the interview, we gave experts the false information that there were two versions of the prototype, assigned at random: one that was biased and one that was not. This was done to avoid influencing the experts so that they could make their own judgment about whether there was bias in the NN. Each session was conducted separately and began with a pre-session survey gathering expertise, data literacy, and demographics. Experts were allowed to explore the tool for a maximum of 20 minutes while following the think-aloud protocol. After experts were satisfied with their conclusion (whether the NN was biased or not), they were directed to a post-session survey and semi-structured interview asking whether they believed their version of CEB’s NN was biased and to provide evidence. Experts were encouraged to go back to the tool to refer to their evidence when speaking about it. Each session was also recorded and transcribed. To analyze the data, researchers reviewed the transcripts and survey data.
A breakdown of our experts is shown in Tab. 2. We included two AI/ML experts in order to verify the scientific accuracy of our tool. The remaining experts we interviewed belong to our target user group: non-AI experts (non-experts for short). Overall, CEB was well received by the experts. Experts commented on how this tool would help users get a quick intuition about whether bias was present. “It is a good visualization... it helps create intuitions in your head. Now you actually want to test those intuitions, right?” [E4] Experts also commented on wanting more tools embedded in the visualization to analyze what other features may influence the NN’s score. All experts were able to identify bias through the counterfactual example. This identification was made easier by the abstraction of datapoints into clusters. However, some experts were confused about whether the datapoints were clustered based on activations or on features.
Table 2: Expert reference numbers, position, and expertise.
Counterfactual Example: Identifying bias through the comparison of the original and flipped clusters was facilitated by the y-coordinates of the clusters. All experts were able to isolate the change in the clusters’ scores and conclude that bias was present. Experts commented that the design choice to see NN scores go up or down on this axis was intuitive and provided jumping-off points to build a hypothesis for further exploration. Experts who skipped through the first views and quickly went to the Single view reported a better mental model of the redistribution of the datapoints from the original to the flipped clusters [E3, E5, E6] than those who spent more time on the first views [E1, E2, E4]. These latter experts were confused about whether the datapoints stayed in their original clusters with their feature flipped or the datapoints flipped and moved to different clusters. Seeing the Single view’s animation of the original datapoints being redistributed into the counterfactual clusters assisted experts in this understanding. Experts who spent more time on the first views, which lack the animation of datapoints, were unclear on what differences the counterfactual example presented. The confusion was resolved for all experts in the Single view.
After using the y-axis to hypothesize bias, experts would rely on the clusters’ scores from the NN as more concrete evidence of bias. Second to this, experts relied on the path scores (Fig. 2) to isolate the specific score changes for men and women in these clusters. Unfortunately, the path scores were not noticed by all experts immediately [E2] or ever [E6], since the scores only appeared when hovering over the arrows between clusters in the Single view. The experts who did find them relied heavily on these scores as evidence of bias as well. The counterfactual clusters showed the averaged score of reclustered datapoints. E4 commented that the path scores allowed users to see more specific scores of the datapoints being reclustered to isolate bias. For example, the path scores allow a user to see a group of men being flipped to women and then see that their score was higher as men than as women.
Abstraction: Experts found abstracting individual datapoints into activation clusters necessary for exploring and comparing the number of datapoints. However, since clustering on features is a common approach, some experts were confused about how the clusters were formed. Experts with an AI background [E4, E5] were more likely to identify that the clusters were based on the NN’s activations. Experts without this expertise took longer to identify how these clusters were formed. E1 desired more explanation of why these were the most prominent clusters and wanted more context on how they were made.
This issue was exacerbated by the natural language descriptions highlighting the cluster’s average datapoint (Fig. 1). This natural language was an important handle for experts to refer to the clusters when comparing them. However, since the descriptions refer to features, they strengthened the confusion about whether clustering was based on activations or features. It is unclear whether non-experts without this exposure to automated clustering would experience the same confusion. E1 suggested adding more explanation of why these features were selected, to demonstrate their impact on the cluster formation, if any.
Other Comments: Experts enjoyed the design and UX of CEB and felt non-AI-experts would find it engaging and not overwhelming. Experts felt building tools such as this was crucial and "highly necessary...to work on explainability of neural networks and also to make tools for understanding bias." [E5] E5 commented that the UX felt like a guided exploration mimicking working with data. Experts provided several other comments on CEB as well. E4 pointed out that CEB does not inform users where the bias comes from, for example, whether it comes from the dataset or the NN model. A majority of experts [E1, E4, E5, E6] requested control over which feature the counterfactual example changes, in order to explore other biases.
Overall, we found experts believed our tool, which uses abstraction and counterfactual examples, was a feasible approach to assist non-experts in detecting biased algorithms. The next step in our iterative design process is to revise CEB’s design and evaluate it with non-experts. To strengthen CEB, we will clarify 1) how clusters are formed, and 2) how datapoints are flipped and redistributed. First, to clarify clusters, CEB can further emphasize that the clusters are based on which datapoints the NN “sees” as being similar (activation). Tools visualizing activations are typically in the image classification domain and leverage this visual component to convey similarity [8,24] (e.g., users can see an image of a cat looking similar to a small dog). Our semantic domain does not have the same advantage of being inherently visual. CEB can instead rely on metaphors of the NN “believing” datapoints are similar to help overcome this. The features can be presented as what the NN was trained with to come to these beliefs. Animation can be leveraged to demonstrate grouping datapoints based on this “belief” when introducing the clusters. Second, to clarify datapoint distribution, the animation seen in the Single view can be used in the Compare view to build the counterfactual clusters and show how the clusters are formed. Lastly, we believe that the clusters can include more visual information to help users understand the characteristics of the datapoints they contain. For example, similar to Ma et al. [23], the clusters’ design can embed pie or radar charts to show a high-level distribution of datapoints’ values across the features. This approach could also help lessen the negative impact of the natural language descriptions.
In summary, we presented our approach for helping non-AI-experts discover biases in NNs through our tool CEB. It attempts to do so through counterfactual examples and abstraction by clustering the NN’s activations. Current limitations are that it presents only one biased dataset, only one feature change, and a relatively small dataset. Future versions of CEB will present both biased and non-biased datasets. Important future work also includes studying how to guide non-experts’ exploration of large datasets with more features.
[1] Ashraf Abdul, Jo Vermeulen, Danding Wang, Brian Y. Lim, and Mohan Kankanhalli. 2018. Trends and Trajectories for Explainable, Accountable and Intelligible Systems. (2018), 1–18.
[2] Amina Adadi and Mohammed Berrada. 2018. Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI). IEEE Access 6 (2018), 52138–52160.
[3] Mohsan Alvi, Andrew Zisserman, and Christoffer Nellåker. 2019. Turning a blind eye: Explicit removal of biases and variation from deep neural network embeddings. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 11129 LNCS (2019), 556–572.
[4] Saleema Amershi, Maya Cakmak, William Bradley Knox, and Todd Kulesza. 2014. Power to the People: The Role of Humans in Interactive Machine Learning. AI Magazine 35, 4 (12 2014), 105.
[5] Alexander Amini, Ava Soleimany, Wilko Schwarting, Sangeeta Bhatia, and Daniela Rus. 2019. Uncovering and Mitigating Algorithmic Bias through Learned Latent Structure. AAAI (2019).
[6] Tolga Bolukbasi, Kai-Wei Chang, James Zou, Venkatesh Saligrama, and Adam Kalai. 2016. Debiasing Word Embedding. 30th Conference on Neural Information Processing Systems NIPS 2016 (2016), 1–9.
[7] Carrie J Cai, Jonas Jongejan, and Jess Holbrook. 2019. The effects of example-based explanations in a machine learning interface. In Proceedings of the 24th International Conference on Intelligent User Interfaces - IUI ’19. 258–262.
[8] Shan Carter, Zan Armstrong, Ludwig Schubert, Ian Johnson, and Chris Olah. 2019. Exploring Neural Networks with Activation Atlases. (2019).
[9] Hao-Fei Cheng, Ruotong Wang, Zheng Zhang, Fiona O’Connell, Terrance Gray, F. Maxwell Harper, and Haiyi Zhu. 2019. Explaining Decision-Making Algorithms through UI: Strategies to Help Non-Expert Stakeholders. Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems - CHI ’19 (2019), 1–12.
[10] Abhijit Das, Antitza Dantcheva, and Francois Bremond. 2019. Mitigating bias in gender, age and ethnicity classification: A multi-task convolution neural network approach. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 11129 LNCS (2019), 573–585.
[11] Graham Dove, Kim Halskov, Jodi Forlizzi, and John Zimmerman. 2017. UX Design Innovation: Challenges for Working with Machine Learning as a Design Material. CHI ’17 Proceedings of the 2017 annual conference on Human factors in computing systems (2017), 278–288.
[12] Julia Dressel and Hany Farid. 2018. The accuracy, fairness, and limits of predicting recidivism. Science Advances 4, 1 (2018), 1–6.
[13] Rebecca Fiebrink, Perry R. Cook, and Dan Trueman. 2011. Human model evaluation in interactive supervised learning. (2011), 147.
[14] Nikhil Garg, Londa Schiebinger, Dan Jurafsky, and James Zou. 2018. Word embeddings quantify 100 years of gender and ethnic stereotypes. Proceedings of the National Academy of Sciences of the United States of America 115, 16 (2018), E3635–E3644.
[15] Bryce Goodman and Seth Flaxman. 2016. European Union regulations on algorithmic decision-making and a "right to explanation". (2016), 1–9.
[16] Tom Hitron, Yoav Orlev, Iddo Wald, Ariel Shamir, Hadas Erel, and Oren Zuckerman. 2019. Can Children Understand Machine Learning Concepts? The Effect of Uncovering Black Boxes. Proceedings of ACM Conference on Human Factors in Computing Systems - CHI ’19 (2019).
[17] Ayanna Howard, Cha Zhang, and Eric Horvitz. 2017. Addressing bias in machine learning algorithms: A pilot study on emotion recognition for intelligent systems. Proceedings of IEEE Workshop on Advanced Robotics and its Social Impacts, ARSO (2017).
[18] CY Johnson. 2019. Racial bias in a medical algorithm favors white patients over sicker black patients. (2019).
[19] Byungju Kim, Hyunwoo Kim, Kyungsu Kim, Sungjin Kim, and Junmo Kim. 2018. Learning Not to Learn: Training Deep Neural Networks with Biased Data. (2018), 9012–9020.
[20] Matt Kusner, Joshua Loftus, Chris Russell, and Ricardo Silva. 2017. Counterfactual fairness. Advances in Neural Information Processing Systems 2017-Decem, Nips (2017), 4067–4077.
[21] Hongsen Liao, Yingcai Wu, Li Chen, and Wei Chen. 2018. Cluster-Based Visual Abstraction for Multivariate Scatterplots. IEEE Transactions on Visualization and Computer Graphics 24, 9 (2018), 2531–2545.
[22] Yuxin Ma, Anthony K.H. Tung, Wei Wang, Xiang Gao, Zhigeng Pan, and Wei Chen. 2018. ScatterNet: A Deep Subjective Similarity Model for Visual Analysis of Scatterplots. IEEE Transactions on Visualization and Computer Graphics 14, 8 (2018), 1–14.
[23] Yuxin Ma, Tiankai Xie, Jundong Li, and Ross Maciejewski. 2019. Explaining Vulnerabilities to Adversarial Machine Learning through Visual Analytics. IEEE Transactions on Visualization and Computer Graphics (2019), 1–1.
[24] Chris Olah, Arvind Satyanarayan, Ian Johnson, Shan Carter, Ludwig Schubert, Katherine Ye, and Alexander Mordvintsev. 2018. The Building Blocks of Interpretability. Distill 3, 3 (3 2018).
[25] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. 2016. "Why Should I Trust You?": Explaining the Predictions of Any Classifier. (2016).
[26] Mireia Ribera and Agata Lapedriza. 2019. Can we do better explanations? A proposal of user-centered explainable AI. CEUR Workshop Proceedings 2327 (2019).
[27] Kacper Sokol and Peter A Flach. 2019. Counterfactual Explanations of Machine Learning Predictions: Opportunities and Challenges for AI Safety. In SafeAI@AAAI.
[28] Sarah Tan, Rich Caruana, Giles Hooker, and Yin Lou. 2017. Distill-and-Compare: Auditing Black-Box Models Using Transparent Model Distillation. 1 (2017).
[29] Laurens Van Der Maaten and Geoffrey Hinton. 2008. Visualizing Data using t-SNE. Technical Report. 2579–2605 pages.
[30] Sandra Wachter, Brent Mittelstadt, and Chris Russell. 2017. Counterfactual Explanations Without Opening the Black Box: Automated Decisions and the GDPR. SSRN (2017), 1–52.
[31] Jiachi Xie, Chelsea M. Myers, and Jichen Zhu. 2019. Interactive Visualizer to Facilitate Game Designers in Understanding Machine Learning. In Extended Abstracts of the 2019 CHI Conference on Human Factors in Computing Systems - CHI EA ’19. ACM Press, New York, New York, USA, 1–6.
[32] Qian Yang. 2017. The Role of Design in Creating Machine-Learning-Enhanced User Experience. The AAAI 2017 Spring Symposium on Designing the User Experience of Machine Learning Systems Technical Report SS-17-04 March (2017), 406–411.
[33] Qian Yang. 2018. Machine Learning as a UX Design Material: How Can We Imagine Beyond Automation, Recommenders, and Reminders? 2018 AAAI Spring Symposium Series March (2018).
[34] Qian Yang, Jina Suh, Nan-Chen Chen, and Gonzalo Ramos. 2018. Grounding Interactive Machine Learning Tool Design in How Non-Experts Actually Build Models. Proceedings of the 2018 on Designing Interactive Systems Conference 2018 - DIS ’18 March (2018), 573–584.
[35] Jieyu Zhao, Tianlu Wang, Mark Yatskar, Vicente Ordonez, and Kai Wei Chang. 2017. Men also like shopping: Reducing gender bias amplification using corpus-level constraints. EMNLP 2017 - Conference on Empirical Methods in Natural Language Processing, Proceedings (2017), 2979–2989.
[36] Jichen Zhu, Antonios Liapis, Sebastian Risi, Rafael Bidarra, and G. Michael Youngblood. 2018. Explainable AI for Designers: A Human-Centered Perspective on Mixed-Initiative Co-Creation. IEEE Conference on Computational Intelligence and Games, CIG 2018-Augus (2018).