A number of methods for removing bias from datasets have been devised; however, they generally fall short of removing non-linear, non-binary and/or multivariate biases. To address the problems of the current methods, this paper introduces the novel concept of Fair Adversarial Networks [FANs]. Like Generative Adversarial Networks [GANs] [9], this method consists of two networks with different objectives, trained iteratively. FANs are a system that creates an unbiased version of a dataset that can subsequently be used with any analytical tools. There are two main components to this system: 1) an autoencoder [16] that provides y, a reconstruction of the data x given the autoencoder weights, and 2) a Racist Network that provides an estimate r̂ of the true protected characteristic r̄ (race in this example) from y.
Figure 1: Architecture of a Fair Adversarial Network. Two networks are trained iteratively: the Racist Network minimizes its error in predicting race (or another protected characteristic of interest), while the autoencoder jointly minimizes its reconstruction error and the predictive capability of the Racist Network.
The measure of bias we wish to minimize is the true performance of the Racist Network after full training, D̄ (measured by, for example, the cross-entropy function D), given the autoencoder weights at a given epoch. The problem with this approach is the complexity of such a bias measure. The measure of bias is required at each epoch on which the autoencoder is trained, but it is not easy to obtain. Unlike the discriminator accuracy in GANs, the only way to find the bias of the encoded data is to fully train the Racist Network, that is, to measure the true potential of the data to reveal the protected characteristic. Without full training, there is no guarantee that a failure of the Racist Network to predict the protected characteristic (detect bias) is not simply due to recent changes in the encoding to which the Racist Network has not yet adapted. Therefore, to find the bias of the data exactly, we would have to perform lengthy full training of the Racist Network at each epoch of the autoencoder training, which clearly makes the algorithm impractical. Furthermore, a guarantee of optimality for the network hyperparameters would be needed.
We have developed a method for approximating the fully-trained performance from a single forward pass through the network, D̂, which is the core of the autoencoder loss function specified by equation 1. Further developments of this measure that are required include a general mechanism to stabilise the adversarial training and a number of regularisers R that ensure the quality of the final data encoding. While these developments cannot be shared publicly, as they are core to the intellectual property of illumr Ltd., standard approaches are enough to replicate the debiasing process on a one-off basis given sufficient hyperparameter tuning.
Formally, the autoencoder minimizes a loss function (equation 1) that combines MSE(y, x), the reconstruction error (Mean Square Error) of the autoencoder, with the estimate D̂ of the Racist Network's performance, evaluated at the current Racist Network weights; c is a constant balancing the individual terms of the loss function. The Racist Network optimizes an appropriate loss function such as D, the cross-entropy of the two vectors r̄ and r̂. As mentioned before, this cost function (D) is different from the estimate of the performance of the Racist Network that is minimized by the autoencoder (D̂). This is because D is a bad approximation of the performance after full training, D̄.
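The two objectives described above can be written compactly. The following is a sketch consistent with the surrounding definitions rather than the authors' exact notation. The weight symbols \theta_A and \theta_R and the loss names L_A and L_R are assumptions introduced here. The regularisers R are shown as a single additive term, which is also an assumption. \hat{D} is treated as an estimate of predictive capability that the autoencoder wants to be small, whereas D is the cross-entropy that the Racist Network itself minimizes.

```latex
\begin{align}
L_A(\theta_A) &= \mathrm{MSE}(y, x) + c \,\hat{D}(\bar{r}, \hat{r} \mid \theta_R) + R, \\
L_R(\theta_R) &= D(\bar{r}, \hat{r}).
\end{align}
```

Under this reading, a larger c trades reconstruction quality for stronger removal of the protected characteristic.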
This system of two networks is trained in an iterative adversarial fashion similar to GANs, with the two networks updated in turn.
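A minimal sketch of such an alternating scheme is given below, using only the Python standard library. It is not the authors' algorithm: the approximation D̂ is not public, so the plain cross-entropy D of the adversary is used as a stand-in (which the paper itself considers a poor approximation), and the autoencoder is reduced to a per-feature scaling. The toy data, the names `w`, `v`, `b`, `encode`, `predict`, and the values of `c` and `lr` are all assumptions made for illustration.

```python
import math
import random

random.seed(0)

# Toy data: a binary protected characteristic r leaks into feature 0 only.
n = 200
r = [random.randint(0, 1) for _ in range(n)]
x = [(ri + random.gauss(0, 0.3), random.gauss(0, 1)) for ri in r]

w = [1.0, 1.0]           # toy "autoencoder": y_j = w_j * x_j
v, b = [0.0, 0.0], 0.0   # logistic adversary: r_hat = sigmoid(v . y + b)
c, lr = 1.0, 0.05        # c balances reconstruction against the adversary term


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def encode(xi):
    return (w[0] * xi[0], w[1] * xi[1])


def predict(y):
    return sigmoid(v[0] * y[0] + v[1] * y[1] + b)


for epoch in range(300):
    # Step 1: the adversary takes one gradient step on its cross-entropy D.
    gv, gb = [0.0, 0.0], 0.0
    for xi, ri in zip(x, r):
        y = encode(xi)
        err = predict(y) - ri              # dD/dlogit for this sample
        gv[0] += err * y[0] / n
        gv[1] += err * y[1] / n
        gb += err / n
    v = [v[j] - lr * gv[j] for j in range(2)]
    b -= lr * gb

    # Step 2: the autoencoder takes one step on MSE(y, x) - c * D
    # (stand-in objective: reconstruct well while hurting the adversary).
    gw = [0.0, 0.0]
    for xi, ri in zip(x, r):
        y = encode(xi)
        err = predict(y) - ri
        for j in range(2):
            d_mse = (y[j] - xi[j]) * xi[j] / n   # gradient of mean over 2n entries
            d_ce = err * v[j] * xi[j] / n        # gradient of D through y_j
            gw[j] += d_mse - c * d_ce
    w = [w[j] - lr * gw[j] for j in range(2)]

# Reconstruction error of the final encoding.
mse = sum((w[j] * xi[j] - xi[j]) ** 2 for xi in x for j in range(2)) / (2 * n)
```

Each epoch performs one adversary update followed by one autoencoder update. A real implementation would replace both toy models with neural networks and the stand-in term with a better estimate of fully-trained performance.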
If this training procedure succeeds, the end result is a dataset that is as similar to the original as possible, while the undesirable protected characteristic is harder, or impossible, to detect in it. However, GANs are notoriously hard to train [13].
1.1 Convergence
Adversarial training often leads to very unstable or even run-away behaviour [13]. Here we demonstrate acceptable convergence of our algorithm on five real-world datasets. While the convergence may still seem fairly sub-optimal, we argue that it is sufficiently good for our purpose. Crucially, we implement a ratchet mechanism that always preserves the best (lowest-scoring) state of the network seen so far; run-away behaviour after a period of convergence is therefore not particularly problematic.
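A ratchet of this kind can be sketched in a few lines. The class below is an illustrative assumption, not the authors' implementation: `Ratchet`, `update`, and the monitored `score` are hypothetical names, and the quantity actually tracked in the paper is not recoverable from the text.

```python
import copy


class Ratchet:
    """Keep a copy of the lowest-scoring state seen so far."""

    def __init__(self):
        self.best_score = float("inf")
        self.best_state = None

    def update(self, score, state):
        """Store `state` if `score` improves on the best so far."""
        if score < self.best_score:
            self.best_score = score
            # Deep copy so later training cannot mutate the saved state.
            self.best_state = copy.deepcopy(state)
            return True
        return False


# Made-up score trajectory: convergence followed by run-away behaviour.
ratchet = Ratchet()
history = [0.9, 0.7, 0.6, 0.8, 1.5]
for epoch, score in enumerate(history):
    ratchet.update(score, {"epoch": epoch, "weights": [score]})
```

After the loop, the ratchet still holds the state from the epoch with the lowest score, regardless of what happened afterwards.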
While we optimize the loss functions of the two networks, they are not in themselves of interest for our purposes. Instead, what we truly aim to achieve is to bring the predictive performance of the Racist Network after full training, D̄, down to random. For this task we operationalized D̄ as the best performance of the Racist Network on the validation data from 3 random initializations, each followed by 10,000 epochs of training. The Racist Network used had a single hidden layer of the same width as the dataset. The random benchmark we use is the performance obtained by always picking the most likely category. The other value of importance to us is MSE(x, y), as we aim to scramble the data as little as possible. These two values will therefore be the focus of the analysis of convergence.
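The evaluation protocol described above can be sketched as follows. This is an illustration under assumptions: `majority_baseline` and `best_of_restarts` are hypothetical helper names, `train_and_score` stands for a routine that fully trains a fresh adversary and returns its validation accuracy, and the labels and scores are made-up placeholders rather than results from the paper.

```python
from collections import Counter


def majority_baseline(labels):
    """Accuracy of always predicting the most frequent category."""
    counts = Counter(labels)
    return counts.most_common(1)[0][1] / len(labels)


def best_of_restarts(train_and_score, n_restarts=3):
    """Best validation score over several random initializations.

    `train_and_score(seed)` is a hypothetical callable that trains a fresh
    adversary from the given seed and returns its validation accuracy.
    """
    return max(train_and_score(seed) for seed in range(n_restarts))


# Placeholder validation labels: 5 of the 8 belong to category "a".
validation_labels = ["a", "a", "b", "a", "b", "a", "a", "b"]
baseline = majority_baseline(validation_labels)

# Stand-in scores so the example runs without a real network.
fake_scores = {0: 0.61, 1: 0.66, 2: 0.63}
d_bar = best_of_restarts(lambda seed: fake_scores[seed])

# Debiasing aims to drive this gap towards zero.
gap = d_bar - baseline
```

Comparing against the majority-class accuracy rather than 1/k matters when categories are imbalanced, since a predictor with no information can still reach the majority share.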
Further interesting values to observe include D, the loss function of the Racist Network, but this does not provide a good indication of D̄ and is therefore fundamentally unsuitable as a part of the autoencoder loss. D̂ is also of interest: while noisy, it provides good gradients for training as a part of the autoencoder loss.
Figures 2 to 6 show that D̄ has decreased to random performance and that MSE(x, y) behaves in an orderly manner, approaching the minimum distortion of the data. The actual impact on Data Analytical outcomes will be discussed in another paper that is currently in preparation.
Figure 2: The Absenteeism at Work data includes personal information and total absenteeism time over 3 years for employees at a courier company in Brazil. We have selected age as the undesirable protected characteristic and successfully removed it. Source: UCI Machine Learning Database. Creators: Andrea Martiniano, Ricardo Pinto Ferreira, and Renato Jose Sassi. D̄ values denote proportion of correct predictions of age, while all other values are arbitrarily scaled.
Figure 3: Performance data from schools in New York. The bias removed was a variable called ’Majority Black/Hispanic’. Source: Kaggle. Creators: PASSNYC. D̄ values denote proportion of correct predictions of race, while all other values are arbitrarily scaled.
Figure 4: The Heart Disease Dataset consists of blood measurements from a set of patients with and without heart disease. The bias variable removed was ’Sex’. Source: UCI Machine Learning Database. Creators: Hungarian Institute of Cardiology, University Hospital Zurich, University Hospital Basel, V.A. Medical Center Long Beach and Cleveland Clinic Foundation. D̄ values denote proportion of correct predictions of sex, while all other values are arbitrarily scaled.
Figure 5: The COMPAS Dataset consists of profiles of criminals from Broward County, Florida. The bias variable removed was ’Race’. Source: Kaggle. Creators: ProPublica. D̄ values denote proportion of correct predictions of race, while all other values are arbitrarily scaled.
Figure 6: The Communities and Crime Dataset includes statistics on various communities in the U.S. The bias variable removed was ’Majority Black’, indicating whether the community has a majority black population. Source: UCI Machine Learning Database. Creators: Michael Redmond. D̄ values denote proportion of correct predictions of race, while all other values are arbitrarily scaled.
It is apparent that the outcomes of Data Analytics [DA] and Machine Learning [ML] often perpetuate the human bias present in the datasets and therefore enact illegal discrimination. Constraining the DA/ML outcomes to be fair is problematic: there is no universally accepted definition of fairness, and many of the notions of fairness are very hard to implement, disrupting DA pipelines and putting a significant extra load on DA resources. One apparent solution is to remove bias from the data before proceeding with DA as usual; however, these methods generally cannot account for non-linear, non-binary and/or multivariate relationships between race (or another biasing factor) and the rest of the data [3]. This paper introduced Fair Adversarial Networks [FANs] as a method that compensates for these shortcomings and provides a very significant improvement in both fairness and ease of use.
There is no universally accepted, or even legally binding, notion of fairness that can be used for optimisation, while at the same time many definitions of fairness are mutually exclusive. Unless a definition of fairness is provided by regulatory bodies, it seems unlikely that optimising for parity between groups on a fairness measure can be a useful bias-removal approach. Even if a mathematical notion of fairness is agreed upon, there is no guarantee that optimising for parity on a training set achieves parity at the population level.
Furthermore, the need to optimise for fairness introduces an extra term into the cost function of any optimisation procedure, which is not compatible with current DA tools. Data analysts are currently not required to write their own loss functions or optimisation procedures; including such a requirement would therefore damage their ability to perform their jobs. Even if data analysts become comfortable with this requirement, the huge time overhead of this task makes it unlikely that it will be performed in practice.
Removing information about protected characteristics from the data is an attractive alternative. It is philosophically very simple: without knowledge of membership of a protected group (such as race) it should be impossible to discriminate based on it, which removes the subjective nature of treating bias. It can also be made very simple in practice: a single preprocessing step can remove bias from the data while the rest of the DA/ML pipeline remains exactly the same.
However, many methods of removing bias from data fail. Removing the column containing the protected characteristic is clearly insufficient due to the presence of proxy variables. A number of methods go beyond removing the column containing the protected characteristic and attempt to de-correlate the other characteristics from the protected ones. However, these approaches generally cannot account for non-linear, non-binary, and/or multivariate relationships between the characteristics [3]. To counter these problems this paper has introduced FANs.
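The limitation of linear de-correlation can be illustrated with a toy case in which a feature has almost no linear correlation with the protected characteristic yet still reveals it. The data-generating process, the threshold of 1.5, and all names below are assumptions chosen for illustration and do not come from the datasets used in this paper.

```python
import math
import random

random.seed(0)
n = 2000
r = [random.randint(0, 1) for _ in range(n)]
# The protected group changes the spread of z, not its mean.
z = [random.gauss(0.0, 1.0) * (1 + 2 * ri) for ri in r]


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    va = sum((ai - ma) ** 2 for ai in a)
    vb = sum((bi - mb) ** 2 for bi in b)
    return cov / math.sqrt(va * vb)


# Close to zero, so a linear de-correlation step sees nothing to remove.
linear_corr = pearson(z, r)

# A simple non-linear rule: a large |z| suggests group 1.
pred = [1 if abs(zi) > 1.5 else 0 for zi in z]
accuracy = sum(p == ri for p, ri in zip(pred, r)) / n
```

Here the thresholding rule predicts group membership well above the 50% chance level even though the linear correlation is negligible, which is the kind of relationship an adversary with non-linear capacity can exploit.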
FANs are a version of adversarial networks with two main components: 1) an autoencoder that encodes a fair representation of the data, and 2) a Racist Network, the adversary that predicts the protected characteristics (e.g. race) from the data and whose performance needs to be minimised.
The autoencoder’s cost function consists of the reconstruction error of the transformed data and the performance of the Racist Network, both of which are to be minimised. The Racist Network simply tries to achieve the best predictive performance on the protected characteristic, using the autoencoder’s output as its input. This system produces a data representation that is as similar as possible to the original data, but from which the protected characteristics cannot be predicted (or are at least harder to predict). Any analytical method can subsequently be used with such a representation.
This paper describes the principles of FANs and demonstrates on five real-world, disparate datasets that FANs can indeed achieve their goal of removing the ability to predict the protected characteristic from the data, while minimising the difference between the original data and its fair reconstruction.
The problematic part of training FANs is that we aim to remove the possibility of predicting the protected characteristic from the reconstructed data. The possibility of predicting implies full training, not just the current state of the adversarial process. The success of our algorithm across five disparate datasets crucially relies on our approximation of the full-training performance of a neural network from a single forward pass. While this approximation will remain our trade secret, it is possible to replicate our success on a one-off basis using conventional approaches and heavy parameter tuning.
This paper has demonstrated that FANs can consistently succeed at removing bias from datasets while keeping the necessary alterations of the data to the minimum. FANs are particularly valuable because they can be used as a generic and easy to use data pre-processing step, allowing all Data Analysts to account for biases in their datasets without significant overheads.
Limitations and Future Directions
It is necessary to mention that under special circumstances FANs have the potential to make things worse for discriminated groups. FANs will remove all kinds of discrimination, including positive discrimination, which might be a desirable way of breaking up vicious cycles of deprivation in some areas.
Optimising for two metrics at the same time is a compromise. The present paper has not attempted to analyse the residual discrimination which, while statistically insignificant, is likely still present. On the other hand, statistical insignificance can be seen as the criterion for success. Either way, it is apparent that FANs provide a step in the right direction with respect to increasing fairness.
Lastly, it is unclear what the correct time to stop the training of Neural Networks is, and what the right hyper-parameters are. It is certain that neither our neural architecture nor our hyper-parameter choice is optimal. Especially with GANs, one wishes to have interactive, optimal control of hyper-parameters throughout training to stabilize the process and ensure convergence. Therefore we are now exploring Reinforcement Learning as a method to control these factors interactively throughout training.
[1] Solon Barocas and Andrew D Selbst. Big data’s disparate impact. Cal. L. Rev., 104:671, 2016.
[2] Richard Berk, Hoda Heidari, Shahin Jabbari, Michael Kearns, and Aaron Roth. Fairness in criminal justice risk assessments: the state of the art. arXiv preprint arXiv:1703.09207, 2017.
[3] Tolga Bolukbasi, Kai-Wei Chang, James Y Zou, Venkatesh Saligrama, and Adam T Kalai. Man is to computer programmer as woman is to homemaker? debiasing word embeddings. In Advances in Neural Information Processing Systems, pages 4349–4357, 2016.
[4] Aylin Caliskan, Joanna J Bryson, and Arvind Narayanan. Semantics derived automatically from language corpora contain human-like biases. Science, 356(6334):183–186, 2017.
[5] Alexandra Chouldechova. Fair prediction with disparate impact: A study of bias in recidivism prediction instruments. Big data, 5(2):153–163, 2017.
[6] Equal Employment Opportunity Commission et al. Uniform guidelines on employee selection procedures. Fed Register, 1:216–243, 1990.
[7] Virginia Eubanks. Automating inequality: How high-tech tools profile, police, and punish the poor. St. Martin’s Press, 2018.
[8] Michael Feldman, Sorelle A Friedler, John Moeller, Carlos Scheidegger, and Suresh Venkatasubramanian. Certifying and removing disparate impact. In Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 259–268. ACM, 2015.
[9] Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets. In Advances in neural information processing systems, pages 2672–2680, 2014.
[10] Sara Hajian, Josep Domingo-Ferrer, and Antoni Martinez-Balleste. Discrimination prevention in data mining for intrusion and crime detection. In Computational Intelligence in Cyber Security (CICS), 2011 IEEE Symposium on, pages 47–54. IEEE, 2011.
[11] Moritz Hardt, Eric Price, Nati Srebro, et al. Equality of opportunity in supervised learning. In Advances in neural information processing systems, pages 3315–3323, 2016.
[12] Toshihiro Kamishima, Shotaro Akaho, Hideki Asoh, and Jun Sakuma. Fairness-aware classifier with prejudice remover regularizer. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pages 35–50. Springer, 2012.
[13] Naveen Kodali, Jacob Abernethy, James Hays, and Zsolt Kira. On convergence and stability of gans. arXiv preprint arXiv:1705.07215, 2017.
[14] Cathy O’Neil. Weapons of math destruction: How big data increases inequality and threatens democracy. Broadway Books, 2016.
[15] Dino Pedreshi, Salvatore Ruggieri, and Franco Turini. Discrimination-aware data mining. In Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 560–568. ACM, 2008.
[16] Salah Rifai, Pascal Vincent, Xavier Muller, Xavier Glorot, and Yoshua Bengio. Contractive auto-encoders: Explicit invariance during feature extraction. In Proceedings of the 28th International Conference on International Conference on Machine Learning, pages 833–840. Omnipress, 2011.
[17] Margery Austin Turner. Mortgage lending discrimination: A review of existing evidence. 1999.
[18] Rich Zemel, Yu Wu, Kevin Swersky, Toni Pitassi, and Cynthia Dwork. Learning fair representations. In International Conference on Machine Learning, pages 325–333, 2013.
[19] James Zou and Londa Schiebinger. Ai can be sexist and racist—it’s time to make it fair, 2018.