Measuring Algorithmic Fairness

Article — Volume 106, Issue 4

106 Va. L. Rev. 811
*D. Lurton Massee, Jr. Professor of Law and Roy L. and Rosamond Woodruff Morgan Professor of Law at the University of Virginia School of Law. I would like to thank Charles Barzun, Aloni Cohen, Aziz Huq, Kim Ferzan, Niko Kolodny, Sandy Mayson, Tom Nachbar, Richard Schragger, Andrew Selbst, and the participants in the Caltech 10th Workshop in Decisions, Games, and Logic: Ethics, Statistics, and Fair AI, the Dartmouth Law and Philosophy Workshop, and the computer science department at UVA for comments and critique. In addition, I would like to thank Kristin Glover of the University of Virginia Law Library and Judy Baho for their excellent research assistance. Any errors or confusions are my own.

Algorithmic decision making is both increasingly common and increasingly controversial. Critics worry that algorithmic tools are not transparent, accountable, or fair. Assessing the fairness of these tools has been especially fraught as it requires that we agree about what fairness is and what it requires. Unfortunately, we do not. The technological literature is now littered with a multitude of measures, each purporting to assess fairness along some dimension. Two types of measures stand out. According to one, algorithmic fairness requires that the score an algorithm produces should be equally accurate for members of legally protected groups—blacks and whites, for example. According to the other, algorithmic fairness requires that the algorithm produce the same percentage of false positives or false negatives for each of the groups at issue. Unfortunately, there is often no way to achieve parity in both these dimensions. This fact has led to a pressing question. Which type of measure should we prioritize and why?

This Article makes three contributions to the debate about how best to measure algorithmic fairness: one conceptual, one normative, and one legal. Equal predictive accuracy ensures that a score means the same thing for each group at issue. As such, it relates to what one ought to believe about a scored individual. Because questions of fairness usually relate to action, not belief, this measure is ill-suited as a measure of fairness. This is the Article’s conceptual contribution. Second, this Article argues that parity in the ratio of false positives to false negatives is a normatively significant measure. While a lack of parity in this dimension is not constitutive of unfairness, this measure provides important reasons to suspect that unfairness exists. This is the Article’s normative contribution. Interestingly, improving the accuracy of algorithms overall will lessen this unfairness. Unfortunately, a common assumption that anti-discrimination law prohibits the use of racial and other protected classifications in all contexts is inhibiting those who design algorithms from making them as fair and accurate as possible. This Article’s third contribution is to show that the law poses less of a barrier than many assume.

Introduction

At an event celebrating Martin Luther King, Jr. Day, Representative Alexandria Ocasio-Cortez (D-NY) expressed the concern, shared by many, that algorithmic decision making is biased. “Algorithms are still made by human beings, and those algorithms are still pegged to basic human assumptions,” she asserted. “They’re just automated. And if you don’t fix the bias, then you are automating the bias.”1 The audience inside the room applauded. Outside the room, the reaction was more mixed. “Socialist Rep. Alexandria Ocasio-Cortez . . . claims that algorithms, which are driven by math, are racist,” tweeted a writer for the Daily Wire.2 Math is just math, this commentator contends, and the idea that math can be unfair is crazy.

This controversy is just one of many to challenge the fairness of algorithmic decision making.3 The use of algorithms, and in particular their connection with machine learning and artificial intelligence, has attracted significant attention in the legal literature as well. The issues raised are varied, and include concerns about transparency,4 accountability,5 privacy,6 and fairness.7 This Article focuses on fairness, the issue raised by Ocasio-Cortez. It focuses on how we should assess what makes algorithmic decision making fair. Fairness is a moral concept, and a contested one at that. As a result, we should expect that different people will offer well-reasoned arguments for different conceptions of fairness. And this is precisely what we find.

The computer science literature is filled with a proliferation of measures, each purporting to capture fairness along some dimension. This Article provides a pathway through that morass. It makes three contributions: one conceptual, one normative, and one legal. This Article argues that one of the dominant measures of fairness offered in the literature tells us what to believe, not what to do, and thus is ill-suited as a measure of fair treatment. This is the conceptual claim. Second, this Article argues that the ratio between false positives and false negatives offers an important indicator of whether members of two groups scored by an algorithm are treated fairly, vis-à-vis each other. This is the normative claim. Third, this Article challenges a common assumption that anti-discrimination law prohibits the use of racial and other protected classifications in all contexts. Because using race within algorithms can increase both their accuracy and fairness, this misunderstanding has important implications. This Article’s third contribution is to show that the law poses less of a barrier than many assume.

We can use the controversy over a common risk assessment tool used by many states for bail, sentencing, and parole to illustrate the debate about how best to measure fairness.8 The tool, called COMPAS, assigns each person a score that indicates the likelihood that the person will commit a crime in the future.9 In a high-profile exposé, the website ProPublica claimed that COMPAS treated blacks and whites differently because black arrestees and inmates were far more likely to be erroneously classified as risky than were white arrestees and inmates, despite the fact that COMPAS did not explicitly use race in its algorithm.10 The essence of ProPublica’s claim was this:

In forecasting who would re-offend, the algorithm made mistakes with black and white defendants at roughly the same rate but in very different ways. The formula was particularly likely to falsely flag black defendants as future criminals, wrongly labeling them this way at almost twice the rate as white defendants. White defendants were mislabeled as low risk more often than black defendants.11

Northpointe12 (the company that developed and owned COMPAS) responded to the criticism by arguing that ProPublica was focused on the wrong measure. In essence, Northpointe stressed the point ProPublica conceded: that COMPAS made mistakes with black and white defendants at roughly equal rates.13 Although Northpointe and others challenged the accuracy of some of ProPublica’s analysis,14 the main thrust of Northpointe’s defense was that COMPAS does treat blacks and whites the same. The controversy focused on the manner in which such similarity is assessed. Northpointe focused on the fact that if a black person and a white person were each given a particular score, the two people would be equally likely to recidivate.15 ProPublica looked at the question from a different angle. Rather than asking whether a black person and a white person with the same score were equally likely to recidivate, it focused instead on whether a black person and a white person who did not go on to recidivate were equally likely to have received a low score from the algorithm.16 In other words, one measure begins with the score and asks about its ability to predict reality. The other measure begins with reality and asks about its likelihood of being captured by the score.
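The two vantage points can be made concrete with a small confusion-matrix calculation. The sketch below is illustrative only: the counts, the group labels, and the function names are invented for this example and are not drawn from the COMPAS data. It computes one measure that starts from the score (positive predictive value, the kind of parity Northpointe emphasized) and one that starts from the outcome (the false positive rate among people who did not recidivate, the kind of disparity ProPublica emphasized).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Outcome counts for one group. All figures below are hypothetical."""
    tp: int  # scored high risk, did recidivate
    fp: int  # scored high risk, did not recidivate
    fn: int  # scored low risk, did recidivate
    tn: int  # scored low risk, did not recidivate


def positive_predictive_value(c: Confusion) -> float:
    # Begins with the score: of those labeled high risk, what share recidivated?
    return c.tp / (c.tp + c.fp)


def false_positive_rate(c: Confusion) -> float:
    # Begins with reality: of those who did not recidivate,
    # what share was nonetheless labeled high risk?
    return c.fp / (c.fp + c.tn)


# Two invented groups of 200 people each, with different base rates.
group_a = Confusion(tp=60, fp=40, fn=20, tn=80)
group_b = Confusion(tp=30, fp=20, fn=10, tn=140)

for name, group in (("group_a", group_a), ("group_b", group_b)):
    print(name,
          "PPV:", round(positive_predictive_value(group), 3),
          "FPR:", round(false_positive_rate(group), 3))
```

With these invented counts, a high score means the same thing in both groups (a positive predictive value of 0.6), yet people who do not recidivate are flagged at a rate of about 0.33 in the first group and 0.125 in the second.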

The easiest way to fix the problem would be to treat the two groups equally in both respects. A high score and low score should mean the same thing for both blacks and whites (the measure Northpointe emphasized), and law-abiding blacks and whites should be equally likely to be mischaracterized by the tool (the measure ProPublica emphasized). Unfortunately, this solution has proven impossible to achieve. In a series of influential papers, computer scientists demonstrated that, in most circumstances, it is simply not possible to equalize both measures.17 The reason it is impossible relates to the fact that the underlying rates of recidivism among blacks and whites differ.18 When the two groups at issue (whatever they are) have different rates of the trait predicted by the algorithm, it is impossible to achieve parity between the groups in both dimensions.19 The example discussed in Part I illustrates this phenomenon.20 This fact gives rise to the question: in which dimension is such parity more important and why?
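A short calculation suggests why differing base rates force this trade-off. Counting true and false positives within a single group yields an identity, stated in the Chouldechova paper cited above, that ties the false positive rate to the base rate, the positive predictive value, and the false negative rate. The sketch below encodes that identity; the function name and the numbers are assumptions chosen for illustration, not figures from any real tool.

```python
def implied_false_positive_rate(base_rate: float, ppv: float, fnr: float) -> float:
    """False positive rate forced by a group's base rate once PPV and FNR are fixed.

    Derivation by counting within a group of N people:
      true positives  TP = (1 - fnr) * base_rate * N
      false positives FP = TP * (1 - ppv) / ppv
      FPR = FP / ((1 - base_rate) * N)
    """
    return base_rate / (1 - base_rate) * (1 - ppv) / ppv * (1 - fnr)


# Hypothetical groups with identical PPV and FNR but different base rates.
fpr_higher_base = implied_false_positive_rate(base_rate=0.4, ppv=0.6, fnr=0.25)
fpr_lower_base = implied_false_positive_rate(base_rate=0.2, ppv=0.6, fnr=0.25)

# A tool that makes no mistakes (PPV of 1, FNR of 0) escapes the trade-off.
fpr_perfect = implied_false_positive_rate(base_rate=0.4, ppv=1.0, fnr=0.0)

print(round(fpr_higher_base, 3), round(fpr_lower_base, 3), fpr_perfect)
```

Holding the positive predictive value and the false negative rate equal, the group with the higher base rate ends up with the higher false positive rate (about 0.33 versus 0.125 here). Only when the tool makes no mistakes does the implied false positive rate fall to zero regardless of the base rate.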

These different measures are often described as different conceptions of fairness.21 This is a mistake. The measure favored by Northpointe is relevant to what we ought to believe about a particular scored individual. If a high-risk score means something different for blacks than for whites, then we do not know whether to believe (or how much confidence to have) in the claim that a particular scored individual is likely to commit a crime in the future. The measure favored by ProPublica relates instead to what we ought to do. If law-abiding blacks and law-abiding whites are not equally likely to be mischaracterized by the score, we will not know whether or how to use the scores in making decisions. If we are comparing a measure that is relevant to what we ought to believe to one that is relevant to what we ought to do, we are truly comparing apples to oranges.

This conclusion does not straightforwardly suggest that we should instead focus on the measure touted by ProPublica, however. Our understanding of the significance of these measures is evolving quickly. Some computer scientists now argue that the lack of parity in the ProPublica measure is less meaningful than one might think.22 The better way to understand the measure highlighted by ProPublica would be to say that it suggests that something is likely amiss. Differences in the ratio of false positive rates to false negative rates indicate that the algorithmic tool may rely on data that are themselves infected with bias or that the algorithm may be compounding a prior injustice. Because these possibilities have normative implications for how the algorithm should be used, this measure relates to fairness.
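The indicator described here can be computed directly from a group's outcome counts. The following is a minimal sketch under stated assumptions: the counts are invented, the function name is hypothetical, and the ratio is taken between error rates (false positive rate over false negative rate), which is one reading of the measure; Part II develops the Article's own formulation.

```python
def error_rate_ratio(tp: int, fp: int, fn: int, tn: int) -> float:
    """Ratio of the false positive rate to the false negative rate for one group."""
    false_positive_rate = fp / (fp + tn)  # share of non-recidivists flagged high risk
    false_negative_rate = fn / (fn + tp)  # share of recidivists scored low risk
    if false_negative_rate == 0:
        raise ValueError("ratio is undefined when there are no false negatives")
    return false_positive_rate / false_negative_rate


# Hypothetical counts for two groups scored by the same tool.
ratio_a = error_rate_ratio(tp=60, fp=40, fn=20, tn=80)
ratio_b = error_rate_ratio(tp=30, fp=20, fn=10, tn=140)

print(round(ratio_a, 3), round(ratio_b, 3))
```

In this invented example the first group's errors tilt toward false positives (a ratio of about 1.33) while the second group's tilt toward false negatives (a ratio of 0.5). On the argument above, a gap of this kind does not itself establish unfairness; it is a reason to look more closely at the data and at how the tool is used.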

The most promising way to enhance algorithmic fairness is to improve the accuracy of the algorithm overall.23 And we can do that by permitting the use of protected traits (like race and sex) within the algorithm to determine what other traits will be used to predict the target variable (like recidivism). For example, housing instability might be more predictive of recidivism for whites than for blacks.24 If the algorithm includes a racial classification, it can segment its analysis such that this trait is used to predict recidivism for whites but not for blacks. Although this approach would improve risk assessment and thereby lessen the inequity highlighted by ProPublica, many in the field believe this approach is off the table because it is prohibited by law.25 This is not the case.
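What it means for an algorithm to segment its analysis can be sketched in a few lines. The example below is purely hypothetical: the feature names, the weights, and the neutral group labels are invented for illustration and do not describe COMPAS or any real instrument. It shows a score in which the group classification selects which weights apply, so that a feature counts toward the score for one group and is ignored for the other.

```python
# Hypothetical per-group weights: how much each feature moves the risk score.
# In this invented setup, housing instability is treated as predictive for
# group_a only, so its weight for group_b is zero.
GROUP_WEIGHTS = {
    "group_a": {"prior_arrests": 0.10, "housing_instability": 0.15},
    "group_b": {"prior_arrests": 0.10, "housing_instability": 0.00},
}


def risk_score(group: str, features: dict) -> float:
    """Weighted sum of features, using the weights selected by group membership."""
    weights = GROUP_WEIGHTS[group]
    return sum(weights[name] * value for name, value in features.items())


# The same feature values produce different scores depending on the group,
# because the classification decides which features are used.
person = {"prior_arrests": 2, "housing_instability": 1}
score_a = risk_score("group_a", person)
score_b = risk_score("group_b", person)

print(round(score_a, 2), round(score_b, 2))
```

The classification here never adds to or subtracts from a score on its own; it only determines which other traits inform the prediction, which is the kind of use the paragraph above describes.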

The use of racial classifications only sometimes constitutes disparate treatment on the basis of race and thus only sometimes gives rise to strict scrutiny. The fact that some uses of racial classifications do not constitute disparate treatment reveals that the concept of disparate treatment is more elusive than is often recognized. This observation is important given the central role that the distinction between disparate treatment and disparate impact plays in equal protection doctrine and statutory anti-discrimination law. In addition, it is important because it opens the door to more creative ways to improve algorithmic fairness.

The Article proceeds as follows. Part I develops the conceptual claim. It shows that the two most prominent types of measures used to assess algorithmic fairness are geared to different tasks. One is relevant to belief and the other to decision and action. This Part begins with a detailed explanation of the two measures and then explores the factors that affect belief and action in individual cases. Turning to the comparative context, Part I argues that predictive parity (the measure favored by Northpointe) is relevant to belief but not directly to the fair treatment of different groups.

Part II makes a normative claim. It argues that differences in the ratio of false positives to false negatives between protected groups (a variation on the measure put forward by ProPublica) suggest unfairness, and it explains why this is so. This Part begins by clarifying three distinct ways in which the concept of fairness is used in the literature. It then explains both the normative appeal of focusing on the parity in the ratio of false positives to false negatives and, at the same time, why doing so can be misleading. Despite these drawbacks, Part II argues that the disparity in the ratio of false positive to false negative rates tells us something important about the fairness of the algorithm.

Part III explores what can be done to diminish this unfairness. It argues that using protected classifications like race and sex within algorithms can improve their accuracy and fairness. Because constitutional anti-discrimination law generally disfavors racial classifications, computer scientists and others who work with algorithms are reluctant to deploy this approach. Part III argues that this reluctance rests on an overly simplistic view of the law. Focusing on constitutional law and on racial classification in particular, this Part argues that the doctrine’s resistance to the use of racial classifications is not categorical. Part III explores contexts in which the use of racial classifications does not constitute disparate treatment on the basis of race and extracts two principles from these examples. Using these principles, this Part argues that the use of protected classifications within algorithms may well be permissible. A conclusion follows.

  1. Blackout for Human Rights, MLK Now 2019, Riverside Church in the City of N.Y. (Jan. 21, 2019), https://www.trcnyc.org/mlknow2019/ [https://perma.cc/L45Q-SN9T] (interview with Rep. Ocasio-Cortez begins at approximately minute 16, and comments regarding algorithms begin at approximately minute 40); see also Danny Li, AOC Is Right: Algorithms Will Always Be Biased as Long as There’s Systemic Racism in This Country, Slate (Feb. 1, 2019, 3:47 PM), https://slate.com/news-and-politics/2019/02/aoc-algorithms-racist-bias.html [https://perma.cc/S97Z-UH2U] (quoting Ocasio-Cortez’s comments at the event in New York); Cat Zakrzewski, The Technology 202: Alexandria Ocasio-Cortez Is Using Her Social Media Clout To Tackle Bias in Algorithms, Wash. Post: PowerPost (Jan. 28, 2019), https://www.washingtonpost.com/news/powerpost/paloma/the-technology-202/2019/01/28/the-technology-202-alexandria-ocasio-cortez-is-using-her-social-media-clout-to-tackle-bias-in-algorithms/5c4dfa9b1b326b29c3778cdd/?utm_term=.541cd0827a23 [https://perma.cc/LL4Y-FWDK] (discussing Ocasio-Cortez’s comments and reactions to them).
  2. Ryan Saavedra (@RealSaavedra), Twitter (Jan. 22, 2019, 12:27 AM), https://twitter.com/RealSaavedra/status/1087627739861897216 [https://perma.cc/32DD-QK5S]. The coverage of Ocasio-Cortez’s comments is mixed. See, e.g., Zakrzewski, supra note 1 (describing conservatives’ criticism of and other media outlets’ and experts’ support of Ocasio-Cortez’s comments).
  3. See, e.g., Hiawatha Bray, The Software That Runs Our Lives Can Be Biased—But We Can Fix It, Bos. Globe, Dec. 22, 2017, at B9 (describing a New York City Council member’s proposal to audit the city government’s computer decision systems for bias); Drew Harwell, Amazon’s Facial-Recognition Software Has Fraught Accuracy Rate, Study Finds, Wash. Post, Jan. 26, 2019, at A14 (reporting on an M.I.T. Media Lab study that found that Amazon facial-recognition software is less accurate with regard to darker-skinned women than lighter-skinned men, and Amazon’s criticism of the study); Tracy Jan, Mortgage Algorithms Found To Have Racial Bias, Wash. Post, Nov. 15, 2018, at A21 (reporting on a University of California at Berkeley study that found that black and Latino home loan customers pay higher interest rates than white or Asian customers on loans processed online or in person); Tony Romm & Craig Timberg, Under Bipartisan Fire from Congress, CEO Insists Google Does Not Take Sides, Wash. Post, Dec. 12, 2018, at A16 (reporting on Congresspeople’s concerns regarding Google algorithms which were voiced at a House Judiciary Committee hearing with Google’s CEO).
  4. See, e.g., Danielle Keats Citron, Technological Due Process, 85 Wash. U. L. Rev. 1249, 1288–97 (2008); Natalie Ram, Innovating Criminal Justice, 112 Nw. U. L. Rev. 659 (2018); Rebecca Wexler, Life, Liberty, and Trade Secrets: Intellectual Property in the Criminal Justice System, 70 Stan. L. Rev. 1343 (2018).
  5. See, e.g., Margot E. Kaminski, Binary Governance: Lessons from the GDPR’s Approach to Algorithmic Accountability, 92 S. Cal. L. Rev. 1529 (2019); Joshua A. Kroll et al., Accountable Algorithms, 165 U. Pa. L. Rev. 633 (2017); Anne L. Washington, How To Argue with an Algorithm: Lessons from the COMPAS-ProPublica Debate, 17 Colo. Tech. L.J. 131 (2018) (arguing for standards governing the information available about algorithms so that their accuracy and fairness can be properly assessed). But see Jon Kleinberg et al., Discrimination in the Age of Algorithms (Nat’l Bureau of Econ. Research, Working Paper No. 25548, 2019), http://www.nber.org/papers/w25548 [https://perma.cc/JU6H-HG3W] (analyzing the potential benefits of algorithms as tools to prove discrimination).
  6. See generally Frank Pasquale, The Black Box Society: The Secret Algorithms That Control Money and Information (2015) (discussing and critiquing internet and finance companies’ non-transparent use of data tracking and algorithms to influence and manage people); Anupam Chander, The Racist Algorithm?, 115 Mich. L. Rev. 1023, 1024 (2017) (reviewing Frank Pasquale, The Black Box Society: The Secret Algorithms That Control Money and Information (2015)) (arguing that instead of “transparency in the design of the algorithm” that Pasquale argues for, “[w]hat we need . . . is a transparency of inputs and results”) (emphasis omitted).
  7. See, e.g., Aziz Z. Huq, Racial Equity in Algorithmic Criminal Justice, 68 Duke L.J. 1043 (2019) (arguing that current constitutional doctrine is ill-suited to the task of evaluating algorithmic fairness and that current standards offered in the technology literature miss important policy concerns); Sandra G. Mayson, Bias In, Bias Out, 128 Yale L.J. 2218 (2019) (discussing how past and existing racial inequalities in crime and arrests mean that methods to predict criminal risk based on existing information will result in racial inequality).
  8. See Julia Angwin et al., Machine Bias, ProPublica (May 23, 2016), https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing [https://perma.cc/BA53-JT7V].
  9. Equivant, Practitioner’s Guide to COMPAS Core 7 (2019), http://www.equivant.com/wp-content/uploads/Practitioners-Guide-to-COMPAS-Core-040419.pdf [https://perma.cc/LRY6-RXAH].
  10. See Angwin et al., supra note 8 (“Northpointe’s core product is a set of scores derived from 137 questions that are either answered by defendants or pulled from criminal records. Race is not one of the questions.”).
  11. Id.
  12. Northpointe, along with CourtView Justice Solutions Inc. and Constellation Justice Systems, rebranded to Equivant in January 2017. Equivant, Frequently Asked Questions 1, http://my.courtview.com/rs/322-KWH-233/images/Equivant%20Customer%20FAQ%20-%20FINAL.pdf [https://perma.cc/7HH8-LVQ6].
  13. See William Dieterich et al., COMPAS Risk Scales: Demonstrating Accuracy Equity and Predictive Parity, Northpointe 9–10 (July 8, 2016), http://go.volarisgroup.com/rs/430-MBX-989/images/ProPublica_Commentary_Final_070616.pdf [https://perma.cc/N5RL-M9RN].
  14. For a critique of ProPublica’s analysis, see Anthony W. Flores et al., False Positives, False Negatives, and False Analyses: A Rejoinder to “Machine Bias: There’s Software Used Across the Country To Predict Future Criminals. And It’s Biased Against Blacks.”, 80 Fed. Prob. 38 (2016).
  15. See Dieterich et al., supra note 13, at 9–11.
  16. See Angwin et al., supra note 8 (“In forecasting who would re-offend, the algorithm made mistakes with black and white defendants at roughly the same rate but in very different ways.”).
  17. See, e.g., Richard Berk et al., Fairness in Criminal Justice Risk Assessments: The State of the Art, Soc. Methods & Res. OnlineFirst 1, 23 (2018), https://journals.sagepub.com/doi/10.1177/0049124118782533 [https://perma.cc/GG9L-9AEU] (discussing the required tradeoff between predictive accuracy and various fairness measures); Alexandra Chouldechova, Fair Prediction with Disparate Impact: A Study of Bias in Recidivism Prediction Instruments, 5 Big Data 153, 157 (2017) (demonstrating that recidivism prediction instruments cannot simultaneously meet all fairness criteria where recidivism rates differ across groups because its error rates will be unbalanced across the groups when the instrument achieves predictive parity); Jon Kleinberg et al., Inherent Trade-Offs in the Fair Determination of Risk Scores, 67 LIPIcs 43:1, 43:5–8 (2017), https://drops.dagstuhl.de/opus/volltexte/2017/8156/pdf/LIPIcs-ITCS-2017-43.pdf [https://perma.cc/S9DM-PER2] (demonstrating how difficult it is for algorithms to simultaneously achieve the fairness goals of calibration and balance in predictions involving different groups).
  18. See Bureau of Justice Statistics, U.S. Dep’t of Justice, 2018 Update on Prisoner Recidivism: A 9-Year Follow-up Period (2005–2014) 6 tbl.3 (2018), https://www.bjs.gov/content/pub/pdf/18upr9yfup0514.pdf [https://perma.cc/3UE3-AS5S] (analyzing rearrests of state prisoners released in 2005 in 30 states and finding that 86.9% of black prisoners and 80.9% of white prisoners were arrested in the nine years following their release); see also Dieterich et al., supra note 13, at 6 (“[I]n comparison with blacks, whites have much lower base rates of general recidivism . . . .”). Of course, the data on recidivism itself may be flawed. This consideration is discussed below. See infra text accompanying notes 33–37.
  19. This is true unless the tool makes no mistakes at all. Kleinberg et al., supra note 17, at 43:5–6.
  20. See infra Section I.A.
  21. For example, Berk et al. consider six different measures of algorithmic fairness. See Berk et al., supra note 17, at 12–15.
  22. See Sam Corbett-Davies & Sharad Goel, The Measure and Mismeasure of Fairness: A Critical Review of Fair Machine Learning (arXiv, Working Paper No. 1808.00023v2, 2018), http://arxiv.org/abs/1808.00023 [https://perma.cc/ML4Y-EY6S].
  23. See Sumegha Garg et al., Tracking and Improving Information in the Service of Fairness (arXiv, Working Paper No. 1904.09942v2, 2019), http://arxiv.org/abs/1904.09942 [https://perma.cc/D8ZN-CJ83].
  24. See Sam Corbett-Davies et al., Algorithmic Decision Making and the Cost of Fairness, 2017 Proc. 23d ACM SIGKDD Int’l Conf. on Knowledge Discovery and Data Mining 797, 805.
  25. See id. (“[E]xplicitly including race as an input feature raises legal and policy complications, and as such it is common to simply exclude features with differential predictive power.”).

By Deborah Hellman