Ridding the judicial system of human subjectivity

Algorithmic sentencing, which uses machine learning to assess recidivism risk, has demonstrated consistent outcomes. But it is not without flaws, and sometimes reflects human biases. Despite these imperfections, I believe algorithms can introduce objectivity and be fine-tuned to reduce bias, making them more reliable than human judgment.

This article was first published in The Mint. You can read the original at this link.


If it is to be completely fair, a legal system must be consistent. For justice to be meted out, a decision handed down by one judge should not differ markedly from that pronounced by another in a case with largely similar facts. In reality, however, such consistency is rare.

To ascertain whether there is such a thing as judicial consistency, 47 district court judges from the state of Virginia, US, were asked to participate in a survey. Each judge was given five hypothetical cases and asked to adjudicate on them. Far from displaying consistency, their decisions could not have been more divergent. In one case, of those who voted guilty, 44% recommended probation, 22% imposed a fine, 17% imposed both probation and a fine, while the rest suggested jail time. If a group of sitting judges, adjudicating on the same set of facts, could come up with such widely disparate results, how can we hope for any measure of consistency when they rule on real cases?

With this in mind, a number of countries have put in place prescriptive systems designed to take human subjectivity out of sentencing. These systems aim to ensure that individuals convicted of the same crime always receive the same sentence. However, by removing judicial discretion, they sometimes fail to give due weight to important mitigating circumstances that help establish whether the person convicted of the offence has any chance of being rehabilitated. It therefore becomes important to find a way to empirically establish the likelihood that a convicted criminal will offend again.

In 1928, Ernest Burgess came up with the concept of unit-weighted regression and applied it to the evaluation of recidivism risk in prison populations. He identified 21 measures and assigned each convict in his sample a score of either zero or one against each of them. When the scores were added up, he predicted that convicts with scores between 14 and 21 had a high chance of parole success, while those with scores of four or less were likely to have a high rate of recidivism. When he tested his prediction against what actually happened, 98% of his low-risk group made it through parole without incident, while 76% of his high-risk group did not.
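
To make the arithmetic concrete, here is a minimal sketch of Burgess-style unit-weighted scoring in Python. The sample answers are hypothetical illustrations; only the 21-factor count and the score bands come from the study described above.

```python
from __future__ import annotations


def burgess_score(answers: list[int]) -> int:
    """Unit-weighted regression: every 0/1 factor counts equally,
    so the risk score is simply the sum of the answers."""
    return sum(answers)


def risk_band(score: int) -> str:
    """Map a total score (out of 21 factors) to the bands described above."""
    if score >= 14:
        return "high chance of parole success"
    if score <= 4:
        return "likely high rate of recidivism"
    return "intermediate"


# Hypothetical convict: 16 of the 21 factors answered favourably.
answers = [1] * 16 + [0] * 5
score = burgess_score(answers)
print(score, "->", risk_band(score))  # 16 -> high chance of parole success
```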

By 1935, the Burgess method was being used in prisons in Illinois, and variants of this mathematical approach began to be used around the world. As computers became more advanced, the algorithms designed to assess recidivism risk were able to take a significantly larger number of factors into consideration. With advances in machine learning, they could spot patterns that humans could not hope to see. Not only did this approach produce consistent results every time the same set of facts was presented; given the vast volumes of data these systems could process, they could also establish recidivism risk far more accurately than any human could hope to.
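
By way of contrast, the sketch below shows what the machine-learning variant might look like: instead of counting every factor equally, a model learns a separate weight for each factor from historical outcomes. The features, the synthetic records and the choice of scikit-learn's logistic regression are illustrative assumptions, not a description of any real risk-assessment tool.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic "historical" records: each row holds 0/1 factors for a past
# convict; each label records whether that person reoffended. Purely
# illustrative data, not drawn from any real system.
X = rng.integers(0, 2, size=(500, 21))
y = rng.integers(0, 2, size=500)

# Unlike the unit-weighted Burgess score, the model learns a separate
# weight for each factor from the historical outcomes.
model = LogisticRegression(max_iter=1000).fit(X, y)

# Risk estimate for a new, hypothetical convict: a probability rather
# than a 0-to-21 sum.
new_case = rng.integers(0, 2, size=(1, 21))
print("estimated recidivism risk:", model.predict_proba(new_case)[0, 1])
```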

That said, algorithmic sentencing is not perfect. When 19-year-old Christopher Brooks was convicted of statutory rape for having consensual sex with a minor, the judge relied on an algorithm for sentencing. This particular algorithm used the age gap between the victim and the accused to evaluate risk: the greater the age gap, the lower the risk. This meant that, had Christopher been 36 years old, the age gap would have been so large that the algorithm would have recommended he serve no jail time. It is outcomes like this that have resulted in a public backlash against the use of algorithms in situations where they could affect life and personal liberty.

The fact of the matter is that algorithms build their models on historical data sets, precedents that are themselves the outcome of decades of choices made by humans who are far from objective. We built algorithms in pursuit of objectivity because we knew that humans were inherently irrational in the decisions they made. However, the solution we created seems to be infected with the very biases we were aiming to eradicate.

Where does this leave us? Do we scrap algorithmic decision-making entirely and go back to relying on our uniquely human sense of justice? Most people will say they are more comfortable having their futures decided by a flesh-and-blood human being than by an inscrutable, soulless algorithm.

However, I am reluctant to go down that path. I find it impossible to ignore the evidence of decades of flawed human decision-making that characterises our judicial system. Our biases run so close to the surface that, based on what we now know, it is clear that human decision-making can never be completely rational. Recent studies have shown that judges with daughters are more likely to issue decisions favourable to women, while judges coming back from a recess are more likely to grant bail and those heading into one are unlikely to do so. The entire science of behavioural economics shows that, as much as we might think we are behaving rationally, more often than not we are just responding to unconscious biases.

Algorithms, while not perfect, can at least help introduce objectivity and eliminate random errors. When we discover flaws in their outcomes, we can fine-tune them to remove the human biases that have crept in. As long as we are mindful of their limitations, algorithms are likely to be more objective than either you or me.