Do Your Algorithms Need a Performance Review?

In today’s digital organizations, HR departments are increasingly using algorithms to aid in their decision-making: predicting who is a retention risk, who is ready for a promotion, and whom to hire. For the employees and candidates subjected to these decisions, these are important, even life-changing, events, and so we would expect the people making them to be closely supervised and held to a set of known performance criteria. Does anyone supervise the algorithms in the same way?

Algorithms don’t monitor themselves. Replacing a portion of your recruiting team with AI doesn’t obviate the need to manage the performance of that AI in the same way you would have managed the performance of the recruiter. To ensure that the decisions of an AI-enhanced HR function are fair, accurate, and right for the business, organizations must establish performance criteria for algorithms and a process to review them periodically.

A recent special report in The Economist illustrates the significant extent to which AI is already changing the way HR works. The report covers eight major companies that are now using algorithms in human resource management, which they either developed internally or bought from a growing field of vendors for use cases including recruiting, internal mobility, retention risk, and pay equity. These practices are increasingly mainstream; 2018 may mark the year of transition between “early adopters” and “early majority” in the life cycle of this technology.

At this point, it is essential that leaders ask themselves whether their organizations have management practices in place to supervise the decisions of these algorithms. The Economist concludes its piece with a reminder about transparency, supervision, and bias, noting that companies “will need to ensure that algorithms are being constantly monitored,” particularly when it comes to the prevention of bias.

The concept of algorithmic bias is neither new nor especially counterintuitive, so HR leaders need to anticipate it and consider how to mitigate it in any algorithmic processes they adopt. Cathy O’Neil, the author of Weapons of Math Destruction, and Gideon Mann, head of data science at Bloomberg, stressed this point in a Harvard Business Review article back in 2016, when O’Neil’s book was published:

While there are good algorithms that have been properly calibrated to efficiently and accurately measure results, such success doesn’t happen by accident. We need to audit and modify algorithms so that they do not perpetuate inequities in businesses and society. Consider assigning a team or hiring outside professionals to audit key algorithms.

The idea of an algorithm audit is critically important as we enter a more AI-driven age. Someone needs to supervise the ongoing performance of the algorithm. This need, however, raises two important questions: Who has the authority to implement this monitoring, and who has the expertise to do it?

Furthermore, monitoring the performance of algorithms requires a measure of transparency on the part of the companies that use them. Currently, the two greatest barriers to this transparency are “it’s complicated” and “it’s proprietary.”

In an interview with The Guardian at the time, O’Neil encouraged non-technical leaders and laypeople whose lives are affected by algorithms not to accept “it’s complicated” as an answer:

[S]ometimes it’s hard for non-statisticians to know which questions to ask. O’Neil’s advice is to be persistent. “People should feel more entitled to push back and ask for evidence, but they seem to fold a little too quickly when they’re told that it’s complicated,” she says.

To understand how an algorithm is affecting their lives and work, non-data scientists may need to enlist a specialized team or outside expert to examine exactly what an algorithm is doing at the mathematical level.

That’s where the second barrier—“it’s proprietary”—arises. While vendors are required to test their algorithms against a set of statutory requirements, generally they do not offer any testing or performance monitoring beyond the legal minimum, nor do they allow customers to view the source code. This leaves companies blind on questions such as:

  • Is the algorithm still performing as well as when it was implemented X months ago, or has its performance degraded?
  • Is the algorithm biased in areas that are not explicitly tested for?
  • Could adherence to the algorithm be shifting the composition of our workforce in a way that will create other issues?
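None of these questions requires access to a vendor’s source code; they can be approximated from an organization’s own decision logs. As a rough illustration (in Python, with hypothetical field names standing in for whatever your HR system actually records), the sketch below computes per-group selection rates, the four-fifths-rule ratio between them, and overall agreement with eventual outcomes, figures that can be recomputed each quarter and compared against the baseline captured at rollout.

```python
# A minimal monitoring sketch, not a production audit tool. It assumes you can
# log, for each scored person, the algorithm's recommendation, the eventual
# outcome, and a protected-class label; the field names here are hypothetical.
from collections import defaultdict

def selection_rates(records, group_field="gender", selected_field="recommended"):
    """Selection rate per group: the share of each group the algorithm recommends."""
    counts = defaultdict(lambda: [0, 0])  # group -> [selected, total]
    for r in records:
        counts[r[group_field]][1] += 1
        if r[selected_field]:
            counts[r[group_field]][0] += 1
    return {g: sel / total for g, (sel, total) in counts.items() if total}

def adverse_impact_ratio(rates):
    """Lowest selection rate divided by the highest; values below 0.8
    (the four-fifths rule of thumb) are a common red flag worth investigating."""
    return min(rates.values()) / max(rates.values())

def agreement_with_outcomes(records, selected_field="recommended", outcome_field="succeeded"):
    """Share of recommendations that matched what eventually happened."""
    hits = sum(1 for r in records if bool(r[selected_field]) == bool(r[outcome_field]))
    return hits / len(records)

# Example: this quarter's scored candidates, compared against the numbers at rollout.
this_quarter = [
    {"gender": "F", "recommended": True,  "succeeded": True},
    {"gender": "F", "recommended": False, "succeeded": True},
    {"gender": "M", "recommended": True,  "succeeded": True},
    {"gender": "M", "recommended": True,  "succeeded": False},
]
rates = selection_rates(this_quarter)
print(rates, adverse_impact_ratio(rates), agreement_with_outcomes(this_quarter))
```

Run on a schedule, even a crude report like this turns “is it still working?” from a rhetorical question into a number someone is accountable for.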

A final challenge in monitoring algorithms stems from the mathematics used to build them. While some algorithms can be written out on a sheet of paper or in lines of code for a trained eye to examine, others cannot. Deep neural networks are beyond examination even for those who control the source code. In a recent article in Fast Company, psychology professor Thomas T. Hills made several useful comparisons between deep neural networks and the human brain:

Their problems … are embedded in the way that they represent information. That representation is an ever-changing high-dimensional space, much like walking around in a dream. Solving problems there requires nothing less than a psychotherapist for algorithms.

Algorithms also make mistakes because they pick up on features of the environment that are correlated with outcomes, even when there is no causal relationship between them. In the algorithmic world, this is called overfitting. When this happens in a brain, we call it superstition.

This algorithmic death spiral is hidden in nesting dolls of black boxes: black-box algorithms that hide their processing in high-dimensional thoughts that we can’t access are further hidden in black boxes of proprietary ownership.
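The overfitting Hills describes is easy to reproduce. The hedged sketch below (synthetic data, with scikit-learn assumed to be available) trains a simple classifier on a feature that merely happens to track past outcomes, the algorithmic equivalent of “our best hires all came from the same two schools,” and then shows its accuracy dropping on a new cohort where the coincidence no longer holds.

```python
# A toy illustration of the "superstition" Hills describes: the model leans on a
# feature that correlates with past outcomes but has no causal connection to them,
# then stumbles when that correlation disappears. All data here is synthetic.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 2000

# True driver of the outcome (stand-in for an actual performance signal).
skill = rng.normal(size=n)
outcome = (skill + 0.5 * rng.normal(size=n) > 0).astype(int)

# Spurious feature: in the historical data it happens to track the outcome,
# contributing no causal signal of its own.
spurious_past = outcome + 0.3 * rng.normal(size=n)

model = LogisticRegression().fit(np.column_stack([skill, spurious_past]), outcome)

# New cohort: the coincidence is gone and the spurious feature is just noise.
skill_new = rng.normal(size=n)
outcome_new = (skill_new + 0.5 * rng.normal(size=n) > 0).astype(int)
spurious_new = rng.normal(size=n)

print("accuracy on past data:", model.score(np.column_stack([skill, spurious_past]), outcome))
print("accuracy on new cohort:", model.score(np.column_stack([skill_new, spurious_new]), outcome_new))
```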

The inscrutable processes and proprietary ownership of these complex algorithms, which Hills describes as “nesting dolls of black boxes,” have motivated lawmakers in places like New York City to legislate accountability for algorithmic decision-making and discrimination. These demands for accountability are encouraging, and come as the result of ongoing efforts by data scientists to educate corporate leaders and governments, as well as each other, on the risks of algorithmic bias.

In the meantime, organizations embedding AI and algorithms into their HR processes need to ask themselves and their data teams some practical questions about their nature and purpose. O’Neil discussed the most critical of these questions in an interview with Wired last month, in which she advocated for a data science analogue to the Hippocratic Oath:

The first question is, are the algorithms that we deploy going to improve the human processes that they are replacing? Far too often we have algorithms that are thrown in with the assumptions that they’re going to work perfectly, because after all they’re algorithms, but they actually end up working much worse than the system that they’re replacing. …

The second question is to ask, for whom is the algorithm failing? We need to be asking, “Does it fail more often for women than for men? Does it fail more often for minorities than for whites? Does it fail more often for old people than for young people?” Every single class should get a question and an answer. … The third category of question is simply, is this working for society?
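O’Neil’s second question is directly measurable wherever decision logs exist. A minimal sketch, with invented field names and data used purely for illustration, might report the failure rate separately for each class, so that, as she puts it, every single class gets a question and an answer:

```python
# A hedged sketch: break the error rate down by every protected class you track,
# rather than reporting one overall number. Field names and records are hypothetical.
from collections import defaultdict

def failure_rates_by_group(records, group_field,
                           predicted_field="recommended", outcome_field="succeeded"):
    """Share of wrong predictions within each group."""
    errors = defaultdict(lambda: [0, 0])  # group -> [wrong, total]
    for r in records:
        wrong = bool(r[predicted_field]) != bool(r[outcome_field])
        errors[r[group_field]][0] += int(wrong)
        errors[r[group_field]][1] += 1
    return {g: wrong / total for g, (wrong, total) in errors.items()}

records = [
    {"gender": "F", "age_band": "50+", "recommended": False, "succeeded": True},
    {"gender": "F", "age_band": "<50", "recommended": True,  "succeeded": True},
    {"gender": "M", "age_band": "<50", "recommended": True,  "succeeded": True},
    {"gender": "M", "age_band": "50+", "recommended": True,  "succeeded": False},
]

# One breakdown per protected class the organization tracks.
for field in ("gender", "age_band"):
    print(field, failure_rates_by_group(records, field))
```

Whether an error rate that is twice as high for one group is acceptable is not a data science question; it is exactly the kind of judgment a performance review of an algorithm exists to force.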