No one ever intends to create a biased algorithm, and there are huge downsides to using one. So why do these algorithms keep appearing, and whose fault is it when they do? The simplest explanation for why algorithmic bias keeps happening is that it is legitimately hard to avoid. As for the second question, there is no consensus between algorithm developers and their customers about who is ultimately responsible for quality. In reality, they are both to blame.
Vendors and in-house data science teams have many options for mitigating bias in their algorithms, from reducing cognitive biases, to including more female programmers, to checklists of quality tests to run, to launching AI ethics boards. Unfortunately, they are seldom motivated to take these steps proactively because doing so lengthens their timelines and raises the risk of an adverse finding that can derail a project indefinitely.
At the same time, clients are not asking for more extensive oversight or testing beyond what the developer offers them. The client usually doesn’t know enough about how these algorithms work to ask probing questions that might expose problems. As a result, the vendor doesn’t test or take precautions beyond their own minimum standards, which can vary widely.
In a recent interview with Employee Benefit News, HireVue’s Chief IO Psychologist Nathan Mondragon discussed a situation in which his company built a client an employee selection algorithm that failed adverse impact tests. The bias, Mondragon said, was not created by HireVue’s algorithm, but rather already existed in the company’s historical hiring data, skewing the algorithm’s results. In his description, they told the customer: “There’s no bias in the algorithm, but you have a bias in your hiring decisions, so you need to fix that or … the system will just perpetuate itself.”
In this case, Mondragon is right that responsibility for the bias identified in the adverse impact test began with the client. However, I would argue that vendors who do this work repeatedly for many clients should anticipate this outcome and accept some responsibility for not detecting the bias at the start of the project or mitigating it in the course of algorithm development. Finding out that bias exists in the historical data only at the adverse impact testing phase, typically one of the last steps, is the developer’s fault.
It is good that vendors conduct adverse impact tests that detect gender, age, and racial biases in pre-employment screening and selection algorithms, but buyers also need to recognize that this is the minimum Equal Employment Opportunity testing standard in the United States. Furthermore, under the Uniform Guidelines on Employee Selection Procedures, a selection-rate ratio as low as 80 percent is generally not treated as evidence of adverse impact, so a passing result is not the same as an unbiased algorithm.
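To make the four-fifths rule concrete, here is a minimal sketch of the underlying arithmetic. The function and the applicant numbers are hypothetical, chosen only to show how a meaningful gap in selection rates can still sit exactly at the 80 percent threshold.

```python
def adverse_impact_ratio(selected_a, applicants_a, selected_b, applicants_b):
    """Ratio of the lower group's selection rate to the higher group's.

    Under the four-fifths (80 percent) rule, a ratio below 0.80 is
    generally taken as evidence of adverse impact.
    """
    rate_a = selected_a / applicants_a
    rate_b = selected_b / applicants_b
    low, high = sorted([rate_a, rate_b])
    return low / high

# Hypothetical numbers: 48 of 100 women selected vs. 60 of 100 men.
ratio = adverse_impact_ratio(48, 100, 60, 100)
# ratio == 0.80 -- this result "passes" despite a real 12-point gap
```

The point of the example is the gap between passing and fairness: a 48 percent versus 60 percent selection rate clears the threshold, yet few would call the outcome unbiased.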
Additionally, an algorithm that passes the three checks for gender, age, and racial bias could still be perpetuating historical bias against overweight applicants, or giving an edge to physically attractive candidates who are known to receive higher performance ratings. This is a chicken-and-egg problem because organizations would need to have fields such as “weight” or “attractiveness” in their HR information system or recruiting systems to even be able to run such a test. If the data doesn’t exist, the bias cannot be assessed, much less corrected.
Another problem that could arise from insufficient checks is that the algorithm ignores whole categories of candidates, even when the organization is actively looking for them. This is a problem vendors don’t typically test for or design against proactively, and one clients often don’t think to ask about.
For example, most HRIS software contains a field for veteran status. Many large organizations are intentionally trying to attract and hire more employees from this underserved cohort, such as T-Mobile, which recently announced a plan to hire 10,000 veterans. If an organization with a goal of hiring more veterans uses an algorithm at any point in its sourcing, screening, and selection process, it should ask the developer to run tests specific to veteran status. Algorithms that rely heavily on resume content and interview responses can easily be blind to veterans: a somewhat different issue from being biased.
The idea of a blind algorithm requires some explanation. In most cases where natural language processing is used, words that appear with very low frequency are ignored. One could imagine such a system examining resumes, where less common past positions such as “prosthetist,” “confectioner,” or “translator” are never considered by the computer model, which instead operates on more common terms such as “waiter,” “driver,” “mechanic,” or “nurse.” Through that lens, consider the resume of a long-serving military veteran. It will contain terms such as “squadron,” “lieutenant,” and “commendation,” which are uncommon in the general applicant population and are therefore unlikely to be a part of the model.
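A minimal sketch of this low-frequency filtering, assuming a simple word-count vocabulary builder (the resumes and the cutoff are invented for illustration):

```python
from collections import Counter

def build_vocabulary(resumes, min_count=2):
    """Keep only terms appearing at least `min_count` times in the corpus.

    Rare terms -- such as military vocabulary in a mostly civilian
    applicant pool -- are silently discarded before modeling.
    """
    counts = Counter(word for resume in resumes
                     for word in resume.lower().split())
    return {word for word, n in counts.items() if n >= min_count}

resumes = [
    "waiter then driver then nurse",
    "nurse and mechanic and driver",
    "squadron lieutenant commendation",  # the lone military resume
]
vocab = build_vocabulary(resumes, min_count=2)
# "driver" and "nurse" survive; "squadron", "lieutenant", and
# "commendation" never make it into the model's vocabulary
```

Real NLP pipelines apply the same idea through parameters like a minimum document frequency, and the effect is identical: the model literally cannot see the terms that describe a veteran's experience.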
Imagine further a simplified algorithm where the candidate gets one point each time they mention “team,” “director,” and “award.” The military applicant will score a zero even if they have won awards while directing a team, simply because their vocabulary is different. Blindness to the overall language of the military is a different concern than Mondragon’s example of bias in the historical training data, but it leads to the same result: fewer veterans receive job offers. The same is true for an educational setting, where the lexicon contains terms such as “classroom,” “instructor,” and “principal,” which are not common in other fields.
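The one-point-per-keyword scorer described above can be sketched in a few lines. This is a deliberate toy, not any vendor's actual method; the resume strings are invented for illustration.

```python
# Keywords the hypothetical model rewards, one point per occurrence.
KEYWORDS = {"team", "director", "award"}

def score_resume(text):
    """Count occurrences of each rewarded keyword in the resume text."""
    words = text.lower().split()
    return sum(words.count(keyword) for keyword in KEYWORDS)

civilian = "director of a sales team winner of the regional award"
veteran = "squadron commander led platoon earned commendation medal"

score_resume(civilian)  # 3
score_resume(veteran)   # 0
```

The veteran scores zero despite having commanded a unit and earned a commendation, simply because the model's vocabulary never overlaps with military language.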
When an algorithm develops a bias or blind spot, whether as a result of a design flaw or a pre-existing problem within the organization, anyone involved in the procurement, development, and deployment of that algorithm needs to accept responsibility for preventing or mitigating its effect. The essential first step in addressing bias is knowing—and knowing requires actively seeking the truth.
Business leaders must educate themselves about how AI and machine learning can go astray and be willing to ask the tough questions. Data scientists need to proactively test for bias and follow up on hunches, even if clients don’t ask them to. As Bain partner Chris Brahm put it in a recent Forbes piece on the unintended consequences of AI: “It is on business leaders and those investing in the technology to understand and begin to act on six serious risks,” which he identifies as hidden errors; loss of skill, critical thinking, and understanding; new hazards; institutional bias; loss of empathy; and loss of control.
By no means, however, does that let the data scientists off the hook. Algorithm developers, whether they are acting as vendors, independent contractors, or employees, should “protect the client from relying and making decisions based on bad or uncertain data quality” and “inform the client of all data science results and material facts known to the data scientist,” according to Section 8 of the Data Science Association’s Professional Code of Conduct.
When business leaders or developers encounter an inconvenient truth in an algorithm, they should welcome the discovery: it gives them the opportunity to intervene before the algorithm makes (more) life-altering decisions on the basis of bad judgment. Once these problems become known, the fault rests on the shoulders of any party that is unwilling to act to correct them.