David Gingell
David joined Seal in February 2017 after 25 years of experience in high-tech sales and marketing. His most recent position was as CMO of TeamViewer GmbH, the remote access and control software specialist. Prior to that role, he was VP Marketing for EMEA at major tech brands EMC, Adobe and NetApp over a ten-year period. Gingell was also an early employee of Documentum, which was later acquired by EMC. He was a key player in the formation of the Enterprise Content Management market and helped Documentum reach the number 1 vendor position prior to its acquisition. Early in his career, he held technical and sales roles at Ingres and Oracle. He holds an honors degree from the University of Wales and an MBA from Henley Business School, UK.

Preventing Bias in Machine Learning Models

David Gingell | Nov 08, 2018

It is no secret that the word “bias” has a negative connotation. Although not as strong as “prejudiced,” it carries the same implication: taking a side in favour of or against a particular person or group. We are all biased to some extent, even if we don’t realize it or wish to admit it. We all carry some degree of what is termed “unconscious bias.”

A recent article in The Wall Street Journal reported the case of a senior technology executive who, though open and collaborative by nature, discovered after undergoing company training that he held unconscious biases. Although well-meaning and held in high regard by his peers, he had unknowingly favoured men. He was shocked and resolved to do something about it. As he dug deeper into his professional life, from the contents of his address book to his LinkedIn contacts, the bias towards men became evident. He took steps to redress it by proactively extending his network among women and minorities and seeking out their input online. In that way, he began to rebalance the disparity and reduce the unconscious bias that might otherwise have shown up in his recruiting, his meetings and his business life in general.

The concept of unconscious bias has been particularly newsworthy this past month, following Reuters’ revelation that, a year ago, Amazon abandoned an AI recruiting tool it had developed because of an inbuilt bias. The system, which was running in the eCommerce division, had an unconscious bias against women. Given the tech-driven nature of many roles at Amazon, it is not surprising that most applicants have historically been male. Like many other Machine Learning (ML) systems, the tool looked for patterns in historical data, in this case the many thousands of résumés received over the years. In general, the more data used in training, the more accurate and precise a model becomes. The ML algorithms were coded to look for certain attributes relevant to the open position and to discount others. It appears, then, that the algorithm, replete with training data, was discriminating against female candidates by weeding out résumés with strong female indicators (e.g. playing for a women’s hockey team or being an alumna of an all-girls college). Women were underrepresented in the model because they appeared so rarely among historical hires. Male applicants, who tended to pepper their résumés with strongly male-oriented language, were favored because most past hires were male. This is what is meant by unconscious bias: the model learns from the data it examines, so if that data is skewed one way or another, the model has no way to counter the skew. Unconscious bias has been introduced.
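To make the mechanism concrete, here is a minimal Python sketch. It is not Amazon’s system, whose internals were not published; it is a toy scorer that weights each résumé token by the historical hire rate of résumés containing it. The history, the tokens and the function names `learn_token_weights` and `score` are all hypothetical. Because the invented history is skewed, the token “women's” ends up with a weight of zero, and an otherwise identical résumé scores lower when it contains that token.

```python
from collections import defaultdict

# Hypothetical, deliberately skewed history: (résumé tokens, was the candidate hired?)
HISTORY = [
    ({"python", "captain", "chess"}, True),
    ({"python", "rugby"}, True),
    ({"java", "captain"}, True),
    ({"java", "rugby"}, True),
    ({"python", "women's", "hockey"}, False),
    ({"java", "women's", "captain"}, False),
    ({"python", "chess"}, False),
]


def learn_token_weights(history):
    """Weight each token by the hire rate of past résumés that contained it."""
    seen = defaultdict(int)
    hired = defaultdict(int)
    for tokens, was_hired in history:
        for token in tokens:
            seen[token] += 1
            hired[token] += int(was_hired)
    return {token: hired[token] / seen[token] for token in seen}


def score(resume_tokens, weights, default=0.5):
    """Average the learned token weights; unseen tokens get a neutral default."""
    return sum(weights.get(t, default) for t in resume_tokens) / len(resume_tokens)


weights = learn_token_weights(HISTORY)
print(weights["women's"])  # 0.0: no past hire had this token
print(score({"python", "captain"}, weights))
# Lower score, purely because of the gender-linked proxy token:
print(score({"python", "captain", "women's"}, weights))
```

Nothing in the code mentions gender; the penalty comes entirely from the historical labels, which is exactly the point of the Amazon example.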

This is one of the criticisms of machine learning: a model is only as good as the data it is trained on, and if that data, taken collectively, carries a bias, the AI tool will reproduce and can even amplify it. The old adage “garbage in, garbage out” applies. The training data, and how it is applied, must therefore be evaluated carefully whenever an AI-powered solution is considered.

Seal uses AI to extract key pieces of information from corporate contracts. Seal’s ML models are taught by feeding them training data and tuning them to retrieve only the information requested. The resulting model can then be run against a corpus of many thousands of contracts. But a key part of a successful outcome for a contract analytics model is understanding the training data before training the model.

The following best practices can help reduce bias in your ML models:

Use a Large Dataset

In many situations, bias can be lessened by training the model on a very large dataset. A larger dataset allows more general patterns to surface and gives a more accurate picture of the data as a whole. Often this is enough. In Amazon’s case, however, a large dataset did not solve the problem. The answer lies in really understanding the data, and what you are trying to achieve, before you even think of bringing the data and the model together. The data scientist needs to understand the nature of the data before introducing it and be on the lookout for likely areas of bias.
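One practical way to look for likely areas of bias is to profile the training data by group before any training happens. The sketch below is a minimal, assumed approach in Python: the record fields (`gender`, `hired`), the sample records and the 30% `min_share` threshold are hypothetical choices for illustration, not a standard.

```python
from collections import Counter


def audit_groups(records, group_key, label_key, min_share=0.3):
    """Summarise each group's share of the data and its positive-label rate.

    Any group whose share of the records falls below `min_share`
    (an assumed, illustrative threshold) is flagged as underrepresented.
    """
    total = len(records)
    counts = Counter(r[group_key] for r in records)
    positives = Counter(r[group_key] for r in records if r[label_key])
    report = {}
    for group, n in counts.items():
        report[group] = {
            "share": n / total,
            "positive_rate": positives[group] / n,  # Counter returns 0 for missing groups
            "underrepresented": n / total < min_share,
        }
    return report


# Hypothetical historical hiring records.
records = (
    [{"gender": "male", "hired": True}] * 4
    + [{"gender": "male", "hired": False}] * 2
    + [{"gender": "female", "hired": False}] * 2
)

for group, stats in audit_groups(records, "gender", "hired").items():
    print(group, stats)
```

A report like this does not remove bias by itself, but a group with both a small share and a near-zero positive rate is the kind of pattern that let the Amazon tool learn to penalise female indicators.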

Conscious Bias

It is possible to correct bias by adding weight to the specific items that you want the model to focus on. This requires a detailed understanding of your data and the realization that you are adding a “conscious bias” to the model. This is often done when there is not enough training data and the training time needs to be shortened, or when you want to correct a historical bias, as in the Amazon case. In essence, you are giving the model “a helping hand”. Generally, this is not ideal unless you really know your data and are clear on the expected outcomes.
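One common way to add this kind of deliberate weighting is to reweight training examples so that an underrepresented group carries as much total weight as the majority. The Python sketch below uses simple inverse-frequency weights; it is one technique among several and an assumption about how you might do it, not a description of how Seal does it, and the group labels are hypothetical. Many learners accept such weights directly; scikit-learn’s `LogisticRegression.fit`, for example, takes a `sample_weight` argument.

```python
from collections import Counter


def inverse_frequency_weights(groups):
    """Weight each example by total / (n_groups * count_of_its_group).

    Every group then contributes the same total weight, and the weights
    still sum to the number of examples.
    """
    counts = Counter(groups)
    n_groups = len(counts)
    total = len(groups)
    return [total / (n_groups * counts[g]) for g in groups]


groups = ["male"] * 6 + ["female"] * 2  # hypothetical, skewed training set
weights = inverse_frequency_weights(groups)

totals = Counter()
for g, w in zip(groups, weights):
    totals[g] += w
print(totals)  # both groups now carry (approximately) the same total weight
```

Up-weighting the minority group like this is exactly the “conscious bias” described above: it changes what the model learns, so the weights should be chosen deliberately and documented.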

Seal supports the use of “conscious bias” where it is deemed necessary. Seal allows users to add weighted items and apply weighted natural language processing (NLP). The system also supports strong negative NLP, which rejects an item outright if it contains a particular negatively weighted term. This can lead to better insight and may well give users more control. However, it is critical for customers to understand their data and expected outcomes: blindly providing data and trusting a system to learn effectively is akin to giving a teenager a lot of pictures of bridges and asking them to design and build a new one for a major thoroughfare. It might look the same, but the structural integrity will be slightly off!
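The idea of weighted items combined with a strong negative term can be illustrated with a small rule-based scorer. This is a hypothetical Python sketch, not Seal’s implementation or API: the clause type, the term weights, the `threshold` of 4.0 and the veto term are all invented for illustration. Positive terms add to a score, while any strong negative term rejects the clause outright, whatever the score.

```python
import re

# Hypothetical weights for spotting a termination-for-convenience clause.
POSITIVE_TERMS = {"terminate": 2.0, "convenience": 3.0, "notice": 1.0}
# Hypothetical "strong negative" term: its presence rejects the clause outright.
STRONG_NEGATIVE_TERMS = {"not"}


def classify_clause(text, threshold=4.0):
    """Return (matched, score) for a clause using weighted terms plus a veto."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    if words & STRONG_NEGATIVE_TERMS:
        # The negative term overrides any positive evidence.
        return False, 0.0
    score = sum(w for term, w in POSITIVE_TERMS.items() if term in words)
    return score >= threshold, score


print(classify_clause("Either party may terminate this Agreement for convenience on 30 days' notice."))
# (True, 6.0)
print(classify_clause("The Supplier may not terminate this Agreement for convenience."))
# (False, 0.0)
```

A single word flips the second result, which shows both the control such rules give and why they need someone who understands the contracts to set them.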

In summary, when we undertake a project with a client, we work hand-in-hand with them not only to agree on the project’s outcomes but also to help them understand the data used to train the models and ensure that the intended outcomes are achievable with that training data. Our clients can then eliminate as much potential unconscious bias as possible, and if they choose to use some “conscious bias”, they are fully aware of the implications.

To learn more about AI technology, read our white paper: AI for Everyone