Recap: The Man, The Machine, and The Black Box
Arize AI CPO Aparna Dhinakaran recently took the virtual stage at Re-Work. In her presentation, Aparna talked about the growing importance of responsible AI.
“Before Arize, I was working on a Ph.D. in Computer Vision, and I began to look at things like ML fairness and how bias got introduced into models. At the time, it struck me that as a researcher, I couldn’t even answer basic questions about model performance and model service metrics, let alone complex and hard questions about things like bias and fairness.”
According to Aparna, tackling ML fairness in the real world comes with a clear set of challenges that most organizations face, including:
- Lack of access to protected attributes for modelers
- No easy or consistent way to check for model bias
- The tradeoff between fairness and business impact
- Responsibility diffused across individuals, teams, and organizations
“Frequently, data is handed to modelers stripped of protected attributes, such as gender, age, and ethnicity. There’s a whole theory that if you don’t include that information, then your model by definition can’t be biased, and we’re seeing that that’s wrong. It’s one of the first steps that lead models astray.”
“Beyond that, there’s very little training in our industry to teach data teams how to check for bias once the model is built. There are no standards or guidelines that are universal or can be applied case-by-case to specific industries. A lot of best practices are still coming out in research.”
How ML bias enters the real world
“One of the common errors we see is in skewed samples. For example, in policing, departments increasingly rely on models to predict crime, and these models are based on historical data. So you see more officers being dispatched to certain neighborhoods and you end up building on and perpetuating historical data in a way that disproportionately impacts people of color.”
Human bias is also introduced into data, according to Aparna. In hiring, it’s common for managers to label the resume data that informs models based on the success or failure of previous hires. If there has been a historical bias toward hiring men at the company, the model may learn to treat as “good” certain words that appear more often on men’s resumes than on women’s. This leads to more men’s resumes being flagged for review by the system.
Another issue is proxy information. Even if protected class information is kept out of a model, other attributes, such as neighborhood, can act as proxies that let the model infer things like race and ultimately use those inferences to make biased decisions.
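To make the proxy problem concrete, here is a minimal sketch of one common check (not something prescribed in the talk): try to predict the withheld protected attribute from the remaining features. The data, column names, and model choice below are entirely synthetic and illustrative.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Toy data: the protected attribute is never handed to the downstream model,
# but "neighborhood" is strongly correlated with it.
rng = np.random.default_rng(0)
race = rng.choice(["group_1", "group_2"], size=2000)
neighborhood = np.where(
    rng.random(2000) < 0.8,                                  # 80% of the time neighborhood tracks race
    np.where(race == "group_1", "north", "south"),
    rng.choice(["north", "south"], size=2000),
)
income = rng.normal(50_000, 10_000, size=2000)

features = pd.get_dummies(pd.DataFrame({"neighborhood": neighborhood, "income": income}))

# Proxy check: if the stripped attribute can be predicted from the remaining
# features well above chance (~0.5 here), proxies are leaking it back in.
proxy_score = cross_val_score(
    LogisticRegression(max_iter=1000), features, race, cv=5, scoring="balanced_accuracy"
).mean()
print(f"protected attribute recoverable with balanced accuracy ≈ {proxy_score:.2f}")
```

A score well above chance means the model can effectively reconstruct the protected attribute, even though the column itself was stripped before training.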
Sample size also comes into play. If the training data from a minority group is much smaller than the data from the majority group, the model is less likely to represent that group accurately.
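One way to see how under-representation shows up in practice is to break sample counts and accuracy out by group. The sketch below uses made-up labels and a simulated model, not anything from the presentation.

```python
import numpy as np
import pandas as pd

def per_group_report(y_true, y_pred, group):
    """Sample counts and accuracy broken out by group."""
    df = pd.DataFrame({"correct": np.asarray(y_true) == np.asarray(y_pred), "group": group})
    return df.groupby("group").agg(n_samples=("correct", "size"), accuracy=("correct", "mean"))

# Toy illustration: group "B" is heavily under-represented, and the simulated
# model is noticeably less accurate on it.
rng = np.random.default_rng(0)
group = np.array(["A"] * 900 + ["B"] * 100)
y_true = rng.integers(0, 2, size=1000)
error = rng.random(1000) > np.where(group == "A", 0.9, 0.7)   # 10% vs. 30% error rate
y_pred = np.where(error, 1 - y_true, y_true)
print(per_group_report(y_true, y_pred, group))
```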
Optimizing model fairness in your organization
According to Aparna, there are three steps every organization can take to optimize models for fairness:
- Increase organizational investment
Today, many organizations have a chief privacy officer who owns questions like how the team is structured and how often data security risks are reviewed. Similarly, for fairness, companies are starting to think about it at the same structural level and to create incentives to identify and catch these risks to the organization.
- Define an ethical framework
Companies should consider creating a data and AI ethical framework that is tailored to their industry. Even though every business has a unique set of challenges to address, it’s important to think about which metrics to track, what the organization is optimizing for, and how to frame the problem across teams.
- Establish tools for visibility
No model stays perfect. Teams need visibility tooling that lets them monitor models in production so they can measure fairness and improve models when bias is detected; a minimal sketch of one such check appears after this list.
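As one illustration of what such a check might look like (the specific metric and threshold are assumptions, not a recommendation from the talk), the sketch below compares each group’s positive-prediction rate against a reference group and raises an alert when the ratio falls below a four-fifths-style threshold.

```python
import numpy as np

def positive_rate(y_pred, group, value):
    """Share of positive predictions the model gives to one group."""
    return y_pred[group == value].mean()

def parity_check(y_pred, group, reference, threshold=0.8):
    """Flag groups whose positive rate falls well below the reference group's."""
    ref_rate = positive_rate(y_pred, group, reference)
    alerts = {}
    for value in np.unique(group):
        ratio = positive_rate(y_pred, group, value) / ref_rate
        alerts[value] = {"ratio": round(ratio, 2), "alert": ratio < threshold}
    return alerts

# Toy batch of predictions; in practice this would run on production traffic.
rng = np.random.default_rng(1)
group = rng.choice(["men", "women"], size=2000)
y_pred = (rng.random(2000) < np.where(group == "men", 0.35, 0.22)).astype(int)
print(parity_check(y_pred, group, reference="men"))
```

In production, a check like this would run on recurring batches of live predictions and feed whatever alerting the team already uses.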
ML observability: the guardrails for model fairness
The ability to surface problems and identify issues is the last step in determining not only whether your model is performing well, but whether it’s doing the right thing. With ML observability, it’s possible to identify problems not only once models are in production, but also before they are deployed.
Using observability tools, you can answer not only whether models are good enough from a performance perspective, but also whether they are free of bias and not catering only to certain groups.
Once deployed, it’s important to be able to surface model issues via tools and platforms, and then have the ability to troubleshoot them and get to their root cause. Ultimately, this allows you to bring to light the key feature(s) that drove the model’s decision, not just for one individual or group, but across the board.
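As a minimal sketch of what surfacing the key features might look like, permutation importance is one common, tool-agnostic approach; the model and data below are synthetic, and nothing here is specific to Arize’s platform.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Stand-in model and held-out data; in practice these come from your own pipeline.
X, y = make_classification(n_samples=2000, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = RandomForestClassifier(random_state=0).fit(X_train, y_train)

# Shuffle each feature in turn and measure how much performance drops:
# a large drop means the model leans heavily on that feature.
result = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=0)
for idx in result.importances_mean.argsort()[::-1][:3]:
    print(f"feature_{idx}: importance ≈ {result.importances_mean[idx]:.3f}")
```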
Once you know there is an issue, you can determine the best way to fix it, whether by tackling bias in the data through pre-processing, by changing how the model is trained, or by adjusting the model’s output in post-processing.
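For the output-adjustment route, here is a hedged sketch of one simple post-processing idea: choosing per-group score cutoffs so each group sees a similar positive rate. The scores, groups, and target rate are all made up for illustration, and this is only one of several possible remediation strategies.

```python
import numpy as np

def group_thresholds(scores, group, target_rate):
    """Pick a per-group score cutoff so each group's positive rate matches target_rate."""
    thresholds = {}
    for value in np.unique(group):
        group_scores = scores[group == value]
        # The (1 - target_rate) quantile of each group's scores gives that group's cutoff.
        thresholds[value] = np.quantile(group_scores, 1 - target_rate)
    return thresholds

# Toy scores where one group systematically receives higher scores.
rng = np.random.default_rng(2)
group = rng.choice(["A", "B"], size=5000)
scores = rng.beta(2, 5, size=5000) + np.where(group == "A", 0.10, 0.0)

cutoffs = group_thresholds(scores, group, target_rate=0.2)
decisions = scores >= np.vectorize(cutoffs.get)(group)
for value, cutoff in cutoffs.items():
    print(f"group {value}: cutoff {cutoff:.2f}, positive rate {decisions[group == value].mean():.2f}")
```

Whether to equalize positive rates, error rates, or something else entirely is exactly the kind of decision an ethical framework like the one described above should settle.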
Improving model fairness is a crucial part of making machine learning successful, not just by delivering better-performing models, but also by making sure models work for everyone.
To learn more, watch Aparna’s presentation in its entirety.