You should filter out irrelevant details, like names, before any evaluation step.
Unfortunately, doing this can make things worse. It’s not a simple problem to solve, but you’re generally on the right track. A good example of how it’s about more than just names is how orchestras screen applicants: candidates play behind a curtain so the panel can’t see their gender. But the obfuscation doesn’t stop there. They also make sure female applicants don’t wear heels (which make a distinctive sound), and they even have someone stand on stage and step loudly to mask each candidate’s footsteps and gait. It’s that second level of thinking that’s needed to actually hide gender from an AI model, and the more complex the dataset, the harder that becomes.
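To make the “second level” point concrete, here is a minimal sketch under made-up assumptions: the gender column has already been dropped, but one remaining feature (a stand-in for the heel sound in the orchestra example) still agrees with gender most of the time. `simulate_applicants` and `proxy_recovery_rate` are hypothetical helpers written only for this illustration, and the data is synthetic, not from any real screening process.

```python
import random


def simulate_applicants(n, proxy_noise, seed=0):
    """Hypothetical toy data: each row is (gender, proxy).

    Gender is hidden from any model; the proxy matches it except with
    probability proxy_noise. proxy_noise=0.5 makes the proxy independent.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        gender = rng.randint(0, 1)           # hidden attribute, 0 or 1
        flip = rng.random() < proxy_noise    # sometimes the proxy disagrees
        proxy = 1 - gender if flip else gender
        rows.append((gender, proxy))
    return rows


def proxy_recovery_rate(rows):
    """Fraction of rows where guessing gender from the proxy is correct.

    About 0.5 means the proxy leaks nothing; 1.0 means it leaks gender fully.
    """
    hits = sum(1 for gender, proxy in rows if gender == proxy)
    return hits / len(rows)


if __name__ == "__main__":
    # Even with the gender column removed, a strong proxy gives it away.
    for noise in (0.5, 0.2, 0.05):
        rows = simulate_applicants(10_000, noise)
        rate = proxy_recovery_rate(rows)
        print(f"proxy_noise={noise}: gender recovered {rate:.1%} of the time")
```

Removing the name is like removing the gender column here: it does nothing about the proxy. A real dataset usually has many weak proxies rather than one strong one, which is why the problem gets harder as the data gets richer.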
Interesting read, thanks! I’ll finish it later, but this bit already caught my eye:
Without access to gender, the ML algorithm over-predicts women to default compared to their true default rate, while the rate for men is accurate. Adding gender to the ML algorithm corrects for this and the gap in prediction accuracy for men and women who default diminishes.
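The quoted finding is about a per-group gap between predicted and actual default rates. One simple way to check for that kind of gap is sketched below. It assumes the model outputs a default probability per applicant and that group labels are available at audit time even if the model doesn’t use them. `calibration_gap_by_group` is a hypothetical helper, and the numbers in the example are invented for illustration, not taken from the article.

```python
from collections import defaultdict


def calibration_gap_by_group(predicted, actual, groups):
    """Return {group: mean predicted default prob - observed default rate}.

    Positive means the model over-predicts default for that group;
    close to zero means predictions match that group's real default rate.
    """
    # group -> [sum of predicted probabilities, number of defaults, count]
    totals = defaultdict(lambda: [0.0, 0, 0])
    for p, y, g in zip(predicted, actual, groups):
        t = totals[g]
        t[0] += p
        t[1] += y
        t[2] += 1
    return {g: t[0] / t[2] - t[1] / t[2] for g, t in totals.items()}


if __name__ == "__main__":
    # Invented toy numbers: both groups default 1 time in 4, but the model
    # assigns the "women" group a higher predicted probability.
    predicted = [0.4, 0.4, 0.4, 0.4, 0.25, 0.25, 0.25, 0.25]
    actual = [1, 0, 0, 0, 1, 0, 0, 0]
    groups = ["women"] * 4 + ["men"] * 4
    for group, gap in calibration_gap_by_group(predicted, actual, groups).items():
        print(f"{group}: {gap:+.2f}")
```

Note that this audit needs the group labels, which is part of the point in the quote: you can’t measure or correct a gap for an attribute you’ve thrown away.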