By Sarah Stone
Note: At the risk of adding yet another article complaining about the tech sector, the focus of this piece is the importance of understanding potential bias within data and its products, and the resulting need to invest in actionable solutions.
I love data. I romanticize it. Or rather, I did romanticize it. I started out as an artist, and after some time parsing through subjective truths and variances of “what is art” and “what is truth,” I wanted the hard stuff. The scientific method, objective and proven, a rigorous approach to finding robust solutions. To no longer “intuit” a truth but rather unthread it from tangles of data.
When we apply the power of analytics to people, however, it is no longer objective. People are not objective; people are biased, and results are only as good as the data they stem from. An algorithm, then, or the data set it works from, may be biased in how it is designed or in how its data is collected.
I was one attendee of several hundred at the Lesbians Who Tech + Allies Summit in New York this past September. Of all the speakers, Cathy O’Neil, a renowned mathematician and data scientist, was particularly memorable. In her talk, “Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy,” O’Neil described how, after receiving her PhD in math, she joined a hedge fund. During this time the financial crisis hit the US, and upon examination, O’Neil recognized a corruption in what she calls “the machine.” Not wanting any part of it, O’Neil left her position and became a data scientist, which at the time seemed the more ethical path, in the simple sense that numbers don’t lie. Again, however, O’Neil recognized the root of the problem: in her words, algorithms are just “opinions embedded in code,” and at the very least a scoring system in which “the person building the algorithm defines success.” This inherent bias behind an algorithm can cause unfair repercussions for those being scored.
In her talk, O’Neil introduced one case study of this bias: “The Current War on Teachers,” or the contemporary problem with education. The proposed solution to improve the educational system was to get rid of the bad teachers, and this was to be determined by the Value Added Model (VAM), a family of algorithms used to determine how much “value” a teacher brings to the classroom. The original definition of a “bad teacher” was based on students’ growth, specifically with respect to standardized proficiency tests, yet what wasn’t taken into account when the model was first introduced were students’ poverty levels, domestic backgrounds, and health at the time of the test. Moreover, the SAT scores from which the data was collected measure a student’s preparedness for college, not how they function within the classroom over the course of a school year.1
To improve upon this latter point, the model was adjusted to calculate the difference between an individual student’s scores at two points in time, such as the beginning of fourth grade and the end of fifth grade. The “value,” then, depended on a student’s growth over time. However, this set of data suffered from what O’Neil called a “noise error,” usually defined as an unexplained variance in a sample. The problem in this case was one of sample size. The model might have worked had each teacher been scored on a collection of some 20,000 students, but instead it was closer to forty, and the data was “very, very noisy.” Students’ scores were wildly inconsistent, and in the case of educators in Tennessee, teachers were scored on classes they hadn’t taught.2 Inaccuracies like this persisted in DC, where a missing suffix in the VAM code resulted in a teacher losing her job and bonus.3
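To make the sample-size point concrete, here is a minimal simulation of my own; it is not O’Neil’s model or any district’s actual formula, and the normal-noise assumption and numbers are invented for the sketch. Each teacher’s “score” is simply the average year-over-year growth of their students, and the spread of that score across repeated classes shows how much of it is pure sampling noise.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulated_vam_score(n_students, true_effect=0.0, noise_sd=1.0, n_trials=1000):
    """One VAM-style score per simulated class: the average year-over-year
    test-score growth of the class, repeated n_trials times so we can see
    how much the estimate wanders from class to class."""
    growth = true_effect + noise_sd * rng.standard_normal((n_trials, n_students))
    return growth.mean(axis=1)

for n in (40, 20_000):
    scores = simulated_vam_score(n)
    print(f"{n:>6} students: spread of the teacher's score (std) = {scores.std():.3f}")

# With ~40 students the score swings roughly twenty times more than with
# 20,000; the same "average" teacher can look great one year and terrible
# the next from sampling noise alone.
```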
The second example O’Neil posits is one of hiring algorithms. If “metrics of success” are based on the historical data of what a company considers a “successful employee,” there is room for bias. A successful employee at Fox News, for example, “stays for three years and gets promoted at least once.” In this case, the longevity of the work term and the event of a promotion are the metrics of a “successful employee.” However, if a work environment filters out certain demographics, the “historical” data on which the hiring algorithm is based is flawed. In the case of Fox News, in which the CEO, Roger Ailes, was pushed out following a slew of sexual harassment allegations, it’s no great leap to assume the work environment filtered out a lot of women. In this hypothetical example, the historical data of successful employees filters out women too.
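As a hypothetical illustration of how that happens mechanically (a toy example of my own, with invented data and scikit-learn, not any company’s actual system): train a model on historical “stayed three years and got promoted” labels produced by a hostile environment, and the model learns to penalize being a woman even though qualification is what it is supposed to predict.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n = 5000

# Hypothetical historical records: one feature for gender (1 = woman) and
# one for qualification (1 = strong candidate). Both features are invented.
woman = rng.integers(0, 2, n)
qualified = rng.integers(0, 2, n)

# The "success" label = stayed three years and got promoted. In a hostile
# environment, qualified women rarely reach that label, even though
# qualification is what the company claims to care about.
p_success = 0.7 * qualified * np.where(woman == 1, 0.2, 1.0)
success = rng.random(n) < p_success

X = np.column_stack([woman, qualified])
model = LogisticRegression().fit(X, success)
print("learned weights [woman, qualified]:", model.coef_[0])

# The weight on `woman` comes out strongly negative: the model has encoded
# the old environment's bias as if it were a fact about applicants.
```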
Now, in a turn of irony, I must reference the stats behind workplace demographics, specifically those of the high tech industry. According to the US Equal Employment Opportunity Commission (EEOC), the high tech sector employs a larger share of whites than the private sector overall (68.5 percent versus 63.5 percent) and a larger share of men (64 percent versus 52 percent), and “in the tech sector nationwide, whites are represented at a higher rate in the Executives category (83.3 percent).”4 Moreover, while there are more jobs in STEM than qualified applicants, “sources note that stereotyping and bias, often implicit and unconscious, has led to underutilization of the available workforce.” This bias within the workplace perpetuates itself in the work being done: the tech initiatives speak to a certain class. The industry focuses on the needs of its own market as opposed to those of the global community.
Juicero, for example, provides a $400 machine with Wi-Fi capabilities that squeezes accompanying fruit packets to produce fresh juice. Not too long ago, its Silicon Valley investors discovered that the $400 machine was superfluous: the packets could be squeezed just as well by hand. What is noteworthy here, however, is that the product speaks to one kind of market: those individuals who would spend $400 on a machine that squeezes fruit packets. Innovation is not lacking within the tech sector, but an appropriate sense of a larger cultural perspective is.
A recent case study is Bodega. Founded by two ex-Googlers, Paul McDonald and Ashwath Rajan, this glorified vending machine was created with the intention of providing convenience but instead sparked outrage nationwide. In New York, a “bodega,” Spanish for “grocery store,” generally refers to the corner shops littered around the boroughs that sell everything from toilet paper to fresh sandwiches and lottery tickets. The outrage over Bodega stemmed both from the misappropriation of the term and the general fear of putting local shop owners out of a job. Paul McDonald, CEO and co-founder of Bodega, responded to the storm of anger by conceding that despite “speaking to New Yorkers, branding people, and even running some survey work asking about the name and any potential offense it might cause…it’s clear that [they] may not have been asking the right questions of the right people.”5 In the case of Bodega, the issue is not one of bias so much as a cultural cluelessness that is often manufactured by a lack of sociocultural diversity.
It’s sensible to create products for the young professional markets typically found in San Francisco and New York. And it makes sense to automate systems that would, at some point, replace people. That’s how the Industrial Revolution happened. But many of these startups create solutions for the problems a guy in tech might have to deal with, leaving the larger market a few steps behind. Allison Arieff aptly described the situation in her New York Times op-ed, “What Tech Hasn’t Learned From Urban Planning”: “the further the tech sector gets from the reality of the problems it’s engaging with, the smaller piece of the problem they’ll end up actually fixing.”6
Women-run businesses make up just 15% of venture capital investments today. This is an improvement over the 4% calculated in 1991, but the gap between them and businesses run solely by men is still significant. Stonewalling this evolution has been sexism: recent studies reveal that “all-men teams are four times more likely to receive funding from venture capital investors than companies with even one woman on the team.”7 Some companies have gone so far in their attempt to overcome investor bias as to create a fake male cofounder.8
Beyond the working market, one extreme case of bias would have to be that of Microsoft’s Twitter bot, Tay. The AI chatbot was released into the wild with the intent to mimic and respond to users. Within a day, Tay was spouting homophobic, sexist, racist hate speech along with the best of the worst of the internet.9 Recent news reports have been picking up stories of AI mimicking bias,10 and this is because pattern recognition is often incorporated into machine learning. In the instance of Tay, the speech patterns and the resulting “human-like language” therefore exhibited human-like sexist and racist sentiments. In this particular example, the algorithm wasn’t biased, but the data it collected, the lines of speech, were.
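A toy sketch of why this happens (my own illustration, not Tay’s actual architecture): a model that only learns which word tends to follow which will reproduce whatever patterns appear in its training lines, hostile or not. The tiny corpus below is invented.

```python
import random
from collections import defaultdict

def train_bigrams(lines):
    """Learn which word tends to follow which word; nothing more."""
    model = defaultdict(list)
    for line in lines:
        words = line.lower().split()
        for a, b in zip(words, words[1:]):
            model[a].append(b)
    return model

def generate(model, start, length=8):
    word, out = start, [start]
    for _ in range(length):
        if word not in model:
            break
        word = random.choice(model[word])
        out.append(word)
    return " ".join(out)

# A toy corpus standing in for scraped tweets. Swap hostile lines into the
# corpus and the generator parrots hostility with the same indifference.
corpus = ["users are great", "users are kind", "users are awful people"]
model = train_bigrams(corpus)
print(generate(model, "users"))
```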
The tech industry is shifting into a more diverse landscape, but the progress has been slow. Niche communities, however, are galvanizing a call to action, and Lesbians Who Tech + Allies is but one. Allison Esposito’s Tech Ladies provides a job board that promotes diversity in the workplace. Black Girls Code, founded by Kimberly Bryant, teaches young women of color how to code and, perhaps more importantly, forges an environment in which tech is inclusive and accessible. Women Who Tech is quite literally disrupting VC backers’ propensity for businesses run by men by showcasing early-stage women-run startups. Women, queer, and other non-mainstream communities address problems that affect them, problems that exist in a reality a white male wouldn’t be able to relate to or cater to. This is why it’s so crucial to pay attention. Great ideas are born from seeking solutions to existing problems.
There can be bias in data, but it can also be actionable. If an algorithm is created based on hiring history, this changes when more women, non-binary people, and people of color are brought into the workspace. They change the hiring history for the next decade. But more than that, they change the perspective of a company and bring awareness to its voice. When chatbots like Tay start spouting homophobic slurs as a result of mimicking the general online audience, it’s a social obligation to educate that audience. Now, naturally, it would be hard to track down every member of Reddit’s Red Pill forum (a platform for men’s rights activists), but by actively investing in initiatives from the LGBT and other niche communities, and not just within tech, the conversation is forced to shift. The struggles specific to these groups stop being emotionally abstract. It’s easy to have a conversation about the social climate and even easier to share some “woke” post circulating on Facebook. But actual solutions don’t come from instant gratification. Real change means real investment: emotional, physical, and monetary. We need to change the data.
Sarah is a fine artist with a background in physics. She currently works for Dexibit, applying big data to museums and cultural institutions. She lives in DC with her cat and aloe vera plants. You can view her work at sarahbonne.com.