156 | Catherine D’Ignazio on Data, Objectivity, and Bias

How can data be biased? Isn’t it supposed to be an objective reflection of the real world? We all know that these are somewhat naive rhetorical questions, since data can easily inherit bias from the people who collect and analyze it, just as an algorithm can make biased suggestions if it’s trained on biased datasets. A better question is, how do biases creep in, and what can we do about them? Catherine D’Ignazio is an MIT professor who has studied how biases creep into our data and algorithms, and even into the expression of values that purport to protect objective analysis. We discuss examples of these processes and how to use data to make things better.

Support Mindscape on Patreon.

Catherine D’Ignazio received a Master of Fine Arts from Maine College of Art and a Master of Science in Media Arts and Sciences from the MIT Media Lab. She is currently an assistant professor of Urban Science and Planning and Director of the Data+Feminism Lab at MIT. She is the co-author, with Lauren F. Klein, of the book Data Feminism.


0:00:00.0 Sean Carroll: Hello, everyone, welcome to the Mindscape Podcast. I’m your host, Sean Carroll. Everyone knows, I think, that even though words like “data” and “algorithm” carry a certain patina of objectivity with them, in the real world, it’s often the case that neither the collection of data, nor the analysis of data, nor the use of algorithms is completely objective. They have biases built into them, because all of these facts about the world, or ideas about how data should be analysed, are created by human beings. And human beings have their foibles, right? And we see this in action in ways both profound and trivial. There are algorithms that decide whom companies should hire, or who should be suspected of committing crimes.

0:00:44.7 SC: Something we’ll talk about in this podcast is crash test dummies. When car companies run crash tests to evaluate safety, it used to be that all of the crash test dummies were modeled after men. None of them were in the shapes or sizes of women, and as a result, you could actually figure out, ex post facto, that the designs of seat belts and things like that for cars were noticeably less effective for women than for men. So, as objective as we might try to be, we’re going to fall a little bit short.

0:01:17.6 SC: Think of it this way, this is one of the ways I like to think about it. You’re standing somewhere right now, you’re in a room, or you’re outside, or you’re in your car, look around and imagine trying to describe your immediate environment to somebody else in a completely objective way. You can imagine doing that, maybe you think you can do that, but the fact is you can’t. You can say objectively true things, there are true things to say about the world, you’re in a car, it’s a Toyota, whatever it is, but you’re making choices along the way. There’re an infinite number of things you could say that are objectively true, but it’s you [chuckle] who are always gonna be a little bit fallible and have your biases, have your history and your interests, and so forth, that choose for you what features of the environment matter, how to divide up the environment into the interesting facts, the uninteresting facts, etcetera. That right there is a way that non-objectivity creeps into how we characterise the world around us.

0:02:20.8 SC: Catherine D’Ignazio, today’s guest, is a graduate of the MIT Media Lab, and is currently an assistant professor in MIT’s Urban Science and Planning Department. And she’s written a book with Lauren Klein called “Data Feminism.” And it’s not just feminism, it’s really the intersection of data and algorithms, and how we are biased, and how we can fight it. Catherine is someone who is pro-data, pro-algorithms. Her message is not that science and technology are tools of the oppressor, or anything like that. They can be used to make the world a better place, but they’re not always used in that way, and not necessarily even for pernicious reasons. “Biases” is sort of a word with a negative connotation, but our individuality, who we are and therefore how we see and conceptualise the world, creeps into how we talk about it and what we do about it, whether we like it or not. So being a little bit more conscious, being a little bit more cognitive, being a little bit more aware of what’s going on can help us understand the world better. That’s something we all wanna do. So, let’s go.

[music]

0:03:42.2 SC: Catherine D’Ignazio, welcome to the Mindscape Podcast.

0:03:43.5 Catherine D’Ignazio: Thank you. Thank you for having me.

0:03:45.8 SC: So, “Data Feminism,” the title of your book, and I’m sure we’ll get into both of those, and I’m also sure that most of the audience, their eyes focus on the word “feminism,” right? That’s the thing that is gonna get people vibrating with either positive or negative valence. But I’d like to start with the word “data” a little bit. ’Cause I’m a physicist, we have something that we have in mind when you talk about data, coming from experiments, finding the Higgs boson, but in the modern world, the big data world where we’re constantly being surveilled and tracked and things like that, data means something a little bit different, or at least the connotations are a little bit different. In your view, what should our audience have in mind when you just say “data”? What are the things that are flying around in this sphere of ideas?

0:04:32.4 CD: Sure. Yeah, yeah. Yeah. No, great question. So the data that we are referring to can include data from scientific experiments, of course, and obviously, most people, when you say data, their minds go to quantitative information. And so, of course, data includes that. Our definition is pretty expansive. It’s information that’s collected in a systematic way. It’s a collection of similar things at some level, if you think about just anything that you can put in a spreadsheet basically, and also includes things you can’t put in a spreadsheet. Many of the most interesting big data things are image data, for example. And so thinking about things that are not necessarily just rows and columns as we encounter them in databases, but also images, videos, audio, different kinds of things like that, which ultimately can be analysed both quantitatively and qualitatively, and decomposed into various kinds of parts, and then deployed to do different things, and even create new things, so images that are generated from other images and things like that.

0:05:57.4 CD: And then, of course, in Data Feminism, when we’re bringing a feminist lens, we also argue for thinking about qualitative data, including qualitative data as a very equal counterpart to quantitative data, so not creating a hierarchy out of quantitative and qualitative data, and also really arguing for the value… For really valuing lived experience as a form of empirical data. And that’s something that really comes from feminist theory and thinking, and thinking about like, “Well, for people that have been excluded from the historical canon, how do we share our stories and our data and our evidence that we bring to the table?” It’s often been through stories and personal experiences and lived experiences, so really kind of thinking about those as empirical data, obviously of a different nature. It’s not the same thing to say, “This is my story, and that’s your physics experiment.” Those two things are different. [chuckle] But yet at the same time, not denigrating it because it’s not some kind of generalisable thing that everyone in the world has also experienced or something.

0:07:13.8 SC: And if I’m remembering correctly, you work at the MIT Media Lab, is that right?

0:07:18.3 CD: So I graduated from the MIT Media Lab, but I’m actually now a professor in the Department of Urban Studies and Planning at MIT, so different department.

0:07:28.0 SC: Are you a data scientist? There are people who will call themselves data scientists. It almost seems redundant to me, but it’s clearly a growing field.

0:07:36.5 CD: Yeah, I would say I’m a data scientist. I would also say my… I’m really trained less as a data scientist and more as a software programmer, so I come out of software and database programming. And so more from the realm of systems development and application development is where I spent a long, long time, and came into data science through doing all this database programming, but also always being interested in art and design. And so for me, those two things always came together actually in maps. I’m a long-time cartographer and map maker. And, in fact, that’s the course that I teach for our… That’s the main course I teach in Urban Studies and Planning, is our GIS and Spatial Analysis course.

0:08:25.4 SC: I just had a podcast a couple of weeks ago with Jordan Ellenberg, who is a high-powered mathematician, geometer, and we talked about gerrymandering and the mathematics of figuring out whether a map is gerrymandered or not. Maps are surprisingly science-y, I think. It’s a very important topic, and it’s a very human…

0:08:41.0 CD: Yes. I love maps for exactly this reason. They bring together science, they bring together art and design, they bring together… I’m a very visual person, so they bring together the visual side. But, yeah, that’s why I love maps, is ’cause I feel like they’re the integration of these two sides of the brain, that they don’t often talk to each other, they don’t often encounter each other professionally, but then we find them together in maps. And so that, for me, is very exciting ’cause it can cause both, I don’t know, friction but also brilliance, I think, at the same time.

0:09:16.2 SC: So when we say the word “data” as an adjective, there’s data as a noun, which you helped define, but I guess what I’m getting at is, ’cause you know it already, I’m trying to get at, for the non-expert, what are the steps involved in collecting, analysing, presenting data? I think you got into the whole Data Feminism thing through the issues surrounding data visualisation, if that’s not wrong. What does a data scientist do to go from the raw stuff of reality to some presented data?

0:09:48.6 CD: Sure. Yeah. There’s this exercise I do with students where we start off… And it’s an exercise that’s about thinking carefully about the ways that we take the world and capture the world, as Johanna Drucker would say, instead of taking data, we actually are capturing data. It’s super simple. I tell them to just go out and walk around for 15 minutes, and your job is that you’re gonna classify people’s shoes. And so you have to develop some taxonomy or classification scheme for shoes, and you have to collect at least 10 or 15 rows of data about those shoes. The interesting thing is, it’s ostensibly a very simple topic, most people at American universities have shoes, you can find lots of them, they come back with their data, but everybody has a different categorisation scheme, they have different ways that they have classified shoes. And so this starts to open up some of the complexities, I think, once we start to look at the ways in which we count things, the ways in which we aggregate things. The things that we find as being meaningful are not the same as the things that another person would find as being meaningful.
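(A minimal, hypothetical sketch of what this exercise tends to produce: two students encode the same shoes under different, equally reasonable taxonomies, and the resulting spreadsheets cannot be merged without deciding whose categories count. The fields and values below are invented purely for illustration.)

```python
# Hypothetical illustration: two students record the same three shoes
# under different classification schemes.

student_a = [  # classifies by shoe type
    {"shoe": 1, "type": "sneaker"},
    {"shoe": 2, "type": "boot"},
    {"shoe": 3, "type": "sandal"},
]

student_b = [  # classifies by closure and heel height
    {"shoe": 1, "closure": "laces", "heel": "flat"},
    {"shoe": 2, "closure": "zipper", "heel": "low"},
    {"shoe": 3, "closure": "none", "heel": "flat"},
]

# Same shoes, different reductions of the world: merging the two data sets
# forces a decision about which categories "count".
for row_a, row_b in zip(student_a, student_b):
    print(row_a, "vs", row_b)
```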

0:11:09.7 CD: So this idea… I often find myself in the position of teaching newcomers about data science and about data analysis methods, and trying to help them. I’m really… The whole thread of my work is about data literacy, so trying to expose people who don’t consider themselves to be technical to start to understand how to use some of these methods, and that, in fact, they are not that complicated, they’re just… As my colleague, Rahul Bhargava, says, “They’re fancy ways of counting.” [chuckle] And so this is what becomes interesting though, is that just the humanness of data. So even when we think that we’re being so precise and so objective, there’s still an infinite number of ways to classify shoes. You know what I mean? And that’s not to say we should never classify shoes and it’s never useful to classify shoes, ’cause it certainly is, and in particular if you’re a shoe company. [chuckle]

0:12:07.5 CD: But it’s more just so that we can start to be aware of some of the limitations of what we can and can’t do with data, and understand that data, in their essence, are really… They are a reduction of the world. No row of data about a shoe is ever gonna describe the rich complexity of a shoe. And that’s okay. We don’t need data to do that, but it’s just important that we remember that it’s a reduction in complexity and that we don’t confuse the data for the thing itself, because that’s where we then get into all sorts of troubles, is when we think like, “Oh, these data that we just went out and downloaded from the web just represent these raw facts,” because, in fact, they’re not. They’ve been shaped and formed by the institutions that have set out resources and ways of collecting them, that had developed some classification scheme, that may have done their own analysis and aggregation on them, and so on.

0:13:06.8 SC: I love this example of the shoes, because it’s exactly what got me interested in your book in the first place, the recognition that there is this tension between a discourse of describing the world with perfect objectivity and rigor, so that none of our biases creep in, with the reality that that’s not a possible thing to do. And so we might strive for it, we might aspire to it, but even when we do something as innocent as collecting data on shoes, well, who classifies what shoe as what is an immediate choice, the choice to look at shoes rather than socks was a choice. And so there’s all these choices that are flavouring what we are choosing to talk about, what of the infinite number of facts about the world we’re choosing to collect and then display.

0:13:50.6 CD: Exactly. That’s exactly right, that’s exactly right. And it’s sort of like, with any classification scheme, it could have been done differently. The important thing is to think about, “Well, how could it have been done differently, and why was it done the way that it was?” And understanding some of those motivations. Not necessarily because those motivations are gonna always be nefarious or something, often it’s not that there’s some evil institutional person behind it, it’s just that it was done in a certain way for a certain purpose, which means there were things that were left out and there were these paths that weren’t pursued, there were these other data that were not collected. And so drawing our attention to… In a way, the paths not taken. We talk a lot in “Data Feminism” about missing data, for example, as a way of reflecting on what are the structures and the powers that are shaping the data that we inherit and the data that we ourselves collect.

0:14:46.1 SC: Well, and one of the points you make is that not only does this human choice about what to do come into the data we collect, but then how to analyse the data, what to do about the data, how to visualise the data. These are all involving human choices.

0:15:00.4 CD: Yes, exactly. Yeah, every stage of the process, it’s like all of these… The pipeline, as it were, at all of these stages of that process, there’s these very intentional choices, there’s this particular set of actors that are working with the data, there are certain goals, there are certain audiences that they wanna reach. So, yeah, it’s sort of like… A lot of the book is deconstructing some of the myths about data science, which often are things that data scientists themselves know really well.

[chuckle]

0:15:33.7 SC: Yeah, they’re familiar.

0:15:33.8 CD: It’s more like in the popular perception of data science or of statistics or of algorithms that you have to deconstruct that, that they’re these perfect black box systems that are gonna always perfectly predict certain things. It’s almost more the popular narratives that we are trying to deconstruct because any data scientist, I feel like, who is worth their salt, they know their data intimately, and they also know the limitations of their data intimately. And if they are responsible data scientists, they’re not gonna be going out and making these wild claims with data that have all these limitations.

0:16:15.6 SC: There’s actually a joke within physics that nobody believes a theory that comes from a theoretical physicist except the person who proposed it. And everybody believes an experimental result, except the person who did it because they’re very familiar with all of the important things that came along the way.

[laughter]

0:16:31.3 CD: That’s awesome. That’s fabulous.

0:16:36.1 SC: Do you think it’s generally true, that people on the street are a little bit overly trusting of the data that they see presented to them?

0:16:44.3 CD: Yeah. Yes, I think so. And, in fact, I encountered this in my own classes. Prior to arriving at MIT, I taught at Emerson College in the Journalism Department, and so I taught data analysis and data visualisation to journalists. And journalists are a group of folks who would come into the class saying, “I’m not good at math, I can’t do numbers,” which, when I actually looked at their standardised test scores, was completely a lie, they all did great on standardised tests, they’re fine. [chuckle] But they had this image of themselves as, like, “I’m a word person and not a math person.” And having that image of themselves, honestly, inhibited their ability to be skeptical of numbers. They’d download some data on the internet and immediately just believe that the data were true, not understanding that, in the same way journalists are taught to do a verification process for the quotes and facts they put in their articles, the same process needs to be done on a data set that you’re inheriting from another actor or institution. So a lot of that was me teaching them about that process.

0:18:01.2 CD: And just having not worked with numbers intimately, there was a kind of over-trust or an over-placing of confidence in the numbers that they inherited, and a kind of slippage where they imagine that… I could see it in their writing when they would write with numbers, they were often throwing in, if there’s a decimal point, all eight decimal places. [chuckle]

0:18:29.5 SC: Too many significant figures.

0:18:30.3 CD: It’s like you really don’t need to know 54.3756%, whatever. But there is this need, they felt, to assert themselves with numbers and precision and things like that, but it was precisely ’cause they were insecure about it. So, I think these are ways that, yes, there is this placement of faith, particularly when people feel a level of insecurity or underexposure to data or to math or statistical ways of thinking, that they’re like, “Oh, I’m… ” It triggers this, “Oh, I’m not that kind of person, I need to place my trust in the people that are that kind of person.”

0:19:08.2 CD: And so a lot of what I did was breaking that down and saying, “No, in fact, you can do this work, particularly basic descriptive statistics are within everybody’s reach,” and teaching kind of a skepticism, which is a healthy skepticism, but a deeply important one right now, particularly in the climate of misinformation and, I don’t know, bad information actors on the internet, that is also happening with data as well.

0:19:40.0 CD: So thinking about how do we have a healthy skepticism and a kind of a citizen interrogation of data sets, and being able to do a kind of a power analysis of a particular issue to understand how data may have been impacted by structural bias, by issues of power and things like that. And that’s knowledge they can draw from, those are worlds they’ve been exposed to, but those are pathways that are not just gonna come naturally, I think those are sort of muscles and skills that need to be taught.

0:20:10.6 SC: Well, and you have examples in the book of famous mistakes that people made in the media with data, for example. The one that struck me was 538 doing a story, I think it was on kidnappings in Nigeria, where they plotted the number of them, but secretly what they were plotting was just the number of stories about kidnappings in Nigeria, even if they were about the same kidnapping. But I guess everyone makes mistakes, that’s fine, we all misread things, but there’s something about putting it in a chart and making it look all objective that makes us just more likely to say, “Well, that’s just the fact, there’s no mistakes there.”

0:20:44.9 CD: Exactly, exactly. And so this is sort of one of the things we talk about in regards to data visualization, which is that our methods for data visualization are almost dangerously seductive, right? Because we can do these really dazzling things, we use these very precise lines, we use these geometric shapes, and then it looks by all accounts to be quite true, it’s like, how could we ever question that? But in fact, exactly this, there was this fundamental misstep in that project where the journalists sort of confused the object of analysis: basically, instead of actual kidnappings, we were actually talking about media reports of…

0:21:29.9 SC: Kidnappings.

0:21:31.3 CD: Kidnappings which are really quite different things, and unfortunately led to the retraction of the article and things like that. So yeah.

0:21:40.0 SC: So let’s move our way into slightly less innocent mistakes about… Or not mistakes, but ways in which we think about and present the data. You have a wonderful quote that you sort of secretly agree with yourself… Yourselves, both of you, which is that data is the new oil. Explain what that means to the people who generally say it and also to you.

0:22:03.1 CD: Sure, yeah, so this is such an interesting metaphor, and this metaphor was circulating a lot when we first started writing the book, and I feel like I hear it now less, so I don’t know, I should do some Google Trends thing on it or something. [chuckle] But probably, anyone who’s following conversations about big data has probably heard this phrase. It was first said, we actually traced its history, so it was first said in the mid-2000s, around 2006, ’07, and then it was really boosted. I think of it as the meme, so it’s boosted as this sort of meme by The Economist magazine around, I think, 2011-2012, when they did this whole issue on data and all of the great profits that can result from extracting data, using data from social media, to infer various things, to kind of fuel what you might call business intelligence these days in the corporate world. And the metaphor is really interesting because we can think about oil and we can think about data, and in fact, the verbs that we use about both of those things really line up.

0:23:14.5 CD: So we extract, we mine, we clean, we refine, we process. All of these are metaphors for oil, or they’re not metaphors, they’re things we do with oil and they’re metaphors that we use for working with data.

0:23:32.3 SC: Pipeline.

0:23:32.8 CD: Which is very interesting because oil is quite extractive, right? Like oil is an industry of extraction, and so the interesting thing with the metaphor when folks like The Economist are using it is like they’re using it in a very positive way, meaning profit, we can extract this natural resource, a sort of “natural resource” in the case of data, and then kind of clean, process, analyze, deploy, whatever, and it will yield great riches and profits, but then the question is always sort of like, for whom? We raised that a lot in the book, is that we call them who questions, and that’s often what feminism is good for, is good for asking who questions, so it’s like, even in the case of oil, you look at like, who benefits from oil?

0:24:21.1 CD: If we’re saying data is the new oil, and that’s a good thing, who has benefited from oil in the past, and what kind of externalities have been sort of created in the process and who has borne the brunt of those? So yeah, we sort of unpack that metaphor and don’t disagree in the sense that I think it’s an apt metaphor to describe what a lot of corporations are doing with data. If you look around, the companies that are making the most money with data right now are the companies that have the resources to collect, analyze, deploy, etcetera, these data-driven products and services. But again, I think that metaphor of the extraction is something we need, really need to think about in the process.

0:25:10.5 SC: Well, I guess to me it’s quite obvious that oil as a concept is a double-edged sword. It has many good things about it, it has helped industrialize the world, increase standards of living; it’s pretty clear that there are also bad things about it, climate change. And I would argue that it also has led to massive inequality, some people get rich, others don’t, and the data-is-the-new-oil metaphor lines those up. And it sounds exactly right. Is it exactly right? Do you think that data has this property, that it will have just as many bad… I shouldn’t even say just as many, I don’t know what the metric or the measure is, but very noticeable bad properties, bad effects, just like it will have good effects.

0:25:55.5 CD: Yeah, yeah, I think so. I think in the sense of thinking about what are the negative externalities created by this, the data economy, which some people, I don’t know if you’ve heard this, the other metaphor people have been using is the fourth industrial revolution.

0:26:11.6 SC: I didn’t know that, okay.

0:26:14.2 CD: Around the sort of big data economy. I don’t know, I scoff a little bit at that, I have to say. [chuckle] ’Cause again, for whom? Who is benefiting? And one of the interesting things with data and the negative externalities that are created by the big data economy is that the externalities are very similar, in a sense, to the ones that oil creates. So if you think about it, even just one of Facebook’s data centers, one that was located in New Mexico, cost something like 30 million dollars a month in electricity to operate. This is a tremendously energy-intensive, ecologically sort of earth-exploiting thing to be basically just fueling us to talk to each other. Do you know what I mean?

[laughter]

0:27:10.2 CD: And so like thinking about there are environmental costs to the cloud as it were, but then there’s also all of these social and inequality-generating costs as well, and we go into some of those in the book too, wherein if the power of data is really centralized in the hands of large actors, ’cause those are really the ones who have the resources to be able to mobilize it, they’re gonna use it to their benefit, and then they’re not necessarily gonna use it to the benefit of sort of human health and wellness more generally speaking. And so I think that’s a big problem if it’s really the corporations, and to some extent, the elite governments and universities, that can mobilize data, and the rest of us are just kind of left in the dust.

0:28:05.9 SC: And that’s a concern centered on, I guess, data plutocrats, if that’s a category of people we can point to. But you also make the point in the book that even among data scientists, people who we would imagine are trying to be objective in finding true things about the world, there are value systems that have sort of sneaky choices built into them. So there are values that are valorized, I guess I’m running out of vocabulary words here, but things like ethics and fairness and accountability, and all of these sound great, but they can be used to actually silence voices that are being slightly contrary and so forth, whereas other values like justice or equity… Sorry, fairness was in that first category; values like justice or equity are not talked about as much. So maybe say something about how it’s not just the plutocrats, it’s even the scientists who sort of maybe even innocently stumble their way into putting these values in a hierarchy.

0:29:06.2 CD: Totally. Yeah. Thanks for bringing that up. One of the observations that we make… So first of all, I think, one, it’s encouraging that there are a lot of computer scientists right now, a lot of folks in the technical community, talking about what commonly goes by fairness, accountability and transparency. There’s kind of a whole set of conferences that are organized around these topics. What that work tends to cover is work that’s looking at algorithmic discrimination, discrimination and bias in large data sets, in training data sets, ranging from text and natural language processing to images and things like this, and then thinking about how do we make these things more fair.

0:29:57.3 CD: But then there’s been some interesting pushback on that work, and we’re not the only ones doing that, saying, “Well, why are these particular values, this fairness, accountability and transparency, why are these the set of things that we’re organizing around?” And in particular, because a lot of work… And this is not all the work in the space, I don’t wanna be caricaturing it or something, but a good amount of the work in this space, when you take a concept like fairness, often what these things try to do is say, “Okay, well, let’s say we have some discriminatory system like credit lending,” or something like that, and what we’re gonna do with our speculative algorithm is we’re going to just tune these knobs and levers so that any kind of racial bias or any kind of gender bias is sort of excluded from the system. You kinda get the bias out of the system.

0:30:54.1 CD: And so it’s not to say that those approaches have no value, but at the same time, one of the things we would say in response is that they’re actually not addressing… They don’t come with a full kind of conception of the problem. So they’re not coming with a kind of root cause analysis of how did we get to this point, how did we get to the point that lending to Black folks in the United States is perceived as higher risk? This is through centuries of really deliberate sort of design and discrimination that’s been built in from kind of the very beginning, post Civil War. So we’ve had all sorts of… We have big data tools like redlining, racial covenants, both informal and formal legal tools that have worked to keep that kind of system in place.

0:31:51.4 CD: And so just looking at it, let’s say, mathematically, within a kind of a closed system, unfortunately, it fails to account for these hundreds of years of history. And almost invariably, the work that starts with this fairness lens is starting from time equals… T equals zero or something, rather than saying, “Actually, we need to take the past 200 years into account.” That needs to be part of the model. [chuckle]

0:32:19.4 SC: That’s part of the data. Yeah.

0:32:21.8 CD: And so that would be the approach that’s about equity and justice. And so that’s, in a way, the challenge, is like, How do we do that? And we can’t do that in a race blind and gender blind way. It just doesn’t work like that, because history is not race blind and gender blind. We have to acknowledge that past, and we have to account for it somehow in our systems, so… Yeah.
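(To make the “tune these knobs and levers” framing concrete, here is a minimal, hypothetical sketch of the kind of single-number group-fairness check such approaches optimize, demographic parity on invented lending decisions. The point of the critique above is that a metric like this starts from T equals zero and says nothing about how the historical data came to encode different “risk” in the first place.)

```python
# Hypothetical sketch: demographic parity gap on invented loan decisions.
# 1 = approved, 0 = denied; the group labels are illustrative only.
decisions = [
    ("group_a", 1), ("group_a", 1), ("group_a", 0), ("group_a", 1),
    ("group_b", 0), ("group_b", 1), ("group_b", 0), ("group_b", 0),
]

def approval_rate(group):
    outcomes = [outcome for g, outcome in decisions if g == group]
    return sum(outcomes) / len(outcomes)

gap = approval_rate("group_a") - approval_rate("group_b")
print(f"approval gap: {gap:.2f}")  # the "knob" a fairness constraint would push toward zero

# Driving this gap to zero operates within the closed system described above;
# the centuries of history that produced the training data stay outside the model.
```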

0:32:47.6 SC: Maybe we can get some good examples on the table here to ground people’s thoughts, ’cause I think that there is an extremely naive point of view, which would just say, “Look, if you have an algorithm or if you have a data set, by definition, it can’t be biased, it’s just a computer.” I say that’s a very naive point of view, but I know people who have it.

0:33:05.6 CD: Totally.

0:33:06.4 SC: And of course in the real world, algorithms are trained on data sets and who chooses which data set and what history is the data set reflecting. One that struck me very vividly recently was a fun thing going around Twitter, I think it came out after your book, but take a sentence like, “She won the Nobel Prize,” and put it in Google Translate into a language that does not have gendered pronouns, and then translate it back to English, and it’s always, “He won the Nobel Prize,” when it comes back, right?

0:33:33.1 CD: Yes. Yes, exactly.

0:33:35.0 SC: So maybe explain more about how, and examples of how these biases from history or from ignoring history creep into purportedly objective algorithms.

0:33:45.1 CD: Exactly. Yeah. This is great because I think this goes back to the other topic we were talking about, which is that kind of faith, an over-placed faith in the systems, that they are objective and that they are gonna work like that. So yeah, so these things happen because, again, it goes back to the human-constructed nature of the data sets. If we’re going to make… Let’s take facial recognition, which we talk about in the book. So if we’re gonna make a system that recognizes a face and is able to distinguish a face as opposed to the background in a photograph, we need training data, ’cause a system needs to be able to look at a bunch of images that have already been annotated and say, “This one’s a face and here’s the exact location of the face.” Looking at many, many, many of those is gonna teach the system where the face is and what faces look like, and so on. But one of the things, for example, I look at the work of Joy Buolamwini, who’s a colleague at MIT. She’s a Ghanaian American, she has very dark skin, and she sat down in front of her computer. She just got an off-the-shelf facial detection library ’cause she wanted to use it in a class project, and it just wouldn’t see her. It wouldn’t put that little box around her head. So just like that is weird.

0:35:14.4 SC: The invisible man. Yeah.

[chuckle]

0:35:15.9 CD: Yeah. She got her White friend, it recognized the White friend, she got an Asian friend, it recognized the Asian friend. And then she took a… She had this white theater mask, and when she put on this white theater mask, then it recognized her face. So we have to think about what’s happening behind that system. Is it that the engineers are racist? No, the engineers were not sitting there being like, “Yeah, we’re definitely gonna discriminate against Black women.” But what happened is they had… She’s a very technical person, so she dug into the guts of the system, and she ended up writing a really important paper called Gender Shades with Timnit Gebru, where they audited the training data libraries that are used to train these kinds of systems, and in the resulting analysis, it’s sort of like the problem of what data are available is often the biggest source of bias.

0:36:12.1 CD: ’Cause in the case of face data, these are celebrity and political people profiles, so these are often the data that are used to then train the systems. But then we can think about, “Well, who’s a celebrity, who’s a political person, what kinds of racial and gender biases are those, is it gonna be mostly white people, mostly men?” And in fact, what they found out is something like… And I’m forgetting the exact figure, but something around 88% of the faces in this benchmarking training data set were what they call pale and male, and actually they didn’t rate their race, they were looking just at the skin color. And so, of course, a system that’s trained with that kind of training data is gonna fail really badly. It works great for White men, ’cause that’s the kind of user group that’s really centered in that kind of system, and then it works really terribly for Black women because there are so few images of Black women in the training data. And so it’s sort of like thinking about… It’s not really that the algorithm is biased, it is that we are biased. [chuckle]

0:37:23.4 CD: And it’s even also that even in building the system, so it’s like the engineers weren’t sitting there being like, “Ha, ha, ha, let’s be racist and sexist.” But also they didn’t have mechanisms to seek out the inherent racism and sexism that will show up. Inevitably, they didn’t have the tools to look for it, they didn’t have the checks and balances to be able to check for it before it ends up that the Black woman discovers it on the tail end of things and then sort of exposes it from there. And so that’s sort of how we end up with these things, ’cause as we build human systems, we’re pulling from the human and social world to train those systems. And that world is not a… Hopefully, one day we’ll live in a world where we are not racist and sexist, but right now the world is racist and sexist. Garbage in, garbage out.

0:38:24.9 SC: Exactly right, yeah.

0:38:25.3 CD: That’s what we have to deal with. So we have to look for the garbage basically. [chuckle]
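(A minimal, hypothetical sketch of what “looking for the garbage” can mean in practice: auditing the composition of a face training set before trusting a model trained on it. The labels below are invented; the actual Gender Shades audit annotated benchmark faces by skin type and perceived gender and then measured error rates per subgroup.)

```python
from collections import Counter

# Hypothetical annotations for a face training set: (skin_tone, gender).
# In a real audit these labels come from annotating a sample of the data.
training_faces = [
    ("lighter", "male"), ("lighter", "male"), ("lighter", "female"),
    ("lighter", "male"), ("darker", "male"), ("lighter", "male"),
    ("darker", "female"), ("lighter", "female"), ("lighter", "male"),
    ("lighter", "male"),
]

counts = Counter(training_faces)
total = len(training_faces)
for (tone, gender), n in counts.most_common():
    print(f"{tone:7s} {gender:6s} {n / total:5.0%}")

# If one subgroup dominates the training data, error rates for the
# underrepresented subgroups will almost certainly be worse downstream,
# which is what the per-group evaluation in Gender Shades measured.
```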

0:38:28.0 SC: This really highlights, I think what to me is the big looming philosophy question here, epistemology question or whatever you wanna call it about objectivity. I can very crudely distinguish between three attitudes that we might have towards objectivity, one is, objectivity is good, and we have it basically, science and computers and data are pretty objective, even if we humans are flawed. Another attitude is, objectivity is something we should aspire to, but we don’t have it, we should be extra careful in trying to get there, and a third attitude is, the goal of being objective is just misplaced in the first place, we shouldn’t even try… We should recognize our individual non-objective goals, so where do you sit in that classification scheme?

0:39:16.8 CD: Yeah, that’s helpful. So I think what I would say, and this is where I think some really interesting feminist theory comes in too, so folks like Donna Haraway and Sandra Harding who have thought through these questions really deliberately and specifically, we can draw a lot from them, so they have the ideas of something called feminist objectivity, Donna Haraway specifically talks about situated knowledge. And so I think where we can’t end up… And in fact, I don’t think any… No, I shouldn’t say that. [chuckle] I don’t think any feminists would say but I’m not… I guess I won’t speak for all feminists, [chuckle] because maybe some of them would say this.

0:40:05.5 SC: Exactly.

0:40:05.6 CD: I think where we can end up… And I find it untenable a position of like, to everyone their own individualistic truth and truth is super subjective and everything is relative, I think that’s untenable because we have to be able to arrive at some collective shared understanding about the world, and I think it’s very dangerous when truth is just fiction. [chuckle] We’ve seen that happen recently. So I think that’s a very dangerous world to me. And yet at the same time, I think we have to recognize that the current conception of objectivity as it exists has also been exclusive. It hasn’t been including all people, all genders, all races, all communities into this fold of objectivity.

0:41:00.1 CD: And this is something feminists have critiqued for ages. A more recent critique is from Ruha Benjamin, who calls this imagined objectivity. And it’s not to say that… We’ve accomplished some just incredible scientific achievements following these kinds of tenets of the scientific method. And yet in particular, when it comes to how science and objectivity meet the human and social world, our disciplinary-ness is really constraining for us. It’s like this is what’s leading to, I think, some of the confusion on the part of the technical community about how do we do this better? And often when you’re trained in computer science, you’re not given… You haven’t really been trained in gender studies. [chuckle] You know what I mean?

0:41:48.6 SC: Typically no.

0:41:48.8 CD: You haven’t been trained in history of race in the United States. And so we’re finding these places where our knowledges haven’t been able to… These bodies of knowledge need to meet each other and they haven’t met each other and been incorporated. So I guess in your scheme, I would be, I think, somewhere in the middle. I think one of the things that Haraway would tell us is that all knowledge is situated. So we’re all in a particular context. We’re in a particular country, geography. We have a particular value system. And so the strategy that feminists would advocate for, what you would call strong objectivity or feminist objectivity, is where we pull in more people from more places and we try to recognize more of our own blind spots, more of the ways in which our rigid conception of objectivity may be excluding people. And so how do we bring people to the table to understand the kind of cultural and social boundaries of our knowledge better? So maybe that’s a little bit long-winded. [laughter]

0:43:00.5 SC: Yeah, that’s what we’re here for.

0:43:00.6 CD: But I think that’s… It’s like the feminist or my feminist heritage would be we don’t like to throw objectivity out the window. And we definitely don’t throw away collective knowledge making. But we do have to include more people. And we do have to break down some of the exclusionary norms that end up pushing people out of the “objectivity.”

0:43:28.0 SC: I guess my temptation upon hearing things like that, which was actually very eloquently said, and a lot of me wants to agree with it. But then there’s a part of me that wants to, say, also make it even more complicated by saying, “Look, the charge of the electron is equal in magnitude and opposite in sign to the charge of the proton.” And that’s not situated anywhere in particular. That’s everywhere. It’s absolutely universal. So I guess I want to imagine a spectrum of situated-ness of knowledge where there are some raw physical facts that are pretty much universal, and some social statements we could make that are pretty obviously situated. Is that a fair way of thinking?

0:44:07.2 CD: Yeah, I think so. I think it’s like the closer we get to the human and social world, I think this is where when we try to take physical science laws, for example, and then apply those to human circumstances, I just feel like it’s a recipe for disaster. [laughter]

0:44:28.4 SC: Yup. I agree on that.

0:44:28.9 CD: I think it’s like these kinds of laws will not… And people try to do that. They’ve tried to find like what’s the underlying grammar of a city or what’s the underlying law that guides this particular human behavior? And the thing is that humans are just more complex than electrons. Maybe one day we can get there. You know what I mean? In terms of having some laws that are gonna be like, “Oh, Catherine, right after this podcast, is gonna go pat her cat.” We just know that about her. But I think this is what happens in the human world, is that just things are messy and there are so many variables that the scientific method way of like, “Well, let’s take this one model and exclude all this other stuff and just pursue this one model of reality, down this particular path.”

0:45:20.3 CD: I think that can work for certain kinds of things, but it really can’t generalize and apply as a method to all these different human and social circumstances. So those are things that we really need to understand as being contingent, and/or we may discover some laws, as it were, that might apply in more contingent circumstances. So this applies, generally speaking, in Western cities that have this particular mode of transportation. You know what I mean? And that’s still really useful knowledge. But it’s also really useful to know that it’s only in this kind of city. It doesn’t hold for all kinds of cities. So I think that’s where the proponents of unqualified objectivity can be irritating to feminists.

[laughter]

0:46:11.4 CD: It’s where you’re trying to make these somewhat absurd universalist statements or generalizations and imagining that it’s always about generalizing and universalising. And so understanding when is universalisation and generalization appropriate, and when is it more appropriate to just understand something that’s deeply situated and contingent in its own circumstances. And that’s where a lot of the more qualitative social science research comes in and things like that.

0:46:41.2 SC: I guess what… My response to that, and I apologize for talking too much myself, but I’m working through this in real time because it’s interesting, it goes back to the shoes that we started with. And I guess one could say even with the electron and the proton, you claim that’s an objective fact about the world, but someone chose to describe the electron as one particle and the proton as one particle rather than different combinations or different agglomerations, there was some carving of nature that was human, that we turned… That we showed was useful after the fact, it’s not completely arbitrary, there are good reasons to do it, but it’s still done by human beings, the difference being that when it comes to fundamental physics, it’s pretty easy to do that, it’s pretty straightforward to agree on how to carve up nature that way, and when it comes to human beings or even shoes, that’s gonna be much less easy, much more prone to sneaking in our biases as some objective measure of something.

0:47:44.3 CD: Yeah, yeah, and I think variation. And so it’s sort of like with the proton and the electron it’s like, it’s gonna be able to behave in a particular way, and that’s super repeatable, and then you can kind of demonstrate, just even through repetition, that this is pretty much always how this particular thing works, whereas with shoes, like if I do the shoe experiment in my neighborhood versus in a different place versus with a different group of students, it’s gonna be different every single time. And it’s gonna be different actually in a similar way, I should say, that’s why I use it as a learning exercise, but that’s what gets interesting to talk about is that, that sort of variation in that, but it’s not repeatable, you don’t always come back with the same result, and that’s for me what makes sort of data in the social and political world the most interesting thing because they are contingent and yet they’re still useful, we can still do meaningful things with them, like I said, they’re a reduction of the world, but hopefully they’re a helpful reduction that we can still use in some meaningful way.

0:48:51.0 SC: Well, thank you for indulging my descent into the philosophical rabbit hole there, but I do wanna come back to the data and the feminism. Let’s get some more examples on the board, because we talked about how, when you make an algorithm for facial recognition or for translation, it can be biased. You make the point in your book that even the choice of which data to collect smuggles in all sorts of presuppositions, and there’s a whole bunch of data sets that either should exist but don’t, or that do exist but don’t include data that they should. How should the people collecting the data be thinking about these issues?

0:49:26.1 CD: Yeah, absolutely. And I think people collecting data have a special… It’s almost like a special kind of responsibility, right, because you really do have to be thinking about not only sort of, how are we gonna collect the data and store the data, but how are we gonna steward the data, how are we gonna provide information like metadata about the data, in terms of who’s gonna come later. One of the, I think, really complicated things… I find things like open data really interesting and useful, and then also complicated because it’s sort of like, you have governments that are collecting data and they’re collecting for a really specific institutional purpose, and then they’re publishing it, often with really bad metadata, [chuckle] so like you don’t know where it’s coming from or like what the columns represent, and you have to do all sorts of legwork to figure that out, but then imagine that we can then use that to do something else that the data were not intended to do.

0:50:44.5 CD: So in fact, this is why I have a lot of, I think, admiration, I would say, for people in sort of library sciences and in fields that are about stewarding information, because it’s really about thinking about like, how do we become good caretakers of information? And not knowing, at the time of collection, all of the possible ways that the data might be used in the future, and so I think that’s sort of what is complicated about it, ’cause of course, you can’t anticipate all those possible future uses and other data sets that people might wanna combine it with to infer something else or whatever. So yeah, I think… But that’s not to say we shouldn’t collect data, it’s just to say that we have to do it with some of those caveats and things in mind. And then I think in particular for data feminism, we think in particular about ways that structural forces of inequality, like sexism and racism can enter that process, so really tuning into like, what are those ways that they enter?

0:51:38.3 CD: A really obvious one is when you collect gender data, for example. So when we are collecting gender data, thinking about, first of all, why are we collecting gender data? But then also thinking about how it’s… 95% of the time when I’m filling out a form that is asking for my gender, there’s only a binary choice, but there are far more than two genders. And so kind of thinking very carefully about how some of our categories have been sort of naturalized, so we inherit this received wisdom of there’s two genders, when in fact there aren’t, like empirically speaking, there are not two genders. Same with race, sort of thinking about how do we ask people to enter their race or how do we racially identify people, all of these are I think really fraught. But then even things that are not overtly related to, say, gender or race or these identity categories can often be used to infer those things.

0:52:43.6 CD: So for example, in the United States, things like zip code, you can infer somebody’s race with something like 80% accuracy from their zip code because our country is so segregated, and so thinking about ways that even collecting something like a zip code can be racialized, like somebody could use that or a system, not even intentionally, could end up differentiating people and sorting people racially by proxy through the zip code. And so these are the things to think about as we’re collecting data. It’s like what data may reveal about us or about these different group-based identity categories that may have implications downstream for whatever system you’re building.
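(A minimal, hypothetical sketch of the proxy mechanism being described: even if a model never sees a race column, a feature like zip code can carry much of the same information. The records below are invented; the point is only the mechanism, not the 80% figure.)

```python
from collections import Counter, defaultdict

# Hypothetical records: zip code plus a race label the model never uses directly.
records = [
    {"zip": "60601", "race": "white"}, {"zip": "60601", "race": "white"},
    {"zip": "60601", "race": "white"}, {"zip": "60601", "race": "black"},
    {"zip": "60619", "race": "black"}, {"zip": "60619", "race": "black"},
    {"zip": "60619", "race": "black"}, {"zip": "60619", "race": "white"},
]

# "Predict" race by taking the majority label within each zip code,
# the crude inference any system with access to zip code can rediscover.
by_zip = defaultdict(list)
for r in records:
    by_zip[r["zip"]].append(r["race"])

majority = {z: Counter(races).most_common(1)[0][0] for z, races in by_zip.items()}
correct = sum(majority[r["zip"]] == r["race"] for r in records)
print(f"proxy accuracy: {correct / len(records):.0%}")  # 75% on this toy data

# The more residentially segregated the underlying geography, the higher this
# accuracy gets, which is why zip code can act as a racial proxy.
```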

0:53:32.7 SC: On the specific issue of the two genders, I just wanna, for the audience’s benefit, say two things. Number one, one of the very first podcasts I did was with Alice Dreger, and we talked about all the different ways one can fail to fit into the natural categories of the two genders that we’re most familiar with. But then also in your book, you have this lovely chart that I had never seen before, I don’t know if it was brand new with you or if you got it from somewhere, that traces all the different ways you can end up in between the traditional poles of biologically male and female. And this is not even about gender. This is about sex. It’s just biology. You forget about psychology. [laughter]

0:54:14.0 CD: Yeah, I love that piece. It’s a piece from Scientific American and it’s called beyond XY and XX or something like this. And yeah, it’s a beautiful flow chart. The designers and the research team looked across all the most recent literature in sex differentiation and showed how, again, our received wisdom is that there’s maybe three sexes, male, female, and intersex, and that it’s biologically determined at birth and then it’s just set. But it’s a beautiful flow chart visualization that shows how no, in fact, sex is differentiated and it unfolds. It’s dynamic.

0:54:57.7 SC: Over time.

0:54:58.4 CD: Over time, yeah. I just love the piece because it really complexifies… Even in, I would say, gender studies. Because in gender studies there is also received wisdom of like, “Okay, well, gender is more of your identity. And sex is a biologic thing.” So even in gender studies there is that… That’s the received wisdom there. And so I love that that piece is complexifying and challenging that received wisdom.

0:55:26.5 SC: The word complexifying is great here because that’s a lot of the work that you’ve set for yourself, taking clear and easy distinctions and saying, “Well, it’s not always quite that simple.” And the one thing I loved about the chart is there’s nothing in there about your feelings. It’s about a mutation in this gene leads to a decrement in this particular hormone. And it’s hard to look at the chart and go, “Oh, you’re denying the science.” It’s exactly the opposite of that.

0:55:52.0 CD: Yeah. No, it’s like showing the science.

0:55:53.1 SC: Yeah it’s just being honest with the science.

0:55:54.8 CD: It’s bringing us up to date about the science, actually.

0:55:56.6 SC: That’s right. And the one other example I love because it’s just so direct and relatable is the crash test dummies and how they’ve affected ideas about safety in cars. I’ll let you tell that story if you want to.

0:56:11.8 CD: Sure. Yeah, no. So for years and years, we only used male-sized crash test dummies. So these were based on the statistical average of the male body. And in fact, what this led to is that pregnant people or women were 40% more likely to be injured or die in a car crash, because we hadn’t been basing it on women’s bodies. So this is just such a clear, and it’s also such a harmful, example of the ways in which… Just a simple thing of not considering gender differences in bodies. [chuckle] It’s such a clear example of that. And it’s very similar to… It was only fairly recently that the NIH mandated that you have to have equal numbers of women and men participants in health research studies, because for many years women were excluded from scientific research due to, evidently, their menstrual cycles.

[laughter]

0:57:24.0 CD: So I’m just imagining a bunch of men being like, “Oh, I don’t know. They’re weird. They have menstrual cycles. We can’t include that ’cause that might mess everything up.” [laughter] But I just find that so interesting because men have hormonal cycles too. So it’s like their hormones will mess things up if women’s [0:57:43.1] ____ anyway. So yeah, but so I think that’s a really clear example of this thing where, again, that kind of goes back to this objectivity question of true objectivity maybe that would be great, but we don’t have it yet. [laughter] There’s all these ways that we have excluded and we are still pretty far from including. And so we have to work on that.

0:58:08.0 SC: Well, the crash test dummy example is a good one because it seems like such an obvious mistake. If you’re trying to be objective and you know that different people come with different sized and shaped bodies, isn’t the most objective thing to do to try to get a cross-section of bodies that is as fair as possible? But these human flaws, and again, sometimes they might be pernicious, but sometimes it might just be that, as human beings, we’re finite and fallible, they get in the way. I remember the story, and I’m not gonna get the numbers right, so I’m sure someone on the internet will correct me, but when Sally Ride was the first female astronaut on the space shuttle, they packed, for something like a week-long trip, thousands and thousands of tampons, because, well, who knows?

0:58:57.1 CD: Totally, I remember this story. [chuckle]

0:58:57.2 SC: All you had to do was ask somebody. Even if it was all men doing the planning, how hard would it have been to ask? And that got in the way. [chuckle]

0:59:06.5 CD: Right. I love this story. Yeah, I know, I think that this story is hilarious. But it shows you how it’s things like social stigma, how they shape our inability to even have these conversations. Like the fact that they couldn’t ask her until she, I don’t know, had to open up some cabinet and a bunch of tampons fell on her head. [laughter]

0:59:24.4 SC: Or any other woman. They just had to ask for any other woman in the world.

[laughter]

0:59:27.3 CD: So yeah. I think that’s the thing too. It’s like, if you just think about that as a bias, but then magnify it up into what we take on as research subjects, in terms of, who’s gonna work on menstruation, for example. If the majority of scientists in health, let’s say, think that menstruation is weird, they’re not gonna work on it. Or who’s gonna work on transgender health? Or who’s gonna work on segregation? So it’s sort of thinking about how these taboos, norms, stigmas, values, sort of human and social things, they creep in and they affect things, and in pernicious ways, but they’re not often… I would say the majority of the time they’re not from some one person’s brain being like, “I’m gonna do this,” they’re systemic, they’re part of this sort of, again, this systemic inequality that we swim in every day. And so that’s why it’s important to have both a theoretical and a material understanding of, what is this water that we swim in? How do we look out for those things? And how do we ultimately create better work and better science by being able to recognize those things and take steps to avoid them in our own work?

1:00:57.1 SC: Well, I might be generalizing too glibly here, but I think that this is one of the reasons why the tension about feminism is particularly sharp in my own field of physics and closely related fields like computer science and philosophy, because on the one hand, they are fields that valorize objectivity, and treating everyone equally would be part of that goal. But on the other hand, they make all of their money out of simplifying things and boiling things down to the essence and treating things like spherical cows. So this messiness of saying, “Well, you know, there’s actually a lot of heterogeneity in the sample,” and things like that is a little bit anathema to the method that has been so empirically successful in these fields. So there’s a mismatch there, which… Maybe shining some light on it will make it a little bit better, I don’t know.

1:01:53.5 CD: Yeah, no, I think that’s a really important observation, and I still hold hope for the quantitative fields. In a way, what I hold hope for is when I see really excellent mixed-methods research, because I think we do get somewhere by simplifying, we just have to remember that we simplified. You know what I mean? [laughter] Again, we can’t confuse the representation of reality for the reality itself. We can’t confuse the spreadsheet for the truth out there, even though they are interconnected and there’s a relation there that’s important. And so I get excited when I see mixed methods. One concrete example of this is Mary Gray, who has a book called Ghost Work. She wrote it collaboratively; her co-author is a computer scientist, and she’s an ethnographer and qualitative researcher. It’s all about the behind-the-scenes labor of the platform economy, like Uber basically, all the people who work behind the scenes; they showcased a lot of folks in India, for example.

1:03:08.0 CD: They’re kind of helping the algorithm along the way as it does things; they do human-in-the-loop approvals of things and checks on people’s licenses and stuff like that. And it’s this really lovely study, because Mary will be doing ethnography and interviewing people in their houses in India, and she’ll surface an interesting question, and then her partner will go test that computationally and quantitatively across a bunch of network data, and then he’ll surface some interesting questions and she’ll go try to validate those in interviews. I get really excited by this kind of work, because I feel like it’s building on the strengths and the limitations of both things, ’cause the qualitative work is like, okay, you can interview…

1:03:54.5 CD: Even if you interview 100 people in India, it’s not like you’ve interviewed the population, [laughter] and then for the quantitative work, it’s like, “Okay, you’ve got all this really great coverage in terms of scope and scale, but then what explains the variation?” And so I loved how they can go back and forth and make it really multi-scalar. For me, at least when we’re talking about things in the human and social and political realm, that gets a little bit closer to being able to use these different fields’ methods to their best advantage and in a complementary way. Again, to get further toward what is really explaining some of this variation on the ground.

1:04:35.8 SC: Yeah, and again, if you put it that way, who can object to the idea that we should have these richer methodologies and we might find something out? I don’t know, but people can be a little bit resistant, which I guess brings me to this question I probably should have asked within the first five minutes: how do you define the word feminism? What does that mean to you? That’s one of the points of contention, which maybe it shouldn’t be, whether or not you think it has a positive or negative connotation.

1:05:02.5 CD: Sure, yeah. So for us, and I should say there are many feminisms, and Lauren and I in the book are specifically pulling from intersectional feminism, but at the most basic level, if you are a person who believes in equality for all genders, you’re a feminist. That’s the basic definition: feminism is the belief that all genders are equal. And there’s a kind of corollary to that, because if you believe that all genders are equal, and then you look at the world around you, you can see that that equality has not been realized in the world, and so feminism compels you to take action, to realize the world in which all genders are equal. And I would say those are the two kind of indisputable things about feminism. [laughter]

1:05:55.4 CD: And then the specific type of feminism that we draw from is called intersectional feminism, and this is an idea out of Black feminism in the US, where Black feminists said, “Gender inequality alone can’t explain our reality and it can’t explain social inequality; we have to take race into consideration.” That would be the basic idea of intersectionality: the idea that we include both sexism and racism, and since then other forces of oppression, thinking about classism or colonialism and so on, so that we can try to think simultaneously and comprehensively about how those forces intersect to produce social inequality. And I should say intersectionality is pretty well established at this point. It’s been around since the late ’80s, early ’90s, and it is, I would say, the dominant strand of feminist thought today.

1:07:01.2 SC: I think the physicists would be a lot more in favor of the idea of intersectionality if it were just labeled non-linearity, I think that’s what we would call it, it’s just a…

1:07:10.9 CD: That’s great. Okay. Maybe I’m gonna say that next time I speak to physicists. [laughter]

1:07:14.9 SC: Well, because yeah, if I understand it correctly, it’s just the idea that whatever discrimination you face for being, let’s say, a woman and Black is not just the discrimination you face for being a woman plus the discrimination you face for being Black; they can interact non-linearly. Yeah, they can enhance each other. [laughter]

1:07:33.7 CD: Exactly, yes. That’s exactly right. Yeah, yeah, yeah. Okay, I’m gonna totally remember this when I’m talking to… Is this just in physics, or other sciences as well?

1:07:40.5 SC: Anything, any field that likes to have equations. Non-linearity is literally a feature of equations: you can’t just add things and get the sum. The sum of the effects is not the sum of the causes; that’s all it is, that’s what it is.

1:07:53.5 CD: That’s exactly Kimberlé Crenshaw’s point. It’s not reducible to these two separate things; they combine and interact.

1:08:01.2 SC: The Schroedinger equation is linear, but human beings are absolutely not. That’s…

1:08:03.2 CD: Definitely not.

[laughter]
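[Editor’s note: a minimal sketch of the “non-linearity” point above, using a hypothetical regression framing that is not from the conversation itself. If an outcome y depended only additively on membership in two groups, x₁ and x₂ (each 0 or 1), we would have

y = β₀ + β₁x₁ + β₂x₂,

so the combined effect of belonging to both groups would simply be β₁ + β₂. The non-linear, or intersectional, case adds an interaction term:

y = β₀ + β₁x₁ + β₂x₂ + β₃x₁x₂.

A nonzero β₃ means the effect of belonging to both groups is not the sum of the two separate effects, which is the sense in which “the sum of the effects is not the sum of the causes.”]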

1:08:06.0 SC: Pretty obvious thing. But the less obvious thing, the thing that… Your definitions of feminism seem pretty transparent and unassailable, but the tricky part is that you use the word equality without quite digging into what that means. And if I pretend to be hard-nosed about this, I say, “Well, what do you mean?” I mean, the average height of women throughout the world is not equal to the average height of men; they’re not equal in that sense. So what do you mean when you say men and women, all genders, should be treated equally?

1:08:36.9 CD: Yeah, equality of access to services, to happiness, equality of opportunity for promotions, etcetera. And equality of power. I think this is when we’ll know that we have achieved feminist goals: when we look at who’s in power and it’s representative of the population. And we’re still so far behind on so many things, whether you look at political representation or at the wage gap, just all of these different markers and measures. So that, to me, would be the sign that we have achieved a world in which we have equality of genders: when all those measures of power show equal representation. Yeah.

1:09:38.1 SC: Yeah, I mean, this is tough for me because… Well, I had Elizabeth Anderson on the podcast, and equality is her thing. I’m not sure if you’re familiar with her work but…

1:09:47.1 CD: I don’t know her.

1:09:50.2 SC: She’s a leading theorist of equality, and people also… The audience members, I can tell from YouTube comments, etcetera, some of them get their hackles up because they hear the word equality and they instantly read that as equality of, I don’t know, wealth, everyone has the same amount of wealth or the same amount of anything, and Anderson’s point of view is much more nuanced; it’s more about the equality of the ability to become who you want to be, or something like that, right? I can at least imagine a world where men and women have exactly equal ability to become who they wanna be, and a bunch of women decide that they don’t want to be in the positions of power. Maybe that’s not the actual world, because we’re very far from this thought experiment of equal opportunity, but I worry about measuring it or judging it that way, because that’s an outcome-based measure. As a poker player, I know that you can play the perfect strategy and the outcome might not be what you want, or what you anticipate, right? So maybe we have to be a little bit more nuanced about the kind of equality that we’re shooting for here.

1:10:56.7 CD: Yeah. I mean, I like Anderson’s definition there, the sort of equal opportunity to become who you wanna be, or something like that.

1:11:05.3 SC: I might have mangled it, sorry Elizabeth, but yes, I think that’s the [1:11:08.6] ____ of the idea.

1:11:11.1 CD: No, and I like that, and I do think, as more women come into power, we might see a transformation of the institutions themselves. Maybe the institutions themselves will change and look really different from what they looked like before. Because there are other strains of feminism that are pushing women into the corporate workforce and being like, “Hey, be more like men, wear your power suit,” and whatnot. And I certainly don’t think the goal is just to fill the pipeline, which is sometimes where I get frustrated with that line of research, particularly in STEM. Not that I’m saying it shouldn’t exist, I’m glad people are thinking about this, but it’s always framed as, “Where are the women?”, not, “Why are the men taking up so much space?” or something like that. [laughter] It’s always framed as the women’s problem, that they’re missing.

1:12:21.0 CD: And so, yeah, I think institutions themselves will change as different people, both women and people of colour, come into power, and that’s possibly a really awesome thing. [chuckle] So for example, I run a research lab at MIT, and I’m trying really carefully to run it in a feminist way and in an inclusive way. We have a kind of handbook, we have a set of values and norms. We try to do our best, within our little scope and sphere of operation, to push back against some of that toxic academic culture that exists at universities, for students and faculty and staff and so on. Those are small interventions, but I think they are potentially transformative. So I think it’s right that institutions themselves may transform in the process of realising this sort of equality of access and opportunity, this freedom to become the person that you wanna be, and so on.

1:13:32.5 SC: Well, and this leads in very well to what I wanted to ask as the last big topic here, which is: we talked a lot about how not taking feminism seriously, or not taking fairness, in some broad sense justice, seriously can lead to these unintended biases in data analysis and processing and so forth. So what about the flip side? Can we use data, or the analysis of data, or the collection of data, to bring about a better world? Can we be proactive data feminists in that sense?

1:14:08.7 CD: Absolutely. So yeah, and I’ll give you a really concrete example, which is my next project. I’m working on it now, and it’s gonna be my next book, and in fact, I sent the book proposal today, so…

1:14:21.8 SC: Oh, congratulations.

[chuckle]

1:14:25.5 CD: So one of the things I’m looking at… You may remember from the book, we talked about the case of Maria Salguero. She is collecting data about feminicide in Mexico. This is gender-based violence, when women are killed basically for being women. It could be what, in the United States, we call intimate partner violence, but it happens not only with intimate partners but also with other family members, it happens with drug trafficking, etcetera. And this is a big topic in Latin America. Governments are passing laws about feminicide, and yet they’re not systematically tracking feminicides themselves, and they’re not putting into place the apparatus that would enable them to, like having medical examiners label deaths as feminicides and things like that.

1:15:18.4 CD: And so Maria Salguero sort of steps into this, and for five years she’s been collecting data about feminicide from news reports, and she’s ended up with the largest public database of feminicide in Mexico. She’s ended up sharing data with journalists and NGOs, and she’s actually testified in front of Mexico’s Congress a couple of times. We talked about this as a really interesting example of feminist counter-data: collecting data in order to hold institutions and societies accountable for things that they are missing, that they are not doing. And since then… I was living in Argentina, in fact, on sabbatical, and I was really interested in this, so I started asking around and started meeting other groups. It turns out she is not at all the only one who’s doing this. There are many groups across Latin America who are doing this work.

1:16:17.6 CD: They range in size from individuals like Maria to large non-profit organisations who are mapping and monitoring gender-based violence. And so I’ve been interviewing them, and in fact, my lab has started building technology to help them detect feminicide better from news reports, ’cause invariably they’re using news and social media to tally these killings. And so this is a case that is, I think, really interesting for how we can collect back, or monitor back, really think about how we use data and aggregate these statistics as a way of holding governments accountable and building political will towards social change, to really make a case and build an evidence base for changing policy.
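[Editor’s note: Catherine mentions above that her lab builds technology to help monitoring groups detect feminicide cases in news reports. The sketch below is not that system; it is only a minimal, hypothetical illustration of the general kind of text-classification approach such a tool might use, written in Python with scikit-learn. All article texts and labels are placeholder stand-ins, not real data.]

```python
# Hypothetical sketch: flagging news articles for human review.
# This is a generic baseline, NOT the Data + Feminism Lab's actual system.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Placeholder training data: (article text, label) pairs, where label 1 means
# "potentially relevant to feminicide monitoring" and 0 means not relevant.
texts = [
    "placeholder text of a news article describing a relevant case",
    "another placeholder article a monitoring group would want to review",
    "placeholder text of an unrelated sports story",
    "placeholder text of an unrelated weather report",
]
labels = [1, 1, 0, 0]

# TF-IDF features plus logistic regression: a simple, common baseline for
# ranking incoming articles by how likely they are to need review.
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(texts, labels)

new_article = "placeholder text of a newly scraped news article"
# predict_proba gives a relevance score for the "review this" class (label 1).
print(model.predict_proba([new_article])[0][1])
```

In practice, as the conversation notes about human-in-the-loop work, a model like this would only surface candidate articles; the monitoring groups themselves would decide what actually counts as a case.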

1:17:09.3 CD: So yeah, I think that’s one example, and I think there are many other examples out there. And that’s one of the reasons I’m also so committed to data literacy: because I think these are tools that can and should be in the hands of human rights groups, journalists, who are doing some of the most interesting accountability work with data right now, social movements, community-based organisations, and so on. So I think there’s a lot of power and potential in data science, really for good. That’s what we’re talking about here: data science for justice.

1:17:47.0 SC: And I think along those lines, one of the points you made in the book, among your suggestions for doing better, which really struck me, was the blending of reason with emotion. You’re like, “Let emotion exist. Don’t be afraid of it.” And you can elaborate on why we should do that and what it means, but it reminded me of a talk I heard at MIT by Evelyn Fox Keller, your MIT colleague, about the dawn of the scientific revolution and how it wasn’t completely organic. Francis Bacon and friends would sit down in the coffee shops and decide what was science, and how we do science, and how we go about it, and they made up a bunch of rules for sounding more objective than we really were, right? Like, write in the third person, to imply that things couldn’t have been any other way; don’t tell about all the mistakes you made, just tell the successes. And it’s not as if that was necessary or inevitable; it was a choice that was made along the way. So maybe give your point of view on why it’s okay, or even good, to not always try to sound as objective as we could possibly be.

1:19:00.8 CD: Totally. Yeah, no, I love that. And this goes back to the objectivity thing, in fact, right? Because when we have to put on that cloak of objectivity, we are trying to push that other stuff under the rug: the mistakes that we made, the ways in which our own blind spots were maybe uncovered during the process. Whereas a feminist approach would say, “In fact, we wanna see that.” That’s part of being transparent: not only telling the heroic-genius story of some discovery or new knowledge that we made, but also showing the bumps in the road, showing the blind spots, showing how we got help and support along the way. That is, in fact, much more transparent. And then one of the things that we say about reason and emotion is that… A kind of feminist maxim is: whenever you encounter a binary, you should be deeply skeptical of it, because usually it’s hiding a hierarchy and usually it’s empirically false.

1:20:11.9 CD: So if there’s a gender binary of men and women, that’s hiding a… It’s just false; there are more than two genders. And we say this about reason and emotion as well. That’s a binary we often encounter, reason on one hand and emotion on the other, and it’s hiding a hierarchy. Reason is usually seen to be on top, and emotion is this messy thing that’s gendered, the idea that women are more emotional or whatever. And yet there’s this wonderful line from Patricia Hill Collins where she says that a feminist knowledge would be about valuing reason, emotion, and ethics equally, on an equal playing field.

1:20:56.0 CD: And the reason to do that is, again, acknowledging the situatedness of our own knowledge and our own selves. It’s about being transparent about our own limitations and the fact that we have emotional motivations for studying things, for doing things. And then, when we communicate our knowledge, communicating with emotion is a way to make it accessible to other people as well. So using emotion in communication is a way of being inclusive, and this, I think, comes up in particular with data visualization, which can be seen as a very scientific, technical, specialized way of communicating. That’s fine if what you’re doing is for a scientific journal or something, but when you’re doing something more for the general public, it can be like a gated community: it’s letting certain people in, and it’s prohibiting access by other people. So it’s thinking through what the ways are that we engage emotion, that we engage multi-modal sensibilities, for people to access whatever our message is with the data. Emotion brings all these things into our toolkit of communication that are lost if you’re like, “No, it must be neutral, objective, and appear scientific,” or whatever.

1:22:18.8 SC: Well, David Hume, one of my favorites, said that reason is, and ought only to be, the slave of the passions. Reason gets us what our passions tell us we want, in some sense. So they need to go hand in hand. What you just said reminds me a little bit of controversies in news reporting from wars and things like that, where you’re allowed to say how many people died, but the government will try to prevent you from showing pictures of those dead people. In principle, no extra information is conveyed, but the emotional resonance is a very different thing, and that matters, right?

1:22:53.5 CD: Yes, absolutely. Yeah, there’s a really interesting piece, now I’m forgetting the author, but it’s called something like “Numbing with numbers.” It’s this really interesting piece about how we are often moved by the story of one unjustified death, but not by numbers describing mass death. It’s a really interesting philosophical reflection on that, and also on how we communicate the scale of things like a genocide or these kinds of mass extermination events at a scale that is meaningful to us as human beings, and doesn’t just numb us into feeling disempowered by the scale of it, or something like that.

1:23:40.8 SC: But I always like to end the podcasts on a more or less optimistic note, if possible, so maybe say one more thing about how great it is that we can use data and analysis and all these things to make the world a better place.

[laughter]

1:23:54.1 CD: Yeah. Not end on death. Yeah, so Lauren and I talk a lot about… In Data Feminism, we try to describe some of the ways that our data are infected and polluted by the inequalities that we encounter in the world, but at the same time, we do posit and advance the idea that data are also part of the solution. So it’s really thinking about ways in which we put that power of data into the hands of the people who can really make social change and use data to challenge those same structural inequalities that keep showing up, over and over again, in our data sets and in our systems.

1:24:38.9 SC: And did I hear correctly there, the most important controversy of all that we haven’t even mentioned, you treat data as plural, not singular?

1:24:49.4 CD: Oh, well, we flip back and forth.

1:24:51.8 SC: Okay. Me too.

1:24:52.3 CD: So when we talk about big data, we tend toward the singular, ’cause it’s like “big data is blah blah,” but if I’m talking about a data set, like a collection of things, then I tend toward the plural. But I probably slip and go back and forth. [chuckle]

1:25:11.5 SC: There you go, complexifying the world for us, once again. Just can’t help yourself.

1:25:14.9 CD: Yeah, yeah, just complexifying… I feel like physicists should like complexity. Aren’t you all about complexity?

1:25:19.1 SC: No no. Kicking and screaming, they would like everything to be a perfect sphere as far as physicists are concerned, honestly. But complexifying is a good thing for human beings when they’re not doing physics, and with that in mind, Catherine D’Ignazio, thanks very much for complexifying our world view here on The Mindscape podcast.

1:25:36.0 CD: Lovely. Thank you, it was a pleasure.

[music]
