151 | Jordan Ellenberg on the Mathematics of Political Boundaries

Any system in which politicians represent geographical districts with boundaries chosen by the politicians themselves is vulnerable to gerrymandering: carving up districts to increase the amount of seats that a given party is expected to win. But even fairly-drawn boundaries can end up quite complex, so how do we know that a given map is unfairly skewed? Math comes to the rescue. We can ask whether the likely outcome of a given map is very unusual within the set of all possible reasonable maps. That’s a hard math problem, however — the set of all possible maps is pretty big — so we have to be clever to solve it. I talk with geometer Jordan Ellenberg about how ideas like random walks and Markov chains help us judge the fairness of political boundaries.

Support Mindscape on Patreon.

Jordan Ellenberg received his Ph.D. in mathematics from Harvard University in 1998. He is currently the John D. MacArthur professor of mathematics at the University of Wisconsin. He competed in the International Mathematical Olympiad three times, winning a gold medal twice. Among his awards are the MAA Euler Book Prize and a Guggenheim Fellowship. He is the author of How Not to Be Wrong and the novel The Grasshopper King. His new book is Shape: The Hidden Geometry of Information, Biology, Strategy, Democracy, and Everything Else.

[accordion clicktoclose=”true”][accordion-item tag=”p” state=closed title=”Click to Show Episode Transcript”]Click above to close.

0:00:00.3 Sean Carroll: Hello, everyone. Welcome to the Mindscape Podcast. I’m your host, Sean Carroll. There’s something that is very common in physics as well as in other areas of science, namely coarse graining. This is something that you do when you have a situation where there are many things going on, perhaps at a microscopic level, and you’re not keeping track of all those microscopic things. You don’t know exactly where the molecules of air are in the room or in a cup of coffee or something like that, so you take many different possible arrangements of the microscopic things and you group them together and you say, like, this set of arrangements is going to count as one big macroscopic configuration. This different set’s going to count as a different thing.

0:00:40.1 SC: Now, it turns out that in representative democracy, something very similar happens. You have many, many people, many, many citizens of the democracy, and it’s usually impractical to imagine that all those people will directly vote on every single issue that comes up in the country, that’s called direct democracy. It doesn’t really work very well, it hasn’t been tried that often, but instead, what you do is you make a republic, a representative democracy. You have the individual people vote for representatives. The representatives are supposed to represent their interests in the legislature or as the president or whatever. This raises an interesting problem. How do you go from the set of all the different opinions and feelings that the citizens have, which might be a huge different sort of heterogeneous set of ideas and goals and desires, and coarse-grain them into a small number of people who will actually travel to the capital and sit in the legislature, or the parliament or what have you.

0:01:42.3 SC: So today’s guest is Jordan Ellenberg. Jordan is a well-known mathematician, who specializes in algebraic geometry. He’s the author previously of How Not to Be Wrong, a New York Times best-selling book, and he has a new book coming out called Shape: The Hidden Geometry of Information, Biology, Strategy, Democracy, and Everything Else. So, what Jordan and I are going to talk about, he has many different things going on in the book, but we’ve picked out this theme of how you coarse-grain your democracy, and in particular, how do you draw boundaries for congressional districts. This is the problem that we face in the subject of gerrymandering. Gerrymandering is the idea that a party in power can fiddle with the shape of different districts to guarantee that it gets more seats in Congress than it deserves in some sense.

0:02:30.7 SC: Now, as soon as you say that, you say, “Well, what do you mean deserves?” And it’s not clear. We have to actually think about the politics of this, the political philosophy. How should we represent the goals, desires, interests of different people in the government. That’s fine, we’re not going to talk about that today. That’s a very important politics and political philosophy question. But there’s also a math question: Given some particular goals, how do you make them happen mathematically, and even more importantly, at a legal level, how do you know when someone is bending the rules? How do you know when someone is cheating? Turns out, you can say, well, consider the space of all possible shapes of districts in a state. Can you say that the particular one that the legislature has drawn is unfairly biased? That’s a good rigorous math problem and talking about it gets us into interesting questions in geometry, but even more interesting questions in randomness.

0:03:27.8 SC: Even though the question we just asked seems pretty straightforward, you’re comparing the specific map that people have to draw the districts to the set of all maps. You can’t really analyze the set of all maps, it’s too big; instead, what you do is a random walk through the set of all maps. That gets us into questions of how you do random walks and Markov chains and different algorithms for searching the space. So it’s a wonderful example of how not just physics, but other areas of human inquiry bring up these very intricate, very interesting math problems. We’re using politics here really as an excuse to talk about math. We’re not talking about politics that much in this episode, but the math that you need to do politics well turns out to be fascinating in its own right. So with that, let’s go.

[music]

0:04:32.5 SC: Jordan Ellenberg, welcome to the Mindscape Podcast.

0:04:34.9 Jordan Ellenberg: Hi, thanks for having me on.

0:04:36.3 SC: We’re going to be talking about gerrymandering and the role of geometry and math in gerrymandering. It’s a slightly political topic, but you’re a mathematician as a professional here. So, why don’t you just for the sake of the listeners say what you do for a living, like how, what kind of mathematician are you?

0:04:52.9 JE: I’m primarily a number theorist, which means that I study these questions that mathematicians have been wrestling with really since there was such a thing as formal mathematics, these questions of, you write down the equation and does it have solutions in whole numbers, and if so, what are they? That is something we somehow still don’t know that much about all these many millennia later, although we certainly know more than we did at the outset. But maybe more specifically on what’s called an arithmetic geometer. Over the… Especially in the last 100 years or so, we’ve come to see that these questions which seem to be purely numerical or algebraic actually have like a very, very serious geometric understructure, and that’s very much the way that the subject is seen now coming from the [0:05:37.0] ____.

0:05:37.5 SC: Well, that was one of the interesting things that I learned from talking to Emily Riehl, who we just had on a few weeks ago, and we haven’t had that many mathematicians on.

0:05:46.1 JE: Oh, fantastic.

0:05:46.8 SC: Yeah, we haven’t had too much math, but now there’s like a surfeit of mathematics. I don’t know what’s going on here, but she comes, I guess, from the opposite side. She’s a topologist and she keeps finding, as topologists do, these algebraic structures in the study of how you can deform things from one to another.

0:06:02.8 JE: Yeah, that’s so interesting, because she has so fully manifested herself as a category theorist that I think of her as that and I actually didn’t know that she started in topology, although that’s certainly the origin of that subject.

0:06:13.9 SC: Yeah, no, that’s exactly right. So despite all this and despite the fact that math, one of the most pure abstract, philosophically-elevated things we can think about, I don’t think that you testified in front of the Supreme Court, but you at least signed a brief in front of the Supreme Court, right?

0:06:33.1 JE: Exactly, right. So it’s sort of when the Supreme Court is considering a big case, especially a case with a lot of technical content of various kinds, they get briefs for all the kinds of specialized views, but typically… And in this gerrymandering case, this was the case that has briefs from historians, briefs from political scientists, briefs from lawyers, briefs from community groups that have interest in the case. I think this is, as far as I know, the first case in the history of the Supreme Court where there was a mathematician’s brief.

0:07:02.5 SC: And you’re 0 for 1. The mathematicians did not get the answer that they were hoping for, right?

0:07:07.8 JE: So we’ll have to wait another 200 years for the next case that’s mathematical to come before the Court.

0:07:12.5 SC: Okay, so…

0:07:13.1 JE: To try to even our record.

0:07:14.3 SC: We might have listeners who are not familiar either with politics or the United States, or our lingo or anything like that. What do we mean when we use the word gerrymandering?

0:07:23.2 JE: Right, so the fundamental idea is this, that in the United States, in both the US Congress and in state legislatures, we elect the people who make the laws from geographic districts. We cut the state up into a set of districts. In Wisconsin, there happen to be 99 of them. Sort of a weird number, I know, but it’s in the Constitution that every Senate district has to be three Assembly districts. So it has to be a multiple of three, that’s what is constitutionally mandated. And you break the state up into these districts, and then each district elects a legislator by a majority vote.

0:08:00.3 JE: And what in some sense we’ve known for a long time, but has really come to a kind of apex now, is that this process of choosing how you execute this division, how you divide the state up into districts, which seems kind of technical and boring, and I guess you just cross a line to do it, that turns out to have an absolutely huge effect on who sits in the legislature, or rather, it has a huge effect if you work hard to give it a huge effect. And that process is called gerrymandering, that process of partisan actors motivatedly drawing the line, so as to ensure that their party is advantages in the election. And I think it’s sort of best summed up with a popular description of the practice is saying the legislators are choosing their voters instead of the other way around.

0:08:48.8 SC: Yeah, and it’s all based on the fact, or arises from the fact that for whatever reason, we choose to allocate our representatives geographically in some sense. The starting point, which we’re not even going to question in this discussion, is you divide the state up into regions of geography, and then within each region, you vote for who wants to represent you.

0:09:12.5 JE: Right, and one thing I learned writing about this is that there’s lots of places that don’t do it this way. In Hong Kong, there’s a constituency just for teachers. To run for the seat, you’ve got to be a teacher. There’s a teachers seat, there’s an engineers seat, there’s a… Like one of these so-called functional constituencies. In Iran, there’s a seat just for Jews. In Ireland there’s a seat just for graduates of the University of Dublin. There’s lots of ways… You want to hear my craziest idea about this?

0:09:35.1 SC: Sure.

0:09:35.7 JE: I thought it would be cool if… Maybe we shouldn’t really do this, but imagine if there was a seat… Imagine if we broke the age ranges up into bands of equal population and was just like, okay, there’s a Congress person… There’s a representative of people aged 40 through 51. You could argue that those people across the state have a lot more in common with each other politically, in terms of what their priorities are, what their needs are, than people of very different stages of life who happen to live 50 miles apart and thus sit in the same congressional district.

0:10:04.3 SC: Well, it’s interesting to think about the future, because I can imagine that 200 years ago, geography was just the way that you could easily slice this pie, but these days we can do better. Yes, so I’m interested to see whether or not… I know that there are a lot of clever, maybe overly clever, voting schemes on the market, but we’ll see whether any of them actually catch on in real elections. But one of the instant worries you get to, and we’ll probably get to this in more detail, but when you have this scheme where you divide things up geographically and then just have a majority vote, is that… So if you have a state with 99 representatives, let’s say, and 60% of the state, as you very nicely put it, there’s the orange party and the purple party, so you’re not associating these with any real parties here in the United States.

0:10:51.3 SC: So 60% are orange voters, 40% are purple voters. And in this weird, idealized situation where every district is exactly 60% orange voters and 40% purple voters, all of the representatives would be orange, even though 40% of the voters are purple. So there’s sort of a built-in amplification of tiny edges in the voting, right?

0:11:17.7 JE: Yeah, and this is… I should have set this up as a thought experiment in the book, but it’s also called Massachusetts, that’s another name for this phenomenon. And Massachusetts is a state, which… There are Republicans in Massachusetts and not even that few. I think it’s about a third to 35% of the state, so that’s pretty significant. If I remember correctly, there hasn’t been a Republican member of Congress from Massachusetts since 1996. It’s been nine Democrats for a long time. And that’s actually not because of gerrymandering, that’s because the very nature of geographic districting. As you say, in a state that’s two-thirds Democratic, there’s not going to be very many Republican neighborhoods, certainly not entire Republican patches of the state big enough to be a Congressional district.

0:12:01.1 SC: Right, so I guess what I wanted to get on the table there was just that, like you said, even forgetting about gerrymandering, this whole idea of just having geographic districts and voting in them can lead to big distortions, if you compare it to just proportional representation from inside the state.

0:12:17.7 JE: Right. But this is one of the reasons this question is so interesting, is that you really find yourself saying, “What does it mean to be representative?” So one, it’s very natural to think that the right notion of representativeness is, as you say, what’s called proportional representation, which means that the proportion of legislators from each party is roughly the same as the proportion of voters for that party. Maybe not on the nose, but that a deviation from that is understood to be some kind of failure of representativeness. That’s a system a lot of countries have. It is definitely not the American system, and it never has been.

0:13:00.2 JE: So look, I always say like people write about this in terms of Democrats and Republicans, but like you know sympathy for the Libertarians, man, like year in and year out about 1% of American voters vote for a Libertarian Congressional candidate. Now, there’s 435 members in the House. If you believe in proportional representation, you’d think there ought to be a few Libertarians sprinkled in there, and how many are there, zero, because there’s no such thing as a Libertarian congressional district or even a Libertarian neighborhood. It’s kinda fun to imagine what it would look like, isn’t it? But as far as I know, it doesn’t exist.

[chuckle]

0:13:31.1 SC: Well, yeah, because of this effect that our voting system or our system of apportioning representatives turns up the contrast knob, so a slight advantage in the electorate gets you a huge advantage in the Chamber. And if you’re down there at 1%, you’ll be normalized down to zero very, very quickly.

0:13:51.0 JE: But one thing, I mean, we’re sort of imagining this as sort of some kind of system with noise in it, if you’re in a state like Wisconsin, where I live, now you’re in California, which is a very different story, but in Wisconsin, where the proportion of voters who vote Democrat or vote Republican is pretty close to 50/50, but because of that, the political divisions are quite different when the mood of the state is more Democratic and the mood of the state is more Republican. So under what you might call a normal system, even with single-member geographic districts, there would be swings, there would be years where Democrats held the majority and years where Republicans held the majority. So at any given time that majority might be pretty big and in favor of one party or the other, but it would switch back and forth and in some sense, there’s some political justification for that. There’s some political justification for saying like it’s good for the legislature, some of the time to sort of have a healthy majority of one party, so that things can get done.

0:14:55.1 SC: Yeah. That makes sense, that makes sense.

0:14:55.3 JE: And then, if the voters don’t like those things that are done, then it switches, right. And then…

0:15:00.9 SC: Exactly, and you’re bringing up the fact that we don’t agree ahead of time on what we want a fair outcome to be in any of these situations, there’s competing interests that we have to get to. But so just to be super clear, I think that you said it already in some sense, but how gerrymandering works is sort of to advantage one party over the other taking advantage of the fact that the representation within each district need not be reflective of the state as a whole.

0:15:29.6 JE: Exactly. And if you’re the person drawing the maps and in many states, Wisconsin being one of them, the people who draw the maps are the legislators, the very people whose electoral fortunes are at issue. So it’s like hard to imagine a worse feedback loop from the point of view of conflict of interest. To put it… And you can draw lots of charts and pictures and graphs, but the fundamental way it works is pretty simple. The party that you don’t favor, you try to cram all of their voters into just a few districts. You try to create districts which are absolutely dominated by the opposing party because by doing so, you’re sort of using up their voters and, if you like, wasting their votes on a few districts that they’ll win by a lot, leaving the rest of the state composed of more districts where you win by a more modest amount, and that is exactly what we see in Wisconsin.

0:16:24.5 SC: And just to be clear, this happens everywhere, well, at least in the United States, like this is this procedure of drawing weirdly-shaped districts. And you can show pictures and sort of laugh at how weird they are. This is really, really common as far as I can tell.

0:16:39.5 JE: Well, a couple of things I want to say about that. One, it doesn’t happen everywhere, because not every state has the procedure where the legislators themselves are the ones drawing the map to determine their own fate. So there are states, like Iowa is one, where forever, they’ve had sort of some panel of retired judges draw the maps, and nobody’s going to say they’re perfect, but they’re not people whose literal job is to be political operatives for a political party. And by the way, when I say that the legislators draw up maps, that’s not even quite correct, it’s literally people who work for the state party, not even elected officials, but just people whose entire job is not to represent the public but to represent the interest of their party, that’s who’s actually sitting in the room drawing the maps. So one thing I like about this as a place where there’s a possibility for a reform is the system we have is like literally almost the worst possible thing we could have, so any change from it… Is the system of just asking some retired judges to draw the maps perfect? No, but it’s way better than what we’re doing in a lot of states.

0:17:41.4 SC: So this then raises the question of how… And now the math begins to come in. One question, I guess the first question we might address to ourselves, is there a way of characterizing how unfair a particular set of maps is, and so what do people say about that?

0:17:58.2 JE: Yeah, and actually that’s a great question, because what it turns out, if you try to say, “What’s fair, like what’s the perfect map? What’s a map that’s truly representative?” I don’t think that’s the question that has a mathematical answer; actually, I don’t think it has a philosophical or a legal answer either. And then you might say, “Well, so what the hell are you going to do?” Well, I think the question of what’s fair is very hard, but I think the question of what’s unfair is much easier, and that’s a crucial distinction. We’re not looking for perfection here. I think eliminating gerrymandering, saying, “We’re going to turn the gerrymander knob to zero,” that’s impossible, that’s an unrealistic hope, and it’s not even clear if it make sense, right? It’s like saying, “I’m going to eliminate all trace of bias from my thinking and be like perfectly rational.”

0:18:40.5 JE: No. But you can look for the worst offenses, again which we can see right here on the ground in Wisconsin, and it’s a few other states. But one thing I want to say that’s really important is that a very common and historically grounded myth about gerrymandering is the thing that you said that it has to do with funny-shaped districts. In fact, it’s right there on the name, that sort of the word gerrymander comes from this famous political cartoon, about a map drawn by Elbridge Gerry whose name, I’m told, was actually pronounced Gary, but that ship has sailed, which everybody calls it gerrymandering, so sorry to the descendants of Elbridge Gerry, but we’re calling him Jerry.

0:19:19.3 JE: You know, this kind of crazy districting of Massachusetts that he engineered as governor, which they said looked like a salamander, and so, it was the gerrymander, this was the origin of the name. And, now, too, there’s a famously, crazy looking district in Pennsylvania, that was normally called Goofy kicking Donald Duck because of the way that it looks. But here’s the thing, nowadays, when these maps are not being drawn by some political savant, sitting with a giant map, rolled out on a big table, but are being done by machine. If you say, “I want a map, that advantages my party by a huge amount, and it looks fine, the districts are shaped, roughly, rectangularly and nice,” you can do that.

0:20:03.0 JE: And, in Wisconsin, we have exactly that. If you looked at the map we had before 2011, which was drawn by a court, and the map we had after 2011, which was a highly gerrymandered map, drawn by political operatives, you wouldn’t be able to say one of these has funnier looking districts than the other. They look about the same. So, if that was ever a good measure of gerrymandering, it no longer is. We can’t detect it with the naked eye, we have to do something more specific.

0:20:27.7 SC: That’s very interesting, actually, ’cause I hadn’t quite appreciated that. So, you’re saying that the use of computers and our modern data-driven techniques, allow us to gerrymander in a way that we don’t know it when we see it. We have to be a little bit more sophisticated about what it means for the map to be tilting in one direction or another.

0:20:45.8 JE: I mean, you don’t know when you see it, when looking at the map. You may know when you see it when you’re like, “why do I live in a swing state that has lots and lots of members of each party, but year in and year out the legislator has a huge, super majority of one party.” There you can see it, you can see the effects, but you can’t see it on the map.

0:21:03.0 SC: So, there was one effort to characterize the unfairness of maps that you talk about, before sort of discarding as not the right way to do it. Which was the, I guess, the efficiency gap, how many people’s votes are wasted. And that does have a superficial plausibility to me.

0:21:18.9 JE: Yeah, and I don’t… Discarding is too strong. I think there’s been a sort of, as in any area of scientific progress, there’s going to be an iterative thing where we develop better and better measures. You think of a measure and then you’re like, “Well, here’s some of the problems with it.” In some sense here you see the differences between scientists and politicians, because politically you should probably choose one thing and be like, “This is it, if you argue against this you’re wrong.” But as scientists, we’re not inclined to do that. We’re always inclined to look at whatever it is we’re using and be like, “Could this tool be made better? What are the issues with it? What are the edge cases, where the tool doesn’t work so well?”

0:21:54.6 JE: So, I think it’s a bit, it can be a bit frustrating to the lawyers who are like, “I thought we were arguing on this or that measure,” which I very well understand. But, again, because my framework is not “let’s make it perfect,” but “let’s root out the absolute most grievous gerrymanders,” the good news is that we have a bunch of different measures, and on a map like Wisconsin, they all agree. It’s not hard. It’s not hard to see that something very dirty is going on, and literally any reasonable measure you can think of, will light the right… Light the big red blinking light.

0:22:32.1 SC: Why don’t you explain where the efficiency gap is, because I think it’s, at least, a good example of sort of an attempt to make the voting fair.

0:22:38.8 JE: Yes, the efficiency gap… What it measures is, it says… It relies on this notion of a wasted vote, where a wasted vote is on of two kinds: It’s either a vote for the party that loses in a given district, so it’s a vote that somehow doesn’t go towards getting you a seat; or, it’s a vote for the winning candidate above that 50% threshold that you need. So it’s any votes above or beyond 50%. And that captures something very, very true about the way gerrymandering works, which is that, if you’re trying to be efficient with allocating your votes… Like if you could just draw the districts completely non-geographically and just choose the voters, you’d be like… Especially if you knew how they voted.

0:23:24.6 JE: You would say, after the fact you would say, “Oh, okay, I’m going to make a bunch of districts where 50.001% of the voters are mine, and 49.99% of the voters are yours, and I’m going to win them all. And, if the political nature of a state means that, actually, there must be a lot of your voters somewhere, I’m going to put them all in one district. I’ll put them all in one district and I’ll win the rest.” That would be very efficient where every time you win, you win by only a little. You’re getting a lot of seats, with very few extra voters at the margin.

0:23:53.8 JE: And so, I think that notion of efficiency gap is very simple to compute, and it captures something of… It captures something very real about how gerrymandering works. And you can very clearly see empirically that it’s a lot bigger in states where the legislature is drawing the maps for their own benefit, than it is in states where there is some other system in place, whether a citizens’ commission or a judge’s commission, or whether it was drawn by courts after a battle, or what-have-you.

0:24:22.5 SC: And then, one of the arguments against it, though, if I’m remembering correctly, is that, if you did have this ideal situation, where every part of the state had exactly the same representation, 60% orange, 40% blue, that would not… Even if you just did everything completely evenly, that would not get a good score on the efficiency gap measure, but there was no gerrymandering going on.

0:24:47.3 JE: Yes, exactly. I think the one, the flaw of that measure is that in the end it does kind of seem to posit that there’s some right answer to how many legislative seats there should be, given a certain popular vote. It’s not proportional representation, but in some sense, it’s in the same genre as proportional representation. And, I think… And, by the way, this is something there is definitely argument about, so I don’t want to say that my view on this is the only one. But it is the view that claimants went forward with in the Supreme Court case of Rucho v. Common Cause, is that, the opposite of gerrymandering is not hewing to some standard that you think of in advance about what the number of seats should be.

0:25:30.4 JE: The opposite of gerrymandering is not gerrymandering. The opposite of gerrymandering is what would have happened if those maps had been drawn by a neutral arbiter of some kind.

0:25:41.3 SC: Good. So now the math is going to start coming fast and furious, because your question now has shifted to realm of the ensemble of all possible maps we could imagine drawing of congressional districts, and asking questions about that. And this is what gets mathematicians very excited.

0:26:01.6 JE: Right, I mean, this is where there starts to be some really interesting math, and where, I think, a lot of mathematicians have gotten involved. You know, I can name Moon Duchin at Tufts, Justin Solomon at MIT, Jonathan Mattingly and Greg Herschlag at Duke, Wes Pegden at Carnegie Mellon, lots of people whose day job is proving theorems have got an interest in this, because it would be great if we were all so civic-minded that we worked on problems just because they’re important. But no, there’s lots of important things in the world, we work on whatever’s important and mathematically interesting, so that this speaks to our special skills, which to be honest, are not usually called for in most political questions.

0:26:44.2 JE: So exactly right, what you’d like to say is, but what if the map had just been drawn by somebody who didn’t care which party came out ahead? Let’s say a map plucked at random from all the possibilities that there are, that would certainly be an unbiased, neutral, non-partisan way to do it. That sounds very appealing, but then you have a big problem, which is that the set of all possible districting of the state is absolutely unfeasibly, uncomputably huge. I couldn’t even tell you how many there are. The ways to cut Wisconsin up into 99 pieces, that’s a lot.

0:27:24.7 SC: Well, you’re slightly aided by the fact that there are these voting wards, thousands of them, but still they’re discrete units, so at least it’s not literally an infinite number, right, but the question is, how would you divide up the voting wards into a set of congressional districts.

0:27:41.0 JE: Right, I think there’s about 7000 wards in the state of Wisconsin, so… So if you think about how many ways there are to sort of partition 7000 things into 99 pieces, that is a very, very, very big number. Even if you start to impose requirements that each piece should be connected in Wisconsin, constitutionally, you’re not allowed to have districts that break up into several pieces, although in some other states that is allowed. Even so, you simply can’t write them all down on a sheet of paper or store them all on a hard drive and then pick one at random. You have to do something, you have to find some proxy for that.

0:28:17.2 JE: And the way we typically do that is by this mechanism, what’s called a random walk. It’s something that people in physics do all the time. So a Monte Carlo simulation is something that happens all the time in physics. You probably know better than me what’s a context where that would be.

0:28:33.7 SC: Random walks are surprisingly important for many reasons. I just want to… Before we get to that, I want to home in on one question that I had. So I can imagine the math problem of… Given a state with a certain shape divided into geographically pretty regions with approximately the same number of people in them, that sounds both easier and very different than imagining all the ways to partition the state into geographic regions and find a typical one. Maybe the typical one looks kind of like weird and fractal, is there… Is there a reason we don’t just pick the geographically pretty ones, with minimum area curves or something?

0:29:17.7 JE: And by the way, I want to say that not a hypothesis, it’s good intuition. If you actually could choose a uniformly random one, the boundaries probably look really bad in fractal, you can sort of see that in experiments on much smaller cases. My grad student Lorenzo Knight knows a lot about this, actually, so it’s a… So right, one thing I like about this problem is that it is a math problem mixed with a political problem and a legal problem, and you can’t really separate the strands from the other, so there is a history of work that treats it as just a mathematical problem. In other words, like let’s mathematically sort of define what it means for districts to be nice and try to figure out what would be the nicest division of a state into equipopulous regions.

0:30:01.5 JE: And you get nice polygons, with like a straight-line boundary, and then you show this to a politician or a lawyer and they will laugh until they pee their pants, because this has no bearing on anything that’s politically feasible or anything that respects like actual political divisions in the state. And actually, these are people who live in real communities, and actually we have in Wisconsin, we have a People’s Districting Commission, which has no legal powers, but the Governor made one anyway, just to sort of say “This is what it would look like if we could have… “

[chuckle]

0:30:33.5 JE: And I’ve been to some of their hearings, and it’s pretty cool, just they hold a hearing and like people call in and just regular people whose base is in Wisconsin. It’s like, “My community is split down the line between two assembly districts, why? It makes sense for us, we have these specific needs having to do with this particular lake and its health, and why do we not have one assembly member who cares about this?” Those kinds of considerations, if you’re actually doing it… I want to say, I think the system we have is bad, but if you want to know why there can even be such a system in the first place it’s because in principle, a representative is actually supposed to be listening to the people that they purportedly represent, working for those people, like not working for a political party they happen to be a member of.

0:31:18.7 JE: So there is a reason this is thought of as part of the political process. So if you talk to political science people or elected officials, they’ll say, “There’s no way we’re going to let a computer draw the map.” That’s not… Nobody wants that. Voters don’t want it, politicians don’t want it, judges don’t want it, that’s a non-starter, and that’s what happens if you try to treat it like a math problem without treating it also like a political and legal problem. On the other hand, having now sort of crapped on my own tribe, if you try to treat it as a just a political or a legal problem and not a math or geometry or quantitative problem, your results are going to be just as bad, and traditionally that is what we have done.

0:31:56.6 SC: Yeah, that’s fair enough. But the nice thing about the ensemble approach is that these other considerations, like you want… You’re not picking a completely random partition of the state, you want things to be geographically continuous, for example, but also you want to give representation to minority groups and maybe have some sensibleness to the way that people are divided up, and in principle that can be included in how you define your ensemble, is that right?

0:32:28.9 JE: Exactly, so the ensemble just refers to, that’s what we call a set of things in the sense of kind of dress it up and make it sound fancy.

[chuckle]

0:32:37.5 JE: You just have to say the same thing, but in French.

0:32:40.1 SC: Yeah.

0:32:40.9 JE: And then that sounds like bigger and more technical. So yeah, you have this big, you have this big ensemble of randomly generated maps, and it’s certainly not all possible maps, but it’s a set of maps, which for reasons we can talk about if we want seem to be fairly representative of the set of all possible maps. And we can build in certain biases that we want. We can build in, hey, I want the districts to be like roughly round, if you want that. You can build in, hey, I want the districts not to cut the county lines if they don’t have to. Which in Wisconsin it’s a constitutional ask. I mean, different states have different criteria for what their districts should look like.

0:33:18.3 JE: Basically you build in everything the districts are supposed to be like, except you don’t build in, oh, and by the way, I want my party to have like five more seats than it would if it were fair, and then you look at your whatever, your 20,000 maps or 100,000, it’s easy to make lots and lots, and you see, you look at actual election results and say like, okay, if the wards were arranged in this way, what would the outcome have been? How many seats would each party have in the legislature? And invariably, you just get a very nice roughly Gaussian curve, a bell curve, just like you always get when you do any kind of experiment like this, right?

0:33:53.0 JE: You sort of see that there’s very typical outcomes, and then there’s very atypical outcomes. And in states like Wisconsin or North Carolina or Maryland, where they have heavily gerrymandered maps, the outcomes from the actual maps are way off in the tail of the distribution. They’re very visible outliers, and so it gives you a really nice clean signal of this is not something that happened by chance, much as the people who drew the map may want to tell you that. No, this is an absolutely like grievous thumb on the scale in favor of one party and against the other.

0:34:30.0 SC: I think this is very, very important, because probably if people hear that we’re talking about gerrymandering and math and geometry, they might leap to the idea that you’re using geometry to draw the map. But that’s not what you’re doing. You’re suggesting that, yeah, no, human beings will draw the map, we’re going to use math to judge how fair, how typical or how non-distorted that map actually is in terms of comparing its actual results to what typical results would be. Is that right?

0:35:00.9 JE: Exactly, the geometry is the referee here, not the player.

0:35:04.9 SC: Good. And so we need to construct or at least deal with this idea of an ensemble of maps to figure out what would be fair in the world of all the maps. And by the way, is this one of the… I hypothesized that this is one of the gaps between professional mathematicians and civilians, that when a regular person talks about a random number. If you say you have a random number between 0 and 1, that instantly in their mind, in my mind, a number appears, 0.71835. Whereas I think mathematicians actually just instantly go to the ensemble or to the probability distribution or something like that. Like converting the phrase random element into really what you mean is an ensemble or sort of a random variable is really a whole ensemble. Is that a helpful step into thinking more like a mathematician?

0:35:55.6 JE: Yeah, and I think, you know, the word random is of course one of those heavily overloaded words. To my kids it means a song I don’t like. To a… And often, to a mathematician, random, something that’s completely deterministic could be random, it’s just random and all the probability is concentrated at one point, so literally it just means something… It just means the outcome. How would I even describe it in words, Sean, this is hard, actually. Maybe it means something in which chance could play some kind of a role. But it doesn’t mean, for instance, if you flip a coin and it’s weighted so that two-thirds of the times it comes heads and one-third of the time it comes tails, a civilian, as you put it, would probably say, well, that’s not random, it’s biased in favor of heads. A mathematician would say, oh, of course it’s random, it’s just that there’s a different probability distribution. So we would say it doesn’t have to be even; anything that is subject to chance in whatever way we would consider a random variable.

0:36:57.5 SC: That’s right. Okay. And so, but the particular issue that we’re facing with here, as you already made very clear, is if you’re trying to compare the actual map to an expectation of over all the possible maps you could reasonably draw, that ensemble of maps you could reasonably draw is far too big. So rather than… And sorry, I guess maybe something to get right here is there’s an NP hard problem sneaking in here. We did have Scott Aaronson on the podcast a while back, and we talked a little bit about NP hard problems. But remind us what those are and what the specific one is.

0:37:31.5 JE: Yeah, so if you were to say, I want a way of provably drawing, and now I’ll use a term, a uniformly random map, which means that every possible map is equally likely, you have to formalize it a little bit more than that to sort of get it into literal things we can prove theorems about complexity theory. But essentially, that problem is known to be NP hard. What that means is… Complexity theory is this very funny subject where there’s this whole class of problems and what the kind of theorems people prove is like problem A is as hard as problem B, and problem B is at least as hard as problem C, and problem C is at least as hard as problem F and etcetera, which in turn is as least as hard as problem A.

0:38:17.3 JE: Okay, but proving that any individual problem actually is hard and doesn’t have a fast algorithm to do it, is like completely out of reach. So the whole edifice could come crashing down tomorrow, somebody could be, okay, I found a fast algorithm for one of these problems, and now they’re all easy. Okay, nobody thinks that’s actually going to happen, but it’s a very funny subject in that way. So NP hard kind of refers to this class of problems that are, so to speak, the hardest problems, and we think the case is that they’re all hard rather than that they’re all easy, but we don’t actually know.

0:38:49.4 SC: And so, but we can approximate them often, so an NP hard problem takes a lot of steps to solve, but then there’s a somewhat independent question, which is: How efficient could an approximate solution be? Is there a general theory of that?

0:39:06.9 JE: I don’t think so. And I’ve been doing a lot of seminars about it, and this is definitely… This is a little bit outside my field, I’m not a complexity theorist, but like I said, I’ve been to so many interesting talks in like this sort of machine learning and optimization group at Wisconsin, which is very closely tied to the Math department. It does seem like the way the subject is going is to say like… I’ll give you an example, traveling salesman problem, a very famous NP hard problem.

0:39:33.7 JE: You want to find the fastest route, you have 100 cities or something, and you want to sort of draw a path through all of them, and you want to find the fastest one. Now, that is an NP hard problem, to do that provably and find the very fastest one, and you could say like, okay, great, we proved the theorem, like that’s really hard. Meanwhile, while you prove that, people are just doing it, right. Because you know, the engineers while you do that, they’re like, yeah, but I’m content to get… To write an algorithm which with probability 99.99% finds a solution that’s at least 99.99% as good as the optimal one, so while you’re over there proving your theorem, I’m just making the path, I’m doing it.

0:40:10.5 SC: Half the people used Google maps to get to the conference where you show that you cannot solve this problem, right.

0:40:14.9 JE: Although people don’t usually plan 100 conferences in a row their entire itinerary, through the entire thing. I actually have no idea how Google maps performs if you give it like 100 destinations and ask it to chart a map through them all. So I do think the way that field is moving is towards what problems can we solve approximately and ideally, provably with high probability, but maybe probably with high probability, it starts to get a little philosophically fuzzy what it is that you’re doing. And so, do we want to have our random walks now?

0:40:53.6 SC: Yeah, well, I think that’s exactly where we are. So just to sort of catch our breath, ’cause we should pat ourselves in the back for getting this far. So, we have this problem of, we want to compare a given map that some people in a smoke-filled back room came up with to what to expect in the ensemble of all the fair maps. But the ensemble of all the fair maps is way too big. So, what you’re going to tell me is the way to sample from it, but unbelievably, in my mind, is to pick one and then start changing it in little random ways.

0:41:24.0 JE: Exactly, so what you do is you start with some given map and even, it could be the map that we have right now, it could be this gerrymandered map and say, let’s do something to it, but something pretty simple, like the set of things. Again, the set of possible things we can do to it, it’s pretty big, so I’m going to be very restrictive about what I’m allowed to do. I’m going to say: You’re allowed to take two districts that are adjacent and move a ward on the boundary from one to the other. Okay, I’m going to be honest with you, the best possible algorithms are a little bit more complicated in this, in an interesting way that makes them work better, but I can’t do it in audio, so you get the book and you look at the pictures…

0:42:00.4 SC: Okay, they’ve got to buy the book. They should buy the book anyway, yeah.

0:42:06.3 JE: But the basic idea would look kind of like this. Rather than sample from every single way you can possibly change it, you restrict just these very natural local changes that I take a ward that’s on the boundary and shift it to the district on the other side of the boundary. And doing moves like that, and then you say, okay, I’m going to do like 10,000 of those moves, each time choosing at random some ward of the boundary. Remember the number of wards is not very big, the number of wards is about 7000. So choosing a random thing out of 7000 is cake, that you can definitely do.

0:42:40.6 JE: And it turns out that just doing a few moves like that actually can sort of get you around the space pretty well. It can sort of get you a lot of very different looking maps. The analog I always like to bring up is the number of ways a deck of card can be… Cards can be ordered is pretty big, it’s a number called 52 factorial. You are not going to be able to list them all and pick one at random. So then you might say, well, I guess the problem of putting a deck of cards in a pretty random order must be an unfeasible problem, but it’s not unfeasible, right, I mean, you do it all the time. What do you do? You just shuffle the deck, we know that there’s an algorithm for that.

0:43:20.0 JE: So, how does that work? Because a shuffle is a particular kind of change that you’re allowed to execute, but unless you’re some kind of sleight of hand genius, a shuffle that you do is not going to be exactly the same every time. There’s some amount of randomness where there’s some small set of possible shuffles that you can execute depending on exactly what order when you like… Like when you put the cards down. And it turns out, and this is a thing you can prove theorems about, so my undergraduate advisor, Persi Diaconis, is a very… Together with Dave Bayer has a very famous theorem called the seven shuffle theorem.

0:43:53.5 JE: It’s one of those theorems that does what it says on the can. It says that seven shuffles, which is a very small number, is essentially enough to put a deck in like extremely close to uniformly random order. So this is the magic of the random walk, is that choosing from a very limited menu of possible changes and choosing one at random and just repeating that has a wonderful power to allow you to explore a space that you simply can’t see from afar, you can’t see it globally. It’s like wandering around in a landscape where in order to have a map of it, you would have to sort of have a balloon and be like 10,000 feet above it, but what if you don’t have a balloon? All you can do is kinda wander around in each moment, sort of choosing whether to walk north, west, east or south. North, west, did I say them all?

0:44:44.7 SC: Yes.

0:44:45.0 JE: Okay, good. But if you do that, that’s actually not a bad way to explore a whole space. If you do that for a long time, you actually probably will get everywhere, given enough time.

0:44:56.0 SC: So I don’t know if you know, but if you go to Vegas and play poker, many of the good poker tables will have automatic shufflers, so the human being is not involved. But if they don’t, the casinos do not let the dealers actually shuffle the cards in an ordinary way, because they are good enough to more or less put it in whatever or what they want it to be. So the way that they shuffle is to just scatter the cards face down on the table and swirl them around with their hands before gathering them back up, because that’s at least a little bit less controllable.

0:45:28.4 JE: But I would think that doing that once would not be enough, is that right? My intuition would be that you’d want to do that several times. Is that right or is once sufficient?

0:45:36.5 SC: Well, if they’re really scattered all over the place, and then it’s not just like a single shuffle where you pick two and move them back in and then you cut the cards, etcetera. So I think it’s good enough for government work or for Vegas work. Yeah. Okay, good. So with that footnote on the table, so just to be super duper clear, we’re doing a random walk, but we’re not doing a random walk through the state, assigning things to different Congressional districts, we’re doing a random walk through the space of all possible maps. And the nice thing is, there’s sort of a obvious local way to do that. You have a map and you pick one boundary on the map and you sort of mess with that, right?

0:46:16.5 JE: Right, except, as I said, it turns out that there’s a better local way to do that that has more behavior. But it’s something like that, yeah.

0:46:22.4 SC: Slightly more sophisticated way.

0:46:23.2 JE: Yeah, and this is a move that happens again and again in the history of geometry, and that happens again and again in the book, is that you think you’re dealing with one geometry. But then to really solve the problem, you have to kind of go one level up, and it’s kind of meta, and think about the geometry of geometry. It’s like the geometry of the space of all possible ways to cut the state up. So that is definitely a leap into abstraction. And in a book like this that is not for professional mathematicians or Supreme Court Justices or some professional gerrymanders, we gotta lead up to it and this is at the end of the book. But it’s a move that we constantly make going from the sort of very concrete geometry of two and three-dimensional objects that we can understand, kind of building our intuition there and then lifting that into geometrizing entities that are much more abstract and yet somehow can be approached using the same techniques.

0:47:18.8 SC: Well, one obvious question is, convince me or at least assert very strongly that this kind of procedure will be a fair sample of the space of all the maps. Isn’t it possible there’s just some maps we’d never get to by starting with one map and making little local deformations?

0:47:36.8 JE: So that is absolutely possible. In fact, we don’t even know whether this space of all district things is what’s called connected, which means it’s possible there’s some other complete undiscovered country of ways to district the state that you never get to. You can choose for yourself how much of a problem you think of that as being. For me, it’s not, and I’ll explain why. Because if you start with a map that the legislators drew and you mess with it, and basically every single thing you do to it rapidly dissipates the partisan advantage that it has, that’s a pretty strong indication that that particular map was cooked. In other words, you’re right that we may not provably know whether it’s an outlier among all possible maps, but what we do know is that it’s an outlier in its own neighborhood. Even things that are very much like it still are not as incredibly favorable to the party that drew the map as the map that we actually use. So that to me is a very strong signal.

0:48:33.8 SC: Yeah, and I think that this makes sense to me as a physicist in sort of entropy terms in some sense. There are a lot more equilibrated configurations of a set of molecules than there are low entropy out of equilibrium configurations. So even if you start with a weird configuration and mess with it a little bit, you’ll push it much closer to an equilibrium one ’cause there’s just so many more of them, right?

0:49:00.2 JE: Right, like the headphone cord in your pocket, it’s got to get tangled up because the untangled state is pretty small. I actually mentioned this exact example in my book and I’m like, “Man, even for readers now who are under 30, do they know what a headphone cord is? This… Alright?” And then 10 years now, people will be like, “What is that? What does he mean? What is the thing that you would have it… ” Hopefully I’m not…

0:49:19.8 SC: Outdated.

0:49:22.2 JE: Yeah.

0:49:23.3 SC: But okay, so yeah, I don’t want to let this go by without taking advantage of some of the work you did for the book, digging into the history of the idea of the random walk, ’cause it’s fascinating. All these history things are always very, very fascinating. So when you say, “I started a point in some space and I walk randomly in some direction,” that doesn’t sound so hard, but in fact, it crept up on us quite slowly. Mosquitoes were a big player in the first random walks?

0:49:52.6 JE: Yeah, I knew almost nothing about the history of this, so I had an incredibly fun time learning about it. And it’s one of these things that is surprisingly common in the history of science where somehow the world is ready for a certain idea. So this idea of the random walk kind of comes up in a bunch of different contexts at once, and it first comes up… Well, maybe not quite chronologically first. But as you say, Ronald Ross, who is the fellow who discovers that mosquitoes are driving the spread of malaria. He’s the person who discovers the transmission mechanism of malaria, and so he becomes very interested in, well, what’s the right mitigation? You can’t kill all the mosquitoes. That’s another good example of a problem that’s infeasible ’cause there’s too many. But you can go to clear some small area of mosquitoes, and then what you want to know is, well, how long does the area stay mosquito-free? How good of an intervention is that? Because, of course, the mosquitoes are going to get back in from elsewhere after you cleared the area.

0:50:49.8 JE: So to understand that, you want to know if a mosquito starts in sort of some stagnant pond where it’s born. How far is that mosquito typically going to migrate in a certain amount of time? And because mosquitoes are not really very goal-directed beings, that’s a random walk problem, because each day the mosquito is going to do something, but as far as we know, the mosquito doesn’t have particular aims, it just kind of flits where it will day by day. And so its progress is sort of a random walk towards the landscape. And Ross, he’s a fascinating character who I literally had never heard of before I started writing this book, this very famous figure in the history of medicine. He actually kind of secretly really wanted to be a mathematician and also a poet, and was kind of bitter about his life in medicine and sort of felt like it wasn’t really that great of a thing to do, even though he was this sort of super famous, Nobel Prize-winning doctor. So he would write poems about how unappreciated he was. Anyway, it’s a whole thing. It’s a whole thing.

0:51:50.5 JE: But he recognizes that this mathematical problem is of critical import to understanding the spread of disease, and actually he comes to the Saint Louis Exposition of 1904, which I think among physicists is famous because it’s where Poincaré gives his sort of famous speech about like, maybe we’re going to have to change physics, maybe things look different near the speed of light, and things are going to have to be like modified, that the laws we know are only approximations.

0:52:12.4 JE: I mean, this is a… But he’s also there, he sort of talks about how, well, they expect me to give a speech about medicine, but what’s really important is the math, [chuckle] so I gave a talk about math and no one understood. He was that kind of guy. He was sort of a difficult character. So he asks… He asks Karl Pearson, the statistician Karl Pearson, if he knows anything about it, because he recognizes he can’t really figure out analytically how to solve this problem. And then Pearson writes a letter in the letters column of Nature, and Pearson coins the term random walk. He tells Ross like, I’ll put in Nature, but I can’t mention the problem is from biology, because then none of the mathematicians will work on it, because they’ll be like, we don’t do biology problems, this is considered very disreputable.

0:52:56.2 JE: So he made it an abstract problem and put it in Nature, and rather quickly, a physicist, Laura Rayleigh, sort of explained, like, oh, I actually know how to do this. I did it like 20 years ago for an application in acoustics. So this is sort of like in England, how this notion of a random walk was first analyzed, the two-dimensional random walk of a mosquito moving around in the landscape. But what I discovered is that at the same time, almost at the same time… I think if, as you say, civilians, that people who are not mathematicians actually know the phrase random walk, they probably know it from the title of Burton Malkiel’s book, A Random Walk Down Wall Street, a book I really love, ’cause it’s one of the most… It’s probably the giant bestseller with the most math in it of any of any book.

0:53:44.5 JE: And so I discovered that, actually, this idea is old too, there’s a graduate student of Henri Poincaré, Louis Bachelier, who comes from this non-traditional background, he worked in the Bourse, he worked in the bond market in Paris before coming… Before coming to study math, and he was like, look, everybody’s like buying and selling and trying to understand the market, but I think it’s just a random walk, I think things are just bumping up and down randomly, let’s try to understand how that random walk would behave and see if it matches the behavior of the market. Well, again, as with the biology, this was considered a little disreputable, and Poincaré was kind of, I get that what you’re doing is good math, but it’s not the kind of math we do here in Paris, we’re supposed to be studying the free body problem in celestial mechanics, not bond prices. But of course, now it’s the dominant point of view, right, if you’re going to sort of do finance, if you’re going to get a PhD in finance, you’re going to be studying fantastic differential equations, so you’re going to be like random walking like hell.

0:54:36.9 SC: Well, and by the way, this… This goes back to the point you made earlier, that random doesn’t mean absolutely equal probability of everything happening, right, so the stock market can trend upward while still having a random component.

0:54:50.9 JE: Absolutely, right. It’s like… It’s noise plus drift, as like I think Danny Kahneman’s new book, like… It’s just about this, right, noise. And maybe his other books too. I can’t even talk about it all ’cause it’s too long… And then there’s the whole Einstein story, when Einstein is sort of ratifying Boltzmann’s theory of what the understructure of matter is by sort of treating Brownian motion like a random walk, which I didn’t really know that much about, but then what we call a such a thing today, we mostly call it in math, it’s not a random walk, but a Markov chain.

0:55:22.8 JE: So Markov is among mathematicians the person who really first develops this theory, and here is the… This is a story that blew me away that I’d never heard. He invents the Markov chain because there’s this huge theological war between the arch conservative, super Russian Orthodox Russian mathematician and the angry atheist Russian mathematicians, of whom Markov was the leader, and so his opposite number, a guy called Nekrasov, believed that he had found a mathematical proof of free will, kind of ratifying the views of the Russian Orthodox church in this matter.

0:55:58.0 JE: And Markov was like, no way, you can believe what you want, but don’t bring math into it, like that. Now I’m offended that you would sort of say that math proves that your church stuff is correct. So Markov develops the Markov chain as a counter example to Nekrasov’s claimed proof that free will is implied, that free will is implied by probability theory. I always say like… Like you’re on the internet, right, I mean, you know, is there anything more intellectually sterile and unuseful than like a long argument between like a movement atheist and a movement Christian, right, those are… Nothing good ever comes out of that.

0:56:35.9 SC: No, I think… I’ve got to push back there against… I think that there’s a sampling problem, when you say that, there are loud, noisy exemplars of both sides that have very… Not very helpful arguments between each other, but there can be useful intellectually productive discussions between people who hold the same opinions, but hold them in more productive ways, I would argue.

0:57:00.4 JE: I guess you are absolutely right, but I would counter that by saying that, in fact, in terms of their personalities, Markov and Nekrasov were both the exactly the kind of loud, obstreperous people. So it’s amazing that out of their dispute came this absolutely fundamental mathematical idea, but then it’s amazing that none of them knew about each other, this was math… I mean, now sort of math is global, but then, stuff is happening in England, stuff is happening in France, stuff is happening in Switzerland with Einstein, and stuff is happening in Russia with Markov, and none of them knew… It’s all happening between 1900 and 1905, and none of them knew about each other, it’s incredible.

0:57:33.7 SC: Right, so this is pre-Russian Revolution, Nekrasov and Markov, and I do, I think that the audience deserves just a little bit of a glimpse into what Nekrasov thought he was doing with the free will business, like he was trying to reconcile the idea that individuals are allowed to have free will, even though whole societies act in predictable ways.

0:57:56.2 JE: Yeah, exactly. So one thing… So you start with the law of large numbers, which has been around for a long time, which essentially, put it in terms of coin flips, because in probability theory the only thing we have are coins and urns, right, everything is about one of those two things. But this one’s about coins, right. You flip coins for a long time, and with a very high degree of probability the proportion of heads you’re going to get is going to get very close to 50%, or any fixed deviation from 50%. The chance you’re going to get, say, 51%, you certainly might get 51% heads if you flipped 100 coins, but if you flip a million coin that’s incredibly unlikely.

0:58:33.7 JE: And what people started to see, and this is long after Bernoulli, when sort of people started to study demographic statistics, is that just as coin flips kind of settled down to predictable averages if there was enough coins, so social measures like tended to converge to predictable averages if there was enough people in your aggregate. And so there was a lot of kind of philosophical unease about what that meant, did that mean we’re just all like mindless coins, we’re just all sort of like little balls in the roulette wheel, like sort of with… You know, fated in some sense essentially deterministically to sort of at least as a society, act in a certain way.

0:59:15.0 JE: Well, as you can imagine, that is not a happy thought for people of a certain philosophical bent. So Nekrasov thought he had found a way out of this. Nekrasov said, “Hey, look, this theorem of Bernoulli’s, this law of large numbers, it requires that the coins be independent from each other.” If your coins are not independent, like let’s say in the simplest case, let’s say if each coin is constrained to fall the same way as the previous one, well, then you’d either get all heads or all tails, you wouldn’t get convergence to 50%.

0:59:47.1 JE: So Nekrasov up said, “Ah, this law of large numbers, the statistical regularities you see of like age of first child, age at first marriage, proportion of various crimes, all these things you see, the fact that those seem to converge to stable averages might have made you thought we’re just mindless coins but no, what it really shows is that we must all be independent from one another. So ha, I win.”

1:00:09.7 SC: Yeah, but there was a basic logical flaw there.

1:00:12.6 JE: Yeah, so what Markov observed is that he was mistaking the theorem for its converse. Bernoulli’s theorem says that if the coin flips are independent, then you get this convergence to stable averages, but it didn’t show that if you have convergence to stable averages, then the flips must have been independent. So what Markov showed is that any kind of what we now call a Markov chain or a Markov process or a random walk, that the behavior of such a thing also converges to predictable averages, even though if you’re sort of walking around at random, where you are today is anything but independent from where you are yesterday.

[chuckle]

1:00:51.3 JE: Because you’re very close to where you were, you’re only one step away from where you are yesterday. They’re very, very, very tightly connected. But then having won this battle, he did all kinds of crazy things with it. He analyzed Eugene Onegin, the famous poem by Pushkin, and sort of study it like… I mean, if you treat this as… It seems crazy to think about a poem as a random walk, and he certainly knew that a poem was not just a random walk, but he nonetheless showed that you could just treat it as a string of consonants and vowels. It was a very modern kind of computational-style approach, reducing it to a string of bits, 0s and 1s. And showed that you can calculate what’s the probability of transitioning from if a letter’s a consonant, what the probability of the next letter is a vowel, or the next letter is a consonant. That’s a very simple Markov chain. You might say that cannot tell you very much about this famous Russian poem, but you know what, he was able to show that it distinguished it from a different book by a different Russian author, that there were measurably different patterns in the Markov chain, just in the sequence of consonants and vowels.

1:01:55.2 SC: Because different authors have different favorite words or different sequences of words, so whether a consonant follows a vowel will be different likelihoods for different authors.

1:02:04.7 JE: Right. And that’s in some sense a precursor to so much of the natural language processing we do today. Of course, we do it in a much more sophisticated way than him writing down these kind of grids on pencil and paper of consonants and vowels, but he did use all the letters.

1:02:17.7 SC: Well, right, so all the way to GPT3, this purported artificial intelligence is actually just looking for these kinds of patterns or these likelihoods, these probabilities of different things happening in texts and spitting them back at us.

1:02:32.0 JE: Right. It’s in some sense trying to find the Markov chain that best matches the language production it is able to see, and it’s able to see a lot because it has access to this gigantic corpus of English language texts that exists on the internet and that people have stored.

1:02:48.9 SC: Okay, wait a minute. What is the definition of a Markov chain? How should we be thinking about that or visualizing it in our brains?

1:02:54.8 JE: You should think of it as any process that explores a space where you’re in some state and then… I mean, we can be even more complicated than this, but let’s say the simplest thing would be to say, you’re in some state, and then where you go next only depends on where you are right now. And there might be some randomness to it, it might be that if I’m at 5th Avenue and 29th Street, I’m going to walk either a block north, a block south, a block east or a block west, so there’s four places I can be. And it’s not deterministic, like maybe I flip a four-sided coin and like decide which of the four ways to go.

1:03:30.7 JE: But the point is that if I choose to walk north to 5th Avenue and 30th Street, then my next choice doesn’t depend on the choice I just made. All I know is that it’s kind of like having… It’s kind of like in the movie Memento where he has amnesia and at every moment, you don’t remember anything except where you are right now, and you have to decide what to do next based on that. That’s the characteristic feature that makes a Markov chain a Markov chain.

1:03:54.4 SC: Good, it’s very much like a random walk, it’s a generalization of the idea of a random walk, but we’re walking in some very abstract space, which for Markov himself was just vowel or consonant when he was looking at Pushkin.

1:04:08.1 JE: Yeah, right, a two-point space, vowel and consonant, yeah.

1:04:10.9 SC: And so when you’re either in either one, you’re either in vowel or you’re in consonant, there’s… The Markov chain is characterized by the probability that you’ll either just stay at vowel or jump to consonant.

1:04:22.0 JE: Right, these are so-called transition probabilities. And you can already see that this model is not complicated enough to really capture how language works, because maybe you’re going to have two vowels in a row but you’re probably not going to have five. And this simple model cannot see that, because at each stage when you’re at a vowel there’s some chance of staying at a vowel, so you want to dress it up a little bit more and make it a little more complicated.

1:04:43.4 SC: Yeah. So the whole point of the Markov chain and the Markov process, is this memory list quality. It only depends on where you are now. It doesn’t remember where you’ve come from, which makes the GPT something you can’t really have a conversation with, it has no recollection of what you’ve already said.

1:04:58.9 JE: Well, no, because… Okay, I’ll say this. There’s a trick to this. It’s very beautiful so I’m going to tell it to you, which is that, let’s say you’re doing a Markov chain on letters. But what you do to be clever is you say, actually, my Markov chain is on strings of three letters. So let’s say if I’m… If I see THA, if that’s where my current state… Probably the most likely the next letter is T, right, because I’m probably not saying like thanos, so I’m probably saying that. Like that’s a very common word I might be in the middle of. But then your next state is not T, it’s HAT, because you’re walking on the space of three-letter sequences and you’re only allowed to lop off the first letter of your sequence and replace it with a new letter at the end.

1:05:37.3 JE: So this business of memory… You can kind of build in a certain amount of memory just by enlarging your idea of what the state is. This is a very clever idea. And so GPT3, or any language generation system like the AutoComplete in your Gmail, it’s a little bit more than that. And I’m not competent to say what’s inside the guts of GPT3 because, despite the company being called Open AI, we don’t really know what’s in it. But they’ve definitely worked hard to give it more long range memory than a sort of more very basic Markov chain-based system. That’s part of what makes it cool.

1:06:14.4 SC: And the important thing for the audience is that when they’re texting and there’s AutoCorrect, or when they’re doing Gmail and they’re suggesting answers, that’s a Markov chain at work proposing those possible solutions to what comes next.

1:06:26.0 JE: Right. It’s saying here’s where you are now, what’s the most likely place you’re going next. Now, I don’t know about you, but I always type something else when it starts to suggest something. I’m like, “Well, I’m certainly not going to say that now that you’ve suggested it. I’m my own man.”

1:06:38.6 SC: Yeah, but you’re still part of the overall ensemble. You’re going to contribute to the expectation values at the end of the day, so…

1:06:45.2 JE: But I’m disrupting it ’cause it’s like, “Oh, I guess I was wrong about, like… ” I’m reducing its level of certainty of what it’s doing.

1:06:51.0 SC: I think the system takes you into account, that’s my prediction, but… Okay. And so then with these Markov chains, we have a way of generalizing the random walk and you can see how we’re coming back to the gerrymandering, right? Because we’re going to have a Markov chain that represents where every place that you are represents a map, and we’re going to hop to a different map, right? And then you can sort of, once again, the space of different places you might hop to is enormously big, but then you can start talking about the ensemble and its properties, and there will be, in general, a favorite equilibrium distribution, right? There’ll be, I don’t know, a set of probabilities. Tell me what the equilibrium distribution is. What is the simplest way to characterize that?

1:07:31.7 JE: Yeah. So ideally, if you run the random walk for a long time, you will get to sort of some consistent answer to how much of your walk do you spend, each possible place on the map. Now, in the case of the gerrymanders, you’re probably not going to approach that distribution, because you would have to run it so long that you can conceivably see like every possible distribution, which we are probably not doing. But a very well known place that does that is Google Page Rank, which I think both of us are old enough to remember the incredible difference in how the internet worked. There were search engines before Google but they did not work the way Google works.

1:08:11.5 JE: And what Google did was build the Markov chain into the process. Basically, they did a random walk on the internet, a very good abstract space that we like a lot. You’re on a page, you follow a random link from it. That’s the walk. You do that for a long time and you start to converge to a probability distribution. There are sites you’re hitting again and again. Which sites do you hit? Well, you might say, “Well, it must be sites that have a lot of links to them,” so you’re likely to get there. But it’s not just that. There’s something recursive about it, because you want the sites that link to it, to themselves be sites that have a lot of links to them.

1:08:47.2 JE: If I just have two random sites that have like a billion links to each other, I’m never going to get into that little closed ecosystem unless other people link to it. So this process of random walk and then keeping track of what’s this limiting distribution of how much, how… What proportion of your time to spend in each place, that’s the basis of what Google called Page Rank back in the mid ’90s when they developed this. That is the order in which they serve you your search results. And that absolutely revolutionized people’s ability to find things, because this limiting distribution, what the Markov chain finds for you and computes for you, is… It’s somehow a global property of the system, that it’s a very, very hard… It’s kind of an emergent property that’s very, very hard to sort of see in any local way.

1:09:31.1 JE: You just gotta do the walk and see what happens, and what they saw is that it does an incredibly good job of sort of capturing this notion, this human notion of importance or centrality or the search results that I actually want.

1:09:47.5 SC: But for something like our maps, our gerrymandered maps, or the maps that we were trying to show are not gerrymandered, I think, like you said, we don’t have the computational power to sample every single map, much less keep coming back to the same map over and over again. There’s just too many of them. So what is the thought that makes us say that some Markov chain is going to help us figure out what a typical map looks like?

1:10:13.0 JE: Well, it helps us… Well, as I say, we can’t know for certain what a typical map looks like, but we can know what a typical map sort of like… You know, within a neighborhood of the starting map. So imagine, a random walk is in some sense like sort of particles diffusing from a given source. If you’ve seen any of the many newspaper articles with these incredibly terrifying COVID pictures of what the spray of COVID particles from somebody’s nose and mouth looks like who’s infected, it’s much denser near the person’s head, and then it rapidly gets less dense as you go farther away from the person, so you should think of it like that.

1:10:50.5 JE: You should think of it as like you may not be seeing the entire universe, but you’re getting a pretty good sample from the random walk, sort of like what’s going on in the neighborhood of the map that you started with. But again, the fact that we’re not sort of converging to some equilibrium distribution, it means we’re not going to do what Google does, we’re not going to say, I’m going to rank all maps for you and tell you which map is best, the most liked, the most random, the most likely to be hit, but that’s not the goal. The goal is not to tell you what to do, the goal is to tell you what not to do. The goal is to identify when an existing map is completely an outlier.

1:11:28.8 JE: So maybe the analog of that, I’m just making this up as I go, maybe this sort of Google analog would be that if I claim to you, hey, my homepage is the most important site on the internet, and Google sort of started its random walk from there and walked like a billion steps and never found its way back there, that would be a pretty good… It wouldn’t necessarily tell you what was the most important site of the internet, but it would definitely rule out that my homepage is the most important site of the internet, ’cause it’d be like, come on, like I walked around for like a billion steps, and I never found my way back there, it can’t be that important.

1:12:00.6 SC: So what is the modern mathematical research level problem in the gerrymandering and Markov chain game? Is it coming up with algorithms to do these random walks or is it… Are we ever going to get ambitious enough to say, well, yeah, we think we actually can draw the maps as mathematicians, get out of the way, politicians.

1:12:22.3 JE: But why would they agree to that?

1:12:25.2 SC: Again, I’m not a politician either, so I’m not going to say they should agree to it, but so what are the mathematicians trying to do?

1:12:31.6 JE: So I would say, there’s a lot of questions and research in this area [1:12:34.6] ____ and I should emphasize, I am not one of the people doing this research. I’m a popularizer of this research and I’m a conduit between mathematicians and various like voyeurs and commissions that I’ve testified before, etcetera, etcetera. So I would say some of the things people are doing are trying to get better provable guarantees about maybe you don’t know, it’s like a 100% random, or can you get some kind of Google guarantees about how close to uniformly random you are.

1:13:11.9 JE: Studying the properties of different choices of what the local moves are, this is very granular stuff, but it turns out to actually make a difference to how rapidly you converge to something that looks kind of uniform. So this kind of, you might call it kind of an engineering problem, just being in practice like what versions of the many local things you could actually work. Something that I’m particularly interested in, and I actually don’t know the way that people are working on this, but as you probably know, around the country, there is a lot of appetite for interest in reform of the basic voting mechanisms themselves. People have introduced ranked choice voting in Maine, in New York City, I think in Austin today or something like that. In Austin, Texas, I think they’ve just adopted something like this.

1:13:53.5 JE: There’s a really interesting proposal with some bipartisan support, actually here in Wisconsin, where the legislature hasn’t done anything for two years, but maybe, maybe not, they’ll do something and ask for support from both parties. So I think a lot of the work in gerrymandering has really focused on conditions as they are, conditions where we have the voting system we have, and we have the two parties we have, and we sort of even ignore the existence of candidates or voters who don’t identify with one of those two parties.

1:14:23.7 JE: Introducing fundamental changes in how the vote works, like ranked choice voting, is going to scramble this a lot, both for the people drawing gerrymandered maps and for the people trying to detect gerrymandered maps. So one thing that just personally I’m interested in is trying to understand how this landscape changes with more variation in possible voting systems, which we’re, as I say, we’re already starting to see, and there are states, well, there’s one state, Maine, that has really like gone all-in on this mechanism, and it’ll be really interesting to see how things play out there. But again, doing the work, it’s going to be both math and political science and law, all mixed together. If you try to do it with one discipline by itself, you can’t do it.

1:15:03.6 SC: We’ve said it before in the podcast, but why don’t you just say what ranked choice voting is, so the folks get an idea for why it would be an attractive thing to shoot for.

1:15:11.5 JE: Oh, yeah, sorry, sometimes I get immersed in this voting stuff, and I forget that most people, oddly to their psychic benefit, don’t think about democracy maintenance all the time. So ranked choice voting means that you don’t just vote for one candidate and say this is the one I like best, you rank all the candidates who are on the ballot in order of how much you would like them to hold the office. And what that allows you to do, is if you are somebody who doesn’t identify strongly as a Democrat or a Republican, and maybe even identifies with a third party, it allows you to vote for that party’s candidate while not giving up your right to express a preference between a Democrat or a Republican that you might well really have.

1:15:56.6 JE: In fact, in the proposal from Wisconsin, they’re talking about having, ditching partisan primaries and just having an open primary where all the Democrats and all the Republicans and all the libertarians and all the everybody and the unaffiliateds all are in one primary. The top five finishers go to the general election, and then it’s ranked choice from there. So in a situation like that, you probably would have a general election with multiple candidates. You might have two Democrats and two Republicans and a fifth person, for example.

1:16:24.6 JE: And then I think… Again, I can’t do this by myself as a mathematician, we need political scientists to weigh in, but intuitively, you feel like it scrambles things up and gives people more choices in a political arena, and which frankly, just listening to people talk when you go to these commissions, a lot of people are pretty frustrated with the choices that they have.

1:16:45.3 SC: Do you think that there is some future collaboration between political scientists and mathematicians where you are very ambitious and say, look, let’s try to think about what are the goals that we have in electing people in sort of coarse graining the individuals in our society into a legislature and try to figure out a priori, or ab initio, I should say, what is the best voting system that would reflect the actual will of the people. Is that too utopian?

1:17:18.8 JE: I mean, people are going to work on it, whether you think it’s too utopian or not, right? There’s nothing too utopian for people to think about. No, I think back to those kinds of discussions already are happening and have been happening, look, they go all the way back to Arrow’s theorem, there’s lots of interest in kind of like mathematical formalisms about not just how voting, not just analysis of existing voting, but how voting could work, right? So that’s a pretty old idea.

1:17:44.0 JE: So what I would say, maybe this is a good opportunity, ’cause one thing I just thought of, that certainly, Moon Duchin has told me, is that they see these techniques not just as a way to sort of like give meaningful testimony in court about a particular political question that’s happening in the moment, but as a sort of new kind of tech for doing social science research, in the same way that in physics… Now I’m going to talk about physics, so please tell me if I say something like completely off-base. I mean, people use Monte Carlo simulations all the time to sort of simulate and try to understand what the effect of doing a certain thing would be in a situation where for various reasons they can’t actually experiment. Isn’t that a thing…

1:18:23.8 SC: Yep, oh, absolutely. They do, yes.

1:18:28.0 JE: I went to a great seminar of a guy who’s like, well, we’re sort of trying to figure out things about atomic weapons, but we are not… You know, we don’t just like drop atomic bombs on places where we think not too many people live or not too many people who are important to us live anymore. It’s not how we test them anymore, like a lot of it is like simulations based on pretty hardcore numerical analysis, but also the Monte Carlo technique. So I think that at least some of the people in this area envision a world in which… Also, in political science experiments are like hard to do and often ethically questionable, right, so maybe this is too utopian, but there is an idea of we can at least get some good ideas about what the results of sort of different kinds of voting systems would look like from simulations of this kind.

1:19:14.1 SC: And maybe to wrap up on a less utopian note, ’cause I know that as scientists and mathematicians we like to be utopian, but as you emphasize very, very properly, politicians, feet on the ground, have a different set of values and different set of cares. So one of the things you mentioned very briefly which intrigued me was the idea of inventing schemes by which you can admit that the map will be drawn by a legislature, not by an independent commission, and yet inventing schemes where sort of like for two kids trying to slice up the pie, they are driven toward a more or less equitable outcome.

1:19:53.2 JE: Yeah, I mean, so this is like a… There’s a very famous problem called the cake-cutting problem, where there’s a very elegant answer, like, so how do you cut a cake fairly into two pieces between two people who both want to eat and both like cake, and want a lot of cake. And the way you do this is you have one kid cut and the other kid chooses. So the kid who… So somehow this iterative process, where you separate it into steps, somehow sort of creates an incentive to like make the division roughly fair. And then there’s an entire literature of how do you do this if there’s three kids and three pieces, four kids and four pieces, like, you know, multiple kids who may be able to choose a subset of the pieces. Whatever.

1:20:33.0 JE: Like mathematicians want to be, have to be, like generalized than that, but so, one thing that people have discussed is could there be a protocol where you make a game out of gerrymandering. Like, you start with some map but then the parties take turns. I’ll change it this way. Now I’ll change it, now I’ll change it. Each team sort of tries to make it, maybe, presumably tries to make it more favorable to their party. Now, actually, you have to be a little careful, and the people who write about this, there’s more complicated protocols than what I just said to make it even possibly work. But the idea is that the reason this might be useful… I haven’t decided if I think it’s practically useful, but the reason that it might be useful is that because these… It’s precisely because these highly gerrymandered maps are so special and they’re so unusual in their own neighborhood, they represent this kind of local apex of partisan advantage.

1:21:26.9 JE: Imagine there’s like two kids wrestling on top of a mountain, and one kid is trying to get you to the top of the mountain, and one kid is trying to get you off the peak. Well, there’s a lot more non-peak than peak, so if the two kids kind of take turns pushing each other, they’re not going to stay at the peak, they’re going to be like somewhere else that’s like much farther down. So I think this is like… Some… But I, listen, here we come back to the thing we said at the very beginning, that the current mechanism makes it so easy to do so much to cement your partisan advantage that almost any disruptive change, I think, would be much more likely to improve the situation than to make it worse.

1:22:00.9 SC: So this is a rare example where entropy is working in our favor.

1:22:04.5 JE: Kick the TV, that’s what I say. Kick the TV and it’s going to start working again. Or at least it’ll work better than it works now.

1:22:09.7 SC: I like that message. Thanks so much. Jordan Ellenberg, thanks very much being on the Mindscape podcast.

1:22:14.3 JE: Thanks for having me. This was great.

[music][/accordion-item][/accordion]

1 thought on “151 | Jordan Ellenberg on the Mathematics of Political Boundaries”

  1. Pingback: Sean Carroll's Mindscape Podcast: Jordan Ellenberg on the Mathematics of Political Boundaries | 3 Quarks Daily

Comments are closed.

Scroll to Top