184 | Gary Marcus on Artificial Intelligence and Common Sense

Artificial intelligence is everywhere around us. Deep-learning algorithms are used to classify images, suggest songs to us, and even to drive cars. But the quest to build truly "human" artificial intelligence is still coming up short. Gary Marcus argues that this is not an accident: the features that make neural networks so powerful also prevent them from developing a robust common-sense view of the world. He advocates combining these techniques with a more symbolic approach to constructing AI algorithms.

Support Mindscape on Patreon.

Gary Marcus received his Ph.D. in cognitive science from MIT. He is founder and CEO of Robust.AI, and was formerly a professor of psychology at NYU as well as founder of Geometric Intelligence. Among his books are Rebooting AI: Building Machines We Can Trust (with Ernest Davis).

0:00:00.1 Sean Carroll: Hello, everyone, and welcome to The Mindscape Podcast. I'm your host, Sean Carroll. If you've been paying attention to advances in technology, science, or just news in the world, it's hard not to be impressed with recent progress in artificial intelligence, mostly driven by neural networks and deep learning, machine learning kinds of techniques. We've really been able to do things with AI that when I was your age, we just couldn't do. For one thing, artificial intelligence programs are easily able to kick the butts of human beings when it comes to games like go and chess, which was considered very far away not too long ago. Another example is GPT-3, which you may have heard of, which is one of these language processing things where you can ask it a question or you can give it a prompt in some sense, and it will respond or will continue on in the vein of the words that you gave it on the basis of the fact that it has read a lot of things and it sort of looks for correlations between them. And finally, it may be most importantly for our everyday lives, AI is everywhere around us, in vision recognition, recognizing the images that are in front of us, maybe even self-driving cars or something like that someday, but certainly recommendations, what music to listen to, what movies to watch, etcetera, AI is at work.

0:01:19.6 SC: So on the one hand it's very impressive. On the other hand, none of these are gonna be confused for a human being. None of these versions of AI are going to pass the Turing test in some very advanced way. You can sort of jigger up versions of the Turing test that are passable by modern AI, but it's not a full-blown general intelligence, AGI, Artificial General Intelligence, the kind of AI that would really fool you into thinking it might as well be human. So today's guest, Gary Marcus, thinks that he knows why we're not able to do that. More importantly, he thinks that we're moving in the wrong direction or focusing on the wrong things to make progress in this particular direction. The idea is that there are certain kinds of things that neural networks and deep learning are good at: looking for correlations in gigantic data sets. There are other things that it's not good at. It's not good at understanding in some big sense that we would like to define. It's not good at common sense, at understanding how the world works at a basic level so that in individual circumstances, we can apply our knowledge in a kind of reliable way.

0:02:27.8 SC: That's why self-driving cars turn out to be harder than we thought they would be. The world out there is a messy place, and you need a picture of the fundamental way the world works, not just a set of correlations in your computer, or at least that's what Gary would say. And he even has advice for how we can make progress in the right direction because there's been a shift in how artificial intelligence research has been done. In the early days, it was symbolic. You would try to define symbols, variables in your AI program that represented different things and then look for relationships or try to define relationships between the different variables. Whereas it's almost more mindless today, the deep learning algorithms just take in a whole bunch of data, and then spit out correlations between them. As Gary points out, the best deep learning algorithms actually are hybrids. They actually make use of the symbolic approach as well, but he still thinks we should be going a lot further in that direction.

0:03:25.9 SC: We really need that kind of understanding-based approach to make artificial intelligence of the kind you would recognize as human-like in some sense. And that might not just be something we wanna do because it would be cool, it might be important technologically going forward. So we're gonna dig into that. It's a lot of fun. Gary's a very opinionated guy. He actually started out in neuroscience and psychology before moving into AI, so he really cares about how real human beings think. He wants to make computers think better than they do today. So let's go.

[music]

0:04:15.3 SC: Gary Marcus, welcome to The Mindscape Podcast.

0:04:17.6 Gary Marcus: Glad to be here.

0:04:19.2 SC: So we're gonna talk about artificial intelligence. Let me give my impression of the history very, very briefly so you can tell me whether I'm correct or not. There were go-go days of artificial intelligence, maybe in the '60s and '70s, where we were first getting computers that were up to the task of even thinking about it, and people thought, very soon we'd be talking to them and having deep philosophical conversations, and then that didn't pan out. It turned out to be harder than we thought. These days, there's a bit of a resurgence. I mean AI is kind of everywhere, neural networks, people are thinking about self-driving cars. And you're on the side of being a little bit of a gadfly, where you're saying like, "Okay, yes, we've had some successes here, but this is not going to be just smooth sailing until we get real human level intelligence." Is that a fair overview?

0:05:07.2 GM: Yeah, I could pick at some details, so I think the first bit of enthusiasm was in the '50s rather than in the '60s, and the first winter, the first AI winter, which I think is important, was in the early '70s when there was something called the Lighthill report, I think in 1973, that said, "Hey, this stuff isn't really going anywhere. We're throwing all this money into it, but what are we getting for it?" and research really slowed down then. And I think that the field lives in permanent worry that that might happen again, and maybe should live in even more worry than it does. So that's a little edit there. I would like to clarify for the record that I love AI. I'm not someone who thinks AI is impossible. You used the word gadfly. I would probably use the word skeptic. And to that point of that skepticism, there's both a specific and a general. The general thing is... One of my favorite graphs I ever saw was the prediction for how far away, let's say, artificial general intelligence might be, although the term is new, but the idea's been around for a long time, that, how many years until we have AI that's actually, let's say, as smart as people, and we can talk about whether that's even the right criterion.

0:06:21.5 GM: And you look and it's basically always... People say it's 20 years away.

0:06:26.6 SC: Classic.

0:06:28.8 GM: They always say 20 years from now. So that itself is a little bit of an object lesson. And then there's a question about, what is it that we actually have now? What have we made progress on and whatnot? And I know as a physicist, you have respect for the data and you have respect for the fact that there's different kinds of data and different kinds of measurements. And there are some measurements on which the kind of, I'll call it the orthodox Kurzweilian notion of exponential growth seems bang on, and one example for that is chess-playing, another example is go-playing. On these board games, there has really been exponential growth or even super exponential growth. You look at go now as compared to when I was a kid and computers couldn't play it at all. I could beat a go player now and there's no way I could beat AlphaGo. On those kind of things, there's been exponential growth. On some other things, growth has been slower than the popular media would have it. So if you had read the popular media over the last few years, you probably think the vision was solved, that we now know how to do computer vision.

0:07:35.2 SC: How to recognize things, yeah.

0:07:37.9 GM: The reality is we have actually made progress there, but we have not solved the problem. So I'm looking at you right now on a Zoom type of call, and I guess your audience isn't looking at the image that I am, but I can instantly parse what's going on there, and not just label the objects, but I can actually answer questions, like there are things on the wall, and I can make guesses about how they might be mounted there, and I would be surprised if they started floating around the room. I have an understanding, not just of the entities, but how they relate to one another. I see stacks of books and the paper on top of them, and I understand why the paper is not floating, and not falling. I have this kind of integrated with physics understanding of the scene, and AI does not have that, and that's part of what perception is. So every other month, there's another study now showing these so-called adversarial attacks and so forth, showing that these vision systems can be fooled. In fact, mostly they rely on texture and things like that. So one of my favorite examples from recently was...

0:08:39.6 GM: What was that? I think a fire truck overturned on a snowy road, and the system said with great confidence that it sees a snowplow. And it did that because on a snowy road, if there's a vehicle, it's likely to be a snowplow. And these systems are very much driven by textures that they see and by kind of probabilities of what things are generally likely. They don't have an overall understanding of the scene. Another recent example was... I'm trying to remember what it was... I think it was an apple with the word iPad written on a piece of paper in front of it, and so it thought that it was an iPad because the word was written on the page. So perception is not actually solved, but there has been actual progress. So that's the intermediate case. The first case was true exponential progress, then there's perception, where there are pieces of it where there is exponential progress and pieces of it not so much. And then there's natural language understanding and reasoning, and I would say we have not really made progress at all. GPT-3, which we may wanna talk about, gives the illusion of having natural language understanding, but I don't really think that it does. And we are nowhere near, for example, an all-purpose general assistant. We're nowhere near to having the kind of language you would want if you had a domestic robot.

0:09:58.9 GM: I have a cartoon in my book where somebody says, "Put everything in the living room away," and the robot ends up cutting up the couch with a saw, because it doesn't understand what it is that we would mean by "put everything in the living room away." We have no candidate solution for that problem. It's not just that we've made no progress. We don't even know how to make progress on that.

0:10:21.3 SC: Now you're making me sad that there's no artificial intelligence podcast editor. That would make my life a lot quicker. [chuckle]

0:10:27.0 GM: That would be great. I mean even there, there's an example of how AI does actually help in some way. So now there are tools, if you wanna put together PowerPoint slides for an online talk in this crazy era in which we are living, that will automatically transcribe and do a pretty decent job and then go find where the breaks are in the words. So there are a lot of places maybe that you wouldn't even expect where AI is actually helping now. There are also some where it's hurting now. We should talk about those too. But AI is real now when it wasn't before, and that's both a blessing and a curse because some of it's reliable and some of it's not and there are all kinds of problems with it.

0:11:07.1 GM: But just on that first question of history and where we are now, what I would say to wrap up a long-winded answer is we've made a lot of progress on a lot of things, but there are some core problems which are mostly about understanding the world and what people are talking about where we really haven't made that much progress. And it could be now we really are for the first time, 20 years away, where all these other times we weren't, or it could be we're still 50 years away. The core question of how you use common sense knowledge of the world in order to interpret things, so again, back to the scene in your room, also there's a place where the lighting values are higher, and I can guess that that's outdoors, making all those inferences about what I see and what's likely to be going on, we just don't know how to do that yet.

0:11:58.7 SC: And maybe it's good to... We're able to get into details a little bit. The audience likes the details. So let's try to understand why there has been this progress. And as far as I can tell, the overwhelming majority of recent progress in AI has been driven by neural networks and deep learning algorithms. Is that fair? And what does that mean?

0:12:18.9 GM: It's true, but with some caveats. So, first of all, there are older techniques that everybody takes for granted but are real and are already out there. Second of all, there are things like AlphaGo, they're actually hybrid models that use classical tree search techniques enhanced with Monte Carlo techniques in order to do what they're doing. So they're not just a straight multi-layer perceptron, which is a kind of stereotype that people have of neural networks: we have some inputs, they feed into a hidden layer that does some summation and an activation function, and that goes to an output. They're not just that. They actually borrow some important ideas about search, for example, and symbols from classical AI. And so they're actually hybrid systems and people don't acknowledge that. So this is the second caveat I would give you. The third caveat I would give you... We can come back to the second, but the third caveat I'll give you is, yeah, most of the progress has been with deep learning lately, but most of the money has been there too, and it was really interesting to see... And I don't just mean like 60% versus 40%. I mean, like 99.9% of the investment right now literally is in deep learning, and classic symbol manipulation AI is really out of favor, and people like Geoff Hinton say don't spend any money on it at all. And so it was really interesting.

0:13:42.7 GM: There was this competition presented at the NeurIPS Conference which is the biggest conference these days in the AI field just a month or so ago, on a game called NetHack, it has various complications in it, and a symbolic system actually won in an upset victory over all this deep learning stuff. And so if you look back at the history of AI, in the history of science more generally, sometimes things get counted out too soon. It is true the deep learning has made a bunch of progress, but the question is, what follows from there?

0:14:13.9 SC: No, I'm not actually trying to make any value judgements. I would like to explain for our audience what the options are. What do you mean by deep learning? What is that and what is that in comparison to symbolic manipulation?

0:14:25.2 GM: So deep learning is fundamentally a way of doing statistical analysis on large quantities of data, at least that's its forte. You can actually use it in a bunch of different ways, but most of the progress has come from that. And what's impressive about the recent work is it allows us to learn from very large quantities of data. The classical AI systems really didn't do a lot of learning at all. They were mostly hand-coded, and sometimes that's the right thing to do. So we don't need to learn how to do navigation. We need to learn some details, but we don't need to learn how to do navigation for the purpose of one of the most useful AI things out there, which is route planning, telling you how to get home from whatever crazy place you wound up in. Right? That's not a deep learning-driven system. But there are other systems where if you can glom on to all the data that's out there, you can solve certain problems very effectively, and that's what deep learning has been good for.

0:15:26.1 GM: So an example of that is labeling your photos in, let's say, the Apple Photos app or Google Photos or something like that. There what you really wanna do is to get user data measured in the billions or trillions of examples and have a system that can extract from all of that data what is the most likely label for this image, given the other images that are in my database that have been labeled. So that's the kind of typical use that deep learning is very good at. And speech recognition is similar. So I hear this word, lots of people have said it in lots of different ways, and I hear this particular sound, is it like that collection of things that I've heard before or this other collection? It turns out deep learning is far and away the best, and in some ways simplest, way to solve a whole bunch of problems like that. Sometimes it's only a little bit better than the other solutions and it gets more press maybe than it deserves, but it usually is the best for these problems. And when we have billions and billions of training examples, it's usually the right way to go.

0:16:28.4 SC: But what is it? How does it work? It's statistics, it's correlations, but how does it find these correlations in ways that we couldn't do a few decades ago?

0:16:37.7 GM: So the basic idea is not actually new... Something I should clarify first. So the mathematics around this has actually been around for decades. People had the idea to do it before. Basically you're just trying to figure out, "I have an error, how can I reduce the error that I've made by adjusting weights between a bunch of things that we call nodes, that are supposed to make us think of neurons?" We could have a whole separate discussion about whether they have anything really to do with neurons, but they're at least loosely inspired by neurons, and you're adjusting the weights between them, sort of how loudly they talk to one another. And if one of them talks too loudly to the other one, you find out over time, "Well, I should make it talk a little bit more softly, and this one should talk more loudly," and you're basically doing that on a mass scale, and it turns out to work really well.
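
A minimal sketch of that weight-adjustment idea: a single artificial "neuron" nudging one weight to shrink its error on each example. The data, learning rate, and target function here are invented purely for illustration; real networks do this across millions of weights in parallel.

```python
# Toy illustration of "reduce the error by adjusting weights":
# one weight learns to map x to 2*x from examples.

def train_single_weight(examples, steps=100, learning_rate=0.1):
    w = 0.0  # how "loudly" the input talks to the output, starting at silence
    for _ in range(steps):
        for x, target in examples:
            prediction = w * x
            error = prediction - target
            # Nudge the weight slightly in the direction that shrinks the error.
            w -= learning_rate * error * x
    return w

examples = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
print(train_single_weight(examples))  # settles at roughly 2.0
```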

0:17:29.0 GM: The math was rediscovered a bunch of different times. There's actually a debate on this mailing list called Connectionists right now about that history, and people periodically have these debates. There's no question that it's been around for a long time. What really happened is that people developed GPUs for video games that allow you to do a lot of the relevant mathematics in parallel at the same time, and that allowed people to do this deep learning thing at a scale that they didn't really even dream of 20 years ago. So that was a major thing. There's this paper called The Hardware Lottery, I'm trying to think of... I think her name is Sara Hooker, with this really interesting piece about how the... And I confess, I've only read the summary of it, I haven't read the piece yet. But her thesis is basically, you can have these accidents of history where a particular architecture or something like that is available at a particular moment and people just run with it.

0:18:29.6 SC: Sure.

0:18:30.0 GM: And there's a little bit of that going on here. And I don't know if she makes this observation or not, but it connects with what she says, where it's kind of partly an accident of what we figured out how to parallelize first that has made deep learning as popular as it is. So... It was clever to try to use these chips that were built for something else for the purposes of deep learning, and it really changed deep learning. I made a remark earlier about unfairly dismissing things too soon. Deep learning was unfairly dismissed too soon. I have dismissed its ability to do sort of deep cognition, but that's a separate question. Its ability to do basic pattern recognition was actually in doubt, so in the early 2000s, Geoff Hinton, who's this big star now, it was kind of like he gave a poster at this conference, nobody came, and they're like, "This stuff doesn't really work, we understand the math..."

0:19:22.7 SC: Classic story.

0:19:23.6 GM: It's kind of cool, it has its own kind of elegance, but you're not really getting it to work, so forget about it, and to his credit, he stuck with it, and once people like him could use it at scale, it turned out that this technique is actually lousy with small amounts of data. But it's brilliant with large amounts of data, and so there was actually like a perfect storm, and so one of it was... One aspect of it was getting these chips, which made a huge difference, another was we didn't have databases with trillions of examples in... Let's say the year 2000, the internet is the other major technology that has driven deep learning. The Internet means that you have large amounts of data.

0:20:03.6 SC: Yeah, and in some sense, from your description, it sounds like it's artificial intelligence, but it's kind of dumb and kind of straightforward. You have all these perceptrons, these nodes, and they have weights, and they just float to whatever is best at fitting the data on the training set, without any deep understanding of what has happened, as you were talking about, that papers do not float in the air or anything like that. So what's the... How would you characterize the alternative of the symbolic approach, I guess is what you're calling it...

0:20:35.4 GM: Yeah, well let me, before I do that, let me say that I think that we need elements of the symbolic approach, I think we need elements of the deep learning approach or something like it, but neither by itself is sufficient. And so, I'm a big fan of what I call hybrid systems that bring together, in ways that we haven't really even figured out yet, the best of both worlds. But with that preface, 'cause people often in the field like to misrepresent me as that symbolic guy, and I'm more like the guy who said, don't forget about the symbolic stuff, we need it to be part of the answer. Okay, so the symbolic stuff is basically the essence of computer programming or algebra or something like that. What it's really about is having functions where you have variables that you bind to particular instances and calculate the values out. So the simplest example would be an equation in algebra, y equals x plus 2. I tell you what x is, you can figure out what y is... And there it doesn't matter which x's you have seen before; you have this thing that is defined universally, is the way a logician might put it, universally for everything in some domain. Any physicist would grasp that immediately, or any programmer or any logician.

0:21:49.2 GM: And that is the essence of what allows programs to work. So we're using a tool called Zencastr, and it's putting the bits together of your image such that I can see you and vice versa, and it's doing this in real time, because there are functions that can do that across any image. And then we might do some image processing if we're on Zoom to do segmentation, we could talk about that, but the basic thing there is I have a function that says, for any set of bits, I will do this function, and I don't care if this image is one that I saw before. And similarly, if I type something in the chat box, it doesn't matter if I come up with a novel sentence or a familiar sentence, whereas the deep learning stuff is all about similarity to the things that you have seen before. And so it's really a different, almost, thesis about what cognition should be, and I think the right thesis is actually our brains... Anyway, can do both.
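
A rough sketch of that contrast. The function and the memorized table below are invented for illustration: the symbolic rule is defined universally and doesn't care which inputs it has seen, while a caricature of a similarity-based learner can only answer by finding the nearest thing in its training data.

```python
# Symbolic rule: y = x + 2, defined for every x, seen or unseen.
def f(x):
    return x + 2

# Caricature of a purely similarity-driven learner: it memorizes a few
# (x, y) pairs and answers with the output of the most similar x it knows.
memorized = {1: 3, 2: 4, 5: 7}

def similarity_based(x):
    nearest = min(memorized, key=lambda seen: abs(seen - x))
    return memorized[nearest]

print(f(1000))                 # 1002 -- the rule extends to a novel input
print(similarity_based(1000))  # 7    -- stuck near the examples it has seen
```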

0:22:39.1 GM: We can do the logical abstract stuff, and I actually did experiments all the way back in the late '90s on human infants showing they could do the abstraction even at seven months old. So there's this ability for us to do abstraction, which allows us to be computer programmers or to do logic, and there's also this heavy statistical analysis that we humans do. We're not quite as good as the machines at it, but we can do a lot of it. So we know that the word inextricably is often followed by linked or bound, but never by water, and we know a lot of statistical things too, not at the same scale, but we're good at it, and we use it. So we use it in parsing sentences, for example, we make predictions about what the other person is gonna say, but then again, if the other person surprises us...

0:23:25.5 GM: We can usually figure it out. So a lot of comedy is based on saying something that isn't expected and having the listener figure out that thing, and if you had a system that only kind of makes predictions... You can think of GPT-3, which is the most famous language system right now, as a really amazing version of auto-complete, and auto-complete is pretty useful, and we auto-complete sentences, but we also have to deal with the unexpected, and that's what symbols are actually really good at. And this is why we need to bring both of these traditions together.

0:23:57.9 SC: Well, you have the example in one of your papers that I really liked of children learning how to make the past tense in English, where there's a rule... You add -Ed, I podcast, I podcasted, but then there's all these irregular ones where it doesn't follow the rule, and so it's kind of like for the regular verbs, it's a symbolic kind of manipulation and for the exceptions, it's more like a deep learning kind of thing.

0:24:24.9 GM: Exactly. And that's actually what my dissertation in 1993 was about, exactly that. It was about these split systems. In fact, I wandered off from AI for a long time, 'cause I just found it kind of really not very inspiring, and came back around the time of Watson, 'cause I was surprised that Watson actually won at Jeopardy. I can tell you why I think it won, but I was surprised, and I'm not often surprised, and as a scientist, when I'm surprised, that really wakes me up. And so I was re-awoken to AI in 2012, I guess, or so by Watson, and then around the same time, deep learning was popular, and I was like, oh man, I've seen this movie before, because the stuff that I was working on for my dissertation, which included those regular and irregular verbs, which Steve Pinker called the fruit fly of cognitive psychology or something like that.

0:25:17.7 GM: All that... The same issues that went into my thesis, looking at how children were doing things, have come up again now, when we try to figure out, well, what can deep learning do for us and not... And it's like, it can do the irregular verbs, it's not so great with the regulars...
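
One way to picture that "split system" idea for the past tense is a toy sketch like the one below (the rule and the exception list are deliberately tiny and made up): a symbolic rule handles the regulars, including brand-new verbs, while a rote, memory-like component handles the irregulars.

```python
# Toy hybrid past-tense generator: a rule for regulars, memory for exceptions.
IRREGULARS = {"go": "went", "sing": "sang", "break": "broke"}  # memorized exceptions

def past_tense(verb):
    if verb in IRREGULARS:      # exception lookup, closer to pattern memory
        return IRREGULARS[verb]
    return verb + "ed"          # the symbolic rule: add -ed, even to a novel verb

print(past_tense("podcast"))  # "podcasted" -- the rule applies to a new verb
print(past_tense("go"))       # "went"
```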

0:25:33.0 SC: Right, but it's interesting because... So deep learning is very, very good at some things, obviously, we've had tremendous success playing chess, playing Go, protein folding is a new success that the Alpha...

0:25:45.1 GM: Can we pause there? The success in the protein folding and the success in the games actually depend on hybrid systems, and the media coverage of that, and even the internal understanding in the field, doesn't, I think, realize how much the hybrid stuff is important. So, for example, if you just took the deep learning part of AlphaGo and did not have all of the search stuff, the Monte Carlo search there, it wouldn't be that good. And similarly, AlphaFold has a whole lot of very careful structured representations around the nature of the three-dimensional geometry that it's trying to solve, and it's not just a simple multi-layer perceptron, I'll put in arbitrary data, get arbitrary data out and I'm good to go.

0:26:37.0 GM: So oftentimes, the field kind of pumps up the deep learning and doesn't really talk about the other piece of it. I'll give you one other example, which is OpenAI had this example of a system they called solving the Rubik's Cube with deep learning, but if you actually read the paper, the part of it that I would think of as solving, which is like knowing which face to turn when, was done entirely by a symbolic algorithm, and they didn't mention that; the deep learning was doing the motor control. And it was a nice contribution to motor control, not as nice as they made it out to be, but it was a real thing to be able to get the system to turn a Rubik's Cube in one hand at the right time. But the cognitive part of what should I turn in the Rubik's Cube, which is kind of the part that makes it interesting to the average person, they pick it up and they can do the motor control, but they don't know how to do the other part, that was done by the symbolic algorithm. And none of the media accounts talked about that, and there is a mystique associated with deep learning right now, but often it's actually just part of the picture, and that just gets completely lost.

0:27:38.9 SC: Sure, but what I'm trying to get at is the fact that these successes don't easily generalize. We think about chess and Go as the quintessence of intelligent thought, right? But in some sense, they're really, really simple, the rule set is very, very simple, and it's mostly a matter of having enough capacity and computing power to think about it. Then obviously there's a tremendous amount of cleverness that goes into designing the algorithms, and I take your point that it is a hybrid kind of algorithm, but my impression is that if you change the rules of the game by a little bit, you change what you're allowed to do with the stones in Go or how the pieces move in chess, the algorithm that was the world champion at the regular rules wouldn't be able to adapt very easily to the new rules, whereas a human being could adapt pretty quickly because they have more heuristic understandings of what positions are strong and things like that...

0:28:37.5 GM: Yeah, I pretty much agree with that. It could start over and learn a new game. I think that the things that they might have built are pretty good at games where there's a closed world and you can gather an infinite amount of data for free, so related to your point and trying to scope out what is the generalization and generalizability. So these systems are good at closed worlds where the rules ideally haven't changed in 2,000 years and you can play it yourself and you get as much data as you want. They don't generalize as much to the real world, because you usually don't have the same kind of fixed set of rules, and it actually is costly to get data. So if you're trying to figure out what I should do today with my life, you can't get infinite data and you can't solve...

0:29:25.7 GM: Well, I'll give you an example from an article that I had... somebody made... The article was with Ernie Davis in the ACM journal, and the illustrators came up with a great picture. We were talking about common sense reasoning and the importance of it, and they had the picture of a robot in a tree cutting the limb from the wrong side, such that if it succeeded in the cut, the tree limb was gonna fall down and so was the robot. And so that's an example of something you can't get by infinite self-play, right? You don't wanna fall out of a lot of trees; you wanna have some other way of getting to that source of knowledge. If you can work in a hermetically-sealed problem where there's kind of no...

0:30:06.9 GM: Influence from the external world, and it's always the same, then you can use this kind of brute force approximation, but if you have to deal with things you can't expect... It's problematic now, it's problematic for other approaches too, nobody has a great AI solution to dealing with the unknown. So you might remember when Long-Term Capital failed, and it was a billion-dollar epic mess-up; a bunch of novices had a model of what they thought would work and they didn't realize that you could have problems with the Russian bond market that would influence this other stuff. That wasn't a deep learning failure, that was a failure of models, though, and we don't know how to make models in general versatile enough to deal with the unknown. I was not a fan of Rumsfeld, but his point about unknown unknowns is actually a good one. Human beings are better at dealing with unknown unknowns, at least in some cases, than any technology that we've currently developed. So if you imagine trying to make a domestic robot right now, Amazon's got something that they talk about, there's just a lot of stuff that comes up that nobody has anticipated, and if all you're doing is kind of looking stuff up in a database of what you've seen before, at some point that breaks down. So to your point about generalization, nothing really unforeseen happens in Go if you've played yourself 20 million times, but in the real world...

0:31:32.7 GM: It's snowing in Vancouver and that's not really happening, and I need to cross the street and I can't even see the street, and now what do I do? And systems aren't really built around that.

0:31:42.6 SC: Well, we can see that there's gonna be some trade-off between letting the algorithm learn by itself versus giving it some structure, like you mentioned for the protein folding where it's not just consider every configuration in space. There's some pre-existing ideas that are built in there, my impression is that for chess and Go, the lesson was, don't spoil the algorithm by teaching it human tricks because it'll learn faster just by playing against itself.

0:32:13.2 GM: Yeah, that's true. At this moment in the history of AI, I don't know that it's always true. So another weakness in current AI is we just don't know how to leverage existing knowledge. We don't know how to specify it and we don't know how to use it. There are some domains where it's actually fine, so we can do taxonomy, so if I tell you that a penguin is a bird... You can make a bunch of inferences about that and realize that it's gonna breathe and reproduce, and so there are some things where we can take a bit of knowledge and extend it further, but we don't know how... First of all, in the case of Go, we often don't know how to represent the expert knowledge, in some cases we do, and we don't really know how to use it. And it turns out in that domain right now, as you say, but with emphasis on the right now, it is easier just to do brute force and just start over, basically, than to have a bunch of expert Go players tell you stuff, although there was actually an expert Go player who was an author on one of these papers and did say some things, and there are some things that are built in because we know how to do them...

0:33:19.9 GM: We know that it's rotation invariant. Again, as a physicist, you know what I'm talking about: I can rotate the board, I can flip the board, and basically I wind up in the same conceptual space. So there are some things we know how to build in. But here's another example: a general intelligence ought to be able to read Wikipedia and use all that information in order to make all kinds of decisions, like to help us with materials science or medicine or whatever. And we don't have systems that can do that, that can sort of take the results of hard-won human knowledge, which could be about almost any domain, and put them in... So the systems that we have right now are mostly kind of blank slates. They get whatever they know by having all of those nodes line up and balance out in the right kinds of ways without much influence from the knowledge of the world, and it's cool when it works, and in some domains it works well and in others it doesn't. But here's another case: driving. You would like to be able to just put in the rules of the California driver's code and stick it in with your deep learning system, but we don't know how to do that. We just don't.

0:34:43.3 SC: Well, language is a great example, and you alluded to it several times already. GPT-3 is the system that everyone talks about these days, so maybe you can tell us just how GPT-3 works, what's so interesting about it. And to be honest, I'm less impressed with its results than many people seem to be...

0:35:04.1 GM: Well, I may be even less impressed than you are, but many people are, true fact. To me, it's a kind of parlor trick that's actually a mistake in the evolution of AI. What it does is... It's another of these systems, more complex than the one that we talked about before, but it's still basically about setting weights for connections. It has some prior structure around attention to help it know about relations between words over a certain space and time, there's things called positional encodings, and we don't have to go into all the technical details, but the basic framework, if you will, is you get a prompt, it sees some set of words, and then it predicts what might follow, and in this way, it's kind of like the mother of auto-complete, so you can type in anything and it will continue in that same style.
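
A crude analogue of that auto-complete picture, nothing like GPT-3's actual architecture, just word-pair counting over an invented snippet of text: pick the word that most often followed the current word in whatever you were trained on, with no understanding of what any of it means.

```python
from collections import Counter, defaultdict

# Tiny made-up "corpus"; a real system is trained on billions of words.
corpus = "you pour the juice you drink the juice you drink and you die".split()

# Count which word follows which.
following = defaultdict(Counter)
for current, nxt in zip(corpus, corpus[1:]):
    following[current][nxt] += 1

def autocomplete(word, length=5):
    out = [word]
    for _ in range(length):
        if word not in following:
            break
        word = following[word].most_common(1)[0][0]  # most frequent continuation
        out.append(word)
    return " ".join(out)

print(autocomplete("you"))  # "you drink the juice you drink" -- fluent-ish, no semantics
```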

0:36:00.2 GM: In some ways, it's astonishing. So you type in something that looks like a movie script and it'll continue, often with the same characters and in the same format, and as a kind of surrealist generator it's fantastic; type in part of a story and it will continue the story. So why am I not enamored of it, when it's capable of doing some really cool things? I would not dispute that it can do really cool things, and also it's capable of being really grammatical, which earlier systems were not, and it's kind of astonishing in how it does that. Nonetheless, I think that it's misguided, and I think it's misguided because there's no real semantics there, there's no underlying understanding of what it is talking about, and this manifests in different ways. So it will give you fluent speech. I wrote an article that was supposed to be called GPT-3, Bullshit Artist.

0:37:03.0 GM: The editor wouldn't let me call it that, so we called it GPT-3, Bloviator. They did allow us to have our conclusion, which is that it's a fluent spouter of bullshit. And we had examples like this: you're thirsty, you have some grape juice, but not enough, so you look around, you find some cranberry juice, you sniff it, you pour it into a glass, and then you... And GPT auto-completes. And so it says, then you drink it, which is plausible, statistically speaking, as a continuation, and then it says you die. Most people don't die by having cran-grape juice. It's usually pretty harmless stuff. So the system doesn't actually understand anything about toxicology or why you might die; it's just that, statistically speaking, the probability of the word die after you sniffed and you're thirsty, in some corpus that it has learned from, happens to be high. And I think that illustrates what's really going on there: it's just looking through the corpus for correlations. It doesn't understand what these correlations are about. And that leads it to a weird position in terms of what it does. You can't type in an idea and have it formulate that idea in words, which is what classic computational linguistics tries to do. It can only do this game, and then people work around this game of, I'll feed my thing in and hope that it continues.

0:38:18.7 GM: And what they wind up with, for example, is a lot of toxic speech, and DeepMind just had like 10 people working on the problem, and actually there are hundreds in the field trying to make these things not be as toxic. And there's no solution there, because you just have the correlations, you don't have an underlying system where you can query it the way you could query a database. So you can query a database and say, "How many people of this age group are here or whatever?" You can't query GPT and say, "Are you making a toxic remark, or are you singling out a... " It doesn't know. It's just statistical correlations between words. And so people are trying to put all these Band-Aids on top of it to make it less toxic, but it's not gonna happen; the technology does not really afford that. And then it has a truthiness problem. So it's very fluent, and so it's easy for it to make stuff up and you not notice. And so it'll make up whatever anti-vaccine stuff, if that happens to be in its database; it has no idea what it is that it's spouting. And again, no Band-Aid will solve that.

0:39:22.7 SC: You used an analogy... You had an analogy which I thought was very illuminating with the guy who won a Scrabble tournament in French, even though he spoke no French, because he just sort of memorized a list of French words that would be really useful in Scrabble.

0:39:35.8 GM: Yeah, I went back to that book by [0:39:37.5] ____ to try to find... I think those people call them word tools or something like that. They don't know what the words are, so they're just using word tools and that's exactly what's going on.

0:39:46.0 SC: That's why I don't even like playing Scrabble against other people, because if they're good, then they've memorized all these little words that fit very well.

0:39:53.5 GM: Two-letter words, man. That's where it's at.

0:39:54.2 SC: Terrible. But the thing that got me, I was actually... Before I understood what it really was about, and I had seen some of the hype about GPT-3, I thought maybe it'll be fun to do a podcast where I interview GPT-3 and had it voice synthesized, but then I realized the very basic fact that it has no memory. So it doesn't remember what you just talked about, one question before, and so there was really just no... After five minutes, it becomes highly un-amusing. [chuckle]

0:40:21.0 GM: I'm doing an art project around that notion, and I did some interviewing of GPT, and I would ask questions like, "Are you a computer?" and it would say yes. I would say, "Are you a person?" it would say yes in the very next sentence. It doesn't remember.

[laughter]

0:40:36.8 SC: So why not? Why can't you just add some memory in there? What is the conceptual leap that makes it hard? Or is it just that it's...

0:40:44.7 GM: That's a really good question. That's an A question there for sure. What is it about the nature of the system that makes it non-trivial to just add memory? And some people have tried certain kinds of things. It's just built from the foundation in a different way. It's built from the foundation to correlate little bits of information, like the probabilities of these words following those words, and it's not built to have a representational scheme. It does not contact a representational scheme about these are the entities in the world and these are their properties. It's just built on a completely different path. And maybe there's some way of merging them. And I do think that ultimately the answer to AI is gonna come from merging at least some of the insights from the GPT tradition with some of the insights from the more classical AI tradition, but I don't think it's gonna come literally from merging GPT with these other systems, because GPT does not have the internal representations that you need. It'd be like saying, I've written this big computer program, but I'm not gonna let anybody else see what's inside of it, and now I just want you to hum and I hope that they match together. They're not.

[chuckle]

0:42:00.4 SC: Yeah, there needs to be some planning to make them work together.

0:42:01.9 GM: There needs to be some planning around, what are gonna be what we call technically the interface conditions? And it doesn't have an API, to use computer geek terms, an API where you can say, "Hey, what are the people that you're talking about right now? What are the assertions that you've made about them? What are you pre-supposing?" And you can't build the API because it isn't there.

0:42:26.3 SC: I mean it seems to me...

0:42:27.7 GM: Sometimes it looks like it's there because it looks like it's coherent, but it's a superficial illusion of the fact that it's drawing on this vast database of things that people have said. You can't build the API to do it.

0:42:38.2 SC: It seems like this is a pretty strong argument just by itself for deep learning or that kind of statistical correlation to be a tool used by a symbolic manipulator. You need some view of the world that is represented symbolically, but then by all means, have some deep learning help you with what the correlations are to predict what's gonna come next.

0:43:01.8 GM: Well, there's a narrow version of that and a broader version, I guess. The narrow version I think is actually wrong and the broad version, I think is right. So the broad version is, yeah, we need to have symbol systems rely on learning systems to do some of their grounding about what the symbols are about. And I think that's the broader argument that you're making, I think is just right. The narrower version, I don't think GPT itself is actually the right tool for doing the grounding, because it doesn't have those interface conditions. It hasn't been built from the ground to land in the right place. So you're trying to have two sides of the bridge or the tunnel meet up and it just wasn't built that way. But I think the idea of building that tunnel is right, of like, let's figure out what these systems are good for. There are lots of opportunities in the world to be tracking correlations, but you need to have respect, I think for where you're trying to wind up. And as a cultural matter, as a sociological matter, the deep learning people for about 45 years have been... Or no, actually like 60 years, have been aligning themselves against the symbol manipulation.

[laughter]

0:44:13.2 SC: Okay, well this is why we're on the podcast, we're gonna change that.

0:44:16.5 GM: Sorry, say again?

0:44:16.9 SC: This is why we're having this podcast. We're gonna change it. You're doing...

0:44:20.5 GM: I was about to say it might be changing a little bit. So Geoff Hinton, who's the best known person in deep learning, has been really, really hostile to symbols. It wasn't always the case. In the late '80s, he wrote a book about bringing them together. And then he at some point went off completely on the deep learning side; now he goes around saying deep learning can do everything, and he told the EU, don't spend any money on symbols and stuff like that. Yann LeCun, one of his disciples, actually said in a Twitter reply to me yesterday, "You can have your symbols if I can have my gradients," which actually sounds like compromise. So I was kind of excited to see that.

0:44:56.2 SC: That does sound good. Sometimes people can say they're on opposite sides but really be pretty close to each other. There's one example I wanna get on the table because it really made me think, and I think this is the time to do it, which is the identity function. You talk about this in your paper. So let's imagine you have some numbers, they go through a process that spits out an output from the input, and every single time the output is just equal to the input. So you put in 10010, a binary number, and it puts out the same number. And you make the point that every human being sees the training set, here's five examples, and goes, "Oh, it's just the identity function, I can do that," and extrapolates perfectly well to what is meant, but computers don't, or deep learning doesn't.

0:45:41.9 GM: Yeah, deep learning doesn't. I don't think it means that computers can't, but it means that what you need to learn in some cases is essentially an algebraic function or computer program. Part of what humans do in the world, I think, is we essentially synthesize little computer programs in our heads. We don't necessarily think of it that way, but the identity function is a good example. My function is, I'm gonna say the same thing as you, or we can play Simon Says, and then I'm gonna add the words "Simon says" to the ones that go through and not the ones that don't go through. A very simple function that five-year-olds learn all the time. And it's done as a function that applies to a whole bunch of different inputs. So you can say "Simon says touch your finger to your nose, or Simon says put your phone in front of your nose, or Simon says put your wrist strap on your head or whatever." Your viewers can't see me doing these ridiculous things, but I'm glad you're laughing. And so you can do this on an infinite set of things, and that's really what functions are about and what programming is about, doing these things with an infinite range.

0:46:44.3 GM: Identity: this is the same as that. You learn the notion of a pair in cards, you can do it with the twos and the threes and the fours, and now I make a new deck, I don't have twos and threes and fours, I have, I don't know, stars and guitars, and you can tell me that a pair of guitars means two guitars; you've taken that function and put it in the new domain. That's what deep learning does not do well. It does not go over to these new domains. There are some caveats around that, but in general, that's the weakness of these systems, and people have finally realized that. Nowadays people talk about extrapolating beyond the training set. But the paper that you read, I don't know which version, but I first was writing about this in 1998, is really capturing that point. It took a long time for the field to realize that there are actually different kinds of generalization. It also goes back to the past tense stuff. So people said, "There's no problem. Our systems generalize," and I said, "No, there are these special cases." And finally, now they're saying, "Oh, there are these special cases when you have to go beyond the data that you've seen before." And really that's the essence of everything where things are failing right now.

0:47:44.0 GM: So let's take driving. These systems interpolate very well in known cases, and so they can change lanes in the environments they see, and then you get to Vancouver on this crazy snowy day that nobody predicted and you don't want your driverless car out here, because you now have to extrapolate beyond the data and you really wanna rely on your cognitive understanding of where the road might lead, because you can't see the landmarks anymore. And that's the kind of reasoning they can't do...

0:48:10.3 SC: Your identity function example raises an interesting philosophical question about what the right rule is, because it's not like the deep learning algorithms just made something up. You gave an example where in the training set, with a bunch of numbers, they all ended in zero and the other digits were random, and so we figured it out, but the deep learning just thought the rule was your output number always ends in a zero. And the thing is that that is a valid rule. It didn't just completely make it up, but it's clearly not what a human would want the conclusion to be. So how do we...

0:48:44.1 GM: I've been talking about this for 30 years. I've made that point in my own papers. You're the first person to ever ask me about it.

0:48:50.6 SC: How do we formalize...

0:48:51.4 GM: Which brings joy to my heart. It's really a deep and interesting point. Even when the systems make an error, it's not that they're doing something mathematically random or something like that; they're doing something systematic and lawful, but it's not the way that we see the universe. And in certain cases, it's not the sort of functional thing that you want to do. And that's very hard for people to grasp. So for a long time, people used to talk about deep learning and rule systems. It's not part of the conversation now as much as it used to be, but they would say, "Oh well, the deep learning system learns the rule that's there." And what you as a physicist would understand, or what a philosopher would understand, is the rules are under-determined by data. You need something... There are multiple rules. An easy example is if I say two, four, six, eight, what comes next? It could be 10, but it could be something else, and you really want some more background there.

0:49:47.4 GM: So it turns out that deep learning is mostly driven by the output nodes, the nodes at the end giving the answer. And they each learn things independently of one another, and that leads to a particular style of computation that is good for interpolation and not so good at extrapolation. And people make a different bet. And I did these experiments with babies to show that even very young people make this different bet, which is, we're looking for tendencies that hold across a class of items.
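
A minimal sketch of the kind of identity-function experiment being described. Everything below is invented to make the point, and a one-layer learner stands in for a real deep network: each output bit is fit on its own from training examples whose last bit is always 0, and the learner settles on "the last bit is always 0," a perfectly lawful rule that just isn't the identity function a person would infer.

```python
import itertools

def train_bit_perceptron(examples, position, epochs=50, lr=0.1):
    """Learn output bit `position` on its own, seeing only the input bits."""
    n = len(examples[0][0])
    w, b = [0.0] * n, 0.0
    for _ in range(epochs):
        for x, y in examples:
            pred = 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0
            err = y[position] - pred
            w = [wi + lr * err * xi for wi, xi in zip(w, x)]
            b += lr * err
    return w, b

def predict(model, x):
    w, b = model
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0

# Training set: the identity mapping, but every example happens to end in 0.
train = [(bits + (0,), bits + (0,)) for bits in itertools.product([0, 1], repeat=3)]

models = [train_bit_perceptron(train, j) for j in range(4)]

novel = (1, 0, 1, 1)  # ends in 1 -- unlike anything seen in training
print([predict(m, novel) for m in models])  # [1, 0, 1, 0]: the last bit stays 0
```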

0:50:17.4 GM: We're looking for the rule, that's just how we're built. Sometimes that gets us in trouble. There's a word, apophenia, which is like looking for patterns that aren't even really there. And so sometimes it doesn't serve us well, but it very often does, and language is a great example where it serves us really well. You learn a grammar and then you can apply it to any words that you can throw in that grammar, even novel words. So if I told you that this thing was called a blicket, then we can start talking about blickets right away. I can say, how much does that blicket cost you? Would you sell me your blicket? Would you recommend the blicket? Is there something, an alternative to the blicket? You're off to the races with one training example because you put it in the context of something that is rule-governed, where you have a grammar that tells you not only the syntax, the plural, the morphology is gonna be blickets, and it's gonna tell me how I can use it with a verb and a noun, but also semantics. You can know that I probably mean an individuable object, a single object that I can go and count, and you know all this kind of stuff right away because you have a world model that you map your language onto. That's what it's really about...

0:51:22.6 SC: Well, and this is exactly where I was gonna go with this, because clearly, with all this setup, we need to give our world model to our computer friends, to our artificially intelligent friends, and how do we do that? Is it that we human beings need to formalize our sort of manifest image of the world, our picture of common sense, and then turn it into a bunch of symbols? Or is it... I think I know what the answer to this is. Could we deep learn our way into common sense? Could we just... Is there a way of letting computers figure out the same kind of common sense that we have?

0:51:58.1 GM: I take a view that I think is a little like what Kant was trying to say in the Critique of Pure Reason, although I'm never sure I've completely understood that book, but he talks about having basically prior knowledge of space and time, and I'm sorry I haven't read your book, which would be super relevant at this point, but I think I'd like to hear your take on it, but my view is you can learn a lot, but that you need a framework to learn... To embed that knowledge. And so minimally I think you need to know that there is a space, that there is time, that there is causality, that they're enduring objects in the world, and some other stuff, but stuff like that. And I believe that there's some reasonable evidence from the animal literature and the human infant literature to think that these things are in humans innate. I think you need to start with that or else you just wind up with GPT. And in fact, I think GPT is a brilliant experiment, unintentional, but brilliant, on the idea of, could you just learn everything? Like let's say, from words or from... People haven't really done it from pixels, but I think you'd wind up in the same place.

0:53:06.7 GM: And I think the answer is no, you don't wind up with the API I'm talking about if you don't have prior notions about enduring objects that you're talking about. Then you just... You're just in correlation soup. And it's made the best job of correlation soup that I've ever seen, but it's still a correlation soup and it doesn't really connect to those things, which means that it can't know that it's ridiculous to say I'm a computer and a person in one breath or two breaths. It doesn't have the framework to know that things don't tend to change too much over time 'cause it doesn't know what time is. So I don't fully have an answer to the question that you posed a minute ago, but I think it starts by saying, we're gonna learn some stuff, but it's gonna be relative to a framework where we have some basic knowledge about the world to start with, that there are these enduring objects, etcetera.

0:53:58.8 SC: I mean, just to emphasize how tricky all this is, I think it maybe undersells the difficulty if we just think about there's space and there's time and there's objects and they have solidity, because, number one, there are also relationships between these objects, there are functions that they have, you already mentioned causality, you mentioned the fact that there are values. You don't chop up the sofa to put it away, 'cause that's something that is already away in some sense. And so it's gonna be quite a trick. And I guess what you're saying is, and I think I agree, that the computers are not just going to learn all that stuff by looking at correlations, but there's still a tremendous program out there in front of us of figuring out what it is we want them to know ahead of time.

0:54:42.1 GM: Yeah, so the most impressive shot on goal, to use a kind of cliche I've heard a lot lately...

0:54:48.1 SC: In Canada.

0:54:49.5 GM: Yeah, right, but I heard it a lot around the vaccines, and rightly, I think. People said, "This is gonna work. There's so many shots on goal." And they did work. We underestimated the human capacity to ignore data, but that's another issue. So Doug Lenat built this thing called Cyc, which you may or may not know about, C-Y-C. He's worked on it for the last 35 years, and it was an attempt to put all of common sense knowledge, or a large fraction of common sense knowledge, in machine-interpretable form, and it hasn't been the home run that he thought it would be. And I think people are drawing the wrong lesson from his lack of a huge, obvious success. There's no Google built on it, and he may have hoped that there would be, but I still think what he was trying to do was right. I think maybe it failed because it started too soon, with a different set of tools than we would use now for the project that he was trying to do. But I think the project is right, that we're not gonna solve this AI problem, or general intelligence problem, without having a lot of knowledge in formats that machines can leverage.

0:56:04.9 GM: So you need to know if you're predicting about grape juice and cranberry juice that they're both juices and that other things being equal, you can mix juices together and you won't die or whatever. And there's a question about what level of specificity you want all of that stuff to be in. Do you wanna derive everything from quantum mechanics? Do you wanna have intermediate representations at the level of juice, which is what people do? But you need some kind of knowledge that machines can reason over, and he built something like 1100 micro reasoners that reason over things like economics, and I don't know if beverages are in there or not, but lots of little domains and desires. The most impressive thing he has, and I write about this in an article called The Next Decade in AI, and give the reference to his article which might be in Forbes or Fortune, I can't remember which...

0:57:00.0 GM: He goes through this example with Romeo and Juliet, where the system is actually able to reason about something complicated, like what Juliet thinks is going to happen when she drinks this potion that is going to fake her death. That's really sophisticated stuff. And he shows that his system that has this common sense knowledge can make good inferences around that, and nothing in the deep learning tradition can do anything like that, and it's a conceptual proof, a proof of concept, that if you have the right knowledge, you can actually get machines to do really rich inference. But there's also an answer around it, which is that it doesn't just read the Shakespeare and make this inference; rather, he has converted the Shakespeare into a set of logical propositions, and then the system is able to reason over those logical propositions. And so I guess the skeptic would say, well, he's left out the whole problem. [chuckle] I'm a little bit more optimistic. I think he has left out a huge problem, but also showed that another part would be solvable if we do one piece of it. But there's a whole interesting set of issues that don't even get talked about that much, which is, can we really do this without knowledge sort of like what he was doing...

0:58:11.8 GM: My answer would be no, we need something like what he was doing, but also, he did it in the '80s and we now know a lot about, for example, statistical representation of information that he didn't have the tools to use then. So you want a lot of distributional information, you don't wanna just discretise things into logical bins, you also wanna know what's typical... And he doesn't have a lot of that kind of stuff represented, so you'd do it differently if you did it now, but I think what he was trying to do is still of the essence.
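
To give a concrete feel for what "reasoning over logical propositions" means here, the following is a minimal sketch of forward-chaining inference over hand-coded facts and rules. The predicates (`drinks`, `causes_apparent_death`, and so on) and the two rules are invented for illustration; this is not Cyc's actual representation of the play, just the general shape of the idea.

```python
# Forward-chaining sketch over hand-coded propositions (illustrative only).
# Uppercase strings in rule patterns are variables; everything else is a constant.

facts = {
    ("drinks", "Juliet", "potion"),
    ("causes_apparent_death", "potion"),
}

rules = [
    # If X drinks P and P causes apparent death, then X appears dead.
    ([("drinks", "X", "P"), ("causes_apparent_death", "P")],
     ("appears_dead", "X")),
    # If X appears dead, observers will believe X is dead.
    ([("appears_dead", "X")],
     ("believed_dead", "X")),
]

def match(pattern, fact, bindings):
    """Unify one pattern with one fact, extending the current bindings."""
    new = dict(bindings)
    for p, f in zip(pattern, fact):
        if p.isupper():                    # variable
            if p in new and new[p] != f:
                return None
            new[p] = f
        elif p != f:                       # constant must match exactly
            return None
    return new

def satisfy(antecedents, facts, bindings):
    """Yield every set of bindings that makes all antecedents true."""
    if not antecedents:
        yield bindings
        return
    first, rest = antecedents[0], antecedents[1:]
    for fact in list(facts):
        if len(fact) == len(first):
            b = match(first, fact, bindings)
            if b is not None:
                yield from satisfy(rest, facts, b)

def forward_chain(facts, rules):
    """Apply rules until no new propositions can be derived."""
    changed = True
    while changed:
        changed = False
        for antecedents, conclusion in rules:
            for b in list(satisfy(antecedents, facts, {})):
                derived = tuple(b.get(term, term) for term in conclusion)
                if derived not in facts:
                    facts.add(derived)
                    changed = True
    return facts

print(forward_chain(facts, rules))
# Derives ('appears_dead', 'Juliet') and ('believed_dead', 'Juliet').
```

The hard part conceded above is exactly what this sketch skips: turning the raw text of the play into propositions like these in the first place.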

0:58:39.8 SC: I'm reminded of several years ago, Chris Anderson, who was the editor of Wired at the time, wrote some little piece saying that theory is dead in Science.

0:58:47.3 GM: My least favorite Chris Anderson article in Wired of all time, admittedly.

0:58:49.4 SC: And his logic was, look, if we have enough data, we can just figure out what all the correlations are... Who needs a theory? And I wrote one of the responses and I said, look, Tycho, Tycho Brahe I should say, the famous astronomer, collected a lot of data, and people like Kepler, his protege, found some correlations in the data and constructed some very useful rules, and that was good, but it was when Isaac Newton came along and invented a theory to explain why Kepler's rules were there, that was when we really understood something, because then we could go beyond, extrapolating beyond the data sets, etcetera. So in some sense, maybe the worry about the problem with deep learning is that we're too good these days at being Tycho and Kepler, 'cause we're able to manipulate these huge data sets, but true understanding won't come until we are able to abstract a simple set of rules, which will be a little bit more robust than the original data sets...
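
As a one-equation gloss on that point (an illustration added here, not part of the conversation): Kepler's third law is the correlation extracted from the data, while Newton's inverse-square law derives it, for a circular orbit, and keeps working far beyond the original data set:

\[
\frac{GMm}{r^{2}} = \frac{mv^{2}}{r}, \qquad v = \frac{2\pi r}{T} \quad\Longrightarrow\quad T^{2} = \frac{4\pi^{2}}{GM}\, r^{3},
\]

which is Kepler's empirical \( T^{2} \propto r^{3} \), now explained by, and extrapolable from, a deeper theory.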

0:59:52.2 GM: Well, I think even getting to Kepler would be progress, in that most of the work in the sort of AI scientific discovery stuff builds in the answer in some way or another, and what Kepler did that was awesome was to kind of come up with his own answer; he wasn't choosing from three templates or something like that. There's a really cool paper by Josh Tenenbaum and Charles Kemp, where they see data and they infer, does this follow a ladder or a circle or whatever, these kinds of conceptual relationships, but all the choices are built in at the beginning, and it's still a cool paper. But the really cool paper, which nobody knows how to write, I don't know either, would actually induce that these are even the logical forms that you should think about, and maybe that's close to what Newton did, but I would give Kepler some credit for that too. These are problems people sometimes talk about, extracting what the variables are that you even wanna talk about, and I think that's often the critical thing, where sometimes there's a billion different choices and you need to know, this is the one that I care about.

1:01:01.2 GM: It's actually impressive even that children ever learn what integers are, for example. This is the kind of thing that... If I were still a professor, I retired as a professor young, but if I were still a professor, I would be telling everybody, work on this problem. Susan Carey is actually telling people this: how is it that kids actually figure out what integers are? That's an example of a conceptual apparatus that's incredibly valuable, and yet it's not obvious that it's innate. It's pretty obvious that approximate number is innate, so many animals have some conception of approximate number, that 12 is more than seven, most animals can figure that out. But knowing what a discrete countable system is, where you can have infinity, that's a pretty cool intellectual accomplishment. And kids do it, they do the same thing when they learn to read, some kids don't, but most of them do. A harder example is fractions; the median split on SAT math is apparently whether you really get what fractions are or not.

1:02:10.1 SC: Do other primates understand integers?

1:02:11.8 GM: That's another form of this... Say it again?

1:02:14.7 SC: Do other primates understand integers?

1:02:17.1 GM: You can get them to count; the extent to which they understand integers is not totally agreed on...

1:02:24.0 SC: Okay. I don't know.

1:02:25.4 GM: There's some controversy in the literature, they can at least do things like remember a sequence of small integers, whether they get to the point of realizing, Hey, I could just keep going with this forever, if you just teach me the right words for it, I don't know.

1:02:40.7 SC: Well, it goes right into what I wanted to ask next, which is the extent to which being inspired by Biology and Evolution and actual human reasoning is useful, like evolution is not goal-directed, it was not set up to try to build a perfect computer and the human brain is really good at driving and talking and not so good at playing chess or multiplying big numbers together. Do you think that we can take how evolution got us to where we are as inspiration for this program of hybrid systems?

1:03:13.9 GM: I think the right word is inspiration, right. There's this field of bio-mimicry, and I think the moral of that story is there's often useful stuff and then there's stuff you don't wanna copy. So you don't want to build your theory about how to support objects around the human spine; the spine is a terrible solution to supporting this heavy thing right on the top of the stack, and it happens to be there because we were quadrupeds, and it was kind of evolutionarily cheap, in the sense of being likely, to rotate the quadruped 90 degrees and then you're vertical and you're a biped and it's great. But really, a tripod would have been a whole lot better, and so you don't wanna copy everything about our design. In fact, I wrote a book called Kluge, which was about all the things that I think are lousy about human cognition, starting with...

1:04:00.7 GM: Or focusing on things like confirmation bias, where we notice evidence for our own theories. Our political system right now is an epic morality tale in confirmation bias and how bad that is. You don't want your AI system to be subject to confirmation bias, where it comes up with theories, notices evidence for those theories, pats itself on the back, and ignores the counter-theories. This is the last thing in the world you would want any AI system to do.

1:04:25.9 GM: So we don't wanna copy the biology, but we do wanna learn from it, so there are things that people still do way better than machines, even though there are things we do really poorly, so we wouldn't wanna copy the memory systems of people 'cause they're not that great, but on the other hand, they're cue-driven in a way that's kind of cool, and maybe we can kinda do that in AI now.

1:04:47.5 GM: But the way that people can understand semantics in relation to a syntax... That's really interesting, we don't know how to do that with machines, maybe we'll figure out a better way to do it than people do, but right now, the only game in town is people... So let's see if we can learn from that.

1:05:04.6 SC: Can you explain that issue, and define the words semantics and syntax as you're using them?

1:05:11.7 GM: That issue is how you relate the meanings of words to the ways in which you assemble them, and derive the meaning of a sentence in terms of its parts. And it turns out that GPT can actually replicate the assembly of the parts into a grammatical sentence, but it can't relate that to a situation in the world that is being described by that sentence. It actually can't go in either direction: you can't give it a situation and entities and expect a sentence that will validly describe them, nor go the other way and take the sentence and figure out the situation. Whereas you and I, that's what we're doing, sometimes imperfectly, but we're trying to grasp each other's meaning. So you're building a model of what Gary is actually saying there, and we have limited bandwidth and whatever, but we get there, and the machines don't really have that capability right now in a general way...
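
As a toy sketch of what relating syntax to semantics can look like computationally: each word contributes a piece of meaning, the sentence's structure dictates how the pieces compose, and the composed meaning can be checked against a small world model. The lexicon, the world model, and the `interpret` function are all invented for the example and are vastly simpler than anything a real system would need.

```python
# Toy compositional semantics: compose word meanings according to the
# subject-verb-object structure, then evaluate against a tiny world model.

# World model: the (relation, subject, object) triples that are true.
WORLD = {
    ("owns", "gary", "blanket"),
    ("owns", "sean", "telescope"),
}

# Lexicon: each word's contribution to meaning.
LEXICON = {
    "gary":      ("entity", "gary"),
    "sean":      ("entity", "sean"),
    "blanket":   ("entity", "blanket"),
    "telescope": ("entity", "telescope"),
    "owns":      ("relation", "owns"),
}

def interpret(subject: str, verb: str, obj: str) -> bool:
    """Build a proposition from the parsed parts, then check it
    against the world model."""
    _, subj_ref = LEXICON[subject]
    _, rel = LEXICON[verb]
    _, obj_ref = LEXICON[obj]
    proposition = (rel, subj_ref, obj_ref)   # meaning derived from the parts
    return proposition in WORLD

print(interpret("gary", "owns", "blanket"))   # True
print(interpret("blanket", "owns", "gary"))   # False: same words, different
                                              # structural roles, different meaning
```

Swapping the same words into different structural roles changes the derived proposition, which is the kind of structure-to-situation mapping being described here.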

1:06:09.4 SC: I guess I'm just wondering how much... You went back to Kant a little while ago, but how much innate knowledge in the human brain is crucial to this kind of reasoning that we do in extrapolating, and is that something that would help us figure out what to build into a good hybrid AI system?

1:06:26.9 GM: First thing I'll say is, it's controversial, nobody knows. I spent the first two thirds of my career as a developmental psychologist/cognitive psychologist thinking very deeply about this, and I wrote a book called The Birth of the Mind, which was about how you might get innate structure given the tools of developmental neuroscience and molecular biology, what we know about developmental neuroscience and so forth. So I've thought about these things a lot, and the honest answer is we don't know exactly what's innate. The best work, I think, is done by Elizabeth Spelke, a developmental psychologist at Harvard, but there's a lot of work out there. My best guess is that we have at least about a dozen things that are innate, and it could be a lot more. The dozen things include the ability to represent these abstractions that we are talking about, the ability to distinguish between types and tokens, so I know about this water bottle as opposed to water bottles in general, space, time, causality. All of those kinds of things form a bare minimum, and I've written about this occasionally, and then you could have a lot more. I often point to the last chapter of Pinker's first popular book, The Language Instinct, where he runs off a list of 15 things that includes, I think I'm quoting verbatim, a mental Rolodex.

1:07:43.6 GM: And maybe that is innate. Some things you might be able to derive if you had others; if you had a cost-benefit system, which I think is innate, and you had abstract variables and you had a few other things, maybe you could acquire some of the others. And then there's a tension in the developmental psychology literature, well, I actually won't just call it a tension, it's a mistake, a foundational mistake, which is that people think that if something could be learned, it is not innate. But that's wrong.

1:08:19.1 GM: So there may be many things that could be learned, but maybe biology has chosen, so to speak, and you know all the ways in which I'm being... but has lighted upon solutions that make those innate, 'cause it's a whole lot safer or faster or whatever. So you think about a baby ibex scrambling down the mountain; it is not working out online the physics of objects and slopes and stuff like that. That is there, maybe. Or a honey bee can calculate the solar azimuth function and extend it to lighting that it's never seen before, if you do the right experiments. So there's clearly innate stuff there about physics and observation and so forth, and there may be a lot more than just the 10 or so things that I'm talking about, but even those 10 have made me kind of like public enemy number one in the machine learning world, where they wanna learn everything from scratch, and they're like, why is Gary always on about needing this stuff?

1:09:09.8 GM: How do you know... I think one of the biggest problems with the field of AI is that right now it is dominated by a group of people who do machine learning, and there's the old saying, to a man who has a hammer, everything is a nail. And so the people in machine learning have made astonishing progress in some ways in the last decade, and so they think that the tool that they have is the tool, whereas I think the right thing is to say, congratulations, that is an awesome tool, we thank you for it. Now let's see how we could use it in combination with other tools to do even more awesome things. But it's been a bitter battle getting people to even think about that.

1:09:48.3 SC: Well, one of the possible ways of thinking about this is, I kind of don't wanna think of AlphaGo or whatever as intelligence almost at all, it's very good at playing Go, but it doesn't remind me of a human being in very many other ways.

1:10:07.8 GM: Certainly it's advanced. I would say that intelligence is multi-dimensional, and some of the things that I think are fairly counted as intelligence are doing that kind of computation, and it's fine to call it that as long as you realize that it is multi-dimensional and there are other dimensions where it's not even showing up. So one definition of intelligence is adaptively solving unknown problems, and it doesn't have that at all.

1:10:33.7 SC: Yeah, well, and is the general goal of trying to make AI equal the capacities of human intelligence, the right goal or should we just be saying...

1:10:45.0 GM: We shouldn't go for equal, we should go for exceed. Danny Kahneman and I had this conversation, we sort of came up with this phrase together in a way, it was at a panel, which is: humans are a low bar. I think that was his conclusion. And I said, and yet we still can't exceed it...

1:11:00.2 GM: We surely should want our machines not to be human-level intelligence, but to be way smarter than us, if we're gonna trust AI as much as we seem to want to. It's gotta be good, right? If we're gonna put it in charge of stuff, whatever that stuff is, it better not be subject to confirmation bias, it better not just perpetuate racist stereotypes from the past, but actually be able to apply values so that it's not just interpolating, but extrapolating to the world that we want to have. And that means it's gonna be better than most people are, better than all people; that should be what we're aspiring to. What we're settling for now is, we've got these cool tools and they can do some stuff, and sometimes they actually tell people to commit suicide or say racist things or whatever, and we're like, but I get really good recommendations from Amazon, so it's okay. That's where we are now, and I'm not super thrilled with that.

1:12:02.8 SC: But I guess what I'm getting at is, as you said, I completely agree that there's very many different kinds of intelligence and computers are gonna be... It's gonna be easier to make computers good at some kinds of intelligence than it is at other kinds of human intelligence, and I'm sure both are important, but how do we balance just putting computers to work at the things they're good at versus trying to nudge them to become good at other things that we human beings know and love.

1:12:35.8 GM: I think it starts with what you just said. I often make your point in a slightly different way, which is that people talk about artificial intelligence as if it was one thing, but it's actually many things, it's actually a whole family of algorithms and also databases and so forth that have different properties. They're good at some things. They're not good at other things. That's going to change over time. So the AI of 2022 is different from the AI of 2019, and I sure as hell hope that the AI of 2025 is better than what we've got right now, 'cause it's problematic now. And you can't just talk about it like it's a magic wand, it's not. It's a set of tools that are more or less appropriate to certain problems, and so it's totally fine to use current AI for photo tagging, 2025 AI will be even better at it, but the cost of a mislabelled photograph is generally not that high. Unless, then again, you're using it for surveillance, in which case maybe it's really high. I just saw another one of these examples of somebody who went to jail because an AI system misread something.

1:13:46.1 GM: I think in the book, we gave an example of something in China that gave somebody a speeding ticket because their face, they were an actress, their face was on a bus that went fast, or whatever, and so the wrong person got convicted of the crime. The tools are appropriate or not appropriate depending on how they get used, so you can tolerate the error in photo tagging if you're not using it to identify criminals; if you're identifying criminals, then you probably need at least 2025 AI, or 2030 AI, 'cause then the stakes are so high. Same thing with suicide prevention. You can write a little chatbot that'll make people feel better some of the time, but when the stakes are high, I don't think the tools we have right now are up to it. Driving is another example. It's easy to build a car that can follow a lane, you can have like 70 hours of training data and video to show this, and you can follow a lane and that's great, but it doesn't mean that you will know what to do on a snowy day, and so we have to be very careful about the laws around driverless cars. And right now, I think Elon Musk is beta testing on public roads, and I don't think that's cool.

1:14:55.5 GM: There have been some accidents. And so understanding that AI is actually a heterogeneous thing rather than a single magic wand is important, and that makes it hard, because people want a policy that's sort of about AI writ large, and that doesn't match the reality, which is that we are incrementally developing science and engineering to make things better, and we understand some of it and not other parts.

1:15:21.3 SC: And then, without asking you to make predictions about time scales or anything like that, do you see any obstacles to AI being just as good as human beings are at writing poems or symphonies, and so forth?

1:15:34.0 GM: In principle, no. They're not gonna have the emotions that might drive some of that stuff. It's actually not that hard to write knock-offs of Bach without...

1:15:51.2 SC: Already.

1:15:52.0 GM: Without the emotional resonance... One piece of your question is specifically about creativity, and then there's a larger question. Many particular things that we would define as creative, we can already build machines to do, without that connection to the underlying emotional impulse that might lead to something... And so, vocals are hard 'cause vocals are really about emotion. Synthesizing a drum beat, Logic 12 or whatever the latest edition is, you can do that pretty well, right? There's a humanize function to add a little random variation and make it sound like a person... There are certain things that we can do very, very well, and in some cases have been able to do for 30 years, and people are reinventing them with deep learning, but people already knew how to do some of those things. The larger question... Well, sorry, one more thing: there are some artistic endeavors that I think are way beyond current computers, though, like a movie where you have to have coherence over a long period of time. So GPT can actually make advertising jingles that are like two-liners pretty well, but it can't keep the coherence that you would need for an ordinary film. You could make something, something...

1:17:09.6 SC: Something modern.

1:17:11.8 GM: From the late '60s, with pharmaceuticals involved, that seemed interesting, but the long form is not a strong point of what we have right now and won't be for a while. But I don't think anything in the realm of cognition is impossible. We are just meat computers, and we don't quite understand how those meat computers work, but they are information processors that take in information and manipulate it and come up with outputs. That's what our brains do, and computers get better at that, and I don't see any principled argument that says 500 years from now, people will still be smarter than machines. I just don't see it.

1:17:50.7 SC: 500 years is much safer than 20 years, I think you've chosen wisely about your time horizon there. But I guess there are people who are worried about existential risks from AI taking over and having different values than us. Do you share those worries? Clearly, giving AIs value systems at all recognizable to us is a tricky situation.

1:18:15.7 GM: So I wrote this piece in 2012 called Moral Machines for the New Yorker. I was one of the first people to talk about trolley problems in AI, where in my particular case, in the New Yorker article, it was: a school bus is out of control, you're on a highway, should you sacrifice yourself? A lot of people picked up on this later, Obama talked about it. I now have some regret around that piece.

1:18:42.5 SC: It's all your fault. We've narrowed it down. Okay.

1:18:45.5 GM: There were a couple of us who wrote about it around the same time, but I was one of the first. But the thing is that the real challenge right now is much lower to the ground than that, and it's not often that you actually come up against a school bus. But take Asimov's basic laws. Don't do harm. Just think about that one. The model that we have now is built around images: I show you a bunch of images and you learn from those images to recognize another one... That model doesn't work for harm. I can't show you a bunch of pictures of harm and really get you to grasp the concept of what harm is. We just don't know how to program what harm is, we don't know how to program really any human values into current technology, and it's actually related to stuff we've been talking about throughout the whole conversation, which is that it's kind of an interface thing: we don't know how to specify these things in terms that learning systems would understand, and we can't really do it entirely with an innate set of rules either. There has to be some learning... We'd have to give some examples.

1:19:29.1 GM: There was a film called Chappie a few years ago, where a robot learned its values, and one of the lines in the film is... The robot is trying to figure all this stuff out, and the robot has been captured by a bad guy, and the robot's master has said, you can't kill people, and the bad guy is a bit disappointed to discover that the robot knows this, 'cause the bad guy would actually like the robot to kill people. But the bad guy is clever and he works around it and he says, yeah, you can't kill people, but it's okay to harm them, and the robot is sort of left...

1:20:30.9 GM: To construct its own ethical values based on the input that it's getting. And the reality is that we will get to a point, maybe we already have gotten to a point, where it'll be really nice to have AI systems with values, and we have not gotten to the point where we know how to program that in. So one of the problems with GPT-3 is all the toxic language that it produces, because it's trained on the worst of Reddit and stuff like that. Already it would be nice to constrain these systems such that they would follow some set of values, and you can argue about what that set of values would be, but we don't know how to do it. DeepMind just had 10 people, 20 people working on this problem and came up dry; nobody knows how to actually constrain these systems to values. They don't have the APIs to plug into, and it's a problem. So then when you come to the existential stuff... I'm not worried about it now, I'm not worried in the short term about robots taking over the world. They don't care about us, they have no motivation to do so, and they're frankly dumb right now; the ability to win at Go doesn't count for anything.

1:21:42.2 GM: Go is actually a great example, because Go is about territory, and they don't actually understand anything about territory in the real world. Getting better at Go has not made them any more desirous of human territory, nor taught them anything that would be useful in an actual battle. So I'm not too worried about those kinds of things in the near term. In the near term, I'm worried about misapplication of the AI that we have now, but in the 100-year time frame, it could be an issue. I think it's fine that we have a few people around thinking about these issues now, maybe they never come to pass, but it's good to have some thought into it. It's not an urgent need...

1:22:26.7 SC: I'd like to end on an optimistic note. So what if everyone listened to you, what if you were not the bad boy of AI anymore and everyone said, you know what, more symbols, more variables, more hybrid approaches to join up with our deep learning. How would you see AI going in the next few years?

1:22:44.5 GM: I wrote a piece for the Times where I argued for a CERN for AI. If people really listened to me, that's what they would do, and I would get people to gather around a particular problem, in part 'cause otherwise, if you have a large sum of money, then people just do their own thing and don't actually coordinate. And the problem that I would coordinate them around is having an AI that can read and understand the medical literature. I think there would be enormous value in that. You could also maybe think about doing the same thing around climate change, read and understand the materials science and so forth. Either of those would be fine in my view, but have people coordinate around machine reading, and I'm not talking about keyword matching, which we can do very well right now, but having a system read the scientific literature, come up with experiments based on what it reads, come up with novel solutions and so forth. I think that could change the world, and it would certainly push AI forward, so if I were king for a day, that's what we would do.

1:23:45.6 SC: Alright, now people know, and they can spread the word. Gary Marcus, thanks so much for being on the Mindscape podcast.

1:23:50.9 GM: Thanks, this is really fun.

[music]

