Not every conversation with engineering faculty starts with a presentation slide featuring 17th-century thinker René Descartes and his signature view of dualism, a philosophy that placed the mind distinct from, and in a hierarchical relationship to, the body. Assistant Professor of Electrical and Computer Engineering Jaerock Kwon, however, is not your typical engineering professor. After starting his career as a software engineer, he completed his doctoral work in computational neuroscience, a field where researchers use mathematical models and computer simulations to understand the development of our cognitive processes. His latest work — “embodied cognitive approaches to autonomous vehicles” — is a great example of his penchant for the unconventional, and the project is as weird and fascinating as it sounds. In a nutshell, Kwon thinks it’s possible that some of the challenges we’ve run into trying to create fully autonomous vehicles stem from the fact that we’ve been thinking about AVs too dualistically. His hypothesis: To make big breakthroughs in autonomy, maybe we shouldn’t be thinking about driverless vehicles just as big computers that control a physical car, but as embodied cognitive beings that have intuitive understanding of how to control their physical presence in the world. Sort of like us.
To understand what exactly Kwon means by this, it helps to look at the paradigm that dominates autonomous vehicle research today. Current experimental AVs are outfitted with lots of sensors that allow them to experience their environment, and powerful computing units then interpret that sensor data and relay decisions about how to act to the mechanical parts of the car. A core part of the vehicle’s intelligence comes from machine learning processes that apply labels to relevant parts of its sensed environment. For example, if the computer interprets an image provided by the vehicle’s optical cameras as containing a “traffic signal with a red light,” it will then send a signal to apply the proper amount of pressure to the brakes in order to come to a stop. It sounds simple enough. But an action like that might involve dozens of coordinated computational processes that take into account dozens more variables, like the distance to the traffic light, the distance to a car already stopped at the light, the speed and weight of the vehicle, its center of gravity, the amount of friction between the tires and the road, and so on. Now imagine programming a car not just for stopping at a red light but for all the situations it might encounter, including unpredictable so-called “edge cases,” like interactions with pedestrians or aggressive drivers. It’s easy to see why this inherent complexity is a big reason experts think we could still be decades away from driverless cars hitting the market.
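To make that division of labor concrete, here is a deliberately toy sketch of the “sense, interpret, act” pipeline in Python. Everything in it is an invented stand-in: the detection label, the distances, the friction figure and the stopping-distance arithmetic all substitute for what, in a real vehicle, would be layers of perception models, planners and safety logic.

```python
# Toy sketch of the classic modular AV pipeline: sense -> interpret -> act.
# Every label, number and formula here is illustrative, not from a real system.

from dataclasses import dataclass


@dataclass
class Detection:
    label: str          # e.g. "traffic_signal_red", produced by a perception model
    distance_m: float   # estimated distance to the detected object, in meters


def decide_brake_pressure(detections, speed_mps, friction=0.7):
    """Return a brake command in [0, 1] if a red light is detected ahead."""
    for det in detections:
        if det.label == "traffic_signal_red":
            # Deceleration needed to stop just before the light: v^2 / (2 * d).
            required_decel = speed_mps ** 2 / (2 * max(det.distance_m, 1.0))
            max_decel = friction * 9.81  # crude tire-road friction limit
            return min(required_decel / max_decel, 1.0)
    return 0.0  # no red light detected: no braking


if __name__ == "__main__":
    scene = [Detection("traffic_signal_red", distance_m=40.0)]
    print(decide_brake_pressure(scene, speed_mps=15.0))  # about 0.41
```

Even in this cartoon version, notice where the “intelligence” lives: in the labels and the arithmetic, not in any feel for the machine itself. That framing is exactly what Kwon wants to question.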
For Kwon, something about this approach always seemed a little strange. Essentially, we’re trying to recreate a set of human behaviors — namely, good driving — but the process we’ve devised doesn’t really resemble at all how we drive. “Think about when you first learned how to drive,” he says. “You weren’t making a bunch of mathematical calculations. All you did was start doing things, and then you adjusted your behavior based on what happened so you could make the car do what you wanted it to do.” Put another way, we were learning by “motor babbling.” We were simply trying out different actions and then gradually refining those actions based on what we perceived. We learned how to have the right touch with the accelerator by first hitting the gas too hard and peeling out. We learned to lane keep by experiencing what it feels like to turn the wheel too hard and drift toward the shoulder or centerline.
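As a rough illustration of what that trial-and-error loop looks like in code, the toy Python sketch below “babbles” a steering gain for a one-dimensional lane-keeping problem. The simulated lane, the noise levels and the hill-climbing update are all invented for this example; they simply stand in for trying an action, perceiving the result and keeping what works.

```python
# Toy sketch of "motor babbling" for lane keeping: try an action policy,
# perceive how far off-center the car ends up, keep changes that work better.
# The 1-D lane model, noise levels and update rule are invented for illustration.

import random

random.seed(0)


def drive_episode(steering_gain, steps=50):
    """Simulate a crude lane follower; return the average distance from lane center."""
    offset = 1.0  # start one meter off-center
    total_error = 0.0
    for _ in range(steps):
        steer = -steering_gain * offset          # act on what we currently perceive
        offset += steer + random.gauss(0, 0.02)  # imperfect, noisy outcome
        total_error += abs(offset)
    return total_error / steps


# Babbling loop: perturb the policy at random, keep whatever reduces the error.
gain = 0.0
best_error = drive_episode(gain)
for _ in range(200):
    candidate = gain + random.gauss(0, 0.1)  # try something slightly different
    error = drive_episode(candidate)
    if error < best_error:                   # that attempt felt better: keep it
        gain, best_error = candidate, error

print(f"learned steering gain: {gain:.2f}, average offset: {best_error:.3f} m")
```

Nothing here weighs the car or solves an equation of motion; the only feedback is whether the last attempt left the car closer to the center of the lane.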
This behavior, Kwon argues, can’t be described by a dualistic model in which the brain is simply processing a bunch of sensory information, measuring it, categorizing it, “solving a bunch of differential equations,” and then sending action directives to the body. Rather, it’s an embodied cognitive process in which our sensory perception and our motor coordination have become more directly and intuitively connected through a series of shared experiences. If we accept this, the natural next question is whether you could get a car to drive by simply allowing it to learn, much like we do. Within certain parameters, the answer is actually yes. Though Kwon’s work may sound far out, artificial intelligence researchers have been down this path more than once. In the late 1980s, robotics researchers at Carnegie Mellon University adapted a U.S. Army ambulance for autonomous driving, powering it with a form of artificial intelligence called behavior cloning. Basically, the vehicle observed how humans drove, and after thousands of hours, it was able to recognize the relevant features of the environment that caused humans to react the way they did. It even worked well enough to make supervised interstate trips between Pittsburgh and Erie, Pa. Fast forward to 2016, when NVIDIA, the artificial intelligence and graphics processing unit company, used a similar method to create a deep learning vehicle that at first drove like an intoxicated driver. But after just 3,000 miles, it could lane keep quite well, even when lane markers weren’t present.
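Stripped to its core, behavior cloning is just supervised learning from demonstrations: record what a human driver saw and what they did, then fit a model that maps one to the other. The Python sketch below uses a synthetic driving log and a plain least-squares model purely for illustration; the CMU and NVIDIA systems described above worked from real camera images with neural networks.

```python
# Minimal sketch of behavior cloning: fit a model that imitates logged human
# steering. The synthetic "demonstration log" and linear model are stand-ins
# for the camera footage and neural networks used in the real systems.

import numpy as np

rng = np.random.default_rng(0)

# Pretend log of human driving: each row is (lane offset in m, heading error in rad),
# paired with the steering the human applied in that situation (plus a little noise).
observations = rng.uniform(-1, 1, size=(500, 2))
human_steering = (-0.8 * observations[:, 0]
                  - 1.5 * observations[:, 1]
                  + rng.normal(0, 0.05, size=500))

# "Cloning" step: ordinary least squares from observation to action.
weights, *_ = np.linalg.lstsq(observations, human_steering, rcond=None)


def cloned_policy(lane_offset, heading_error):
    """At drive time, map what the car senses to a steering command."""
    return weights @ np.array([lane_offset, heading_error])


print(cloned_policy(0.5, 0.1))  # negative: steer back toward the lane center
```

The cloned policy is never handed an explicit rule about lights or lane markers; it only reproduces the regularities buried in the human’s behavior, which is roughly the property the NVIDIA result above demonstrates.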