IBM’s ‘Project Debater’ AI Lost to a Human—But Put Up Quite a Fight

02/25/2019 - 18:00

How can you efficiently change someone’s mind?

The art of debate has always seemed like black magic to me. You’re not necessarily arguing for something you believe in—rather, you’re carefully dissecting the logic of your opponent’s arguments, while appealing to their (or an audience’s) emotions and showing your own character. It’s a cognitive-emotive venture that seems uniquely human.

Not anymore.

Earlier this month, I watched in awe as IBM’s new AI, Project Debater, faced off against one of the world’s leading debate champions. Spoiler: she lost.

But the spectacle brilliantly demonstrated how far AI has come. Six years in the making, Project Debater was built to help humans digest nuanced information and become better informed by arguing both sides of a given topic: for example, should all vaccines be mandatory?

“In the end, it’s for us so we can make better decisions and solve problems,” said the host, John Donvan. And although the AI lost, she held her own as a charming, nuanced, and skillful debater against one of humanity’s best.

Why Debate?

As a brainchild of IBM, Project Debater comes from a lineage of AI systems that have faced off against humans. You’ve heard of them: back in 1997, Deep Blue trounced world chess champion Garry Kasparov; more recently, in 2011, Watson destroyed two of the most celebrated Jeopardy! players in history.

Project Debater is the next logical step up from her big brothers. Like Watson, she needs to efficiently store and access a large database of facts, sentences, and prior arguments to support her reasoning, and she must be able to parse and understand human language. For all their computing prowess, machines find the nuances of everyday language incredibly difficult to grasp: just think how hard it is to have a meaningful conversation with Siri.

Debating is particularly challenging because each utterance adds to the complexity of the verbal exchange. What’s more, an AI can’t just make her own case; she also needs to fend off an opponent’s claims, on both facts and logic. To make sense and win an argument, an AI would essentially have to understand human language almost flawlessly.

Hard? Yes. Necessary? Absolutely.

“Language can tell us about human thought and expression, and it’s this world that’s most interesting to us,” said Dario Gil, Director of IBM Research.

AI that can understand us has the potential to amplify and complement human cognition as a source of information and context, he continued. And the more transparent and explainable we can make AI, the more we can trust it, which in turn lets us rely on it more comfortably to make better decisions.

“For us, it’s really not about winning or losing, but about the ability to create AI that can master the infinitely complex and rich world of human language,” said Gil.

The Match

This was Debater’s second and last public outing: in her first, back in June 2018, she faced two human debaters in separate matches, winning one and losing one.

The topic facing Debater and her opponent, Harish Natarajan—who holds the record for most competition victories—was “should we subsidize preschools?” Each had only 15 minutes to prepare for the Oxford-style debate.

Here, both sides start with an opening statement, then rebut the opponent’s points, and finish with closing remarks. Each segment has a strict time limit. Audience members vote on their viewpoints (support, object, or undecided) both before and after the debate, and whoever sways more opinions is declared the winner. As a supplemental measure, the audience was also asked to vote on which side was more informative.
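To make the scoring rule concrete, here is a minimal sketch of how a winner could be computed from the two polls. It assumes the "swing" for each side is simply the change in that side's share of the audience between the pre- and post-debate votes; the class and function names (Poll, swing, winner) and all poll numbers are hypothetical, not the actual results of this debate.

```python
from dataclasses import dataclass


@dataclass
class Poll:
    """Audience shares in percent for one poll (hypothetical values)."""
    support: int
    oppose: int
    undecided: int


def swing(before: Poll, after: Poll) -> dict[str, int]:
    """Change in each side's share between the pre- and post-debate polls."""
    return {
        "support": after.support - before.support,
        "oppose": after.oppose - before.oppose,
    }


def winner(before: Poll, after: Poll) -> str:
    """Return the side whose share grew more; equal swings count as a tie."""
    s = swing(before, after)
    if s["support"] > s["oppose"]:
        return "support"
    if s["oppose"] > s["support"]:
        return "oppose"
    return "tie"


if __name__ == "__main__":
    # Made-up numbers for illustration only.
    before = Poll(support=70, oppose=20, undecided=10)
    after = Poll(support=60, oppose=32, undecided=8)
    print(swing(before, after))
    print(winner(before, after))
```

Note that under this rule a side can still hold the majority after the debate and lose: in the made-up example, "support" keeps 60 percent of the room, but "oppose" gains more ground.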

Although the motion came from a curated list, neither side was specifically trained on that particular topic.

Armed with pen and paper, Natarajan furrowed his brow and scribbled away, forming arguments against the resolution. Meanwhile, Debater, encased in a sleek, human-sized black cube, tapped into IBM’s cloud-computing infrastructure to comb through ten billion sentences and quotes from hundreds of millions of documents, identifying those that supported her position. She then used a framework of pre-built arguments to assemble those tidbits (facts, statistics, and even moral arguments) into meaningful points.
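The article doesn't describe how Debater actually selects evidence, so the following is only a toy illustration of the general idea of "comb through sentences and keep those that support a position." It ranks sentences by word overlap with the motion plus hits on a small list of supporting cue words. The scoring rule, the SUPPORT_CUES list, and the function names are all hypothetical, and far cruder than whatever IBM's system does.

```python
import re

# Hypothetical cue words suggesting a sentence argues *for* something.
SUPPORT_CUES = {"improves", "increases", "reduces", "benefits", "helps"}


def tokenize(text: str) -> set[str]:
    """Lowercase word set; a crude stand-in for real language understanding."""
    return set(re.findall(r"[a-z]+", text.lower()))


def score(sentence: str, motion: str) -> int:
    """Toy score: words shared with the motion plus supporting-cue hits."""
    words = tokenize(sentence)
    topic_overlap = len(words & tokenize(motion))
    cue_hits = len(words & SUPPORT_CUES)
    # A sentence with no topical overlap is treated as off-topic.
    return topic_overlap + cue_hits if topic_overlap else 0


def top_evidence(corpus: list[str], motion: str, k: int = 2) -> list[str]:
    """Return up to k highest-scoring sentences, dropping zero scores."""
    ranked = sorted(corpus, key=lambda s: score(s, motion), reverse=True)
    return [s for s in ranked[:k] if score(s, motion) > 0]


if __name__ == "__main__":
    motion = "We should subsidize preschools"
    corpus = [
        "Access to preschools improves academic achievement.",
        "Governments that subsidize preschools see benefits in crime rates.",
        "The weather was sunny and it helps.",
    ]
    for sentence in top_evidence(corpus, motion):
        print(sentence)
```

A real system would also need to judge whether a sentence argues for or against the motion and then organize the survivors into coherent points; this sketch only shows the filtering-and-ranking step.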

Right off the bat, Debater charmed the audience and her opponent with a self-referential gem: “I heard you hold the world record in debate competition wins against humans, but I suspect you’ve never debated a machine. Welcome to the future.”

In the next half hour, Debater expertly constructed a sophisticated case on why subsidizing preschool is important, arguing three points: preschool increases a child’s academic achievements, overcomes the negative stress of poverty, and decreases crime. It’s the type of structured, logical analysis often seen in scientific papers.

As expected, Debater was heavy on statistics and studies; a retired human debater even called her propensity to squeeze as much evidence as possible into the allotted time “oddly touching.” But strikingly, she also tried connecting with the audience using moral arguments, such as “giving opportunities to the less fortunate should be a moral obligation for any human being.”

Perhaps more charming was her phrasing. “I can’t personally experience poverty directly and have no complaints concerning my own standards of living,” she said at one point (Natarajan audibly chuckled).

What’s more, the AI showed real skill at parrying her human opponent. She anticipated his view (“he might say that subsidies are needed but not for preschools”) and argued against it preemptively. She also fared well in the rebuttal round, grasping several of Natarajan’s rather nuanced points, such as the idea that preschool may foster negative competition at an early age, and contextualizing them by saying “my opponent argued that preschools are harmful.” For us humans, capturing the gist is simple; for a machine, it’s rather unprecedented.

A Human Victory

Natarajan, for his part, represented humans exceedingly well. Rather than immediately arguing his position, he started by narrowing the gap between the two sides, a smart tactic that forces the opponent to find increasingly subtle arguments for her side. This is where Debater faltered.

He also argued that, given limited resources, subsidizing preschools is not the best investment, a point that Debater largely ignored but that resonated with the audience.

In the end, Natarajan won. But Debater took the crown as the side that better enriched the audience’s knowledge, the goal that IBM Research set out with.

Natarajan grasped the utility of his opponent immediately. “What really struck me is the potential value of Project Debater when synthesized with a human being and the amount of knowledge that it’s able to grasp…and more importantly, to contextualize and place it as this information tells us this, which I found to be really useful,” he said.

In a way, Debater follows IBM’s philosophy of building a man-plus-machine future, one in which AI augments and supports us humans rather than replacing us. IBM is next working on commercializing Debater’s core system for business ventures, for example helping journalists or even governments explain contested issues, for which there isn’t necessarily a right or wrong answer, and thus making the public better informed. Unlike most AI, which tackles games or toy problems, Debater is designed to grapple with the messiness of the real world. “We are going out of the comfort zone of AI into territory which is more gray,” said project leader Noam Slonim.

Of course, the prospect of the technology feeding the fake news epidemic, a worry also raised by OpenAI’s text generator, looms over its future. But showcasing Debater’s skills in public may be a step in the right direction toward transparency.

As Donvan pithily concluded, “Regardless, we made history tonight.”

Image Credit: IBM Research