In the days leading up to last month’s New Hampshire primary election, hundreds of voters received a prerecorded phone call featuring the voice of President Joe Biden. Such robocalls are a common part of campaigns nowadays, but the theme of this one was a little unusual. In the recording, Biden urged Democratic voters not to show up at the polls, arguing that their primary votes would enable “the Republicans in their quest to elect Donald Trump again. Your vote makes a difference in November, not this Tuesday.” The garbled logic of the message was the first tip-off that this might not actually be Biden’s voice. The next day, UM-Dearborn Professor Hafiz Malik got an email from an Associated Press reporter who sent him the robocall for analysis. Malik ran it through the latest version of a system he uses to detect deepfakes — sophisticated, typically AI-generated clones of someone’s voice that sound completely authentic. Indeed, though the voice on the recording sounded just like Biden’s, it showed all the hallmarks of an AI-generated fake.
Malik, who’s been working with deepfakes since 2015, estimates he received about a hundred such requests for analysis last year, mostly from journalists. In some ways, he’s not surprised: He forecasted that such faked audio and video, particularly of public figures and decision makers, would become a problem long before most people knew what a deepfake was. But in some ways, he says his doom-and-gloom predictions still fell short. “One thing I didn’t anticipate was the commercial scale of both deepfake generation and deepfake detection that exists today,” Malik explains. “There are basically many off-the-shelf options, both free and commercial, for generating deepfakes, which means you don’t have to be an expert anymore to do this stuff. Basically anyone can do it.”
The result is that deepfakes have exploded across the internet, particularly in the past year. Some are harmless — and amazing — like the videos on the @DeepTomCruise TikTok account, which feature a spookily authentic-looking, youngish Tom Cruise doing weird but ordinary things, like dancing in a bathrobe or washing his hands. More often, though, the intent is malicious. Aside from political disinformation, deepfake porn and financial scams are some of the biggest emerging threats. One recent study estimated that the number of nonconsensual deepfake porn videos grew by more than 50% in the first nine months of 2023 compared to 2022. And financial attacks are targeting everyone from ordinary people to company CEOs. Last year, for example, in the biggest attack of its kind, cloned audio of a company director’s voice was used to dupe an employee into wiring $35 million into a scammer’s account.
Malik says both the private and public sectors are scrambling to deal with this new reality. Several states have passed legislation banning or limiting the use of deepfakes, though federal laws are still lacking. Major social media platforms, including Facebook and TikTok, have also banned deepfakes of political figures that are intended to mislead people, though they do allow other kinds of generated video and audio, and there are still dozens of platforms where deepfakes run wild. Malik says there’s also been a movement among the software developers who make deepfake generators to embed traceable artifacts in media created with their products, so fakes can be easily identified. “But that might only solve 60% or 70% of the problem,” Malik says. “There are going to be more sophisticated attackers out there creating deepfakes using their own processes that would be undetectable.”
He also says off-the-shelf options for deepfake detection are still fairly unsophisticated. “I had a reporter from the New York Times ask me, basically, ‘Why do I need an expert like you if I can just download detection software?’ I said, ‘Yeah, you can do that.’ But there are ways to bypass commercial detectors,” Malik explains. “For instance, if you tweak the deepfake recording just a little bit, the commercial detectors are expected to fail. And the reporter emailed me back 30 minutes later and told me — lo and behold — he was able to fool it.”
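To give a rough sense of how little “tweaking” that can mean, here is a minimal, purely illustrative Python sketch of the kind of generic alteration being described: adding faint noise and resampling a clip. It is not tied to any particular commercial detector, nor to the steps the reporter actually took; the file names and parameter values are assumptions made up for the example.

```python
# Illustrative sketch only: a generic, low-level "tweak" to an audio file
# (faint added noise plus a resample round trip). File names and parameters
# are hypothetical; this is not a recipe for any specific detector.
import numpy as np
import librosa
import soundfile as sf

# Load the clip (placeholder path), keeping its native sample rate.
y, sr = librosa.load("deepfake_clip.wav", sr=None, mono=True)

# Add noise well below audibility relative to the signal's peak level.
noise = np.random.normal(scale=0.002 * np.abs(y).max(), size=y.shape)
y_tweaked = y + noise

# Resample down and back up, which subtly changes the waveform's statistics
# without noticeably changing how the clip sounds.
y_tweaked = librosa.resample(y_tweaked, orig_sr=sr, target_sr=sr // 2)
y_tweaked = librosa.resample(y_tweaked, orig_sr=sr // 2, target_sr=sr)

sf.write("deepfake_clip_tweaked.wav", y_tweaked, sr)
```

Alterations this small leave the clip sounding identical to a listener, which is exactly why simple off-the-shelf detectors can be brittle.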
Malik has become a go-to resource for deepfake detection largely because he relies on a completely original process that his team has created in his UM-Dearborn lab. They start by creating models of public figures based on hundreds of hours of audio recordings that are known to be authentic. Using machine learning processes, the models extract hundreds of qualities that are inherent to that person’s vocal presentation. Importantly, this modeling focuses on the sound of the person’s voice, including qualities that are imperceptible to our ears. But it also identifies things that aren’t verbal, like a speaker’s words-per-minute rate, the length of the pauses between their words or the particular quality of the breathy inhales between sentences. This creates a sort of fingerprint for each person. After building a similar model of a suspected deepfake recording, the team compares it to the authentic model for similarities and differences. Interestingly, like a radiologist reading an X-ray, Malik still does that comparative analysis himself rather than handing the final judgment to an artificial intelligence system. There’s a very good reason for that. “If I’m saying to a journalist, ‘this is real’ or ‘this is fake,’ I have to be able to show them why I think it’s real or fake,” Malik says. “My judgment has to be trustworthy.” If Malik let an AI system make the final call, he wouldn’t have an explanation for how it reached a particular judgment, due to a quirky feature of machine learning known as the “black box problem.” Machine learning algorithms can do some amazing things, but explaining how they come to their conclusions is still beyond their powers.
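For readers curious what this kind of voice fingerprinting involves in practice, here is a minimal, purely illustrative Python sketch. It is not Malik’s system, which models hundreds of vocal qualities from hours of verified audio; it simply pulls a handful of spectral and timing features from a recording and compares two clips. The file names, feature choices and thresholds are assumptions invented for the example.

```python
# Toy example of a voice "fingerprint": a small vector of spectral and
# timing features, compared between a known-authentic clip and a suspect one.
# Not Malik's method; file paths and feature choices are hypothetical.
import numpy as np
import librosa

def voice_fingerprint(path: str) -> np.ndarray:
    """Summarize one recording as a short feature vector."""
    y, sr = librosa.load(path, sr=16000, mono=True)

    # Spectral character of the voice: mean and spread of MFCC coefficients.
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
    spectral = np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])

    # Non-verbal timing cues: pause lengths between detected speech segments
    # and the rate of segments per second.
    intervals = librosa.effects.split(y, top_db=30)  # non-silent regions
    if len(intervals) > 1:
        gaps = (intervals[1:, 0] - intervals[:-1, 1]) / sr
        timing = np.array([gaps.mean(), gaps.std(), len(intervals) / (len(y) / sr)])
    else:
        timing = np.zeros(3)

    return np.concatenate([spectral, timing])

def similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two fingerprints (closer to 1 = more alike)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

# Hypothetical usage: compare a suspect clip against a reference fingerprint
# built from speech known to be authentic.
reference = voice_fingerprint("authentic_speech.wav")    # placeholder path
suspect = voice_fingerprint("suspicious_robocall.wav")   # placeholder path
print(f"similarity: {similarity(reference, suspect):.3f}")
```

In a real pipeline, the reference would be built from many hours of verified recordings and the comparison would weigh far more features than this toy vector, which is part of why the expert judgment Malik describes still matters.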
Malik is, however, hoping to streamline his process for the upcoming election. He expects 2024 will be his busiest year yet for deepfake analysis, and he will soon launch a website to make it easier for journalists to submit suspicious audio to his lab for analysis. (Since most videos feature audio, he can also use this method to analyze video deepfakes.) In addition, he’s training several of his graduate student assistants to do the analysis that, right now, he alone performs. “I think we are basically viewing this as an important public service that our lab is in a unique position to provide,” Malik says. “Our goal is to help the public make informed decisions, help democracy. We don’t align with the Democratic Party or Republican Party, and we have models of all the major candidates on both sides. We align with our own fundamentals of democracy, which are rooted in people having a good sense of what is true and what is not true.”
Interestingly, Malik expects this election will be the most vulnerable to deepfakes, but after that, he forecasts things could get better. For starters, he thinks governments across the world will respond with more regulations in the coming years. And he sees the technology quickly maturing into the kind of “steady state” dynamic that most cybersecurity challenges eventually settle into. Right now, Malik says, the generation technology is ahead of the detection technology. But as detection capabilities catch up, we’ll settle into a subtler arms race, in which deepfakers find new, smaller ways to fool the detectors and the detectors respond quickly with new defenses. “So I would hope that in 2028, or even 2026, we’re talking a lot less about our elections being vulnerable to deepfakes,” Malik says. “But, who knows? My predictions have been off before.”
###
If you’re a journalist looking for deepfake analysis, you can reach out to Professor Hafiz Malik at [email protected]. Story by Lou Blouin