Subscribe to the show in Apple Podcasts, Spotify, or anywhere else you find your favorite podcasts!

Five Seconds to Fraud: Detecting AI Deepfakes Before They Strike with Ben Colman • Cyber Sentries • Episode 212

Inside the AI Deepfake Threat

What if the voice confirming your wire transfer wasn’t actually your client? Ben Colman, founder and CEO of Reality Defender, joins host John Richards to unpack one of the fastest-growing attack surfaces in cybersecurity: AI-generated deepfakes. Once the exclusive domain of Hollywood studios and nation-state actors, real-time voice and video impersonation is now accessible to anyone with a laptop—and fraudsters are scaling up fast.

From Specialized Hardware to Your Home Computer

Ben traces the evolution from the specialized machinery required six years ago to today’s world where anyone can clone a voice with less than five seconds of audio—locally, for free, using open-source models. He walks through the modern fraud landscape, from grandparent scams and bank account takeovers to an eye-opening story about fake job applicants that will make any recruiting team rethink its screening process.

Reality Defender’s approach is built for how organizations actually work—plugging directly into call centers, video conferencing platforms, and identity verification tools through a simple API, rather than asking teams to adopt yet another standalone product. Their probabilistic detection models scan in real time across thousands of indicators, all without storing or comparing against any biometric data.

John and Ben also get into the emerging frontier of agentic AI—what happens when you need to authenticate an AI voice agent rather than a human—and how smart permission gates can define exactly what those agents are and aren’t allowed to do.

Questions We Answer in This Episode

  • How has the barrier to creating convincing deepfakes changed in the last six years?
  • What are the most common deepfake fraud vectors hitting businesses and consumers right now?
  • How does Reality Defender detect AI-generated media without storing any biometric data?
  • What does deepfake defense look like as agentic AI becomes mainstream?

Key Takeaways

  • Voice cloning now requires less than five seconds of audio and runs locally on consumer hardware
  • Deepfake fraud spans a wide range—from grandparent scams to fake job applicants to wire transfer hijacking
  • Real-time detection can plug directly into tools organizations already use, with no new workflow required
  • Agentic AI is creating a new category of identity challenge—and the defenses are already being built

The deepfake threat isn’t coming—it’s already here, hitting call centers, recruiting pipelines, and financial institutions every day. Whether you’re a developer looking to integrate detection into your stack or a security leader trying to get ahead of the next wave, this conversation is an essential listen.

John Richards:
Welcome to Cyber Sentries from CyberProof on TruStory FM. I’m your host, John Richards. Here, we explore the transformative potential of AI, cloud, and cybersecurity, where rapid innovation meets the need for continuous vigilance. This episode is brought to you by CyberProof, a leading managed security services provider. Learn more at cyberproof.com. On this episode, I’m joined by Ben Colman, founder and CEO of Reality Defender, a leading AI threat detection company focused on detecting things like agentic AI voice and video deepfakes. I ask Ben what the tells are to detect deepfakes and whether it’s really a problem. And he shows how what was a small issue when they launched six years ago has now proliferated everywhere, as impersonations that can fool the human eye or ear are no longer the sole domain of state-level processing power, but can now be run on most home computers. Let’s dive in. Thank you so much for coming on the podcast.

Ben Colman:
John, thank you for having me.

John Richards:
Tell me a little bit about how you got started and how you got into Reality Defender. It’s a very unique company. What was your journey to getting into deepfake security?

Ben Colman:
I’ve always been obsessed with the space. This is basically all I’ve ever done—really learning and working and partnering at the intersection of cybersecurity and data science. And so it’s really about applying cybersecurity frameworks to whatever is the newest method to attack companies or people. AI is now the place.

I started my career as a summer intern at Google while in grad school. I’ve done some work advising various government groups, always defensive, never offensive. And I worked at Goldman Sachs, sitting five feet away from the CISO, seeing a lot of these activities happen in real time. I’m often a bit early—sometimes too early, I guess frequently too early—to the problem. But in this one it was kind of that same story. I started this company really as a nonprofit research project about six years ago. We were a few years too early, but since then the world has caught up to us in a big way, and we’re proud to be leading the space.

John Richards:
Obviously AI and fake content have been around for a long time, but six years ago the rise and proliferation were nowhere near today’s levels. What got you interested in this, and how has the landscape taken shape since? Is this really a big problem out there?

Ben Colman:
The technology’s been around for a while. What’s changed is the accessibility for people to do it at scale. Hollywood obviously has done this for over 10 years, but only in the last year or two can non-technical folks deepfake your voice, your face in real time.

We had an experience in a previous role where we were about to make a multi-tens-of-millions-of-dollar wire transfer, and the last check was to confirm with the client. The client got on the call and said how great they were and how amazing everything was. And then twenty minutes in, the client’s voice changed, and we realized it was actually the CTO of the company. It wasn’t exactly a deepfake in the classical sense—more of a voice change. It wasn’t the best, to be honest, but it really demonstrated that if bad things can happen, hackers, state-level groups, and fraudsters will use it at scale—because fraudsters are the best product managers. They have no tech debt, and they only have to be right once to win. Everyone else has to be right every single time.

John Richards:
What does the landscape look like? You mentioned everything from state actors and large-scale fraud to anybody being able to pick this up now and do much the same thing. Are you seeing lots of small, fragmented groups? Are larger, major groups the threat? Or is it a 50-50 split? What does that look like right now?

Ben Colman:
The answer is yes to all of that. If you’re a fraudster, you’ll use this to attempt to commit fraud at massive scale. And given that the technology has been democratized but the regulations haven’t caught up, there’s an interesting incentive for someone who isn’t even a fraudster to try and commit fraud without actually learning anything or taking on any risk at all.

We are seeing state-level groups do it at scale. We’re seeing average folks doing it at scale. We’re also seeing a gradient of different kinds of issues that run the gamut from entertainment to fraud. If I deepfake Tom Cruise, is it entertainment? Is it satire? Or am I actually attacking his likeness or committing some kind of impersonation or reputational harm?

John Richards:
You mentioned the scale-up and availability. Do you have numbers or even general figures on the difference from six years ago when you started to where we are today?

Ben Colman:
Six years ago, there were early pieces of technology allowing you to do this if you had specialized machinery. What changed then was about three or four years ago, when suddenly you could do it using commodity cloud compute. The challenge there was that unless you had a hundred grand of free credit on Amazon or you were willing to pay for it, it was still out of reach.

What’s changed in the last two years is that you can do it all locally on your own computer. Anyone can download models from Hugging Face or GitHub, run them locally, or do things in the cloud. What that means is that right now, real-time deepfake voice is available to the world. Audio is easy to impersonate—less than five seconds of audio can make a perfect voice match. Video is still a bit computationally expensive, so we haven’t yet seen video happen at the scale of audio. But six to twelve months from now, it’s going to be just as widespread, given that you’ll be able to do it all locally on your computer or your phone.

John Richards:
Wow. What are the main ways people are committing fraud with this? Is it impersonating somebody to get access to their bank account or personal information? Is it blackmail? What are the different kinds of fraud happening in this space?

Ben Colman:
There’s an interesting question there—where is the fraud happening, and who is actually willing to try to stop it in each of those spaces?

First, social media is a massive vector for different kinds of fraud, whether it’s non-consensual AI pornography, AI-generated child sexual abuse material, sextortion scams, or AI romance scams. The problem there is a strange lack of incentive for platforms to do anything, because regulations require nothing. They’ll either do nothing or rely on community notes, which only happen once something has already been shared a million times. While I would love to support social media platforms, we’re really hoping regulators will require them to actually do something, but for right now, they’re not doing much.

In terms of fraud across consumer areas, it’s mostly tied to financial fraud and identity fraud. It’s using a deepfake of your voice to execute an account takeover at your bank—reset a password, add different wiring information—all the way into identity onboarding and KYC, where someone will use a fake face of yours, maybe a face swap, to authorize a wire transfer or gain access to your account.

Or beyond that, calling your grandparents, your kids, or posing as them—the grandparent scam, where someone says, “Hey, I’m in trouble. I need money.” It sounds just like them. I’m an expert in this space, and if my eight-year-old called me, I might still wire the money because it just seems so real and scary.

John Richards:
I feel like I’m lagging behind—I’ve dealt with the text messages from the CEO saying, “Hurry up and get me this, I need gift cards.” And you can tell that’s a scheme. But to have a real voice is another level entirely.

Ben Colman:
And the CEO scam you mentioned is obviously interesting, but I think we’ve all gotten so used to gift card requests that we know it’s fraud. What’s more important is thinking about the broader kill chain in terms of using deepfakes as a fraud vector. Oftentimes there’s a 10-step or 100-step fraud process spanning days, weeks, months—where you’re being asked for something that doesn’t even seem like fraud. Maybe it’s about a restaurant reservation or a travel confirmation. There’s not much there, but it’s just enough to be used to then commit a deeper fraud based on that extra piece of information.

Thinking about how Reality Defender plugs in—our goal is to connect into probably 85 to 90 percent of the tools that companies already use across their call centers, telecom partners, endpoint protection and identity verification, and brand intelligence and narrative detection. All these places have real-time media and live communications, and all of them can be attacked using completely off-the-shelf tools.

John Richards:
And I think part of why we don’t have a lot of regulation in this space is that people feel helpless about what they can do. So how are you all tackling this? Because people are getting more comfortable with things like remote identity verification—logging in and taking a photo to confirm who you are for a next step.

Ben Colman:
Right, but that just means someone has your information. It doesn’t mean it’s actually you.

So we took an interesting approach. We had these models, and the question was whether we were going to build the best point solution that did all these things for you to consume directly—or power all the other tools you’re already using. We picked the latter. When an organization like JP Morgan—one of our first clients—chose us, they actually wanted us to come in everywhere there’s a potential risk of deepfake AI or agentic fraud.

So we took something incredibly complex that we’re updating multiple times a month and reduced it down to an API and a range of SDKs, so that a developer working on a call center solution or video conferencing can either integrate it directly in two lines of code—or more likely than not, we’re already in the app store and you just double-click on it and it shows up.

We don’t want a junior person in a call center or in cybersecurity or incident response to have to learn a new tool. We want to engage them where they already are and provide a signal into the same fraud engines they’re already used to responding to, or in parallel, automate that escalation. It should just work. Similar to Stripe for payments or Twilio for telephony, we take something very challenging and make it very easy for everybody to use.

John Richards:
That’s huge. Now, if hearing the voice can’t tell the difference—and obviously you don’t want to share your secret sauce—but at a high level, how are you able to tell? Are you looking for indicators or flags that it is AI? Are you comparing against some source of truth to see how it maps?

Ben Colman:
We do not have any ground truth, so we’re running inference models that are probabilistic, as opposed to provenance models that are deterministic. We do that for a number of reasons—one of which is that in my previous life at a highly regulated bank, any time a vendor touched client data or employees’ voice or faces, it led to a 900- or 9,000-page risk assessment.

The fact that we don’t use any of that is one of our main selling points. It allows us to move quicker and gives clients more comfort in using our technology. Fundamentally, we have a range of ensemble models with submodels looking for different probabilistic indicators that there may have been AI generation or manipulation.

Think of it like a 4,000-axis chart. There are 4,000 different data points—a cloud of points indicating realness, and a range of points indicating generation or manipulation. Even if one of them might be off, as a whole they’re still quite far apart from each other. We’re doing that across the spatial pixel layer and also temporally—across time and transitions—across audio, video, and images. We’re taking audio waveforms, converting them into spectrograms, and running vision models on those as well.

We’re not only looking for things we’ve seen before, but also things we haven’t seen yet but think we can anticipate in the lab. We have an internal red team and an internal data team. Of our roughly 54 people, about 40 are PhD researchers and engineers. We’re constantly putting out research because the space is moving so quickly.
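
To make the ensemble and spectrogram ideas above concrete, here is a minimal sketch in Python: convert an audio chunk into a log spectrogram, score it with a few stand-in sub-models, and average the probabilities. Everything here, especially the sub-models, is a toy placeholder under assumed settings, not Reality Defender's actual pipeline.

```python
# Sketch of the ensemble idea Ben describes: turn audio into a
# spectrogram "image", score it with several independent sub-models,
# and combine the probabilities. All model logic here is a stand-in.
import numpy as np
from scipy import signal

def to_spectrogram(waveform: np.ndarray, sample_rate: int) -> np.ndarray:
    """Convert a mono waveform into a log-magnitude spectrogram."""
    _, _, spec = signal.spectrogram(waveform, fs=sample_rate, nperseg=512)
    return np.log1p(spec)  # compress dynamic range, similar to a dB scale

def toy_submodel_a(spec: np.ndarray) -> float:
    # Placeholder: a real sub-model would be a trained vision network.
    return float(np.clip(spec.std() / 10.0, 0.0, 1.0))

def toy_submodel_b(spec: np.ndarray) -> float:
    # Placeholder for a second detector looking at different artifacts.
    return float(np.clip(spec.mean() / 5.0, 0.0, 1.0))

def ensemble_score(waveform: np.ndarray, sample_rate: int) -> float:
    """Average the sub-model probabilities into one synthetic-audio score."""
    spec = to_spectrogram(waveform, sample_rate)
    scores = [toy_submodel_a(spec), toy_submodel_b(spec)]
    return sum(scores) / len(scores)

# Usage: one second of noise at 16 kHz stands in for a live audio chunk.
chunk = np.random.randn(16000)
print(f"synthetic-probability: {ensemble_score(chunk, 16000):.2f}")
```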

John Richards:
I can’t imagine. Staying on top of that is so important. Does the analysis take a long time? You’re doing all this via API—what does response time look like?

Ben Colman:
When we started the company, it was purely asynchronous and near real time. About three years ago it moved into real time. We can sit directly on top of and in line with Zoom and Teams—we’re in the App Store. Just search for it and start using it. Same with integrations with call center, IVR, IVA, CCaaS, and UCaaS solutions—we’re effectively scanning in real time with overlapping chunks.

We do have some logic around wanting to see a certain amount of signal before flagging, to avoid issues like ambulance sirens or other interesting background noise triggering false positives. But we are scanning in real time.
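
As a rough illustration of the overlapping-chunk scanning Ben describes, the sketch below slides a fixed window over an audio stream with a hop smaller than the window, so a short synthetic segment cannot hide on a chunk boundary. The chunk size, hop, and flagging threshold are invented for the example, not Reality Defender's settings.

```python
# Sketch of real-time scanning with overlapping chunks: each window
# shares audio with its neighbors, and flagging requires a strong
# signal to avoid false positives from background noise.
import numpy as np

def overlapping_chunks(stream: np.ndarray, chunk_size: int, hop: int):
    """Yield fixed-size windows that advance by `hop` samples (hop < chunk_size)."""
    for start in range(0, len(stream) - chunk_size + 1, hop):
        yield start, stream[start:start + chunk_size]

def scan_stream(stream: np.ndarray, score_fn, sample_rate: int = 16000):
    """Score 2-second windows every 0.5 seconds and flag suspicious ones."""
    chunk_size, hop = 2 * sample_rate, sample_rate // 2
    for start, chunk in overlapping_chunks(stream, chunk_size, hop):
        score = score_fn(chunk)
        if score > 0.9:  # require a strong signal before flagging
            print(f"flag at {start / sample_rate:.1f}s, score={score:.2f}")

# Usage with a dummy scorer standing in for the detection model.
scan_stream(np.random.randn(10 * 16000), score_fn=lambda c: float(np.random.rand()))
```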

John Richards:
So you’re getting a probability—some percentage of how likely this is to be synthetic. What do people do when you’re getting something in the middle, like 50-50? What do you recommend as a next step?

Ben Colman:
Typically it’s pretty close to a hundred percent or pretty close to zero. But naturally you get into situations where there’s a high degree of background noise—whether coincidental or purposeful, to try and obfuscate the voice or video. That’s where we get into more specific trade secrets around how we take different ensemble pieces with pre-processing or a total model score and reduce it down to something more actionable.

For example, let’s say a frame is completely real but there’s only a face swap—a real face on my face. There are different levels of realness with different borders that represent anomalies we’re looking for.

One good example: when the first diffusion-based image of an explosion at the Pentagon appeared, we saw it in real time. We detected it, and our clients had that intel—whereas most platforms let it go viral before community notes blocked it. That led to a hundred-billion-dollar flash crash in the market.

Truly, credit to our team. Our red team is second to none. Our research team is constantly pushing out work—probably more than our investors want us to publish. But that’s really how we’re attracting the best and brightest at conferences like CVPR, NeurIPS, ECCV, AAAI, and Interspeech—all of which have selected our research for publication, all available on arXiv.

John Richards:
Where do you see this heading? You hinted at video being a growing factor. What’s the next step for organizations worried about this?

Ben Colman:
Three years ago, people were saying prove to us this is a problem. Last year it was, prove to us it can actually be detected. And this year it’s, prove to us that Reality Defender is the best in the market. We’ve won every major industry award. JP Morgan gave us their innovation award last fall. We won the RSA Conference award two years ago. In December, Gartner named us the market leader in the space—and we actually power a few of the companies they listed as potential competitors.

Yes, it’s getting worse. Yes, we need elected officials and regulators to push forward AI regulations—not to say AI is bad, and not to say all deepfakes are dangerous, because a permissioned deepfake is essentially an AI avatar of yourself. There are a lot of great use cases for that. But at minimum, just require platforms to scan content and let the user or consumer know whether it’s AI-generated—and let them make their own determination. For my kids, I don’t want them to see any generative AI. For me, I want to see it in certain situations, but obviously not on a video conference call or for work.

As far as how we’re thinking about the future—there’s no silver bullet in cybersecurity. We are not the only tool in the world that solves for fraud. It is very much a perimeter strategy. What’s unique is that every other point solution in that perimeter is effectively a yes or no: yes, it’s Ben’s device, Ben’s connection, Ben’s phone number, Ben’s telecom, Ben’s unique device ID. All those just prove that someone has my information or has somehow compromised my phone or computer. We’re saying yes, but the real-time stream is AI.

Now, with the advent of agentic AI, that AI might itself be permissioned, which opens up an exciting new frontier of detecting agentic voice. What we’ll be announcing in the coming days and weeks is that leading agentic tools are integrating us inside their agentic voice platforms so that they themselves can say, “Hey, our agentic voice can detect other agentic voices—inbound call center calls and deepfakes as well.”

We can use those as a routing mechanism to say, “This is perhaps John’s agentic voice. We’ll let it call in. We’ll let it ask a few questions. We’ll let it transact up to $100. But no, it can’t do a password reset, it can’t add someone as a wire transfer recipient, it can’t take over an account.” Very similar to how you permission Amazon to use your credit card in one click—it’s an agentic transaction. It can buy me more shaving cream, but it can’t make a $10,000 transfer.
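
The permission-gate pattern Ben outlines can be sketched as a simple policy check: an authenticated agent carries a whitelist of actions and a transaction cap. The action names and limits below are hypothetical, chosen to mirror his shaving-cream example.

```python
# Sketch of a permission gate for an authenticated agentic voice:
# the agent may only perform whitelisted actions, and any transaction
# is capped. Names and limits here are illustrative assumptions.
from dataclasses import dataclass, field

@dataclass
class AgentPolicy:
    allowed_actions: set = field(default_factory=set)
    transaction_limit: float = 0.0

def is_permitted(policy: AgentPolicy, action: str, amount: float = 0.0) -> bool:
    """Allow only whitelisted actions, and cap any transaction amount."""
    if action not in policy.allowed_actions:
        return False
    if action == "transact" and amount > policy.transaction_limit:
        return False
    return True

# John's hypothetical agent: it can ask questions and spend up to $100,
# but it can never reset a password or add a wire transfer recipient.
johns_agent = AgentPolicy(allowed_actions={"ask_question", "transact"},
                          transaction_limit=100.0)

print(is_permitted(johns_agent, "transact", amount=50.0))      # True
print(is_permitted(johns_agent, "transact", amount=10_000.0))  # False
print(is_permitted(johns_agent, "password_reset"))             # False
```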

John Richards:
I love that example, because it is a real use case—customer support actions where you want some level of automation, but you still want to make sure it can’t go further. Being able to set those permission gates is fascinating.

So for an organization looking to use this, would they plug in directly with Reality Defender? Would they work with a third party that already has you integrated? What’s the process if I have my own application that accepts media and I’m thinking, should I be checking whether this is AI or not?

Ben Colman:
Credit to our engineering team—we have a completely public API and a range of SDKs, so anyone can go to our website, realitydefender.com, get an API key in one click, and start using it for free today. We give out a ton of free API calls, so whether it’s real time or asynchronous, you can start immediately.

If they’re an organization with their own proprietary solution, integration typically takes about fifteen minutes. If they want something quicker, they can look at our website and see where they can already use us today in the tools they already consume—agentic tools, endpoint protection, call center platforms, Zoom, Teams, video conferencing, and more. Web-scale scanning, over a hundred integrations. They can consume our signal as one of the signals within platforms they already use and within the risk scoring systems they’re already familiar with.
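
For a feel of what this style of API integration might look like, here is a hedged sketch of calling a detection endpoint over HTTP. The URL, field names, and response shape are placeholders, not Reality Defender's documented API; the real developer docs live at realitydefender.com.

```python
# Hypothetical sketch of a detection-API integration. The endpoint,
# request fields, and response format below are invented placeholders;
# consult the vendor's actual developer documentation.
import requests

API_KEY = "your-api-key"  # obtained from the vendor's developer portal
DETECT_URL = "https://api.example.com/v1/detect"  # placeholder endpoint

def check_media(file_path: str) -> dict:
    """Upload a media file and return the (hypothetical) detection verdict."""
    with open(file_path, "rb") as f:
        response = requests.post(
            DETECT_URL,
            headers={"Authorization": f"Bearer {API_KEY}"},
            files={"media": f},
            timeout=30,
        )
    response.raise_for_status()
    return response.json()  # e.g. {"score": 0.97, "label": "likely_manipulated"}

# Usage: result = check_media("suspicious_call_recording.wav")
```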

John Richards:
I’ll make sure we put those links in the show notes so folks can find them easily.

Ben Colman:
And if they shout out the show, they can email me directly—I’ll give extra scans and credits to their account. We love our developer community. That’s where we’ve learned the most, and it’s really how we’re able to push the envelope on what we’re developing.

John Richards:
Can you speak a little more to that? What does building a community around this look like, and is there a feedback cycle? What are you hearing from developers that has impacted your direction?

Ben Colman:
Developers do not hold back. When there’s an issue or a suggestion, it’s not one voice—it’s hundreds of voices. Our product team loves engaging with developers, and we’re always happy to explore the roadmap items we’re considering and have them help us prioritize. We can’t boil the ocean, and we try to stay very focused on deepfake detection rather than getting distracted by other types of identity or provenance-based approaches. But we truly treat them as an extension of our development framework.

John Richards:
With the API approach, the nice thing is that if you’ve got those core fundamental pieces, you can always build out more on top of them.

Ben Colman:
Exactly, and you have more flexibility there. We have an API and a public SaaS environment. We also have clients deploying us in their own cloud, on-premises, on edge, and fully local on device—completely air gapped. That spans the gamut of what we’re doing with both enterprise and government users. There are different use cases and environments where you might not have internet connectivity—or don’t want to—and we’re able to maintain our efficacy in those environments as well.

John Richards:
You’ve talked about voice and video. Are those the main media types? Do you do anything with text or other media?

Ben Colman:
Our focus currently is on audio, video, and images. There are always interesting challenges around text—I think we’re all a little guilty of using AI-written text at this point. It’s something we’re thinking about, and we use various internal text models for different parts of our stack. But our primary focus has been on human-based impersonations: human voices, human faces, humans in situations.

We are now expanding into non-human areas as well—thinking about things like AI-generated military vehicles in certain regions. For us, the core question is: what exists in the world that has been AI-generated or manipulated?

John Richards:
And at a base level, that’s really what you’re getting at—this has a huge security impact, but at the core you’re trying to tell whoever’s using this: can I be confident this is real? A real person, a real thing? Or was this AI-generated? And from there the assessment becomes, is this a security risk or not? Do you do some of that risk assessment, or is it more that you’re identifying the content and leaving the risk assessment to the person integrating with your system?

Ben Colman:
Given how quickly this is evolving, we certainly have a view on how clients can think about a playbook for our signals. We do provide soft and hard bounds. Based on our confidence intervals, they could auto-block or auto-escalate. For lower confidence scores, they can ask for more information or follow a more traditional next-step verification process.

Cybersecurity is very much a collaborative, multi-signal environment. We typically add our signals to other solutions that might handle the provenance pieces. We can say it doesn’t mean that it is him—it just means it is his information.
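
The soft and hard bounds Ben describes translate naturally into a small routing function: scores above a hard bound auto-block, scores in the middle band escalate to traditional verification, and everything else passes. The thresholds below are illustrative assumptions, not recommended values.

```python
# Sketch of the soft/hard-bound playbook: high-confidence scores
# auto-block, the ambiguous middle band escalates, low scores pass.
def route_signal(score: float, hard_bound: float = 0.95,
                 soft_bound: float = 0.60) -> str:
    """Map a synthetic-media confidence score to a next action."""
    if score >= hard_bound:
        return "auto_block"  # confident deepfake: stop the transaction
    if score >= soft_bound:
        return "escalate"    # ambiguous: ask for more verification
    return "allow"           # likely genuine: proceed normally

print(route_signal(0.98))  # auto_block
print(route_signal(0.72))  # escalate
print(route_signal(0.10))  # allow
```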

John Richards:
Probably can’t get specific numbers, but what are you seeing out there in terms of prevalence? Are these signals hitting everywhere, constantly increasing?

Ben Colman:
We have clients scanning hundreds of thousands of items a day. It’s not a daily challenge—it’s a second-by-second or millisecond-by-millisecond challenge. We have clients seeing double-digit numbers of inbound callers or faces in identity verification that are being found to be AI-generated or manipulated.

We just had a new client reach out because they extended four job offers and found that all four candidates were deepfake profiles. Three of them didn’t exist at all. One turned out to be a real person at a big tech company who said, “What are you talking about? I didn’t apply for that job.” So three were fully generated identities and one was a real person whose identity was manipulated.

John Richards:
No way. Wow, that is crazy. No wonder organizations need to get on top of this—even when you catch it, you’ve still spent so much down the pipeline.

Ben Colman:
Selling and recruiting are arguably the top two jobs in any organization—and recruiting is very close to selling because you’re essentially selling the world. It is a full-contact sport, and the amount of expense put into recruiting one person is substantial. Anything you can do to identify deepfake personas early in the process is worth its weight in gold.

John Richards:
Wow. Ben, I really appreciate you coming on the podcast. This has been fascinating—I’ve learned a ton. Where can folks get plugged in? And you mentioned recruiting—any roles people interested in this should be checking out?

Ben Colman:
If you’re passionate about cybersecurity and deepfake detection, go to our website—we have over ten roles live right now, and we’ll be adding more. Everything from engineering to R&D to sales to product. If you don’t see something that matches, please reach out and let us know what you’re great at. We’d love to connect.

And as I mentioned, anyone signing up for the API—reach out to me directly. I’ll give you more credits. Just shout out this podcast.

John Richards:
Thank you so much, Ben. I’ll make sure links are in the show notes for everyone to check out Reality Defender. Fascinating stuff—I’m glad somebody out there is doing this. And looking forward to seeing what’s next, especially that agentic piece you mentioned. Make sure you hit the website if you’re interested in that, and glad you’re on top of video as well.

Ben Colman:
Please don’t put my email in the show notes, though—we want to make sure they actually listen to the podcast. We don’t want agentic bots finding the email and spamming a thousand requests for free credits. Although, we’d also detect that.

John, truly a pleasure. Thank you so much.

John Richards:
This podcast is made possible by CyberProof, a leading co-managed security services provider helping organizations manage cyber risk through advanced threat intelligence, exposure management, and cloud security. From proactive threat hunting to managed detection and response, CyberProof helps enterprises reduce risk, improve resilience, and stay ahead of emerging threats. Learn more at cyberproof.com.

Thank you for tuning in to Cyber Sentries. I’m your host, John Richards. This has been a production of TruStory FM. Audio engineering by Andy Nelson. Music by Amit Sagie. You can find all the links in the show notes. We appreciate you downloading and listening to this show. Take a moment and leave us a like and a review—it really helps us get the word out. We’ll be back May 6th, right here on Cyber Sentries.

Dive deep into AI’s accelerating role in securing cloud environments to protect applications and data.