It’s me, hi: AI can now imitate anyone’s voice live
What was once a futuristic fear is becoming a present-day concern, as researchers from NCC Group have demonstrated that AI-powered voice conversion can now be performed live with minimal equipment, modest computing power, and only a few minutes of recorded audio, Kazinform News Agency correspondent reports, citing NCC Group.
Until recently, voice deepfakes were largely limited to pre-recorded clips. These could mimic someone’s voice but broke down in spontaneous, two-way conversation, a capability that is crucial for scams such as “vishing,” or voice phishing.
NCC Group’s latest research shows those barriers have fallen. Their framework allows an attacker to speak naturally into a microphone and have their voice instantly transformed into that of another person, such as a company executive, during a phone call or an online meeting.
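NCC Group has not published its tooling. Purely as an illustration of the shape of such a pipeline, the sketch below (in Python, using the sounddevice library, with a hypothetical convert_voice placeholder standing in for the trained conversion model) shows how microphone audio can be captured, transformed, and played back block by block during a live call.

```python
# Illustrative sketch only: a real-time loop in which each block of microphone
# audio is passed through a voice-conversion model and played back immediately.
# convert_voice() is a hypothetical placeholder, not NCC Group's tool.
import numpy as np
import sounddevice as sd

SAMPLE_RATE = 16000
BLOCK_SIZE = 1024  # roughly 64 ms of audio per block at 16 kHz

def convert_voice(block: np.ndarray) -> np.ndarray:
    """Placeholder for a trained voice-conversion model (pass-through here)."""
    return block

def callback(indata, outdata, frames, time, status):
    # indata: this block of microphone samples; outdata: what gets played back.
    outdata[:] = convert_voice(indata)

# Full-duplex stream: capture, convert, and play back in a single callback.
with sd.Stream(samplerate=SAMPLE_RATE, blocksize=BLOCK_SIZE,
               channels=1, callback=callback):
    sd.sleep(10_000)  # run for ten seconds
```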
Voice cloning involves training a deep learning model to replicate a speaker’s vocal characteristics (pitch, tone, rhythm, and speech patterns) from just a few minutes of audio. The model analyzes recordings using spectrograms, separating linguistic content (what is said) from identity content (who says it). With this data, the system can generate new speech that sounds convincingly like the original person, regardless of the words spoken.
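For illustration only, and assuming off-the-shelf Python libraries that are not named in the research (librosa for spectrograms, resemblyzer for speaker embeddings, and a hypothetical file name), the sketch below shows the two representations such a model separates: a spectrogram of what is said and a fixed-length embedding of who says it.

```python
# Illustrative sketch (assumed libraries: librosa, resemblyzer; the audio file
# is hypothetical) of the two inputs a voice-cloning model separates.
import librosa
from resemblyzer import VoiceEncoder, preprocess_wav

# A few minutes of the target speaker's recorded audio.
audio, sr = librosa.load("target_speaker.wav", sr=16000)

# Linguistic content ("what is said"): a log-mel spectrogram over time.
mel = librosa.feature.melspectrogram(y=audio, sr=sr, n_mels=80)
log_mel = librosa.power_to_db(mel)

# Identity content ("who says it"): a compact speaker embedding.
encoder = VoiceEncoder()
speaker_embedding = encoder.embed_utterance(preprocess_wav("target_speaker.wav"))

print(log_mel.shape)            # (80, n_frames), varies with clip length
print(speaker_embedding.shape)  # (256,), a fixed-length voiceprint
```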
Unlike older voice fraud schemes, real-time deepfake calls leave few digital traces and are nearly impossible to detect in the moment. Victims often realize the deception only after the damage is done.
Researchers gathered voice samples using open-source intelligence (OSINT) methods, sourcing clips from public speeches, interviews, and social media videos. These recordings were cleaned using digital audio workstations to remove background noise and isolate the target voice. Once trained, the cloned voice could be deployed over a standard phone line using caller ID spoofing, a long-standing social engineering tactic that displays a fake number on the victim’s screen.
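The researchers performed that cleanup in digital audio workstations. As an illustration of the same idea in scripted form (the library choices, noisereduce and soundfile, and the file names are assumptions, not the researchers’ tools), background noise can be reduced from a collected clip like this:

```python
# Illustrative only: scripted noise reduction on a publicly sourced clip, an
# analogue of the cleanup performed in a digital audio workstation.
# Libraries (librosa, noisereduce, soundfile) and file names are assumptions.
import librosa
import noisereduce as nr
import soundfile as sf

# Load the clip as mono at its native sample rate (hypothetical file).
audio, sr = librosa.load("interview_clip.wav", sr=None, mono=True)

# Estimate and subtract stationary background noise from the recording.
cleaned = nr.reduce_noise(y=audio, sr=sr)

sf.write("interview_clip_clean.wav", cleaned, sr)
```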
In their demonstration, the team successfully mimicked the voice of Siân John, an NCC Group executive, who consented to the test. The cloned voice was used to conduct calls in real time, proving that such attacks could fool even experienced professionals.
While this research focused on audio, the team notes that real-time video deepfakes are the next frontier. Synchronizing lip movements and facial expressions with cloned audio remains technically challenging, but advances in AI suggest it is only a matter of time before full audiovisual impersonation becomes viable.
Earlier, Kazinform News Agency reported that the police warned against the viral “AI Homeless Man” prank.