Voice Cloning as an Assistive Tech: Leveraging Technology for the Vocally Impaired « Where It's AT

Oct 26th, 2021 by MDTAP Blog

Every day we use gadgets for work, leisure, communication, and our hobbies. Technology empowers us and makes getting things done easier.

Now more than ever, people with disabilities can choose from a variety of gadgets, mobile applications, and other devices, all designed to make their lives easier. Thanks to technological innovations, people can get around many of the limitations caused by their health issues.

Today we’re looking at voice cloning, a technology that generates a realistic copy of a person’s voice, and how it can help people with speech impairments.

Voice cloning for amyloidosis: the case of Michael York

About ten years ago, the actor Michael York learned that he had amyloidosis, a group of rare conditions caused by the buildup of amyloid, an abnormal protein that makes it difficult for tissues and organs to work properly. Abnormal protein deposits also play a role in Alzheimer’s and Parkinson’s diseases, although amyloidosis is a distinct illness.

York was treated in time, and he continues to monitor his health so that he can receive further treatment whenever he needs it.

However, one of the most common symptoms of amyloidosis is swelling of the tongue, which makes it more difficult for people to speak clearly.

“I lost my voice completely. Which, for an actor, is a bit alarming. Now it’s sort of back,” York said in an interview.

The actor decided to promote awareness of the disease by producing and narrating a short animated film to help physicians, medical students, patients, and people all around the world better understand the condition. At the time, his voice was already a little raspy, but that did not interfere with the recording. Since then, the project has had a meaningful impact at conferences, universities, clinics, congregations, and communities.

As medicine and science keep advancing, the animated film had to be updated with new dialogue. The team hoped to record York’s voice for the edits, but it quickly became clear that his current voice was drastically different from his pre-amyloidosis voice. And there was little chance of finding an actor capable of imitating York’s unique vocal quality.

However, Michael York’s voice got a chance at revival thanks to voice cloning technology. The production team worked with a voice cloning software company to create a synthetic copy of the actor’s voice.

How is this possible? Recent advances allow software to reproduce a voice with striking accuracy. The program can capture not only a person’s accent, but also timbre, pitch, tempo, speech flow, and breathing. Cloned voices can be customized to convey a desired emotion, such as anger, fear, happiness, love, or boredom.

For Michael York, sixty minutes of source training audio and the new dialogue were recorded. The voice cloning program then built an AI model that maps the source voice onto York’s target voice, using data from the original recording session.

For speech synthesis to work, it is important to have high-quality recordings of a person’s voice. For people whose voice has already been changed by age, illness, or disability, it may be too late to make such recordings. This is where voice banking becomes extremely helpful.
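As a rough illustration of what “high-quality” can mean in practice, the Python sketch below reports a recording’s length, peak level, and share of clipped samples. The function name, the thresholds, and the assumption of 16-bit PCM samples are ours, chosen for illustration only; they are not requirements of any particular voice banking service.

```python
import math


def check_recording(samples, sample_rate, min_seconds=1.0, max_clipped=0.001):
    """Return basic quality figures for 16-bit PCM samples (ints, -32768..32767).

    The thresholds are illustrative assumptions, not industry standards.
    """
    if sample_rate <= 0:
        raise ValueError("sample_rate must be positive")
    full_scale = 32767
    duration = len(samples) / sample_rate
    peak = max((abs(s) for s in samples), default=0)
    # A sample at (or beyond) full scale was most likely clipped by the recorder.
    clipped = sum(1 for s in samples if abs(s) >= full_scale)
    clipped_share = clipped / len(samples) if samples else 0.0
    return {
        "duration_seconds": duration,
        "peak_level": min(peak / full_scale, 1.0),
        "clipped_share": clipped_share,
        "usable": duration >= min_seconds
        and clipped_share <= max_clipped
        and peak > 0,
    }


# Synthetic stand-in for a recording: one second of a 220 Hz tone at half volume.
rate = 16000
tone = [int(0.5 * 32767 * math.sin(2 * math.pi * 220 * n / rate)) for n in range(rate)]
report = check_recording(tone, rate)
print(report)
```

A real recording would be read from an audio file first; the synthetic tone simply keeps the example self-contained.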

Voice banking is a service that lets individuals record their speech and create a digital version of it, so that they can keep communicating in their own voice if they lose the ability to speak. It is crucial for people with diseases that affect their speech.

These diseases include amyotrophic lateral sclerosis (ALS, or Lou Gehrig’s disease), spinal and bulbar muscular atrophy (SBMA), primary lateral sclerosis (PLS), and progressive muscular atrophy (PMA).

People with multiple sclerosis (MS) or Parkinson’s disease may also come to rely on voice banking, as these conditions weaken physical and mental function and can thus affect the ability to speak.

People with head and neck conditions who have undergone certain procedures may need voice banking as well. These procedures include:

laryngectomy – a surgery where all or part of the voice box is removed
tracheostomy – a surgery that involves the insertion of a tube through the neck and into the windpipe
glossectomy – a surgery where all or part of the tongue is removed

Legal complications of voice cloning and banking

While voice cloning and voice banking have obvious benefits and commercial potential, the technology also raises concerns among security experts, because cybercriminals can use it too.

Scammers can use voice cloning to trick companies into transferring money to criminals’ accounts. In 2019, The Wall Street Journal reported that the chief executive of a British energy company had been tricked into transferring €220,000 to a Hungarian supplier.

He was confident that he was receiving instructions from his boss. The firm’s insurer, Euler Hermes Group SA, told the WSJ that the fraudster used artificial intelligence software to mimic the voice of an executive of the parent company.

“The program was able to mimic the voice, as well as tonality, intonation, punctuation, and German accent,” a spokesman for Euler Hermes later told The Washington Post. The phone call was accompanied by an email, and the CEO of the energy company did what was asked of him. The money was never recovered; it was moved through accounts in Hungary and Mexico.

So what to expect from voice cloning?

Voice cloning technology delivers many benefits as an assistive technology and in filmmaking, dubbing, and other fields.

For example, with the help of speech synthesis, the NFL brought American football legend Vince Lombardi back to life on screen. The technology also revived Rivera Morales’ memorable voice and synthesized the voice of a young Mark Hamill (Luke Skywalker) in The Mandalorian.

Technology isn’t bad by default; it becomes bad when it falls into the wrong hands and is used for nefarious purposes. Everyone can benefit from voice cloning, or any other technology, so long as they choose software whose provider is committed to specific ethical principles. For example, such a provider doesn’t use the voices of private individuals without their consent, adds a unique audio watermark to its output, and does not offer a public API for creating voices.
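To give a feel for what an audio watermark is, here is a deliberately simple Python sketch that hides a short identifier in the lowest bit of 16-bit PCM samples. It is a toy scheme chosen for illustration and not a description of any vendor’s actual method; real watermarks are typically designed to survive compression and editing, which this one would not. The function names and the tag are hypothetical.

```python
def embed_watermark(samples, tag):
    """Hide the bytes of `tag` in the lowest bit of successive samples."""
    # Unpack every byte of the tag into 8 bits, most significant bit first.
    bits = [(byte >> shift) & 1 for byte in tag for shift in range(7, -1, -1)]
    if len(bits) > len(samples):
        raise ValueError("audio is too short to carry this tag")
    marked = list(samples)
    for i, bit in enumerate(bits):
        # Clear the lowest bit, then set it to the tag bit.
        marked[i] = (marked[i] & ~1) | bit
    return marked


def extract_watermark(samples, tag_length):
    """Read `tag_length` bytes back from the lowest bits of the samples."""
    bits = [s & 1 for s in samples[: tag_length * 8]]
    out = bytearray()
    for start in range(0, len(bits), 8):
        byte = 0
        for bit in bits[start:start + 8]:
            byte = (byte << 1) | bit
        out.append(byte)
    return bytes(out)


# Stand-in audio: 80 arbitrary 16-bit sample values.
audio = [1000, -1000, 250, -3] * 20
tag = b"clone-01"  # hypothetical identifier
marked = embed_watermark(audio, tag)
recovered = extract_watermark(marked, len(tag))
print(recovered == tag)
```

Each sample changes by at most one step out of 65,536, far too little to hear, yet the identifier can still be read back from the marked audio.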

One thing is clear: in the future, anyone will be able to create their own AI voice clone if they want to. As with facial deepfakes, laws and ethics just can’t keep up with new technology. The only way out of this dilemma is to be honest with yourself and choose tools that have proven legal standing.

Contributed by Alex Serdiuk, CEO of Respeecher, Inc.

Posted in MD Technology Assistance Program
