Microsoft's AI speech generator achieves human parity but is too dangerous for the public

Shawn Knight · 2024-07-10T16:18:00-0400

Too Real: Microsoft has developed Vall-E 2, a new iteration of its neural codec language model Vall-E that surpasses previous efforts in naturalness, speech robustness, and speaker similarity. It is the first model of its kind to reach human parity on a pair of popular benchmarks, and it is apparently so lifelike that Microsoft has no plans to grant access to the public.

Building on Vall-E's groundwork, the new AI voice tool integrates two major enhancements that greatly improve performance. Grouped code modeling organizes codec codes into groups, resulting in shorter sequence lengths that boost inference speed and help overcome the challenges associated with modeling long sequences.
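
To make the sequence-length savings concrete, here is a minimal Python sketch. It assumes a flat list of integer codec codes and a hypothetical `group_size` parameter, and it only shows how grouping shrinks the number of steps an autoregressive model has to take; it is not Microsoft's implementation and says nothing about how Vall-E 2 embeds or predicts each group.

```python
from typing import List, Sequence, Tuple


def group_codes(codes: Sequence[int], group_size: int, pad: int = 0) -> List[Tuple[int, ...]]:
    """Partition a flat sequence of codec codes into fixed-size groups.

    `group_size` and the `pad` token are illustrative assumptions; the final
    group is padded when the length is not a multiple of group_size.
    """
    if group_size < 1:
        raise ValueError("group_size must be >= 1")
    # Number of pad tokens needed to reach a multiple of group_size.
    padded = list(codes) + [pad] * (-len(codes) % group_size)
    # Each tuple would be treated as a single step by the language model.
    return [tuple(padded[i:i + group_size]) for i in range(0, len(padded), group_size)]


codes = list(range(10))
groups = group_codes(codes, group_size=4)
print(len(codes), "->", len(groups))  # 10 -> 3
```

With a group size of G, a sequence of T codes becomes roughly T/G steps, which is where the inference-speed gain comes from in this simplified picture.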

Repetition-aware sampling, meanwhile, refines the original nucleus sampling process by checking for token repetition in the decoding history. Microsoft said this helps stabilize decoding and prevents the infinite-loop issue that was present in the original Vall-E.
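
Below is a rough Python sketch of the idea as we understand it from Microsoft's Vall-E 2 paper: draw a token with ordinary nucleus (top-p) sampling, measure how often that token appears in a recent window of the decoding history, and, if that ratio crosses a threshold, resample from the full distribution instead. The function names and the `top_p`, `window`, and `threshold` defaults are illustrative assumptions, not Microsoft's settings.

```python
import random
from typing import List, Optional, Sequence


def nucleus_sample(probs: Sequence[float], top_p: float, rng: random.Random) -> int:
    """Standard nucleus (top-p) sampling over a probability vector."""
    # Rank token ids from most to least likely.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept: List[int] = []
    cumulative = 0.0
    # Keep the smallest prefix whose total probability reaches top_p.
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    # Sample within the kept set, proportionally to the original probabilities.
    return rng.choices(kept, weights=[probs[i] for i in kept], k=1)[0]


def repetition_aware_sample(
    probs: Sequence[float],
    history: List[int],
    top_p: float = 0.8,       # hypothetical nucleus threshold
    window: int = 10,         # hypothetical history window size
    threshold: float = 0.1,   # hypothetical repetition-ratio cutoff
    rng: Optional[random.Random] = None,
) -> int:
    """Nucleus sampling with a fallback to full random sampling on heavy repetition."""
    if rng is None:
        rng = random.Random()
    token = nucleus_sample(probs, top_p, rng)
    recent = history[-window:]
    if recent:
        # Fraction of the recent window already occupied by this token.
        repetition_ratio = recent.count(token) / len(recent)
        if repetition_ratio > threshold:
            # Too repetitive: resample from the whole distribution, giving
            # lower-ranked tokens a chance to break a potential loop.
            token = rng.choices(range(len(probs)), weights=list(probs), k=1)[0]
    return token
```

In a decoding loop, each chosen token would be appended to `history` before the next step, so a model stuck repeating one token eventually gets pushed off it.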

Microsoft put Vall-E 2 to the test using the LibriSpeech and VCTK datasets, and it passed both with flying colors. When Redmond claims the AI tool achieves human parity, it means Vall-E 2 performed better than ground-truth samples in robustness, similarity, and naturalness. In other words, the tool can produce natural speech that is virtually indistinguishable from the original speaker's voice.

Microsoft shared dozens of Vall-E 2 samples, which can be found on the project summary page. Indeed, the samples are incredibly lifelike and indistinguishable from the human speakers. The AI tool even masters subtleties like emphasizing the right word in a sentence, as people subconsciously do when speaking.

Microsoft said Vall-E 2 is purely a research project and that it has no plans to incorporate the technology into a consumer product or release the tool to the general public. Redmond further noted that the model carries potential risks of misuse, such as impersonating a specific person or spoofing voice identification.

That said, the company believes it could have applications in education, translation, accessibility, journalism, self-authored content, and chatbots, among others.

Image credit: Rootnot Creations

VitalyT · 2024-07-10T16:35:24-0400

Since it's named after Wall-E, which did nothing but screech, how good can it be?

umeng2002 · 2024-07-10T17:05:29-0400

I'm sure they'll easily sell it to governments.

FaTaL · 2024-07-10T18:45:51-0400

It's Microsoft. It'll be on your OneDrive in a matter of weeks.

NumberNine · 2024-07-10T21:29:09-0400

Open source communities will crack this open and never look back. Tick tock, Microsoft.

hk2000 · 2024-07-10T21:57:16-0400

Very impressive! I'm sure certain government agencies already have it, if they haven't already used it in one of their clandestine operations!
How long will it be before this becomes another weapon in cybercriminals' arsenal?

Watzupken · 2024-07-10T22:13:26-0400

MS validated its own AI speech generator and declared it that good. That kind of raises many red flags for me. And by the way, nobody has access to it, nor does anyone know how it works behind the scenes. They only shared a dozen samples, and concluded from those that it reached human parity. Wow.

RudyBob · 2024-07-10T23:09:04-0400

"More human than human"

Sounds like a waterless waterfryer

nitebird · 2024-07-10T23:44:43-0400

Imitating someone's likeness and voice honestly should be illegal.

human7 · 2024-07-11T00:38:17-0400

Meh, this tactic is old and is just a way to get people hyped up for when they do release it someday. They knew the capability they were after when they developed this; it isn't as if it were an accident. As with other models, it remains to be seen whether the model's capabilities truly generalize, and how difficult it is to get it out of the uncanny valley.

opckieran · 2024-07-11T00:58:57-0400

“Cool AI is too cool for the general public”

Gotta say this line is getting a little old…