AI can mimic human voices

June 14, 2019
(AI | deepfakes | ideas of interest)

What the fuck:

There’s a company called Dessa that made software called RealTalk that can learn to speak in someone’s voice. Apparently it wasn’t that hard to train.

I don’t listen to Joe Rogan’s podcast regularly, so I don’t know if I would be able to tell the demo was fake, but it’s close enough, and it’ll just get better anyway.¹

The company isn’t releasing it to the public but it’s only a matter of time before someone else does the same thing.

My first thought was “oh this sounds like a terrifying political propaganda/slander tool!” and my second thought was that this will render useless any voice-based work.

Do you need a news reader on the radio? No, you just need someone to speak to the robot enough to train it. No need to show up to the studio every morning (eventually followed by a robot to write the news report?).

Voiceover artists too, basically unnecessary now. A company could have software trained with a library of thousands (or millions) of voices.

Maybe I’m being naive, but I’m skeptical that society will collapse because of technology like this. We have Photoshop, and yes, there are implications to magazines only showing women who have been Photoshopped, but it’s not like people are creating fake images of, say, a presidential candidate doing cocaine.

Why not? If you were a strategist recommending this, people would probably say “people won’t believe it.”

And I think we’ve developed a norm of skepticism around this kind of thing: with anything scandalous, we immediately think “is this fake?” or “what’s the PR angle here?”

We already have a norm for “don’t believe everything you read” and this does make modern life difficult: it would be nice to know that if a newspaper prints something then it’s 100% true, but that’s not the case and here we are.

I think there’s a dangerous window while that norm is forming (the 2020 elections could be wild), but eventually people will say “oh, you can’t trust video anymore, especially video of politicians.”

It’s still powerful, but I think the power is in making people less trusting of recorded images (or in the near future, recorded audio) that don’t come from pre-vetted sources.

From a creator’s perspective, it’s interesting to think about what could be done without needing to go into a studio to record. Just as music can be composed on a computer now, so can voices.

¹ Take the Joe Turing Test here!

robert bruce carter