Deepfakes

After posting on deep voicefakes, I saw this (and a few others) on Kottke:

There’s still something uncanny about it, but the technology is obviously very close. I mean, maybe the uncanny thing is just that it’s Steve Buscemi in Jennifer Lawrence’s body with her voice.

The lip-syncing videos are close too, but something still feels a little off about them:

We’re entering a world where you can make a live-action film without any traditional “production”. You can have just writing and post-production.

Write the script.

Then build it with AI-generated voices and images.

Production could simply be a matter of recording voice samples to build the voices and shooting some footage to build the imagery.

You might not even need that: you could pull the voices and imagery from a library.

We could bring back dead actors to play leading roles. Want to reunite Bogey and Bacall? Just deepfake it.

We’re going to need a whole new area of IP law for this. Who owns the rights to a person’s image or voice? Do you hold those rights yourself? Do they expire or go into the public domain? How much would it cost to license, say, Bradley Cooper’s digital avatar in perpetuity?

AI can mimic human voices

What the fuck:

There’s a company called Dessa that made software called RealTalk that can learn to speak in someone’s voice. Apparently it wasn’t that hard to train.

I don’t listen to Rogan’s podcast regularly, so I don’t know if I would be able to tell it was fake, but it’s close enough and it’ll just get better anyway. [1]

The company isn’t releasing it to the public but it’s only a matter of time before someone else does the same thing.

My first thought was “oh this sounds like a terrifying political propaganda/slander tool!” and my second thought was that this will render useless any voice-based work.

Do you need a news reader on the radio? No, you just need someone to speak to the robot enough to train it. No need to show up to the studio every morning (eventually followed by a robot to write the news report?).

Voiceover artists too, basically unnecessary now. A company could have software trained with a library of thousands (or millions) of voices.

Maybe I’m being naive, but I’m skeptical that society will collapse because of technology like this. We have Photoshop, and yes, there are implications to only seeing women in magazines who have been Photoshopped, but it’s not like people are creating fake images of, say, a presidential candidate doing cocaine.

Why not? If you were a strategist recommending this, people would probably say “people won’t believe it.”

And I think we’ve developed a kind of norm of skepticism around this kind of thing, like anything scandalous we immediately think “is this fake?” or “what’s the PR angle here?”.

We already have a norm for “don’t believe everything you read” and this does make modern life difficult: it would be nice to know that if a newspaper prints something then it’s 100% true, but that’s not the case and here we are.

I think there’s a dangerous window while that norm is forming — like the 2020 elections could be wild, but then a norm forms where eventually people say “oh you can’t trust video anymore, especially video of politicians.”

It’s still powerful, but I think the power is in making people less trusting of recorded images (or in the near future, recorded audio) that don’t come from pre-vetted sources.

From a creator’s perspective, it’s interesting to think about what could be done without needing to go into a studio to record. Just as music can be composed on a computer now, so can voices.


  1. Take the Joe Turing Test here




