A simple OpenAI Jukebox tutorial for non-engineers


Skip to part III if you’re thirsty for music-making. You can come back and read this during your 12-hour render.

Jukebox is a neural net that generates music in a variety of genres and in the styles of existing bands and musicians. It also generates singing (or something like singing anyway).

Jukebox is pretty amazing because it can sorta almost create original songs by bands like The Beatles or the Gypsy Kings or anyone in the available artist list.

I don’t understand neural nets well enough to explain what it’s doing under the hood, but the results are impressive despite the rough edges.

This technology has the potential to unleash a wave of artistic (and legal) creativity.

It’s easy to imagine this thing getting really fucking good in a few years, and all of a sudden you can create ‘original’ songs by ‘The Beatles’ or ‘Taylor Swift’ for the price of a few hundred GPU hours.

Who has the right to publish or use those songs? Who owns what? What does ownership mean in this context? Does anyone think that our current judicial/legal systems are prepared to even understand these questions, let alone adjudicate them? Who is Spain? Why is Hitler? Where are the Snowdens of yesteryear?

Anyway, you can read more about it on the OpenAI website.

II. What I made with Jukebox

Early on in the lockdown, I made an experimental short film built mostly with stock footage.

I went looking for free music at the usual stock sites and, as usual, came back disappointed. So I started looking for ways to generate music with AI, hoping it would give the short the kind of artificial, mediated feeling I was after.

Here are some of the songs I created, of varying quality:

Or check out OpenAI’s collection of samples. It’s wild how good some of these are.

I don’t have the compute power that they have, but the samples I created were enough to create a surreal soundscape for the film:

II. Overview and limitations

OpenAI uses a supercomputer to train their models and maybe to generate the songs too, and well, unless you also have a supercomputer or at least a very sweet GPU setup, your creativity will be a bit limited.

When I started playing with Jukebox, I wanted to create 3-minute songs from scratch, which turned out to be more than Google Colab (even with the Pro upgrade) could handle.

If you’re going to do this with Google Colab, then you’ll want to upgrade to the Pro version. It’s $9.99 a month and recommended for everyone who does not enjoy losing their progress when the runtime times out after six hours.

Because it took me about 12 hours to generate each 45-second song, my experimentation was limited, but after a lot of trial and error, I was able to consistently generate 45-second clips of new songs in the style of many musicians in a variety of genres.
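To get a feel for why 3-minute songs were out of reach, here is a back-of-the-envelope sketch in Python. It assumes render time grows in a straight line with song length, which is a guess rather than something measured. The function name and the 24-hour session limit are made up for illustration, so plug in whatever limit your own runtime actually has.

```python
def estimate_render(song_seconds, hours_per_45s=12.0, session_limit_hours=24.0):
    """Rough render-time estimate.

    Assumes render time scales linearly with song length (an assumption,
    not a measurement). session_limit_hours is a placeholder value,
    not an official Colab number.
    """
    hours = song_seconds / 45.0 * hours_per_45s
    fits_in_one_session = hours <= session_limit_hours
    return hours, fits_in_one_session


# A 45-second clip at the pace described above, then a 3-minute song.
for seconds in (45, 180):
    hours, fits = estimate_render(seconds)
    print(f"{seconds} seconds of music: about {hours:.0f} hours, fits in one session: {fits}")
```

Under those assumptions a 3-minute song works out to roughly 48 hours of rendering, four times the 12 hours a 45-second clip took.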

Another challenge is the lack of artist-friendly documentation.

AI researchers tend to publish their models and accompanying documentation for a computer-science-researcher-type audience — I’m trying to bridge that gap so that artists and non-technical people can play with the technology.

Hopefully, in the future, we’ll see more user-friendly documentation for these kinds of tools — if musicians had to be electrical engineers and wire their own amplifiers, we probably wouldn’t have rock music.

III. Getting set up

Step one is to open up the Google Colab notebook that I created, Jukebox the Continuator. This is a modified version of the Colab notebook that OpenAI released.

My version of the notebook has been trimmed down to remove some features that I couldn’t get to work. I think it’s an easier way to get started, but feel free to experiment with their version.

If you want to save your edits, save a copy of the notebook to your Google Drive.

If you’re new to all this, Google Colab is an interactive coding notebook that is free to use. It’s built on the open-source software called Jupyter Notebook, which is commonly used to run machine learning experiments. It allows you to run Python code in your browser that is executed on a virtual machine, on some Google server somewhere. Much easier than building your own supercomputer.

You might want to check out Google’s intro to Google Colab or google around for a tutorial.

You should be able to run my notebook all the way through with the current settings, but the fun is in experimenting with different lyrics, song lengths, genres, and sample artists.

And as I mentioned above, you’ll want to upgrade to Google Colab Pro. It’s $9.99/month and you can cancel whenever you want. Getting the Pro version means you’ll have access to faster GPUs, more memory, and longer runtimes, which will make your song generating faster and less prone to meltdown.

Go and try it out.

IV. Lyrics generation with GPT-2

Jukebox allows you to write your own lyrics for the songs you generate, but I decided to let an AI generate lyrics for me, with a little help from the creative writers of stock footage clip descriptions.

I was going for a sort of surreal dystopian aesthetic so I pulled the descriptions of some random stock footage clips, e.g. “A lady bursting a laugh moves around holding a microphone as bait as a guy creeps”:

Then I loaded Max Woolf’s aitextgen to generate lyrics from that seed text. Here’s a tutorial for aitextgen, or you can use the more user-friendly TalkToTransformer.com.

You can train aitextgen with the complete works of Shakespeare or your middle school essays or whatever text you want.
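If you want to try the aitextgen route, a minimal sketch might look like the code below. It is not the exact code used for the film. The two helper functions, `tidy_lyrics` and `generate_lyrics`, are hypothetical names invented for this example, and the sketch assumes you have already run `pip install aitextgen` and are happy with the small default GPT-2 model that aitextgen downloads the first time it runs.

```python
def tidy_lyrics(raw_text, max_lines=8):
    """Turn a blob of generated text into a few short lyric lines."""
    # Strip whitespace from each line and drop the empty ones.
    lines = [line.strip() for line in raw_text.splitlines()]
    lines = [line for line in lines if line]
    return "\n".join(lines[:max_lines])


def generate_lyrics(seed, max_length=120):
    """Generate one batch of lyrics from a seed sentence (hypothetical helper)."""
    # Imported inside the function so tidy_lyrics still works
    # on machines where aitextgen is not installed.
    from aitextgen import aitextgen

    ai = aitextgen()  # no arguments: loads the default small GPT-2 model
    texts = ai.generate(
        n=1,
        prompt=seed,
        max_length=max_length,
        return_as_list=True,
    )
    return tidy_lyrics(texts[0])
```

Calling `generate_lyrics("A lady bursting a laugh moves around holding a microphone as bait as a guy creeps")` should return a handful of lines that start with the seed and wander off from there. The output is different on every run, so generate a few batches and keep the strangest one.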

Happy generating.