If you can type it, the robot can play it
You may well be aware of Stable Diffusion, the much-discussed open-source AI model that can generate images from text. Well, as a “hobby project”, a couple of developers – Seth Forsgren and Hayk Martiros – have now created Riffusion, which uses the same model to turn text into music.
Riffusion works by generating spectrogram images, which are then converted into audio clips. We’re told that it can generate infinite variations of a text prompt by varying the ‘seed’.
Riffusion’s creators explain that a spectrogram can be computed from audio using what’s known as the Short-time Fourier transform (STFT), which approximates the audio as a combination of sine waves of varying amplitudes and phases.
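To make the STFT idea concrete, here’s a deliberately simple sketch (not Riffusion’s actual code, which uses optimised FFT libraries): the audio is chopped into overlapping frames, and each frame is projected onto sine waves of different frequencies. Keeping only the amplitude of each projection gives one column of the spectrogram image.

```python
# Toy magnitude-spectrogram via a naive per-frame DFT.
# Illustration only -- real systems use FFT libraries.
import cmath, math

def stft_magnitudes(signal, frame_size=64, hop=32):
    """Split the signal into overlapping frames and take the DFT
    magnitude of each frame -- one column of the spectrogram."""
    frames = []
    for start in range(0, len(signal) - frame_size + 1, hop):
        frame = signal[start:start + frame_size]
        # A Hann window tapers the frame edges to reduce leakage
        windowed = [x * 0.5 * (1 - math.cos(2 * math.pi * n / (frame_size - 1)))
                    for n, x in enumerate(frame)]
        mags = []
        for k in range(frame_size // 2 + 1):
            # Project the frame onto the k-th frequency's sinusoid
            s = sum(x * cmath.exp(-2j * math.pi * k * n / frame_size)
                    for n, x in enumerate(windowed))
            mags.append(abs(s))  # keep the amplitude, discard the phase
        frames.append(mags)
    return frames  # time x frequency grid of amplitudes

# A sine that completes exactly 8 cycles per 64-sample frame should
# concentrate its energy in frequency bin 8
sig = [math.sin(2 * math.pi * 8 * n / 64) for n in range(256)]
spec = stft_magnitudes(sig)
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
print(peak_bin)  # -> 8
```

Note the `abs(s)` step: this is exactly where the phase information is thrown away, which is why reconstructing audio from a spectrogram image needs the extra machinery described next.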
However, in the case of Riffusion, the STFT is inverted so that the audio can be reconstructed from a spectrogram. Here, the images from the AI model only contain the amplitudes of the sine waves and not the phases – these are approximated by something called the Griffin-Lim algorithm when reconstructing the audio clip.
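The Griffin-Lim idea can be sketched in a few lines of NumPy. This is a simplified toy version under our own STFT conventions, not what Riffusion ships: start from the known amplitudes with random phases, then repeatedly bounce between the time and frequency domains, each time keeping the new phases but restoring the known amplitudes, until the phases become self-consistent.

```python
# Toy Griffin-Lim phase reconstruction with NumPy (illustration only).
import numpy as np

def stft(x, n=64, hop=16):
    win = np.hanning(n)
    frames = [x[i:i + n] * win for i in range(0, len(x) - n + 1, hop)]
    return np.fft.rfft(np.array(frames), axis=1)

def istft(S, n=64, hop=16):
    """Inverse STFT by windowed overlap-add."""
    win = np.hanning(n)
    frames = np.fft.irfft(S, n=n, axis=1)
    length = hop * (len(frames) - 1) + n
    out, norm = np.zeros(length), np.zeros(length)
    for i, f in enumerate(frames):
        out[i * hop:i * hop + n] += f * win
        norm[i * hop:i * hop + n] += win ** 2
    return out / np.maximum(norm, 1e-8)

def griffin_lim(mag, n_iter=64):
    rng = np.random.default_rng(0)
    # Step 1: attach random phases to the known amplitudes
    S = mag * np.exp(2j * np.pi * rng.random(mag.shape))
    for _ in range(n_iter):
        # Step 2: go to the time domain and back again
        x = istft(S)
        # Step 3: keep the new phases, restore the known amplitudes
        S = mag * np.exp(1j * np.angle(stft(x)))
    return istft(S)

# Demo: analyse a sine, throw away the phases, then reconstruct
original = np.sin(2 * np.pi * 8 * np.arange(256) / 64)
target_mag = np.abs(stft(original))
rec = griffin_lim(target_mag)
# Spectral error: how far the reconstruction's magnitudes are from the target
err = np.linalg.norm(np.abs(stft(rec)) - target_mag) / np.linalg.norm(target_mag)
print(rec.shape)  # -> (256,)
```

Because the phases are only ever inferred, the reconstructed waveform can differ from the original (for instance, it may be time-shifted), but its spectrogram – the thing the AI model actually generated – ends up close to the target.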
As well as short loops, Riffusion is also capable of creating longer jams, which are based on subtle variations of one image. The web app enables you to type in prompts and will keep generating interpolated content in real time for as long as you let it, while giving you a 3D visual representation of the spectrogram. You can also skip straight to the next prompt; if there isn’t one, Riffusion will interpolate between different seeds of the same prompt.
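A common way to interpolate between seeds in diffusion systems – and plausibly what’s happening here, though Riffusion’s exact approach isn’t detailed in the article – is to blend the two seeds’ starting noise tensors with spherical interpolation (slerp), which keeps intermediate points statistically similar to what the model expects. In this sketch the diffusion model itself is omitted, and `latent_for_seed` is a hypothetical stand-in for the sampler’s seeded starting noise.

```python
# Sketch of seed-to-seed interpolation via slerp (model omitted).
import numpy as np

def latent_for_seed(seed, shape=(4, 8, 8)):
    # Hypothetical stand-in for the noise tensor a diffusion
    # sampler would start from for this seed
    return np.random.default_rng(seed).standard_normal(shape)

def slerp(a, b, t):
    """Spherical interpolation between two latents, 0 <= t <= 1."""
    a_f, b_f = a.ravel(), b.ravel()
    omega = np.arccos(np.clip(
        np.dot(a_f, b_f) / (np.linalg.norm(a_f) * np.linalg.norm(b_f)),
        -1.0, 1.0))
    mixed = (np.sin((1 - t) * omega) * a_f
             + np.sin(t * omega) * b_f) / np.sin(omega)
    return mixed.reshape(a.shape)

start, end = latent_for_seed(1), latent_for_seed(2)
# Five latents gliding from one seed's noise to the other's
steps = [slerp(start, end, t) for t in np.linspace(0, 1, 5)]
print(np.allclose(steps[0], start), np.allclose(steps[-1], end))  # -> True True
```

Feeding each intermediate latent through the model with the same prompt would yield a sequence of spectrograms that morph smoothly from one variation to the next – which matches the continuous, endless output the web app produces.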
We can’t pretend to understand exactly how it all works but Riffusion is impressive and terrifying in equal measure. This kind of technology is in its infancy but it’s not hard to imagine how capable it will become in the future.
See and hear for yourself on the Riffusion website.
I’m the Deputy Editor of MusicRadar, having worked on the site since its launch in 2007. I previously spent eight years working on our sister magazine, Computer Music. I’ve been playing the piano, gigging in bands and failing to finish tracks at home for more than 30 years, 24 of which I’ve also spent writing about music and the ever-changing technology used to make it. 