"Stability AI is best known for its work in images, but now we're launching our first product for music and audio generation, which is called Stable Audio," Ed Newton-Rex, VP of Audio at Stability AI, told VentureBeat. "The concept is really simple: you describe the music or audio that you want to hear in text, and our system generates it for you."

Newton-Rex is no stranger to the world of computer-generated music, having built his own startup, Jukedeck, in 2011, which he sold to TikTok in 2019. The technology behind Stable Audio, however, does not have its roots in Jukedeck, but rather in Harmonai, Stability AI's internal research studio for music generation, which was created by Zach Evans.

"It's a lot of taking the same ideas technologically from the image generation space and applying them to the domain of audio," Evans told VentureBeat. "Harmonai is the research lab that I started. It is fully part of Stability AI, and it is basically a way to have this generative audio research happening as a community effort in the open."

How Stable Audio works to generate new pieces of music, not MIDI files

The ability to generate basic audio tracks with technology is not new. Individuals have been able to use what Evans referred to as "symbolic generation" techniques in the past. He explained that symbolic generation commonly works with MIDI (Musical Instrument Digital Interface) files, which can represent something like a drum roll, for example.

The generative AI power of Stable Audio is something different, enabling users to create new music that goes beyond the repetitive notes that are common with MIDI and symbolic generation. Stable Audio works directly with raw audio samples for higher-quality output.

The model was trained on over 800,000 pieces of licensed music from the audio library AudioSparks. "Having that much data, it's very complete metadata," Evans said. "That's one of the really hard things to do when you're doing these text-based models: having audio data that is not only high-quality audio, but also has good corresponding metadata."

Don't expect to use Stable Audio to make a new Beatles tune

One of the common things that users do with image generation models is to create images in the style of a specific artist. With Stable Audio, however, users will not be able to ask the AI model to generate new music that, for example, sounds like a classic Beatles tune.

"We haven't trained on the Beatles," Newton-Rex said. "With audio sample generation for musicians, that has tended not to be what people want to go for." Newton-Rex noted that in his experience, most musicians do not want to start a new audio piece by asking for something in the style of The Beatles or any other specific musical group; rather, they want to be more creative.

Learning the right prompts for text-to-audio generation

As a diffusion model, Evans said, Stable Audio has approximately 1.2 billion parameters, which is roughly on par with the original release of Stable Diffusion for image generation. Evans explained that the text model uses a technique known as Contrastive Language-Audio Pretraining (CLAP). The text model used for prompts to generate audio was built and trained entirely by Stability AI.
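The CLAP technique mentioned above trains text and audio encoders into a shared embedding space by pulling matched text/audio pairs together and pushing mismatched ones apart. Here is a minimal NumPy sketch of the symmetric contrastive (InfoNCE-style) objective behind that idea; the random embeddings standing in for real encoder outputs, and the function name and temperature value, are illustrative assumptions, not Stability AI's actual implementation:

```python
import numpy as np

def clap_contrastive_loss(text_emb, audio_emb, temperature=0.07):
    """Symmetric InfoNCE-style loss over a batch of matched
    (text, audio) embedding pairs, as in CLAP-style training."""
    # L2-normalize so dot products become cosine similarities.
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    a = audio_emb / np.linalg.norm(audio_emb, axis=1, keepdims=True)

    logits = t @ a.T / temperature   # (batch, batch) similarity matrix
    labels = np.arange(len(logits))  # pair i matches pair i (the diagonal)

    def cross_entropy(logits, labels):
        # Numerically stable log-softmax over each row.
        z = logits - logits.max(axis=1, keepdims=True)
        log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
        return -log_probs[np.arange(len(labels)), labels].mean()

    # Average the text->audio and audio->text directions.
    return 0.5 * (cross_entropy(logits, labels) + cross_entropy(logits.T, labels))

rng = np.random.default_rng(0)
batch, dim = 8, 512
text_emb = rng.normal(size=(batch, dim))   # stand-in for a real text encoder
audio_emb = rng.normal(size=(batch, dim))  # stand-in for a real audio encoder
loss = clap_contrastive_loss(text_emb, audio_emb)
print(f"contrastive loss: {loss:.3f}")
```

Minimizing this loss is what lets a text prompt later be mapped into the same space as audio, so it can condition generation.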
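The contrast Evans draws between symbolic generation and raw audio can be made concrete: a MIDI-style drum roll is a short list of timed note events, while the same half-second of sound as raw audio is tens of thousands of amplitude samples. The event fields, sample rate, and toy waveform below are illustrative assumptions, not Stable Audio's actual data format:

```python
import numpy as np

# Symbolic generation: a drum roll is just a list of timed note events.
# (Fields mirror MIDI concepts; this is an illustrative structure,
# not a parsed MIDI file.)
drum_roll = [
    {"note": 38, "velocity": 100, "time_s": i * 0.05, "duration_s": 0.04}
    for i in range(16)  # 16 snare hits, 50 ms apart
]

# Raw audio: half a second of sound is tens of thousands of amplitude
# samples, which is the representation raw-audio models operate on.
sample_rate = 44_100
t = np.arange(int(0.5 * sample_rate)) / sample_rate
waveform = 0.3 * np.sin(2 * np.pi * 180 * t)  # a toy 180 Hz tone

print(len(drum_roll), "symbolic events vs", len(waveform), "raw samples")
```

The size gap (16 events versus 22,050 samples for half a second) is why symbolic output sounds repetitive and quantized, while modeling raw samples can capture timbre and texture directly.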
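Since Stable Audio is described as a diffusion model, generation starts from noise and iteratively denoises it toward a coherent signal. The sketch below shows only that reverse-process shape; the hand-written "denoiser" that nudges samples toward a fixed target is a crude stand-in for the real learned 1.2-billion-parameter network, and the step count and step size are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(42)
target = np.sin(np.linspace(0, 4 * np.pi, 256))  # the "clean" signal

def toy_denoiser(x, target):
    """Stand-in for a trained network: predicts a small step
    from the current noisy signal toward the clean one."""
    return x + 0.2 * (target - x)

# Reverse diffusion, schematically: start from pure noise and
# repeatedly apply the denoiser.
x = rng.normal(size=256)
for step in range(50):
    x = toy_denoiser(x, target)

error = np.abs(x - target).mean()
print(f"mean error after denoising: {error:.6f}")
```

In a real diffusion model the network is conditioned on the text embedding (here, what CLAP provides) rather than on a known target, but the iterate-from-noise structure is the same.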