From Prompt to Release: How Producers Are Using Stable Audio 3 to Create Full Tracks in 45 Minutes
Stable Audio 3, an open-weights music generation model from Stability AI, can create a full 6-minute instrumental track from a text prompt in under 10 seconds on consumer hardware, but turning that raw output into a release-ready song requires stem separation, arrangement, mixing, and mastering, which together take roughly 45 minutes. The workflow combines AI generation with traditional music production tools, giving producers a new way to sketch ideas and speed up the creative process without replacing the human decisions that define a finished track.
What Makes Stable Audio 3 Different From Other Music AI Tools?
Unlike vocal-focused competitors such as Suno and Udio, Stable Audio 3 generates instrumental music only and ships with open weights, meaning producers can run the model locally on their own hardware and own the stems they generate. The Medium variant uses 2 billion parameters and is available under the Stability AI Community License for non-commercial use, with commercial licensing available separately.
The model was trained on over 1.2 million audio files, including licensed recordings from AudioSparx and Creative Commons clips from Freesound, giving it broad coverage of musical genres and sound effects. This training data foundation means the model responds best to specific prompts that include genre, tempo, instrumentation, mood, and production era references.
How to Create a Production-Ready Track From Stable Audio 3 Output
- Prompt Design: Use a template that specifies tempo in BPM, genre, lead instrument, supporting instrumentation, percussion description, mood, and era or production reference in a single sentence. For example: "105 BPM downtempo electronic track with detuned analog synth lead, deep sub bass, dusty boom-bap drums, melancholic mood, late-90s trip-hop production."
- Generation Settings: Set the model to generate the maximum length (up to 6 minutes and 20 seconds), use 8 inference steps with a classifier-free guidance scale between 4 and 6, and generate three to five candidate versions per prompt to increase the likelihood of finding usable material.
- Stem Separation: Run the stereo output through a stem separator tool such as Demucs (open source), LALAL.AI, or RipX to split the track into drums, bass, and harmonic content stems that can be edited independently in a digital audio workstation (DAW).
- Arrangement and Mixing: Import the separated stems into a DAW such as Ableton Live, FL Studio, or Logic Pro, or into the free editor Audacity, set the project tempo to the BPM from your original prompt, and arrange the track into intro, verse, chorus, bridge, and outro sections by rearranging the generated audio regions.
- Mastering: Apply subtractive EQ to each stem, compress the drum bus to glue the kit together, and route the final mix through a mastering tool such as iZotope Ozone or the open-source matchering library to hit the standard streaming loudness target of -14 LUFS (Loudness Units relative to Full Scale).
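The prompt-design and generation-settings steps above can be sketched in a few lines of Python. This is a minimal illustration, not the model's API: `build_prompt`, `GenerationSettings`, and every field name are hypothetical, and using consecutive seeds to obtain different candidates is an assumption. Only the numeric values (8 steps, guidance scale 4 to 6, three to five candidates, 6 minutes 20 seconds) come from the workflow described here.

```python
from dataclasses import dataclass

MAX_SECONDS = 6 * 60 + 20  # 380 s, the maximum length cited in the workflow


def build_prompt(bpm, genre, lead, support, percussion, mood, era):
    """Join the template fields into a single-sentence prompt."""
    return (
        f"{bpm} BPM {genre} track with {lead}, {support}, "
        f"{percussion}, {mood} mood, {era}."
    )


@dataclass(frozen=True)
class GenerationSettings:
    """Hypothetical settings container; field names are illustrative only."""
    seconds: int = MAX_SECONDS
    steps: int = 8
    cfg_scale: float = 5.0
    candidates: int = 4

    def __post_init__(self):
        # Enforce the ranges recommended in the workflow.
        if not 0 < self.seconds <= MAX_SECONDS:
            raise ValueError("seconds must be between 1 and 380")
        if not 4.0 <= self.cfg_scale <= 6.0:
            raise ValueError("cfg_scale should stay between 4 and 6")
        if not 3 <= self.candidates <= 5:
            raise ValueError("generate three to five candidates per prompt")

    def seeds(self, base_seed=0):
        """One seed per candidate (assumption: seeds vary the output)."""
        return [base_seed + i for i in range(self.candidates)]


prompt = build_prompt(
    105, "downtempo electronic", "detuned analog synth lead",
    "deep sub bass", "dusty boom-bap drums", "melancholic",
    "late-90s trip-hop production",
)
settings = GenerationSettings()
print(prompt)
print(settings, settings.seeds(base_seed=100))
```

Keeping the prompt fields separate makes it easy to change one element, such as the tempo or the era reference, while holding the rest constant across candidate batches.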
What Hardware and Software Do You Need?
Stable Audio 3 Medium runs on a range of hardware. An Apple Silicon Mac with an M4 chip generates audio in a few seconds, and NVIDIA GPUs such as the RTX 4090 and newer handle inference comfortably. Older GPUs with 12 gigabytes of video RAM (VRAM) or more will work but generate more slowly. With mixed-precision settings, the model itself requires roughly 8 gigabytes of VRAM.
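A rough back-of-the-envelope check helps put the VRAM figure in context. The sketch below only multiplies the 2 billion parameters cited for the Medium variant by the bytes per parameter at each precision; the helper name `weight_gib` is hypothetical. How the rest of the roughly 8 gigabytes is used (activations, other model components, framework overhead) is not broken down in the source, so treat any split as an assumption.

```python
PARAMS = 2_000_000_000  # Medium variant parameter count cited in the article

# Storage per parameter at common precisions.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2}


def weight_gib(params, precision):
    """Memory needed for the weights alone, in GiB (2**30 bytes)."""
    return params * BYTES_PER_PARAM[precision] / 2**30


for precision in BYTES_PER_PARAM:
    print(f"{precision}: {weight_gib(PARAMS, precision):.2f} GiB of weights")
```

At 16-bit precision the weights alone come to a little under 4 GiB, which is consistent with a total requirement of roughly 8 gigabytes once working memory is included.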
For the full production workflow, you will need a DAW and a stem-separation tool. Free options include Audacity for basic editing and Demucs for stem separation. Paid tools such as iZotope Ozone add more advanced mastering features. If you do not want to run the model locally, you can use a hosted demo through a web interface, which costs under $1 per generation using the Medium model.
How Long Does the Entire Workflow Actually Take?
Active production time from prompt to mastered track totals roughly 45 minutes on a laptop, according to the workflow documentation. This assumes you are working with the generated stems and making arrangement decisions in your DAW. The actual generation step takes only seconds to a few minutes depending on your hardware. The bulk of the time goes to stem separation, arrangement, mixing, and mastering, which are the same steps a human producer would take with any raw audio source.
The cost to generate tracks is zero if you use the open-weights Medium model locally, or under $1 per generation if you use a hosted service. Commercial release requires a separate license from Stability AI, though non-commercial and research use is covered under the Community License.
What Happens When the Model Does Not Deliver What You Expected?
Stable Audio 3 includes an inpainting feature that lets you regenerate a specific section of a track without regenerating the entire piece. If a four-bar section feels weak or off-tempo, you can target just that window while keeping the rest of the track intact. This gives producers a way to refine outputs without starting from scratch.
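To target a bar range, it helps to convert bars into a time window first. The sketch below shows only that arithmetic; it assumes a constant tempo, 4/4 time, and a track whose first downbeat sits at time zero. The function name `bar_window_seconds` is hypothetical, and the source does not document how the inpainting feature accepts its window, so the result is simply a start and end time in seconds.

```python
def bar_window_seconds(bpm, start_bar, num_bars, beats_per_bar=4):
    """Return (start, end) in seconds for a run of bars, counting from bar 1."""
    seconds_per_bar = beats_per_bar * 60.0 / bpm
    start = (start_bar - 1) * seconds_per_bar
    end = start + num_bars * seconds_per_bar
    return start, end


# Four bars starting at bar 17 of a 105 BPM track.
start, end = bar_window_seconds(bpm=105, start_bar=17, num_bars=4)
print(f"regenerate from {start:.2f} s to {end:.2f} s")
```

If the generated audio has drifted from the prompt tempo, measure the actual tempo in your DAW first and pass that value instead of the prompted BPM.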
Common issues include output that is too short, muddy drums after stem separation, or the model drifting off your target tempo. If the output is shorter than expected, check your inference configuration; some forks default to a 47-second window instead of the full 6-minute, 20-second maximum. If drums sound muddy, re-run the stem separator with a fine-tuned model or high-pass filter the harmonic stem to remove kick bleed. If tempo drifts, use your DAW's beat-detection features such as Ableton's Warp or Logic's Smart Tempo to lock everything to a grid.
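Two of these checks are easy to script. The sketch below reads the length of a rendered file with Python's standard `wave` module and builds a Demucs command line that selects its fine-tuned `htdemucs_ft` model. It assumes the render was saved as a PCM WAV file and that the `demucs` command is installed and on your PATH; the helper names `wav_duration_seconds` and `demucs_command` are hypothetical, and the file name in the usage note is a placeholder.

```python
import wave

MAX_SECONDS = 6 * 60 + 20  # 380 s, the full-length maximum cited above


def wav_duration_seconds(path):
    """Length of a PCM WAV file in seconds (frames divided by sample rate)."""
    with wave.open(str(path), "rb") as wf:
        return wf.getnframes() / wf.getframerate()


def is_truncated(path, expected_seconds=MAX_SECONDS, tolerance=1.0):
    """True if the file is noticeably shorter than the requested length."""
    return wav_duration_seconds(path) < expected_seconds - tolerance


def demucs_command(track, out_dir="separated", model="htdemucs_ft"):
    """Argument list for the Demucs CLI: -n picks the model, -o the output folder."""
    return ["demucs", "-n", model, "-o", str(out_dir), str(track)]


print(demucs_command("render.wav"))
```

To run the separation, pass the list to `subprocess.run(demucs_command("render.wav"), check=True)`. The fine-tuned model takes longer than the default, so it is worth reserving for renders where the first pass left audible bleed between stems.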
Is This Replacing Human Music Producers?
The workflow makes clear that Stable Audio 3 is a sketching tool, not a replacement for production decisions. The model generates the raw material; the producer chooses which candidate to develop, decides how to arrange the stems, picks which sections to keep or regenerate, and applies the mixing and mastering that turns a sketch into a finished track. Producers can also layer multiple Stable Audio 3 renders at the same tempo and key for fuller arrangements, or generate sound effects with the Small SFX variant to weave into transitions.
For artists who want a finished song with sung vocals, Suno and Udio remain faster paths because they optimize for complete tracks rather than producer raw material. Stable Audio 3 is designed for producers who want to own the stems, arrange them themselves, and integrate the AI output into a traditional production workflow.