ACE-Step Music Generator
Overview
The ACE-Step Music Generator operator integrates the ACE-Step model into TouchDesigner, enabling powerful text-to-music, audio-to-audio, and audio editing workflows. It functions as a client for the SideCar operator, which handles the intensive processing.
Features
- Automatic Repository Cloning: The first time you generate, the operator prompts you to download and clone the required ACE-Step code repository.
- Full ACE-Step Integration: Access all core ACE-Step features, including text-to-music, audio-to-audio, repaint, retake, and extend.
- SideCar Architecture: All intensive computation (model loading, inference, dependency management) is handled by the external SideCar process, so TouchDesigner remains responsive.
- Real-time Visualization: Includes a real-time audio waveform visualizer.
Requirements
- SideCar Environment Setup: The SideCar operator runs in its own Python environment. You are responsible for installing all necessary dependencies for the ACE-Step model within that environment. This includes torch, torchaudio, and all packages listed in the official ACE-Step requirements.txt. This operator does not manage Python packages.
- Git: Git must be installed and accessible in your system’s PATH. The operator uses it to clone the ACE-Step repository.
- Running SideCar: The SideCar server must be running and connected for this operator to function.
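The requirements above can be checked before the first generation. The following is a minimal preflight sketch and not part of the operator: it assumes you run it with the Python interpreter of the SideCar environment, and the function names and the default module list are illustrative (extend the list with the packages from the ACE-Step requirements.txt).

```python
import importlib.util
import shutil


def find_missing_modules(module_names):
    # Return the top-level module names that cannot be found
    # by the interpreter running this script.
    return [name for name in module_names
            if importlib.util.find_spec(name) is None]


def preflight(module_names=("torch", "torchaudio")):
    # Collect basic readiness facts: is git on PATH, and which
    # of the listed modules are missing from this environment.
    return {
        "git_on_path": shutil.which("git") is not None,
        "missing_modules": find_missing_modules(module_names),
    }


if __name__ == "__main__":
    print(preflight())
```

An empty missing_modules list together with git_on_path set to True only indicates that the basics are in place; it does not verify package versions or GPU support.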
Input/Output
- Input: Text prompts, lyrics, and optional reference audio files.
- Output: Generated audio files (WAV format) and real-time audio waveform visualizations.
Parameters
ACE-Step Page
op('acestep').par.Status
Str Displays the current status of the operator.
- Default:
-
op('acestep').par.Active
Toggle Indicates if a generation request is currently active.
- Default:
Off
op('acestep').par.Currentaudio
File Path to the currently loaded audio file. Used by Load Settings.
- Default:
"" (Empty String)
op('acestep').par.Playhead
Float Controls the playback position of the current audio (0.0 to 1.0).
- Default:
0
op('acestep').par.Autoplay
Toggle Automatically plays the audio after generation.
- Default:
On
op('acestep').par.Generate
Pulse Triggers the music generation process based on current settings.
- Default:
None
op('acestep').par.Prompt
Str Descriptive tags, genres, or scene descriptions. Used for text2music, audio2audio, and as a basis for edit/repaint.
- Default:
upbeat pop, catchy melody, female singer
op('acestep').par.Lyrics
Str Enter lyrics with structure tags like [verse], [chorus]. Use \n for newlines. Used for text2music, audio2audio, and as a basis for edit/repaint.
- Default:
[verse]\nSun is shining bright today\nFeeling happy, come what may
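As an illustration of the lyrics format, the sketch below assembles a tagged lyrics string from sections. The helper name format_lyrics is hypothetical, and it assumes the parameter expects the literal two-character sequence \n between lines, as the default value suggests; pass a real line break instead if your setup needs one.

```python
def format_lyrics(sections, newline="\\n"):
    # sections is a list of (tag, lines) pairs, e.g. ("verse", ["line 1", "line 2"]).
    # newline defaults to a literal backslash followed by "n" (an assumption
    # based on the default value of the Lyrics parameter).
    parts = []
    for tag, lines in sections:
        parts.append(f"[{tag}]")
        parts.extend(lines)
    return newline.join(parts)


if __name__ == "__main__":
    print(format_lyrics([
        ("verse", ["Sun is shining bright today", "Feeling happy, come what may"]),
    ]))
```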
op('acestep').par.Duration
Float Desired duration of the generated audio in seconds.
- Default:
10
op('acestep').par.Infersteps
Int Number of inference steps. Higher can improve quality but takes longer.
- Default:
60
op('acestep').par.Manualseed
Int Seed for reproducibility. -1 for random. Affects initial generation.
- Default:
-1
op('acestep').par.Guidancescale
Float Main classifier-free guidance scale. Used if CFG Type is not 'Double Condition'.
- Default:
15
op('acestep').par.Omegascale
Float Omega scale factor for APG guidance type.
- Default:
10
op('acestep').par.Guidancescaletext
Float Guidance scale for text prompt when CFG Type is 'Double Condition'.
- Default:
7.5
op('acestep').par.Guidancescalelyric
Float Guidance scale for lyrics when CFG Type is 'Double Condition'.
- Default:
7.5
op('acestep').par.Audio2audioenable
Toggle Enable audio-to-audio generation. Uses Prompt & Lyrics as guidance if provided.
- Default:
Off
op('acestep').par.Refaudioinput
File Path to the reference audio file for Audio2Audio mode.
- Default:
"" (Empty String)
op('acestep').par.Refaudiostrength
Float Strength of the reference audio influence (0.0 to 1.0).
- Default:
0.6
op('acestep').par.Outputfolder
Folder Folder to save the generated WAV file. Relative to project or absolute.
- Default:
audio_out
op('acestep').par.Outputfilename
Str Name of the generated WAV file.
- Default:
ace_step_output.wav
op('acestep').par.Uniquesuffix
Toggle If True, appends a timestamp to the filename to prevent overwriting.
- Default:
On
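The effect of a unique suffix can be sketched as follows. This page does not state the exact timestamp format the operator uses, so the pattern below (YYYYMMDD_HHMMSS) and the helper name with_timestamp are assumptions for illustration only.

```python
from datetime import datetime
from pathlib import Path


def with_timestamp(filename, now=None):
    # Insert a timestamp between the file stem and its extension.
    # The timestamp pattern is an assumed example, not the operator's format.
    now = now or datetime.now()
    path = Path(filename)
    return f"{path.stem}_{now.strftime('%Y%m%d_%H%M%S')}{path.suffix}"


if __name__ == "__main__":
    print(with_timestamp("ace_step_output.wav"))
```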
op('acestep').par.Initialize
Pulse Initializes the ACE-Step Model. This parameter is read-only and handled internally.
- Default:
None
op('acestep').par.Unloadmodel
Pulse Releases the model from memory via SideCar.
- Default:
None
op('acestep').par.Loadsettings
Pulse Load generation parameters from the JSON associated with the Current Audio file.
- Default:
None
Edit Page
op('acestep').par.Editaudio
Toggle Master toggle to enable audio editing modes on this page.
- Default:
Off
op('acestep').par.Srcaudiopath
File Path to the source audio file for Edit, Repaint, Retake, Extend tasks.
- Default:
"" (Empty String)
op('acestep').par.Retakeseeds
Int Seed for retake/repaint/extend variations. -1 for random.
- Default:
-1
op('acestep').par.Retakevariance
Float Amount of variance for retake/repaint (0.0 to 1.0).
- Default:
0
op('acestep').par.Repaintstart
Float Start time in seconds for repaint. For extend, negative values pad left. 0 for retake.
- Default:
0
op('acestep').par.Repaintend
Float End time in seconds for repaint. For extend, values beyond original duration extend right. Original duration for retake.
- Default:
5
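For extend tasks, the Repaint Start and Repaint End descriptions imply simple arithmetic: a negative start adds time before the source, and an end beyond the source duration adds time after it. The helper below is a hypothetical sketch of that arithmetic, not code taken from the operator.

```python
def extend_amounts(repaint_start, repaint_end, source_duration):
    # Seconds added before the source (negative start pads left).
    left = max(0.0, -repaint_start)
    # Seconds added after the source (end beyond the duration extends right).
    right = max(0.0, repaint_end - source_duration)
    return left, right


if __name__ == "__main__":
    # A 10 second source with start=-3 and end=15.
    print(extend_amounts(-3.0, 15.0, 10.0))
```

With a 10 second source, a start of -3 and an end of 15 correspond to 3 seconds added on the left and 5 seconds added on the right.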
op('acestep').par.Transitiontime
Float Duration of the transition/crossfade in seconds for repaint/extend modes. 0 for abrupt change.
- Default:
0
op('acestep').par.Editoriginalprompt
Str The original prompt used to generate the Source Audio. Required for 'Edit Audio Content' mode.
- Default:
"" (Empty String)
op('acestep').par.Editoriginallyrics
Str The original lyrics used to generate the Source Audio. Required for 'Edit Audio Content' mode.
- Default:
"" (Empty String)
op('acestep').par.Edittargetprompt
Str Target prompt for 'Edit Audio Content' mode. If empty, uses main prompt.
- Default:
"" (Empty String)
op('acestep').par.Edittargetlyrics
Str Target lyrics for 'Edit Audio Content' mode. If empty, uses main lyrics.
- Default:
"" (Empty String)
op('acestep').par.Editnmin
Float Min influence for audio editing (0.0 to 1.0).
- Default:
0.65
op('acestep').par.Editnmax
Float Max influence for audio editing (0.0 to 1.0).
- Default:
0.95
op('acestep').par.Editnavg
Int Averaging window size for editing.
- Default:
10
op('acestep').par.Loadsrccredentials
Pulse Loads prompt and lyrics from the _input_params.json associated with the Src Audio Path.
- Default:
None
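To inspect what Load Src Credentials would read, a small script can open the companion JSON directly. The sketch below rests on two assumptions that this page does not confirm: that the JSON file sits next to the audio file and is named after its stem plus _input_params.json, and that it stores the text under the keys "prompt" and "lyrics". Both helper names are hypothetical.

```python
import json
from pathlib import Path


def params_path_for(audio_path):
    # Assumed naming: "<audio stem>_input_params.json" next to the audio file.
    audio = Path(audio_path)
    return audio.with_name(audio.stem + "_input_params.json")


def load_prompt_and_lyrics(audio_path):
    # Return (prompt, lyrics) from the companion JSON, or None if it is absent.
    params_file = params_path_for(audio_path)
    if not params_file.is_file():
        return None
    data = json.loads(params_file.read_text(encoding="utf-8"))
    # The key names are assumptions; missing keys fall back to empty strings.
    return data.get("prompt", ""), data.get("lyrics", "")


if __name__ == "__main__":
    print(load_prompt_and_lyrics("audio_out/ace_step_output.wav"))
```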
Advanced Page
op('acestep').par.Guidanceinterval
Float Guidance interval for CFG.
- Default:
0.98
op('acestep').par.Guidanceintervaldecay
Float Decay rate for guidance interval.
- Default:
1
op('acestep').par.Minguidancescale
Float Minimum guidance scale.
- Default:
1
op('acestep').par.Usergtag
Toggle Enable ERG (Entropy Rectifying Guidance) for prompt/tags.
- Default:
Off
op('acestep').par.Userglyric
Toggle Enable ERG for lyrics.
- Default:
Off
op('acestep').par.Usergdiffusion
Toggle Enable ERG for diffusion process.
- Default:
Off
op('acestep').par.Useoss
Toggle Enable Optimal Step Size scheduling. Only effective if Scheduler Type is Euler.
- Default:
Off
op('acestep').par.Osssteps
Str Steps for OSS scheduling, comma-separated. Active if 'Use Optimal Step Size' is ON and Scheduler is Euler.
- Default:
50,100,150,200
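Because the OSS steps are entered as a comma-separated string, it can help to validate the value before setting it. The parser below is an illustrative sketch (the name parse_oss_steps is hypothetical) and does not reflect how the operator itself parses the field.

```python
def parse_oss_steps(text):
    # Split on commas, ignore surrounding whitespace and empty items,
    # and convert each remaining token to an integer.
    steps = []
    for token in text.split(","):
        token = token.strip()
        if not token:
            continue
        steps.append(int(token))  # raises ValueError on non-numeric input
    return steps


if __name__ == "__main__":
    print(parse_oss_steps("50,100,150,200"))
```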
op('acestep').par.Deviceid
Int GPU device ID to use (e.g., 0, 1). Requires re-initialize.
- Default:
0
op('acestep').par.Usebf16
Toggle Use bfloat16 for faster inference (if supported). Uncheck for macOS or if errors occur. Requires re-initialize.
- Default:
On
op('acestep').par.Torchcompile
Toggle Optimize model with torch.compile() for faster inference (Not supported on Windows by ACE-Step). Requires re-initialize.
- Default:
Off
op('acestep').par.Modelpath
Folder ACE-Step Repository Path. This parameter is read-only and automatically set.
- Default:
"" (Empty String)
op('acestep').par.Checkpointdir
Folder Optional directory for model checkpoints.
- Default:
"" (Empty String)
About Page
op('acestep').par.Bypass
Toggle Bypass the operator's functionality.
- Default:
Off
op('acestep').par.Showbuiltin
Toggle Show built-in TouchDesigner parameters.
- Default:
Off
op('acestep').par.Version
Str Version of the operator.
- Default:
None
op('acestep').par.Lastupdated
Str Date of the last update.
- Default:
None
op('acestep').par.Creator
Str Creator of the operator.
- Default:
None
op('acestep').par.Website
Str Website for more information.
- Default:
None
op('acestep').par.Chattd
OP Reference to the ChatTD operator.
- Default:
None
op('acestep').par.Sidecaroperator
OP Reference to the SideCar operator handling requests.
- Default:
None
Usage Examples
Quick Start: Generating Music
- Set up the SideCar: Ensure the SideCar is running and its Python environment is fully configured with all ACE-Step dependencies.
- Press Generate: In the ACE-Step operator’s parameter panel, click the Generate Music pulse.
- Clone the Repo: If this is your first time, a dialog will ask for permission to download the ACE-Step repository. Click Download.
- Generate: The request will be sent to the SideCar for processing. The generated audio will appear in the visualizer and can be automatically played.
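The same quick start can be driven from a script. The sketch below assumes the operator is named acestep, as in the parameter listing, and relies on the standard TouchDesigner parameter interface (the val attribute and the pulse() method). The helper names are hypothetical, and the stand-in object exists only so the example also runs outside TouchDesigner.

```python
from types import SimpleNamespace


def apply_settings(comp, settings):
    # Set each named parameter's value on a TouchDesigner-style component.
    for name, value in settings.items():
        par = getattr(comp.par, name, None)
        if par is None:
            raise KeyError(f"Unknown parameter: {name}")
        par.val = value


def request_generation(comp, settings):
    # Apply the settings, then pulse Generate to send the request.
    apply_settings(comp, settings)
    comp.par.Generate.pulse()


def make_stub(pulse_log):
    # Stand-in for op('acestep') so the sketch can run outside TouchDesigner.
    return SimpleNamespace(par=SimpleNamespace(
        Prompt=SimpleNamespace(val=""),
        Duration=SimpleNamespace(val=10),
        Generate=SimpleNamespace(pulse=lambda: pulse_log.append("Generate")),
    ))


if __name__ == "__main__":
    log = []
    stub = make_stub(log)
    request_generation(stub, {"Prompt": "ambient piano", "Duration": 20})
    print(stub.par.Prompt.val, stub.par.Duration.val, log)
```

Inside TouchDesigner, the stand-in would be replaced by the real operator, for example request_generation(op('acestep'), {'Prompt': 'ambient piano', 'Duration': 20}). The SideCar still has to be running and connected for the pulse to produce audio.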
Integration Examples
The ACE-Step operator is designed to integrate seamlessly with the SideCar operator for offloading heavy computation. It also connects with the ChatTD operator for managing Python environments and asynchronous operations.
Best Practices
- Dependency Management: Ensure your SideCar’s Python environment has all necessary ACE-Step dependencies installed. The operator does not manage these.
- Git Installation: Have Git installed and in your system’s PATH for automatic repository cloning.
- Responsible Use: Be mindful of the ACE-Step model’s disclaimer regarding potential copyright infringement, cultural sensitivity, and harmful content generation. Verify originality and disclose AI involvement.
Troubleshooting
- SideCar Not Connected: If generation fails, ensure the SideCar server is running and connected. Check the SideCar Operator parameter on the About page to confirm it’s referencing the correct SideCar instance.
- Missing Dependencies: If you encounter errors related to missing Python packages (e.g., torch, librosa), install them manually in your SideCar’s Python environment.
- Repository Cloning Issues: If the repository fails to clone, check your internet connection and Git installation. Review the TouchDesigner console for detailed error messages.
Research Citation
The ACE-Step model is a significant contribution to the field of AI music generation. If you use this operator or the underlying model in your research, please consider citing the original work.
Research & Licensing
ACE-STEP Project
The ACE-STEP project is an open-source initiative focused on advancing AI music generation.
ACE-Step: A Step Towards Music Generation Foundation Model
ACE-Step is a foundation model for music generation designed to overcome limitations of existing approaches by integrating diffusion-based generation with advanced encoding and transformation techniques.
Technical Details
- Combines diffusion with DCAE and linear transformer.
- Uses MERT and m-hubert for semantic alignment (REPA).
- Outperforms LLM-based models in speed and coherence.
- Supports various music generation tasks including text-to-music and audio-to-audio.
Research Impact
- Overcomes limitations of existing approaches in music generation.
- Provides a holistic architectural design for state-of-the-art performance.
- Enables original music generation across diverse genres for creative production, education, and entertainment.
Citation
@misc{gong2025acestep,
  title={ACE-Step: A Step Towards Music Generation Foundation Model},
  author={Junmin Gong, Wenxiao Zhao, Sen Wang, Shengyuan Xu, Jing Guo},
  howpublished={\url{https://github.com/ace-step/ACE-Step}},
  year={2025},
  note={GitHub repository}
}
Key Research Contributions
- Novel open-source foundation model for music generation.
- Integrates diffusion-based generation with Sana's Deep Compression AutoEncoder (DCAE) and a lightweight linear transformer.
- Leverages MERT and m-hubert to align semantic representations (REPA) during training for rapid convergence.
- Achieves faster synthesis (up to 4 minutes of music in 20 seconds on A100 GPU) and superior musical coherence compared to LLM-based models.
- Preserves fine-grained acoustic details, enabling advanced control mechanisms like voice cloning, lyric editing, remixing, and track generation.
License
Apache License 2.0 - This model is freely available for research and commercial use.