ACE-Step Music Generator
ACE-Step Music Generator v2.0.0 [July 31, 2025]
- Major update with enhanced music generation capabilities
Overview
The ACE-Step Music Generator operator integrates the ACE-Step model into TouchDesigner, enabling text-to-music, audio-to-audio, and audio editing workflows. It functions as a client for the SideCar operator, which handles the intensive processing.
Features
- Automatic Repository Cloning: The first time you generate, the operator prompts you to download and clone the required ACE-Step code repository.
- Full ACE-Step Integration: Access all core ACE-Step features, including text-to-music, audio-to-audio, repaint, retake, and extend.
- SideCar Architecture: All intensive computation (model loading, inference, dependency management) is handled by the external SideCar process, so TouchDesigner remains responsive.
- Real-time Visualization: Includes a real-time audio waveform visualizer.
Requirements
- SideCar Environment Setup: The SideCar operator runs in its own Python environment. You are responsible for installing all necessary dependencies for the ACE-Step model within that environment. This includes torch, torchaudio, and all packages listed in the official ACE-Step requirements.txt. This operator does not manage Python packages.
- Git: Git must be installed and accessible in your system’s PATH. The operator uses it to clone the ACE-Step repository.
- Running SideCar: The SideCar server must be running and connected for this operator to function.
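The requirements above can be checked before the first generation. The following is a minimal sketch, not part of the operator: the helper name check_sidecar_environment is hypothetical, and it assumes you run it with the Python interpreter of the SideCar environment. It only reports whether Git is on PATH and whether the named modules can be located; it does not install anything.

```python
import importlib.util
import shutil


def check_sidecar_environment(modules=("torch", "torchaudio")):
    """Report whether Git is on PATH and which modules can be located.

    Hypothetical helper for illustration; run it inside the SideCar's
    Python environment, not inside TouchDesigner.
    """
    report = {"git": shutil.which("git") is not None}
    for name in modules:
        # find_spec returns None when a top-level module cannot be found.
        report[name] = importlib.util.find_spec(name) is not None
    return report
```

Calling check_sidecar_environment() returns a dictionary such as {"git": ..., "torch": ..., "torchaudio": ...} with True or False values. Any False entry points at a requirement that still needs to be installed by hand.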
Input/Output
- Input: Text prompts, lyrics, and optional reference audio files.
- Output: Generated audio files (WAV format) and real-time audio waveform visualizations.
Parameters
ACE-Step Page
op('acestep').par.Status Str Displays the current status of the operator.
- Default:
-
op('acestep').par.Active Toggle Indicates if a generation request is currently active.
- Default:
Off
op('acestep').par.Currentaudio File Path to the currently loaded audio file. Used by Load Settings.
- Default:
"" (Empty String)
op('acestep').par.Playhead Float Controls the playback position of the current audio (0.0 to 1.0).
- Default:
0
op('acestep').par.Autoplay Toggle Automatically plays the audio after generation.
- Default:
On
op('acestep').par.Generate Pulse Triggers the music generation process based on current settings.
- Default:
None
op('acestep').par.Prompt Str Descriptive tags, genres, or scene descriptions. Used for text2music, audio2audio, and as a basis for edit/repaint.
- Default:
upbeat pop, catchy melody, female singer
op('acestep').par.Lyrics Str Enter lyrics with structure tags like [verse], [chorus]. Use \n for newlines. Used for text2music, audio2audio, and as a basis for edit/repaint.
- Default:
[verse]\nSun is shining bright today\nFeeling happy, come what may
op('acestep').par.Duration Float Desired duration of the generated audio in seconds.
- Default:
10
op('acestep').par.Infersteps Int Number of inference steps. Higher can improve quality but takes longer.
- Default:
60
op('acestep').par.Manualseed Int Seed for reproducibility. -1 for random. Affects initial generation.
- Default:
-1
op('acestep').par.Guidancescale Float Main classifier-free guidance scale. Used if CFG Type is not 'Double Condition'.
- Default:
15
op('acestep').par.Omegascale Float Omega scale factor for APG guidance type.
- Default:
10
op('acestep').par.Guidancescaletext Float Guidance scale for text prompt when CFG Type is 'Double Condition'.
- Default:
7.5
op('acestep').par.Guidancescalelyric Float Guidance scale for lyrics when CFG Type is 'Double Condition'.
- Default:
7.5
op('acestep').par.Audio2audioenable Toggle Enable audio-to-audio generation. Uses Prompt & Lyrics as guidance if provided.
- Default:
Off
op('acestep').par.Refaudioinput File Path to the reference audio file for Audio2Audio mode.
- Default:
"" (Empty String)
op('acestep').par.Refaudiostrength Float Strength of the reference audio influence (0.0 to 1.0).
- Default:
0.6
op('acestep').par.Outputfolder Folder Folder to save the generated WAV file. Relative to project or absolute.
- Default:
audio_out
op('acestep').par.Outputfilename Str Name of the generated WAV file.
- Default:
ace_step_output.wav
op('acestep').par.Uniquesuffix Toggle If True, appends a timestamp to the filename to prevent overwriting.
- Default:
On
op('acestep').par.Initialize Pulse Initializes the ACE-Step Model. This parameter is read-only and handled internally.
- Default:
None
op('acestep').par.Unloadmodel Pulse Releases the model from memory via SideCar.
- Default:
None
op('acestep').par.Loadsettings Pulse Load generation parameters from the JSON associated with the Current Audio file.
- Default:
None
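The Lyrics parameter takes structure tags and \n separators in a single string, as its default value shows. The sketch below builds such a string from a list of sections. The helper name format_lyrics is hypothetical, and it assumes the field expects the two literal characters backslash and n (as in the documented default) rather than a real line break.

```python
def format_lyrics(sections):
    """Join (tag, lines) pairs into one single-line lyrics string.

    Hypothetical helper. Assumption: the Lyrics field wants a literal
    backslash followed by 'n' between lines, matching the default value.
    """
    parts = []
    for tag, lines in sections:
        parts.append(f"[{tag}]")
        parts.extend(lines)
    return "\\n".join(parts)


lyrics = format_lyrics([
    ("verse", ["Sun is shining bright today", "Feeling happy, come what may"]),
])
```

With the input above, lyrics matches the documented default of the Lyrics parameter.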
Edit Page
op('acestep').par.Editaudio Toggle Master toggle to enable audio editing modes on this page.
- Default:
Off
op('acestep').par.Srcaudiopath File Path to the source audio file for Edit, Repaint, Retake, Extend tasks.
- Default:
"" (Empty String)
op('acestep').par.Retakeseeds Int Seed for retake/repaint/extend variations. -1 for random.
- Default:
-1
op('acestep').par.Retakevariance Float Amount of variance for retake/repaint (0.0 to 1.0).
- Default:
0
op('acestep').par.Repaintstart Float Start time in seconds for repaint. For extend, negative values pad left. 0 for retake.
- Default:
0
op('acestep').par.Repaintend Float End time in seconds for repaint. For extend, values beyond original duration extend right. Original duration for retake.
- Default:
5
op('acestep').par.Transitiontime Float Duration of the transition/crossfade in seconds for repaint/extend modes. 0 for abrupt change.
- Default:
0
op('acestep').par.Editoriginalprompt Str The original prompt used to generate the Source Audio. Required for 'Edit Audio Content' mode.
- Default:
"" (Empty String)
op('acestep').par.Editoriginallyrics Str The original lyrics used to generate the Source Audio. Required for 'Edit Audio Content' mode.
- Default:
"" (Empty String)
op('acestep').par.Edittargetprompt Str Target prompt for 'Edit Audio Content' mode. If empty, uses main prompt.
- Default:
"" (Empty String)
op('acestep').par.Edittargetlyrics Str Target lyrics for 'Edit Audio Content' mode. If empty, uses main lyrics.
- Default:
"" (Empty String)
op('acestep').par.Editnmin Float Min influence for audio editing (0.0 to 1.0).
- Default:
0.65
op('acestep').par.Editnmax Float Max influence for audio editing (0.0 to 1.0).
- Default:
0.95
op('acestep').par.Editnavg Int Averaging window size for editing.
- Default:
10
op('acestep').par.Loadsrccredentials Pulse Loads prompt and lyrics from the _input_params.json associated with the Src Audio Path.
- Default:
None
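The Repaint Start and Repaint End descriptions encode three different conventions: retake uses 0 and the original duration, repaint uses a window inside the source, and extend uses a negative start or an end past the original duration. The sketch below turns those rules into values for the two parameters. The function name edit_window and the mode labels "retake", "repaint", and "extend" are hypothetical and used only for this illustration; how the operator itself selects a mode is not covered here.

```python
def edit_window(mode, src_duration, start=0.0, end=None,
                pad_left=0.0, pad_right=0.0):
    """Return (Repaintstart, Repaintend) in seconds for an edit mode.

    Hypothetical helper; mode labels are illustrative, not operator values.
    """
    if mode == "retake":
        # Retake covers the whole clip: 0 to the original duration.
        return 0.0, src_duration
    if mode == "repaint":
        end = src_duration if end is None else end
        if not 0.0 <= start < end <= src_duration:
            raise ValueError("repaint window must lie inside the source audio")
        return start, end
    if mode == "extend":
        # Negative start pads on the left; an end beyond the original
        # duration extends on the right.
        return -pad_left, src_duration + pad_right
    raise ValueError(f"unknown mode: {mode!r}")
```

For a 30-second source, edit_window("extend", 30.0, pad_right=10.0) yields a start of 0 and an end of 40, which requests ten extra seconds on the right.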
Advanced Page
op('acestep').par.Guidanceinterval Float Guidance interval for CFG.
- Default:
0.98
op('acestep').par.Guidanceintervaldecay Float Decay rate for guidance interval.
- Default:
1
op('acestep').par.Minguidancescale Float Minimum guidance scale.
- Default:
1
op('acestep').par.Usergtag Toggle Enable ERG (Entropy Rectifying Guidance) for prompt/tags.
- Default:
Off
op('acestep').par.Userglyric Toggle Enable ERG for lyrics.
- Default:
Off
op('acestep').par.Usergdiffusion Toggle Enable ERG for diffusion process.
- Default:
Off
op('acestep').par.Useoss Toggle Enable Optimal Step Size scheduling. Only effective if Scheduler Type is Euler.
- Default:
Off
op('acestep').par.Osssteps Str Steps for OSS scheduling, comma-separated. Active if 'Use Optimal Step Size' is ON and Scheduler is Euler.
- Default:
50,100,150,200
op('acestep').par.Deviceid Int GPU device ID to use (e.g., 0, 1). Requires re-initialize.
- Default:
0
op('acestep').par.Usebf16 Toggle Use bfloat16 for faster inference (if supported). Uncheck for macOS or if errors occur. Requires re-initialize.
- Default:
On
op('acestep').par.Torchcompile Toggle Optimize model with torch.compile() for faster inference (Not supported on Windows by ACE-Step). Requires re-initialize.
- Default:
Off
op('acestep').par.Modelpath Folder ACE-Step Repository Path. This parameter is read-only and automatically set.
- Default:
"" (Empty String)
op('acestep').par.Checkpointdir Folder Optional directory for model checkpoints.
- Default:
"" (Empty String)
About Page
op('acestep').par.Bypass Toggle Bypass the operator's functionality.
- Default:
Off
op('acestep').par.Showbuiltin Toggle Show built-in TouchDesigner parameters.
- Default:
Off
op('acestep').par.Version Str Version of the operator.
- Default:
None
op('acestep').par.Lastupdated Str Date of the last update.
- Default:
None
op('acestep').par.Creator Str Creator of the operator.
- Default:
None
op('acestep').par.Website Str Website for more information.
- Default:
None
op('acestep').par.Chattd OP Reference to the ChatTD operator.
- Default:
None
op('acestep').par.Sidecaroperator OP Reference to the SideCar operator handling requests.
- Default:
None
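Two notes on the Advanced page depend on the operating system: Use BF16 should be unchecked on macOS, and torch.compile is not supported on Windows by ACE-Step. The sketch below expresses those two notes as a lookup. The function name suggest_init_flags and the key torchcompile_supported are hypothetical, and the result is only a starting point, since bfloat16 support also depends on the GPU.

```python
import platform


def suggest_init_flags(system=None):
    """Suggest platform-dependent values for Usebf16 and Torchcompile.

    Hypothetical helper based only on the parameter notes in this page.
    `system` uses platform.system() names: 'Windows', 'Darwin', 'Linux'.
    """
    system = platform.system() if system is None else system
    return {
        # The docs advise unchecking bfloat16 on macOS ('Darwin').
        "Usebf16": system != "Darwin",
        # The docs state torch.compile is not supported on Windows.
        "torchcompile_supported": system != "Windows",
    }
```

Both parameters require a re-initialize after being changed, so it makes sense to settle them before the model is first loaded.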
Usage Examples
Quick Start: Generating Music
- Set up the SideCar: Ensure the SideCar is running and its Python environment is fully configured with all ACE-Step dependencies.
- Press Generate: In the ACE-Step operator’s parameter panel, click the Generate Music pulse.
- Clone the Repo: If this is your first time, a dialog will ask for permission to download the ACE-Step repository. Click Download.
- Generate: The request will be sent to the SideCar for processing. The generated audio will appear in the visualizer and can be automatically played.
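The same quick start can be driven from a script. This is a minimal sketch that uses only parameter names listed in the Parameters section. The function name request_generation is hypothetical, and it assumes the operator is reachable as op('acestep') in your network. It sets a few parameters and pulses Generate; it does not wait for the result, because the request is processed asynchronously by the SideCar.

```python
def request_generation(acestep, prompt, lyrics, duration=10.0, seed=-1):
    """Set the main generation parameters, then pulse Generate.

    Hypothetical helper. `acestep` is the ACE-Step operator, e.g.
    op('acestep') when called from inside TouchDesigner.
    """
    acestep.par.Prompt = prompt
    acestep.par.Lyrics = lyrics
    acestep.par.Duration = duration
    acestep.par.Manualseed = seed  # -1 requests a random seed
    acestep.par.Generate.pulse()


# Inside TouchDesigner (for example in a Text DAT), assuming the
# operator is named 'acestep':
# request_generation(op('acestep'),
#                    'upbeat pop, catchy melody, female singer',
#                    '[verse]\\nSun is shining bright today',
#                    duration=10)
```

Passing a fixed seed instead of -1 makes the initial generation reproducible, as described for the Manualseed parameter.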
Integration Examples
The ACE-Step operator works with the SideCar operator to offload heavy computation. It also connects with the ChatTD operator for managing Python environments and asynchronous operations.
Best Practices
- Dependency Management: Ensure your SideCar’s Python environment has all necessary ACE-Step dependencies installed. The operator does not manage these.
- Git Installation: Have Git installed and in your system’s PATH for automatic repository cloning.
- Responsible Use: Be mindful of the ACE-Step model’s disclaimer regarding potential copyright infringement, cultural sensitivity, and harmful content generation. Verify originality and disclose AI involvement.
Troubleshooting
- SideCar Not Connected: If generation fails, ensure the SideCar server is running and connected. Check the SideCar Operator parameter on the About page to confirm it’s referencing the correct SideCar instance.
- Missing Dependencies: If you encounter errors related to missing Python packages (e.g., torch, librosa), install them manually in your SideCar’s Python environment.
- Repository Cloning Issues: If the repository fails to clone, check your internet connection and Git installation. Review the TouchDesigner console for detailed error messages.
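The first troubleshooting step, confirming the SideCar reference, can be scripted as a quick wiring check. This sketch is an illustration only: the function name sidecar_problems is hypothetical, and it inspects just the Sidecaroperator and Bypass parameters listed on the About page. It cannot tell whether the SideCar server is actually running or connected, so that still has to be verified on the SideCar itself.

```python
def sidecar_problems(acestep):
    """Return a list of wiring problems found on the ACE-Step operator.

    Hypothetical helper. `acestep` is the ACE-Step operator, e.g.
    op('acestep') when called from inside TouchDesigner.
    """
    problems = []
    # An OP-type parameter evaluates to None when nothing is referenced.
    if acestep.par.Sidecaroperator.eval() is None:
        problems.append("SideCar Operator parameter is empty")
    if acestep.par.Bypass.eval():
        problems.append("Bypass is on")
    return problems
```

An empty list means the reference is set and the operator is not bypassed. It does not confirm that the referenced SideCar is the intended instance.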
Research Citation
The ACE-Step model is a significant contribution to the field of AI music generation. If you use this operator or the underlying model in your research, please consider citing the original work.
Research & Licensing
ACE-Step Project
The ACE-Step project is an open-source initiative focused on advancing AI music generation.
ACE-Step: A Step Towards Music Generation Foundation Model
ACE-Step is a foundation model for music generation designed to overcome limitations of existing approaches by integrating diffusion-based generation with advanced encoding and transformation techniques.
Technical Details
- Combines diffusion with DCAE and linear transformer.
- Uses MERT and m-hubert for semantic alignment (REPA).
- Outperforms LLM-based models in speed and coherence.
- Supports various music generation tasks including text-to-music and audio-to-audio.
Research Impact
- Overcomes limitations of existing approaches in music generation.
- Provides a holistic architectural design for state-of-the-art performance.
- Enables original music generation across diverse genres for creative production, education, and entertainment.
Citation
@misc{gong2025acestep,
title={ACE-Step: A Step Towards Music Generation Foundation Model},
author={Junmin Gong, Wenxiao Zhao, Sen Wang, Shengyuan Xu, Jing Guo},
howpublished={\url{https://github.com/ace-step/ACE-Step}},
year={2025},
note={GitHub repository}
}
Key Research Contributions
- Novel open-source foundation model for music generation.
- Integrates diffusion-based generation with Sana's Deep Compression AutoEncoder (DCAE) and a lightweight linear transformer.
- Leverages MERT and m-hubert to align semantic representations (REPA) during training for rapid convergence.
- Achieves faster synthesis (up to 4 minutes of music in 20 seconds on A100 GPU) and superior musical coherence compared to LLM-based models.
- Preserves fine-grained acoustic details, enabling advanced control mechanisms like voice cloning, lyric editing, remixing, and track generation.
License
Apache License 2.0 - This model is freely available for research and commercial use.