ACE-Step Music Generator

The ACE-Step Music Generator operator integrates the ACE-Step model into TouchDesigner, enabling powerful text-to-music, audio-to-audio, and audio editing workflows. It functions as a client for the SideCar operator, which handles the intensive processing.

Key Features

  • Automatic Repository Cloning: The first time you generate, the operator will automatically prompt you to download and clone the required ACE-Step code repository.
  • Full ACE-Step Integration: Access all core ACE-Step features, including text-to-music, audio-to-audio, repaint, retake, and extend.
  • SideCar Architecture: All intensive computation (model loading, inference, dependency management) is handled by the external SideCar process, ensuring TouchDesigner remains responsive.
  • Real-time Visualization: Includes a professional, real-time audio waveform visualizer.

Requirements

  1. SideCar Environment Setup: The SideCar operator runs in its own Python environment. You are responsible for installing all necessary dependencies for the ACE-Step model within that environment. This includes torch, torchaudio, and all packages listed in the official ACE-Step requirements.txt. This operator does not manage Python packages.
  2. Git: Git must be installed and accessible in your system’s PATH. The operator uses it to clone the ACE-Step repository.
  3. Running SideCar: The SideCar server must be running and connected for this operator to function.

Input/Output

  • Input: Text prompts, lyrics, and optional reference audio files.
  • Output: Generated audio files (WAV format) and real-time audio waveform visualizations.
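
The setup requirements listed above (Git on the PATH, torch and torchaudio importable) can be checked before the first generation. The following is a minimal sketch, not part of the operator: `check_environment` is a hypothetical helper name, and it only reports on the interpreter that runs it, so it should be run with the SideCar environment's Python.

```python
import importlib.util
import shutil

# Packages named in the requirements above. Extend this tuple with the
# entries from ACE-Step's requirements.txt as needed.
REQUIRED_PACKAGES = ("torch", "torchaudio")


def check_environment(packages=REQUIRED_PACKAGES):
    """Report whether Git is on PATH and which packages cannot be imported."""
    missing = [name for name in packages
               if importlib.util.find_spec(name) is None]
    return {
        "git_path": shutil.which("git"),  # None when Git is not on PATH
        "missing_packages": missing,
    }
```

Calling `check_environment()` returns a dictionary; an empty `missing_packages` list and a non-empty `git_path` suggest the basic prerequisites are in place, although it does not verify package versions.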
Status (Status) op('acestep').par.Status Str

Displays the current status of the operator.

Default:
-
Active (Active) op('acestep').par.Active Toggle

Indicates if a generation request is currently active.

Default:
Off
Currentaudio (Currentaudio) op('acestep').par.Currentaudio File

Path to the currently loaded audio file. Used by Load Settings.

Default:
"" (Empty String)
Playhead (Playhead) op('acestep').par.Playhead Float

Controls the playback position of the current audio (0.0 to 1.0).

Default:
0
Autoplay (Autoplay) op('acestep').par.Autoplay Toggle

Automatically plays the audio after generation.

Default:
On
Generate (Generate) op('acestep').par.Generate Pulse

Triggers the music generation process based on current settings.

Default:
None
Core Generation Header
Prompt (Prompt) op('acestep').par.Prompt Str

Descriptive tags, genres, or scene descriptions. Used for text2music, audio2audio, and as a basis for edit/repaint.

Default:
upbeat pop, catchy melody, female singer
Lyrics (Lyrics) op('acestep').par.Lyrics Str

Enter lyrics with structure tags like [verse], [chorus]. Use \n for newlines. Used for text2music, audio2audio, and as a basis for edit/repaint.

Default:
[verse]\nSun is shining bright today\nFeeling happy, come what may
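
Building the lyrics string by hand is error-prone once a song has several sections. The sketch below assembles it from tagged sections. It assumes the parameter expects the two-character sequence `\n` between lines, as the default value suggests; `format_lyrics` is a hypothetical helper, not part of the operator.

```python
def format_lyrics(sections):
    """Join (tag, lines) pairs into a single-line lyrics string.

    Assumption: lines are separated by a literal backslash followed by 'n',
    matching the default value of the Lyrics parameter.
    """
    parts = []
    for tag, lines in sections:
        parts.append(f"[{tag}]")  # structure tag such as [verse] or [chorus]
        parts.extend(lines)
    return "\\n".join(parts)


lyrics = format_lyrics([
    ("verse", ["Sun is shining bright today", "Feeling happy, come what may"]),
])
# Inside TouchDesigner: op('acestep').par.Lyrics = lyrics
```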
Duration (Duration) op('acestep').par.Duration Float

Desired duration of the generated audio in seconds.

Default:
10
Infersteps (Infersteps) op('acestep').par.Infersteps Int

Number of inference steps. Higher values can improve quality but take longer.

Default:
60
Manualseed (Manualseed) op('acestep').par.Manualseed Int

Seed for reproducibility. -1 for random. Affects initial generation.

Default:
-1
Schedulertype (Schedulertype) op('acestep').par.Schedulertype Menu

Scheduler type for diffusion process.

Default:
euler
Cfgtype (Cfgtype) op('acestep').par.Cfgtype Menu

Type of Classifier-Free Guidance.

Default:
apg
Guidancescale (Guidancescale) op('acestep').par.Guidancescale Float

Main classifier-free guidance scale. Used if CFG Type is not 'Double Condition'.

Default:
15
Omegascale (Omegascale) op('acestep').par.Omegascale Float

Omega scale factor for APG guidance type.

Default:
10
Guidancescaletext (Guidancescaletext) op('acestep').par.Guidancescaletext Float

Guidance scale for text prompt when CFG Type is 'Double Condition'.

Default:
7.5
Guidancescalelyric (Guidancescalelyric) op('acestep').par.Guidancescalelyric Float

Guidance scale for lyrics when CFG Type is 'Double Condition'.

Default:
7.5
Audio2Audio Mode [ Euler Scheduler Only ] Header
Audio2audioenable (Audio2audioenable) op('acestep').par.Audio2audioenable Toggle

Enable audio-to-audio generation. Uses Prompt & Lyrics as guidance if provided.

Default:
Off
Refaudioinput (Refaudioinput) op('acestep').par.Refaudioinput File

Path to the reference audio file for Audio2Audio mode.

Default:
"" (Empty String)
Refaudiostrength (Refaudiostrength) op('acestep').par.Refaudiostrength Float

Strength of the reference audio influence (0.0 to 1.0).

Default:
0.6
Output Settings Header
Outputfolder (Outputfolder) op('acestep').par.Outputfolder Folder

Folder to save the generated WAV file. Relative to project or absolute.

Default:
audio_out
Outputfilename (Outputfilename) op('acestep').par.Outputfilename Str

Name of the generated WAV file.

Default:
ace_step_output.wav
Uniquesuffix (Uniquesuffix) op('acestep').par.Uniquesuffix Toggle

When On, appends a timestamp to the filename to prevent overwriting.

Default:
On
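
To illustrate what a timestamp suffix does to the output filename, here is a small sketch. The timestamp format and the helper name `with_timestamp` are assumptions for illustration; the operator's actual suffix format is not documented here.

```python
from datetime import datetime
from pathlib import Path


def with_timestamp(filename, now=None, fmt="%Y%m%d_%H%M%S"):
    """Insert a timestamp between the file stem and its extension.

    The format string is an assumed example, not the operator's own.
    """
    now = now or datetime.now()
    path = Path(filename)
    return f"{path.stem}_{now.strftime(fmt)}{path.suffix}"
```

Because each generation gets a different name, earlier WAV files in the output folder are left in place rather than overwritten.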
Initialize (Initialize) op('acestep').par.Initialize Pulse

Initializes the ACE-Step Model. This parameter is read-only and handled internally.

Default:
None
Unloadmodel (Unloadmodel) op('acestep').par.Unloadmodel Pulse

Releases the model from memory via SideCar.

Default:
None
Loadsettings (Loadsettings) op('acestep').par.Loadsettings Pulse

Load generation parameters from the JSON associated with the Current Audio file.

Default:
None
Editaudio (Editaudio) op('acestep').par.Editaudio Toggle

Master toggle to enable audio editing modes on this page.

Default:
Off
Audio Editing Configuration Header
Editmode (Editmode) op('acestep').par.Editmode Menu

Select the audio manipulation task.

Default:
edit
Srcaudiopath (Srcaudiopath) op('acestep').par.Srcaudiopath File

Path to the source audio file for Edit, Repaint, Retake, Extend tasks.

Default:
"" (Empty String)
Extend / Repaint / Retake Header
Retakeseeds (Retakeseeds) op('acestep').par.Retakeseeds Int

Seed for retake/repaint/extend variations. -1 for random.

Default:
-1
Retakevariance (Retakevariance) op('acestep').par.Retakevariance Float

Amount of variance for retake/repaint (0.0 to 1.0).

Default:
0
Repaintstart (Repaintstart) op('acestep').par.Repaintstart Float

Start time in seconds for repaint. For extend, a negative value pads the audio on the left. Use 0 for retake.

Default:
0
Repaintend (Repaintend) op('acestep').par.Repaintend Float

End time in seconds for repaint. For extend, a value beyond the original duration extends the audio to the right. Use the original duration for retake.

Default:
5
Transitiontime (Transitiontime) op('acestep').par.Transitiontime Float

Duration of the transition/crossfade in seconds for repaint/extend modes. 0 for abrupt change.

Default:
0
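
The Repaint Start and Repaint End values mean different things in each mode, so it can help to compute them from the source duration. The sketch below follows the descriptions above. The mode labels `'retake'`, `'repaint'`, and `'extend'`, the helper name `repaint_window`, and the rule that a repaint range must lie inside the source audio are assumptions; check the Editmode menu for the operator's actual values.

```python
def repaint_window(mode, src_duration, start=0.0, end=None,
                   pad_left=0.0, pad_right=0.0):
    """Return (Repaintstart, Repaintend) in seconds for an editing mode."""
    if src_duration <= 0:
        raise ValueError("src_duration must be positive")
    if mode == "retake":
        # Retake covers the whole clip: 0 to the original duration.
        return 0.0, float(src_duration)
    if mode == "extend":
        # A negative start pads left; an end past the duration extends right.
        return -float(pad_left), float(src_duration + pad_right)
    if mode == "repaint":
        end = src_duration if end is None else end
        if not 0 <= start < end <= src_duration:
            raise ValueError("repaint range must lie inside the source audio")
        return float(start), float(end)
    raise ValueError(f"unknown mode: {mode!r}")
```

For a 30-second source, `repaint_window("extend", 30, pad_left=5, pad_right=10)` gives a start of -5 and an end of 40, which would add 5 seconds before and 10 seconds after the original audio.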
Edit Audio Content [ Slower ] Header
Editoriginalprompt (Editoriginalprompt) op('acestep').par.Editoriginalprompt Str

The original prompt used to generate the Source Audio. Required for 'Edit Audio Content' mode.

Default:
"" (Empty String)
Editoriginallyrics (Editoriginallyrics) op('acestep').par.Editoriginallyrics Str

The original lyrics used to generate the Source Audio. Required for 'Edit Audio Content' mode.

Default:
"" (Empty String)
Edittargetprompt (Edittargetprompt) op('acestep').par.Edittargetprompt Str

Target prompt for 'Edit Audio Content' mode. If empty, uses main prompt.

Default:
"" (Empty String)
Edittargetlyrics (Edittargetlyrics) op('acestep').par.Edittargetlyrics Str

Target lyrics for 'Edit Audio Content' mode. If empty, uses main lyrics.

Default:
"" (Empty String)
Editnmin (Editnmin) op('acestep').par.Editnmin Float

Min influence for audio editing (0.0 to 1.0).

Default:
0.65
Editnmax (Editnmax) op('acestep').par.Editnmax Float

Max influence for audio editing (0.0 to 1.0).

Default:
0.95
Editnavg (Editnavg) op('acestep').par.Editnavg Int

Averaging window size for editing.

Default:
10
Loadsrccredentials (Loadsrccredentials) op('acestep').par.Loadsrccredentials Pulse

Loads prompt and lyrics from the _input_params.json associated with the Src Audio Path.

Default:
None
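
To inspect the stored prompt and lyrics outside the operator, the JSON file can be read directly. This is a sketch under stated assumptions: that the file sits next to the audio file, is named `<audio stem>_input_params.json`, and contains `prompt` and `lyrics` keys. None of these details is confirmed by this page, and `load_source_params` is a hypothetical helper.

```python
import json
from pathlib import Path


def load_source_params(audio_path):
    """Read prompt and lyrics from the JSON saved alongside an audio file.

    Returns None when no matching JSON file exists. The naming scheme and
    key names are assumptions for illustration.
    """
    audio = Path(audio_path)
    json_path = audio.with_name(f"{audio.stem}_input_params.json")
    if not json_path.is_file():
        return None
    with json_path.open(encoding="utf-8") as handle:
        data = json.load(handle)
    return {"prompt": data.get("prompt", ""), "lyrics": data.get("lyrics", "")}
```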
Advanced Guidance Control Header
Guidanceinterval (Guidanceinterval) op('acestep').par.Guidanceinterval Float

Guidance interval for CFG.

Default:
0.98
Guidanceintervaldecay (Guidanceintervaldecay) op('acestep').par.Guidanceintervaldecay Float

Decay rate for guidance interval.

Default:
1
Minguidancescale (Minguidancescale) op('acestep').par.Minguidancescale Float

Minimum guidance scale.

Default:
1
ERG Control Header
Usergtag (Usergtag) op('acestep').par.Usergtag Toggle

Enable ERG (Entropy Rectifying Guidance) for prompt/tags.

Default:
Off
Userglyric (Userglyric) op('acestep').par.Userglyric Toggle

Enable ERG for lyrics.

Default:
Off
Usergdiffusion (Usergdiffusion) op('acestep').par.Usergdiffusion Toggle

Enable ERG for diffusion process.

Default:
Off
Other Advanced Parameters Header
Useoss (Useoss) op('acestep').par.Useoss Toggle

Enable Optimal Step Size scheduling. Only effective if Scheduler Type is Euler.

Default:
Off
Osssteps (Osssteps) op('acestep').par.Osssteps Str

Steps for OSS scheduling, comma-separated. Active if 'Use Optimal Step Size' is ON and Scheduler is Euler.

Default:
50,100,150,200
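
Since the OSS steps are entered as a comma-separated string, a quick check that the text parses cleanly can catch typos before a generation is requested. A minimal sketch follows; `parse_oss_steps` is a hypothetical helper, and the requirement that steps be positive is an assumption.

```python
def parse_oss_steps(text):
    """Convert a comma-separated string such as '50,100,150,200' to ints."""
    # Skip empty items so a trailing comma does not raise an error.
    steps = [int(item) for item in text.split(",") if item.strip()]
    if any(step <= 0 for step in steps):
        raise ValueError("steps must be positive integers")
    return steps
```

A non-numeric entry raises `ValueError` from `int()`, which points at the malformed value.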
Device & Precision Header
Deviceid (Deviceid) op('acestep').par.Deviceid Int

GPU device ID to use (e.g., 0, 1). Requires re-initialization.

Default:
0
Usebf16 (Usebf16) op('acestep').par.Usebf16 Toggle

Use bfloat16 for faster inference (if supported). Turn off on macOS or if errors occur. Requires re-initialization.

Default:
On
Torchcompile (Torchcompile) op('acestep').par.Torchcompile Toggle

Optimize the model with torch.compile() for faster inference (not supported on Windows by ACE-Step). Requires re-initialization.

Default:
Off
Model Configuration Header
Modelpath (Modelpath) op('acestep').par.Modelpath Folder

ACE-Step Repository Path. This parameter is read-only and automatically set.

Default:
"" (Empty String)
Checkpointdir (Checkpointdir) op('acestep').par.Checkpointdir Folder

Optional directory for model checkpoints.

Default:
"" (Empty String)
Bypass (Bypass) op('acestep').par.Bypass Toggle

Bypass the operator's functionality.

Default:
Off
Showbuiltin (Showbuiltin) op('acestep').par.Showbuiltin Toggle

Show built-in TouchDesigner parameters.

Default:
Off
Version (Version) op('acestep').par.Version Str

Version of the operator.

Default:
None
Lastupdated (Lastupdated) op('acestep').par.Lastupdated Str

Date of the last update.

Default:
None
Creator (Creator) op('acestep').par.Creator Str

Creator of the operator.

Default:
None
Website (Website) op('acestep').par.Website Str

Website for more information.

Default:
None
Chattd (Chattd) op('acestep').par.Chattd OP

Reference to the ChatTD operator.

Default:
None
Sidecaroperator (Sidecaroperator) op('acestep').par.Sidecaroperator OP

Reference to the SideCar operator handling requests.

Default:
None
  1. Set up the SideCar: Ensure the SideCar is running and its Python environment is fully configured with all ACE-Step dependencies.
  2. Press Generate: In the ACE-Step operator’s parameter panel, click the Generate pulse.
  3. Clone the Repo: If this is your first time, a dialog will ask for permission to download the ACE-Step repository. Click Download.
  4. Generate: The request will be sent to the SideCar for processing. The generated audio will appear in the visualizer and can be automatically played.
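
The steps above can also be driven from a script once the SideCar is running. The sketch below sets a few of the documented parameters and pulses Generate. It assumes the operator is named `acestep`, as in the parameter expressions on this page; `request_generation` is a hypothetical helper, not part of the operator.

```python
def request_generation(acestep, prompt, lyrics, duration=10.0, seed=-1):
    """Set the core parameters on an ACE-Step operator and pulse Generate.

    `acestep` is the operator object, e.g. op('acestep') in TouchDesigner.
    """
    acestep.par.Prompt = prompt
    acestep.par.Lyrics = lyrics
    acestep.par.Duration = duration
    acestep.par.Manualseed = seed  # -1 requests a random seed
    acestep.par.Generate.pulse()


# Inside TouchDesigner (for example from a Text DAT):
# request_generation(op('acestep'), 'upbeat pop, catchy melody',
#                    '[verse]\\nSun is shining bright today', duration=20)
```

The call returns immediately; the request is processed by the SideCar, so the Status and Active parameters are the place to watch for progress.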

The ACE-Step operator is designed to integrate seamlessly with the SideCar operator for offloading heavy computation. It also connects with the ChatTD operator for managing Python environments and asynchronous operations.

  • Dependency Management: Ensure your SideCar’s Python environment has all necessary ACE-Step dependencies installed. The operator does not manage these.
  • Git Installation: Have Git installed and in your system’s PATH for automatic repository cloning.
  • Responsible Use: Be mindful of the ACE-Step model’s disclaimer regarding potential copyright infringement, cultural sensitivity, and harmful content generation. Verify originality and disclose AI involvement.
  • SideCar Not Connected: If generation fails, ensure the SideCar server is running and connected. Check the SideCar Operator parameter on the About page to confirm it’s referencing the correct SideCar instance.
  • Missing Dependencies: If you encounter errors related to missing Python packages (e.g., torch, librosa), install them manually in your SideCar’s Python environment.
  • Repository Cloning Issues: If the repository fails to clone, check your internet connection and Git installation. Review the TouchDesigner console for detailed error messages.

The ACE-Step model is a significant contribution to the field of AI music generation. If you use this operator or the underlying model in your research, please consider citing the original work.

Research & Licensing

ACE-Step Project

The ACE-Step project is an open-source initiative focused on advancing AI music generation.

ACE-Step: A Step Towards Music Generation Foundation Model

ACE-Step is a foundation model for music generation designed to overcome limitations of existing approaches by integrating diffusion-based generation with advanced encoding and transformation techniques.

Technical Details

  • Combines diffusion with DCAE and linear transformer.
  • Uses MERT and m-hubert for semantic alignment (REPA).
  • Outperforms LLM-based models in speed and coherence.
  • Supports various music generation tasks including text-to-music and audio-to-audio.

Research Impact

  • Overcomes limitations of existing approaches in music generation.
  • Provides a holistic architectural design for state-of-the-art performance.
  • Enables original music generation across diverse genres for creative production, education, and entertainment.

Citation

@misc{gong2025acestep,
  title={ACE-Step: A Step Towards Music Generation Foundation Model},
  author={Junmin Gong and Wenxiao Zhao and Sen Wang and Shengyuan Xu and Jing Guo},
  howpublished={\url{https://github.com/ace-step/ACE-Step}},
  year={2025},
  note={GitHub repository}
}

Key Research Contributions

  • Novel open-source foundation model for music generation.
  • Integrates diffusion-based generation with Sana's Deep Compression AutoEncoder (DCAE) and a lightweight linear transformer.
  • Leverages MERT and m-hubert to align semantic representations (REPA) during training for rapid convergence.
  • Achieves faster synthesis (up to 4 minutes of music in 20 seconds on A100 GPU) and superior musical coherence compared to LLM-based models.
  • Preserves fine-grained acoustic details, enabling advanced control mechanisms like voice cloning, lyric editing, remixing, and track generation.

License

Apache License 2.0 - This model is freely available for research and commercial use.