AudioMorphix: Training-free audio editing using diffusion models







[arXiv paper]  [GitHub repo]  [HF space]

AudioMorphix

As a training-free audio editor, the proposed AudioMorphix offers the following features: (1) Tuning-free: AudioMorphix is a zero-shot editing method that requires no extra training to fit task-specific data; (2) Audio-referenced: instead of a text instruction, which can be ambiguous in some use cases, AudioMorphix takes an additional audio clip as the editing reference; (3) Versatile: AudioMorphix is a universal framework capable of diverse editing tasks, including addition, removal, replacement, and style transfer; (4) Region-specific: AudioMorphix can edit a particular region of the audio spectrogram while keeping the rest unchanged.
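To make the region-specific idea in feature (4) concrete, here is a minimal sketch in plain Python. It is not the AudioMorphix implementation: this page does not describe how the region constraint is enforced, so the hard rectangular mask, the `blend_region` helper, and the toy spectrogram values are all assumptions made for illustration only.

```python
from typing import List

# Assumed layout: rows are frequency bins, columns are time frames.
Spectrogram = List[List[float]]


def blend_region(original: Spectrogram, edited: Spectrogram,
                 freq_range: range, time_range: range) -> Spectrogram:
    """Use `edited` values inside the rectangular region and `original` elsewhere.

    Hypothetical helper: a hard mask is only one simple way to express
    "edit this region, keep the rest unchanged".
    """
    if len(original) != len(edited) or any(
        len(o) != len(e) for o, e in zip(original, edited)
    ):
        raise ValueError("spectrograms must have the same shape")
    return [
        [
            edited[f][t] if (f in freq_range and t in time_range) else original[f][t]
            for t in range(len(original[f]))
        ]
        for f in range(len(original))
    ]


# Toy 3x4 spectrograms: zeros for the input, ones for a fully edited version.
original = [[0.0] * 4 for _ in range(3)]
edited = [[1.0] * 4 for _ in range(3)]

# Edit only frequency bins 0-1 and time frames 1-2.
result = blend_region(original, edited,
                      freq_range=range(0, 2), time_range=range(1, 3))
```

Every bin outside the selected rectangle is returned unchanged, which is the property the feature list describes. A practical editor would more likely use a soft mask, or apply the constraint inside the generative model rather than on the final spectrogram.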

Overview of the proposed AudioMorphix.



Demos on the addition task

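| Text instruction | Input audio | Ground truth | AudioMorphix | DDIM inversion | DDPM inversion | AUDIT |
| --- | --- | --- | --- | --- | --- | --- |
| "add music" | [audio] | [audio] | [audio] | [audio] | [audio] | [audio] |
| "add the sound of sine wave" | [audio] | [audio] | [audio] | [audio] | [audio] | [audio] |
| "add scissors sound" | [audio] | [audio] | [audio] | [audio] | [audio] | [audio] |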


Demos on the removal task

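| Text instruction | Input audio | Ground truth | AudioMorphix | DDIM inversion | DDPM inversion | AUDIT |
| --- | --- | --- | --- | --- | --- | --- |
| "remove bird song" | [audio] | [audio] | [audio] | [audio] | [audio] | [audio] |
| "remove laughter" | [audio] | [audio] | [audio] | [audio] | [audio] | [audio] |
| "remove growling sound" | [audio] | [audio] | [audio] | [audio] | [audio] | [audio] |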


Demos on the replacement task

| Text instruction | Input audio | Ground truth | AudioMorphix | DDIM inversion | DDPM inversion | AUDIT |
| --- | --- | --- | --- | --- | --- | --- |
| "replace stomach rumble with speech synthesizer" | [audio] | [audio] | [audio] | [audio] | [audio] | [audio] |
| "replace heartbeat sound with slap sound" | [audio] | [audio] | [audio] | [audio] | [audio] | [audio] |
| "replace heartbeat sound with slap sound" | [audio] | [audio] | [audio] | [audio] | [audio] | [audio] |


Demos on the time shift & stretching tasks

| Text instruction | Source region | Target region | Input audio | AudioMorphix |
| --- | --- | --- | --- | --- |
| "Add a 4-second delay to the thunder sound." | [media] | [media] | [audio] | [audio] |
| "Stretch the acoustic guitar sound by 2.0 times and add a 0.2-second delay to the output." | [media] | [media] | [audio] | [audio] |
| "Stretch the automobile horn sound by 2.5 times." | [media] | [media] | [audio] | [audio] |
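The Source region and Target region columns suggest that each time edit maps one time span to another. The arithmetic behind instructions such as "stretch by 2.0 times and add a 0.2-second delay" can be sketched as follows. The `target_region` helper, the choice to anchor the stretch at the start of the region, and the example source spans are all hypothetical; the page does not give region coordinates or say how they are derived.

```python
from typing import Tuple

# A time span as (start_seconds, end_seconds).
Region = Tuple[float, float]


def target_region(source: Region, stretch: float = 1.0, delay: float = 0.0) -> Region:
    """Map a source time span to a target span.

    Assumption: the delay shifts the start, and the stretch scales the
    duration while keeping the (delayed) start fixed.
    """
    start, end = source
    if end < start:
        raise ValueError("region end must not precede its start")
    if stretch <= 0:
        raise ValueError("stretch factor must be positive")
    new_start = start + delay
    return (new_start, new_start + (end - start) * stretch)


# Hypothetical source spans, chosen only to illustrate the three instructions.
delayed = target_region((1.0, 3.0), delay=4.0)                  # 4-second delay
stretched_delayed = target_region((0.0, 2.0), stretch=2.0, delay=0.2)
stretched = target_region((1.0, 2.0), stretch=2.5)              # 2.5x stretch
```

Under these assumptions a pure delay keeps the duration unchanged, and a pure stretch keeps the start time unchanged. Other conventions, such as stretching around the centre of the region, are equally plausible.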


Demos on the pitch shift task

| Text instruction | Source region | Target region | Input audio | AudioMorphix |
| --- | --- | --- | --- | --- |
| "Increase the pitch of the piano sound." | [media] | [media] | [audio] | [audio] |
| "Lower the pitch of the high-pitched sound." | [media] | [media] | [audio] | [audio] |
| "Increase the pitch of the woman’s speech." | [media] | [media] | [audio] | [audio] |

Page updated on 1 Aug 2025.