import numpy, scipy, matplotlib.pyplot as plt, sklearn, librosa, librosa.display, stanford_mir, IPython.display
%matplotlib inline
plt.rcParams['figure.figsize'] = (14, 5)
Load two audio files: one harmonic and one percussive.
yh, fs = librosa.load('prelude_cmaj_10s.wav')
yp, fs = librosa.load('125_bounce.wav')
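By default, librosa.load resamples audio to 22050 Hz and converts it to mono. If you would rather keep a file's native sampling rate, pass sr=None (shown here on the first file as an illustration; the variable names are arbitrary):
yh_native, fs_native = librosa.load('prelude_cmaj_10s.wav', sr=None)
fs_native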
Add the two signals together, and rescale:
N = min(len(yh), len(yp))              # truncate to the shorter of the two signals
x = yh[:N]/yh.max() + yp[:N]/yp.max()  # peak-normalize each signal before mixing
x = 0.5 * x/x.max()                    # rescale the mix to a peak of 0.5 to avoid clipping
x.max()
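If you want to keep the mix around for later, one option is to write it to disk with the soundfile package (a sketch; assumes soundfile is installed, and the output filename is arbitrary):
import soundfile
soundfile.write('harmonic_plus_percussive.wav', x, fs)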
Listen to the combined audio signal:
IPython.display.Audio(x, rate=fs)
Compute the STFT:
X = librosa.stft(x)
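librosa.stft defaults to a frame size of n_fft=2048 samples with a hop of a quarter frame (512 samples). If you want a different trade-off between time and frequency resolution, you can set these explicitly (the variable name X_alt is just illustrative):
X_alt = librosa.stft(x, n_fft=1024, hop_length=256)
X_alt.shape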
Take the log-amplitude for display purposes:
Xmag = librosa.amplitude_to_db(numpy.abs(X))
Display the log-magnitude spectrogram:
librosa.display.specshow(Xmag, sr=fs, x_axis='time', y_axis='log')
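To label the color scale in decibels, you can follow specshow with a matplotlib colorbar:
librosa.display.specshow(Xmag, sr=fs, x_axis='time', y_axis='log')
plt.colorbar(format='%+2.0f dB')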
Perform harmonic-percussive source separation:
H, P = librosa.decompose.hpss(X)
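By default, hpss separates with complementary soft masks. For a more aggressive separation, librosa.decompose.hpss also accepts a margin parameter; margins greater than 1 discard energy that is neither clearly harmonic nor clearly percussive (variable names here are illustrative):
H_strict, P_strict = librosa.decompose.hpss(X, margin=3.0)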
Compute the log-amplitudes of the outputs:
Hmag = librosa.amplitude_to_db(numpy.abs(H))
Pmag = librosa.amplitude_to_db(numpy.abs(P))
Display each output:
librosa.display.specshow(Hmag, sr=fs, x_axis='time', y_axis='log')
librosa.display.specshow(Pmag, sr=fs, x_axis='time', y_axis='log')
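To compare both outputs in one figure, you can place them in matplotlib subplots (one possible layout):
plt.subplot(2, 1, 1)
librosa.display.specshow(Hmag, sr=fs, y_axis='log')
plt.title('Harmonic')
plt.subplot(2, 1, 2)
librosa.display.specshow(Pmag, sr=fs, x_axis='time', y_axis='log')
plt.title('Percussive')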
Transform the harmonic output back to the time domain:
h = librosa.istft(H)
Listen to the harmonic output:
IPython.display.Audio(h, rate=fs)
Transform the percussive output back to the time domain:
p = librosa.istft(P)
Listen to the percussive output:
IPython.display.Audio(p, rate=fs)
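With the default margin, the two soft masks sum to one, so H + P equals X, and the two time-domain outputs should reconstruct the original mix up to STFT round-trip error. A quick sanity check, using istft's length parameter to match the original signal length (new variable names here, to avoid clobbering h and p above):
h_full = librosa.istft(H, length=len(x))
p_full = librosa.istft(P, length=len(x))
numpy.max(numpy.abs(x - (h_full + p_full)))  # should be close to zero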