%matplotlib inline
import seaborn
import numpy, scipy, sklearn, sklearn.cluster, matplotlib.pyplot as plt, IPython.display as ipd, librosa, urllib.request
After you have performed signal decomposition (e.g. using librosa.decompose.decompose), try to classify and combine the separated signals.
Download an audio file containing bass drum, snare drum, and hi-hat:
filename = 'Classic Rock Beat 06.wav'
urllib.request.urlretrieve(
    'http://audio.musicinformationretrieval.com/Jam Pack 1/' + filename,
    filename=filename
)
Load the audio file into an array:
x, fs = librosa.load(filename)
Listen to the audio:
ipd.Audio(x, rate=fs)
Compute the spectrogram:
X = librosa.stft(x)
Save the magnitude and phase of the spectrogram for later use:
Xabs = numpy.absolute(X)
Xphase = numpy.angle(X)
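As an optional sanity check, the magnitude and phase should recombine exactly (up to floating-point error) into the original complex spectrogram:

# Magnitude times the unit-phase complex exponential recovers the complex STFT.
assert numpy.allclose(Xabs*numpy.exp(1j*Xphase), X)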
Perform nonnegative matrix factorization on the signal:
n_components = 30
W, H = librosa.decompose.decompose(Xabs, n_components=n_components, sort=True)
print(W.shape)
print(H.shape)
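To get a feel for what NMF has learned, it can help to visualize the spectral profiles in W and the temporal activations in H. A minimal sketch, assuming a recent version of librosa (where `librosa.amplitude_to_db` and `librosa.display` are available):

import librosa.display

plt.figure(figsize=(13, 4))

plt.subplot(1, 2, 1)
librosa.display.specshow(librosa.amplitude_to_db(W), y_axis='log')  # one spectral profile per column
plt.title('W: spectral profiles')

plt.subplot(1, 2, 2)
librosa.display.specshow(H, x_axis='time')  # one activation curve per row
plt.title('H: temporal activations')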
Let's create a function that reconstructs a time-domain signal from the NMF outputs:
def reconstruct_signal(w, h, Xphase, length=None):
    # The outer product of one spectral profile and its activation is a rank-1
    # magnitude spectrogram; reattach the original phase and invert the STFT.
    Y = numpy.outer(w, h)*numpy.exp(1j*Xphase)
    y = librosa.istft(Y)
    if length:
        y = librosa.util.fix_length(y, size=length)
    return y
Reconstruct each signal component:
y_signals = numpy.array([reconstruct_signal(W[:,n], H[n], Xphase, length=len(x)) for n in range(n_components)])
Listen to one of the signal components:
ipd.Audio(y_signals[29], rate=fs)
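Summing all of the reconstructed components should give something close to the original mixture; listening to the sum is a quick sanity check on the factorization (the match is only approximate, since NMF approximates the magnitude spectrogram):

# The sum of all components should sound close to the original signal x.
ipd.Audio(y_signals.sum(axis=0), rate=fs)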
Compute features for each signal component.
def normalize(h):
    # Scale a feature vector to unit Euclidean norm.
    return numpy.array(h)/numpy.linalg.norm(h)
window_sz = 20
hop_sz = 10
# Summarize each activation row by its mean over sliding windows,
# i.e. a coarsely downsampled version of H[n].
features = numpy.array([
    [H[n, i:i+window_sz].mean() for i in range(0, H.shape[1]-window_sz, hop_sz)]
    for n in range(n_components)
])
features.shape
plt.plot(H[0])
plt.plot(features[0])
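Optionally, you can scale each feature vector to unit norm with the `normalize` helper defined above, so that clustering is driven by the shape of each activation rather than its overall energy. This is a hedged variation; the clustering below uses the raw `features`:

# Optional: unit-norm each component's feature vector before clustering.
features_normalized = numpy.array([normalize(f) for f in features])

Next, cluster the feature vectors into three groups with k-means: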
model = sklearn.cluster.KMeans(n_clusters=3)
labels = model.fit_predict(features)
print(labels)
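The k-means label indices (0, 1, 2) are arbitrary and may change between runs, so listen to each cluster to decide which index corresponds to which drum. A rough heuristic, as a hedged sketch: the bass drum cluster should have the lowest average spectral centroid and the hi-hat cluster the highest.

# Compare average spectral centroids of the summed cluster signals.
for k in range(3):
    cluster_sum = y_signals[labels==k, :].sum(axis=0)
    centroid = librosa.feature.spectral_centroid(y=cluster_sum, sr=fs).mean()
    print('cluster', k, 'mean spectral centroid:', centroid)

The cluster indices used below reflect one particular run; swap them to match what you hear.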
bass_drum = y_signals[labels==0, :].sum(axis=0)
ipd.Audio(bass_drum, rate=fs)
snare_drum = y_signals[labels==2, :].sum(axis=0)
ipd.Audio(snare_drum, rate=fs)
hi_hat = y_signals[labels==1, :].sum(axis=0)
ipd.Audio(hi_hat, rate=fs)
Recall the K-NN classifier built in the earlier lab; `scaledTrainingFeatures`, `scaledTestingFeatures`, and `train_index` are the variables defined there.

import numpy as np
import sklearn.neighbors

# Labels for the 20 training samples:
labels = np.empty(20, np.int32)
labels[0:10] = 1   # First 10 are the first sample type, e.g. snare
labels[10:20] = 2  # Second 10 are the second sample type, e.g. kick

model_snare = sklearn.neighbors.KNeighborsClassifier(n_neighbors=1)
model_snare.fit(scaledTrainingFeatures, labels.take(train_index, 0))
model_output = model_snare.predict(scaledTestingFeatures)
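For reference, a minimal sketch of how the scaled feature matrices might have been produced with `sklearn.preprocessing.MinMaxScaler`; `trainingFeatures` and `testingFeatures` are hypothetical names for the raw feature matrices from the earlier lab:

import sklearn.preprocessing

# Hypothetical: trainingFeatures and testingFeatures are (n_samples, n_features)
# arrays from the earlier lab. Fit the scaler on the training data only and
# reuse it to transform the test data.
scaler = sklearn.preprocessing.MinMaxScaler(feature_range=(-1, 1))
scaledTrainingFeatures = scaler.fit_transform(trainingFeatures)
scaledTestingFeatures = scaler.transform(testingFeatures)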
Extract features from the drum signals that you separated in Lab 4 Section 1.
Classify them using the K-NN model that you built.
Does K-NN accurately classify the separated signals?
Repeat for different numbers of separated signals (i.e., the parameter K in NMF, `n_components` in the code above).
Overseparate the signal using K = 20 or more. For the separated components that are classified as snare, add them together using `sum`. Then listen to the summed signal (see the sketch below). Is it coherent, i.e., does it sound like a single separated drum?
...and more!
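A minimal sketch of the snare-summing step, assuming the K-NN predictions for the separated components are stored in `model_output` (one prediction per component, in the same order as `y_signals`) and that the snare class is labeled 1:

# Hypothetical: sum every separated component that K-NN labeled as snare (class 1).
snare_sum = y_signals[model_output == 1, :].sum(axis=0)
ipd.Audio(snare_sum, rate=fs)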
Good luck!