In [1]:
import numpy, scipy, matplotlib.pyplot as plt, pandas, librosa

Segmentation

In audio processing, it is common to operate on one frame at a time, using a constant frame size and a constant hop size (i.e., the increment between the start of one frame and the start of the next). Frames are typically 10 to 100 ms long.
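
To make the 10 to 100 ms range concrete, the sketch below converts durations to sample counts at 44.1 kHz and counts how many full frames fit in a signal. The helper names are hypothetical, and the counting rule assumes the first frame starts at sample 0 with no zero padding; some frame generators pad instead and produce a few more frames.

```python
def ms_to_samples(ms, fs):
    """Convert a duration in milliseconds to a whole number of samples."""
    return int(round(ms * fs / 1000.0))

def num_full_frames(n_samples, frame_size, hop_size):
    """Count frames lying entirely inside the signal, first frame at sample 0 (assumed, no padding)."""
    if n_samples < frame_size:
        return 0
    return 1 + (n_samples - frame_size) // hop_size

fs = 44100
print(ms_to_samples(10, fs), ms_to_samples(100, fs))  # 441 4410
print(round(1024 / fs * 1000, 1))                     # 23.2 (ms per 1024-sample frame)
print(num_full_frames(3 * fs, 1024, 512))             # 257
```

A 1024-sample frame at 44.1 kHz therefore sits comfortably inside the typical range.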

Let's create a sine sweep whose frequency rises exponentially from 110 Hz to 880 Hz. Then we will segment the signal and compute the zero crossing rate of each frame.

First, set the parameters:

In [2]:
T = 3.0      # duration in seconds
fs = 44100.0 # sampling rate in Hertz
f0 = 440*numpy.logspace(-2, 1, int(T*fs), endpoint=False, base=2.0) # time-varying frequency (num must be an int)
print(f0.min(), f0.max()) # starts at 110 Hz, ends just below 880 Hz
110.0 879.9861686

Create the signal:

In [3]:
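import essentia
# Integrate the time-varying frequency to get the phase. sin(2*pi*f0*t) would overshoot 880 Hz,
# because its instantaneous frequency is d(f0*t)/dt = f0 + t*df0/dt, not f0.
phase = 2*numpy.pi*numpy.cumsum(f0)/fs
x = essentia.array(0.01*numpy.sin(phase))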
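
A quick way to check that a sweep really covers 110 to 880 Hz is to estimate its instantaneous frequency, the phase derivative divided by 2π. The sketch below, using only NumPy and a hypothetical helper `inst_freq`, compares the naive phase 2π·f0·t with the running integral of f0 (a cumulative sum divided by fs).

```python
import numpy

T, fs = 3.0, 44100.0
n = int(T * fs)
t = numpy.arange(n) / fs
f0 = 440 * numpy.logspace(-2, 1, n, endpoint=False, base=2.0)  # equals 110 * 2**t here

def inst_freq(phase, fs):
    """Hypothetical helper: instantaneous frequency in Hz from a sampled phase (radians)."""
    return numpy.diff(phase) * fs / (2 * numpy.pi)

naive_phase = 2 * numpy.pi * f0 * t                      # phase of sin(2*pi*f0*t)
integrated_phase = 2 * numpy.pi * numpy.cumsum(f0) / fs  # running integral of f0

f_naive = inst_freq(naive_phase, fs)
f_int = inst_freq(integrated_phase, fs)
print(f_naive[-1])  # far above 880 Hz
print(f_int[-1])    # just below 880 Hz
```

The integrated phase tracks f0 sample by sample, while the naive phase ends near 2.7 kHz for this 3 s sweep.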

Listen to the signal:

In [4]:
from IPython.display import Audio
Audio(x, rate=fs)

Segmentation Using Python List Comprehensions

In Python, a standard list comprehension is enough to segment a signal:

In [5]:
from essentia.standard import ZeroCrossingRate
zcr = ZeroCrossingRate()
frame_sz = 1024
hop_sz = 512
# slice only full-length frames, so every frame has frame_sz samples
plt.semilogy([zcr(x[i:i+frame_sz]) for i in range(0, len(x) - frame_sz + 1, hop_sz)])
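
The same segmentation can be written in pure NumPy, which helps when checking what a frame-based feature is doing. The sketch below is an assumption-laden stand-in, not Essentia's implementation: `frames_no_padding` and `zero_crossing_rate` are hypothetical helpers, framing keeps only full-length frames starting at sample 0, and the rate is sign changes divided by frame length (one common definition; Essentia's `threshold` handling of near-zero samples is not reproduced). It needs NumPy 1.20 or newer for `sliding_window_view`.

```python
import numpy
from numpy.lib.stride_tricks import sliding_window_view

def frames_no_padding(x, frame_size, hop_size):
    """Return a read-only (num_frames, frame_size) view of full frames, first one at sample 0."""
    return sliding_window_view(x, frame_size)[::hop_size]

def zero_crossing_rate(frame):
    """Fraction of samples at which the sign changes (sign changes / frame length)."""
    signs = numpy.signbit(frame)
    return numpy.count_nonzero(signs[1:] != signs[:-1]) / len(frame)

# Usage: a 1 s, 100 Hz sine crosses zero about 200 times per second.
fs = 44100
tone = numpy.sin(2 * numpy.pi * 100 * numpy.arange(fs) / fs)
frames = frames_no_padding(tone, 1024, 512)
rates = numpy.array([zero_crossing_rate(f) for f in frames])
print(frames.shape)  # (85, 1024)
```

Each rate here is close to 200 / 44100 per sample; for the rising sweep the rate climbs with frequency, which is what the semilog plot shows.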

essentia.standard.FrameGenerator

We can also use essentia.standard.FrameGenerator to segment our audio signal.

For each frame, compute the zero crossing rate and plot the result:

In [6]:
from essentia.standard import FrameGenerator
plt.semilogy([zcr(frame) for frame in FrameGenerator(x, frameSize=frame_sz, hopSize=hop_sz)])
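
Unlike the list comprehension, FrameGenerator accepts a `startFromZero` flag that, as we read Essentia's FrameCutter documentation, defaults to False: the first frame is then centered on sample 0, with zeros before the signal. The NumPy sketch below emulates that convention with a hypothetical helper `centered_frames`. Its stopping rule (emit frames while the frame start lies inside the signal) is our assumption and may not match Essentia in every configuration. For frameSize=1024 and hopSize=500 on the 3 s sweep (132300 samples), this rule gives 266 frames.

```python
import numpy

def centered_frames(x, frame_size, hop_size):
    """Frames centered at 0, hop, 2*hop, ...; samples outside x are treated as zeros.

    Assumed stopping rule: keep emitting frames while the frame start is < len(x).
    """
    half = frame_size // 2
    starts = numpy.arange(-half, len(x), hop_size)  # frame start indices, may be negative
    # Pad both sides so every frame can be sliced without bounds checks.
    padded = numpy.concatenate([numpy.zeros(half), x, numpy.zeros(frame_size)])
    return numpy.stack([padded[s + half : s + half + frame_size] for s in starts])

fr = centered_frames(numpy.ones(132300), 1024, 500)
print(fr.shape)  # (266, 1024)
```

The first frame is half zeros, which is why centered framing yields a few more frames than slicing full frames from sample 0.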

Example: Spectrogram

Let's create a spectrogram. For each frame of the signal, we apply a Hamming window and then compute the magnitude spectrum.

In [7]:
from essentia.standard import Spectrum, Windowing, FrameGenerator
hamming_window = Windowing(type='hamming')
spectrum = Spectrum()  # we just want the magnitude spectrum

spectrogram = numpy.array([spectrum(hamming_window(frame))
                           for frame in FrameGenerator(x, frameSize=1024, hopSize=500)])
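
For comparison, the windowing and magnitude-spectrum steps can be sketched in NumPy alone, using a hypothetical helper `magnitude_spectrogram`. This is not Essentia's code: `numpy.hamming` is the symmetric Hamming window, and Essentia's Windowing may normalize or shape its window differently, so absolute magnitudes need not match exactly.

```python
import numpy

def magnitude_spectrogram(frames):
    """Hamming-window each row and return |rfft| per row, shape (num_frames, N//2 + 1)."""
    window = numpy.hamming(frames.shape[1])
    return numpy.abs(numpy.fft.rfft(frames * window, axis=1))

# Usage: eight back-to-back 1024-sample frames of a 1 kHz tone.
fs = 44100.0
tone = numpy.sin(2 * numpy.pi * 1000.0 * numpy.arange(8 * 1024) / fs)
S = magnitude_spectrogram(tone.reshape(8, 1024))
print(S.shape)           # (8, 513)
print(S.argmax(axis=1))  # peak bin near 1000 * 1024 / 44100, i.e. about 23.2, in every frame
```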

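This spectrogram has 266 frames, each containing 513 frequency bins: a real-valued frame of N = 1024 samples yields N/2 + 1 = 513 non-negative frequency bins, from 0 Hz up to the Nyquist frequency fs/2.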

In [8]:
print(spectrogram.shape)
(266, 513)
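
Bin and frame indices map to physical units with simple arithmetic: bin k is centered at k·fs/N Hz, and frame k advances by hop/fs seconds. The sketch below computes both axes for this spectrogram; treating frame k as centered at k·hop/fs assumes zero-centered framing. `numpy.fft.rfftfreq(frame_size, 1/fs)` gives the same bin frequencies up to rounding.

```python
import numpy

fs = 44100.0
frame_size, hop_size, n_frames = 1024, 500, 266

# Center frequency (Hz) of each bin: k * fs / frame_size, for k = 0 .. frame_size // 2.
bin_freqs = numpy.arange(frame_size // 2 + 1) * fs / frame_size
# Assumed frame times (s): frame k centered at k * hop_size / fs.
frame_times = numpy.arange(n_frames) * hop_size / fs

bin_width = bin_freqs[1]
print(len(bin_freqs), bin_width, bin_freqs[-1])  # 513 43.06640625 22050.0
# A 110 to 880 Hz sweep occupies only about bins 2.6 to 20.4 of 513.
print(110 / bin_width, 880 / bin_width)
```

This is why the sweep shows up as a thin curve near the bottom of a linear-frequency plot.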

Finally, plot the spectrogram. We transpose the array so that time runs along the horizontal axis and frequency along the vertical axis.

In [9]:
plt.imshow(spectrogram.T, origin='lower', aspect='auto', interpolation='nearest')
plt.ylabel('Spectral Bin Index')
plt.xlabel('Frame Index')
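
A variant of this plot with physical axes and a decibel scale is often easier to read. The helper `plot_spectrogram_db` below is hypothetical; its axis extent assumes the bins span 0 to fs/2 and that frames advance by hop_size/fs seconds, so the edges are approximate to within half a bin or half a hop.

```python
import numpy
import matplotlib.pyplot as plt

def plot_spectrogram_db(spectrogram, fs, hop_size, dynamic_range_db=100.0):
    """Plot a (frames x bins) magnitude spectrogram in dB with time and frequency axes."""
    # Convert magnitude to dB; the tiny floor avoids log10(0).
    db = 20 * numpy.log10(numpy.maximum(spectrogram, 1e-12))
    # Clip everything more than dynamic_range_db below the peak to keep contrast.
    db = numpy.maximum(db, db.max() - dynamic_range_db)
    n_frames = spectrogram.shape[0]
    # Assumed axes: frames advance by hop_size/fs seconds; bins span 0..fs/2 Hz.
    extent = [0, n_frames * hop_size / fs, 0, fs / 2]
    fig, ax = plt.subplots()
    im = ax.imshow(db.T, origin="lower", aspect="auto",
                   interpolation="nearest", extent=extent)
    ax.set_xlabel("Time (s)")
    ax.set_ylabel("Frequency (Hz)")
    fig.colorbar(im, ax=ax, label="Magnitude (dB)")
    return fig, db
```

In this notebook it would be called as `plot_spectrogram_db(spectrogram, fs, 500)`, matching the hop size used above.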

(There are easier ways to display a spectrogram, e.g., matplotlib's `plt.specgram` or librosa's `librosa.display.specshow`. This example only illustrates segmentation in Essentia.)