In [1]:
%matplotlib inline
import seaborn
import numpy, scipy, matplotlib.pyplot as plt, IPython.display as ipd
import librosa, librosa.display
plt.rcParams['figure.figsize'] = (15, 5)

Magnitude Scaling¶

Often, the raw amplitude of a signal in the time- or frequency-domain is not as perceptually relevant to humans as the amplitude converted into other units, e.g. using a logarithmic scale.

For example, let's consider a pure tone whose amplitude grows louder linearly. Define the time variable:

In [2]:
T = 4.0      # duration in seconds
sr = 22050   # sampling rate in Hertz
t = numpy.linspace(0, T, int(T*sr), endpoint=False)

Create a signal whose amplitude grows linearly:

In [3]:
amplitude = numpy.linspace(0, 1, int(T*sr), endpoint=False) # time-varying amplitude
x = amplitude*numpy.sin(2*numpy.pi*440*t)

Listen:

In [4]:
ipd.Audio(x, rate=sr)
Out[4]:

Plot the signal:

In [5]:
librosa.display.waveplot(x, sr=sr)
Out[5]:
<matplotlib.collections.PolyCollection at 0x11ccff690>

Now consider a signal whose amplitude grows exponentially, i.e. the logarithm of the amplitude is linear:

In [6]:
amplitude = numpy.logspace(-2, 0, int(T*sr), endpoint=False, base=10.0)
x = amplitude*numpy.sin(2*numpy.pi*440*t)
In [7]:
ipd.Audio(x, rate=sr)
Out[7]:
In [8]:
librosa.display.waveplot(x, sr=sr)
Out[8]:
<matplotlib.collections.PolyCollection at 0x11ce92310>

Even though the amplitude grows exponentially, to us, the increase in loudness seems more gradual. This phenomenon is an example of the Weber-Fechner law (Wikipedia) which states that the relationship between a stimulus and human perception is logarithmic.

Linear Amplitude¶

In [9]:
x, sr = librosa.load('audio/latin_groove.mp3', duration=8)
ipd.Audio(x, rate=sr)
Out[9]:
In [10]:
X = librosa.stft(x)
X.shape
Out[10]:
(1025, 345)

Raw amplitude:

In [11]:
Xmag = abs(X)
librosa.display.specshow(Xmag, sr=sr, x_axis='time', y_axis='log')
plt.colorbar()
Out[11]:
<matplotlib.colorbar.Colorbar at 0x11fc76f90>

Log Amplitude¶

In [12]:
Xmag = librosa.logamplitude(X)
librosa.display.specshow(Xmag, sr=sr, x_axis='time', y_axis='log')
plt.colorbar()
Out[12]:
<matplotlib.colorbar.Colorbar at 0x11cfa3650>

One common variant is the $\log (1 + \lambda x)$ function, sometimes known as logarithmic compression (FMP, p. 125). This function operates like $y = \lambda x$ when $\lambda x$ is small, but it operates like $y = \log \lambda x$ when $\lambda x$ is large.

In [13]:
Xmag = numpy.log10(1+10*abs(X))
librosa.display.specshow(Xmag, sr=sr, x_axis='time', y_axis='log')
plt.colorbar()
Out[13]:
<matplotlib.colorbar.Colorbar at 0x11d1cd590>

Decibels¶

Decibel (Wikipedia)

In [14]:
Xmag = librosa.amplitude_to_db(abs(X))
librosa.display.specshow(Xmag, sr=sr, x_axis='time', y_axis='log')
plt.colorbar()
Out[14]:
<matplotlib.colorbar.Colorbar at 0x11f4d3450>

Perceptual Weighting¶

In [15]:
freqs = librosa.core.fft_frequencies(sr=sr)
In [16]:
Xmag = librosa.perceptual_weighting(abs(X)**2, freqs)
librosa.display.specshow(Xmag, sr=sr, x_axis='time', y_axis='log')
plt.colorbar()
/Users/steve/miniconda2/lib/python2.7/site-packages/librosa/core/time_frequency.py:955: RuntimeWarning: divide by zero encountered in log10
  - 0.5 * np.log10(f_sq + const[3]))
Out[16]:
<matplotlib.colorbar.Colorbar at 0x11f80bbd0>