<h1>Audio Representation</h1>
<p>In performance, musicians convert sheet music representations into <strong>sound</strong>, which is transmitted through the air as oscillations in air pressure. In essence, sound is simply air vibrating (<a href="https://en.wikipedia.org/wiki/Sound">Wikipedia</a>). Sound propagates through the air as <strong>longitudinal waves</strong>, i.e. the oscillations are parallel to the direction of propagation.</p>
<p><strong>Audio</strong> refers to the production, transmission, or reception of sounds that are audible by humans. An <strong>audio signal</strong> is a representation of sound that encodes the fluctuation in air pressure caused by the vibration as a function of time. Unlike sheet music or symbolic representations, audio representations encode everything that is necessary to reproduce an acoustic realization of a piece of music. However, note parameters such as onsets, durations, and pitches are not encoded explicitly. This makes converting from an audio representation to a symbolic representation a difficult and ill-defined task.</p>
<h2>Waveforms and the Time Domain</h2>
<p>The basic representation of an audio signal is in the <strong>time domain</strong>.</p>
<p>The change in air pressure at a certain time is graphically represented by a <strong>pressure-time plot</strong>, or simply a <strong>waveform</strong>.</p>
<p>To plot a waveform, use <a href="http://bmcfee.github.io/librosa/generated/librosa.display.waveplot.html"><code>librosa.display.waveplot</code></a>:</p>
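<p>A minimal sketch (the filename <code>audio_file.wav</code> is a placeholder for any audio file readable by librosa; recent librosa releases rename this function to <code>librosa.display.waveshow</code>):</p>
<pre><code>import librosa
import librosa.display
import matplotlib.pyplot as plt

# Load the audio file, resampling to 44100 Hz.
# 'audio_file.wav' is a placeholder path.
x, sr = librosa.load('audio_file.wav', sr=44100)

# Plot amplitude as a function of time.
plt.figure(figsize=(14, 4))
librosa.display.waveplot(x, sr=sr)
plt.show()
</code></pre>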
<p>Digital computers can only capture this data at discrete moments in time. The rate at which a computer captures audio data is called the <strong>sampling frequency</strong> (often abbreviated <code>fs</code>) or <strong>sampling rate</strong> (often abbreviated <code>sr</code>). For this workshop, we will mostly work with a sampling frequency of 44100 Hz, the sampling rate of CD recordings.</p>
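<p>The sampling rate ties sample counts to physical time. A quick sanity check, again assuming a placeholder file:</p>
<pre><code>import librosa

x, sr = librosa.load('audio_file.wav', sr=44100)  # placeholder path

print(sr)           # 44100 samples per second
print(len(x))       # total number of samples in the signal
print(len(x) / sr)  # duration of the signal in seconds
</code></pre>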
<h2>Timbre</h2>
<p><strong>Timbre</strong> is the quality of sound that distinguishes the tone of different instruments and voices, even if the sounds have the same pitch and loudness.</p>
<p>One characteristic of timbre is its temporal evolution. The <strong>envelope</strong> of a signal is a smooth curve that approximates the amplitude extremes of a waveform over time.</p>
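<p>One simple way to estimate an envelope, sketched here as the maximum absolute amplitude over short frames (the frame length of 1024 samples, about 23 ms at 44100 Hz, is an arbitrary choice):</p>
<pre><code>import numpy as np
import librosa

x, sr = librosa.load('audio_file.wav', sr=44100)  # placeholder path

# Split the signal into non-overlapping frames and take the
# maximum absolute amplitude within each frame.
frame_length = 1024
frames = librosa.util.frame(x, frame_length=frame_length, hop_length=frame_length)
envelope = np.abs(frames).max(axis=0)
</code></pre>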
<p>Envelopes are often modeled by the <strong>ADSR model</strong> (<a href="https://en.wikipedia.org/wiki/Synthesizer#Attack_Decay_Sustain_Release_.28ADSR.29_envelope">Wikipedia</a>), which describes four phases of a sound: attack, decay, sustain, and release.</p>
<p>During the attack phase, the sound builds up, usually with noise-like components over a broad frequency range. Such a short-duration, noise-like sound at the onset is often called a <strong>transient</strong>.</p>
<p>During the decay phase, the sound stabilizes and reaches a steady periodic pattern.</p>
<p>During the sustain phase, the energy remains fairly constant.</p>
<p>During the release phase, the sound fades away.</p>
<p>The ADSR model is a simplification and does not necessarily model the amplitude envelopes of all sounds.</p>
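<p>As an illustration, a minimal sketch that synthesizes a sine tone and shapes it with a piecewise-linear ADSR envelope (all durations, levels, and the 440 Hz pitch are arbitrary choices):</p>
<pre><code>import numpy as np
import IPython.display as ipd

sr = 44100  # sampling rate in Hz

def adsr(attack, decay, sustain_level, sustain, release, sr):
    """Piecewise-linear ADSR envelope; durations are in seconds."""
    a = np.linspace(0, 1, int(attack * sr))                # build up
    d = np.linspace(1, sustain_level, int(decay * sr))     # settle
    s = np.full(int(sustain * sr), sustain_level)          # hold steady
    r = np.linspace(sustain_level, 0, int(release * sr))   # fade away
    return np.concatenate([a, d, s, r])

env = adsr(0.05, 0.1, 0.6, 0.5, 0.3, sr)
t = np.arange(len(env)) / sr
tone = np.sin(2 * np.pi * 440.0 * t) * env  # enveloped A4 sine tone

ipd.Audio(tone, rate=sr)  # listen in a notebook
</code></pre>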
<p>Another property used to characterize timbre is the presence of partials and their relative strengths. <strong>Partials</strong> are the dominant frequencies in a musical tone, with the lowest partial being the <strong>fundamental frequency</strong>.</p>
<p>The partials of a sound can be visualized with a <strong>spectrogram</strong>, which shows the intensity of frequency components over time.</p>
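<p>A sketch of such a plot using the short-time Fourier transform via <code>librosa.stft</code> and <code>librosa.display.specshow</code> (the file path is again a placeholder):</p>
<pre><code>import numpy as np
import librosa
import librosa.display
import matplotlib.pyplot as plt

x, sr = librosa.load('audio_file.wav', sr=44100)  # placeholder path

# Magnitude spectrogram from the short-time Fourier transform,
# converted to decibels for display.
S = librosa.amplitude_to_db(np.abs(librosa.stft(x)), ref=np.max)

plt.figure(figsize=(14, 5))
librosa.display.specshow(S, sr=sr, x_axis='time', y_axis='log')
plt.colorbar(format='%+2.0f dB')
plt.show()
</code></pre>
<p>For a harmonic tone, the partials appear as horizontal bands in such a plot, with the fundamental frequency as the lowest band.</p>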
"import numpy, scipy, matplotlib.pyplot as plt, pandas, librosa"
"import numpy, scipy, matplotlib.pyplot as plt, pandas, librosa, IPython.display as ipd, urllib"
]
]
},
},
{
{
...
@@ -36,6 +36,25 @@
...
@@ -36,6 +36,25 @@
"# Audio Representation"
"# Audio Representation"
]
]
},
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "notes"
}
},
"source": [
"In performance, musicians convert sheet music representations into **sound** which is transmitted through the air as air pressure oscillations. In essence, sound is simply air vibrating ([Wikipedia](https://en.wikipedia.org/wiki/Sound)). Sound vibrates through the air as **longitudinal waves**, i.e. the oscillations are parallel to the direction of propagation."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Audio** refers to the production, transmission, or reception of sounds that are audible by humans. An **audio signal** is a representation of sound that represents the fluctuation in air pressure caused by the vibration as a function of time. Unlike sheet music or symbolic representations, audio representations encode everything that is necessary to reproduce an acoustic realization of a piece of music. However, note parameters such as onsets, durations, and pitches are not encoded explicitly. This makes converting from an audio representation to a\n",
"symbolic representation a difficult and ill-defined task."
]
},
{
{
"cell_type": "markdown",
"cell_type": "markdown",
"metadata": {
"metadata": {
...
@@ -44,7 +63,7 @@
...
@@ -44,7 +63,7 @@
}
}
},
},
"source": [
"source": [
"## Time Domain"
"## Waveforms and the Time Domain"
]
]
},
},
{
{
...
@@ -55,9 +74,7 @@
...
@@ -55,9 +74,7 @@
}
}
},
},
"source": [
"source": [
"The basic representation of an audio signal is in the *time domain*. \n",
"The basic representation of an audio signal is in the **time domain**. "
"\n",
"[Sound is air vibrating](https://en.wikipedia.org/wiki/Sound). An audio signal represents the fluctuation in air pressure caused by the vibration as a function of time."
"The change in air pressure at a certain time is graphically represented by a **pressure-time plot**, or simply **waveform**."
]
]
},
},
{
{
...
@@ -118,7 +138,7 @@
...
@@ -118,7 +138,7 @@
}
}
},
},
"source": [
"source": [
"To plot a signal in the time domain, use [`librosa.display.waveplot`](http://bmcfee.github.io/librosa/generated/librosa.display.waveplot.html):"
"To plot a waveform, use [`librosa.display.waveplot`](http://bmcfee.github.io/librosa/generated/librosa.display.waveplot.html):"
]
]
},
},
{
{
...
@@ -210,7 +230,86 @@
...
@@ -210,7 +230,86 @@
}
}
},
},
"source": [
"source": [
"Digital computers can only capture this data at discrete moments in time. The rate at which a computer captures audio data is called the *sampling frequency* (abbreviated `fs`) or *sample rate* (abbreviated `sr`). For this workshop, we will mostly work with a sampling frequency of 44100 Hz."
"Digital computers can only capture this data at discrete moments in time. The rate at which a computer captures audio data is called the **sampling frequency** (often abbreviated `fs`) or **sampling rate** (often abbreviated `sr`). For this workshop, we will mostly work with a sampling frequency of 44100 Hz, the sampling rate of CD recordings."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Frequency and Pitch"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Dynamics, Intensity, and Loudness"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Timbre"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Timbre** is the quality of sound that distinguishes the tone of different instruments and voices even if the sounds have the same pitch and loudness."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"One characteristic of timbre is its temporal evolution. The **envelope** of a signal is a smooth curve that approximates the amplitude extremes of a waveform over time.\n",
"\n",
"Envelopes are often modeled by the **ADSR model** ([Wikipedia](https://en.wikipedia.org/wiki/Synthesizer#Attack_Decay_Sustain_Release_.28ADSR.29_envelope)) which describes four phases of a sound: attack, decay, sustain, release. \n",
"\n",
"During the attack phase, the sound builds up, usually with noise-like components over a broad frequency range. Such a noise-like short-duration sound at the start of a sound is often called a transient.\n",
"\n",
"During the decay phase, the sound stabilizes and reaches a steady periodic pattern.\n",
"\n",
"During the sustain phase, the energy remains fairly constant.\n",
"\n",
"During the release phase, the sound fades away.\n",
"\n",
"The ADSR model is a simplification and does not necessarily model the amplitude envelopes of all sounds."
"Another property used to characterize timbre is the existence of partials and their relative strengths. **Partials** are the dominant frequencies in a musical tone with the lowest partial being the **fundamental frequency**.\n",
"\n",
"The partials of a sound are visualized with a **spectrogram**. A spectrogram shows the intensity of frequency components over time."