Commit 19bb5d91 authored by Steve Tjoa

lcs; dtw

parent f8568285
......@@ -11940,6 +11940,7 @@ div#notebook {
<div class="text_cell_render border-box-sizing rendered_html">
<ol>
<li><a href="dp.html">Dynamic Programming</a> (<a href="dp.ipynb">ipynb</a>)</li>
<li><a href="lcs.html">Longest Common Subsequence</a> (<a href="lcs.ipynb">ipynb</a>)</li>
<li><a href="dtw.html">Dynamic Time Warping</a> (<a href="dtw.ipynb">ipynb</a>)</li>
</ol>
......
......@@ -137,6 +137,7 @@
"metadata": {},
"source": [
"1. [Dynamic Programming](dp.html) ([ipynb](dp.ipynb))\n",
"1. [Longest Common Subsequence](lcs.html) ([ipynb](lcs.ipynb))\n",
"1. [Dynamic Time Warping](dtw.html) ([ipynb](dtw.ipynb))"
]
},
......
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"[&larr; Back to Index](index.html)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Longest Common Subsequence"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To motivate dynamic time warping, let's look at a classic dynamic programming problem: find the **longest common subsequence (LCS)** of two strings ([Wikipedia](https://en.wikipedia.org/wiki/Longest_common_subsequence_problem)). The characters of a subsequence need not occupy consecutive positions in the original string, but they must keep their original order. Examples:\n",
"\n",
"    lcs('cake', 'baker') -> 'ake'\n",
"    lcs('cake', 'cape') -> 'cae'"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can solve this using recursion. We must find the optimal substructure, i.e. decompose the problem into simpler subproblems.\n",
"\n",
"Let `x` and `y` be two strings. Let `z` be an LCS of `x` and `y`.\n",
"\n",
"If the first characters of `x` and `y` are the same, then that character must also be the first character of the LCS, `z`. In other words, if `x[0] == y[0]`, then `z[0]` must equal `x[0]` (which equals `y[0]`). Therefore, prepend `x[0]` to `lcs(x[1:], y[1:])`.\n",
"\n",
"If the first characters of `x` and `y` differ, then solve for both `lcs(x, y[1:])` and `lcs(x[1:], y)`, and keep the longer result."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Here is the recursive solution:"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"collapsed": true
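},
"outputs": [],
"source": [
"def lcs(x, y):\n",
"    # Base case: the LCS with an empty string is empty.\n",
"    if x == \"\" or y == \"\":\n",
"        return \"\"\n",
"    if x[0] == y[0]:\n",
"        # Matching first characters start the LCS.\n",
"        return x[0] + lcs(x[1:], y[1:])\n",
"    else:\n",
"        # Drop one leading character from either string; keep the longer result.\n",
"        z1 = lcs(x[1:], y)\n",
"        z2 = lcs(x, y[1:])\n",
"        return z1 if len(z1) > len(z2) else z2"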
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Test:"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"pairs = [\n",
" ('cake', 'baker'),\n",
" ('cake', 'cape'),\n",
" ('catcga', 'gtaccgtca'),\n",
" ('zxzxzxmnxzmnxmznmzxnzm', 'nmnzxmxzmnzmx'),\n",
" ('dfkjdjkfdjkjfdkfdkfjd', 'dkfjdjkfjdkjfkdjfkjdkfjdkfj'),\n",
"]"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"ake\n",
"cae\n",
"ctca\n",
"zxmxzmnzmx\n",
"dfjdjkfdjkjfdkfdkfj\n"
]
}
],
"source": [
"for x, y in pairs:\n",
"    print lcs(x, y)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In the worst case, the recursive method above takes $O(2^{n_x+n_y})$ time, where $n_x$ and $n_y$ are the lengths of `x` and `y`. It is slow because it may solve the same subproblem many times."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Memoization"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can do better through memoization, i.e. storing solutions to previous subproblems in a table."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Here, we create a table where cell `(i, j)` stores the length of `lcs(x[:i], y[:j])`. When either `i` or `j` is equal to zero, i.e. one of the prefixes is the empty string, we already know that the LCS is the empty string, so its length is zero. Therefore, we can initialize every cell of the table to zero. Then we populate the table from the top left to the bottom right."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"def lcs_table(x, y):\n",
"    nx = len(x)\n",
"    ny = len(y)\n",
"\n",
"    # Initialize a table.\n",
"    table = [[0 for _ in range(ny+1)] for _ in range(nx+1)]\n",
"\n",
"    # Fill the table.\n",
"    for i in range(1, nx+1):\n",
"        for j in range(1, ny+1):\n",
"            if x[i-1] == y[j-1]:\n",
"                table[i][j] = 1 + table[i-1][j-1]\n",
"            else:\n",
"                table[i][j] = max(table[i-1][j], table[i][j-1])\n",
"    return table"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's visualize this table:"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[[0, 0, 0, 0, 0, 0],\n",
" [0, 0, 0, 0, 0, 0],\n",
" [0, 0, 1, 1, 1, 1],\n",
" [0, 0, 1, 2, 2, 2],\n",
" [0, 0, 1, 2, 3, 3]]"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"x = 'cake'\n",
"y = 'baker'\n",
"table = lcs_table(x, y)\n",
"table"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"  b a k e r\n",
"  0 0 0 0 0 0\n",
"c 0 0 0 0 0 0\n",
"a 0 0 1 1 1 1\n",
"k 0 0 1 2 2 2\n",
"e 0 0 1 2 3 3\n"
]
}
],
"source": [
"xa = ' ' + x\n",
"ya = ' ' + y\n",
"print ' '.join(ya)\n",
"for i, row in enumerate(table):\n",
"    print xa[i], ' '.join(str(z) for z in row)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Finally, we backtrack, i.e. read the table from the bottom right to the top left:"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"def lcs(x, y, table, i=None, j=None):\n",
" if i is None:\n",
" i = len(x)\n",
" if j is None:\n",
" j = len(y)\n",
" if table[i][j] == 0:\n",
" return \"\"\n",
" elif x[i-1] == y[j-1]:\n",
" return lcs(x, y, table, i-1, j-1) + x[i-1]\n",
" elif table[i][j-1] > table[i-1][j]:\n",
" return lcs(x, y, table, i, j-1)\n",
" else:\n",
" return lcs(x, y, table, i-1, j)"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"ake\n",
"cae\n",
"ctca\n",
"zxmxzmnzmx\n",
"dfjdjkfdjkjfdkfdkfj\n"
]
}
],
"source": [
"for x, y in pairs:\n",
"    table = lcs_table(x, y)\n",
"    print lcs(x, y, table)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Table construction has time complexity $O(n_x n_y)$, where $n_x$ and $n_y$ are the lengths of `x` and `y`, and backtracking takes $O(n_x+n_y)$ steps. Therefore, the overall running time is $O(n_x n_y)$."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"[&larr; Back to Index](index.html)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 2",
"language": "python",
"name": "python2"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 2
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython2",
"version": "2.7.13"
}
},
"nbformat": 4,
"nbformat_minor": 1
}
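The notebook's table is filled bottom-up. For comparison, here is a minimal sketch of the top-down variant: the same recursion as the first `lcs`, but each subproblem's answer is cached so it is solved only once. The names `lcs_memo`, `solve`, and `cache` are introduced here for illustration and are not part of the notebook. It works on suffix indices `(i, j)` rather than sliced strings, and it assumes inputs short enough to stay within Python's recursion limit.

```python
def lcs_memo(x, y):
    """Return one longest common subsequence of x and y (top-down, memoized)."""
    cache = {}  # maps (i, j) -> an LCS of the suffixes x[i:] and y[j:]

    def solve(i, j):
        # Base case: one of the suffixes is empty.
        if i == len(x) or j == len(y):
            return ""
        if (i, j) not in cache:
            if x[i] == y[j]:
                # Matching characters start the LCS of these suffixes.
                cache[(i, j)] = x[i] + solve(i + 1, j + 1)
            else:
                z1 = solve(i + 1, j)
                z2 = solve(i, j + 1)
                # Same tie-breaking rule as the plain recursive version.
                cache[(i, j)] = z1 if len(z1) > len(z2) else z2
        return cache[(i, j)]

    return solve(0, 0)
```

With the notebook's first two test pairs, `lcs_memo('cake', 'baker')` returns `'ake'` and `lcs_memo('cake', 'cape')` returns `'cae'`. Because this sketch caches whole strings rather than lengths, it uses more memory than the length table followed by backtracking.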