2. Dynamic Programming: name and story
• Richard Bellman coined the term “Dynamic Programming”
From Bellman's autobiography:
“The face of Wilson (the Secretary of Defense) would turn red, and he would get
violent if people used the term RESEARCH in his presence. You can imagine how he
felt, then, about the term MATHEMATICAL …. I had to do something to shield Wilson
and the Air Force from the fact that I was really doing MATHEMATICS inside the
RAND Corporation…. I decided therefore to use the word ‘PROGRAMMING’. I
wanted to get across the idea that this was DYNAMIC, this was multistage, this was
time-varying…. I thought dynamic programming was a good name. It was something
not even a Congressman could object to...”
10. Fibonacci sequence
• Recursion:
• F(n) = F(n – 1) + F(n – 2)
• Starts from n
• When computing F(n), F(n – 1) and F(n – 2) are not yet known, so each call has to compute them again
• DP:
• F(n) = F(n – 1) + F(n – 2)
• Starts from 0 and 1
• When computing F(n), F(n – 1) and F(n – 2) have already been stored in an array
• Dynamic programming: partial results are stored to save time
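The contrast between the two approaches can be sketched in Python. This is a minimal illustration, assuming the usual convention F(0) = 0 and F(1) = 1; the function names fib_recursive and fib_dp are chosen here for illustration and do not come from the slides.

```python
def fib_recursive(n: int) -> int:
    """Plain recursion: starts from n and recomputes the same subproblems."""
    if n < 2:
        return n
    return fib_recursive(n - 1) + fib_recursive(n - 2)


def fib_dp(n: int) -> int:
    """Bottom-up DP: starts from 0 and 1 and stores every partial result."""
    if n < 2:
        return n
    table = [0] * (n + 1)  # table[i] holds F(i)
    table[1] = 1
    for i in range(2, n + 1):
        # Both operands were stored by earlier iterations.
        table[i] = table[i - 1] + table[i - 2]
    return table[n]


print(fib_recursive(10), fib_dp(10))  # both give 55
```

The recursive version takes exponential time in n, while the table version takes one addition per entry.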
11. Longest common subsequence
• To find the longest subsequence common to two or more sequences
• String1: “AGCAT”
• String2: “GAC”
• Common subsequences: “A”, “C”, “G”, “AC”, “GA”, “GC”
• LCS: “AC”, “GA”, or “GC”
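• Using a table to find the LCS:
• First column: the characters of string1 (“AGCAT”)
• First row: the characters of string2 (“GAC”)
• Table[i, j]: the LCS of string1.substring(0, i) and string2.substring(0, j), i.e. of the first i characters of string1 and the first j characters of string2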
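A possible table-based implementation is sketched below. As a simplifying assumption, the table stores only the length of the LCS of each pair of prefixes, not the subsequence itself, and one LCS is recovered by walking back through the table. The function name lcs is illustrative.

```python
def lcs(s1: str, s2: str) -> str:
    """Return one longest common subsequence of s1 and s2."""
    rows, cols = len(s1), len(s2)
    # length[i][j] = LCS length of s1[:i] and s2[:j]; row 0 and column 0
    # correspond to the empty prefix and stay 0 (the base case).
    length = [[0] * (cols + 1) for _ in range(rows + 1)]
    for i in range(1, rows + 1):
        for j in range(1, cols + 1):
            if s1[i - 1] == s2[j - 1]:
                length[i][j] = length[i - 1][j - 1] + 1
            else:
                length[i][j] = max(length[i - 1][j], length[i][j - 1])

    # Walk back from the bottom-right corner to recover one LCS.
    out = []
    i, j = rows, cols
    while i > 0 and j > 0:
        if s1[i - 1] == s2[j - 1]:
            out.append(s1[i - 1])
            i -= 1
            j -= 1
        elif length[i - 1][j] >= length[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(out))


print(lcs("AGCAT", "GAC"))  # one LCS of length 2
```

Which of the equally long answers is returned depends on how ties are broken during the walk back.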
16. Wildcard matching
• Linux command-line:
user@bash: ls b*
barry.txt blan.txt bob.txt
• Complicated example:
string = "adcab"
pattern = "*a*b"
• DP solution:
• Definition: table[i][j] is true if the first i characters of string match the first j characters of pattern
• Base case:
table[0][0] = true
first row: table[0][j + 1] = table[0][j] if pattern[j] equals *, otherwise false
first column: table[i + 1][0] = false
• Induction rule:
(1) if string[i] equals pattern[j] or pattern[j] equals ?
table[i + 1][j + 1] = table[i][j]
(2) if pattern[j] equals *
table[i + 1][j + 1] = table[i + 1][j] or table[i][j + 1]
(3) otherwise table[i + 1][j + 1] = false
- * a * b
- T T F F F
a
d
c
a
b
17. Wildcard matching
- * a * b
- T T F F F
a F T T T F
d F T F T F
c F T F T F
a F T T
b
18. Wildcard matching
- * a * b
- T T F F F
a F T T T F
d F T F T F
c F T F T F
a F T T T F
b F T F T
19. Wildcard matching
- * a * b
- T T F F F
a F T T T F
d F T F T F
c F T F T F
a F T T T F
b F T F T T
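The completed table can be reproduced with a short Python sketch of the base case and induction rule for table[i][j]. It assumes that table[i][j] means "the first i characters of string match the first j characters of pattern", that ? matches exactly one character, and that * matches any run of characters including the empty one. The function name wildcard_match is illustrative.

```python
def wildcard_match(string: str, pattern: str) -> bool:
    """Return True if the whole string matches the whole pattern."""
    n, m = len(string), len(pattern)
    # Every cell starts as False, which covers the first column
    # (a non-empty string never matches an empty pattern).
    table = [[False] * (m + 1) for _ in range(n + 1)]
    table[0][0] = True  # empty string matches empty pattern

    # First row: an empty string only matches a run of leading '*'.
    for j in range(m):
        if pattern[j] == "*":
            table[0][j + 1] = table[0][j]

    for i in range(n):
        for j in range(m):
            if pattern[j] == "*":
                # '*' matches nothing (cell to the left) or absorbs
                # string[i] (cell above).
                table[i + 1][j + 1] = table[i + 1][j] or table[i][j + 1]
            elif pattern[j] == "?" or pattern[j] == string[i]:
                # One character consumed on both sides (diagonal cell).
                table[i + 1][j + 1] = table[i][j]
    return table[n][m]


print(wildcard_match("adcab", "*a*b"))  # True, the bottom-right cell
```

The result is the bottom-right cell, which is T in the filled table for "adcab" and "*a*b".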
20. Longest common subsequence and wildcard matching
• DP proceeds from the initial condition to the end of the strings:
• From left to right in each row
• From top to bottom in each column
• State transition: table[i][j] is computed from table[i - 1][j - 1], table[i][j - 1], and table[i - 1][j]
• Each time: move forward by one step
• The state in each cell is the optimal result for the subproblem that cell represents
• A table (or diagram) is the best tool for simulating the process
21. Matrix chain multiplication
• Multiply two matrices: A(10 x 100) and B(100 x 5)
• OUT[p][r] += A[p][q] * B[q][r]
• Computation = 10 x 100 x 5 = 5000 scalar multiplications
• Multiply three matrices: A1(10 x 100), A2(100 x 5), and A3(5 x 50)
• ((A1 A2) A3): 10 x 100 x 5 (A1 A2) + 10 x 5 x 50 = 7500
• (A1 (A2 A3)): 100 x 5 x 50 (A2 A3) + 10 x 100 x 50 = 75000
• ((A1 A2) A3) needs 10 times fewer scalar multiplications than (A1 (A2 A3))
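The arithmetic above can be checked with a few lines of Python. The helper multiply_cost is a hypothetical name introduced here; it counts one scalar multiplication per (p, q, r) triple in the OUT[p][r] update.

```python
def multiply_cost(rows: int, inner: int, cols: int) -> int:
    """Scalar multiplications needed for a (rows x inner) times (inner x cols) product."""
    return rows * inner * cols


# ((A1 A2) A3): first A1 A2 (10 x 100 x 5), then the 10 x 5 result times A3.
left_first = multiply_cost(10, 100, 5) + multiply_cost(10, 5, 50)

# (A1 (A2 A3)): first A2 A3 (100 x 5 x 50), then A1 times the 100 x 50 result.
right_first = multiply_cost(100, 5, 50) + multiply_cost(10, 100, 50)

print(left_first, right_first)  # 7500 75000
```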
22. Matrix chain multiplication
• How to optimize the chain multiplication of matrices (A1, A2, A3, …, An)
• DP induction rule:
23. Matrix chain multiplication: DP solution
• Example: multiplying six matrices
• State:
• M[i, j]: the minimum number of scalar multiplications needed to multiply matrices i through j
• S[i, j]: the outermost split point (the position of the last multiplication) for M[i, j]
31. Matrix chain multiplication: DP solution
• The state is hard to define:
• M[i, j]
• S[i, j]
• The state transition is complicated:
• Filling the table row by row or column by column does not work
• Move from previous states to the current state in order of chain length (induction rule)
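A sketch of how M and S might be filled in order of chain length follows. The slides do not spell out the induction rule, so this assumes the standard textbook recurrence M[i, j] = min over i <= k < j of M[i, k] + M[k + 1, j] + p(i-1) * p(k) * p(j), where matrix Ai has dimensions p(i-1) x p(i). The names matrix_chain_order, parenthesize, and dims are illustrative.

```python
import math


def matrix_chain_order(dims):
    """dims[i - 1] x dims[i] is the shape of matrix Ai, for i = 1..n."""
    n = len(dims) - 1
    # 1-based tables; M[i][i] = 0 is the base case (a single matrix).
    M = [[0] * (n + 1) for _ in range(n + 1)]
    S = [[0] * (n + 1) for _ in range(n + 1)]
    for length in range(2, n + 1):          # chain length, short to long
        for i in range(1, n - length + 2):  # chain Ai .. Aj
            j = i + length - 1
            M[i][j] = math.inf
            for k in range(i, j):           # split into Ai..Ak and Ak+1..Aj
                cost = M[i][k] + M[k + 1][j] + dims[i - 1] * dims[k] * dims[j]
                if cost < M[i][j]:
                    M[i][j] = cost
                    S[i][j] = k
    return M, S


def parenthesize(S, i, j):
    """Rebuild the optimal parenthesization from the split points in S."""
    if i == j:
        return f"A{i}"
    k = S[i][j]
    return f"({parenthesize(S, i, k)} {parenthesize(S, k + 1, j)})"


M, S = matrix_chain_order([10, 100, 5, 50])  # A1, A2, A3 from the earlier slide
print(M[1][3], parenthesize(S, 1, 3))  # 7500 ((A1 A2) A3)
```

Every chain of a given length depends only on shorter chains, which is why the outer loop runs over length and not over rows or columns.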
32. Framework of dynamic programming
• Three key components of a dynamic programming algorithm:
• Definition of state
• Initial condition (base)
• Induction rule (state transition)
• The induction rule is usually the hardest part to find
• A 1D/2D table helps the thinking process
33. What is part of speech tagging?
• Identify parts of speech (syntactic categories):
This is a simple sentence
DET VB DET ADJ NOUN
• POS tagging is a first step towards syntactic analysis (semantic analysis)
• Faster than full parsing
• Text classification and word disambiguation
• How to decide the correct label:
• The word to be labeled: chair is probably a noun
• Labels of surrounding words: if the preceding word is a modal verb (e.g., will), then this word is more likely to be a verb
• Hidden Markov models can be used for this problem
34. Why is POS tagging hard?
• Ambiguity
glass of water/NOUN vs. water/VERB the plants
lie/VERB down vs. tell a lie/NOUN
wind/VERB down vs. a mighty wind/NOUN (homographs)
How about time flies like an arrow?
• Sparse data:
• Words we haven’t seen before
• Word-Tag pairs we haven’t seen before
35. Example transition probabilities
• Probabilities estimated from tagged WSJ corpus:
• Proper nouns (NNP) often begin sentences: P(NNP|<s>) = 0.28
• Modal verbs (MD) are nearly always followed by bare verbs (VB).
• Adjectives (JJ) are often followed by nouns (NN).
36. Example output probabilities
• Probabilities estimated from tagged WSJ corpus:
• 0.0032% of proper nouns are Janet: P(Janet|NNP) = 0.000032
• About half of determiners (DT) are the.
• the can also be a proper noun.
37. Hidden Markov Chain
• A set of states (tags)
• An output alphabet (words)
• Initial state (beginning of sentence)
• State transition probabilities ( P(t_i | t_{i-1}) )
• Symbol emission probabilities ( P(w_i | t_i) )
39. Computational cost of POS tagging
• Suppose we have C possible tags for each of the n words in the sentence
• There are C^n possible tag sequences: the number grows exponentially in the length n
• Viterbi algorithm: uses dynamic programming to find the best sequence without enumerating all of them
40. Viterbi algorithm:
• Target: argmax_T P(T | W)
• Intuition: the best path of length i ending in state t must include the best path of length i - 1 to the previous state
• Use a table to store the partial results:
• T x N table: v(t, i) is the probability of the best state sequence for w1 … wi ending in state t
• Fill in the columns from left to right; the max is over each possible previous state t’
v(t, i) = max_{t’} { v(t’, i – 1) P(t | t’) P(w_i | t) }
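The recurrence above can be sketched in Python as follows. The two-tag model and all of its probabilities are invented for illustration only (they are not WSJ estimates), and the names viterbi, start_p, trans_p, and emit_p are assumptions of this sketch. The table is stored as a list of per-word dictionaries, so v[i][t] plays the role of v(t, i), and back-pointers are kept to recover the best tag sequence.

```python
def viterbi(words, tags, start_p, trans_p, emit_p):
    """Return (best tag sequence, its probability) for a non-empty word list."""
    # First column: P(t | <s>) * P(w1 | t)
    v = [{t: start_p[t] * emit_p[t].get(words[0], 0.0) for t in tags}]
    back = [{}]
    for i in range(1, len(words)):  # fill columns left to right
        v.append({})
        back.append({})
        for t in tags:
            # Best previous tag t' maximizes v(t', i - 1) * P(t | t').
            prev = max(tags, key=lambda p: v[i - 1][p] * trans_p[p][t])
            v[i][t] = v[i - 1][prev] * trans_p[prev][t] * emit_p[t].get(words[i], 0.0)
            back[i][t] = prev
    # Pick the best final tag, then follow the back-pointers.
    last = max(tags, key=lambda t: v[-1][t])
    path = [last]
    for i in range(len(words) - 1, 0, -1):
        path.append(back[i][path[-1]])
    path.reverse()
    return path, v[-1][last]


# Hypothetical toy model: two tags, two words, made-up probabilities.
tags = ["N", "V"]
start_p = {"N": 0.8, "V": 0.2}
trans_p = {"N": {"N": 0.3, "V": 0.7}, "V": {"N": 0.6, "V": 0.4}}
emit_p = {"N": {"fish": 0.6, "sleep": 0.4}, "V": {"fish": 0.3, "sleep": 0.7}}

path, prob = viterbi(["fish", "sleep"], tags, start_p, trans_p, emit_p)
print(path, prob)  # tags N then V
```

Each column costs C x C work for C tags, so the whole sentence takes on the order of n x C^2 steps where enumeration would need C^n.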
50. Dynamic programming: take-home message
• Why it is fast: memory is used to store partial results
• DP algorithm components: state definition, initial condition, and induction rule
• Solve DP problems with a table
51. Top ten DP problems
• Longest common subsequence
• Shortest common supersequence
• Longest increasing subsequence
• Edit distance
• Matrix chain multiplication
• 0-1 knapsack problem
• Partition problem
• Rod cutting
• Coin change problem
• Word break problem