Knuth–Morris–Pratt string search algorithm
Start at the left-hand side of the string, string[0], trying to match the pattern and working right.
At each step, try to match string[i] == pattern[j].
How to build the table
Everything else below is just how to build the table.
Construct a table showing where to reset j to
- If string[i] != pattern[0], just move on to string[i+1], with j = 0.
- If string[i] != pattern[1], we leave i the same and set j = 0.
    pattern = 10
    string  = ... 1100000
- If string[i] != pattern[2], we leave i the same and change j, but we need to consider repeats in pattern[0] .. pattern[1].
    pattern = 110
    string  = ... 11100000
    i stays the same, j goes from 2 back to 1.
    pattern = 100
    string  = ... 10100000
    i stays the same, j goes from 2 back to 0.
- In general, if string[i] != pattern[j], we leave i the same and change j, but we need to consider repeats in pattern[0] .. pattern[j-1].
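The rules above can be sketched as a short Python search loop. This is a minimal sketch under stated assumptions: the names `prefix_suffix_table` and `kmp_search` are hypothetical, the table is computed by brute force to keep the example short, and on a mismatch at pattern[j] (j > 0) j falls back to next[j-1] using the next[j] table defined in these notes. Each fallback is logged as (i, old j, new j) so the examples can be checked.

```python
def prefix_suffix_table(pattern):
    """Hypothetical helper: next[j] = length of the longest proper prefix of
    pattern[0..j] that is also a suffix of it (brute force, illustration only)."""
    table = []
    for j in range(len(pattern)):
        sub = pattern[: j + 1]
        best = 0
        for n in range(1, len(sub)):  # proper prefixes only
            if sub[:n] == sub[-n:]:
                best = n
        table.append(best)
    return table


def kmp_search(string, pattern):
    """Return (index of first match or -1, list of (i, old_j, new_j) resets)."""
    nxt = prefix_suffix_table(pattern)
    resets = []
    j = 0
    for i, ch in enumerate(string):
        # Mismatch with j > 0: i stays the same, j falls back.
        while j > 0 and ch != pattern[j]:
            new_j = nxt[j - 1]
            resets.append((i, j, new_j))
            j = new_j
        if ch == pattern[j]:
            j += 1
            if j == len(pattern):
                return i - j + 1, resets
        # Otherwise j == 0 and the character did not match: move on to i+1.
    return -1, resets


print(kmp_search("11100000", "110"))  # (1, [(2, 2, 1)])
print(kmp_search("10100000", "100"))  # (2, [(2, 2, 0)])
```

In both examples the reset happens at i = 2 with i unchanged: for 110, j goes from 2 back to 1; for 100, j goes from 2 back to 0.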
Given a certain pattern, construct a table showing where to reset j to.
Construct a table of next[j]
For each j, figure out:
next[j] = length of the longest prefix in "pattern[0] .. pattern[j-1]" that matches a suffix of "pattern[1] .. pattern[j]"
That is:
- the prefix must include pattern[0]
- the suffix must include pattern[j]
- the prefix and suffix are different occurrences: neither is the whole of pattern[0] .. pattern[j] (they may overlap)
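The definition can be checked directly with a brute-force sketch that tries every length n and keeps the longest one for which the prefix starting at pattern[0] equals the suffix ending at pattern[j]. The function name `next_table_brute_force` is hypothetical, and the approach is roughly cubic in the pattern length, so it is only meant for checking small examples by hand.

```python
def next_table_brute_force(pattern):
    """Hypothetical helper: compute next[j] straight from the definition."""
    table = []
    for j in range(len(pattern)):
        best = 0
        # n <= j keeps the prefix inside pattern[0..j-1] and the suffix inside
        # pattern[1..j], so neither is the whole of pattern[0..j].
        for n in range(1, j + 1):
            prefix = pattern[0:n]                # must include pattern[0]
            suffix = pattern[j - n + 1 : j + 1]  # must include pattern[j]
            if prefix == suffix:
                best = n                         # keep the longest
        table.append(best)
    return table


print(next_table_brute_force("ABABAC"))  # [0, 0, 1, 2, 3, 0]
```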
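When pattern[j+1] is compared with s[k] and they do not match:
j' = next[j]; pattern[j'] is then compared with s[k], i.e. j' moves into the position previously held by j+1.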
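A small sketch of this fallback step, drawn as text: pattern[0] .. pattern[j] has matched, pattern[j+1] fails against s[k], and the pattern slides right so that pattern[next[j]] sits under s[k]. The function name `show_fallback` and the example text are hypothetical; the next[j] values are the ones for ABABAC.

```python
def show_fallback(s, pattern, next_table, k, j):
    """Assumes pattern[0..j] matched s[k-j-1..k-1] and pattern[j+1] != s[k].
    Return the text with the pattern drawn before and after the fallback."""
    j_prime = next_table[j]
    old_start = k - (j + 1)   # pattern[j+1] was under s[k]
    new_start = k - j_prime   # pattern[j'] is now under s[k]
    return "\n".join([s, " " * old_start + pattern, " " * new_start + pattern])


next_table = [0, 0, 1, 2, 3, 0]  # next[j] for ABABAC
print(show_fallback("ABABABAC", "ABABAC", next_table, k=5, j=4))
# ABABABAC
# ABABAC
#   ABABAC
```

After the fallback, pattern[3] = B lines up with s[5] = B and the rest of the pattern matches, so ABABAC is found starting at position 2.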
j | 0 | 1 | 2 | 3 | 4 | 5 |
pattern[0] .. pattern[j] | A | AB | ABA | ABAB | ABABA | ABABAC |
longest prefix-suffix match | none | none | A | AB | ABA | none |
next[j] | 0 | 0 | 1 | 2 | 3 | 0 |
notes | next[0] = 0 for every pattern: a single character has no prefix and suffix that are different | | | | | |
Given j, let n = next[j]
"pattern[0] .. pattern[n-1]" = "pattern[j-(n-1)] .. pattern[j]"
"pattern[0] .. pattern[next[j]-1]" = "pattern[j-(next[j]-1)] .. pattern[j]"
e.g. j = 4, n = 3,
"pattern[0] .. pattern[2]" = "pattern[2] .. pattern[4]"
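The identity above can be checked for every j of a pattern with a few lines of Python. This sketch (the name `satisfies_identity` is hypothetical) only checks the equality of prefix and suffix; it does not check that n is the longest such length.

```python
def satisfies_identity(pattern, next_table):
    """Check "pattern[0] .. pattern[n-1]" == "pattern[j-(n-1)] .. pattern[j]"
    for every j, where n = next[j]."""
    for j, n in enumerate(next_table):
        prefix = pattern[0:n]                  # pattern[0] .. pattern[n-1]
        suffix = pattern[j - (n - 1) : j + 1]  # pattern[j-(n-1)] .. pattern[j]
        if prefix != suffix:
            return False
    return True


pattern = "ABABAC"
print(pattern[0:3], pattern[2:5])                           # ABA ABA
print(satisfies_identity(pattern, [0, 0, 1, 2, 3, 0]))      # True
```

When n = 0 both slices are empty, so the identity holds trivially.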
If the match fails at position j+1 (pattern[j+1] compared with string[i]), keep i the same and reset the pattern position to n = next[j].
We have already matched pattern[0] .. pattern[n-1], because pattern[0] .. pattern[n-1] = pattern[j-(n-1)] .. pattern[j], and those characters were just matched against the string.
e.g. we have matched ABABA so far.
If the next character fails to match (it is not C), say we have matched ABA so far and check whether the next character matches that.
That is, keep i the same and just reset j to 3 (precisely the length of the longest prefix-suffix match).
Then, if the match after ABA fails too, by the same rule we say we have matched A so far, reset to j = 1, and try again from there.
In other words, it starts by trying to match the longest prefix-suffix, but if that fails it works down to the shorter ones until exhausted (no prefix-suffix matches left).
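The "works down to the shorter ones" chain can be listed by applying next repeatedly. In this sketch (the name `fallback_chain` is hypothetical), `matched` is the number of pattern characters matched so far; a real search stops at the first length whose next comparison succeeds, whereas this lists every length that could be tried.

```python
def fallback_chain(next_table, matched):
    """Prefix lengths tried, in order, after `matched` characters matched
    and the next comparison failed; ends at 0 (no prefix-suffix left)."""
    chain = []
    n = matched
    while n > 0:
        n = next_table[n - 1]  # longest prefix-suffix of what is matched so far
        chain.append(n)
    return chain


pattern = "ABABAC"
next_table = [0, 0, 1, 2, 3, 0]
lengths = fallback_chain(next_table, 5)   # ABABA matched
print(lengths)                            # [3, 1, 0]
print([pattern[:n] for n in lengths])     # ['ABA', 'A', '']
```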
Algorithm to construct table of next[j]
pattern[0] ... pattern[m-1]
Here, i and j both index pattern.
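next[0] = 0
i = 1, j = 0                        // first step: i = 1, j = 0
while (i < m) {
    if (pattern[i] == pattern[j]) {
        j = j + 1
        next[i] = j
        i = i + 1
    } else if (j > 0) {
        // e.g. pattern[0..2] == pattern[4..6] (i = 7, j = 3), but now pattern[3] != pattern[7]
        // maybe there is another, shorter prefix-suffix we can shift right to
        j = next[j-1]
        // next[j-1] because next[k] is what position k+1 falls back to (worth memorizing),
        // and a prefix-suffix match only makes sense over pattern[0] .. pattern[j-1];
        // pattern[j] itself was not matched against the prefix (draw it out to see this)
    } else {
        // j == 0 and pattern[0] != pattern[i]: move i right by one; the pattern also
        // shifts right by one, and index 0 shifted right is still 0
        next[i] = 0
        i = i + 1
    }
}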
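The construction above translates almost line for line into Python. This is a sketch under the notes' convention (next[k] is what position k+1 falls back to); the name `build_next` is hypothetical. Each step either advances i or strictly shortens j, so the loop runs in time linear in the pattern length.

```python
def build_next(pattern):
    """Hypothetical helper: next[j] for every j, built in linear time."""
    m = len(pattern)
    nxt = [0] * m   # next[0] = 0
    i, j = 1, 0     # i scans the pattern; j = length of the current prefix-suffix match
    while i < m:
        if pattern[i] == pattern[j]:
            j += 1
            nxt[i] = j
            i += 1
        elif j > 0:
            j = nxt[j - 1]  # try the next shorter prefix-suffix; i stays put
        else:
            nxt[i] = 0      # already 0; kept to mirror the notes
            i += 1
    return nxt


print(build_next("ABABAC"))  # [0, 0, 1, 2, 3, 0]
```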