Boyer-Moore
The Boyer-Moore algorithm searches for occurrences of P in T by performing explicit character comparisons at different alignments. Instead of a brute-force search of all alignments (of which there are m - n + 1), Boyer-Moore uses information gained by preprocessing P to skip as many alignments as possible.
The algorithm begins at alignment k = n, so the start of P is aligned with the start of T. Characters in P and T are then compared starting at index n in P and k in T, moving downward: the strings are matched from the end and toward the beginning of P. The comparisons continue until either a mismatch occurs or the beginning of P is reached (which means there is a match), after which the alignment is shifted to the right according to the maximum value permitted by a number of rules. The comparisons are performed again at the new alignment, and the process repeats until the alignment is shifted past the end of T.
The shift rules are implemented as constant-time table lookups, using tables generated during the preprocessing of P.
Knuth-Morris-Pratt.
A string matching algorithm wants to find the starting index m in string S[] that matches the search word W[].
A straightforward algorithm simply tries successive values of m until it finds a match or fails. Each trial involves using a loop that checks S[m+i] = W[i] for each character W[i] in the search word.
Usually, the trial check will quickly reject the trial match. If the strings are uniformly distributed random letters, then the chance that characters match is 1 in 26. In most cases, the trial check will reject the match at the initial letter. The chance that the first two letters will match is 1 in 26^2 (1 in 676). So if the characters are random, then the expected complexity of searching string S[] of length k is on the order of k comparisons or O(k). The expected performance is very good. If S[] is 1 billion characters and W[] is 1000 characters, then the string search should complete after about 1 billion character comparisons (which might take a few seconds).
here the code:
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
import random | |
import sys | |
import time | |
def Chain(size): | |
word = '' | |
for i in range(size): | |
letter = chr(random.randint(97,101)) | |
word+=letter | |
return word | |
def String(size): | |
pattern ='' | |
for i in range(size): | |
letter = chr(random.randint(97,101)) | |
pattern += letter | |
return pattern | |
def boyerMoore(text, pattern): | |
Txt = len(text) | |
Pat = len(pattern) | |
fail = failure(text,pattern,Txt,Pat) | |
fix = suff(pattern,Pat) | |
matches = [] | |
j = 0 | |
cont = 0 | |
while (j <= Txt - Pat): | |
i = Pat-1 | |
while True: | |
cont +=1 | |
if i>=0 and pattern[i] == text[i+j]: | |
i -= 1 | |
else: | |
break | |
if i<0: | |
matches.append(j) | |
j+= fix[0] | |
else: | |
j += max(fix[i], fail[text[i+j]]-Pat+1+i) | |
return cont | |
def failure(text,pattern,t,p): | |
'''Dictionary that tells us | |
''' | |
wrongchar = {} | |
for i in range(t-1): | |
if text[i] not in pattern: | |
wrongchar[text[i]] = p | |
for j in range(p-1): | |
wrongchar[pattern[j]] = p-j-1 | |
return wrongchar | |
def suffixes(pattern, x): | |
a = x - 1 | |
b = a | |
suff = [] | |
for i in range(x): | |
suff.append(0) | |
suff[x-1] = x | |
for i in range(x-2,-1,-1): | |
if i > a and suff[i+x-1-b] < i-a: | |
suff[i]= suff[i+x-1-b] | |
else: | |
if i < a: | |
a = i | |
b = i | |
while (a>=0 and pattern[a] == pattern[a+x-1-b]): | |
a-=1 | |
suff[i] = b - a | |
return suff | |
def suff(pattern,x): | |
Suff = [] | |
s = suffixes(pattern,x) | |
for i in range(x): | |
Suff.append(x) | |
for i in range(x-2,-1,-1): | |
if (s[i] == i+1): | |
for j in range(x-2): | |
if Suff[j] == x: | |
Suff[j] = x-1-i | |
for i in range(x-1): | |
Suff[x-1-s[i]] = x-1-i | |
return Suff | |
def main(): | |
text = 'abbaaabbbabbabbabaaabaabbaabbbabba' | |
pattern = 'abba' | |
initialTime = time.time() | |
cont = boyerMoore(text,pattern) | |
endTime = time.time() - initialTime | |
print cont, ' ', endTime | |
if __name__ == "__main__": | |
main() |
makes 44 comparasions, and just take a few time
Now the code of the bruteforce
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
#!/usr/bin/python | |
import time | |
def bruteforce(t,p): | |
i = 0 | |
while i <= len(t)-len(p): | |
if t[i:i+len(p)] == p: | |
return i | |
else: | |
i += 1 | |
t = "abbabababbabbbbaabbabbbabaabababbababaababaaababababba" | |
p = "abba" | |
initialTime = time() | |
print bruteforce(t, p) | |
print "Finish %fs"%(time()-initialTime) |
Bibliografias:
http://cs.indstate.edu/~kmandumula/presentation.pdf
http://www-igm.univ-mlv.fr/~lecroq/string/node14.html
The second algorithm is missing. When the experiment shows that a naive algorithm outperforms a fancy one, you could conclude that your experimental design is bad and you should feel bad. Code: 3 pts, report 2 pts.
ResponderEliminar