jueves, 21 de febrero de 2013

Homework 2

To this entry undertook analyze algorithms "String matching" seen in class, these are the algorimo Boyer-Moore and Knuth-Morris-Pratt.

Boyer-Moore
The Boyer-Moore algorithm searches for occurrences of P in T by performing explicit character comparisons at different alignments. Instead of a brute-force search of all alignments (of which there are m - n + 1), Boyer-Moore uses information gained by preprocessing P to skip as many alignments as possible.

The algorithm begins at alignment k = n, so the start of P is aligned with the start of T. Characters in P and T are then compared starting at index n in P and k in T, moving downward: the strings are matched from the end and toward the beginning of P. The comparisons continue until either a mismatch occurs or the beginning of P is reached (which means there is a match), after which the alignment is shifted to the right according to the maximum value permitted by a number of rules. The comparisons are performed again at the new alignment, and the process repeats until the alignment is shifted past the end of T.

The shift rules are implemented as constant-time table lookups, using tables generated during the preprocessing of P.


Knuth-Morris-Pratt.

A string matching algorithm wants to find the starting index m in string S[] that matches the search word W[].

A straightforward algorithm simply tries successive values of m until it finds a match or fails. Each trial involves using a loop that checks S[m+i] = W[i] for each character W[i] in the search word.

Usually, the trial check will quickly reject the trial match. If the strings are uniformly distributed random letters, then the chance that characters match is 1 in 26. In most cases, the trial check will reject the match at the initial letter. The chance that the first two letters will match is 1 in 26^2 (1 in 676). So if the characters are random, then the expected complexity of searching string S[] of length k is on the order of k comparisons or O(k). The expected performance is very good. If S[] is 1 billion characters and W[] is 1000 characters, then the string search should complete after about 1 billion character comparisons (which might take a few seconds).


here the code:

here a image of the result of the code
makes 44 comparasions, and just take a few time

Now the code of the bruteforce

I have an error but this algorithm is more fast that Boyer

Bibliografias:
http://cs.indstate.edu/~kmandumula/presentation.pdf
http://www-igm.univ-mlv.fr/~lecroq/string/node14.html

1 comentario:

  1. The second algorithm is missing. When the experiment shows that a naive algorithm outperforms a fancy one, you could conclude that your experimental design is bad and you should feel bad. Code: 3 pts, report 2 pts.

    ResponderEliminar