Zhu-Takaoka algorithmESMAJQuicksearch algorithmContents
Next: Zhu-Takaoka algorithm Up: ESMAJ Prev: Quicksearch algorithm

Tuned Boyer-Moore algorithm


Main features
Description

The Tuned Boyer-Moore is a implementation of a simplified version of the Boyer-Moore algorithm which is very fast in practice. The most costly part of a string-matching algorithm is to check whether the character of the pattern match the character of the window. To avoid doing this part too often, it is possible to unrolled several shifts before actually comparing the characters. The algorithm used the bad-character shift function to find x[m-1] in y and keep on shifting until finding it, doing blindly three shifts in a row. This required to save the value of bmBc[x[m-1]] in a variable shift and then to set bmBc[x[m-1]] to 0. This required also to add m occurrences of x[m-1] at the end of y. When x[m-1] is found the m-1 other characters of the window are checked and a shift of length shift is applied.

The comparisons between pattern and text characters during each attempt can be done in any order. This algorithm has a quadratic worst-case time complexity but a very good practical behaviour.

The C code

The function preBmBc is given chapter Boyer-Moore algorithm.

void TUNEDBM(char *x, int m, char *y, int n) {
   int j, k, shift, bmBc[ASIZE];
 
   /* Preprocessing */
   preBmBc(x, m, bmBc);
   shift = bmBc[x[m - 1]];
   bmBc[x[m - 1]] = 0;
   memset(y + n, x[m - 1], m);
 
   /* Searching */
   j = 0;
   while (j < n) {
      k = bmBc[y[j + m -1]];
      while (k !=  0) {
         j += k; k = bmBc[y[j + m -1]];
         j += k; k = bmBc[y[j + m -1]];
         j += k; k = bmBc[y[j + m -1]];
      }
      if (memcmp(x, y + j, m - 1) == 0 && j < n)
         OUTPUT(j);
      j += shift;                          /* shift */
   }
}

The example

Preprocessing phase

bmBc table
bmBc table used by Tuned Boyer-Moore algorithm.

Searching phase


References


Zhu-Takaoka algorithmESMAJQuicksearch algorithmContents
Next: Zhu-Takaoka algorithm Up: ESMAJ Prev: Quicksearch algorithm

e-mails: {Christian.Charras, Thierry.Lecroq }@laposte.net
Tue Jan 14 15:03:31 MET 1997