Contents:
![]()
Search Tips
- Bookworm does not distinguish between a query "all of these words" and a query "at least one word" - instead, matches containing all of the search terms will be listed first, followed by those containing fewer and fewer search terms, or fragments thereof.
- There are no escape sequences or similar: Everything you type is searched for in just the way you typed. This makes it easy to search for the unusual character sequences (like delete[]() or \cmd{}) in books on programming languages.
- Matches are considered better if two adjacent words in the query are also adjacent in the match.
- Similarly, matches are also considered better if characters at the beginning/end of the query match the beginning/end of the match.
- In fact, the Bookworm search is not word-oriented at all: If you're too lazy to type operator precedence, just try op prec or similar!
- Bookworm is semi-case-sensitive - matches with correct case are considered better, but matches with differing case will also be found.
- If two matches score equal after all the rules above have been considered, the match whose length is closer to that of the query is listed first.
- Queries which consist of only 1 or 2 characters are not handled too well - if you get too few matches, add another character (not space).
- The threshold for a match to be listed is set very low. Thus, if the search results seem to bear no resemblance at all to your query, this means there is no real match.
All these properties, which may seem unusual for a search engine, are the result of the "fuzzy search" algorithm used by Bookworm. Unlike with other search engines, the query is not subdivided into words to search for. Instead, it is divided into n-grams, chunks of n characters. For example, when tri-grams (i.e. chunks of length 3) are used, the term "a query" is turned into the tri-grams "a q", " qu", "que", "uer" and "ery". Each line of the text to be searched is then checked for any occurrances of those tri-grams.
If you only counted the number of matching tri-grams in the text to be searched, the results would sometimes be wrong, simply because the match may contain all the tri-grams, but in a different order. In these cases, the quality of the results can be increased by repeating the process with longer n-grams. Bookworm performs checks with 2-, 3-, 4- and 5-grams to measure the similarity between query and match.
The fuzzy search algorithm is based on an article in the 4/1997 issue of c't magazine. Many thanks!
![]()
