Updated Mar 29, 2008 by ben.motmans
Labels: Phase-Support
DocumentationSearching  
Instructions on how to use the text search algorithms.

Classes: Anculus.Core.Search and Anculus.Core.SetSearch

Documentation

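The text search algorithms can be divided into 2 categories: regular search algorithms, which look for a single keyword, and set search algorithms, which look for several keywords at once.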

Unless otherwise specified, all search algorithms offer full support for both ASCII and Unicode (UTF-8) strings.

The regular search algorithm defines 3 different methods:

int[] SearchAll (string text, string keyword)
int[] SearchAll (string text, int start, string keyword)
int[] SearchAll (string text, int start, int count, string keyword)

int SearchFirst (string text, string keyword)
int SearchFirst (string text, int start, string keyword)

bool Contains (string text, string keyword)
bool Contains (string text, int start, string keyword)
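The listing above only gives signatures, so the following is a minimal sketch of how the overloads might behave, built on string.IndexOf. It is not the Anculus implementation. The class name NaiveSearch is hypothetical, and several points are assumptions: that start is the first index examined, that count is the number of characters examined, that results are zero-based indices, that overlapping matches are reported, and that -1 means "not found".

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the regular search methods; NOT the Anculus implementation.
public static class NaiveSearch
{
    // Assumption: returns the zero-based index of every occurrence,
    // including overlapping ones, within [start, start + count).
    public static int[] SearchAll (string text, int start, int count, string keyword)
    {
        List<int> hits = new List<int> ();
        if (keyword.Length == 0)
            return hits.ToArray ();

        int end = start + count;
        int index = start;
        while (index <= end - keyword.Length) {
            int found = text.IndexOf (keyword, index, end - index, StringComparison.Ordinal);
            if (found < 0)
                break;
            hits.Add (found);
            index = found + 1; // step by one so overlapping matches are reported
        }
        return hits.ToArray ();
    }

    public static int[] SearchAll (string text, string keyword)
    {
        return SearchAll (text, 0, text.Length, keyword);
    }

    // Assumption: -1 signals "not found", mirroring string.IndexOf.
    public static int SearchFirst (string text, int start, string keyword)
    {
        return text.IndexOf (keyword, start, StringComparison.Ordinal);
    }

    // Assumption: true when the keyword occurs at or after start.
    public static bool Contains (string text, int start, string keyword)
    {
        return SearchFirst (text, start, keyword) >= 0;
    }
}
```

Ordinal comparison is used here so that the sketch compares characters one by one, independent of the current culture.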

The set search algorithm defines the same 3 methods, but uses a SearchResult structure instead of an integer value to represent the result.

struct SearchResult
{
	int Index { get; }
	int Length { get; }
	string Match { get; }
}

SearchResult[] SearchAll (string text, params string[] keywords);
SearchResult[] SearchAll (string text, int start, params string[] keywords);
SearchResult[] SearchAll (string text, int start, int count, params string[] keywords);

SearchResult SearchFirst (string text, params string[] keywords);
SearchResult SearchFirst (string text, int start, params string[] keywords);

bool Contains (string text, params string[] keywords);
bool Contains (string text, int start, params string[] keywords);
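To illustrate what a set search result carries, here is a small sketch that scans the text once per keyword with string.IndexOf and records the index, length and matched keyword of every hit. The Hit struct and the NaiveSetSearch class are hypothetical stand-ins that only mirror the shape of SearchResult. The ordering of the results by index is an assumption.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical mirror of the SearchResult shape above; not the Anculus type.
public struct Hit
{
    public int Index;
    public int Length;
    public string Match;

    public Hit (int index, string match)
    {
        Index = index;
        Length = match.Length;
        Match = match;
    }
}

public static class NaiveSetSearch
{
    // Scans the text once per keyword, then orders the hits by position.
    public static Hit[] SearchAll (string text, params string[] keywords)
    {
        List<Hit> hits = new List<Hit> ();
        foreach (string keyword in keywords) {
            if (keyword.Length == 0)
                continue;
            int index = text.IndexOf (keyword, StringComparison.Ordinal);
            while (index >= 0) {
                hits.Add (new Hit (index, keyword));
                index = text.IndexOf (keyword, index + 1, StringComparison.Ordinal);
            }
        }
        hits.Sort ((a, b) => a.Index.CompareTo (b.Index));
        return hits.ToArray ();
    }
}
```

This baseline rescans the text for every keyword, so its cost grows with the number of keywords.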

For convenience, Anculus defines 2 utility classes to simplify the use of string search algorithms.

The Search utility is used to search a text for a single keyword. It implements the same methods as defined in ISearchAlgorithm, using the best algorithm for each scenario. For the SearchFirst and Contains methods, the regular string.IndexOf method is used. SearchAll, on the other hand, uses the Boyer-Moore algorithm, which is several times faster.
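To give an idea of why a Boyer-Moore style search can skip large parts of the text, here is a sketch of the Boyer-Moore-Horspool variant, a simplification that keeps only the bad-character shift table. It is not the Anculus source, and the class name HorspoolSearch is hypothetical. A Dictionary is used for the shift table because C# characters are 16-bit, which makes a plain array per character wasteful.

```csharp
using System;
using System.Collections.Generic;

// Sketch of Boyer-Moore-Horspool (a simplified Boyer-Moore variant); not the Anculus source.
public static class HorspoolSearch
{
    public static int[] SearchAll (string text, string keyword)
    {
        List<int> hits = new List<int> ();
        int m = keyword.Length;
        if (m == 0 || m > text.Length)
            return hits.ToArray ();

        // Shift table: distance from a character's last occurrence
        // (ignoring the final keyword position) to the end of the keyword.
        Dictionary<char, int> shift = new Dictionary<char, int> ();
        for (int i = 0; i < m - 1; i++)
            shift[keyword[i]] = m - 1 - i;

        int pos = 0;
        while (pos <= text.Length - m) {
            // Compare the keyword against the text from right to left.
            int j = m - 1;
            while (j >= 0 && text[pos + j] == keyword[j])
                j--;
            if (j < 0)
                hits.Add (pos);

            // Skip ahead based on the text character aligned with the keyword's last position.
            // Characters that do not occur in the table allow a jump of the full keyword length.
            int step;
            if (!shift.TryGetValue (text[pos + m - 1], out step))
                step = m;
            pos += step;
        }
        return hits.ToArray ();
    }
}
```

The longer the keyword and the fewer characters it shares with the text, the larger the average jump, which is where the speed-up over a character-by-character scan comes from.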

The SetSearch utility is used to search for multiple keywords in a given text. It implements the same methods as defined in ISetSearchAlgorithm. All methods use the Aho-Corasick set matching algorithm.
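The Aho-Corasick idea can be sketched as follows: build a trie of all keywords, add failure links that point to the longest suffix that is still a path in the trie, then walk the text once. This is a minimal illustration, not the Anculus implementation. The AhoCorasick class and its members are hypothetical names, and results are returned as (index, keyword) pairs instead of SearchResult values.

```csharp
using System;
using System.Collections.Generic;

// Minimal Aho-Corasick sketch; hypothetical names, not the Anculus implementation.
public sealed class AhoCorasick
{
    private sealed class Node
    {
        public readonly Dictionary<char, Node> Next = new Dictionary<char, Node> ();
        public readonly List<string> Outputs = new List<string> ();
        public Node Fail;
    }

    private readonly Node root = new Node ();

    public AhoCorasick (params string[] keywords)
    {
        // 1. Build a trie of all keywords.
        foreach (string keyword in keywords) {
            if (keyword.Length == 0)
                continue;
            Node node = root;
            foreach (char c in keyword) {
                Node child;
                if (!node.Next.TryGetValue (c, out child)) {
                    child = new Node ();
                    node.Next[c] = child;
                }
                node = child;
            }
            node.Outputs.Add (keyword);
        }

        // 2. Breadth-first pass: link each node to the longest proper suffix
        //    of its path that is also a path in the trie.
        Queue<Node> queue = new Queue<Node> ();
        foreach (Node child in root.Next.Values) {
            child.Fail = root;
            queue.Enqueue (child);
        }
        while (queue.Count > 0) {
            Node node = queue.Dequeue ();
            foreach (KeyValuePair<char, Node> edge in node.Next) {
                Node fail = node.Fail;
                while (fail != null && !fail.Next.ContainsKey (edge.Key))
                    fail = fail.Fail;
                edge.Value.Fail = fail == null ? root : fail.Next[edge.Key];
                // Inherit the keywords that end at the suffix node.
                edge.Value.Outputs.AddRange (edge.Value.Fail.Outputs);
                queue.Enqueue (edge.Value);
            }
        }
    }

    // Returns (start index, keyword) pairs, ordered by where each match ends.
    public List<KeyValuePair<int, string>> SearchAll (string text)
    {
        List<KeyValuePair<int, string>> hits = new List<KeyValuePair<int, string>> ();
        Node node = root;
        for (int i = 0; i < text.Length; i++) {
            char c = text[i];
            // Follow failure links until some node accepts the character.
            while (node != root && !node.Next.ContainsKey (c))
                node = node.Fail;
            Node next;
            if (node.Next.TryGetValue (c, out next))
                node = next;
            foreach (string keyword in node.Outputs)
                hits.Add (new KeyValuePair<int, string> (i - keyword.Length + 1, keyword));
        }
        return hits;
    }
}
```

Because the text is walked only once regardless of how many keywords there are, this approach suits large keyword sets better than repeating a single-keyword search for each keyword.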

Usage

string unicodeText = "休 忘 男 沈 酒"; // some random kanji characters from Wikipedia

int[] results = Search.SearchAll (unicodeText, "男");
int first = Search.SearchFirst (unicodeText, "休");


string asciiText = "The quick brown fox jumps over the lazy dog";

SearchResult[] set = SetSearch.SearchAll (asciiText, "fox", "dog");
