Class: Anculus.Core.Search and Anculus.Core.SetSearch

Documentation

The text search algorithms can be divided into 2 categories:
- ISearchAlgorithm implementations, used to search for a single keyword in a given text
- ISetSearchAlgorithm implementations, used to search for several keywords at the same time

Unless otherwise specified, all search algorithms offer full support for both ASCII and Unicode (UTF-8) strings.

The regular search algorithm defines 3 methods, each with several overloads:

int[] SearchAll (string text, string keyword)
int[] SearchAll (string text, int start, string keyword)
int[] SearchAll (string text, int start, int count, string keyword)
int SearchFirst (string text, string keyword)
int SearchFirst (string text, int start, string keyword)
bool Contains (string text, string keyword)
bool Contains (string text, int start, string keyword)

The set search algorithm defines the same 3 methods, but uses a SearchResult structure instead of an integer value to represent the result.

struct SearchResult
{
    int Index { get; }
    int Length { get; }
    string Match { get; }
}

SearchResult[] SearchAll (string text, params string[] keywords);
SearchResult[] SearchAll (string text, int start, params string[] keywords);
SearchResult[] SearchAll (string text, int start, int count, params string[] keywords);
SearchResult SearchFirst (string text, params string[] keywords);
SearchResult SearchFirst (string text, int start, params string[] keywords);
bool Contains (string text, params string[] keywords);
bool Contains (string text, int start, params string[] keywords);

For convenience, Anculus defines 2 utility classes to simplify the use of string search algorithms.

The Search utility is used to search a text for a single keyword. It implements the same methods as defined in ISearchAlgorithm, using the best algorithm for each scenario. For the SearchFirst and Contains methods, the regular string.IndexOf method is used. SearchAll, on the other hand, uses the Boyer-Moore algorithm, which is several times faster.

The SetSearch utility is used to search for multiple keywords in a given text. It implements the same methods as defined in ISetSearchAlgorithm. All methods use the Aho-Corasick set matching algorithm.

Usage

string unicodeText = "休 忘 男 沈 酒"; // some random kanji characters from Wikipedia
int[] results = Search.SearchAll (unicodeText, "男");
int first = Search.SearchFirst (unicodeText, "休");
string asciiText = "The quick brown fox jumps over the lazy dog";
SearchResult[] set = SetSearch.SearchAll (asciiText, "fox", "dog");
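
To make the shape of the results more concrete, here is a minimal reference sketch of what a "search all" call is expected to return. It is not the Anculus implementation: it uses plain string.IndexOf instead of Boyer-Moore or Aho-Corasick, and the names NaiveSearch and SearchAllSet are hypothetical, introduced only for this illustration. It also assumes that overlapping matches are reported and that comparison is ordinal; whether Anculus behaves the same way is not stated above.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical reference helper for illustration only; not part of Anculus.
public static class NaiveSearch
{
    // Returns the index of every occurrence of keyword in text.
    // Assumption: overlapping occurrences are reported as well.
    public static int[] SearchAll(string text, string keyword)
    {
        if (text == null) throw new ArgumentNullException(nameof(text));
        if (string.IsNullOrEmpty(keyword))
            throw new ArgumentException("keyword must be non-empty", nameof(keyword));

        var hits = new List<int>();
        // Ordinal comparison avoids culture-sensitive matching surprises.
        int index = text.IndexOf(keyword, 0, StringComparison.Ordinal);
        while (index >= 0)
        {
            hits.Add(index);
            // Resume one character after the last hit so overlaps are found.
            index = text.IndexOf(keyword, index + 1, StringComparison.Ordinal);
        }
        return hits.ToArray();
    }

    // Returns (index, keyword) pairs for every occurrence of any keyword,
    // ordered by index. One pass per keyword, unlike Aho-Corasick.
    public static List<KeyValuePair<int, string>> SearchAllSet(string text, params string[] keywords)
    {
        if (keywords == null) throw new ArgumentNullException(nameof(keywords));

        var results = new List<KeyValuePair<int, string>>();
        foreach (string keyword in keywords)
        {
            foreach (int index in SearchAll(text, keyword))
            {
                results.Add(new KeyValuePair<int, string>(index, keyword));
            }
        }
        results.Sort((a, b) => a.Key.CompareTo(b.Key));
        return results;
    }
}
```

With the strings from the usage example, this helper finds "男" at index 4, and "fox" and "dog" at indices 16 and 40. A helper like this can serve as a simple cross-check when comparing results from the faster algorithms.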