Issue 22: Regex expressions in stopword file
Status:  Done
Owner:  dp.maxime
Closed:  Dec 13
Type-Enhancement
Priority-Medium


Reported by Imlbrown, Nov 06, 2009
This is more of a question or a feature request.

We have many words in our database (non-cached mode) that are irrelevant to
the search engine and we would like an easy mechanism to exclude them from
the index. For example, dict16 and dict32 have thousands of words, with
multiple occurrences, that begin with "$##" and then a string of numbers.
For us, words of this pattern are irrelevant and we would like to not index
them. Is there a way to use regular expressions in the stopwords file? Any
other way to achieve the same result without brute forcing the stopwords
file with every combination we find?

Thanks!

Comment 1 by dp.maxime, Nov 09, 2009
It's a good idea. I'll implement the StopMatch command in the next snapshot (within
a few days).
Thanks for the suggestion.
Status: Started
Labels: -Type-Defect Type-Enhancement
Comment 2 by dp.maxime, Dec 13, 2009
Sorry, it took more time than I first expected.
Try the fresh snapshot:
http://dataparksearch.googlecode.com/files/dpsearch-4.53-13122009.tar.bz2

You can use the Match: command in a stopword file to specify regular expressions
for stopwords. NB: these are very primitive regexes, but you can write them in any
charset supported by DataparkSearch.

E.g. for your case the command is:
Match: regex ^\$##
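
To illustrate which words a pattern like this would exclude, here is a minimal Python sketch. It is only an approximation: Python's re module stands in for DataparkSearch's own, more primitive regex engine, and the helper names (is_stopword, filter_words) and the sample words are hypothetical, not part of DataparkSearch.

```python
import re

# Assumption: Python's `re` is used only as a stand-in for DataparkSearch's
# regex matching; the real engine is more limited and may differ in details.
STOP_PATTERN = re.compile(r"^\$##")  # words that begin with the literal "$##"


def is_stopword(word: str) -> bool:
    """Return True if the word starts with "$##" and would be skipped."""
    return STOP_PATTERN.search(word) is not None


def filter_words(words):
    """Keep only the words that do not match the stopword pattern."""
    return [w for w in words if not is_stopword(w)]


if __name__ == "__main__":
    # Hypothetical sample words, for illustration only.
    sample = ["$##12345", "engine", "$##007", "price$##1", "$#12"]
    print(filter_words(sample))  # ['engine', 'price$##1', '$#12']
```

Note that the pattern is anchored only at the start, so it excludes every word beginning with "$##", whatever follows, while words that merely contain "$##" later on are kept.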

Status: Done
Owner: dp.maxime