Issue 22: Regex expressions in stopword file
This is more of a question or feature request. We have many words in our database (non-cached mode) that are irrelevant to the search engine, and we would like an easy mechanism to exclude them from the index. For example, dict16 and dict32 have thousands of words, with multiple occurrences, that begin with "$##" followed by a string of numbers. For us, words of this pattern are irrelevant and we would like not to index them. Is there a way to use regular expressions in the stopwords file? Or is there any other way to achieve the same result without brute-forcing the stopwords file with every combination we find? Thanks!
Nov 09, 2009
It's a good idea. I'll implement the StopMatch command in the next snapshot (within a few days). Thanks for the suggestion.
Status: Started
Labels: -Type-Defect Type-Enhancement
Dec 13, 2009
Sorry, it took more time than expected at first sight. Try the fresh snapshot: http://dataparksearch.googlecode.com/files/dpsearch-4.53-13122009.tar.bz2 You can use the Match: command in a stopword file to specify a regular expression for stopwords. NB: these are very primitive regexes, but you can use any charset supported by DataparkSearch to specify them. E.g., for your case the command is: Match: regex ^\$##
Status: Done
Owner: dp.maxime
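
To see which words a pattern like ^\$## would exclude, here is a minimal sketch in Python. It is only an approximation: Python's re module stands in for DataparkSearch's own regex matcher, which the comment above describes as very primitive, so behaviour may differ for more complex patterns. The filter_stopwords helper and the sample words are hypothetical and are not part of DataparkSearch.

```python
import re

# Assumption: Python's `re` approximates the DataparkSearch matcher for this
# simple pattern. `^` anchors at the start of the word, `\$` is a literal
# dollar sign, and `#` is a literal character (no verbose mode is used).
STOP_PATTERN = re.compile(r"^\$##")


def filter_stopwords(words):
    """Return the words that do NOT match the stop pattern (hypothetical helper)."""
    return [w for w in words if not STOP_PATTERN.search(w)]


sample = ["$##12345", "search", "$##007", "price$##1", "engine"]
print(filter_stopwords(sample))  # ['search', 'price$##1', 'engine']
```

Because the pattern is anchored with ^, only words that start with "$##" are dropped; a word such as "price$##1", which merely contains that sequence, is kept.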