cleartk - issue #416

handle unknown words in cosine similarity function


Posted on Feb 3, 2015 by Grumpy Horse

In vector representations, unknown words are sometimes modeled by mapping all low-frequency words to a placeholder string such as "unk" during training. Currently, the cosine similarity function handles unknown words by always returning a similarity of 0. If the map passed in contains an "unk" entry, that entry should be used for any word passed in that is not in the map.
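A minimal sketch of the requested behavior, for illustration only. The class name `UnkCosineSimilarity`, its method names, and the `Map<String, double[]>` representation are assumptions and are not taken from the ClearTK code base. The sketch also assumes all vectors have the same length.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not the actual ClearTK implementation.
class UnkCosineSimilarity {
    // Assumed placeholder key for low-frequency words seen during training.
    static final String UNK = "unk";

    // Returns the vector for the word, falling back to the "unk" vector
    // if the word is missing. Returns null when neither is present.
    static double[] lookup(Map<String, double[]> vectors, String word) {
        double[] v = vectors.get(word);
        return v != null ? v : vectors.get(UNK);
    }

    // Cosine similarity between two words, using the "unk" fallback.
    // Keeps the old behavior (0 similarity) when no vector can be found.
    static double similarity(Map<String, double[]> vectors, String w1, String w2) {
        double[] a = lookup(vectors, w1);
        double[] b = lookup(vectors, w2);
        if (a == null || b == null) {
            return 0.0;
        }
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0.0 || normB == 0.0) {
            return 0.0; // avoid division by zero for all-zero vectors
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }
}
```

One consequence of this approach is that two different out-of-vocabulary words both resolve to the same "unk" vector, so their similarity comes out as 1. Whether that is desirable depends on the application.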

Comment #1

Posted on Feb 3, 2015 by Grumpy Horse

This issue was closed by revision 25c9287cfafd.

Status: Fixed

Labels:
Type-Defect Priority-Medium Component-ml-tksvmlight