
tfidf - issue #1
Error in the code commented "Reads 'term:frequency' from each subsequent line in the file"
What steps will reproduce the problem?
1. Load a corpus file containing a term that itself contains a colon, such as the following: <a href="http: /www.pamil-visions.net/author/laura/" title="posts by laura spencer">

What is the expected output?
In the line frequency = int(tokens[1].strip()), frequency should be a number.

What do you see instead?
ValueError: invalid literal for int() with base 10: '/www.pamil-visions.net/author/laura/" title="posts by laura spencer">'
On what operating system?
Windows Vista
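The failure can be reproduced in isolation with the sketch below. It assumes the original loop split each line with line.split(":"). The report does not show that line, but the tokens[1] index and the exact error text point to it. The trailing count of 3 is hypothetical. Because the term contains a colon, splitting on every colon produces three tokens, and tokens[1] holds part of the term instead of the count.

```python
# Hypothetical corpus line: the term from this report, followed by an
# assumed count of 3 after the final colon.
line = '<a href="http: /www.pamil-visions.net/author/laura/" title="posts by laura spencer">:3\n'

# Assumed original parsing: split on every colon in the line.
tokens = line.split(":")
print(len(tokens))  # 3 tokens, because the term itself contains a colon

error = None
try:
    # tokens[1] is the middle of the term, not the frequency.
    frequency = int(tokens[1].strip())
except ValueError as exc:
    error = str(exc)
print("ValueError:", error)
```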
Because the term itself contains a colon, the line is split in the wrong place. I think this can be fixed by splitting on the last colon instead:
    # Reads "term:frequency" from each subsequent line in the file.
    for line in corpus_file:
        # rpartition splits on the LAST colon, so colons inside the term
        # (e.g. in a URL) stay part of the term.
        tokens = line.rpartition(":")
        term = tokens[0].strip()
        frequency = int(tokens[2].strip())
        self.term_num_docs[term] = frequency
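The suggested fix can be tried outside the class with the sketch below. The helper name parse_term_frequency and the sample lines are hypothetical. It assumes each line has the form term:frequency with an integer count after the last colon, and it fills a plain dict instead of self.term_num_docs. The explicit check for a missing colon is an addition not present in the suggested fix.

```python
def parse_term_frequency(line):
    """Split a 'term:frequency' line on its LAST colon (hypothetical helper)."""
    term, sep, count = line.rpartition(":")
    if not sep:
        # No colon at all: rpartition returns ('', '', line), so reject it.
        raise ValueError(f"no ':' separator in line: {line!r}")
    return term.strip(), int(count.strip())


# Hypothetical corpus lines (any header line is assumed already consumed).
corpus_lines = [
    "python:12\n",
    '<a href="http: /www.pamil-visions.net/author/laura/" title="posts by laura spencer">:3\n',
]

term_num_docs = {}
for line in corpus_lines:
    term, frequency = parse_term_frequency(line)
    term_num_docs[term] = frequency

for term, frequency in term_num_docs.items():
    print(frequency, term)
```

Because rpartition splits on the last colon, a line like "a:b:7" parses as the term "a:b" with frequency 7. This works as long as the frequency itself never contains a colon, which holds for integer counts.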
Comment #1
Posted on Jan 19, 2010 by Happy Horse
Thank you for pointing this out and suggesting a fix. I've taken the fix, and it's in version 1.1. Thanks!
Status: New
Labels:
Type-Defect
Priority-Medium