Iowa State University, Com S 472 project: NewsPet

Summary

NewsPet is a news-reader web application that categorizes RSS news items using a trainable engine.

Goals
(Application overview):
Approach

Design

Categorization

For each news item read, a vector of per-category probabilities is retrieved from a Naive Bayes classifier. The most probable category is then assigned to the item, provided its probability meets a lower bound that depends on the number of categories.

Feedback

For every item in every category, the user will be able to indicate either that the article is accurately categorized, so the system should be more confident in accepting documents like it, or that the article should be categorized differently.

Logistics

For the main categorization portion of the application, we use a Naive Bayes classifier; in particular, we use Mallet as a library in our application. We use Java for the classification and classifier training services, and Django, a Python web framework, for our web-based front-end UI. Informa is used as the RSS retriever and parser.

Testing

We have tested the classification logic of our application with data from the Reuters-21578 collection.

Report

The most recent draft of the report for this project can be viewed here. Presentation slides are viewable here.
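
As an illustration of the categorization rule described under Categorization, the following is a minimal Python sketch, not the project's actual Java/Mallet code. The function name `assign_category`, the `margin` parameter, and the specific bound (`margin` times the uniform baseline 1/n, capped at 1.0) are assumptions made for the example; the page states only that the lower bound depends on the number of categories.

```python
from typing import Mapping, Optional


def assign_category(
    probabilities: Mapping[str, float],
    margin: float = 1.5,
) -> Optional[str]:
    """Return the most probable category, or None if it is not confident enough.

    Assumption (hypothetical rule): the lower bound is `margin` times the
    uniform baseline 1/n, capped at 1.0, where n is the number of categories.
    """
    if not probabilities:
        return None

    n = len(probabilities)
    lower_bound = min(1.0, margin / n)

    # Pick the category with the highest classifier probability.
    best = max(probabilities, key=probabilities.get)

    # Assign it only if it clears the category-count-dependent bound.
    return best if probabilities[best] >= lower_bound else None


if __name__ == "__main__":
    scores = {"sports": 0.7, "politics": 0.2, "tech": 0.1}
    print(assign_category(scores))
```

Under this assumed rule, an item whose best category barely beats chance is left uncategorized rather than filed under a weak guess.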