quickstart
The steps to quick-start this project:
|-common.cpp //common functions used by most of the programs
 -commonHeader.h //common headers, including the node structures for the training set and the test set
 -mlBase.cpp //common functions for the movielens dataset, including the functions that load the training set and the test set
 -netflixBase.cpp //common functions for the netflix dataset, including the functions that load the training set and the test set
 -dataset-|
           -movielens-|
                       -u1.base (training set)
                       -u1.test (test set)
           -netflix-|
                     -data_without_prob.txt (training set)
                     -probe_real.txt (test set)
                     -other files (programs for generating data_without_prob.txt and probe_real.txt; see the netflixpreprocess page for details)
 -svd-|
       -svdBase.cpp //common functions for the svd model, covering everything svd-related
       -svd_ml.cpp //runs the svd model on the movielens dataset; contains the main function and is the file you execute
       -svd_netflix.cpp //runs the svd model on the netflix dataset; contains the main function and is the program entry point
 -svdplusplus (svd++ model, laid out like svd)
 -asymSvd (asymmetric svd model, laid out like svd)
 -gNbr (global neighborhood based model, laid out like svd)
 -baseline (baseline model, laid out like svd)
 -combine (asymSvd + svdplusplus, laid out like svd)
 -knn (see the knnstep page for details)
 -stat-|
        -statBase.cpp (collects overall statistics of a dataset; for now only the number of ratings per item and per user)
        -stat_ml.cpp (overall statistics of the movielens dataset)
        -stat_netflix.cpp (overall statistics of the netflix dataset)
cd ./svd/
To check whether your results are correct, compare them with Koren's SIGKDD'08 paper or with Ma, C C (Guide to Singular Value Decomposition for Collaborative Filtering). My own results are here: svd results, knn results. You are welcome to post your own results as well.
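The file list above says only that mlBase.cpp loads the training and test sets. As a rough sketch of what that loading involves, assuming the standard MovieLens 100K layout of one tab-separated "user item rating timestamp" record per line, something like the following could be used. The Rating struct and the parseRatings and loadRatings names are hypothetical and are not taken from the project's code.

```cpp
#include <fstream>
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical record type; the project's node structure in commonHeader.h may differ.
struct Rating {
    int user;
    int item;
    double value;
};

// Parse whitespace-separated "user item rating timestamp" lines from any stream.
// Lines that do not contain all four fields are skipped.
std::vector<Rating> parseRatings(std::istream& in) {
    std::vector<Rating> ratings;
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        Rating r;
        long timestamp;  // read so the line is validated, then ignored
        if (fields >> r.user >> r.item >> r.value >> timestamp) {
            ratings.push_back(r);
        }
    }
    return ratings;
}

// Open a ratings file and parse it, reporting the path if the file cannot be opened.
std::vector<Rating> loadRatings(const std::string& path) {
    std::ifstream in(path);
    if (!in) {
        throw std::runtime_error("cannot open ratings file: " + path);
    }
    return parseRatings(in);
}
```

With the layout above, loadRatings("./dataset/movielens/u1.base") would be called from inside ./svd/ only if the relative path is adjusted to match the working directory; including the path in the error message makes a wrong location easy to spot.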
Not bad, here's my support.
Good stuff.
Great!
How should the knn algorithm be set up? When I ran it I got: begin initialization: can't open operation failed!
To run the knn model, follow the steps on the knnstep page. The error you encountered is caused by a wrong input file path.
I support this, and I hope it keeps getting better.
Thanks, I will do my best to keep improving this project. I hope more people will join and help make it better!
Good!
Download the movielens 100K dataset, decompress it, and put the two files u1.test and u1.base into the directory "./dataset/movielens/".
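After simply copying u1.test and u1.base, the code does not run as-is: the file names #defined in your code are ua.test and ua.base. I suggest adding a sentence about this to the instructions, or changing the defines. I have just run the code and am working through it slowly. I hope this project keeps getting better!
Thank you for your hard work. I spent today studying your code and rewrote it in Java. Your project is a great help to me as I learn about recommender systems. That said, I personally find the code not very well organized; the comments in particular are a bit messy, which made it hard for me to read. It would help if each algorithm were annotated with the paper and page number it comes from.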
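On the KDD Cup data, the brute-force knn algorithm in the code does not work. Computing the intersection of two sets of 200,000 elements each is too slow.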
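One common way to make this kind of intersection cheaper is a single linear merge pass, which costs O(n + m) instead of comparing every pair. It assumes each id list is kept sorted in ascending order with no duplicates. Whether the project's knn code stores its lists that way is not stated here, so this is only a sketch, and the countCommon name is hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Count the ids that appear in both lists.
// Precondition (assumed): both vectors are sorted ascending and contain no duplicates.
std::size_t countCommon(const std::vector<int>& a, const std::vector<int>& b) {
    std::size_t i = 0;
    std::size_t j = 0;
    std::size_t common = 0;
    while (i < a.size() && j < b.size()) {
        if (a[i] < b[j]) {
            ++i;          // a[i] cannot match anything later in b
        } else if (b[j] < a[i]) {
            ++j;          // b[j] cannot match anything later in a
        } else {
            ++common;     // same id in both lists
            ++i;
            ++j;
        }
    }
    return common;
}
```

If the matching elements themselves are needed rather than just their count, std::set_intersection from <algorithm> performs the same merge on sorted ranges.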
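When I ran svd++ml.cpp with u1.test and u1.base as the data, the RMSE did not converge; the RMSE on the training set came out at a little over 2. If anyone following this project can repeat the experiment, I would like to see your results.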
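For anyone comparing numbers, it helps to agree on how RMSE is computed: the square root of the mean squared difference between predicted and actual ratings. The sketch below is a generic implementation under that definition, not the project's own code, and the rmse function name is hypothetical.

```cpp
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Root mean squared error between predicted and actual ratings.
// Throws if the inputs are empty or differ in length.
double rmse(const std::vector<double>& predicted, const std::vector<double>& actual) {
    if (predicted.empty() || predicted.size() != actual.size()) {
        throw std::invalid_argument("rmse: inputs must be non-empty and equal in length");
    }
    double sumSquared = 0.0;
    for (std::size_t i = 0; i < predicted.size(); ++i) {
        const double err = predicted[i] - actual[i];
        sumSquared += err * err;
    }
    return std::sqrt(sumSquared / static_cast<double>(predicted.size()));
}
```

Since movielens ratings lie on a 1 to 5 scale, a training RMSE above 2 is large relative to that range, so it is worth checking the learning rate, regularization, and data loading before comparing with published results.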