
This project is developing a platform that can apply a variety of data-mining algorithms to a data source (for example, a matrix stored as CSV, or Chinese text documents) to produce results.

Algorithms can be chained through an XML configuration file so that they run one after another. For example, we might first run principal component analysis (PCA) for feature selection, then run a random forest for classification.
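
As a rough illustration of running steps one by one from an XML configuration, here is a minimal sketch. The `<pipeline>`/`<step>` layout, the attribute names, and the placeholder step functions are all assumptions made for this example; they are not the platform's actual configuration schema or API.

```python
import xml.etree.ElementTree as ET

# Hypothetical config layout; the platform's real XML schema may differ.
CONFIG_XML = """
<pipeline>
  <step name="pca" components="2"/>
  <step name="random_forest" trees="10"/>
</pipeline>
"""

def run_pca(data, params):
    # Placeholder step: records what would run instead of doing real PCA.
    return data + ["pca(components=%s)" % params["components"]]

def run_random_forest(data, params):
    # Placeholder step: records what would run instead of training a forest.
    return data + ["random_forest(trees=%s)" % params["trees"]]

# Maps the step name used in the config to the function that implements it.
STEPS = {"pca": run_pca, "random_forest": run_random_forest}

def run_pipeline(xml_text, data):
    """Run each <step> in document order, feeding each output to the next."""
    root = ET.fromstring(xml_text.strip())
    for step in root.findall("step"):
        params = dict(step.attrib)
        name = params.pop("name")
        data = STEPS[name](data, params)
    return data

result = run_pipeline(CONFIG_XML, [])
print(result)
```

Keeping the step order in the configuration file means the sequence can be changed (for instance, swapping the classifier) without touching the code that drives the pipeline.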

Currently, the algorithms are mainly designed for tasks that can be done on a single machine. The architecture is easy to extend, so you can implement the algorithm you want in a very short time and use it in your own project (believe me, it's faster, better, and easier than Weka). Another important feature is that the platform supports classification and clustering of Chinese text very well.

Just write code like this and you will get amazing results:

    # Load the configuration and initialize the platform.
    config = Configuration.FromFile("conf/test.xml")
    PyMining.Init(config, "__global__")

    # Build the training matrix (features and labels) from the source text.
    matCreater = ClassifierMatrix(config, "__matrix__")
    [trainx, trainy] = matCreater.CreateTrainMatrix("data/train.txt")

    # Train the chi-square feature filter on the training data.
    chiFilter = ChiSquareFilter(config, "__filter__")
    chiFilter.TrainFilter(trainx, trainy)

    # Train the naive Bayes classifier.
    nbModel = TwcNaiveBayes(config, "twc_naive_bayes")
    nbModel.Train(trainx, trainy)

    # Use the trained model to predict the class of unseen documents.
    [testx, testy] = matCreater.CreatePredictMatrix("data/test.txt")
    [testx, testy] = chiFilter.MatrixFilter(testx, testy)
    retY = nbModel.TestMatrix(testx, testy)
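
To give a sense of what a chi-square feature filter does, here is a small self-contained sketch of the textbook chi-square statistic for term/class association. It is not necessarily how `ChiSquareFilter` is implemented; the function names `chi_square` and `score_terms` and the toy documents are made up for this illustration.

```python
def chi_square(a, b, c, d):
    """Chi-square statistic for a 2x2 term/class contingency table.

    a: docs in the class that contain the term
    b: docs outside the class that contain the term
    c: docs in the class without the term
    d: docs outside the class without the term
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0
    return n * (a * d - b * c) ** 2 / float(denom)

def score_terms(docs, labels, target):
    """Score every term by its association with the target class."""
    vocab = set()
    for doc in docs:
        vocab.update(doc)
    scores = {}
    for term in vocab:
        a = b = c = d = 0
        for doc, label in zip(docs, labels):
            in_class = (label == target)
            has_term = (term in doc)
            if has_term and in_class:
                a += 1
            elif has_term:
                b += 1
            elif in_class:
                c += 1
            else:
                d += 1
        scores[term] = chi_square(a, b, c, d)
    return scores

# Toy data (hypothetical): each document is a set of terms.
docs = [{"goal", "match"}, {"goal", "team"},
        {"stock", "market"}, {"stock", "team"}]
labels = ["sports", "sports", "finance", "finance"]

scores = score_terms(docs, labels, "sports")
# Keep the two highest-scoring terms; ties are broken alphabetically.
top_terms = sorted(scores, key=lambda t: (-scores[t], t))[:2]
```

A term that appears equally often inside and outside the class (here, "team") scores zero and would be dropped, while terms concentrated in one class score highest.
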