OutputProbabilities  
How do I get probabilities out of icsiboost?
Updated Dec 16, 2010 by benoit.f...@gmail.com

icsiboost does not output anything resembling probabilities. The question is studied in the paper "Obtaining Calibrated Probabilities from Boosting" by Niculescu-Mizil and Caruana.

They examine three solutions:

  • Logistic Correction
  • Platt Calibration
  • Isotonic regression

The first one, logistic correction, transforms each score with the formula 1/(1+exp(-2*n*score)), where n is the number of weak learners.
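
A minimal Python sketch of this transform. The function name logistic_correction and its arguments are illustrative and not part of icsiboost; it assumes you already have the raw score for one class and the number of weak learners n used in training.

```python
import math

def logistic_correction(score: float, n_weak_learners: int) -> float:
    """Map a raw score to [0, 1] with 1/(1+exp(-2*n*score)).

    Hypothetical helper for illustration; icsiboost does this itself
    when run with --posteriors.
    """
    z = 2.0 * n_weak_learners * score
    # Numerically stable logistic: never call exp() on a large positive value.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)
```

For example, logistic_correction(0.0, 100) returns 0.5, positive scores map above 0.5 and negative scores below it. Splitting on the sign of z avoids an overflow in math.exp for very confident examples.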

Logistic correction is implemented in icsiboost through the --posteriors option. For instance, on the adult dataset it gives:

icsiboost -S adult -C --posteriors < adult.test | head
0 1 0.000676516588 0.999323483412
0 1 0.142914079015 0.857085920985
1 0 0.346835918704 0.653164081296
1 0 0.996016904305 0.003983095695
0 1 0.000004176785 0.999995823215
0 1 0.003001997215 0.996998002785
0 1 0.014896068044 0.985103931956
1 0 0.788795652673 0.211204347327
0 1 0.003583447587 0.996416552413
0 1 0.060653451950 0.939346548050
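
Judging from the sample above, each line appears to list the reference labels as 0/1 indicators, followed by one posterior per class. The sketch below assumes exactly that layout (k indicator columns, then k posteriors); it is inferred from this example, not from a documented format, and the helper names are hypothetical.

```python
def parse_posterior_line(line: str) -> tuple[list[int], list[float]]:
    """Split one output line into reference indicators and posteriors.

    Assumed layout: k 0/1 reference columns followed by k posterior
    columns, for k classes (as in the adult example).
    """
    fields = line.split()
    if len(fields) % 2 != 0:
        raise ValueError(f"expected an even number of columns, got {len(fields)}")
    k = len(fields) // 2
    reference = [int(x) for x in fields[:k]]
    posteriors = [float(x) for x in fields[k:]]
    return reference, posteriors

def predicted_class(posteriors: list[float]) -> int:
    """Index of the class with the highest posterior."""
    return max(range(len(posteriors)), key=posteriors.__getitem__)
```

Under this reading, the third line of the sample ("1 0 0.346835918704 0.653164081296") has reference class 0 but predicted class 1, i.e. it is misclassified.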

Note that although these scores lie between 0 and 1, they are not guaranteed to sum to 1 over all classes when there are more than two classes, so you should normalize them for each example.
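
The per-example normalization amounts to dividing each posterior by the sum over all classes. A minimal sketch; the function name is hypothetical, and the uniform fallback for an all-zero row is an arbitrary choice of this sketch, not something icsiboost does.

```python
def normalize(posteriors: list[float]) -> list[float]:
    """Rescale one example's per-class posteriors so they sum to 1."""
    total = sum(posteriors)
    if total <= 0.0:
        # Degenerate row (assumption): fall back to a uniform distribution.
        return [1.0 / len(posteriors)] * len(posteriors)
    return [p / total for p in posteriors]
```

Apply it row by row, after --posteriors and before using the values as a distribution over classes.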

Platt Calibration and Isotonic Regression work better in some cases (for example, with a skewed label prior). It is also possible to get good results simply by moving the decision boundary on a development set, for instance with the --max-fmeasure <label> --optimal-iterations options or with the optimal_threshold.pl script.
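
Platt Calibration fits a sigmoid p = 1/(1+exp(-(a*score+b))) to raw scores on held-out data. The sketch below is a generic Platt scaling fit (Newton's method with a backtracking line search and Platt's smoothed targets), not an icsiboost feature. It assumes you have collected, for each development example, a raw score for the class of interest and its 0/1 reference label; fit_platt, platt_probability and the sign convention are choices of this sketch.

```python
import math

def _sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def _softplus(z: float) -> float:
    # log(1 + exp(z)) without overflow.
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))

def _nll(a: float, b: float, scores: list[float], targets: list[float]) -> float:
    # Cross-entropy of p = sigmoid(a*s + b) against smoothed targets t,
    # using -log p = softplus(-z) and -log(1 - p) = softplus(z).
    total = 0.0
    for s, t in zip(scores, targets):
        z = a * s + b
        total += t * _softplus(-z) + (1.0 - t) * _softplus(z)
    return total

def fit_platt(scores: list[float], labels: list[int], max_iter: int = 100,
              tol: float = 1e-10) -> tuple[float, float]:
    """Fit p(y=1 | s) = sigmoid(a*s + b) on development scores and 0/1 labels."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    # Platt's smoothed targets keep (a, b) finite even on separable data.
    t_pos = (n_pos + 1.0) / (n_pos + 2.0)
    t_neg = 1.0 / (n_neg + 2.0)
    targets = [t_pos if y == 1 else t_neg for y in labels]
    a, b = 0.0, math.log((n_pos + 1.0) / (n_neg + 1.0))
    f = _nll(a, b, scores, targets)
    for _ in range(max_iter):
        # Gradient and Hessian of the negative log-likelihood in (a, b).
        g_a = g_b = h_aa = h_ab = h_bb = 0.0
        for s, t in zip(scores, targets):
            p = _sigmoid(a * s + b)
            w = p * (1.0 - p)
            g_a += (p - t) * s
            g_b += p - t
            h_aa += w * s * s
            h_ab += w * s
            h_bb += w
        h_aa += 1e-12  # tiny ridge keeps the 2x2 Hessian invertible
        h_bb += 1e-12
        det = h_aa * h_bb - h_ab * h_ab
        # Newton direction: solve H [da, db] = -g.
        da = -(h_bb * g_a - h_ab * g_b) / det
        db = -(h_aa * g_b - h_ab * g_a) / det
        # Backtracking line search (Armijo sufficient-decrease condition).
        step = 1.0
        while step > 1e-10:
            new_a, new_b = a + step * da, b + step * db
            new_f = _nll(new_a, new_b, scores, targets)
            if new_f <= f + 1e-4 * step * (g_a * da + g_b * db):
                break
            step /= 2.0
        else:
            break  # no acceptable step: treat as converged
        a, b = new_a, new_b
        converged = f - new_f < tol
        f = new_f
        if converged:
            break
    return a, b

def platt_probability(score: float, a: float, b: float) -> float:
    """Calibrated probability for a new raw score."""
    return _sigmoid(a * score + b)
```

Call fit_platt once on the development set, then apply platt_probability to test scores. Any off-the-shelf one-dimensional logistic regression on the scores would serve the same purpose; the hand-written Newton loop only keeps the sketch dependency-free.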

