
Last 7 days

  • Dec 10, 2009
    r121 (fixed typo and license stuff; added a few python scripts) committed by benoit.favre   -   fixed typo and license stuff; added a few python scripts

Earlier this year

  • Oct 19, 2009
    r120 (Added a java version of the classifier) committed by benoit.favre   -   Added a java version of the classifier
  • Oct 19, 2009
    r119 (modified SAMME. do not use! it doesn't work.) committed by benoit.favre   -   modified SAMME. do not use! it doesn't work.
  • Oct 09, 2009
    FAQ (Frequently asked questions) Wiki page edited by benoit.favre   -   Revision r118 Edited wiki page through web user interface.
  • Oct 09, 2009
    optimal_threshold.pl (optimal_threshold.pl (computes decision threshold for max f-...) file uploaded by benoit.favre   -  
    Labels: Type-Executable OpSys-All
  • Oct 09, 2009
    icsiboost.macosx10.6.gz (icsiboost for mac osx 10.6) file uploaded by benoit.favre   -  
    Labels: Type-Executable OpSys-OSX
  • Aug 30, 2009
    issue 4 (The test (maybe also train) classification error while train...) changed by benoit.favre   -   Basically, on multiclass problems, icsiboost considers an example to be correctly classified if the only class to have a positive score is the correct class (all other classes shall have a negative score). Boostexter, on the other hand, only looks at the argmax. On multilabel problems both work the same way. This does not affect performance. This is mentioned on the main page: "2009-04-08 WARNING: On multiclass problems, icsiboost does not compute the error rates the same way boostexter does. This does not result in lower performing models, and an option for getting compatible values will be implemented in the future."
    Status: WontFix
    Labels: Priority-Low Priority-Medium
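The difference between the two error criteria described in the issue 4 status change can be illustrated with a minimal Python sketch. The function names, the toy scores, and the choice to treat a score of exactly zero as non-positive are assumptions made for illustration; this is not code from icsiboost or Boostexter.

```python
def sole_positive_correct(scores, gold):
    """Strict criterion: the gold class must be the only class with a positive score."""
    positive = [label for label, score in scores.items() if score > 0]
    return positive == [gold]


def argmax_correct(scores, gold):
    """Argmax criterion: the gold class must have the highest score."""
    return max(scores, key=scores.get) == gold


def error_rate(examples, is_correct):
    """Fraction of (scores, gold) pairs judged incorrect by `is_correct`."""
    wrong = sum(1 for scores, gold in examples if not is_correct(scores, gold))
    return wrong / len(examples)


# Hypothetical per-class scores for four test examples whose gold class is "a".
examples = [
    ({"a": 0.8, "b": -0.3, "c": -0.5}, "a"),   # correct under both criteria
    ({"a": 0.4, "b": 0.1, "c": -0.2}, "a"),    # two positive scores: strict criterion fails
    ({"a": -0.1, "b": -0.4, "c": -0.6}, "a"),  # no positive score: strict criterion fails
    ({"a": -0.2, "b": 0.5, "c": -0.1}, "a"),   # wrong under both criteria
]

strict_error = error_rate(examples, sole_positive_correct)
argmax_error = error_rate(examples, argmax_correct)
```

On this toy data the strict criterion reports a higher error than the argmax criterion even though the scores are identical, which matches the explanation that the reported error rates differ without the model itself being worse.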
  • Aug 27, 2009
    issue 4 (The test (maybe also train) classification error while train...) commented on by stanislas.oger   -   To be precise, this example does not involve n/s-grams. Moreover, the score difference is 14% absolute and cannot be explained by a "normal" implementation difference between icsiboost and boostexter.
  • Aug 27, 2009
    issue 4 (The test (maybe also train) classification error while train...) reported by stanislas.oger
    What steps will reproduce the problem?
      1. Get the sample data attached to the ticket and uncompress it.
      2. Run "icsiboost -S essai7 -n 100".
      3. Note the last test error (0.407143 at round 100).
      4. Then run "boostexter -C -S essai7 < essai7.test".
      5. Note the test classification error (0.267857).
    What is the expected output? What do you see instead?
      I think the test error reported at the last round should be the same as the one obtained with boostexter on the same test data.
    What version of the product are you using? On what operating system?
      I use the latest SVN release (v0.3c, revision 117), compiled on 64-bit Linux (2.6.24.3-mosix64).
      Output of "icsiboost --version":
        icsiboost v0.3c, Boosting decision stumps.
        Written by Benoit Favre.
        Copyright (C) 2007 International Computer Science Institute.
        This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
        Build: Aug 26 2009 at 13:06:06, gcc 4.2.4 (Ubuntu 4.2.4-1ubuntu3), 64bit
        Subversion info:
        $URL: http://icsiboost.googlecode.com/svn/trunk/icsiboost/src/icsiboost.c $
        $Date: 2009-07-29 23:46:35 +0200 (mer 29 jui 2009) $
        $Revision: 117 $
        $Author: benoit.favre $
      Output of "uname -a":
        Linux xxxx 2.6.24.3-mosix64 #3 SMP Sun Oct 26 19:17:09 CET 2008 x86_64 GNU/Linux
    Please provide any additional information below.
      I trained the same classifier with 2 classes instead of 7, and the test classification error was correct.
  • Jul 29, 2009
    r117 (added support for continuous/ngram features in the python de...) committed by benoit.favre   -   added support for continuous/ngram features in the python decoder
  • Jun 01, 2009
    r116 (added a full python decoder) committed by benoit.favre   -   added a full python decoder
  • May 27, 2009
    r115 (updated samme with threaded training with deterministic outp...) committed by benoit.favre   -   updated samme with threaded training with deterministic output
  • May 27, 2009
    r114 (implemented 'Multiclass Adaboost' from Ji Zhu in samme.c) committed by benoit.favre   -   implemented 'Multiclass Adaboost' from Ji Zhu in samme.c
  • May 18, 2009
    FAQ (Frequently asked questions) Wiki page edited by benoit.favre
  • May 18, 2009
    FAQ (Frequently asked questions) Wiki page edited by benoit.favre
  • May 12, 2009
    issue 3 (ICSIBoost with threads activated does not use all processor ...) commented on by fabiosoaresf   -   I tend to avoid shell-level parallelism because, in my problem, one thread uses approximately 1 GB of memory, and this value may vary. I am afraid that in some situations memory use would exceed physical memory if I used many threads, which would force the use of virtual memory and drastically decrease performance. Anyhow, I'll think about it. It may be safe to run 2 or perhaps 3 threads this way. Thank you very much and congratulations on your work. Your algorithm gave us results on par with a linear SVM (SVM-Perf), but was much faster. Best Regards, Fabio Figueiredo
  • May 12, 2009
    issue 3 (ICSIBoost with threads activated does not use all processor ...) commented on by fabiosoaresf   -   I tend to avoid shell-level parallelism because, in my problem, one thread uses approximately 1 GB of memory, and this value may vary. I am afraid that in some situations memory use would exceed physical memory, which would force the use of virtual memory and drastically decrease performance. Anyhow, I'll think about it. It may be safe to run 2 or perhaps 3 threads this way. Thank you very much and congratulations on your work. Your algorithm gave us results on par with a linear SVM (SVM-Perf), but was much faster. Best Regards, Fabio Figueiredo
  • May 12, 2009
    issue 3 (ICSIBoost with threads activated does not use all processor ...) commented on by benoit.favre   -   Actually, only the weak learner selection part is parallelized. Loading the data from disk, updating the example weights and computing the error rate at each iteration are not parallelized. This means that unless you have a lot of features that occur in a lot of examples, the parallel version is going to be dominated by sequential operations. On a multi-day training run (for one model only), I do get a speedup that is sub-linear in the number of threads. From the timing you give me, if each model takes 30 seconds to train, it's most likely spending its time reading from disk. This is especially true if you use fewer than 1000 iterations. In your case, you might get better performance by disabling threading altogether (comment out #define USE_THREADS in the source code), which lets the compiler apply additional optimizations. Since training seems short, why don't you train multiple models at the same time by using shell-level parallelism?
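The sub-linear speedup mentioned in this comment is the behaviour predicted by Amdahl's law, which the comment does not name but which fits the description of a partly sequential training loop. A minimal sketch follows; the parallel fractions used in the usage lines are hypothetical values chosen for illustration, not measurements of icsiboost.

```python
def amdahl_speedup(parallel_fraction, threads):
    """Upper bound on speedup when only `parallel_fraction` of the runtime
    benefits from `threads` workers and the rest stays sequential."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if threads < 1:
        raise ValueError("threads must be at least 1")
    sequential_fraction = 1.0 - parallel_fraction
    return 1.0 / (sequential_fraction + parallel_fraction / threads)


# Hypothetical workloads on 4 threads.
mostly_parallel = amdahl_speedup(0.9, 4)    # long run dominated by weak learner selection
mostly_sequential = amdahl_speedup(0.1, 4)  # short run dominated by disk I/O and weight updates
```

When most of the time goes to sequential steps such as loading data, the bound stays close to 1 regardless of the thread count, so thread management overhead can easily make the threaded run slower overall.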
  • May 12, 2009
    issue 3 (ICSIBoost with threads activated does not use all processor ...) commented on by fabiosoaresf
    Hello, Sure. I have run the tests. Before showing them, a few comments:
    (1) A multi-threaded system will likely be faster than a single-threaded one if it parallelizes resources other than the CPU, such as I/O. As a result, a multi-threaded system that uses just one CPU core will likely still be faster than the single-threaded version, for reasons unrelated to the CPU. However, for CPU-intensive tasks like ICSIBoost, it could be much faster if it also parallelized CPU work.
    (2) My tests consisted of:
      (2.1) Creating models in order to classify texts;
      (2.2) More than a hundred models were created because my collection has sets of hierarchical categories;
      (2.3) The results below are the time needed to train those hundreds of models.
    (3) Results:
                      Multi-thread   Single-thread
      Clock time      9m32.178s      8m29.954s
      User time       8m7.494s       7m44.997s
      System time     1m58.563s      0m45.127s
    Machine: Linux kernel 2.6.28-11, Ubuntu, 64 bits. Intel Core 2 Quad, model Q8200. 4GB RAM.
    Observe that, at least in my experiments, the multi-threaded system was slower. User time was comparable between the two versions, but the overhead needed to create and control the threads increased the overall time.
    Best Regards, Fabio
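The timings reported in this comment can be compared more directly by converting them to seconds. The sketch below is only arithmetic on the figures quoted above; the helper name `to_seconds` and the dictionary layout are illustrative choices.

```python
import re


def to_seconds(stamp):
    """Convert a `time`-style duration such as '9m32.178s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", stamp)
    if match is None:
        raise ValueError(f"unrecognized duration: {stamp!r}")
    return int(match.group(1)) * 60 + float(match.group(2))


# (multi-thread, single-thread) pairs copied from the comment.
timings = {
    "clock": ("9m32.178s", "8m29.954s"),
    "user": ("8m7.494s", "7m44.997s"),
    "system": ("1m58.563s", "0m45.127s"),
}

# Ratio above 1 means the multi-threaded run took longer.
ratios = {
    name: to_seconds(multi) / to_seconds(single)
    for name, (multi, single) in timings.items()
}
```

The clock-time ratio comes out at roughly 1.12 and the system-time ratio at roughly 2.6, while user time differs by only about 5%, which is consistent with the reporter's reading that the extra cost is thread management rather than computation.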
  • May 12, 2009
    issue 3 (ICSIBoost with threads activated does not use all processor ...) commented on by benoit.favre   -   Can you confirm that an instance with 1 job finishes in the same time as an instance with n jobs?
  • May 12, 2009
    issue 3 (ICSIBoost with threads activated does not use all processor ...) reported by fabiosoaresf   -   What steps will reproduce the problem? 1. Run the program with threads (--jobs=N, N >= 2). What is the expected output? What do you see instead? - A multi-threaded program running on a CPU with more than one core should be expected to exploit the whole processor. Instead, although ICSIBoost allows multi-threading, CPU utilization never goes beyond one core. For example, on a quad-core processor, CPU utilization is at most 25%. What version of the product are you using? On what operating system? - (1) ICSI-Boost, r104, 64 bits - (2) ICSI-Boost, r102, 32 bits - Operating system: Linux 2.6.28-11, Ubuntu, 64 bits. Please provide any additional information below. - Picture attached
  • Apr 10, 2009
    r111 (Added the --drop <regex> option to ignore ngrams which match...) committed by benoit.favre   -   Added the --drop <regex> option to ignore ngrams which match a regular expression. Use <name>:text:drop=<regex> in your names file for a per-column effect. Also fixed a bug with --no-unknown-stump (and added the per-column option no_unk) which is, by the way, equivalent to --drop '\?'.
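The effect of dropping n-grams that match a regular expression, as described in r111, can be sketched in Python. This is an illustration of the idea only: the function name, the underscore-joined n-gram strings, and the use of a search anywhere in the n-gram (rather than a full match) are assumptions, and the actual matching rules in icsiboost may differ.

```python
import re


def drop_ngrams(ngrams, pattern):
    """Return the n-grams that do NOT match `pattern` anywhere in their text."""
    regex = re.compile(pattern)
    return [ngram for ngram in ngrams if regex.search(ngram) is None]


# Hypothetical n-grams; "?" stands for an unknown value.
ngrams = ["good", "good_movie", "?", "movie_?"]

# Dropping everything that contains a literal "?" removes the unknown entries.
kept = drop_ngrams(ngrams, r"\?")
```

Under these assumptions, the pattern `\?` removes every n-gram that contains the unknown marker, which is the equivalence the commit message points out.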
  • Apr 08, 2009
    FileFormats (description of the file formats used by icsiboost) Wiki page edited by benoit.favre
  • Apr 08, 2009
    FileFormats (description of the file formats used by icsiboost) Wiki page edited by benoit.favre
  • Apr 08, 2009
    FileFormats (description of the file formats used by icsiboost) Wiki page edited by benoit.favre
  • Apr 08, 2009
    FAQ (Frequently asked questions) Wiki page edited by benoit.favre
  • Apr 08, 2009
    FAQ (Frequently asked questions) Wiki page added by benoit.favre
  • Apr 07, 2009
    r105 (added debug info to classification mode (use -V)) committed by benoit.favre   -   added debug info to classification mode (use -V)
  • Mar 31, 2009
    icsiboost-64bit-static-r104.gz (Precompiled static executable for Linux x86_64 (64bit)) file uploaded by benoit.favre   -  
    Labels: Featured Type-Executable OpSys-Linux
  • Mar 31, 2009
    icsiboost-32bit-static-r104.gz (Precompiled static executable for Linux x86 (32bit)) file uploaded by benoit.favre   -  
    Labels: Featured Type-Executable OpSys-Linux
  • Mar 31, 2009
    r104 (text expert options can be set in the names file on a per-co...) committed by benoit.favre   -   Text expert options can be set in the names file on a per-column basis, e.g. "bag_of_words_feature:text:expert_type=ngram expert_length=4 cutoff=10.". Also fixed a bug where the count cutoff was ignored.
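The per-column option syntax quoted in r104 can be illustrated with a small parser. The grammar here is inferred only from that single example line (and from the valueless `no_unk` option mentioned in r111), so the function name, the handling of the trailing period, and the treatment of options without a value are assumptions rather than a description of how icsiboost reads names files.

```python
def parse_names_column(line):
    """Split 'name:type:key=value key=value.' into (name, type, options)."""
    body = line.strip()
    if body.endswith("."):
        body = body[:-1]  # assumed: a trailing period terminates the column definition
    parts = [part.strip() for part in body.split(":", 2)]
    if len(parts) < 2 or not parts[0] or not parts[1]:
        raise ValueError(f"expected 'name:type[:options]', got {line!r}")
    name, column_type = parts[0], parts[1]
    options = {}
    if len(parts) == 3:
        for token in parts[2].split():
            key, separator, value = token.partition("=")
            # Options without '=' (such as no_unk) are recorded as flags.
            options[key] = value if separator else True
    return name, column_type, options


column = parse_names_column(
    "bag_of_words_feature:text:expert_type=ngram expert_length=4 cutoff=10."
)
```

Values are kept as strings so that the caller decides how to interpret them; an option whose value contains spaces would not survive this simple whitespace split.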
 