README
INTRODUCTION

This package implements two significance tests for comparing digital gene expression profiles, described in the article:

    Varuzza et al. "Significance tests for comparing digital gene expression profiles"

The KempBasu package comprises two programs, Kemp for the frequentist test and Basu for the Bayesian test, and some auxiliary scripts.

SYSTEM REQUIREMENTS

KempBasu dependencies are listed below:
GSL, GLIB and pkg-config are available in all Linux distributions. These libraries are also available for Mac OS X and Windows through MacPorts and Cygwin, respectively. The Judy source code is distributed with the KempBasu source code.

To run the auxiliary scripts (including the wrapper script), the Ruby programming language runtime is required.

COMPILING

Download and unpack the KempBasu source code. Then unpack and compile the Judy library:

    cd <kempbasu directory>
    cd ext-libs/
    tar zxvf Judy-1.0.4.tar.gz
    cd Judy-1.0.4
    ./configure
    make

You may need administrator privileges (root) to install it. Use the following commands:

    sudo make install

or

    su
    make install

Optionally, you can install Judy in your own account by running configure with the prefix option:

    ./configure --prefix=$HOME

and then installing it with the command:

    make install

If you decide to install Judy in your HOME, remember to type the commands below before running the configure script:

    export CFLAGS=-I$HOME/include
    export LDFLAGS=-L$HOME/lib

After installing Judy, use the same procedure to compile and install KempBasu:

    cd <kempbasu-directory>
    ./configure
    make

...and then

    make install

BINARIES

The package provides two binaries: kemp.bin and basu.bin. These programs are intended to be integrated with other programs that can provide a friendlier interface to the end user. KempBasu's input and output formats are very simple, so they are easy for other programs to parse. A set of scripts is provided to make KempBasu easier to use. For those interested in embedding KempBasu in other programs, the input and output files of kemp.bin and basu.bin are described in the "RUNNING LOW LEVEL" section.

RUNNING

To run either program, the input file must be formatted as a table, with fields separated by tabs, as follows:
...where S_j is the sum of the corresponding library and c_ij is the count of tag i in library j. For example, the file examples/example.dat in the distribution has the following content:

    TEST    T1      T2
    SUM     10000   10000
    tag1    1       3
    tag2    7       21
    tag3    10      30

To run Kemp, type:

    kemp <filename>

or, to run Basu, type:

    basu <filename>

These commands invoke a wrapper Ruby script, which converts the input file to the format needed by the underlying C program. On Linux, the script also determines the number of available processor cores and runs the C program on all of them. The output is written to a file named <filename>-kemp.txt or <filename>-basu.txt. The example above generates the following output when run with kemp:

    TEST    T1    T2    pvalue      alpha        score      category
    tag1    1     3     0.625886    0.0337644    0          U
    tag2    7     21    0.012574    0.014972     1.60165    D
    tag3    10    30    0.002223    0.0130005    8.29007    D

The output reproduces the input data, plus some extra columns:
The output of the Basu program is:

    TEST T1 T2 evalue ev ie
    tag1 1 3 0.61952 2.3368e-05
    tag2 7 21 0.033387 4.5486e-06
    tag3 10 30 0.0077514 3.1912e-06

Again, the first columns correspond to the original data and the extra columns are:
EXAMPLES

The examples directory contains a test file, GSE6677-clean.dat.gz, compressed with gzip. To test Kemp and Basu using this example file, type:

    gunzip examples/GSE6677-clean.dat.gz
    kemp examples/GSE6677-clean.dat

The other files, with a .mat extension, are formatted for the low-level programs kemp.bin and basu.bin (described below).

AUXILIARY SCRIPTS

The package is provided with 3 Ruby scripts:
RUNNING LOW LEVEL

The command line options of kemp.bin are:

    kemp.bin [OPTION...] <matrix name>

    Help Options:
      -?, --help                Show help options

    Application Options:
      --save-temp               Save per thread temporary results (for debug)
      -c, --cutoff-pars=file    Parameters of cutoff function
      -n, --nprocs=N            Number of processors

And for basu.bin:

    basu.bin [OPTION...] <matrix name>

    Help Options:
      -?, --help                Show help options

    Application Options:
      --save-temp               Save per thread temporary results (for debug)
      -n, --nprocs=N            Number of processors

The input matrix is formatted as:

    M+1 k
    S1  S2  ... Sk
    X11 X12 ... X1k
    ... ... ... ...
    XM1 XM2 ... XMk

M+1 is the number of rows that follow the first line (one row of library sums plus M tag rows), and k is the number of columns. For example, the content of the file examples/test4.mat is:

    6 3
    5929 7460 592
    144 221 14
    397 404 40
    200 250 20
    2000 2500 200
    20 100 2

The output of kemp.bin is stored in the file <filename>-kemp. It contains only the program's results, in the same tag order as the input file:

    pvalue      alpha          score    category
    0.157513    0.00155825     0        U
    0.008553    0.000720078    0        U
    0.996594    0.00126509     0        U
    0.974522    0.000136022    0        U
    0           0.00467107     10       D

The output file of basu.bin is <filename>-basu. Its content is:

    evalue ev ie
    0.25438 5.4679e-05
    0.016736 5.8991e-05
    0.99991 3.268e-08
    0.99289 1.8689e-06
    0 6.2679e-05

KEMP CUTOFF FUNCTION PARAMETERS

Two sets of cutoff function parameters are provided. The file kemp.pars contains the values calculated for weights (a=4,b=1), whereas file kemp11.pars contains the values for weights (a=4,b=1). If no parameter file is given on the command line, Kemp will search for the file in the following locations:

    $HOME/.kempbasu/kemp.pars
    $PWD/kemp.pars
    /etc/kemp.pars
    /usr/local/etc/kemp.pars

KEMPBASU LIBRARY

All the code for calculating the significance levels is in the library kempbasu.so. Other programs can link against this library to call the Kemp and Basu functions directly. A binding to a scripting language can also be written.
However, the API still needs a cleanup and will change in the future. Documentation for the KempBasu API will be provided after this refactoring.
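
Until the library API is documented, programs that embed KempBasu have to go through the files described in the "RUNNING LOW LEVEL" section. The Ruby sketch below illustrates one way to convert the tab-separated table into the matrix format and to pair the low-level results with tag names again. It is not the wrapper script shipped with the package: the helper names table_to_matrix and attach_results are hypothetical, and it assumes that the second row of the table is labelled SUM and that matrix values may be separated by single spaces.

```ruby
# Hypothetical helpers, not part of KempBasu: a sketch of the conversion
# between the tab-separated table and the low-level matrix format.

# Converts the table text into [tag_names, matrix_text].
def table_to_matrix(table_text)
  rows = table_text.lines.map(&:chomp).reject(&:empty?).map { |l| l.split("\t") }
  header, sums, *tags = rows
  if header.nil? || sums.nil? || sums[0] != "SUM"
    raise ArgumentError, "expected a header row followed by a SUM row"
  end

  k = header.size - 1 # number of libraries (columns)
  body = [sums, *tags].map do |row|
    unless row.size == k + 1
      raise ArgumentError, "row '#{row[0]}' has #{row.size - 1} values, expected #{k}"
    end
    row.drop(1).join(" ") # assumption: single spaces are accepted as separators
  end

  # First line: number of rows that follow (M tag rows + 1 sum row), then k.
  matrix_text = (["#{body.size} #{k}"] + body).join("\n") + "\n"
  [tags.map(&:first), matrix_text]
end

# Pairs each tag name with the fields of its result row. The low-level
# output keeps the tag order of the input, so rows are matched by position.
def attach_results(tag_names, output_text)
  lines = output_text.lines.map(&:strip).reject(&:empty?)
  results = lines.drop(1).map(&:split) # skip the column-header line
  unless results.size == tag_names.size
    raise ArgumentError, "got #{results.size} result rows for #{tag_names.size} tags"
  end
  tag_names.zip(results).to_h
end

# Usage with the content of examples/example.dat:
table = "TEST\tT1\tT2\nSUM\t10000\t10000\ntag1\t1\t3\ntag2\t7\t21\ntag3\t10\t30\n"
names, matrix = table_to_matrix(table)
puts matrix
```

The matrix text can be written to a file and passed to kemp.bin or basu.bin. Once the program finishes, attach_results(names, File.read("<filename>-kemp")) gives the result fields for each tag.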