Frequently Asked Questions
Updated Feb 22, 2011 by hong.tang@gmail.com

Questions:

  • What versions of Hadoop does hadoop-gpl-compression support?
  • hadoop-gpl-compression supports Hadoop 0.20, 0.21, and 0.22 (the current trunk). Because the APIs diverged, we created branch-0.1 for Hadoop 0.20; the trunk of this project supports Hadoop 0.21 and 0.22.
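The version-to-branch mapping above can be written down as a small shell helper. This is only a sketch: the function name branch_for_hadoop is made up for illustration, and it assumes version strings of the form 0.20, 0.20.2, and so on.

```shell
# Hypothetical helper (not part of hadoop-gpl-compression): print the
# source location to check out for a given Hadoop version.
branch_for_hadoop() {
  case "$1" in
    0.20|0.20.*)             echo "branches/branch-0.1" ;;
    0.21|0.21.*|0.22|0.22.*) echo "trunk" ;;
    *) echo "unsupported Hadoop version: $1" >&2; return 1 ;;
  esac
}

branch_for_hadoop 0.20.2   # prints branches/branch-0.1
branch_for_hadoop 0.21.0   # prints trunk
```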

  • How do I build 32 bit and 64 bit binaries?
  • First, make sure you have checked out the correct code base (trunk for Hadoop 0.21/0.22, branches/branch-0.1 for Hadoop 0.20). Then compile twice, once per architecture, with the appropriate variables set each time:
    export JAVA_HOME=/path/to/32bit/jdk
    export CFLAGS=-m32
    export CXXFLAGS=-m32
    ant compile-native
    
    export JAVA_HOME=/path/to/64bit/jdk
    export CFLAGS=-m64
    export CXXFLAGS=-m64
    ant compile-native tar
    Note that you must have both 32-bit and 64-bit liblzo2 installed. This is what it looks like on my Red Hat build machine:
    % ls -l /usr/lib*/liblzo2*
    -rw-r--r--  1 root root 171056 Mar 20  2006 /usr/lib/liblzo2.a
    lrwxrwxrwx  1 root root     16 Feb 17  2007 /usr/lib/liblzo2.so -> liblzo2.so.2.0.0*
    lrwxrwxrwx  1 root root     16 Feb 17  2007 /usr/lib/liblzo2.so.2 -> liblzo2.so.2.0.0*
    -rwxr-xr-x  1 root root 129067 Mar 20  2006 /usr/lib/liblzo2.so.2.0.0*
    -rw-r--r--  1 root root 208494 Mar 20  2006 /usr/lib64/liblzo2.a
    lrwxrwxrwx  1 root root     16 Feb 17  2007 /usr/lib64/liblzo2.so -> liblzo2.so.2.0.0*
    lrwxrwxrwx  1 root root     16 Feb 17  2007 /usr/lib64/liblzo2.so.2 -> liblzo2.so.2.0.0*
    -rwxr-xr-x  1 root root 126572 Mar 20  2006 /usr/lib64/liblzo2.so.2.0.0*
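Before compiling, it can help to confirm that liblzo2 is present in both library directories. The following is a minimal sketch: check_lzo2 is a hypothetical helper, and the directories /usr/lib and /usr/lib64 come from the listing above and may differ on other systems.

```shell
# Hypothetical helper: for each directory given as an argument, look for
# a liblzo2 shared or static library. Prints "missing: DIR" for every
# directory without one and returns non-zero if any directory is missing it.
check_lzo2() {
  lzo_status=0
  for lzo_dir in "$@"; do
    lzo_found=no
    # An unmatched glob stays literal in sh/bash, so the -e test simply fails.
    for lzo_file in "$lzo_dir"/liblzo2.so* "$lzo_dir"/liblzo2.a; do
      if [ -e "$lzo_file" ]; then
        lzo_found=yes
        break
      fi
    done
    if [ "$lzo_found" = no ]; then
      echo "missing: $lzo_dir"
      lzo_status=1
    fi
  done
  return $lzo_status
}

# Example call (directories are an assumption about the build machine):
#   check_lzo2 /usr/lib /usr/lib64 && echo "both liblzo2 variants found"
```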
  • How do I configure Hadoop to use these classes?
  • Generally, using these classes is no different from using classes from any third-party jar: (1) make sure the jar file is on the class path; (2) make sure the directories containing the dynamic libraries it depends on are listed in the system property java.library.path; (3) use the classes provided by the jar file. There are various ways to do this. The approach I took is to place the jar file and native libraries in the right locations and let the hadoop script take care of (1) and (2):
    # Build the jar file and native library:
    cd /path/to/hadoop-gpl-compression
    ant compile-native tar
    # Copy the jar file
    cp build/hadoop-gpl-compression-0.1.0-dev/hadoop-gpl-compression-0.1.0-dev.jar /path/to/hadoop/dist/lib/
    # Copy the native library
    tar -cBf - -C build/hadoop-gpl-compression-0.1.0-dev/lib/native . | tar -xBvf - -C /path/to/hadoop/dist/lib/native
    You also need to register the external codecs with the codec factory by adding entries to the Hadoop configuration file. Add the following properties to hadoop-site.xml (or core-site.xml):
      <property>
        <name>io.compression.codecs</name>
        <value>org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.DefaultCodec,com.hadoop.compression.lzo.LzoCodec,com.hadoop.compression.lzo.LzopCodec,org.apache.hadoop.io.compress.BZip2Codec</value>
      </property>
      <property>
        <name>io.compression.codec.lzo.class</name>
        <value>com.hadoop.compression.lzo.LzoCodec</value>
      </property>
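A typo in the comma-separated io.compression.codecs value is easy to miss. As a sketch, the check below tests whether a codec list contains a given class name. has_codec is a hypothetical helper that works on a plain string; it does not read the Hadoop configuration files.

```shell
# Hypothetical helper: succeed if the comma-separated list in $1
# contains exactly the class name given in $2.
has_codec() {
  case ",$1," in
    *",$2,"*) return 0 ;;
    *)        return 1 ;;
  esac
}

# The value from the io.compression.codecs property above.
codecs="org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.DefaultCodec,com.hadoop.compression.lzo.LzoCodec,com.hadoop.compression.lzo.LzopCodec,org.apache.hadoop.io.compress.BZip2Codec"

for c in com.hadoop.compression.lzo.LzoCodec com.hadoop.compression.lzo.LzopCodec; do
  if has_codec "$codecs" "$c"; then
    echo "registered: $c"
  else
    echo "NOT registered: $c"
  fi
done
```

Wrapping the list in commas before matching means a class name only matches a whole entry, never a prefix of a longer name.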
    If you would like to use LZO to compress intermediate map output, set the following in hadoop-site.xml:
      <property>
        <name>mapred.compress.map.output</name>
        <value>true</value>
      </property>
      <property>
        <name>mapred.map.output.compression.codec</name>
        <value>com.hadoop.compression.lzo.LzoCodec</value>
      </property>
    Or if you are using Hadoop 0.21 or later, set the following in mapred-site.xml:
      <property>
        <name>mapreduce.map.output.compress</name>
        <value>true</value>
      </property>
      <property>
        <name>mapreduce.map.output.compress.codec</name>
        <value>com.hadoop.compression.lzo.LzoCodec</value>
      </property>
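Because the property names changed between releases, a wrapper script has to pick the right pair. The sketch below prints them as -D name=value arguments. map_output_lzo_opts is a hypothetical function, it assumes a version string of the form 0.N or 0.N.M, and whether -D options are honored depends on how the job parses its command line.

```shell
# Hypothetical helper: print the map-output compression settings for a
# given Hadoop version as "-D name=value" arguments.
map_output_lzo_opts() {
  minor=${1#0.}        # "0.20.2" -> "20.2"
  minor=${minor%%.*}   # "20.2"   -> "20"
  codec=com.hadoop.compression.lzo.LzoCodec
  if [ "$minor" -ge 21 ]; then
    # Hadoop 0.21 and later: new property names
    echo "-D mapreduce.map.output.compress=true -D mapreduce.map.output.compress.codec=$codec"
  else
    # Hadoop 0.20: old property names
    echo "-D mapred.compress.map.output=true -D mapred.map.output.compression.codec=$codec"
  fi
}

map_output_lzo_opts 0.20.2
map_output_lzo_opts 0.21
```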
  • How do I build hadoop-gpl-compression on Mac OS X 10.5 (Leopard)? (Note: some instructions below are based on http://wiki.apache.org/hadoop/UsingLzoCompression.)
  • If you choose to build lzo2 manually:
    • Download the lzo2 source from http://www.oberhumer.com/opensource/lzo/download/.
    • Unpack the source tarball, then configure, build, and install lzo2 with the following commands:
        tar -xzf lzo-2.03.tar.gz
        cd lzo-2.03
        env CFLAGS="-arch x86_64" ./configure --build=x86_64-darwin --enable-shared --disable-asm --prefix=/path/to/lzo64/
        make; make install
    If you would rather use MacPorts, do the following as root:
      port fetch lzo2 # if lzo2 is already installed, run "port uninstall lzo2" first
      port edit lzo2 # the Portfile for lzo2 will be opened in your $EDITOR.
    ## Add the following block of text to the file and save it. ##
    variant x86_64 description "Build the 64-bit." {
        configure.args-delete     --build=x86-apple-darwin ABI=standard
        configure.cflags-delete   -m32
        configure.cxxflags-delete -m32
    
        configure.args-append     --build=x86_64-apple-darwin ABI=64
        configure.cflags-append   -m64 -arch x86_64
        configure.cxxflags-append -m64 -arch x86_64
    }
    ## END ##
      port install lzo2 +x86_64
    The 64-bit lzo2 library is now installed under /opt/local/lib.
    • Finally, build the hadoop-gpl-compression library with the following:
        env JAVA_HOME=/System/Library/Frameworks/JavaVM.framework/Versions/1.6/Home/ \
        C_INCLUDE_PATH=/path/to/lzo64/include LIBRARY_PATH=/path/to/lzo64/lib \
        CFLAGS="-arch x86_64" ant clean compile-native test tar
    In the above, replace /path/to/lzo64 with /opt/local if you installed lzo2 through MacPorts. With a bit of luck, you should see BUILD SUCCESSFUL at the end. Congratulations, you can now use LZO compression in your Java programs on Mac OS X 10.5 (Leopard)!
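The prefix substitution described above can be captured in a small helper that prints the two path variables for a given lzo2 install prefix. This is an illustrative sketch; lzo_build_env is a hypothetical name and is not part of the project.

```shell
# Hypothetical helper: print the include and library path settings for
# the lzo2 installation under the prefix given in $1.
lzo_build_env() {
  prefix=${1%/}   # drop one trailing slash, if any
  echo "C_INCLUDE_PATH=$prefix/include LIBRARY_PATH=$prefix/lib"
}

lzo_build_env /opt/local
# prints: C_INCLUDE_PATH=/opt/local/include LIBRARY_PATH=/opt/local/lib
```

Provided the prefix contains no spaces, the output can be passed straight to env, for example: env $(lzo_build_env /opt/local) CFLAGS="-arch x86_64" ant clean compile-native test tar.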
