Google Code University

Hadoop Virtual Image Documentation

Setting up a Hadoop cluster can be an all-day job. If you want to experiment with the platform right now, however, we have created a virtual machine image with a preconfigured single-node instance of Hadoop. It doesn't have the power of a full cluster, but it lets you use the resources of your local machine to explore the Hadoop platform and run simple MapReduce jobs, such as the labs referenced in Problem Solving on Large Scale Clusters.

The virtual machine image is designed to be used with the free VMware Player.

Setting Up the Image

The image is packaged as a directory archive. To set it up, extract the archive into a directory of your choice (you need at least 10GB of free space; the disk image can grow to 20GB). The VMware image package contains:

  • image.vmx -- The VMware guest OS profile, a configuration file that describes the virtual machine characteristics (virtual CPU(s), amount of memory, etc.).
  • 20GB.vmdk -- A VMware virtual disk used to store the content of the virtual machine hard disk; this file grows as you store data on the virtual image. It is configured to store up to 20GB.
  • The archive also contains two other files, image.vmsd and nvram. These are not critical for running the image, and VMware Player creates them on startup.
  • As you run the virtual machine, log files (vmware-x.log) will be created.
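You can sanity-check the target directory before and after extracting the archive. The Python sketch below is not part of the image. The helper names `has_enough_space` and `missing_image_files` are hypothetical, and it treats "10GB" as 10 GiB. It checks the free-space requirement stated above and looks for the two files described as essential, image.vmx and 20GB.vmdk. It ignores image.vmsd, nvram and the log files, because VMware Player creates those.

```python
import shutil
from pathlib import Path

# Files the page describes as essential; image.vmsd, nvram and vmware-x.log
# are not checked because VMware Player creates them itself.
REQUIRED_FILES = ("image.vmx", "20GB.vmdk")


def has_enough_space(directory, min_free_gb=10):
    """True if the file system holding `directory` has at least min_free_gb GiB free."""
    free_bytes = shutil.disk_usage(directory).free
    return free_bytes >= min_free_gb * 1024**3


def missing_image_files(directory):
    """Return the essential image files that are not present in `directory`."""
    path = Path(directory)
    return [name for name in REQUIRED_FILES if not (path / name).is_file()]
```

Call `has_enough_space("/path/to/dir")` before extracting. Call `missing_image_files("/path/to/dir")` afterwards: an empty list means both essential files are in place.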

The system image is based on Ubuntu 7.04 and contains a Java runtime (Sun JRE 6, DLJ License v1.1) and the latest Hadoop distribution (0.13.0 at the time of writing).

To start the VMware virtual machine, go to the directory where you extracted the files and run:

    vmplayer image.vmx

A new window will appear and display the IP address allocated to the guest OS. You will use this IP address to submit jobs from the command line or from Eclipse. The guest OS comes with a running Hadoop installation configured with:

  • An HDFS file system (Hadoop's counterpart to GFS) using a single data node (no replication)
  • A single MapReduce worker

The guest OS can be reached from the provided console or via SSH using the IP address indicated above. Log into the guest OS with:

  • Guest account: user name guest, password guest
  • Administrator account: user name root, password root

Once the image is loaded, you can log in with the guest account. Hadoop is installed in the guest home directory (/home/guest/hadoop). Three scripts are provided for Hadoop maintenance:

  • start-hadoop -- Starts file-system and MapReduce daemons.
  • stop-hadoop -- Stops all Hadoop daemons.
  • reset-hadoop -- Restarts a new Hadoop environment with an entirely empty file system. Note: you must stop all daemons before you reset.
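You can also drive these scripts from the host instead of the guest console by wrapping them in `ssh` calls. The Python sketch below assumes the following:

* The scripts are on the guest account's PATH.
* `ssh` is installed on the host, which will prompt for the guest password.
* The helper names and the example IP address are hypothetical.

The reset plan follows the note above: `stop-hadoop` always runs before `reset-hadoop`.

```python
import subprocess

GUEST_USER = "guest"  # guest account from this page; ssh prompts for its password


def guest_command(ip, remote_cmd):
    """Build an ssh argument list that runs remote_cmd as the guest user."""
    return ["ssh", f"{GUEST_USER}@{ip}", remote_cmd]


def reset_plan(ip):
    """Commands to wipe HDFS safely: the daemons must be stopped before resetting."""
    return [guest_command(ip, "stop-hadoop"), guest_command(ip, "reset-hadoop")]


def run_plan(plan, dry_run=True):
    """Print each command (dry run) or execute it, stopping at the first failure."""
    for argv in plan:
        if dry_run:
            print(" ".join(argv))
        else:
            # check=True raises CalledProcessError on a non-zero exit status,
            # so reset-hadoop never runs if stop-hadoop failed.
            subprocess.run(argv, check=True)


# Hypothetical guest IP; use the address shown when the virtual machine starts.
run_plan(reset_plan("192.0.2.10"))
```

The default dry run only prints the commands. Pass `dry_run=False` to actually execute them against the guest.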

You can edit the Hadoop configuration by modifying the files in /home/guest/hadoop-conf/. For further information, see the Hadoop Wiki.
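Hadoop configuration files are XML documents made of `<property>` elements, each holding a `<name>` and a `<value>`. To see what the image currently sets, you can list every property in that directory with the Python sketch below. It assumes the files there end in `.xml` and follow that standard layout. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def read_hadoop_properties(xml_path):
    """Return {name: value} for every <property> element in a Hadoop-style XML file."""
    root = ET.parse(xml_path).getroot()  # normally a <configuration> element
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name:  # skip malformed entries without a <name>
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def dump_conf_dir(conf_dir="/home/guest/hadoop-conf"):
    """Print name = value for each property in each *.xml file of conf_dir."""
    for xml_file in sorted(Path(conf_dir).glob("*.xml")):
        for name, value in sorted(read_hadoop_properties(xml_file).items()):
            print(f"{xml_file.name}: {name} = {value}")
```

Running `dump_conf_dir()` inside the guest prints one line per property. This makes it easy to compare a setting before and after you edit a file.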

To stop the virtual machine, log in as administrator and run:

    poweroff

Running Jobs from the Command Line

To run MapReduce programs from the command line, log into the guest OS and use the hadoop command-line tool to manipulate HDFS files and MapReduce jobs. The Hadoop distribution includes a set of simple MapReduce programs in hadoop-examples.jar. For example, to run the MapReduce approximation of pi with four map tasks, each computing ten thousand samples, issue:

    hadoop jar hadoop-examples.jar pi 4 10000
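The pi example estimates pi by sampling, and a common way to do that is a Monte Carlo estimate:

* Each map task scatters points over a unit square and counts how many fall inside the inscribed circle.
* The reduce step adds up the counts.
* Because the fraction of points inside approaches pi/4, multiplying it by 4 gives the estimate.

The Python sketch below reproduces that map/reduce structure locally with four "map tasks" of 10,000 samples each. It illustrates the idea and is not Hadoop's example source. Depending on the version, that source may generate its points differently, for example with a quasi-random sequence.

```python
import random


def map_task(task_id, samples):
    """Return (inside, outside) counts for `samples` random points in the unit square."""
    rng = random.Random(task_id)  # per-task seed so each "map task" is reproducible
    inside = 0
    for _ in range(samples):
        # Center the square on the origin: x, y in [-0.5, 0.5).
        x, y = rng.random() - 0.5, rng.random() - 0.5
        if x * x + y * y <= 0.25:  # inside the circle of radius 0.5
            inside += 1
    return inside, samples - inside


def reduce_counts(pairs):
    """Combine per-task counts; circle area / square area = pi * 0.5**2 / 1 = pi / 4."""
    inside = sum(i for i, _ in pairs)
    total = sum(i + o for i, o in pairs)
    return 4.0 * inside / total


def estimate_pi(num_maps, samples_per_map):
    return reduce_counts([map_task(t, samples_per_map) for t in range(num_maps)])


# Mirrors `pi 4 10000`: 4 map tasks x 10,000 samples = 40,000 points in total.
print(estimate_pi(4, 10000))
```

Raising the number of maps or samples tightens the estimate. This is why the Hadoop job takes both values as arguments.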

For more information, see the Examples section on the Hadoop Wiki. Alternatively, you can run Hadoop jobs directly from Eclipse; see the next section for details.

Setting up the Eclipse Environment

The IBM MapReduce Tools for Eclipse plug-in brings Hadoop support to the Eclipse platform. Its features include server configuration, launching MapReduce jobs, and browsing the distributed file system. This setup assumes that you are running Eclipse 3.3 or later on your computer.

  • Download hadoop-eclipse-plugin.jar.
  • If Eclipse is open, close it before proceeding.
  • Place hadoop-eclipse-plugin.jar directly into the plugins/ directory of Eclipse.
  • Open Eclipse.
  • To use the MapReduce perspective, go to: Window -> Open Perspective -> Other... -> Map/Reduce.
  • To enable the MapReduce Servers view, go to: Window -> Show View -> Other... -> Map Reduce Tools -> Map Reduce Servers.

Once the plug-in is working in Eclipse, add the new Hadoop server as follows:

  • Enable the MapReduce Servers view.
  • Click the blue elephant icon in the top right.
  • In the "New Hadoop Server Location" window, complete the form with:
      • Hostname: the IP address displayed when the virtual machine started (see above)
      • Installation Directory: /home/guest/hadoop
      • Username: guest
      • Password: guest (when prompted)