Setting up a Hadoop cluster can be an all-day job. However, if you want to experiment with the platform right now, we have created a virtual machine image with a preconfigured single-node instance of Hadoop. While this doesn't have the power of a full cluster, it does allow you to use the resources on your local machine to explore the Hadoop platform and run simple MapReduce jobs, such as the labs referenced in Problem Solving on Large Scale Clusters.
The virtual machine image is designed to be used with the free VMware Player.
The image is packaged as a directory archive. To begin setup, unpack the archive into a directory of your choice (you need at least 10GB of free space; the disk image can grow to 20GB). The VMware image package contains:
image.vmx -- The VMware guest OS profile, a configuration file that describes the virtual machine characteristics (virtual CPU(s), amount of memory, etc.).
20GB.vmdk -- A VMware virtual disk used to store the content of the virtual machine hard disk; this file grows as you store data on the virtual image. It is configured to store up to 20GB.
The system image is based on Ubuntu (version 7.04) and contains a Java machine (Sun JRE 6 - DLJ License v1.1) and the latest Hadoop distribution (as of this writing, 0.13.0).
To start the VMware virtual machine, go to the directory where the packaged files were unpacked, and run:
vmplayer image.vmx
A new window will appear which will print a message indicating the IP address allocated to the guest OS. This is the IP address you will use to submit jobs from the command line or the Eclipse environment. The guest OS contains a running Hadoop infrastructure which is configured with:
The guest OS can be reached from the provided console or via SSH using the IP address indicated above. Log into the guest OS with:
guest, guest password: guest
root, administrator password: root
Once the image is loaded, you can log in with the guest account. Hadoop is installed in the guest home directory (/home/guest/hadoop). Three scripts are provided for Hadoop maintenance purposes:
start-hadoop -- Starts file-system and MapReduce daemons.
stop-hadoop -- Stops all Hadoop daemons.
reset-hadoop -- Restarts new Hadoop environment with entirely empty file system. Note: You must stop all daemons before you reset.
The Hadoop configuration can be edited by modifying the files in /home/guest/hadoop-conf/. For further information on this, go to the Hadoop Wiki.
To stop the Virtual Machine, log in as administrator and issue:
poweroff
To run MapReduce programs from the command line, log into the guest OS and use the Hadoop command-line tool to manipulate HDFS files and MapReduce jobs. A set of simple MapReduce programs is included with the Hadoop distribution in hadoop-examples.jar. For example, to run the MapReduce approximation of pi included with the example files, using four map tasks that each compute ten thousand samples, issue:
hadoop jar hadoop-examples.jar pi 4 10000
For more information, see the Examples section on the Hadoop Wiki. Alternatively, Hadoop jobs can be run directly from Eclipse; see the next section for more information.
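To make the two arguments of the pi example (number of map tasks, samples per task) more concrete, here is a minimal single-process sketch of the general idea in Python. It is not the code shipped in hadoop-examples.jar: the function names map_task, reduce_counts and estimate_pi are hypothetical, and plain pseudo-random sampling is an assumption made for illustration. Each "map task" counts how many random points in the unit square fall inside the quarter circle, and the "reduce" step sums those counts and scales the ratio by four.

```python
import random


def map_task(task_id, samples, seed=0):
    """One hypothetical map task: sample points in the unit square and
    count how many fall inside the quarter circle of radius 1."""
    # Give each task its own random stream so tasks do not repeat samples.
    rng = random.Random(seed * 1_000_003 + task_id)
    inside = 0
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    return inside, samples


def reduce_counts(partials):
    """Hypothetical reduce step: sum the per-task counts and convert the
    inside/total ratio into an estimate of pi."""
    inside = sum(p[0] for p in partials)
    total = sum(p[1] for p in partials)
    return 4.0 * inside / total


def estimate_pi(num_maps=4, samples_per_map=10000, seed=0):
    """Run all map tasks sequentially, then reduce their results."""
    partials = [map_task(i, samples_per_map, seed) for i in range(num_maps)]
    return reduce_counts(partials)


if __name__ == "__main__":
    # Mirrors the arguments "pi 4 10000" from the command above.
    print(estimate_pi(4, 10000))
```

On a real cluster the map tasks would run in parallel on different nodes; the sketch runs them one after another only to show how the work is divided and recombined.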
The IBM MapReduce Tools for Eclipse Plug-in is a robust plug-in that brings Hadoop support to the Eclipse platform. Features include server configuration, support for launching MapReduce jobs and browsing the distributed file system. This setup assumes that you are running Eclipse (version 3.3 or above) on your computer.
Window -> Open Perspective -> Other... -> Map/Reduce.
Window -> Show View -> Other... -> Map Reduce Tools -> Map Reduce Servers.
Once you have the plug-in working with Eclipse, you can add the new Hadoop server with these settings:
provided IP address (see above)
/home/guest/hadoop
guest
guest (when prompted)