Set Up a MapReduce Program in Cloudera


1. Download the Cloudera VMware image from the URL below.

    The downloaded Cloudera VMware image includes:

  • CentOS
  • Hadoop setup (with all libraries)
  • Java 
  • Eclipse IDE

2. Download VMware Player to open the downloaded Cloudera virtual machine (use the URL below to download VMware Player):
                               https://www.vmware.com/products/player

3. Open Eclipse, which is located on the desktop of the CentOS VM (the CentOS image is loaded by VMware Player from the Cloudera virtual machine).

4. Create a new Java project in Eclipse (e.g.: MapReduce).

5. Add all the Hadoop dependency libraries to the project from the location below. (You can add them through the Java Build Path: right-click the project, open Java Build Path, and add the external JARs in the Libraries tab.)
               Library Path:    /usr/lib/hadoop/client

6. Create a package for each problem (a MapReduce co-occurrence problem can be solved by three methods: the pairs, stripes, and hybrid approaches).

    e.g.: package names com.mapreduce.pair, com.mapreduce.stripe, com.mapreduce.hybrid
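The difference between the pairs and stripes approaches can be illustrated in plain Java, without any Hadoop boilerplate. This is a sketch only, assuming the problem is word co-occurrence counting (the usual setting for these approaches); the class and method names below are made up for illustration and are not part of the Hadoop API. A stripes mapper emits one associative array ("stripe") per word instead of one key-value pair per co-occurrence, which greatly reduces the volume of intermediate data:

```java
import java.util.*;

// Sketch of the stripes approach (plain Java, Hadoop boilerplate omitted).
// For each word w in a line, the mapper emits one "stripe" mapping every
// co-occurring word u to its count; the reducer merges stripes element-wise.
class StripesSketch {

    // What a stripes mapper would emit for one input line: word -> stripe.
    static Map<String, Map<String, Integer>> map(String line) {
        Map<String, Map<String, Integer>> stripes = new TreeMap<>();
        String[] words = line.trim().split("\\s+");
        for (int i = 0; i < words.length; i++) {
            Map<String, Integer> stripe =
                stripes.computeIfAbsent(words[i], k -> new TreeMap<>());
            for (int j = 0; j < words.length; j++) {
                if (i != j) {
                    // count one co-occurrence of words[j] next to words[i]
                    stripe.merge(words[j], 1, Integer::sum);
                }
            }
        }
        return stripes;
    }

    // What a stripes reducer would do: merge all stripes for one word.
    static Map<String, Integer> reduce(List<Map<String, Integer>> stripes) {
        Map<String, Integer> merged = new TreeMap<>();
        for (Map<String, Integer> s : stripes) {
            s.forEach((u, n) -> merged.merge(u, n, Integer::sum));
        }
        return merged;
    }

    public static void main(String[] args) {
        // prints {a={a=2, b=2}, b={a=2}}
        System.out.println(map("a b a"));
    }
}
```

The pairs approach would instead emit a separate ((w, u), 1) record for every co-occurrence; the hybrid approach combines the two.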

7. Create the MainClass, Mapper, Partitioner, and Reducer classes for each problem with an appropriate naming convention.
                e.g.: PairMain, PairMapper, PairPartitioner, PairReducer

8. Implement all the classes with the necessary extends and implements clauses (e.g. the mapper extends Hadoop's Mapper class and the reducer extends Reducer).
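As a rough sketch of the logic a PairMapper and PairReducer would implement, here is the pairs approach in plain Java, with the Hadoop boilerplate omitted. This assumes the problem is word co-occurrence counting; the class name PairsSketch and its methods are hypothetical, chosen only to mirror the map/reduce roles:

```java
import java.util.*;

// Sketch of the pairs approach (plain Java, Hadoop boilerplate omitted).
// The mapper emits ((w, u), 1) for every ordered pair of words that
// co-occur on a line; the reducer sums the counts for each pair key.
class PairsSketch {

    // What a pairs mapper would emit for one input line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        String[] words = line.trim().split("\\s+");
        for (int i = 0; i < words.length; i++) {
            for (int j = 0; j < words.length; j++) {
                if (i != j) {
                    // key is the pair "(w,u)", value is a count of 1
                    out.add(Map.entry("(" + words[i] + "," + words[j] + ")", 1));
                }
            }
        }
        return out;
    }

    // What a pairs reducer would do: sum all values for the same pair key.
    static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> emitted) {
        Map<String, Integer> totals = new TreeMap<>();
        for (Map.Entry<String, Integer> e : emitted) {
            totals.merge(e.getKey(), e.getValue(), Integer::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        // prints {(a,a)=2, (a,b)=2, (b,a)=2}
        System.out.println(reduce(map("a b a")));
    }
}
```

In the real Hadoop classes, the pair key would typically be a custom WritableComparable rather than a String, and the PairPartitioner would route all pairs with the same left word to the same reducer.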

9. Create a local input file with the given input (e.g.: input.txt).

10. Put the input file into HDFS from a terminal by using the Linux command below:
              >> hadoop fs -put <inputfile path> <HDFS file path>

              e.g.: >> hadoop fs -put input.txt hdfsinput.txt

Run the MapReduce program using Linux commands in the terminal

1. In Eclipse, right-click the project and export it as a JAR file with a specific name (e.g.: project.jar).

2. Check the HDFS input file created in step 10 above by using the Linux command:
             >> hadoop fs -ls

3. Execute the JAR file on the created HDFS input by using the Linux commands below:

         General syntax:
             >> hadoop jar <jarfile> <main class with package> <HDFS input file> <output directory>

             PairMain class execution:
             >> hadoop jar project.jar com.mapreduce.pair.PairMain hdfsinput.txt pairoutput

             StripeMain class execution:
             >> hadoop jar project.jar com.mapreduce.stripe.StripeMain hdfsinput.txt stripeoutput

             HybridMain class execution:
             >> hadoop jar project.jar com.mapreduce.hybrid.HybridMain hdfsinput.txt hybridoutput

4. We can view the corresponding output in the terminal by using the Linux command below (the output name is a directory; the results are in its part files):
           >> hadoop fs -cat pairoutput/part*

5. We can also see the job status, all the HDFS files, and the outputs in a browser GUI at the URLs below:
           http://localhost:8888 (Hue)
           http://localhost:50070 (NameNode web UI)

6. The running status of a deployed application's job can be seen in Hue.

7. HDFS files and output in Hue: once you see the output directory (e.g.: pairoutput), click it; inside there is a file named like part-r-00000, and clicking that file shows the output in the same browser.




8. The same HDFS files and output can be seen at http://localhost:50070 as well.
