Set Up a MapReduce Program in Cloudera
1. Download the Cloudera VM for VMware from the URL below.
The downloaded Cloudera VM includes:
- CentOS
- Hadoop setup (with all libraries)
- Java
- Eclipse IDE
2. Download VMware Player to open the downloaded Cloudera virtual machine (use the URL below to download VMware Player).
3. Open Eclipse, which is located on the desktop of the CentOS VM (the CentOS VM that VMware Player loads from the Cloudera image).
4. Create a new Java project in Eclipse (e.g. MapReduce).
5. Add all the Hadoop dependency libraries from the location below to the project's build path. In Eclipse, right-click the project, open Properties > Java Build Path, and on the Libraries tab click Add External JARs.
Library Path: /usr/lib/hadoop/client
6. Create a package for each approach. The MapReduce problem can be solved with three approaches: the pair, stripe, and hybrid approaches.
e.g. package names: com.mapreduce.pair, com.mapreduce.straip, com.mapreduce.hybrid
7. Create the main (driver), Mapper, Partitioner, and Reducer classes for each approach, following a consistent naming convention.
e.g. PairMain, PairMapper, PairPartitioner, PairReducer
8. Implement each class, adding the extends and implements clauses required by the Hadoop base classes and interfaces it builds on.
9. Create a local input file with the given input (e.g. input.txt).
10. Copy the input file into HDFS from the terminal with the following Linux command:
>> hadoop fs -put <input file path> <HDFS file path>
e.g. >> hadoop fs -put input.txt hdfsinput.txt
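The pair, stripe, and hybrid approaches from step 6 differ mainly in what the mapper emits. The sketch below is a plain-Java, in-memory illustration (no Hadoop classes, so it is not a drop-in PairMapper). It assumes the task is word co-occurrence counting and that a word's neighbours are the words that follow it on the same line; the class CooccurrenceSketch and its methods are hypothetical names. partitionFor applies the same formula as Hadoop's default HashPartitioner, but to the left word only, which is the usual job of a pair partitioner: every (word, *) pair then reaches the same reducer.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical in-memory sketch of pair vs. stripe map output (no Hadoop dependency). */
public class CooccurrenceSketch {

    // Split a line into whitespace-separated words; an empty line yields no words.
    static String[] tokenize(String line) {
        String trimmed = line.trim();
        return trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
    }

    // Pair approach: one ("word,neighbour", 1) record per co-occurrence.
    // Assumption: neighbours are the words that follow a word on the same line.
    static List<Map.Entry<String, Integer>> emitPairs(String line) {
        String[] words = tokenize(line);
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (int i = 0; i < words.length; i++) {
            for (int j = i + 1; j < words.length; j++) {
                out.add(new AbstractMap.SimpleEntry<>(words[i] + "," + words[j], 1));
            }
        }
        return out;
    }

    // Stripe approach: one (word, {neighbour -> count}) record per word that has neighbours.
    static List<Map.Entry<String, Map<String, Integer>>> emitStripes(String line) {
        String[] words = tokenize(line);
        List<Map.Entry<String, Map<String, Integer>>> out = new ArrayList<>();
        for (int i = 0; i < words.length; i++) {
            Map<String, Integer> stripe = new TreeMap<>();
            for (int j = i + 1; j < words.length; j++) {
                stripe.merge(words[j], 1, Integer::sum); // count each neighbour
            }
            if (!stripe.isEmpty()) {
                out.add(new AbstractMap.SimpleEntry<>(words[i], stripe));
            }
        }
        return out;
    }

    // Pair partitioner logic: hash only the left word so all (word, *) pairs share a reducer.
    static int partitionFor(String pairKey, int numReduceTasks) {
        int comma = pairKey.indexOf(',');
        String leftWord = comma < 0 ? pairKey : pairKey.substring(0, comma);
        return (leftWord.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        String line = "a b a c";
        System.out.println(emitPairs(line));
        System.out.println(emitStripes(line));
    }
}
```

For the line "a b a c", the pair approach emits six (pair, 1) records while the stripe approach emits three maps, one per word that has neighbours. One common formulation of the hybrid approach combines a pair-style mapper with a stripe-style reducer; it is not shown here.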
Run the MapReduce program from the terminal
1. In Eclipse, right-click the project and export it as a JAR file with a specific name (e.g. project.jar).
2. Execute the JAR file against the HDFS input created above with the Linux command below.
General syntax:
>>hadoop jar <jarfile> <Main class with package> <inputHDFS file> <output name>
PairMain class execution:
>>hadoop jar project.jar com/ranjith/pair/PairMain projectInput.txt pairoutput
StraipMain class execution:
>>hadoop jar project.jar com/ranjith/straip/StraipMain projectInput.txt straipoutput
HybridMain class execution:
>>hadoop jar project.jar com/ranjith/hybrid/HybridMain projectInput.txt hybridoutput
3. View the corresponding output in the terminal with the following Linux command (the output name is a directory; the results are in its part-r-* files):
>> hadoop fs -cat pairoutput/part-r-*
4. You can also view the job status, all HDFS files, and the outputs in a browser at the URLs below:
http://localhost:8888 (Hue)
http://localhost:50070 (HDFS NameNode web UI)
5. The running status of the deployed application job can be monitored in Hue.
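Each reducer writes its own file inside the output directory. Assuming the jobs use the newer org.apache.hadoop.mapreduce API and the default TextOutputFormat, these files are named part-r-00000, part-r-00001, and so on, and each line holds a key, a tab, and a value. The hypothetical OutputReaderSketch below (plain Java, no Hadoop classes) parses that format from standard input so the results can be post-processed locally; if a key repeats across files, the last value read wins.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: parse "key<TAB>value" lines written by TextOutputFormat. */
public class OutputReaderSketch {

    static Map<String, String> parse(Reader in) {
        Map<String, String> result = new LinkedHashMap<>(); // keeps file order
        try (BufferedReader reader = new BufferedReader(in)) {
            String line;
            while ((line = reader.readLine()) != null) {
                int tab = line.indexOf('\t');
                if (tab < 0) {
                    continue; // skip lines without the default tab separator
                }
                result.put(line.substring(0, tab), line.substring(tab + 1));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> output = parse(new InputStreamReader(System.in, StandardCharsets.UTF_8));
        output.forEach((key, value) -> System.out.println(key + " -> " + value));
    }
}
```

After compiling it with javac, the job output can be piped in, for example: >> hadoop fs -cat pairoutput/part-r-* | java OutputReaderSketch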