Setup to execute Apache Spark in Cloudera
Before following the steps below, install Cloudera and run it in VMware Player. The Cloudera image already includes Hadoop, Spark, Java, Eclipse and all the required libraries.
Please refer to my previous post to set up Hadoop:
1. Create a Maven project in Eclipse
2. Edit the pom.xml to add the required Spark dependencies.
<dependencies>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.10</artifactId>
    <version>1.2.0</version>
  </dependency>
</dependencies>
<build>
  <plugins>
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-compiler-plugin</artifactId>
      <version>3.1</version>
    </plugin>
  </plugins>
</build>
3. Add all the Spark dependency libraries from /usr/lib/spark/lib to the project
4. Write the Java code using the Spark library classes
e.g. WordCount
5. Build the Maven project in a Linux terminal
>> mvn package
- Open the command line and move to the root of the Maven project with "cd /<path to the project root>"
- Execute the command "mvn package". Maven must be on the system path, otherwise the mvn command will not be recognized. Refer to the Maven documentation on how to set up Maven properly.
- Maven compiles the Java sources and saves the packaged jar in the target directory: /<path to the project root>/target/SparkWordCount-0.0.1-SNAPSHOT.jar
6. Create an input.txt and copy it to HDFS using the Hadoop commands
>> hadoop fs -put input.txt inputHDFS.txt
7. Verify that inputHDFS.txt was created in HDFS
>> hadoop fs -ls
8. Run the Spark project
>> spark-submit --class <project package name>.<project main class> --master local[2] <.jar file path> <input file> <output file>
>> spark-submit --class mum.edu.SparkWordCount.App --master local[2] target/SparkWordCount-0.0.1-SNAPSHOT.jar inputHDFS.txt output
9. View the output in Hue
http://quickstart.cloudera:8888/filebrowser/view/user/cloudera
The output will appear in the part-00000 file inside the output directory
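To make it easier to check the result, here is a minimal plain-Java sketch of what a WordCount job like the one in step 4 computes. It does not use the Spark API and is not the code of this project; the class name WordCountSketch and the sample lines are made up for illustration. The comments note which Spark operation would typically play each role. The "(word,count)" line format is an assumption: it is what you get when a pair RDD of Tuple2 values is written with saveAsTextFile, and your job may format its output differently.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper class: a local, Spark-free illustration of word counting.
class WordCountSketch {

    // Stage 1: split every line into words.
    // In a Spark job this role is typically played by JavaRDD.flatMap.
    public static List<String> splitIntoWords(List<String> lines) {
        List<String> words = new ArrayList<String>();
        for (String line : lines) {
            for (String word : line.trim().split("\\s+")) {
                if (!word.isEmpty()) { // skip the empty token produced by blank lines
                    words.add(word);
                }
            }
        }
        return words;
    }

    // Stages 2 and 3: count one per occurrence and sum the counts per word.
    // In a Spark job these are typically JavaRDD.mapToPair and JavaPairRDD.reduceByKey.
    // A TreeMap keeps the words sorted, so the printed order is deterministic.
    public static Map<String, Integer> countWords(List<String> words) {
        Map<String, Integer> counts = new TreeMap<String, Integer>();
        for (String word : words) {
            Integer current = counts.get(word);
            counts.put(word, current == null ? 1 : current + 1);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Sample input standing in for the contents of input.txt (assumed data).
        List<String> lines = Arrays.asList("to be or not to be", "that is the question");
        Map<String, Integer> counts = countWords(splitIntoWords(lines));
        // Print one "(word,count)" line per word, mimicking Tuple2's text form.
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            System.out.println("(" + entry.getKey() + "," + entry.getValue() + ")");
        }
    }
}
```

Running the sketch on a small sample and comparing it with part-00000 for the same input is a quick sanity check. Note that Spark does not guarantee any ordering of the words in its output, so compare the counts, not the line order.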