Setup to execute Apache Spark in Cloudera


Before proceeding with the steps below, install the Cloudera QuickStart VM and run it in VMware Player. The Cloudera VM already includes Hadoop, Spark, Java, Eclipse, and all the required libraries.

Please refer to my previous post to set up Hadoop: 

1. Create a Maven project in Eclipse.


2. Edit pom.xml to add the required Spark dependencies.

<dependencies>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.10</artifactId>
    <version>1.2.0</version>
  </dependency>
</dependencies>

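<build>
  <plugins>
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-compiler-plugin</artifactId>
      <version>3.1</version>
      <configuration>
        <!-- Assumption: compile for Java 7; the plugin's default level (1.5) is too old for typical Spark code -->
        <source>1.7</source>
        <target>1.7</target>
      </configuration>
    </plugin>
  </plugins>
</build>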

3. Add all the Spark dependency libraries from /usr/lib/spark/lib to the project (for example, through the Eclipse build path).

4. Write the Java code using the Spark library classes,
    e.g. a WordCount program.
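A minimal sketch of such a WordCount class for the Spark 1.2 Java API (Java 7 style, anonymous function classes) is shown below. The package and class name match the mum.edu.SparkWordCount.App used with spark-submit in step 8; the whitespace tokenization and the filter for empty tokens are assumptions of this sketch, not necessarily the author's original code. The first program argument is the input path and the second the output directory; the master URL is deliberately not set in code so that spark-submit's --master option decides it.

```java
package mum.edu.SparkWordCount;

import java.util.Arrays;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.FlatMapFunction;
import org.apache.spark.api.java.function.Function;
import org.apache.spark.api.java.function.Function2;
import org.apache.spark.api.java.function.PairFunction;

import scala.Tuple2;

public class App {
    public static void main(String[] args) {
        if (args.length < 2) {
            System.err.println("Usage: App <input path> <output dir>");
            System.exit(1);
        }

        // No setMaster() here: the master URL (e.g. local[2]) comes from spark-submit --master.
        SparkConf conf = new SparkConf().setAppName("SparkWordCount");
        JavaSparkContext sc = new JavaSparkContext(conf);

        // One RDD element per line of the input file.
        JavaRDD<String> lines = sc.textFile(args[0]);

        // Split lines on whitespace; in Spark 1.x, FlatMapFunction.call returns an Iterable.
        JavaRDD<String> words = lines.flatMap(new FlatMapFunction<String, String>() {
            @Override
            public Iterable<String> call(String line) {
                return Arrays.asList(line.trim().split("\\s+"));
            }
        }).filter(new Function<String, Boolean>() {
            @Override
            public Boolean call(String word) {
                return !word.isEmpty(); // blank lines would otherwise produce ""
            }
        });

        // Pair every word with a count of 1 ...
        JavaPairRDD<String, Integer> ones = words.mapToPair(new PairFunction<String, String, Integer>() {
            @Override
            public Tuple2<String, Integer> call(String word) {
                return new Tuple2<String, Integer>(word, 1);
            }
        });

        // ... and add up the counts for each distinct word.
        JavaPairRDD<String, Integer> counts = ones.reduceByKey(new Function2<Integer, Integer, Integer>() {
            @Override
            public Integer call(Integer a, Integer b) {
                return a + b;
            }
        });

        // Writes one "(word,count)" line per pair into part-NNNNN files; the directory must not exist yet.
        counts.saveAsTextFile(args[1]);
        sc.stop();
    }
}
```

When launched through spark-submit on the QuickStart VM, relative paths such as inputHDFS.txt are typically resolved against HDFS (your /user/cloudera home directory), which is why the job in step 8 can be given plain file names.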

5. Build the Maven project in a Linux terminal
>> mvn package
  • Open a terminal and change to the root of the Maven project with "cd /<path to the project root>".
  • Run "mvn package". Maven must be on the system PATH; otherwise, the mvn command will not be recognized. Refer to the Maven documentation on how to set up Maven properly.
  • Maven compiles the Java sources and packages them as /<path to the project root>/target/SparkWordCount-0.0.1-SNAPSHOT.jar

6. Create an input.txt file and copy it into HDFS with the Hadoop shell (it is stored as inputHDFS.txt in your HDFS home directory)
    >> hadoop fs -put input.txt inputHDFS.txt

7. Verify that inputHDFS.txt now appears in your HDFS home directory
    >> hadoop fs -ls

8. Run the Spark job with spark-submit
    >> spark-submit --class <project package name>.<project main class> --master local[2] <.jar file path> <input file> <output directory>

    >> spark-submit --class mum.edu.SparkWordCount.App --master local[2] target/SparkWordCount-0.0.1-SNAPSHOT.jar inputHDFS.txt output

    The output directory must not already exist; remove it with "hadoop fs -rm -r output" before running the job again.

9. View the output in Hue (open the output folder)
    http://quickstart.cloudera:8888/filebrowser/view/user/cloudera

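The results are written as part-00000 (and, depending on the number of partitions, further part-NNNNN) files inside the output directory, one (word,count) pair per line.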
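To check the results outside Hue, one option is to copy the output directory to the local disk (for example with "hadoop fs -get output output") and merge all part files into a single table. The sketch below assumes each line has the (word,count) form that scala.Tuple2.toString() produces; the class name ReadWordCounts and its methods are hypothetical helpers, not part of Spark or Hadoop.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;
import java.util.TreeMap;

public class ReadWordCounts {

    // Parses one line such as "(spark,3)" and adds its count to the map.
    // Splits on the LAST comma, so a word that itself contains a comma still parses.
    public static void addLine(String line, Map<String, Integer> counts) {
        String s = line.trim();
        if (s.length() < 2 || s.charAt(0) != '(' || s.charAt(s.length() - 1) != ')') {
            throw new IllegalArgumentException("Unexpected line: " + line);
        }
        String body = s.substring(1, s.length() - 1);
        int comma = body.lastIndexOf(',');
        if (comma < 0) {
            throw new IllegalArgumentException("No count in line: " + line);
        }
        String word = body.substring(0, comma);
        int n = Integer.parseInt(body.substring(comma + 1));
        Integer previous = counts.get(word);
        counts.put(word, previous == null ? n : previous + n);
    }

    // Reads every part-* file in a local copy of the output directory (skips _SUCCESS and .crc files).
    public static Map<String, Integer> readOutputDir(Path dir) throws IOException {
        Map<String, Integer> counts = new TreeMap<String, Integer>();
        try (DirectoryStream<Path> parts = Files.newDirectoryStream(dir, "part-*")) {
            for (Path part : parts) {
                for (String line : Files.readAllLines(part, StandardCharsets.UTF_8)) {
                    if (!line.isEmpty()) {
                        addLine(line, counts);
                    }
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Paths.get(args.length > 0 ? args[0] : "output");
        for (Map.Entry<String, Integer> e : readOutputDir(dir).entrySet()) {
            System.out.println(e.getKey() + "\t" + e.getValue());
        }
    }
}
```

A TreeMap is used so the merged counts print in alphabetical order regardless of how the words were spread across the part files.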

