Monday, June 29, 2015

Hadoop with Cloudera + Maven project in IntelliJ

How to write Hadoop MR programs using IntelliJ

We will use the Cloudera distribution of Hadoop and let Maven manage all the dependencies.

Though I am using IntelliJ, you could use Eclipse and create a similar project.

1. Create a new project in IntelliJ

You can use a Maven archetype to create this project. Use the following archetype:
archetypeGroupId=org.apache.maven.archetypes
archetypeArtifactId=maven-archetype-quickstart

Use at least Java 7. This will generate a Maven project with a pom.xml.
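
To pin the Java level mentioned above, one option is to configure the compiler plugin in the generated pom.xml. This is only a sketch: the plugin version 3.3 and the 1.7 source/target values are my assumptions, not something the archetype sets up for you.

```xml
<build>
  <plugins>
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-compiler-plugin</artifactId>
      <!-- assumed version; use whichever release fits your Maven setup -->
      <version>3.3</version>
      <configuration>
        <!-- compile for Java 7, the minimum suggested in this post -->
        <source>1.7</source>
        <target>1.7</target>
      </configuration>
    </plugin>
  </plugins>
</build>
```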

2. Add the Cloudera repo to the pom.xml


<repositories>
  <repository>
    <id>cloudera-releases</id>
    <url>https://repository.cloudera.com/artifactory/cloudera-repos</url>
    <releases>
      <enabled>true</enabled>
    </releases>
    <snapshots>
      <enabled>false</enabled>
    </snapshots>
  </repository>
</repositories>


3. Add Hadoop dependencies for writing a client project


<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-client</artifactId>
  <version>${hadoop.version}</version>
</dependency>
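
The dependency refers to ${hadoop.version}, so that property has to be defined somewhere in the pom.xml. A minimal sketch, assuming you want the Cloudera version mentioned in this post. Keep in mind that Maven resolves SNAPSHOT versions only from repositories that have snapshots enabled, so with the repository settings from step 2 you would probably need to substitute a release version.

```xml
<!-- merge into the existing <properties> element if the pom.xml already has one -->
<properties>
  <!-- value taken from this post; replace it with the version that matches your cluster -->
  <hadoop.version>2.6.0-mr1-cdh5.5.0-SNAPSHOT</hadoop.version>
</properties>
```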


At the time of writing, the latest Cloudera Hadoop client version is 2.6.0-mr1-cdh5.5.0-SNAPSHOT.

You might also want to add the Maven Central repo:
http://repo1.maven.org/maven2
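
Maven already consults Maven Central by default, so an explicit entry is optional. If you do want to declare it, the entry could look roughly like this. The id central is the conventional one, and I use https in the sketch because Maven Central no longer accepts plain http requests.

```xml
<!-- goes inside the existing <repositories> element, next to the Cloudera entry -->
<repository>
  <id>central</id>
  <url>https://repo1.maven.org/maven2</url>
  <releases>
    <enabled>true</enabled>
  </releases>
  <snapshots>
    <enabled>false</enabled>
  </snapshots>
</repository>
```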

4. Build the project

The project should build (for example, with mvn clean package), and Maven should download all the dependencies.
Now you are good to go. Write all the MapReduce programs you know :)
