How to write Hadoop MR programs using IntelliJ
We will use the Cloudera distribution of Hadoop and Maven to define all the dependencies.
Though I am using IntelliJ, you could use Eclipse and create a similar project.
1. Create a new project in IntelliJ
You can use a Maven archetype to create this project. Use the following archetype:
archetypeGroupId=org.apache.maven.archetypes
archetypeArtifactId=maven-archetype-quickstart
Use at least Java 7. This will generate a Maven project with a pom.xml.
2. Add the Cloudera repo to the pom.xml
<repositories>
  <repository>
    <id>cloudera-releases</id>
    <url>https://repository.cloudera.com/artifactory/cloudera-repos</url>
    <releases>
      <enabled>true</enabled>
    </releases>
    <snapshots>
      <enabled>false</enabled>
    </snapshots>
  </repository>
</repositories>
3. Add the Hadoop dependency for writing a client project (inside the <dependencies> section of the pom.xml)
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-client</artifactId>
  <version>${hadoop.version}</version>
</dependency>
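At the time of writing, the latest Cloudera Hadoop client version is 2.6.0-mr1-cdh5.5.0-SNAPSHOT; this is the value the hadoop.version property in your pom.xml should resolve to. Note that the Cloudera repository entry in step 2 has snapshots disabled, so a -SNAPSHOT version will only resolve from it if you enable snapshots there; otherwise use a released version.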
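The dependency above refers to ${hadoop.version}, which Maven can only resolve if the property is defined somewhere in the POM. A minimal sketch of one way to do that, assuming the property lives in the project's own <properties> section and using the version string quoted in this article (swap in whichever version you actually need):

```xml
<!-- Sketch only: place inside the top-level <project> element of pom.xml. -->
<properties>
  <!-- Value quoted in this article; replace it with the version you need. -->
  <hadoop.version>2.6.0-mr1-cdh5.5.0-SNAPSHOT</hadoop.version>
</properties>
```

Keeping the version in a property means any other Hadoop artifacts you add later can share the same ${hadoop.version} placeholder.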
You might also want to add the Maven Central repo:
https://repo1.maven.org/maven2
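If you do add Maven Central explicitly, the entry could look roughly like the sketch below. It assumes the entry sits next to the Cloudera one inside the existing <repositories> element, uses the HTTPS form of the URL (Maven Central is served over HTTPS), and mirrors the release and snapshot settings of the Cloudera entry; the id "central" is the name Maven conventionally uses for this repository.

```xml
<!-- Sketch only: goes inside the existing <repositories> element of pom.xml. -->
<repository>
  <!-- "central" is Maven's conventional id for this repository. -->
  <id>central</id>
  <url>https://repo1.maven.org/maven2</url>
  <releases>
    <enabled>true</enabled>
  </releases>
  <snapshots>
    <enabled>false</enabled>
  </snapshots>
</repository>
```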
The project should now build, and all the dependencies should be downloaded.
Now you are good to go. Write all the MapReduce programs you know :)