3

I have started experimenting with Hadoop. It is set up and running properly as a single-node (standalone) cluster. I am trying to run the sample job from the tutorial at http://hadoop.apache.org/common/docs/r0.18.3/mapred_tutorial.html

So far, the program compiles, the jar has been created, and the manifest was added successfully. But when I try to run the job, I get this error:

Exception in thread "main" java.lang.ClassNotFoundException: org.myorg.WordCount
at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:247)
at org.apache.hadoop.util.RunJar.main(RunJar.java:149)

I copied the program exactly as it appears in the tutorial, but it still gives this error. This is the command I ran:

[shantanu@shades1ld1 hadoop]$ bin/hadoop jar /home/shantanu/hadoop/src/examples/wordcount.jar org.myorg.WordCount /tmp/Hadoop_Jobs/ /tmp/Hadoop_Results

I have gone through numerous articles but couldn't find an explanation for this. Please help.

2 Answers

11

I found that I needed to add this line to the sample app so that Hadoop knows which jar my class files are in.

diff --git a/src/org/myorg/WordCount.java b/src/org/myorg/WordCount.java
index 912311a..8cc1b93 100644
--- a/src/org/myorg/WordCount.java
+++ b/src/org/myorg/WordCount.java
@@ -43,7 +43,8 @@ public class WordCount {
  public static void main(String[] args) throws Exception {
     Configuration conf = new Configuration();

     Job job = new Job(conf, "wordcount");
+    job.setJarByClass(WordCount.class);

     job.setOutputKeyClass(Text.class);
     job.setOutputValueClass(IntWritable.class);

I'm not sure whether this is new in Hadoop, but setJarByClass tells Hadoop to use the entire jar that contains the given class. The jar must still be on your classpath. This is the command that I ran:

hadoop jar wordcount.jar org.myorg.WordCount /usr/$USER/wordcount/input /usr/$USER/wordcount/output

And I'd previously copied the sample files into HDFS using these commands:

hadoop dfs -copyFromLocal input/file01 /usr/$USER/wordcount/input/file01 
hadoop dfs -copyFromLocal input/file02 /usr/$USER/wordcount/input/file02 
hadoop dfs -ls /usr/$USER/wordcount/input

where input/file01:

Hello World Bye World

and input/file02:

Hello Hadoop Goodbye Hadoop

I put up a github repo with instructions on what I was able to get working.
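
To illustrate the idea behind setJarByClass (finding the jar that a given class came from), here is a minimal sketch that uses only the JDK. It is not Hadoop's own lookup, which may be implemented differently; the class name WhereLoadedFrom and the method locationOf are hypothetical names chosen for this example.

```java
import java.net.URL;
import java.security.CodeSource;

class WhereLoadedFrom {
    // Returns the jar or directory a class was loaded from,
    // or null when the JVM does not record a code source for it.
    static URL locationOf(Class<?> cls) {
        CodeSource src = cls.getProtectionDomain().getCodeSource();
        return src == null ? null : src.getLocation();
    }

    public static void main(String[] args) {
        // Prints the location this class itself was loaded from.
        System.out.println(locationOf(WhereLoadedFrom.class));
    }
}
```

When a class like this is packaged into a jar and run from it, the printed location would typically be that jar, which is the kind of answer Hadoop needs in order to know which jar holds the job classes.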

2 Comments

Thanks. It would be worth updating the Hadoop wiki at wiki.apache.org/hadoop/WordCount. I trust that Google will lead the next person to both the wiki link and this answer.
At 4:30 am, you saved me :D
7

Are you sure that wordcount.jar contains the org.myorg.WordCount class?

Didn't you modify the package name?
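
One way to check is to look inside the jar for the entry that matches the class name. The following is a minimal sketch using only the JDK's java.util.jar API; the class name JarClassCheck and its methods are hypothetical names for this example.

```java
import java.io.IOException;
import java.util.jar.JarFile;

class JarClassCheck {
    // Converts a fully qualified class name to the entry path used inside a jar,
    // for example org.myorg.WordCount becomes org/myorg/WordCount.class.
    static String entryName(String className) {
        return className.replace('.', '/') + ".class";
    }

    // Returns true if the jar at jarPath has an entry for className.
    static boolean containsClass(String jarPath, String className) throws IOException {
        try (JarFile jar = new JarFile(jarPath)) {
            return jar.getJarEntry(entryName(className)) != null;
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java JarClassCheck <jar> <class>");
            System.exit(2);
        }
        boolean found = containsClass(args[0], args[1]);
        System.out.println((found ? "found " : "missing ") + entryName(args[1]));
    }
}
```

It could be run as: java JarClassCheck wordcount.jar org.myorg.WordCount. A class declared in package org.myorg has to be stored in the jar as org/myorg/WordCount.class. If the jar only holds WordCount.class at its root, one possible cause is that the class files were not compiled into the package directory structure.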

5 Comments

Can you please elaborate? I am new to Java too. As an update, I removed the package org.myorg; statement, so now it is a single class with no package. It still gives me Exception in thread "main" java.lang.ClassNotFoundException: WordCount. Help!
That's the problem. You should either keep the package name or refer to the class without it on the command line.
I have removed org.myorg.WordCount and made it just WordCount.
I deleted the whole setup and made a new one, and suddenly everything started working. No changes were made except that I had not entered hadoop.tmp.dir in core-site.xml. I think that was the problem. What do you think? And thanks for your help.
I had a similar problem, caused by not following the directions carefully: specifically, I decided not to use the -d option. Doh!
