Here are the steps required:
1) Download the Nutch 0.9 source from the SVN repository.
2) Create a Java Project in MyEclipse
  a) Choose "Create project from existing source". (Eclipse will scan all folders that contain Java files and make them source folders.)
  b) Go to the third tab, "Libraries". Click the "Add Class Folder" button and check the conf folder.
  c) Go to the fourth tab, "Order and Export", and move the "conf" folder up to the first position.
3) Configure Nutch 
  a) In $NUTCH_HOME/conf/nutch-default.xml, change the value of the property "plugin.folders" to "./src/plugin".
  b) In the conf folder, do the following:
    * Rename crawl-urlfilter.txt.template to crawl-urlfilter.txt
    * Rename automaton-urlfilter.txt.template to automaton-urlfilter.txt
    * In crawl-urlfilter.txt, replace
        +^http://([a-z0-9]*\.)*MY.DOMAIN.NAME/
      with the domain you want to crawl, for example:
        +^http://([a-z0-9]*\.)*apache.org/
    * Create a urls folder in the project root and add a file urls.txt containing the seed URLs to crawl, one per line.
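To check which URLs an edited crawl-urlfilter.txt lets through, here is a minimal Java sketch of the rule logic. It assumes the semantics described in the header of the stock filter file: each non-comment line is a regular expression prefixed with + (accept) or - (reject), the first matching rule decides, and a URL that matches no rule is ignored. UrlFilterSketch, parse and accepts are hypothetical names; this is not Nutch's own filter class, and apache.org is only an example domain.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/**
 * Hypothetical, simplified model of crawl-urlfilter.txt evaluation.
 * NOT Nutch's implementation; for experimenting with rules only.
 */
public class UrlFilterSketch {

    /** One parsed rule: accept (+) or reject (-) plus its compiled regex. */
    static final class Rule {
        final boolean accept;
        final Pattern pattern;

        Rule(boolean accept, Pattern pattern) {
            this.accept = accept;
            this.pattern = pattern;
        }
    }

    /** Parses rule lines, skipping blank lines and '#' comments. */
    static List<Rule> parse(List<String> lines) {
        List<Rule> rules = new ArrayList<>();
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("#")) {
                continue;
            }
            char sign = line.charAt(0);
            if (sign != '+' && sign != '-') {
                throw new IllegalArgumentException("Rule must start with + or -: " + line);
            }
            rules.add(new Rule(sign == '+', Pattern.compile(line.substring(1))));
        }
        return rules;
    }

    /** The first rule whose regex is found in the URL decides; no match means reject. */
    static boolean accepts(List<Rule> rules, String url) {
        for (Rule rule : rules) {
            if (rule.pattern.matcher(url).find()) {
                return rule.accept;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<Rule> rules = parse(List.of(
                "# accept hosts in the example domain",
                "+^http://([a-z0-9]*\\.)*apache.org/",
                "# skip everything else",
                "-."));
        System.out.println(accepts(rules, "http://lucene.apache.org/nutch/")); // true
        System.out.println(accepts(rules, "http://www.example.com/"));         // false
    }
}
```

Because the first match wins, rule order matters: a broad "-." placed before the "+" line would reject every URL.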
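  c) Edit conf/nutch-site.xml and add the following properties inside its top-level configuration element. Fill in the empty values; http.agent.name in particular must not be empty: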
    <property>
      <name>http.agent.name</name>
      <value></value>
      <description>HTTP 'User-Agent' request header. MUST NOT be empty -
      please set this to a single word uniquely related to your organization.
      NOTE: You should also check other related properties:
        http.robots.agents
        http.agent.description
        http.agent.url
        http.agent.email
        http.agent.version
      and set their values appropriately.
      </description>
    </property>
    <property>
      <name>http.agent.description</name>
      <value></value>
      <description>Further description of our bot - this text is used in
      the User-Agent header. It appears in parenthesis after the agent name.
      </description>
    </property>
    <property>
      <name>http.agent.url</name>
      <value></value>
      <description>A URL to advertise in the User-Agent header. This will
      appear in parenthesis after the agent name. Custom dictates that this
      should be a URL of a page explaining the purpose and behavior of this
      crawler.
      </description>
    </property>
    <property>
      <name>http.agent.email</name>
      <value></value>
      <description>An email address to advertise in the HTTP 'From' request
      header and User-Agent header. A good practice is to mangle this
      address (e.g. 'info at example dot com') to avoid spamming.
      </description>
    </property>
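Since http.agent.name must not be empty, a small pre-flight check can catch a forgotten value before a crawl is launched. The sketch below uses the JDK's built-in DOM parser and assumes the Hadoop-style layout shown above (property elements with name and value children). AgentNameCheck and its methods are hypothetical helpers, not part of Nutch, and "MyTestCrawler" is just a placeholder agent name.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Hypothetical check (not part of Nutch) that http.agent.name is set. */
public class AgentNameCheck {

    /** Trimmed value of the named property, "" if its value is empty, or null if absent. */
    static String propertyValue(InputStream xml, String name) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(xml);
        NodeList props = doc.getElementsByTagName("property");
        for (int i = 0; i < props.getLength(); i++) {
            Element prop = (Element) props.item(i);
            if (name.equals(childText(prop, "name"))) {
                String value = childText(prop, "value");
                return value == null ? "" : value;
            }
        }
        return null;
    }

    /** Trimmed text of the first element with the given tag under parent, or null. */
    static String childText(Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent().trim();
    }

    /** True if http.agent.name is present and not blank in the given XML text. */
    static boolean agentNameConfigured(String xml) {
        try (InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8))) {
            String value = propertyValue(in, "http.agent.name");
            return value != null && !value.isEmpty();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse configuration XML", e);
        }
    }

    public static void main(String[] args) {
        String empty = "<configuration><property><name>http.agent.name</name>"
                + "<value></value></property></configuration>";
        String filled = "<configuration><property><name>http.agent.name</name>"
                + "<value>MyTestCrawler</value></property></configuration>";
        System.out.println(agentNameConfigured(empty));  // false
        System.out.println(agentNameConfigured(filled)); // true
    }
}
```

To check a real file, read conf/nutch-site.xml into a string (for example with java.nio.file.Files.readString on Java 11+) and pass it to agentNameConfigured.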
4) Create an Eclipse launcher
  a) From the menu, choose Run > "Run...".
  b) Create a "New" launch configuration of type "Java Application".
  c) Set the Main class to:
       org.apache.nutch.crawl.Crawl
  d) On the "Arguments" tab, set Program arguments to:
       urls -dir crawl -depth 3 -topN 50
  e) In VM arguments, enter:
       -Dhadoop.log.dir=logs -Dhadoop.log.file=hadoop.log
  f) Click "Run".
If everything is configured correctly, you should see Nutch start crawling.
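For reference, the same launch configuration can be expressed in code. The sketch below is a hypothetical CrawlLauncher class (not part of Nutch) that sets the same hadoop.log.* system properties as the VM arguments and passes the same program arguments to org.apache.nutch.crawl.Crawl, which it loads by reflection so the file compiles without Nutch on the classpath. It also works out an upper bound on the crawl size, assuming -topN caps the number of URLs fetched in each of the -depth rounds: 3 × 50 = 150 pages at most.

```java
import java.lang.reflect.Method;

/**
 * Hypothetical stand-alone equivalent of the Eclipse launcher (not part of Nutch).
 */
public class CrawlLauncher {

    static final String MAIN_CLASS = "org.apache.nutch.crawl.Crawl";

    /** Same values as "Program arguments" in the launch configuration. */
    static String[] programArgs() {
        return new String[] {"urls", "-dir", "crawl", "-depth", "3", "-topN", "50"};
    }

    /** Upper bound on fetched pages, assuming at most topN URLs per round for depth rounds. */
    static long maxPagesFetched(String[] args) {
        Long depth = null;
        Long topN = null;
        for (int i = 0; i + 1 < args.length; i++) {
            if ("-depth".equals(args[i])) {
                depth = Long.parseLong(args[i + 1]);
            } else if ("-topN".equals(args[i])) {
                topN = Long.parseLong(args[i + 1]);
            }
        }
        if (depth == null || topN == null) {
            throw new IllegalArgumentException("expects both -depth and -topN");
        }
        return depth * topN;
    }

    public static void main(String[] unused) throws Exception {
        // Equivalent of -Dhadoop.log.dir=logs -Dhadoop.log.file=hadoop.log.
        // Assumes logging is not initialized before these calls.
        System.setProperty("hadoop.log.dir", "logs");
        System.setProperty("hadoop.log.file", "hadoop.log");

        String[] args = programArgs();
        System.out.println("Running " + MAIN_CLASS + " " + String.join(" ", args));
        System.out.println("At most " + maxPagesFetched(args) + " pages will be fetched");

        try {
            Method crawlMain = Class.forName(MAIN_CLASS).getMethod("main", String[].class);
            // Cast to Object so the array is passed as the single String[] argument.
            crawlMain.invoke(null, (Object) args);
        } catch (ClassNotFoundException e) {
            System.err.println("Nutch is not on the classpath: " + e.getMessage());
        }
    }
}
```

Run it from the project root (the default working directory of an Eclipse launcher) with Nutch's classes, its jars and the conf folder on the classpath, so that the relative urls and crawl paths resolve the same way they do in the launcher.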

