org.archive.crawler.datamodel
Class RobotsExclusionPolicy

java.lang.Object
  extended by org.archive.crawler.datamodel.RobotsExclusionPolicy
All Implemented Interfaces:
java.io.Serializable

public class RobotsExclusionPolicy
extends java.lang.Object
implements java.io.Serializable

RobotsExclusionPolicy represents the actual policy adopted with respect to a specific remote server, usually built from the robots.txt file, if any, that the server provided. (The similarly named RobotsHonoringPolicy, by contrast, describes the strategy the crawler uses to decide how far it respects exclusion rules.) Expiring a policy once a suitable amount of time has elapsed since the last fetch is handled outside this class, in CrawlServer itself.

Author:
gojomo
See Also:
Serialized Form

Field Summary
static RobotsExclusionPolicy ALLOWALL
           
static RobotsExclusionPolicy DENYALL
           
(package private)  RobotsHonoringPolicy honoringPolicy
           
 
Constructor Summary
RobotsExclusionPolicy(CrawlerSettings settings, java.util.LinkedList<java.lang.String> u, java.util.HashMap<java.lang.String,java.util.List<java.lang.String>> d, RobotsHonoringPolicy honoringPolicy)
           
RobotsExclusionPolicy(int type)
           
 
Method Summary
 boolean disallows(CrawlURI curi, java.lang.String userAgent)
           
static RobotsExclusionPolicy policyFor(CrawlerSettings settings, java.io.BufferedReader reader, RobotsHonoringPolicy honoringPolicy)
           
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

ALLOWALL

public static RobotsExclusionPolicy ALLOWALL

DENYALL

public static RobotsExclusionPolicy DENYALL

honoringPolicy

transient RobotsHonoringPolicy honoringPolicy

Constructor Detail

RobotsExclusionPolicy

public RobotsExclusionPolicy(CrawlerSettings settings,
                             java.util.LinkedList<java.lang.String> u,
                             java.util.HashMap<java.lang.String,java.util.List<java.lang.String>> d,
                             RobotsHonoringPolicy honoringPolicy)
Parameters:
settings -
u -
d -
honoringPolicy -

RobotsExclusionPolicy

public RobotsExclusionPolicy(int type)

Method Detail

policyFor

public static RobotsExclusionPolicy policyFor(CrawlerSettings settings,
                                              java.io.BufferedReader reader,
                                              RobotsHonoringPolicy honoringPolicy)
                                       throws java.io.IOException
Parameters:
settings -
reader -
honoringPolicy -
Returns:
Robot exclusion policy.
Throws:
java.io.IOException
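
The page does not document how policyFor reads the robots.txt stream. As a rough illustration only, the sketch below shows one way a reader could be turned into an ordered user-agent list and a map of disallowed path prefixes, using the same collection types as the constructor's u and d parameters. Reading u as user agents and d as disallow rules is an assumption, as are the class name RobotsParseSketch and all of its members. It is not the Heritrix implementation, and it ignores CrawlerSettings and RobotsHonoringPolicy entirely.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedList;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch, NOT the Heritrix code: a minimal robots.txt reader.
class RobotsParseSketch {
    // User-agent tokens in order of first appearance (lower-cased).
    final LinkedList<String> userAgents = new LinkedList<>();
    // Disallowed path prefixes keyed by user-agent token.
    final HashMap<String, List<String>> disallows = new HashMap<>();

    static RobotsParseSketch parse(BufferedReader reader) throws IOException {
        RobotsParseSketch result = new RobotsParseSketch();
        List<String> group = new ArrayList<>(); // agents of the current record
        boolean inRules = false;                // true once a rule line was seen
        String line;
        while ((line = reader.readLine()) != null) {
            int hash = line.indexOf('#');       // strip trailing comments
            if (hash >= 0) {
                line = line.substring(0, hash);
            }
            line = line.trim();
            int colon = line.indexOf(':');
            if (colon < 0) {
                continue;                       // blank or malformed line
            }
            String field = line.substring(0, colon).trim().toLowerCase(Locale.ROOT);
            String value = line.substring(colon + 1).trim();
            if (field.equals("user-agent")) {
                if (inRules) {                  // a new record starts here
                    group.clear();
                    inRules = false;
                }
                String agent = value.toLowerCase(Locale.ROOT);
                group.add(agent);
                if (!result.disallows.containsKey(agent)) {
                    result.userAgents.add(agent);
                    result.disallows.put(agent, new ArrayList<String>());
                }
            } else if (field.equals("disallow")) {
                inRules = true;
                if (value.isEmpty()) {
                    continue;                   // empty Disallow excludes nothing
                }
                for (String agent : group) {
                    result.disallows.get(agent).add(value);
                }
            }
        }
        return result;
    }
}
```

Wrapping a StringReader in a BufferedReader is enough to try this against an in-memory robots.txt. Fields other than User-agent and Disallow are skipped in this sketch.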

disallows

public boolean disallows(CrawlURI curi,
                         java.lang.String userAgent)
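
The page gives no description for disallows. As a hedged illustration of the general idea, the sketch below checks a URI path against per-agent disallow prefixes: the first record whose token appears in the crawler's user-agent string (or is the wildcard "*") decides the outcome. The class RobotsMatchSketch, its parameters, and the first-match rule are assumptions made for this example. The real method takes a CrawlURI and also consults the RobotsHonoringPolicy, which this sketch leaves out.

```java
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch, NOT the Heritrix code: prefix-based exclusion check.
class RobotsMatchSketch {
    /**
     * @param path      path portion of the URI being tested, e.g. "/cgi-bin/x"
     * @param userAgent full user-agent string the crawler sends
     * @param agents    lower-cased robots.txt agent tokens, in priority order
     * @param rules     disallowed path prefixes keyed by agent token
     * @return true if the first applicable record disallows the path
     */
    static boolean disallows(String path, String userAgent,
                             List<String> agents,
                             Map<String, List<String>> rules) {
        String ua = userAgent.toLowerCase(Locale.ROOT);
        for (String agent : agents) {
            boolean applies = agent.equals("*") || ua.contains(agent);
            if (!applies) {
                continue;
            }
            List<String> prefixes = rules.get(agent);
            if (prefixes == null) {
                return false;               // record has no rules at all
            }
            for (String prefix : prefixes) {
                if (path.startsWith(prefix)) {
                    return true;
                }
            }
            return false;                   // first applicable record decides
        }
        return false;                       // no record applies: allowed
    }
}
```

Because the first applicable record wins in this sketch, a "*" entry should come last in the agent list so that more specific tokens are tried first.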


Copyright © 2003-2008 Internet Archive. All Rights Reserved.