org.archive.crawler.datamodel
Class RobotsExclusionPolicy
java.lang.Object
org.archive.crawler.datamodel.RobotsExclusionPolicy
- All Implemented Interfaces:
- java.io.Serializable
public class RobotsExclusionPolicy
- extends java.lang.Object
- implements java.io.Serializable
RobotsExclusionPolicy represents the actual policy adopted with
respect to a specific remote server, usually constructed by
consulting the robots.txt file, if any, that the server provided.
(The similarly named RobotsHonoringPolicy, on the other hand,
describes the strategy used by the crawler to determine to what
extent it respects exclusion rules.)
Policies expire once a suitable amount of time has elapsed since
the last fetch. That expiration is handled outside this class, in
CrawlServer itself.
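To illustrate the idea of expiring a policy outside the policy class, here is a minimal sketch of a holder that remembers when a policy was last fetched and stops returning it once a validity window has passed. It is not the CrawlServer implementation: the class name `PolicyCacheSketch`, its methods, and the millisecond-based validity window are all hypothetical.

```java
/**
 * Hypothetical holder that expires a cached policy after a fixed time.
 * Illustrative only; the names here do not come from Heritrix.
 */
final class PolicyCacheSketch<P> {
    private final long validityMillis; // how long a fetched policy stays valid
    private P policy;                  // last fetched policy, or null if none yet
    private long fetchedAtMillis;      // time of the last fetch

    PolicyCacheSketch(long validityMillis) {
        this.validityMillis = validityMillis;
    }

    /** Stores a freshly fetched policy together with its fetch time. */
    void update(P newPolicy, long nowMillis) {
        policy = newPolicy;
        fetchedAtMillis = nowMillis;
    }

    /** Returns the cached policy, or null if none is cached or it has expired. */
    P currentPolicy(long nowMillis) {
        if (policy == null || nowMillis - fetchedAtMillis > validityMillis) {
            return null; // caller should fetch robots.txt again
        }
        return policy;
    }
}
```

Passing the current time in as a parameter, rather than reading the system clock inside the class, keeps the sketch easy to exercise with fixed timestamps.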
- Author:
- gojomo
- See Also:
- Serialized Form
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
ALLOWALL
public static RobotsExclusionPolicy ALLOWALL
DENYALL
public static RobotsExclusionPolicy DENYALL
honoringPolicy
transient RobotsHonoringPolicy honoringPolicy
RobotsExclusionPolicy
public RobotsExclusionPolicy(CrawlerSettings settings,
java.util.LinkedList<java.lang.String> u,
java.util.HashMap<java.lang.String,java.util.List<java.lang.String>> d,
RobotsHonoringPolicy honoringPolicy)
- Parameters:
- settings
- u
- d
- honoringPolicy
RobotsExclusionPolicy
public RobotsExclusionPolicy(int type)
policyFor
public static RobotsExclusionPolicy policyFor(CrawlerSettings settings,
java.io.BufferedReader reader,
RobotsHonoringPolicy honoringPolicy)
throws java.io.IOException
- Parameters:
- settings
- reader
- honoringPolicy
- Returns:
- Robot exclusion policy.
- Throws:
- java.io.IOException
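As a rough illustration of what building a policy from a reader can involve, the sketch below reads robots.txt text from a BufferedReader into an ordered list of user-agent tokens and a map from each token to its Disallow prefixes. The container types mirror the constructor's `u` and `d` parameter types, but treating them as "user agents" and "disallows" is an assumption. The class `RobotsParseSketch` and its methods are hypothetical, the parsing is deliberately simplified (only User-agent and Disallow lines are interpreted), and CrawlerSettings and RobotsHonoringPolicy are left out.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedList;
import java.util.List;
import java.util.Locale;

/** Hypothetical, simplified robots.txt reader; not the Heritrix implementation. */
final class RobotsParseSketch {
    /** User-agent tokens (lower-cased) in the order they first appear. */
    final LinkedList<String> userAgents = new LinkedList<>();
    /** Disallowed path prefixes, keyed by user-agent token. */
    final HashMap<String, List<String>> disallows = new HashMap<>();

    static RobotsParseSketch parse(BufferedReader reader) throws IOException {
        RobotsParseSketch result = new RobotsParseSketch();
        List<String> group = new ArrayList<>(); // agents of the record being read
        boolean inRules = false;                // true once the record has a rule line
        String line;
        while ((line = reader.readLine()) != null) {
            int hash = line.indexOf('#');
            if (hash >= 0) {
                line = line.substring(0, hash); // drop trailing comments
            }
            int colon = line.indexOf(':');
            if (colon < 0) {
                continue; // blank or malformed line
            }
            String field = line.substring(0, colon).trim().toLowerCase(Locale.ROOT);
            String value = line.substring(colon + 1).trim();
            if (field.equals("user-agent")) {
                if (inRules) { // a User-agent line after rules starts a new record
                    group.clear();
                    inRules = false;
                }
                String agent = value.toLowerCase(Locale.ROOT);
                group.add(agent);
                if (!result.disallows.containsKey(agent)) {
                    result.userAgents.add(agent);
                    result.disallows.put(agent, new ArrayList<>());
                }
            } else {
                inRules = true;
                // An empty Disallow value excludes nothing, so it is not stored.
                if (field.equals("disallow") && !value.isEmpty()) {
                    for (String agent : group) {
                        result.disallows.get(agent).add(value);
                    }
                }
            }
        }
        return result;
    }

    /** Convenience wrapper for in-memory text. */
    static RobotsParseSketch parseString(String text) {
        try {
            return parse(new BufferedReader(new StringReader(text)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

For example, `RobotsParseSketch.parseString("User-agent: *\nDisallow: /private/\n")` yields a single agent `*` whose only disallowed prefix is `/private/`.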
disallows
public boolean disallows(CrawlURI curi,
java.lang.String userAgent)
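The page does not describe how disallows reaches its answer, so the following is only a sketch of one common approach: pick the rule set whose token appears in the crawler's user-agent string, fall back to the `*` record, and test the path against each Disallow prefix. The class `DisallowCheckSketch` is hypothetical, it takes a plain path string instead of a CrawlURI, and it ignores any RobotsHonoringPolicy.

```java
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical prefix-matching check; not the Heritrix implementation. */
final class DisallowCheckSketch {
    /** Disallowed path prefixes keyed by lower-cased user-agent token. */
    private final Map<String, List<String>> disallows;

    DisallowCheckSketch(Map<String, List<String>> disallows) {
        this.disallows = disallows;
    }

    /** Returns true if the path is excluded for the given user-agent string. */
    boolean disallows(String path, String userAgent) {
        String ua = userAgent.toLowerCase(Locale.ROOT);
        List<String> rules = null;
        // Prefer the first record whose token occurs in the user-agent string.
        // Iteration order is that of the supplied map.
        for (Map.Entry<String, List<String>> e : disallows.entrySet()) {
            String token = e.getKey();
            if (!token.isEmpty() && !token.equals("*") && ua.contains(token)) {
                rules = e.getValue();
                break;
            }
        }
        if (rules == null) {
            rules = disallows.get("*"); // fall back to the wildcard record
        }
        if (rules == null) {
            return false; // no applicable record: nothing is excluded
        }
        for (String prefix : rules) {
            if (path.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }
}
```

Supplying a LinkedHashMap keeps the record order of the original robots.txt, which matters when more than one token matches the same user-agent string.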
Copyright © 2003-2008 Internet Archive. All Rights Reserved.