org.archive.crawler.processor.recrawl
Class PersistProcessor

java.lang.Object
  extended by javax.management.Attribute
      extended by org.archive.crawler.settings.Type
          extended by org.archive.crawler.settings.ComplexType
              extended by org.archive.crawler.settings.ModuleType
                  extended by org.archive.crawler.framework.Processor
                      extended by org.archive.crawler.processor.recrawl.PersistProcessor
All Implemented Interfaces:
java.io.Serializable, javax.management.DynamicMBean
Direct Known Subclasses:
PersistLogProcessor, PersistOnlineProcessor

public abstract class PersistProcessor
extends Processor

Superclass for Processors that use BDB-JE to persist URI state, most notably history.
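
As a rough illustration of the persistence idea described above, the sketch below stores per-URI state under a String key and merges it back on a later visit. It is not the Heritrix implementation: the class `HistoryStoreSketch`, its `store` and `load` methods, and the in-memory `TreeMap` standing in for the BDB-JE history database are all assumptions made for this example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in: a sorted in-memory map plays the role of the
// BDB-JE history database (key -> per-URI state).
class HistoryStoreSketch {
    private final Map<String, Map<String, Object>> store = new TreeMap<>();

    /** Persist a copy of the URI's state under its key, replacing any earlier entry. */
    void store(String key, Map<String, Object> state) {
        store.put(key, new HashMap<>(state));
    }

    /**
     * Merge previously stored entries into the current state without
     * overwriting values that are already present.
     * Returns false when nothing was stored under the key.
     */
    boolean load(String key, Map<String, Object> state) {
        Map<String, Object> prior = store.get(key);
        if (prior == null) {
            return false;
        }
        for (Map.Entry<String, Object> e : prior.entrySet()) {
            state.putIfAbsent(e.getKey(), e.getValue());
        }
        return true;
    }
}
```

A subclass that writes directly to a database would presumably call something like `store` after a fetch and something like `load` before the next one.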

Author:
gojomo
See Also:
Serialized Form

Nested Class Summary
 
Nested classes/interfaces inherited from class org.archive.crawler.settings.ComplexType
ComplexType.MBeanAttributeInfoIterator
 
Field Summary
static java.lang.String URI_HISTORY_DBNAME
          Name of the history Database.
 
Fields inherited from class org.archive.crawler.framework.Processor
ATTR_DECIDE_RULES, ATTR_ENABLED, attrDecideRules
 
Fields inherited from class org.archive.crawler.settings.ComplexType
definition, definitionMap
 
Constructor Summary
PersistProcessor(java.lang.String name, java.lang.String string)
          Usual constructor
 
Method Summary
protected static com.sleepycat.je.DatabaseConfig historyDatabaseConfig()
          Returns the DatabaseConfig for the history Database.
static void main(java.lang.String[] args)
          Utility main. With 2 arguments, imports a log into a BDB-JE environment or moves a database between environments; with 1 argument, dumps a log to stdout in a more readable format.
 java.lang.String persistKeyFor(CrawlURI curi)
          Return a preferred String key for persisting the given CrawlURI's AList state.
protected  boolean shouldLoad(CrawlURI curi)
          Whether the current CrawlURI's state should be loaded
protected  boolean shouldStore(CrawlURI curi)
          Whether the current CrawlURI's state should be persisted (to log or direct to database)
 
Methods inherited from class org.archive.crawler.framework.Processor
checkForInterrupt, finalTasks, getController, getDecideRule, getDefaultNextProcessor, initialTasks, innerProcess, innerRejectProcess, isContentToProcess, isExpectedMimeType, isHttpTransactionContentToProcess, kickUpdate, process, report, rulesAccept, rulesAccept, setDefaultNextProcessor, spawn
 
Methods inherited from class org.archive.crawler.settings.ModuleType
addElement, listUsedFiles
 
Methods inherited from class org.archive.crawler.settings.ComplexType
addElementToDefinition, checkValue, earlyInitialize, getAbsoluteName, getAttribute, getAttribute, getAttribute, getAttributeInfo, getAttributeInfo, getAttributeInfoIterator, getAttributes, getDataContainerRecursive, getDataContainerRecursive, getDefaultValue, getDescription, getElementFromDefinition, getLegalValues, getLocalAttribute, getMBeanInfo, getMBeanInfo, getParent, getPreservedFields, getSettingsHandler, getUncheckedAttribute, getValue, globalSettings, invoke, isInitialized, isOverridden, iterator, removeElementFromDefinition, setAsOrder, setAttribute, setAttribute, setAttributes, setDescription, setPreservedFields, toString, unsetAttribute
 
Methods inherited from class org.archive.crawler.settings.Type
addConstraint, equals, getConstraints, getLegalValueType, isExpertSetting, isOverrideable, isTransient, setExpertSetting, setLegalValueType, setOverrideable, setTransient
 
Methods inherited from class javax.management.Attribute
getName
 
Methods inherited from class java.lang.Object
clone, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

URI_HISTORY_DBNAME

public static final java.lang.String URI_HISTORY_DBNAME
Name of the history Database.

See Also:
Constant Field Values
Constructor Detail

PersistProcessor

public PersistProcessor(java.lang.String name,
                        java.lang.String string)
Usual constructor

Parameters:
name -
string -
Method Detail

historyDatabaseConfig

protected static com.sleepycat.je.DatabaseConfig historyDatabaseConfig()
Returns:
DatabaseConfig for history Database

persistKeyFor

public java.lang.String persistKeyFor(CrawlURI curi)
Return a preferred String key for persisting the given CrawlURI's AList state.

Parameters:
curi - CrawlURI
Returns:
String key
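
This page does not say how the key is derived. The sketch below shows one plausible shape for such a function, assuming a key that reverses the host labels so that entries for the same domain sort next to each other. The class `PersistKeySketch`, the method `keyFor`, and the key format itself are hypothetical and should not be read as the actual output of `persistKeyFor`.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical key function; the format is an assumption for illustration.
class PersistKeySketch {
    /** Build a key with reversed host labels, keeping scheme, path and query. */
    static String keyFor(String uri) {
        URI u = URI.create(uri);
        String host = u.getHost();
        if (host == null) {
            return uri; // no host to rearrange; use the URI unchanged
        }
        List<String> labels = new ArrayList<>(Arrays.asList(host.split("\\.")));
        Collections.reverse(labels);
        String path = u.getRawPath() == null ? "" : u.getRawPath();
        String query = u.getRawQuery() == null ? "" : "?" + u.getRawQuery();
        return u.getScheme() + "://(" + String.join(",", labels) + ")" + path + query;
    }
}
```

Whatever the real derivation is, the key needs to be stable across crawls, since the same URI must map to the same stored state each time.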

shouldStore

protected boolean shouldStore(CrawlURI curi)
Whether the current CrawlURI's state should be persisted (to log or direct to database)

Parameters:
curi - CrawlURI
Returns:
true if state should be stored; false to skip persistence

shouldLoad

protected boolean shouldLoad(CrawlURI curi)
Whether the current CrawlURI's state should be loaded

Parameters:
curi - CrawlURI
Returns:
true if state should be loaded; false to skip loading
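
The two predicates `shouldStore` and `shouldLoad` act as gates around persistence. The sketch below illustrates that gating pattern with a stand-in type. The class `PersistGateSketch`, the `UriInfo` type, and both conditions (store only after a fetch with a positive status, load only for http and https URIs) are assumptions chosen for the example, not the documented behavior of this class.

```java
// Hypothetical gating predicates; the conditions are illustrative assumptions.
class PersistGateSketch {
    /** Stand-in for the parts of a CrawlURI that the predicates inspect. */
    static final class UriInfo {
        final String uri;
        final int fetchStatus;

        UriInfo(String uri, int fetchStatus) {
            this.uri = uri;
            this.fetchStatus = fetchStatus;
        }
    }

    /** Assumed rule: persist state only when the fetch produced a positive status. */
    static boolean shouldStore(UriInfo u) {
        return u.fetchStatus > 0;
    }

    /** Assumed rule: load earlier state only for http and https URIs. */
    static boolean shouldLoad(UriInfo u) {
        return u.uri.startsWith("http://") || u.uri.startsWith("https://");
    }
}
```

Because both methods are protected, a subclass can presumably override them to narrow or widen which URIs have their state persisted or loaded.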

main

public static void main(java.lang.String[] args)
                 throws com.sleepycat.je.DatabaseException,
                        java.io.IOException
Utility main. With 2 arguments, imports a log into a BDB-JE environment or moves a database between environments; with 1 argument, dumps a log to stdout in a more readable format.

Parameters:
args - command-line arguments
Throws:
com.sleepycat.je.DatabaseException
java.io.IOException
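
The argument-count dispatch described for `main` can be sketched as follows. The class `PersistMainSketch`, the method `modeFor`, and the mode names are hypothetical. The handling of any other argument count is also an assumption, since this page does not describe it.

```java
// Hypothetical dispatch on argument count; mode names are illustrative only.
class PersistMainSketch {
    /** Map the number of command-line arguments to the mode the description implies. */
    static String modeFor(String[] args) {
        switch (args.length) {
            case 1:
                return "dump";            // dump a log to stdout in readable form
            case 2:
                return "import-or-copy";  // import a log, or move a database between environments
            default:
                return "usage";           // assumption: anything else prints usage help
        }
    }
}
```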


Copyright © 2003-2008 Internet Archive. All Rights Reserved.