org.archive.crawler.processor.recrawl
Class PersistProcessor
java.lang.Object
javax.management.Attribute
org.archive.crawler.settings.Type
org.archive.crawler.settings.ComplexType
org.archive.crawler.settings.ModuleType
org.archive.crawler.framework.Processor
org.archive.crawler.processor.recrawl.PersistProcessor
- All Implemented Interfaces:
- java.io.Serializable, javax.management.DynamicMBean
- Direct Known Subclasses:
- PersistLogProcessor, PersistOnlineProcessor
public abstract class PersistProcessor
- extends Processor
Superclass for Processors that use BDB-JE (Berkeley DB Java Edition) to
persist URI state, most notably history.
- Author:
- gojomo
- See Also:
- Serialized Form
Constructor Summary
PersistProcessor(java.lang.String name, java.lang.String string)
- Usual constructor
Method Summary
protected static com.sleepycat.je.DatabaseConfig historyDatabaseConfig()
static void main(java.lang.String[] args)
- Utility main for importing a log into a BDB-JE environment or moving a
database between environments (2 arguments), or simply dumping a log
to stdout in a more readable format (1 argument).
java.lang.String persistKeyFor(CrawlURI curi)
- Return a preferred String key for persisting the given CrawlURI's
AList state.
protected boolean shouldLoad(CrawlURI curi)
- Whether the current CrawlURI's state should be loaded
protected boolean shouldStore(CrawlURI curi)
- Whether the current CrawlURI's state should be persisted (to log or
direct to database)
Methods inherited from class org.archive.crawler.framework.Processor
checkForInterrupt, finalTasks, getController, getDecideRule, getDefaultNextProcessor, initialTasks, innerProcess, innerRejectProcess, isContentToProcess, isExpectedMimeType, isHttpTransactionContentToProcess, kickUpdate, process, report, rulesAccept, rulesAccept, setDefaultNextProcessor, spawn
Methods inherited from class org.archive.crawler.settings.ComplexType
addElementToDefinition, checkValue, earlyInitialize, getAbsoluteName, getAttribute, getAttribute, getAttribute, getAttributeInfo, getAttributeInfo, getAttributeInfoIterator, getAttributes, getDataContainerRecursive, getDataContainerRecursive, getDefaultValue, getDescription, getElementFromDefinition, getLegalValues, getLocalAttribute, getMBeanInfo, getMBeanInfo, getParent, getPreservedFields, getSettingsHandler, getUncheckedAttribute, getValue, globalSettings, invoke, isInitialized, isOverridden, iterator, removeElementFromDefinition, setAsOrder, setAttribute, setAttribute, setAttributes, setDescription, setPreservedFields, toString, unsetAttribute
Methods inherited from class org.archive.crawler.settings.Type
addConstraint, equals, getConstraints, getLegalValueType, isExpertSetting, isOverrideable, isTransient, setExpertSetting, setLegalValueType, setOverrideable, setTransient
Methods inherited from class javax.management.Attribute
getName
Methods inherited from class java.lang.Object
clone, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
Field Detail
URI_HISTORY_DBNAME
public static final java.lang.String URI_HISTORY_DBNAME
- Name of the history Database.
- See Also:
- Constant Field Values
Constructor Detail
PersistProcessor
public PersistProcessor(java.lang.String name,
java.lang.String string)
- Usual constructor
- Parameters:
name
-
string
-
Method Detail
historyDatabaseConfig
protected static com.sleepycat.je.DatabaseConfig historyDatabaseConfig()
- Returns:
- DatabaseConfig for history Database
persistKeyFor
public java.lang.String persistKeyFor(CrawlURI curi)
- Return a preferred String key for persisting the given CrawlURI's
AList state.
- Parameters:
curi
- CrawlURI
- Returns:
- String key
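A persistence key is most useful when it is stable for a given URI and sorts related URIs near each other. The sketch below illustrates one way such a key could be derived, by reversing the host labels so that URIs from the same domain become neighbours in a sorted key space. This page does not say how persistKeyFor builds its key, so the class PersistKeySketch, the method keyFor, and the key layout are all assumptions made for illustration; the sketch takes a plain String rather than a CrawlURI so that it runs without Heritrix on the classpath.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical helper for illustration only; not part of Heritrix. */
final class PersistKeySketch {

    /**
     * Builds a sortable key by reversing the host labels.
     * ASSUMPTION: the real persistKeyFor may derive its key differently.
     */
    static String keyFor(String uri) {
        URI parsed = URI.create(uri);
        String host = parsed.getHost();
        if (host == null || parsed.getScheme() == null) {
            return uri; // nothing to reorder; fall back to the raw URI
        }

        // "www.example.com" becomes [com, example, www]
        List<String> labels = new ArrayList<>(Arrays.asList(host.split("\\.")));
        Collections.reverse(labels);

        StringBuilder key = new StringBuilder(parsed.getScheme()).append("://(");
        for (String label : labels) {
            key.append(label).append(',');
        }
        if (parsed.getPort() != -1) {
            key.append(':').append(parsed.getPort()); // keep explicit ports distinct
        }
        key.append(')');
        if (parsed.getRawPath() != null) {
            key.append(parsed.getRawPath());
        }
        if (parsed.getRawQuery() != null) {
            key.append('?').append(parsed.getRawQuery());
        }
        return key.toString();
    }

    public static void main(String[] args) {
        System.out.println(keyFor("http://www.example.com/a/b?x=1"));
    }
}
```

With this layout, keys for www.example.com and mail.example.com share the prefix `http://(com,example,`, so a sorted store such as a B-tree keeps them adjacent.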
shouldStore
protected boolean shouldStore(CrawlURI curi)
- Whether the current CrawlURI's state should be persisted, either to a
log or directly to the database.
- Parameters:
curi
- CrawlURI
- Returns:
- true if state should be stored; false to skip persistence
shouldLoad
protected boolean shouldLoad(CrawlURI curi)
- Whether the current CrawlURI's state should be loaded
- Parameters:
curi
- CrawlURI
- Returns:
- true if state should be loaded; false to skip loading
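The two predicates shouldStore and shouldLoad act as gates around persistence: a subclass consults them before writing or reading a URI's state. The following is a minimal sketch of that gating pattern. FakeUri is a hypothetical stand-in for CrawlURI, an in-memory map stands in for the BDB-JE database, and the conditions chosen inside the two predicates are assumptions; this page does not state what the real implementations check.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of store/load gating; not the Heritrix implementation. */
class PersistGateSketch {

    /** Stand-in for CrawlURI: a key, a fetch outcome, and a bag of state. */
    static final class FakeUri {
        final String key;
        final boolean fetchSucceeded;
        final Map<String, Object> state = new HashMap<>();

        FakeUri(String key, boolean fetchSucceeded) {
            this.key = key;
            this.fetchSucceeded = fetchSucceeded;
        }
    }

    /** Stand-in for the history database. */
    private final Map<String, Map<String, Object>> persisted = new HashMap<>();

    // ASSUMPTION: only successfully fetched URIs are worth persisting.
    protected boolean shouldStore(FakeUri curi) {
        return curi.fetchSucceeded;
    }

    // ASSUMPTION: prior state is always worth looking up.
    protected boolean shouldLoad(FakeUri curi) {
        return true;
    }

    /** Copies the URI's state into the store when shouldStore allows it. */
    void storeIfWanted(FakeUri curi) {
        if (shouldStore(curi)) {
            persisted.put(curi.key, new HashMap<>(curi.state));
        }
    }

    /** Merges previously stored state into the URI when shouldLoad allows it. */
    void loadIfWanted(FakeUri curi) {
        if (!shouldLoad(curi)) {
            return;
        }
        Map<String, Object> prior = persisted.get(curi.key);
        if (prior != null) {
            curi.state.putAll(prior);
        }
    }
}
```

Because both predicates are protected, a subclass could narrow either gate by overriding it, for example to skip loading for URIs whose state is already populated.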
main
public static void main(java.lang.String[] args)
throws com.sleepycat.je.DatabaseException,
java.io.IOException
- Utility main. With two arguments, imports a log into a BDB-JE
environment or moves a database between environments. With one argument,
dumps a log to stdout in a more readable format.
- Parameters:
args
- command-line arguments
- Throws:
com.sleepycat.je.DatabaseException
java.io.IOException
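The argument-count behaviour described for main can be pictured as a small dispatch step. The sketch below is illustrative only: the class PersistMainSketch and the enum Mode are hypothetical, the argument order (source first, target environment second) is an assumption, and so is the rule that a directory source means an existing environment while any other source is treated as a log file. This page states only the number of arguments each mode takes.

```java
import java.io.File;

/** Hypothetical sketch of argument dispatch; not the Heritrix implementation. */
final class PersistMainSketch {

    enum Mode { DUMP_LOG, IMPORT_LOG, COPY_DATABASE, USAGE }

    static Mode modeFor(String[] args) {
        if (args.length == 1) {
            return Mode.DUMP_LOG; // one argument: dump the log to stdout
        }
        if (args.length == 2) {
            // ASSUMPTION: a directory source is an existing environment,
            // anything else is treated as a log file to import.
            return new File(args[0]).isDirectory()
                    ? Mode.COPY_DATABASE
                    : Mode.IMPORT_LOG;
        }
        return Mode.USAGE; // any other count: print usage and do nothing
    }

    public static void main(String[] args) {
        System.out.println(modeFor(args));
    }
}
```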
Copyright © 2003-2008 Internet Archive. All Rights Reserved.