com.norconex.collector.http.handler.impl
Class DefaultURLExtractor

java.lang.Object
  extended by com.norconex.collector.http.handler.impl.DefaultURLExtractor
All Implemented Interfaces:
IURLExtractor, IXMLConfigurable, Serializable

public class DefaultURLExtractor
extends Object
implements IURLExtractor, IXMLConfigurable

Default implementation of IURLExtractor.

XML configuration usage (not required, since this is the default implementation):

  <urlExtractor class="com.norconex.collector.http.handler.impl.DefaultURLExtractor">
      <maxURLLength>
          (Optional maximum URL length.  Longer URLs won't be extracted.
           Default is 2048.)
      </maxURLLength>
  </urlExtractor>
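As an illustration of how the <maxURLLength> setting and its 2048 default could be read from such a configuration, here is a minimal sketch using the JDK's DOM parser. It is not Norconex's loadFromXML implementation; the class MaxUrlLengthConfigSketch and the method readMaxUrlLength are hypothetical names chosen for this example.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

/**
 * Hypothetical sketch (not the Norconex implementation): reads <maxURLLength>
 * from an <urlExtractor> element, falling back to the documented default of 2048.
 */
public class MaxUrlLengthConfigSketch {

    // Assumed to match the documented XML default.
    static final int DEFAULT_MAX_URL_LENGTH = 2048;

    static int readMaxUrlLength(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = doc.getElementsByTagName("maxURLLength");
            if (nodes.getLength() == 0) {
                // Element absent: the setting is optional, so use the default.
                return DEFAULT_MAX_URL_LENGTH;
            }
            String text = nodes.item(0).getTextContent().trim();
            return text.isEmpty() ? DEFAULT_MAX_URL_LENGTH : Integer.parseInt(text);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Invalid XML configuration", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<urlExtractor class=\"com.norconex.collector.http.handler.impl.DefaultURLExtractor\">"
                + "<maxURLLength>1024</maxURLLength>"
                + "</urlExtractor>";
        System.out.println(readMaxUrlLength(xml));               // 1024
        System.out.println(readMaxUrlLength("<urlExtractor/>")); // 2048
    }
}
```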
 

Author:
Pascal Essiembre
See Also:
Serialized Form

Field Summary
static int DEFAULT_MAX_URL_LENGTH
          The default maximum URL length.
 
Constructor Summary
DefaultURLExtractor()
           
 
Method Summary
 Set<String> extractURLs(Reader document, String documentUrl, ContentType contentType)
          Extracts URLs out of a document.
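 int getMaxURLLength()
          Gets the maximum length of URLs to extract.
 void loadFromXML(Reader in)
          Loads this extractor's configuration from XML.
 void saveToXML(Writer out)
          Saves this extractor's configuration as XML.
 void setMaxURLLength(int maxURLLength)
          Sets the maximum length of URLs to extract; longer URLs are not extracted.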
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

DEFAULT_MAX_URL_LENGTH

public static final int DEFAULT_MAX_URL_LENGTH
See Also:
Constant Field Values
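A minimal sketch of how the maxURLLength property might behave, assuming DEFAULT_MAX_URL_LENGTH equals the documented XML default of 2048 and assuming a URL exactly at the limit is still kept. The class MaxUrlLengthProperty and its accepts method are hypothetical and not part of the Norconex API.

```java
/**
 * Hypothetical sketch of the max-URL-length property (not the Norconex API).
 * Assumes DEFAULT_MAX_URL_LENGTH is 2048, per the documented XML default,
 * and that a URL whose length equals the maximum is still accepted.
 */
public class MaxUrlLengthProperty {

    public static final int DEFAULT_MAX_URL_LENGTH = 2048;

    // Starts at the default until explicitly configured.
    private int maxURLLength = DEFAULT_MAX_URL_LENGTH;

    public int getMaxURLLength() {
        return maxURLLength;
    }

    public void setMaxURLLength(int maxURLLength) {
        this.maxURLLength = maxURLLength;
    }

    /** Returns true if the URL is short enough to be extracted. */
    public boolean accepts(String url) {
        return url.length() <= maxURLLength;
    }

    public static void main(String[] args) {
        MaxUrlLengthProperty p = new MaxUrlLengthProperty();
        System.out.println(p.getMaxURLLength());              // 2048
        p.setMaxURLLength(10);
        System.out.println(p.accepts("http://a/"));           // true (9 characters)
        System.out.println(p.accepts("http://example.com/")); // false (19 characters)
    }
}
```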
Constructor Detail

DefaultURLExtractor

public DefaultURLExtractor()
Method Detail

extractURLs

public Set<String> extractURLs(Reader document,
                               String documentUrl,
                               ContentType contentType)
                        throws IOException
Description copied from interface: IURLExtractor
Extracts URLs out of a document.

Specified by:
extractURLs in interface IURLExtractor
Parameters:
document - the document
documentUrl - the document URL
contentType - the document content type
Returns:
the set of URLs extracted from the document
Throws:
IOException - if there is a problem reading the document
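The following is a simplified sketch of what extracting URLs from a document can look like: it scans href and src attribute values with a regular expression, resolves them against documentUrl, and drops URLs longer than the maximum length. This is an assumption-laden illustration, not the algorithm DefaultURLExtractor actually uses; the ContentType parameter is omitted, and SimpleUrlExtractorSketch is a hypothetical name.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.net.URI;
import java.net.URISyntaxException;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical, simplified URL extractor (not DefaultURLExtractor's algorithm).
 * Only quoted href/src attributes are considered; ContentType is ignored.
 */
public class SimpleUrlExtractorSketch {

    // Matches href="..." or src='...' values; not a full HTML parser.
    private static final Pattern LINK_ATTR = Pattern.compile(
            "(?:href|src)\\s*=\\s*[\"']([^\"']+)[\"']", Pattern.CASE_INSENSITIVE);

    private final int maxURLLength;

    public SimpleUrlExtractorSketch(int maxURLLength) {
        this.maxURLLength = maxURLLength;
    }

    public Set<String> extractURLs(Reader document, String documentUrl) throws IOException {
        // Read the whole document into memory.
        StringBuilder content = new StringBuilder();
        char[] buffer = new char[4096];
        int read;
        while ((read = document.read(buffer)) != -1) {
            content.append(buffer, 0, read);
        }

        URI base = URI.create(documentUrl);
        Set<String> urls = new LinkedHashSet<>(); // keeps first-seen order, drops duplicates
        Matcher m = LINK_ATTR.matcher(content);
        while (m.find()) {
            try {
                // Resolve relative links against the document URL.
                String absolute = base.resolve(new URI(m.group(1).trim())).toString();
                if (absolute.length() <= maxURLLength) {
                    urls.add(absolute);
                }
            } catch (URISyntaxException e) {
                // Skip attribute values that are not valid URIs.
            }
        }
        return urls;
    }

    public static void main(String[] args) throws IOException {
        String html = "<a href=\"/about\">About</a> <img src='logo.png'> <a href=\"/about\">dup</a>";
        Set<String> urls = new SimpleUrlExtractorSketch(2048)
                .extractURLs(new StringReader(html), "http://example.com/docs/index.html");
        System.out.println(urls); // [http://example.com/about, http://example.com/docs/logo.png]
    }
}
```

A regular expression keeps the sketch short; a production extractor would typically need to handle more tags, unquoted attributes, and non-HTML content types.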

getMaxURLLength

public int getMaxURLLength()

setMaxURLLength

public void setMaxURLLength(int maxURLLength)

loadFromXML

public void loadFromXML(Reader in)
Specified by:
loadFromXML in interface IXMLConfigurable

saveToXML

public void saveToXML(Writer out)
               throws IOException
Specified by:
saveToXML in interface IXMLConfigurable
Throws:
IOException - if there is a problem writing the XML
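For illustration, saveToXML could write the same <urlExtractor> structure shown in the XML configuration usage. The sketch below uses the JDK's StAX XMLStreamWriter and wraps XMLStreamException in an IOException to match the declared throws clause. SaveToXmlSketch is a hypothetical class, it takes the length as an extra parameter for simplicity, and the real method's output may differ.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

/**
 * Hypothetical sketch of serializing the extractor configuration
 * (not the Norconex implementation).
 */
public class SaveToXmlSketch {

    static void saveToXML(Writer out, int maxURLLength) throws IOException {
        try {
            XMLStreamWriter xml = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
            xml.writeStartElement("urlExtractor");
            xml.writeAttribute("class",
                    "com.norconex.collector.http.handler.impl.DefaultURLExtractor");
            xml.writeStartElement("maxURLLength");
            xml.writeCharacters(Integer.toString(maxURLLength));
            xml.writeEndElement(); // </maxURLLength>
            xml.writeEndElement(); // </urlExtractor>
            xml.flush();
            xml.close(); // does not close the underlying Writer
        } catch (XMLStreamException e) {
            // Surface XML writing failures as IOException, matching the documented signature.
            throw new IOException("Could not write XML configuration", e);
        }
    }

    public static void main(String[] args) throws IOException {
        StringWriter out = new StringWriter();
        saveToXML(out, 2048);
        System.out.println(out);
    }
}
```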


Copyright © 2009-2013 Norconex Inc. All Rights Reserved.