com.norconex.collector.http.handler
Interface IURLExtractor

All Superinterfaces:
Serializable
All Known Implementing Classes:
DefaultURLExtractor

public interface IURLExtractor
extends Serializable

Responsible for extracting URLs out of a document.

Author:
Pascal Essiembre

Method Summary
 Set<String> extractURLs(Reader document, String documentUrl, ContentType contentType)
          Extracts URLs out of a document.
 

Method Detail

extractURLs

Set<String> extractURLs(Reader document,
                        String documentUrl,
                        ContentType contentType)
                        throws IOException
Extracts URLs out of a document.

Parameters:
document - a reader over the document content
documentUrl - the URL of the document
contentType - the content type of the document
Returns:
a set of URLs
Throws:
IOException - if there is a problem reading the document
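
Example (illustrative sketch only):
The following shows one way the extractURLs contract could be implemented and called. It is a minimal sketch, not the approach taken by DefaultURLExtractor. So that the example compiles on its own, the interface is re-declared locally with the signature documented above, and ContentType is a hypothetical stand-in that only wraps a MIME type string; the real ContentType class may expose a different API. The class names URLExtractorSketch and HrefExtractor, the href regular expression, and the use of documentUrl as the base for resolving relative links are all assumptions made for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.Serializable;
import java.io.StringReader;
import java.net.URI;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class URLExtractorSketch {

    // Hypothetical stand-in for the library's ContentType class (assumption):
    // it only wraps a MIME type string.
    static final class ContentType implements Serializable {
        private static final long serialVersionUID = 1L;
        private final String mimeType;

        ContentType(String mimeType) {
            this.mimeType = mimeType;
        }

        @Override
        public String toString() {
            return mimeType;
        }
    }

    // Local mirror of the documented interface, so the sketch is self-contained.
    interface IURLExtractor extends Serializable {
        Set<String> extractURLs(Reader document, String documentUrl,
                ContentType contentType) throws IOException;
    }

    // Hypothetical implementation: collects href attribute values from HTML.
    static class HrefExtractor implements IURLExtractor {
        private static final long serialVersionUID = 1L;

        // Matches href="..." or href='...'; values starting with '#' are skipped
        // and any fragment after '#' is left out of the captured value.
        private static final Pattern HREF = Pattern.compile(
                "href\\s*=\\s*[\"']([^\"'#]+)", Pattern.CASE_INSENSITIVE);

        @Override
        public Set<String> extractURLs(Reader document, String documentUrl,
                ContentType contentType) throws IOException {
            Set<String> urls = new LinkedHashSet<>();
            // This sketch only handles HTML; other types yield an empty set.
            if (!"text/html".equals(contentType.toString())) {
                return urls;
            }
            // Assumption: documentUrl serves as the base for relative links.
            URI base = URI.create(documentUrl);
            BufferedReader reader = new BufferedReader(document);
            String line;
            while ((line = reader.readLine()) != null) {
                Matcher m = HREF.matcher(line);
                while (m.find()) {
                    try {
                        urls.add(base.resolve(m.group(1).trim()).toString());
                    } catch (IllegalArgumentException e) {
                        // Skip values that are not valid URI references.
                    }
                }
            }
            return urls;
        }
    }

    public static void main(String[] args) throws IOException {
        String html = "<a href=\"page2.html\">next</a>\n"
                + "<a href='/about'>about</a>";
        IURLExtractor extractor = new HrefExtractor();
        Set<String> urls = extractor.extractURLs(new StringReader(html),
                "http://example.com/docs/index.html",
                new ContentType("text/html"));
        for (String url : urls) {
            System.out.println(url);
        }
    }
}
```

Returning a Set means a link that appears several times in the document is reported once. Reading line by line keeps the sketch short, but it misses href attributes split across lines; a real extractor would more likely use an HTML parser.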


Copyright © 2009-2013 Norconex Inc. All Rights Reserved.