com.norconex.collector.http.handler
Interface IURLExtractor
- All Superinterfaces:
- Serializable
- All Known Implementing Classes:
- DefaultURLExtractor
public interface IURLExtractor
- extends Serializable
Responsible for extracting URLs out of a document.
- Author:
- Pascal Essiembre
extractURLs
Set<String> extractURLs(Reader document,
String documentUrl,
ContentType contentType)
throws IOException
- Extracts URLs out of a document.
- Parameters:
document
- the document
documentUrl
- document URL
contentType
- the document content type
- Returns:
- a set of URLs
- Throws:
IOException
- problem reading the document
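The contract above can be illustrated with a minimal sketch. It is not the library's DefaultURLExtractor and does not implement the real interface. To stay self-contained it takes the content type as a plain String instead of the library's ContentType class. The class name HrefExtractorSketch, the regex-based href matching, and the "text/html" check are all assumptions made for illustration.

```java
import java.io.IOException;
import java.io.Reader;
import java.net.URI;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in mirroring the shape of IURLExtractor.extractURLs.
// Assumption: contentType is a String here, not the library's ContentType.
public class HrefExtractorSketch {

    // Simplified pattern for href="..." or href='...'; a real extractor
    // would likely use an HTML parser and handle more attributes.
    private static final Pattern HREF = Pattern.compile(
            "href\\s*=\\s*[\"']([^\"']+)[\"']", Pattern.CASE_INSENSITIVE);

    public Set<String> extractURLs(
            Reader document, String documentUrl, String contentType)
            throws IOException {
        Set<String> urls = new LinkedHashSet<String>();

        // Assumption: only HTML is handled; other types yield an empty set.
        if (contentType == null || !contentType.startsWith("text/html")) {
            return urls;
        }

        // Read the whole document into memory (fine for a sketch).
        StringBuilder content = new StringBuilder();
        char[] buffer = new char[4096];
        int read;
        while ((read = document.read(buffer)) != -1) {
            content.append(buffer, 0, read);
        }

        // Resolve every href value against the document URL.
        URI base = URI.create(documentUrl);
        Matcher matcher = HREF.matcher(content);
        while (matcher.find()) {
            String raw = matcher.group(1).trim();
            try {
                urls.add(base.resolve(raw).toString());
            } catch (IllegalArgumentException e) {
                // Skip values that are not valid URIs.
            }
        }
        return urls;
    }
}
```

Returning a Set means duplicate links collapse to a single entry. Relative links are resolved against documentUrl, so callers receive absolute URLs whenever documentUrl is itself absolute.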
Copyright © 2009-2013 Norconex Inc. All Rights Reserved.