Web-Harvest

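Java framework for web data extraction and scraping. Extraction tasks are defined in XML configuration files and can use XPath, XQuery, and regular expressions. Includes an HTTP client, HTML/XML parsing, and a plugin system. Provides a CLI and a web IDE; outputs WARC-style and structured data.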

Metadata
Category: Crawlers
License: Apache-2.0