Nutch

Extensible, scalable web crawler for production use. Pluggable parsing (e.g. Apache Tika) and indexing (Apache Solr, Elasticsearch), with batch processing on Apache Hadoop. Supports custom parser and scoring plugins.
