Nutch
Extensible, scalable web crawler for production use. Pluggable parsing (e.g. Apache Tika), indexing (Solr, Elasticsearch), and batch processing via Apache Hadoop. Supports custom parsers and scoring.