Heritrix
Heritrix is the Internet Archive's open-source, extensible, web-scale, archival-quality web crawler project.
Metadata
Category: Crawlers
License: GNU Library or Lesser General Public License (LGPL)
Homepage: http://crawler.archive.org/