Open Source Search Engines in Java

15 projects

Apache Lucene

High-performance, full-featured text search engine library written in Java. Provides indexing and search, spellchecking, hit highlighting, and advanced analysis. The core of Apache Solr, Elasticsearch, and OpenSearch.

MG4J

Free full-text search engine for large document collections. Implements inverted-index compression and provides optimized classes for mutable strings, bit I/O, minimal perfect hashing, and indexing. LGPL-licensed.
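
To illustrate what inverted-index compression means in practice, here is a minimal sketch in plain Java. It stores a sorted posting list as gaps between document IDs and writes each gap with variable-byte coding. The class name `PostingsCodec` and its methods are hypothetical and are not part of any listed project's API; variable-byte coding is used here as a simple stand-in and is not necessarily the scheme a given engine uses. The sketch assumes non-negative, strictly increasing document IDs.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration class; not taken from any project in this list.
final class PostingsCodec {

    private PostingsCodec() {}

    /** Encodes sorted, non-negative document IDs as variable-byte gaps. */
    static byte[] encode(int[] sortedDocIds) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int previous = 0;
        for (int docId : sortedDocIds) {
            // Store the difference to the previous ID; gaps are small for dense lists.
            int gap = docId - previous;
            previous = docId;
            // Emit 7 bits per byte, low-order bits first.
            // A set high bit means "more bytes follow for this gap".
            while (gap >= 0x80) {
                out.write((gap & 0x7F) | 0x80);
                gap >>>= 7;
            }
            out.write(gap);
        }
        return out.toByteArray();
    }

    /** Decodes the byte stream produced by encode back into absolute document IDs. */
    static int[] decode(byte[] data) {
        List<Integer> ids = new ArrayList<>();
        int previous = 0;
        int gap = 0;
        int shift = 0;
        for (byte b : data) {
            gap |= (b & 0x7F) << shift;
            if ((b & 0x80) != 0) {
                // Continuation byte: the next byte carries higher-order bits.
                shift += 7;
            } else {
                // Last byte of this gap: rebuild the absolute ID.
                previous += gap;
                ids.add(previous);
                gap = 0;
                shift = 0;
            }
        }
        return ids.stream().mapToInt(Integer::intValue).toArray();
    }
}
```

For the posting list 3, 7, 130, 1000, 100000, the gaps are 3, 4, 123, 870, and 99000, which this sketch packs into 8 bytes instead of the 20 bytes needed for five raw 32-bit integers.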

Apache Nutch

Extensible, scalable web crawler for production use. Pluggable parsing (e. g.

Apache Solr

Open-source search platform built on Apache Lucene. Provides full-text, vector, and geospatial search with distributed indexing, replication, and a web admin UI. XML/HTTP and Java APIs.

YaCy

Decentralized P2P web search engine and crawler. Run it as part of a shared global index, as your own search portal, or as an intranet search. Includes an HTTP proxy and configurable crawling.

Egothor

High-performance full-text search engine in Java. Usable as a standalone engine, metasearcher, P2P hub, or embeddable library; cross-platform. The project also includes Egothor2 (stemmer, formats), the Bobo crawler, and related tools.

Compass (Inactive)

Java search framework on top of Lucene; adds declarative search semantics and integrates with Hibernate and Spring. Synchronizes the application data model with the search index.

Zilverline (Inactive)

Lucene-based 'reverse' search engine for local or network disks. Indexes collections of PDF, Word, HTML, TXT, CHM, ZIP, and RAR; search locally or via a web server. Plugin system for custom extractors.

Hounder (Inactive)

Complete search system: crawler, indexer, and search UI/API. Targets documents of interest and is designed to scale in the number of indexed pages, crawl speed, and concurrent queries.

HSearch (Inactive)

NoSQL search engine on Hadoop and HBase. Multi-format indexing, record/document-level access control, continuous updates, parallel indexing, REST/XML gateway, auto sharding and replication.

Lius (Inactive)

LIUS (Lucene Index Update and Search): Java indexing framework on Lucene. Indexes Word, Excel, PowerPoint, RTF, PDF, XML, HTML, TXT, OpenOffice, JavaBeans; XML-based configuration. Development ended; successor: Constellio.

Oxyus (Inactive)

Java application for indexing web documents for intranet or Internet search. Delivers results via a web module and Java Server using a JDBC repository and Java Beans.

Piscator (Inactive)

Small SQL/XML search engine. Loads an XML feed and exposes it for querying via plain SQL, similar to DB2 side tables.

regain (Inactive)

Fast search engine on top of Apache Lucene. Crawls files and web pages via pluggable preparators for multiple formats. Offers a desktop search with crawler and HTTP server, or a server-side installation for website/intranet full-text search with XML configuration.

BDDBot (Inactive)

Web robot, search engine, and web server in Java. Example from a book on building search engines; small footprint and compact indexes.
