A powerful semantic search plugin for Fess, the open-source enterprise search server. This plugin extends Fess's search capabilities by integrating neural search using OpenSearch's machine learning features and vector similarity search.
- Neural Search Integration: Leverages OpenSearch ML Commons plugin for semantic vector search
- Automatic Query Rewriting: Converts traditional text queries to neural queries when appropriate
- Rank Fusion Processing: Combines traditional and semantic search results for improved relevance
- Content Chunking: Processes long documents in chunks for better semantic matching
- Configurable Models: Supports multiple pre-trained transformer models from HuggingFace
- Seamless Integration: Works as a drop-in plugin for existing Fess installations
- Fess 15.0+ (Full-text Enterprise Search Server)
- OpenSearch 2.x with ML Commons plugin enabled
- Docker and Docker Compose (recommended for setup)
```bash
git clone https://github.com/codelibs/docker-fess.git
cd docker-fess/compose
```
Add the following line to your `compose.yaml`:
```yaml
environment:
  - "FESS_PLUGINS=fess-webapp-semantic-search:15.1.0"
```
Then start Fess and OpenSearch:
```bash
docker compose -f compose.yaml -f compose-opensearch2.yaml up -d
```
Download and run the setup script:
```bash
curl -o setup.sh https://raw.githubusercontent.com/codelibs/fess-webapp-semantic-search/main/tools/setup.sh
chmod +x setup.sh
./setup.sh localhost:9200
```
The setup script will (a sketch of the equivalent OpenSearch calls follows the list):
- Display available pre-trained models
- Register your selected model in OpenSearch
- Create the neural search pipeline
- Provide the configuration settings
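If you prefer to perform these steps manually, they correspond roughly to the standard ML Commons and ingest pipeline APIs. The calls below are an illustrative sketch rather than the script's exact commands; the model name/version and the `neural_pipeline`, `content`, and `content_vector` names are assumptions that match the configuration in the next step.

```bash
# 1. Register a pre-trained sentence-transformers model with ML Commons
#    (asynchronous: the response contains a task ID to poll for the model ID).
curl -s -X POST "http://localhost:9200/_plugins/_ml/models/_register" \
  -H 'Content-Type: application/json' -d '{
  "name": "huggingface/sentence-transformers/all-MiniLM-L6-v2",
  "version": "1.0.1",
  "model_format": "TORCH_SCRIPT"
}'

# 2. Deploy the registered model (substitute the model ID from step 1).
curl -s -X POST "http://localhost:9200/_plugins/_ml/models/<your-model-id>/_deploy"

# 3. Create an ingest pipeline that writes embeddings of the content field
#    into the vector field used by the plugin.
curl -s -X PUT "http://localhost:9200/_ingest/pipeline/neural_pipeline" \
  -H 'Content-Type: application/json' -d '{
  "description": "Neural search pipeline for Fess",
  "processors": [{
    "text_embedding": {
      "model_id": "<your-model-id>",
      "field_map": { "content": "content_vector" }
    }
  }]
}'
```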
In Fess Admin Panel (Admin > General > System Properties), add the configuration provided by the setup script:
```properties
fess.semantic_search.pipeline=neural_pipeline
fess.semantic_search.content.field=content_vector
fess.semantic_search.content.dimension=384
fess.semantic_search.content.method=hnsw
fess.semantic_search.content.engine=lucene
fess.semantic_search.content.space_type=cosinesimil
fess.semantic_search.content.model_id=<your-model-id>
```
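The model ID must point to a model that is registered and deployed in OpenSearch. If you are unsure which ID the setup script created, you can query ML Commons directly (a sketch, assuming OpenSearch on `localhost:9200`):

```bash
# Check a specific model (the response should show "model_state": "DEPLOYED").
curl -s "http://localhost:9200/_plugins/_ml/models/<your-model-id>"

# Or list registered models to find the ID.
curl -s -X POST "http://localhost:9200/_plugins/_ml/models/_search" \
  -H 'Content-Type: application/json' -d '{ "query": { "match_all": {} }, "size": 20 }'
```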
For better performance, you can add these optional parameters:
```properties
# HNSW search-time parameter (higher = better recall, slower search)
fess.semantic_search.content.param.ef_search=100

# Enable performance monitoring for debugging
fess.semantic_search.performance.monitoring.enabled=true

# Enable batch inference (requires compatible ML model setup)
fess.semantic_search.batch_inference.enabled=true
```
To improve result diversity using Maximal Marginal Relevance (MMR):
```properties
# Enable MMR
fess.semantic_search.mmr.enabled=true

# Lambda: 1.0 = only relevance, 0.0 = only diversity, 0.5 = balanced
fess.semantic_search.mmr.lambda=0.7
```
Once the configuration is in place:

- Go to Admin > Maintenance and start reindexing
- Create your crawling configuration
- Start the crawler
- Begin semantic searching!
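To confirm that embeddings are being indexed and queried end to end, you can run a neural query directly against OpenSearch. This is a diagnostic sketch only; it assumes the default `fess.search` index alias, the `content_vector` field, and the model ID configured above.

```bash
# Diagnostic: k-NN neural query against the Fess search index.
curl -s -X POST "http://localhost:9200/fess.search/_search" \
  -H 'Content-Type: application/json' -d '{
  "size": 5,
  "_source": ["title", "url"],
  "query": {
    "neural": {
      "content_vector": {
        "query_text": "how to reset a password",
        "model_id": "<your-model-id>",
        "k": 10
      }
    }
  }
}'
```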
The plugin supports various pre-trained transformer models:
| Model | Dimension | Description |
|---|---|---|
| all-MiniLM-L6-v2 | 384 | Fast and efficient, good for general use |
| all-mpnet-base-v2 | 768 | Higher quality, slower performance |
| all-distilroberta-v1 | 768 | RoBERTa-based, good performance |
| msmarco-distilbert-base-tas-b | 768 | Optimized for passage retrieval |
| multi-qa-MiniLM-L6-cos-v1 | 384 | Specialized for question answering |
| paraphrase-multilingual-MiniLM-L12-v2 | 384 | Multilingual support |
| Property | Description | Default |
|---|---|---|
| `fess.semantic_search.pipeline` | Neural search pipeline name | - |
| `fess.semantic_search.content.model_id` | ML model ID in OpenSearch | - |
| `fess.semantic_search.content.field` | Vector field name | - |
| `fess.semantic_search.content.dimension` | Vector dimension size | - |
| Property | Description | Default |
|---|---|---|
| `fess.semantic_search.content.method` | Vector search method | hnsw |
| `fess.semantic_search.content.engine` | Vector search engine | lucene |
| `fess.semantic_search.content.space_type` | Distance calculation method | cosinesimil |
| `fess.semantic_search.min_score` | Minimum similarity score | - |
| `fess.semantic_search.min_content_length` | Minimum content length for processing | - |
| `fess.semantic_search.content.chunk_size` | Number of chunks to return | 1 |
| Property | Description | Default |
|---|---|---|
| `fess.semantic_search.content.param.m` | HNSW M parameter (higher = better recall, more memory) | 16 |
| `fess.semantic_search.content.param.ef_construction` | HNSW ef_construction parameter (higher = better quality, slower indexing) | 100 |
| `fess.semantic_search.content.param.ef_search` | HNSW ef_search parameter (higher = better recall, slower search) | Not set (OpenSearch default) |
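These properties correspond to the `knn_vector` field definition in the OpenSearch index mapping. The request below is only an illustrative sketch of that mapping format with the defaults listed above; the index name `my-test-index` is a placeholder, and the plugin generates the actual mapping for the Fess index:

```bash
# Illustrative only: a knn_vector mapping with the defaults above
# (HNSW via the Lucene engine, cosine similarity, dimension 384, m=16, ef_construction=100).
curl -s -X PUT "http://localhost:9200/my-test-index" \
  -H 'Content-Type: application/json' -d '{
  "settings": { "index.knn": true },
  "mappings": {
    "properties": {
      "content_vector": {
        "type": "knn_vector",
        "dimension": 384,
        "method": {
          "name": "hnsw",
          "engine": "lucene",
          "space_type": "cosinesimil",
          "parameters": { "m": 16, "ef_construction": 100 }
        }
      }
    }
  }
}'
```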
| Property | Description | Default |
|---|---|---|
| `fess.semantic_search.performance.monitoring.enabled` | Enable detailed performance logging | false |
| `fess.semantic_search.batch_inference.enabled` | Enable batch inference for better GPU utilization | false |
| Property | Description | Default |
|---|---|---|
| `fess.semantic_search.mmr.enabled` | Enable Maximal Marginal Relevance for diversity | false |
| `fess.semantic_search.mmr.lambda` | MMR lambda (1.0 = relevance, 0.0 = diversity) | 0.5 |
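For reference, the lambda weight follows the standard Maximal Marginal Relevance formulation (the general definition, not plugin-specific notation): each next result $d$ is picked from the remaining candidates $R$ so as to balance relevance to the query $q$ against similarity to the already selected results $S$:

$$\mathrm{MMR} = \arg\max_{d \in R \setminus S}\Big[\lambda \cdot \mathrm{sim}(d, q) \;-\; (1-\lambda)\cdot \max_{d' \in S} \mathrm{sim}(d, d')\Big]$$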
- SemanticSearchHelper: Central component managing neural search configuration and model interactions
- NeuralQueryBuilder: Custom OpenSearch query builder for neural/vector search queries
- SemanticPhraseQueryCommand: Converts phrase queries to neural queries when appropriate
- SemanticTermQueryCommand: Handles term-based semantic search queries
- SemanticSearcher: Extends Fess's DefaultSearcher for rank fusion processing
- Query Processing: Integrates with Fess's QueryParser to rewrite queries for semantic search
- Document Processing: Adds rewrite rules for OpenSearch mapping and settings to support vector fields
- Rank Fusion: Registers as a searcher in Fess's rank fusion processor
- DI Container: Uses LastaDi for dependency injection
To build from source:
```bash
git clone https://github.com/codelibs/fess-webapp-semantic-search.git
cd fess-webapp-semantic-search
mvn clean package
```
Run the tests:
```bash
mvn test
```
Generate Javadoc:
```bash
mvn clean compile javadoc:javadoc
```
The plugin is available from Maven Central:
```xml
<dependency>
    <groupId>org.codelibs.fess</groupId>
    <artifactId>fess-webapp-semantic-search</artifactId>
    <version>15.1.0</version>
</dependency>
```
Alternatively, install the plugin manually (see the example after these steps):

- Download the JAR from Maven Repository
- Place it in your Fess webapp/WEB-INF/lib/ directory
- Restart Fess
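A minimal sketch of the manual steps, using the Maven Central coordinates shown above; the Fess installation path is a placeholder you should adjust:

```bash
# Fetch the plugin JAR from Maven Central and copy it into the Fess webapp
# (adjust VERSION and the Fess installation path for your environment).
VERSION=15.1.0
curl -fLO "https://repo1.maven.org/maven2/org/codelibs/fess/fess-webapp-semantic-search/${VERSION}/fess-webapp-semantic-search-${VERSION}.jar"
cp "fess-webapp-semantic-search-${VERSION}.jar" /path/to/fess/webapp/WEB-INF/lib/
# Restart Fess afterwards so the plugin is loaded.
```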
See the Fess Plugin Guide for detailed installation instructions.
We welcome contributions!
- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Make your changes
- Add tests for new functionality
- Run the test suite (`mvn test`)
- Commit your changes (`git commit -m 'Add some amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
This project uses:
- Maven for build management
- JUnit for testing
- CheckStyle for code formatting
- JavaDoc for documentation
This project is licensed under the Apache License 2.0 - see the LICENSE file for details.
This plugin is optimized for OpenSearch 3.3 with significant performance improvements and new features:
- Concurrent Segment Search: Enabled by default, up to 2.5x faster k-NN queries
- Improved HNSW: Default `space_type` changed to `cosinesimil` for better semantic search accuracy
- Performance Monitoring: Optional detailed query performance tracking
- Advanced Tuning: Fine-grained control over HNSW parameters, including `ef_search`
If upgrading from v15.2.x or earlier:
- The default `space_type` has changed from `l2` to `cosinesimil`
- To maintain compatibility with existing indices, explicitly set `fess.semantic_search.content.space_type=l2`
- For new deployments, the new default `cosinesimil` is recommended
| Plugin Version | Fess Version | OpenSearch Version |
|---|---|---|
| 15.3.x | 15.3+ | 3.3.x (recommended) |
| 15.0.x | 15.0+ | 2.x |
| 14.9.x | 14.9+ | 2.x |
- Documentation: Fess Documentation
- Issues: GitHub Issues
- Discussions: GitHub Discussions
- Community: Fess Community
- CodeLibs for developing and maintaining Fess
- HuggingFace for providing pre-trained transformer models
- OpenSearch team for ML Commons plugin
- All contributors who have helped improve this plugin