The highlighting module extracts and formats the portions of a document that match a query, so you can show users exactly why a result was returned. The module contains two distinct APIs: the modern UnifiedHighlighter and the legacy Highlighter.
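Dependency
The highlighters ship in the lucene-highlighter artifact (group org.apache.lucene). Add it to your build at the same version as lucene-core.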
UnifiedHighlighter (recommended)
UnifiedHighlighter is the current, preferred API. It supports multiple offset strategies — postings offsets, term vectors, or re-analysis — and selects the best available strategy automatically per field.
It treats each document as a mini-corpus, scores passages the way Lucene scores documents, and uses a BreakIterator (defaulting to sentence boundaries) to define passage boundaries.
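To get a feel for what sentence-sized passages look like, the sketch below uses the JDK's java.text.BreakIterator on its own, without Lucene. It only shows how text is cut into candidate passages; the class name SentencePassages and the sample text are invented for the example, and none of the passage scoring is reproduced.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class SentencePassages {
  /** Splits text into sentence-sized spans using the JDK sentence BreakIterator. */
  static List<String> split(String text) {
    BreakIterator it = BreakIterator.getSentenceInstance(Locale.ROOT);
    it.setText(text);
    List<String> passages = new ArrayList<>();
    int start = it.first();
    // Walk consecutive boundaries; each [start, end) pair is one candidate passage.
    for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
      passages.add(text.substring(start, end));
    }
    return passages;
  }

  public static void main(String[] args) {
    String text = "Lucene is a search library. It supports highlighting. Try it!";
    for (String passage : split(text)) {
      System.out.println("[" + passage + "]");
    }
  }
}
```

Each span keeps its trailing whitespace, so concatenating the spans gives back the original text.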
How it works
UnifiedHighlighter can retrieve offsets from three sources, chosen in preference order:
- Postings with offsets: index the field with IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS for best performance.
- Term vectors with offsets: enable term vectors on the field's FieldType with setStoreTermVectors(true) and setStoreTermVectorOffsets(true).
- Re-analysis: works on any stored field but is slower, because the text is run through the analyzer again at highlight time.
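The three sources correspond to three ways of configuring a field at index time. The following is a sketch, assuming lucene-core is on the classpath; the class name OffsetFieldTypes and the field names body_postings, body_vectors and body_plain are hypothetical.

```java
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.FieldType;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexOptions;

public class OffsetFieldTypes {
  /** Stored text field whose postings also record character offsets. */
  static FieldType postingsOffsets() {
    FieldType type = new FieldType(TextField.TYPE_STORED);
    type.setIndexOptions(IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS);
    type.freeze();
    return type;
  }

  /** Stored text field that keeps term vectors with positions and offsets. */
  static FieldType termVectorOffsets() {
    FieldType type = new FieldType(TextField.TYPE_STORED);
    type.setStoreTermVectors(true); // required before the two options below
    type.setStoreTermVectorPositions(true);
    type.setStoreTermVectorOffsets(true);
    type.freeze();
    return type;
  }

  /** Builds a document carrying the same text under all three configurations. */
  static Document newDocument(String text) {
    Document doc = new Document();
    doc.add(new Field("body_postings", text, postingsOffsets()));
    doc.add(new Field("body_vectors", text, termVectorOffsets()));
    // Plain stored field: no offsets anywhere, so highlighting falls back to re-analysis.
    doc.add(new TextField("body_plain", text, Field.Store.YES));
    return doc;
  }
}
```

In a real schema you would normally pick one configuration per field rather than index the same text three times.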
Setup
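Build the highlighter with its Builder, which you obtain from UnifiedHighlighter.builder(indexSearcher, indexAnalyzer) and finish with build().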
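A minimal setup sketch, assuming a Lucene release that provides UnifiedHighlighter.builder (9.1 or later, as far as I know) with lucene-core and lucene-highlighter on the classpath. The class name HighlighterSetup and the 50,000-character limit are illustrative choices, and the empty MultiReader merely stands in for a real index reader.

```java
import java.io.IOException;
import java.text.BreakIterator;
import java.util.Locale;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.MultiReader;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.uhighlight.UnifiedHighlighter;

public class HighlighterSetup {
  /** Builds a highlighter that analyzes at most 50,000 characters per field value. */
  static UnifiedHighlighter newHighlighter(IndexSearcher searcher, Analyzer analyzer) {
    return UnifiedHighlighter.builder(searcher, analyzer)
        .withMaxLength(50_000)
        // Sentence boundaries are already the default; shown here to make the choice explicit.
        .withBreakIterator(() -> BreakIterator.getSentenceInstance(Locale.ROOT))
        .build();
  }

  public static void main(String[] args) throws IOException {
    // Placeholder searcher over an empty reader; use your own DirectoryReader in practice.
    IndexSearcher searcher = new IndexSearcher(new MultiReader());
    UnifiedHighlighter highlighter = newHighlighter(searcher, new StandardAnalyzer());
    System.out.println("max length = " + highlighter.getMaxLength());
  }
}
```

The analyzer passed to the builder should be the same one used at index time, since re-analysis has to reproduce the indexed tokens.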
Highlight a single field
highlight() returns one snippet string per document in the TopDocs result, in the same order as topDocs.scoreDocs.
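The sketch below runs the whole round trip on a one-document in-memory index. It assumes the builder API of recent Lucene 9.x releases; the class name SingleFieldHighlight, the field name body and the sample text are invented for the example.

```java
import java.io.IOException;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.search.uhighlight.UnifiedHighlighter;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

public class SingleFieldHighlight {
  static String[] run() throws IOException {
    Analyzer analyzer = new StandardAnalyzer();
    try (Directory dir = new ByteBuffersDirectory()) {
      // Index one stored document; a plain stored TextField is highlighted by re-analysis.
      try (IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(analyzer))) {
        Document doc = new Document();
        doc.add(new TextField("body",
            "Lucene is a search library. It supports highlighting.", Field.Store.YES));
        writer.addDocument(doc);
      }
      try (DirectoryReader reader = DirectoryReader.open(dir)) {
        IndexSearcher searcher = new IndexSearcher(reader);
        Query query = new TermQuery(new Term("body", "highlighting"));
        TopDocs topDocs = searcher.search(query, 10);
        UnifiedHighlighter highlighter = UnifiedHighlighter.builder(searcher, analyzer).build();
        // snippets[i] belongs to topDocs.scoreDocs[i]
        return highlighter.highlight("body", query, topDocs);
      }
    }
  }

  public static void main(String[] args) throws IOException {
    for (String snippet : run()) {
      System.out.println(snippet);
    }
  }
}
```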
Highlight multiple fields at once
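A sketch using highlightFields, which takes an array of field names and returns a map from field name to one snippet per hit. The class name MultiFieldHighlight and the title and body fields are hypothetical. Note that the query has a clause for each field, because by default a field is only highlighted for query terms that target it.

```java
import java.io.IOException;
import java.util.Map;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.search.uhighlight.UnifiedHighlighter;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

public class MultiFieldHighlight {
  static Map<String, String[]> run() throws IOException {
    Analyzer analyzer = new StandardAnalyzer();
    try (Directory dir = new ByteBuffersDirectory()) {
      try (IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(analyzer))) {
        Document doc = new Document();
        doc.add(new TextField("title", "Highlighting in Lucene", Field.Store.YES));
        doc.add(new TextField("body",
            "Lucene is a search library. It supports highlighting.", Field.Store.YES));
        writer.addDocument(doc);
      }
      try (DirectoryReader reader = DirectoryReader.open(dir)) {
        IndexSearcher searcher = new IndexSearcher(reader);
        // One clause per field so that both fields have terms to highlight.
        Query query = new BooleanQuery.Builder()
            .add(new TermQuery(new Term("title", "highlighting")), BooleanClause.Occur.SHOULD)
            .add(new TermQuery(new Term("body", "highlighting")), BooleanClause.Occur.SHOULD)
            .build();
        TopDocs topDocs = searcher.search(query, 10);
        UnifiedHighlighter highlighter = UnifiedHighlighter.builder(searcher, analyzer).build();
        return highlighter.highlightFields(new String[] {"title", "body"}, query, topDocs);
      }
    }
  }

  public static void main(String[] args) throws IOException {
    run().forEach((field, snippets) -> System.out.println(field + ": " + snippets[0]));
  }
}
```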
Controlling the number of passages
Pass maxPassages to highlight() to control how many top-ranked snippets are concatenated into the returned string.
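The sketch below compares different passage limits on a document where two of three sentences match. It assumes the four-argument overload highlight(field, query, topDocs, maxPassages); the class name MaxPassagesDemo, the helper countTags and the sample text are invented for the example.

```java
import java.io.IOException;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.search.uhighlight.UnifiedHighlighter;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

public class MaxPassagesDemo {
  /** Returns the snippet for the single indexed document, limited to maxPassages passages. */
  static String highlight(int maxPassages) throws IOException {
    Analyzer analyzer = new StandardAnalyzer();
    try (Directory dir = new ByteBuffersDirectory()) {
      try (IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(analyzer))) {
        Document doc = new Document();
        doc.add(new TextField("body",
            "Lucene indexes text. Lucene also highlights matches. This sentence is unrelated.",
            Field.Store.YES));
        writer.addDocument(doc);
      }
      try (DirectoryReader reader = DirectoryReader.open(dir)) {
        IndexSearcher searcher = new IndexSearcher(reader);
        Query query = new TermQuery(new Term("body", "lucene"));
        TopDocs topDocs = searcher.search(query, 10);
        UnifiedHighlighter highlighter = UnifiedHighlighter.builder(searcher, analyzer).build();
        return highlighter.highlight("body", query, topDocs, maxPassages)[0];
      }
    }
  }

  /** Counts opening highlight tags in a snippet. */
  static int countTags(String snippet) {
    return snippet.split("<b>", -1).length - 1;
  }

  public static void main(String[] args) throws IOException {
    System.out.println("1 passage:  " + highlight(1));
    System.out.println("2 passages: " + highlight(2));
  }
}
```

Passages are chosen by score but emitted in document order, so a larger limit extends the snippet rather than reordering it.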
PassageFormatter
By default, matching terms are wrapped in <b> tags and passages are separated by " ... ". You can customize this by providing a custom PassageFormatter to the builder.
PassageFormatter receives a Passage[] (each holding start/end offsets and term match positions) and the original field text, and returns a formatted Object (usually a String).
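As an illustration, here is a sketch of a formatter that marks matches with double square brackets and puts each passage on its own line. BracketFormatter and BracketFormatterDemo are hypothetical names; the loop follows the same overlap-tolerant pattern as the stock DefaultPassageFormatter, and the formatter is registered through the builder's withFormatter method.

```java
import java.io.IOException;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.uhighlight.Passage;
import org.apache.lucene.search.uhighlight.PassageFormatter;
import org.apache.lucene.search.uhighlight.UnifiedHighlighter;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

public class BracketFormatterDemo {
  /** Hypothetical formatter: wraps matches in [[ ]] and ends each passage with a newline. */
  static final class BracketFormatter extends PassageFormatter {
    @Override
    public String format(Passage[] passages, String content) {
      StringBuilder sb = new StringBuilder();
      for (Passage passage : passages) {
        int pos = passage.getStartOffset();
        for (int i = 0; i < passage.getNumMatches(); i++) {
          int start = passage.getMatchStarts()[i];
          int end = passage.getMatchEnds()[i];
          if (start > pos) {
            sb.append(content, pos, start); // text before the match
          }
          if (end > pos) { // matches may overlap, so never step backwards
            sb.append("[[").append(content, Math.max(pos, start), end).append("]]");
            pos = end;
          }
        }
        // A match can extend past the passage end; clamp so the range stays valid.
        sb.append(content, pos, Math.max(pos, passage.getEndOffset())).append('\n');
      }
      return sb.toString();
    }
  }

  static String run() throws IOException {
    Analyzer analyzer = new StandardAnalyzer();
    try (Directory dir = new ByteBuffersDirectory()) {
      try (IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(analyzer))) {
        Document doc = new Document();
        doc.add(new TextField("body",
            "Lucene is a search library. It supports highlighting.", Field.Store.YES));
        writer.addDocument(doc);
      }
      try (DirectoryReader reader = DirectoryReader.open(dir)) {
        IndexSearcher searcher = new IndexSearcher(reader);
        Query query = new TermQuery(new Term("body", "highlighting"));
        UnifiedHighlighter highlighter = UnifiedHighlighter.builder(searcher, analyzer)
            .withFormatter(new BracketFormatter())
            .build();
        return highlighter.highlight("body", query, searcher.search(query, 10))[0];
      }
    }
  }

  public static void main(String[] args) throws IOException {
    System.out.print(run());
  }
}
```

If you only need different tags or a different separator, constructing a DefaultPassageFormatter with your own pre-tag, post-tag and ellipsis is usually enough and avoids a custom class.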
Classic Highlighter (legacy)
The original Highlighter class (org.apache.lucene.search.highlight.Highlighter) remains available for backward compatibility. It does not need term vectors: it consumes a TokenStream, usually produced by re-analyzing the text, and operates on a single document string at a time.
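For comparison, a minimal sketch of the classic API. It needs no index at all: the caller hands over the raw text of one document together with an analyzer. The class name ClassicHighlightDemo and the sample text are invented, and lucene-highlighter with its transitive dependencies is assumed to be on the classpath.

```java
import java.io.IOException;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.highlight.Highlighter;
import org.apache.lucene.search.highlight.InvalidTokenOffsetsException;
import org.apache.lucene.search.highlight.QueryScorer;
import org.apache.lucene.search.highlight.SimpleHTMLFormatter;

public class ClassicHighlightDemo {
  /** Returns the best-scoring fragment of text for the query, or null if nothing matches. */
  static String bestFragment(String text) throws IOException, InvalidTokenOffsetsException {
    Analyzer analyzer = new StandardAnalyzer();
    Query query = new TermQuery(new Term("body", "highlighting"));
    Highlighter highlighter =
        new Highlighter(new SimpleHTMLFormatter("<b>", "</b>"), new QueryScorer(query));
    // The text is analyzed on the fly; one call handles exactly one document string.
    return highlighter.getBestFragment(analyzer, "body", text);
  }

  public static void main(String[] args) throws Exception {
    System.out.println(bestFragment("Lucene is a search library. It supports highlighting."));
  }
}
```

Because every call handles a single string, highlighting a page of results means loading each stored field value yourself and looping over the hits.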
Choosing an offset source
| Source | Index option | Performance | Field type |
|---|---|---|---|
| Postings offsets | DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS | Fastest | TextField |
| Term vector offsets | setStoreTermVectorOffsets(true) | Fast, larger index | Any stored |
| Re-analysis | None required | Slowest, no extra index size | Any stored |
UnifiedHighlighter selects the best available source automatically. You can force a specific strategy by subclassing UnifiedHighlighter and overriding its protected getOffsetSource(String field) method.
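A sketch of forcing a strategy, assuming a Lucene 9.x release in which UnifiedHighlighter has a constructor that accepts a Builder (older releases would use the searcher and analyzer constructor instead). The field is indexed with postings offsets, yet the anonymous subclass makes the highlighter re-analyze the stored text. The class name ForcedOffsetSource is hypothetical.

```java
import java.io.IOException;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.FieldType;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexOptions;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.uhighlight.UnifiedHighlighter;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

public class ForcedOffsetSource {
  static String[] run() throws IOException {
    Analyzer analyzer = new StandardAnalyzer();
    FieldType withOffsets = new FieldType(TextField.TYPE_STORED);
    withOffsets.setIndexOptions(IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS);
    withOffsets.freeze();
    try (Directory dir = new ByteBuffersDirectory()) {
      try (IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(analyzer))) {
        Document doc = new Document();
        doc.add(new Field("body",
            "Lucene is a search library. It supports highlighting.", withOffsets));
        writer.addDocument(doc);
      }
      try (DirectoryReader reader = DirectoryReader.open(dir)) {
        IndexSearcher searcher = new IndexSearcher(reader);
        Query query = new TermQuery(new Term("body", "highlighting"));
        // Anonymous subclass: ignore what the index offers and always re-analyze.
        UnifiedHighlighter highlighter =
            new UnifiedHighlighter(UnifiedHighlighter.builder(searcher, analyzer)) {
              @Override
              protected OffsetSource getOffsetSource(String field) {
                return OffsetSource.ANALYSIS;
              }
            };
        return highlighter.highlight("body", query, searcher.search(query, 10));
      }
    }
  }

  public static void main(String[] args) throws IOException {
    System.out.println(run()[0]);
  }
}
```

Forcing a source is mostly useful for testing or for working around a field whose index options do not match what the automatic detection expects; the forced source still has to be available for that field.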