With Elasticsearch, Can I Highlight With Different Html Tags For Different Matched Tokens?
Solution 1:
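The fast vector highlighter (fvh) is a good place to start. It requires term_vector to be set to with_positions_offsets on every field it highlights, which is why the mapping below sets it. I haven't worked with French yet, so don't consider the following authoritative, but based on the built-in french analyzer, we could do something like this: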
PUT multilang_index
{
  "mappings": {
    "properties": {
      "description": {
        "type": "text",
        "term_vector": "with_positions_offsets",
        "fields": {
          "french": {
            "type": "text",
            "analyzer": "french",
            "term_vector": "with_positions_offsets"
          }
        }
      }
    }
  }
}
FYI, the french analyzer can be reimplemented or extended as a custom analyzer; the Elasticsearch language analyzer documentation shows how.
After ingesting the English & French examples:
POST multilang_index/_doc
{
  "description": "the other data day data was walking through some interesting woods and data had an interesting thought about some data"
}

POST multilang_index/_doc
{
  "description": "Les autres éléments ont été changés car on a appliqué un changement à chaque élément"
}
We can query for "interesting data" like so:
POST multilang_index/_search
{
  "query": {
    "simple_query_string": {
      "query": "interesting data",
      "fields": [
        "description"
      ]
    }
  },
  "highlight": {
    "fields": {
      "description": {
        "type": "fvh",
        "pre_tags": ["<font color=\"red\">", "<font color=\"blue\">"],
        "post_tags": ["</font>", "</font>"]
      }
    },
    "number_of_fragments": 0
  }
}
yielding
the other <font color="blue">data</font> day <font color="blue">data</font>
was walking through some <font color="red">interesting</font> woods and
<font color="blue">data</font> had an <font color="red">interesting</font>
thought about some <font color="blue">data</font>
and analogously for "changer élément":
POST multilang_index/_search
{
  "query": {
    "simple_query_string": {
      "query": "changer élément",
      "fields": [
        "description.french"
      ]
    }
  },
  "highlight": {
    "fields": {
      "description.french": {
        "type": "fvh",
        "pre_tags": ["<font color=\"red\">", "<font color=\"blue\">"],
        "post_tags": ["</font>", "</font>"]
      }
    },
    "number_of_fragments": 0
  }
}
yielding
Les autres <font color="blue">éléments</font> ont été
<font color="red">changés</font> car on a appliqué un
<font color="red">changement</font> à chaque <font color="blue">élément</font>
which, to me, looks correctly stemmed.
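The two search requests above differ only in the field and the query string, so the request body could be generated rather than written by hand. The following is a minimal Python sketch of that idea, not an official client API: the function name build_highlight_search and its colors parameter are made up for illustration, and actually sending the body to Elasticsearch is left out.

```python
import json


def build_highlight_search(field, query, colors):
    """Build a _search body that highlights `field` with one <font> tag per color.

    Hypothetical helper for illustration; it only assembles a dict.
    """
    # pre_tags and post_tags are parallel lists: entry i opens/closes tag pair i.
    pre_tags = [f'<font color="{color}">' for color in colors]
    post_tags = ["</font>"] * len(colors)
    return {
        "query": {
            "simple_query_string": {"query": query, "fields": [field]}
        },
        "highlight": {
            "fields": {
                field: {
                    "type": "fvh",
                    "pre_tags": pre_tags,
                    "post_tags": post_tags,
                }
            },
            # 0 asks for the whole field value instead of separate fragments.
            "number_of_fragments": 0,
        },
    }


body = build_highlight_search("description", "interesting data", ["red", "blue"])
print(json.dumps(body, indent=2))
```

Calling it with "description.french" and "changer élément" should give the body of the second request; the dict can then be serialized and posted to multilang_index/_search with whatever HTTP or Elasticsearch client is in use.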
Note that the pre_tags order follows the position of the matching token inside the simple_query_string query, not the order in which the matches appear in the text. When querying for "changer élément", the term éléments is the first match in the description, but what caused it to match is the 2nd query token (élément), hence the blue HTML tag instead of the red.
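To make the tag-order behaviour concrete, here is a toy Python simulation of it. It is only a sketch of the observed results, not how Elasticsearch or Lucene implements highlighting. Everything in it is an assumption made for illustration: the names TOY_STEMS, toy_normalize and toy_highlight, the hand-written stem table standing in for the french analyzer, whitespace-only tokenization (no punctuation handling), and cycling through the tags when there are more query terms than tags.

```python
# Hand-written stand-in for the french analyzer's stemming. This is an
# assumption of the sketch, NOT the analyzer's real output.
TOY_STEMS = {
    "changer": "chang",
    "changés": "chang",
    "changement": "chang",
    "élément": "element",
    "éléments": "element",
}


def toy_normalize(token):
    """Lowercase a token and apply the toy stem table."""
    lowered = token.lower()
    return TOY_STEMS.get(lowered, lowered)


def toy_highlight(text, query, pre_tags, post_tags):
    """Wrap each matching word in the tag pair chosen by query-term position."""
    # Map each normalized query term to its position in the query string.
    term_position = {}
    for position, term in enumerate(query.split()):
        term_position.setdefault(toy_normalize(term), position)

    pieces = []
    for word in text.split():
        position = term_position.get(toy_normalize(word))
        if position is None:
            pieces.append(word)  # not a match, keep the word as-is
        else:
            # Assumption: cycle through the tags if terms outnumber tags.
            pre = pre_tags[position % len(pre_tags)]
            post = post_tags[position % len(post_tags)]
            pieces.append(f"{pre}{word}{post}")
    return " ".join(pieces)


PRE = ['<font color="red">', '<font color="blue">']
POST = ["</font>", "</font>"]

french = toy_highlight(
    "Les autres éléments ont été changés car on a appliqué un changement à chaque élément",
    "changer élément",
    PRE,
    POST,
)
print(french)
```

In this toy version, éléments gets the blue tag even though it is the first match in the text, because its query token (élément) sits at the second position of the query, while changés and changement get the red tag of the first query token.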