
With Elasticsearch, Can I Highlight With Different HTML Tags for Different Matched Tokens?

Learning ES at the moment, but I'm very keen to implement this. I know you can highlight different fields with different tags, using the pre_tags and post_tags keys of highlight in

Solution 1:

The fast vector highlighter is a good place to start. I haven't worked with French yet, so don't consider the following authoritative, but based on the built-in french analyzer we could do something like this:

PUT multilang_index
{
  "mappings": {
    "properties": {
      "description": {
        "type": "text",
        "term_vector": "with_positions_offsets",
        "fields": {
          "french": {
            "type": "text",
            "analyzer": "french",
            "term_vector": "with_positions_offsets"
          }
        }
      }
    }
  }
}

FYI the french analyzer could be reimplemented/extended as shown here.

After ingesting the English & French examples:

POST multilang_index/_doc
{
  "description": "the other data day data was walking through some interesting woods and data had an interesting thought about some data"
}

POST multilang_index/_doc
{
  "description": "Les autres éléments ont été changés car on a appliqué un changement à chaque élément"
}

We can query for interesting data like so:

POST multilang_index/_search
{
  "query": {
    "simple_query_string": {
      "query": "interesting data",
      "fields": [
        "description"
      ]
    }
  },
  "highlight": {
    "fields": {
      "description": {
       "type": "fvh",
       "pre_tags": ["<font color=\"red\">", "<font color=\"blue\">"],
       "post_tags": ["</font>", "</font>"]
      }
    },
    "number_of_fragments": 0
  }
}

yielding

the other <font color="blue">data</font> day <font color="blue">data</font>
was walking through some <font color="red">interesting</font> woods and
<font color="blue">data</font> had an <font color="red">interesting</font>
thought about some <font color="blue">data</font>

and analogously for changer élément:

POST multilang_index/_search
{
  "query": {
    "simple_query_string": {
      "query": "changer élément",
      "fields": [
        "description.french"
      ]
    }
  },
  "highlight": {
    "fields": {
      "description.french": {
       "type": "fvh",
       "pre_tags": ["<font color=\"red\">", "<font color=\"blue\">"],
       "post_tags": ["</font>", "</font>"]
      }
    },
    "number_of_fragments": 0
  }
}

yielding

Les autres <font color="blue">éléments</font> ont été
<font color="red">changés</font> car on a appliqué un
<font color="red">changement</font> à chaque <font color="blue">élément</font>

which, to me, looks correctly stemmed.
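
To check at a glance which surface forms ended up under which tag, the highlighted fragment can be post-processed on the client. The following is only a minimal sketch in Python: the helper name group_by_color is hypothetical, and it assumes the fragment uses exactly the font tags configured in pre_tags and post_tags above.

```python
import re
from collections import defaultdict
from typing import Dict, List

# Matches the tag pairs configured in pre_tags/post_tags above (assumption:
# the highlighter returns them verbatim, with a space in `font color`).
TAG_RE = re.compile(r'<font color="([^"]+)">(.*?)</font>')


def group_by_color(fragment: str) -> Dict[str, List[str]]:
    """Group highlighted surface forms by the colour of their wrapping tag."""
    groups = defaultdict(list)
    for color, token in TAG_RE.findall(fragment):
        groups[color].append(token)
    return dict(groups)


# The French fragment from the response above.
fragment = (
    'Les autres <font color="blue">éléments</font> ont été '
    '<font color="red">changés</font> car on a appliqué un '
    '<font color="red">changement</font> à chaque '
    '<font color="blue">élément</font>'
)

print(group_by_color(fragment))
```

For the French fragment this puts changés and changement in the red group and éléments and élément in the blue group, which makes it easy to see that both inflections of each query term were caught.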


Note that the pre_tags are assigned according to the order of the terms in the simple_query_string query, not the order in which the matches appear in the text. When querying for changer élément, the token éléments is the first match in the description, but the query term that caused it to match is the second one (élément), hence the blue HTML tag rather than the red one.
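
If the request body is built in application code, the term-to-tag pairing can be made explicit. This is a rough sketch in Python, not an official client API: build_highlight and build_search are hypothetical helpers, and the pairing of the i-th term with the i-th tag is based only on the two-term behaviour observed above.

```python
from typing import Dict, List


def build_highlight(field: str, terms: List[str], colors: List[str]) -> Dict:
    """Build an fvh highlight clause with one tag pair per query term.

    Assumption (from the two-term examples above): the i-th query term
    is wrapped in the i-th entry of pre_tags.
    """
    if len(colors) < len(terms):
        raise ValueError("need at least one colour per query term")
    used = colors[: len(terms)]
    return {
        "fields": {
            field: {
                "type": "fvh",
                "pre_tags": ['<font color="%s">' % c for c in used],
                "post_tags": ["</font>"] * len(used),
            }
        },
        "number_of_fragments": 0,
    }


def build_search(field: str, terms: List[str], colors: List[str]) -> Dict:
    """Combine the simple_query_string query with the matching highlight clause."""
    return {
        "query": {
            "simple_query_string": {"query": " ".join(terms), "fields": [field]}
        },
        "highlight": build_highlight(field, terms, colors),
    }


body = build_search("description.french", ["changer", "élément"], ["red", "blue"])
```

The resulting body has the same shape as the second search request above and can be sent to multilang_index/_search with whatever HTTP or Elasticsearch client is already in use.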
