Friday, May 26, 2017

Elasticsearch distinct values and search

In Elasticsearch, you can get the distinct values of a field with a terms aggregation.

For example, suppose the index holds these documents:
{
    "sourceTitle": "Arrival",
    "otherFields": ...
}
{
    "sourceTitle": "Arrival",
    "otherFields": ...
}
{
    "sourceTitle": "Eye in the Sky",
    "otherFields": ...
}
We want to get this result:
    Arrival
    Eye in the Sky
Here is the query that returns the distinct values:
{
  "size": 0,
  "aggs": {
    "sourceTitle": {
      "terms": {
        "field": "sourceTitle",
        "size": 10
      }
    }
  }
}
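
As a rough sketch of what comes back: assuming sourceTitle is indexed so that each whole title is a single term (a "keyword" field) and the index holds only the three sample documents, the aggregation part of the response should look roughly like the fragment below. The rest of the response is left out here, and the exact set of extra fields can differ between Elasticsearch versions.

```json
{
  "aggregations": {
    "sourceTitle": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        { "key": "Arrival", "doc_count": 2 },
        { "key": "Eye in the Sky", "doc_count": 1 }
      ]
    }
  }
}
```

Each bucket key is one distinct value, and doc_count is the number of documents that carry it.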

By default, however, Elasticsearch analyzes (tokenizes) a text field when it indexes the data, so an aggregation on that field would see individual words instead of whole titles. To avoid that, we need to make the field type "keyword". Then another problem comes up: a search on a "keyword" field has to match the whole value, so searching for "Eye" on sourceTitle won't return anything. How can we support both getting distinct values and searching by partial text?

We can set the field type to "text" and give it a child field named "keyword" whose type is "keyword":
"sourceTitle": {
  "type": "text",
  "fields": {
    "keyword": {
      "type": "keyword"
    }
  }
}

To get the distinct values, we run the aggregation on sourceTitle.keyword, while full-text searches still go against sourceTitle:
{
  "size": 0,
  "aggs": {
    "sourceTitle": {
      "terms": {
        "field": "sourceTitle.keyword",
        "size": 10
      }
    }
  }
}
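
Both halves can also be combined in one request. The following is a minimal sketch, assuming the text/keyword mapping above and the default standard analyzer: the match query searches the analyzed sourceTitle field for "Eye", and the aggregation collects the distinct whole titles among the matching documents from sourceTitle.keyword.

```json
{
  "size": 0,
  "query": {
    "match": {
      "sourceTitle": "Eye"
    }
  },
  "aggs": {
    "sourceTitle": {
      "terms": {
        "field": "sourceTitle.keyword",
        "size": 10
      }
    }
  }
}
```

With the three sample documents this should yield a single bucket for "Eye in the Sky". Here "size": 0 only suppresses the hit list; raise it if the matching documents themselves are needed as well.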

https://discuss.elastic.co/t/distinct-value-and-search/86838/2
