Case-Insensitive Queries in Elasticsearch

Published 2021-07-15 | Updated 2023-06-16 | elasticsearch


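The default analyzer is the Standard analyzer, which lowercases its tokens, so searches on analyzed text fields are case-insensitive.

When such data is indexed, uppercase English characters are converted to lowercase.
A keyword field, however, is matched exactly, so on its own it cannot ignore case.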

normalizer

Official explanation

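First: normalizer is a property of keyword fields. It works much like an analyzer, with one difference: it further processes the single term that a keyword field produces, and always emits a single term.
Second: the normalizer is applied before a keyword value is indexed, and again at search time when the field is queried with a match query or a term-level query such as term.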
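
To make the two points above concrete, here is a minimal Python sketch of the idea. It is a toy model, not Elasticsearch code: `KeywordField` and `lowercase_normalizer` are hypothetical names invented for illustration, and the only assumption is that the normalizer consists of a single `lowercase` filter.

```python
# Toy model of a keyword field, with and without a lowercase normalizer.
# Not Elasticsearch code: the names here are hypothetical.

def lowercase_normalizer(value: str) -> str:
    """Stand-in for a custom normalizer whose only filter is `lowercase`."""
    return value.lower()


class KeywordField:
    def __init__(self, normalizer=None):
        # Without a normalizer the value is indexed exactly as given.
        self.normalizer = normalizer or (lambda v: v)
        self.terms = {}  # indexed term -> set of document ids

    def index(self, doc_id: int, value: str) -> None:
        # Index time: the whole value becomes one (normalized) term.
        term = self.normalizer(value)
        self.terms.setdefault(term, set()).add(doc_id)

    def term_query(self, value: str) -> list:
        # Search time: the query value goes through the same normalizer.
        return sorted(self.terms.get(self.normalizer(value), set()))


plain = KeywordField()
normalized = KeywordField(lowercase_normalizer)
for doc_id, city in enumerate(["New York", "new York", "NEW YORK"], start=1):
    plain.index(doc_id, city)
    normalized.index(doc_id, city)

print(plain.term_query("new york"))       # [] (exact match only)
print(normalized.term_query("NEW york"))  # [1, 2, 3]
```

Because the same function runs on both the indexed value and the query value, any casing of the query lands on the same term.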

Usage

PUT index
{
  "settings": {
    "analysis": {
      "normalizer": {
        "my_normalizer": {
          "type": "custom",
          "char_filter": [],
          "filter": ["lowercase"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "city": {
        "type": "keyword",
        "normalizer": "my_normalizer"
      }
    }
  }
}
------------------------------------------------------------------------------------------------------------------------------
POST index/_bulk
{"index":{"_id":1}}
{ "city": "New York"}
{"index":{"_id":2}}
{ "city": "new York"}
{"index":{"_id":3}}
{ "city": "New york"}
{"index":{"_id":4}}
{ "city": "NEW YORK"}
{"index":{"_id":5}}
{ "city": "Seattle"}
------------------------------------------------------------------------------------------------------------------------------
GET index/_search
{
  "size": 0,
  "aggs": {
    "cities": {
      "terms": {
        "field": "city"
      }
    }
  }
}

```
"aggregations" : {
    "cities" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "new york",
          "doc_count" : 4
        },
        {
          "key" : "seattle",
          "doc_count" : 1
        }
      ]
    }
  }
```
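
The buckets in this response can be reproduced with a few lines of Python. This is only a sketch of why the counts collapse, not how Elasticsearch computes them: `lowercase_normalizer` is a hypothetical stand-in for `my_normalizer`, and the list of cities is copied from the bulk request.

```python
from collections import Counter

# Values of the `city` field from the bulk request.
cities = ["New York", "new York", "New york", "NEW YORK", "Seattle"]


def lowercase_normalizer(value: str) -> str:
    """Hypothetical stand-in for `my_normalizer` (a single `lowercase` filter)."""
    return value.lower()


# A terms aggregation counts documents per indexed term. With the
# normalizer in place, the indexed term is the lowercased value.
counts = Counter(lowercase_normalizer(city) for city in cities)
buckets = [
    {"key": key, "doc_count": count}
    for key, count in counts.most_common()  # highest count first
]
print(buckets)
```

All four spellings of "New York" are stored as the single term `new york`, which is why the aggregation reports one bucket with a count of 4.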

With this in place, exact-match queries, fuzzy queries, and aggregations on the field are all case-insensitive.

Author: 小五

Article link: https://xiaowu95.wang/posts/2021b509/

Copyright notice: Unless otherwise stated, all articles on this blog are licensed under CC BY-NC-SA 4.0. Please credit 小五的个人杂货铺 when reposting.
