When to rebuild an index

The index mappings have changed
The index settings have changed
Data needs to be migrated within a cluster or between clusters

Data preprocessing

Ingest Pipeline

PUT _ingest/pipeline/split_xxx
{
  "processors": [
    {
      "split": {
        "field": "xxx",
        "separator": ","
      }
    },
    {
      "set": {
        "field": "xxx",
        "value": "0"
      }
    }
  ]
}

# reindex
POST _reindex
{
  "source": {
    "index": "index_old"
  },
  "dest": {
    "index": "index_new",
    "pipeline": "split_xxx"
  }
}

Transforming data with a script during reindex

Elasticsearch scripting is very powerful. I won't go into detail here; this is just a simple example.

POST _reindex
{
  "source": {
    "index": "index_old"
  },
  "dest": {
    "index": "index_new"
  },
  "script": {
    "source": "ctx._source.age += 2",
    "lang": "painless"
  }
}

Renaming a field

This also uses a script: rename the name field to newName.

POST _reindex
{
  "source": {
    "index": "index_old"
  },
  "dest": {
    "index": "index_new"
  },
  "script": {
    "source": "ctx._source.newName = ctx._source.remove(\"name\")",
    "lang": "painless"
  }
}
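
To make the effect of the rename script concrete, here is a small Python sketch of what the Painless one-liner does to each document's _source. This is plain Python for illustration, not Elasticsearch code; rename_field is a hypothetical helper. It assumes that removing a missing key yields null (None here), so a document without the old field would end up with the new field set to null.

```python
def rename_field(source: dict, old: str, new: str) -> dict:
    """Return a copy of `source` with the value under `old` moved to `new`."""
    renamed = dict(source)
    # pop() removes the key and hands back its value, like Map.remove() in the script
    renamed[new] = renamed.pop(old, None)
    return renamed


doc = {"name": "alice", "age": 30}
print(rename_field(doc, "name", "newName"))
```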

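Copying only some fields from the source index

Included fields: only the listed fields are copied from the source index to the destination index.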

POST _reindex
{
  "source": {
    "index": "index_old",
    "_source": ["name", "age"]
  },
  "dest": {
    "index": "index_new"
  }
}

Creating only the documents missing from the destination index

POST _reindex
{
  "source": {
    "index": "index_old"
  },
  "dest": {
    "index": "index_new",
    "op_type": "create"
  }
}

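Setting the batch size

Reindex reads the source with scroll under the hood. The default batch size is 1000 documents; it can be raised through source.size.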

POST _reindex
{
  "source": {
    "index": "index_old",
    "size": 5000
  },
  "dest": {
    "index": "index_new"
  }
}
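
As a rough illustration of what the batch size controls, the Python sketch below splits a list of documents into batches the way a scroll with a given size would hand them over. It is a simplified model rather than Elasticsearch code, and batches is a hypothetical helper.

```python
from typing import Iterator, List


def batches(docs: List[dict], size: int = 1000) -> Iterator[List[dict]]:
    """Yield consecutive slices of `docs`, each holding at most `size` items."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(docs), size):
        yield docs[start:start + size]


docs = [{"id": i} for i in range(12000)]
# Fewer, larger batches mean fewer round trips but more memory per batch.
print(len(list(batches(docs))), len(list(batches(docs, size=5000))))
```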

Continuing on conflicts

POST _reindex
{
  "conflicts": "proceed",
  "source": {
    "index": "index_old"
  },
  "dest": {
    "index": "index_new",
    "op_type": "create"
  }
}
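
The interaction between op_type create and conflicts can be sketched with two in-memory dictionaries standing in for the indices. This is a simplified model, not Elasticsearch code: reindex_create is a hypothetical helper, and it assumes that the default conflicts setting stops at the first document that already exists in the destination, whereas a real reindex works batch by batch.

```python
def reindex_create(source: dict, dest: dict, conflicts: str = "abort") -> dict:
    """Copy id -> _source entries into `dest`, creating only missing ids."""
    stats = {"created": 0, "version_conflicts": 0}
    for doc_id, body in source.items():
        if doc_id in dest:
            # op_type=create never overwrites an existing document
            stats["version_conflicts"] += 1
            if conflicts != "proceed":
                raise RuntimeError(f"version conflict on document {doc_id}")
            continue
        dest[doc_id] = dict(body)
        stats["created"] += 1
    return stats


old = {"1": {"name": "a"}, "2": {"name": "b"}}
new = {"1": {"name": "already here"}}
print(reindex_create(old, new, conflicts="proceed"))
```

With conflicts set to proceed the existing document is left untouched and only counted; without it the copy stops at the first conflict.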

Reindexing only the documents that match a query

POST _reindex
{
  "source": {
    "index": "index_old",
    "query": {
      "term": {
        "name.keyword": {
          "value": "冬哥"
        }
      }
    }
  },
  "dest": {
    "index": "index_new"
  }
}

Excluding fields you don't want to copy

POST _reindex
{
  "source": {
    "index": "index_old",
    "_source": {
      "excludes": ["name"]
    }
  },
  "dest": {
    "index": "index_new"
  }
}
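
The include and exclude forms of _source filtering can be modelled on a single document as follows. This is a Python sketch for illustration only: filter_source is a hypothetical helper that handles exact top-level field names, while Elasticsearch itself also accepts wildcards and dotted paths.

```python
from typing import Iterable, Optional


def filter_source(source: dict,
                  includes: Optional[Iterable[str]] = None,
                  excludes: Optional[Iterable[str]] = None) -> dict:
    """Keep the included top-level fields (all if None), then drop the excluded ones."""
    wanted = None if includes is None else set(includes)
    kept = {k: v for k, v in source.items() if wanted is None or k in wanted}
    for field in excludes or ():
        kept.pop(field, None)
    return kept


doc = {"name": "alice", "age": 30, "city": "x"}
print(filter_source(doc, includes=["name", "age"]))
print(filter_source(doc, excludes=["name"]))
```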

Without downtime (alias)

Alias

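An index can have several aliases, and an alias can point to several indices; when a request names an alias, it is automatically expanded to the indices behind it. An alias can also carry a filter, which is then applied automatically to searches, as well as routing values. An alias cannot have the same name as an index.
A single API call can remove an alias and then add it again. The call is atomic, so there is no brief moment in which the alias points to no index at all.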

# Example: adding an alias
PUT /my_index_name/_alias/alias_name

POST /_aliases
{
  "actions": [
    { "add": { "index": "my_index_name_v2", "alias": "alias_name" } }
  ]
}

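First make sure the index to be rebuilt has an alias, and that every consumer reads its data through that alias.
Create a new index and copy the old index's data into it.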

POST _reindex
{
  "source": {
    "index": "old_index"
  },
  "dest": {
    "index": "new_index"
  }
}

Remove the alias from the old index, then add the same alias to the new index.

POST /_aliases
{
  "actions": [
    { "remove": { "index": "my_index_name_v1", "alias": "alias_name" } },
    { "add": { "index": "my_index_name_v2", "alias": "alias_name" } }
  ]
}
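
If the swap is issued from a script rather than typed by hand, the request body can be built programmatically. The Python sketch below only constructs the JSON for POST /_aliases and does not send it; build_alias_swap is a hypothetical helper name.

```python
import json


def build_alias_swap(alias: str, old_index: str, new_index: str) -> dict:
    """Build one _aliases body that moves `alias` from old_index to new_index."""
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


body = build_alias_swap("alias_name", "my_index_name_v1", "my_index_name_v2")
print(json.dumps(body, indent=2))
```

Keeping both actions in one body is what makes the switch atomic; sending them as two separate requests would reopen the gap the alias is meant to avoid.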

With downtime (when no alias was set up)

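1. Create an intermediate index
2. Back up the source index's data into the intermediate index (with its mapping)
3. Query to confirm that the data was copied
4. Delete the faulty index
5. Recreate an index with the same name (★with the field types corrected★)
6. Restore the data from the intermediate index into the source index
7. Delete the intermediate index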
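
The seven steps above can be written down as an ordered list of requests. The Python sketch below only builds that list and sends nothing; rebuild_plan and the Step tuple are hypothetical names, and it assumes the corrected mapping is applied both to the intermediate index and to the recreated one.

```python
from typing import List, Optional, Tuple

Step = Tuple[str, str, Optional[dict]]  # (HTTP method, path, JSON body)


def rebuild_plan(index: str, temp: str, fixed_mappings: dict) -> List[Step]:
    """List the requests for the seven steps, in order."""
    def copy(src: str, dst: str) -> Step:
        body = {"source": {"index": src}, "dest": {"index": dst}}
        return ("POST", "/_reindex", body)

    return [
        ("PUT", f"/{temp}", {"mappings": fixed_mappings}),   # 1. intermediate index
        copy(index, temp),                                   # 2. back up the data
        ("GET", f"/{temp}/_search", None),                   # 3. confirm the copy
        ("DELETE", f"/{index}", None),                       # 4. delete the faulty index
        ("PUT", f"/{index}", {"mappings": fixed_mappings}),  # 5. recreate, same name
        copy(temp, index),                                   # 6. restore the data
        ("DELETE", f"/{temp}", None),                        # 7. drop the intermediate
    ]


for method, path, _body in rebuild_plan("xxx-20210916", "xxx-reindex", {"properties": {}}):
    print(method, path)
```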

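Create the intermediate index (you can copy the mapping straight from Index Management > Mappings in Kibana; on ES 7 and later, remember to remove the _doc level under mappings, because mapping types are deprecated from 7 onward)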

PUT prod-sopei-reindex
{
  "mappings": {
    "dynamic_templates": [
      {
        "message_field": {
          "path_match": "message",
          "match_mapping_type": "string",
          "mapping": {
            "norms": false,
            "type": "text"
          }
        }
      },
      {
        "string_fields": {
          "match": "*",
          "match_mapping_type": "string",
          "mapping": {
            "fields": {
              "keyword": {
                "ignore_above": 256,
                "type": "keyword"
              }
            },
            "norms": false,
            "type": "text"
          }
        }
      }
    ],
    "properties": {
      "@timestamp": {
        "type": "date"
      },
      "@version": {
        "type": "keyword"
      }
      ...
    }
  }
}

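Copy the source to the target (wait_for_completion=false makes the copy asynchronous)

A running asynchronous reindex can be cancelled through the task management API: POST _tasks/<task_id>/_cancel, using the task ID returned by the reindex request.
Also, I ran this from Kibana against a log index. Kibana's default request timeout is 30 s; it can be raised in kibana.yml with elasticsearch.requestTimeout: 60000 (a stopgap, not a real fix).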

POST _reindex?wait_for_completion=false
{
  "source": {
    "index": "xxx-20210916"
  },
  "dest": {
    "index": "xxx-reindex"
  }
}
# Check the progress of the reindex task
GET _tasks?detailed=true&actions=*reindex
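
The task listing reports counters for each running reindex, and a rough completion ratio can be derived from them. The Python sketch below assumes the task's status object carries total, created, updated and deleted fields; reindex_progress is a hypothetical helper, and it ignores version conflicts and no-ops, so the ratio can stay below 1.0 when documents are skipped.

```python
def reindex_progress(status: dict) -> float:
    """Estimate progress as (created + updated + deleted) / total."""
    total = status.get("total", 0)
    if total <= 0:
        return 0.0
    done = sum(status.get(key, 0) for key in ("created", "updated", "deleted"))
    return done / total


# Hypothetical status values, shaped like the counters in a task listing.
sample = {"total": 200, "created": 50, "updated": 0, "deleted": 0}
print(f"{reindex_progress(sample):.0%}")
```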

Query to confirm that the data was copied

GET xxx-reindex/_search

Delete the old index

DELETE xxx-20210916

Recreate an index with the same name as the old one (★but with the settings/mapping taken from the new index★)

PUT xxx-20210916
{
......
}

Restore the data into the source index

POST _reindex?wait_for_completion=false
{
  "source": {
    "index": "xxx-reindex"
  },
  "dest": {
    "index": "xxx-20210916"
  }
}
# Check the progress of the reindex task
GET _tasks?detailed=true&actions=*reindex

Delete the intermediate index

DELETE xxx-reindex