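Introduction

Grafana is an open-source application written in Go. It pulls data from sources such as Elasticsearch, Prometheus, Graphite, and InfluxDB and visualizes it in polished charts.

Alerts can be sent by Prometheus Alertmanager, and Grafana supports alerting as well. In Grafana you attach alert rules directly to the data shown in a panel, set thresholds visually, and receive notifications through channels such as DingTalk or email. Most importantly, alert rules are defined visually, evaluated continuously, and trigger notifications when they fire.

Because Grafana's built-in alerting is comparatively weak, most alerts are still handled by Prometheus Alertmanager.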

Installation

See: https://github.com/behappy-project/behappy-docker-application/tree/master/grafana

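Chart configuration

When configuring a time-series chart, the settings that matter most are:

  1. Metrics: the PromQL query. [Note: in a Rancher deployment, text typed into this field may come out garbled; a workaround is to write the query in the Prometheus UI and paste it here.]
  2. Legend: the display text for each series. Variables are substituted with the {{xxx}} syntax, and xxx must be a label that the PromQL query in Metrics actually returns.
  3. Step/interval: the spacing between data points; one sample is taken per interval.
    Number of points on a curve = chart duration / sampling interval. For example, for the last 24 hours at a 5-minute interval, the number of points is 24*60/5 = 288.
    The shorter the interval, the higher the sampling rate, the more data the chart holds, and the smoother the curve. The interval is computed automatically by default and can also be set manually.
  4. Metric time range: the length of the window over which each point is computed.
    Taking QPS as an example, each point on the chart means: the request volume during the n seconds preceding that point in time.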
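The {{xxx}} substitution used by the Legend field can be pictured with a small sketch. This is only an illustration in Python, not Grafana's implementation; the function name render_legend and the sample labels are hypothetical, and rendering a missing label as an empty string is an assumption of this sketch.

```python
import re


def render_legend(template, labels):
    """Replace each {{name}} placeholder with the matching label value."""
    def substitute(match):
        # Assumption of this sketch: a label the series does not carry renders as "".
        return labels.get(match.group(1), "")

    return re.sub(r"\{\{\s*(\w+)\s*\}\}", substitute, template)


# Hypothetical label set of one series returned by the Metrics query.
series_labels = {"method": "GET", "path": "/orders", "code": "200"}
print(render_legend("{{method}}-{{path}}", series_labels))
```

The sketch makes the constraint from item 2 visible: a placeholder only produces text when the query result carries a label of that name.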

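As the figure above shows:

  • If sampling interval > time range: the sampling rate is below 100%. Data that falls outside the sampled windows is dropped and never shown on the chart. If the sampling rate is too low, abnormal values may be missed.
  • If sampling interval == time range: the sampling rate is 100%.
  • If sampling interval < time range: data is counted more than once, which adds little value.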
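The arithmetic behind the point count and the three sampling cases can be written down as a short Python sketch. The function names point_count and coverage are hypothetical, and "coverage" here is a simplified model that just compares the window length with the step.

```python
def point_count(chart_seconds, step_seconds):
    """Number of points on one curve = chart duration / sampling interval."""
    return chart_seconds // step_seconds


def coverage(step_seconds, range_seconds):
    """Fraction of the timeline that falls inside at least one point's window."""
    return min(1.0, range_seconds / step_seconds)


print(point_count(24 * 3600, 5 * 60))  # 24 hours at a 5-minute step
print(coverage(300, 60))   # step > range: part of the data is never looked at
print(coverage(60, 60))    # step == range: every sample is used exactly once
print(coverage(60, 300))   # step < range: full coverage, windows overlap
```

In the last case the windows overlap, so each raw sample contributes to several neighbouring points.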

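Custom variables

Grafana provides variables to support common filtering scenarios.

Introduction to variables

Open the dashboard's Settings and choose the Variables submenu:

Each variable needs a name; afterwards it can be referenced as $variable_name. Grafana currently supports six variable types, of which the following five are the ones mainly used with Prometheus: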

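Type          How it works
Query         Generates the variable's options dynamically from the result of a data source query expression
Interval      Represents a time span; an Interval variable lets you change the range in a PromQL range-vector expression dynamically, e.g. the 2m in rate(node_cpu[2m])
Datasource    Lets the user switch the dashboard's data source dynamically; especially useful when one dashboard displays data from several data sources
Custom        The user defines the variable's options by hand
Constant      A constant; the user is asked to set its value when the dashboard is imported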

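Besides filtering time series by label in PromQL, Grafana provides several helper functions for variable queries:

Function                       What it returns
label_values(label)            All values of the label named label across every metric in Prometheus
label_values(metric, label)    All values of the label named label on the series of the metric named metric
metrics(metric)                The names of all metrics whose name matches the regular expression metric
query_result(query)            The result of the Prometheus query

For example, label_values(node_uname_info{}, job) returns the job names.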
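To make the semantics of label_values concrete, here is a toy Python model that works on an in-memory list of label sets. It is only a sketch of the behaviour described above, not how Grafana queries Prometheus; the sample series and their values are invented for illustration.

```python
from typing import Dict, List, Optional

# One time series is modelled by its label set; "__name__" holds the metric name.
Series = Dict[str, str]


def label_values(series: List[Series], label: str, metric: Optional[str] = None) -> List[str]:
    """Distinct values of `label`, optionally restricted to one metric."""
    values = {
        s[label]
        for s in series
        if label in s and (metric is None or s.get("__name__") == metric)
    }
    return sorted(values)


# Hypothetical series, for illustration only.
SERIES = [
    {"__name__": "node_uname_info", "job": "node-exporter", "instance": "10.0.0.1:9100"},
    {"__name__": "node_uname_info", "job": "node-exporter", "instance": "10.0.0.2:9100"},
    {"__name__": "up", "job": "prometheus", "instance": "10.0.0.3:9090"},
]

print(label_values(SERIES, "job"))
print(label_values(SERIES, "job", metric="node_uname_info"))
```

The one-argument form looks at every series, while the two-argument form only considers series of the named metric, which is why the second call returns fewer values.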

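Variable configuration

Setting up a variable

Note that with the variable configuration above, label_values(instance) returns the addresses of all instances. Usually, however, we want to separate things by application. Suppose every reported metric carries an application label and I only want to see data for the prometheus-example application.

The variable can then be configured as follows:

label_values(http_server_requests_seconds_count{application="prometheus-example"}, instance)

Note that http_server_requests_seconds_count is the name of a reported metric; you must pick a metric that actually exists. Next, configure the dashboard.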

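Include All option

That completes a basic variable setup, but one question remains: what if I want to see the monitoring data for every machine of this application?

To support selecting All, the Metrics expression can no longer use the exact match from before; it has to use a regex match instead:

(rate(http_server_requests_seconds_count{instance=~"$ip"}[1m]))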
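The reason a regex matcher is needed becomes clearer when you look at what a multi-value selection turns into. For a Prometheus data source, Grafana formats several selected values as a regex alternation such as (a|b). The Python sketch below imitates that idea. It is an approximation under stated assumptions, not Grafana's interpolation code: the function name expand_variable and the IP addresses are hypothetical, and the escaping rules are simplified.

```python
import re
from typing import List


def expand_variable(expr: str, name: str, selected: List[str]) -> str:
    """Substitute $name in a PromQL expression with a regex built from the selection."""
    # Escape regex metacharacters, then double the backslashes so the result
    # stays valid inside a double-quoted PromQL string.
    escaped = [re.escape(v).replace("\\", "\\\\") for v in selected]
    value = escaped[0] if len(escaped) == 1 else "(" + "|".join(escaped) + ")"
    # Naive textual replacement; good enough for a sketch with one variable.
    return expr.replace("$" + name, value)


query = 'rate(http_server_requests_seconds_count{instance=~"$ip"}[1m])'
print(expand_variable(query, "ip", ["10.0.0.1"]))
print(expand_variable(query, "ip", ["10.0.0.1", "10.0.0.2"]))
```

With an exact matcher (=) the alternation would be compared literally and match nothing, which is why the expression uses =~.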

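Summary

Grafana variables provide conditional filtering on a dashboard. The key points of the variable configuration are:

# Suppose the label whose values I want is "instance"
# Get all values of instance
label_values(instance)
# Get the instance values that satisfy a condition [the instance values of the http_requests_total metric where service=$service]
# Here $service is another filter variable, so the instance values change dynamically as that variable changes. It can also be hard-coded.
label_values(http_requests_total{service='$service'}, instance)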

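Second, in the dashboard's Metrics configuration, supporting Include All comes down to which PromQL label matcher is used:

  • = : selects labels that are exactly equal to the provided string.
  • != : selects labels that are not equal to the provided string.
  • =~ : selects labels that regex-match the provided string. The match is fully anchored, so the pattern must match the whole label value; add .* on either side to match a substring.
  • !~ : selects labels that do not regex-match the provided string (also fully anchored).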
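The four matchers can be modelled in a few lines of Python. This is a sketch of their semantics only: Prometheus uses RE2 regular expressions, whereas Python's re module is used here, so exotic patterns may behave differently. The function name matches is hypothetical.

```python
import re


def matches(op: str, label_value: str, pattern: str) -> bool:
    """Evaluate one PromQL-style label matcher against a label value."""
    if op == "=":
        return label_value == pattern
    if op == "!=":
        return label_value != pattern
    if op == "=~":
        # fullmatch models the fully anchored regex match.
        return re.fullmatch(pattern, label_value) is not None
    if op == "!~":
        return re.fullmatch(pattern, label_value) is None
    raise ValueError(f"unknown matcher: {op}")


print(matches("=~", "api-open-service", "api.*"))   # whole value matches
print(matches("=~", "api-open-service", "open"))    # substring alone does not
print(matches("=~", "api-open-service", ".*open.*"))
```

The second call shows the anchoring: a pattern that only covers part of the value does not match unless it is wrapped in .* on both sides.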

Run Grafana behind a reverse proxy

References:

  1. https://grafana.com/tutorials/run-grafana-behind-a-proxy/
  2. https://grafana.com/docs/grafana/latest/setup-grafana/configure-grafana/
  3. https://github.com/grafana/helm-charts/blob/main/charts/grafana/README.md
  4. https://github.com/grafana/grafana/blob/main/conf/defaults.ini

Root path

Only the nginx configuration file needs to change.

# This is required to proxy Grafana Live WebSocket connections.
map $http_upgrade $connection_upgrade {
  default upgrade;
  '' close;
}

server {
  listen 80;
  root /usr/share/nginx/html;
  index index.html index.htm;

  location / {
    proxy_pass http://localhost:3000/;
  }

  # Proxy Grafana Live WebSocket connections.
  location /api/live {
    proxy_http_version 1.1;
    proxy_set_header Upgrade $http_upgrade;
    # Use the value computed by the map above instead of a fixed "Upgrade".
    proxy_set_header Connection $connection_upgrade;
    proxy_set_header Host $http_host;
    # No URI part here, so the original /api/live/... path is forwarded unchanged.
    proxy_pass http://localhost:3000;
  }
}

Sub-path

Besides the nginx configuration file, the Grafana configuration file must also be changed.

# This is required to proxy Grafana Live WebSocket connections.
map $http_upgrade $connection_upgrade {
  default upgrade;
  '' close;
}

server {
  listen 80;
  root /usr/share/nginx/www;
  index index.html index.htm;

  location ^~ /grafana/ {
    proxy_pass http://localhost:3000/;
  }

  # Proxy Grafana Live WebSocket connections.
  location ^~ /grafana/api/live {
    proxy_http_version 1.1;
    proxy_set_header Upgrade $http_upgrade;
    # Use the value computed by the map above instead of a fixed "Upgrade".
    proxy_set_header Connection $connection_upgrade;
    proxy_set_header Host $http_host;
    # Map /grafana/api/live/... to /api/live/..., consistent with the prefix stripping above.
    proxy_pass http://localhost:3000/api/live;
  }
}

Edit the grafana.ini configuration file

# Edit the file, e.g.: vim /etc/grafana/grafana.ini
# Both keys belong to the [server] section.
[server]
root_url = %(protocol)s://%(domain)s:%(http_port)s/grafana
serve_from_sub_path = true

# Then restart Grafana: systemctl restart grafana-server
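The %(name)s placeholders in root_url refer to other keys of the same configuration section. Python's % formatting happens to use the same placeholder syntax, so the expansion can be previewed with a short sketch. The values below are Grafana's defaults for protocol, domain, and http_port; the function name expand_root_url is hypothetical, and behind a real proxy domain would normally be set to the public host name.

```python
def expand_root_url(template: str, settings: dict) -> str:
    """Fill the %(key)s placeholders of root_url from the other server settings."""
    return template % settings


# Default values of the [server] section, used here for illustration.
server_settings = {"protocol": "http", "domain": "localhost", "http_port": 3000}
template = "%(protocol)s://%(domain)s:%(http_port)s/grafana"

print(expand_root_url(template, server_settings))
```

The trailing /grafana is the sub-path; it has to agree with the location prefix used in the nginx configuration.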

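Dashboard

Dashboard marketplace

Go to Grafana Labs - Dashboards and search by keyword to find the dashboard you need.

These ready-made dashboards are also a quick way to learn how panels are configured and how dashboards are used.

Importing a dashboard

Two dashboards that work well are recommended here. To import one:

  • Click the Import button:
  • Enter the dashboard ID, finish the configuration, and click Import:
  • The result looks like this:

Attached is a custom dashboard JSON model, by sopei

The queries in the dashboard below use windows as short as 1m, so the scrape interval configured on the ServiceMonitor/PodMonitor must be shorter than 1m. rate() needs at least two samples inside the window, so 30s or less is a safe choice.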
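As a rough check of the scrape-interval requirement, the Python sketch below tests whether a query window is guaranteed to contain enough samples. The function name and the 2-sample minimum are assumptions of this sketch (rate-style functions need at least two samples to compute a slope); timing jitter is ignored.

```python
def window_has_enough_samples(window_seconds: int, scrape_interval_seconds: int,
                              min_samples: int = 2) -> bool:
    """True if every window of this length is guaranteed to hold min_samples samples."""
    # A window of length W scraped every S seconds always contains
    # at least floor(W / S) samples.
    return window_seconds // scrape_interval_seconds >= min_samples


print(window_has_enough_samples(60, 30))  # 1m window, 30s scrape interval
print(window_has_enough_samples(60, 60))  # 1m window, 1m scrape interval
print(window_has_enough_samples(300, 60)) # 5m window, 1m scrape interval
```

A 1m window with a 1m scrape interval may hold only a single sample, in which case the panel would show no data.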

{
"annotations": {
"list": [
{
"builtIn": 1,
"datasource": "-- Grafana --",
"enable": true,
"hide": true,
"iconColor": "rgba(0, 211, 255, 1)",
"name": "Annotations & Alerts",
"type": "dashboard"
}
]
},
"editable": true,
"gnetId": null,
"graphTooltip": 0,
"id": 43,
"iteration": 1679017320555,
"links": [],
"panels": [
{
"aliasColors": {},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": null,
"fieldConfig": {
"defaults": {},
"overrides": []
},
"fill": 1,
"fillGradient": 0,
"gridPos": {
"h": 10,
"w": 24,
"x": 0,
"y": 0
},
"hiddenSeries": false,
"id": 12,
"legend": {
"alignAsTable": false,
"avg": false,
"current": false,
"max": false,
"min": false,
"rightSide": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"nullPointMode": "null",
"options": {
"alertThreshold": true
},
"percentage": false,
"pluginVersion": "7.5.11",
"pointradius": 2,
"points": false,
"renderer": "flot",
"seriesOverrides": [],
"spaceLength": 10,
"stack": false,
"steppedLine": false,
"targets": [
{
"exemplar": true,
"expr": "sum(rate(http_requests_total{service=~\"^app.*|^api.*|^bff.*\"}[1m]))",
"instant": false,
"interval": "",
"legendFormat": "",
"refId": "A"
}
],
"thresholds": [],
"timeFrom": null,
"timeRegions": [],
"timeShift": null,
"title": "Overall system QPS, last 1 minute",
"tooltip": {
"shared": true,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": []
},
"yaxes": [
{
"$$hashKey": "object:1763",
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
},
{
"$$hashKey": "object:1764",
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
}
],
"yaxis": {
"align": false,
"alignLevel": null
}
},
{
"aliasColors": {},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": null,
"fieldConfig": {
"defaults": {},
"overrides": []
},
"fill": 1,
"fillGradient": 0,
"gridPos": {
"h": 8,
"w": 12,
"x": 0,
"y": 10
},
"hiddenSeries": false,
"id": 11,
"legend": {
"alignAsTable": false,
"avg": false,
"current": false,
"max": false,
"min": false,
"rightSide": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"nullPointMode": "null",
"options": {
"alertThreshold": true
},
"percentage": false,
"pluginVersion": "7.5.11",
"pointradius": 2,
"points": false,
"renderer": "flot",
"seriesOverrides": [],
"spaceLength": 10,
"stack": false,
"steppedLine": false,
"targets": [
{
"exemplar": true,
"expr": "sum(rate(http_requests_total{service=\"$service\"}[1m]))by(pod)",
"instant": false,
"interval": "",
"legendFormat": "",
"refId": "A"
}
],
"thresholds": [],
"timeFrom": null,
"timeRegions": [],
"timeShift": null,
"title": "Overall QPS of service $service, last 1 minute",
"tooltip": {
"shared": true,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": []
},
"yaxes": [
{
"$$hashKey": "object:1763",
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
},
{
"$$hashKey": "object:1764",
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
}
],
"yaxis": {
"align": false,
"alignLevel": null
}
},
{
"aliasColors": {},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": null,
"fieldConfig": {
"defaults": {},
"overrides": []
},
"fill": 1,
"fillGradient": 0,
"gridPos": {
"h": 8,
"w": 12,
"x": 12,
"y": 10
},
"hiddenSeries": false,
"id": 8,
"legend": {
"alignAsTable": false,
"avg": false,
"current": false,
"max": false,
"min": false,
"rightSide": true,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"nullPointMode": "null",
"options": {
"alertThreshold": true
},
"percentage": false,
"pluginVersion": "7.5.11",
"pointradius": 2,
"points": false,
"renderer": "flot",
"seriesOverrides": [],
"spaceLength": 10,
"stack": false,
"steppedLine": false,
"targets": [
{
"exemplar": true,
"expr": "sum(rate(http_requests_total{service=\"$service\",path=~\".*\"}[1m])) by (method,path)",
"interval": "",
"legendFormat": "{{method}}-{{path}}",
"refId": "A"
}
],
"thresholds": [],
"timeFrom": null,
"timeRegions": [],
"timeShift": null,
"title": "Average QPS over the last 1 minute, grouped by route",
"tooltip": {
"shared": true,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": []
},
"yaxes": [
{
"$$hashKey": "object:1763",
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
},
{
"$$hashKey": "object:1764",
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
}
],
"yaxis": {
"align": false,
"alignLevel": null
}
},
{
"datasource": null,
"fieldConfig": {
"defaults": {
"color": {
"mode": "thresholds"
},
"mappings": [],
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
},
{
"color": "red",
"value": 80
}
]
}
},
"overrides": []
},
"gridPos": {
"h": 8,
"w": 12,
"x": 0,
"y": 18
},
"id": 2,
"options": {
"displayMode": "lcd",
"orientation": "horizontal",
"reduceOptions": {
"calcs": [
"lastNotNull"
],
"fields": "",
"values": false
},
"showUnfilled": true,
"text": {}
},
"pluginVersion": "7.5.11",
"targets": [
{
"exemplar": true,
"expr": "sum(irate(http_requests_total{code!=\"0\",service=\"$service\", path=~\".*\"}[5m])) BY (method,service, path, code)",
"interval": "",
"legendFormat": "{{method}}-{{path}}-{{code}}",
"refId": "A"
}
],
"timeFrom": null,
"timeShift": null,
"title": "Non-successful request rate over the last 5 minutes, grouped by route",
"type": "bargauge"
},
{
"datasource": null,
"fieldConfig": {
"defaults": {
"color": {
"mode": "palette-classic"
},
"custom": {
"axisLabel": "",
"axisPlacement": "auto",
"barAlignment": 0,
"drawStyle": "line",
"fillOpacity": 10,
"gradientMode": "none",
"hideFrom": {
"graph": false,
"legend": false,
"tooltip": false
},
"lineInterpolation": "linear",
"lineWidth": 1,
"pointSize": 5,
"scaleDistribution": {
"type": "linear"
},
"showPoints": "never",
"spanNulls": true
},
"mappings": [],
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
},
{
"color": "red",
"value": 80
}
]
},
"unit": "ms"
},
"overrides": []
},
"gridPos": {
"h": 8,
"w": 12,
"x": 12,
"y": 18
},
"id": 6,
"options": {
"graph": {},
"legend": {
"calcs": [],
"displayMode": "list",
"placement": "right"
},
"tooltipOptions": {
"mode": "single"
}
},
"pluginVersion": "7.5.11",
"targets": [
{
"exemplar": true,
"expr": "avg(increase(http_request_duration_ms_sum{service=\"$service\",path=~\".*\"}[1m]) / increase(http_request_duration_ms_count{service=\"$service\",path=~\".*\"}[1m]) >0) by (method,path)",
"interval": "",
"legendFormat": "{{method}}-{{path}}",
"refId": "A"
}
],
"timeFrom": null,
"timeShift": null,
"title": "Average response time over the last 1 minute, grouped by route",
"type": "timeseries"
},
{
"datasource": null,
"fieldConfig": {
"defaults": {
"color": {
"mode": "palette-classic"
},
"mappings": [],
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
},
{
"color": "red",
"value": 80
}
]
},
"unit": "ms"
},
"overrides": []
},
"gridPos": {
"h": 8,
"w": 12,
"x": 0,
"y": 26
},
"id": 10,
"options": {
"displayLabels": [],
"legend": {
"displayMode": "list",
"placement": "right",
"values": []
},
"pieType": "pie",
"reduceOptions": {
"calcs": [
"lastNotNull"
],
"fields": "",
"values": false
},
"text": {}
},
"pluginVersion": "7.5.11",
"targets": [
{
"exemplar": true,
"expr": "histogram_quantile(0.90, sum(irate(http_request_duration_ms_bucket{service=~\"$service\", path=~\".*\"}[1m])) by (method, path, le))\r\n",
"interval": "",
"legendFormat": "{{method}}-{{path}}",
"refId": "A"
}
],
"timeFrom": null,
"timeShift": null,
"title": "90th-percentile response time over the last 1 minute, grouped by route",
"type": "piechart"
},
{
"aliasColors": {},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": null,
"fieldConfig": {
"defaults": {},
"overrides": []
},
"fill": 1,
"fillGradient": 0,
"gridPos": {
"h": 8,
"w": 12,
"x": 12,
"y": 26
},
"hiddenSeries": false,
"id": 9,
"legend": {
"alignAsTable": false,
"avg": false,
"current": false,
"max": false,
"min": false,
"rightSide": true,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"nullPointMode": "null",
"options": {
"alertThreshold": true
},
"percentage": false,
"pluginVersion": "7.5.11",
"pointradius": 2,
"points": false,
"renderer": "flot",
"seriesOverrides": [],
"spaceLength": 10,
"stack": false,
"steppedLine": false,
"targets": [
{
"exemplar": true,
"expr": "increase(http_requests_total{service=\"$service\"}[5m])/100",
"interval": "",
"legendFormat": "{{method}}-{{path}}",
"refId": "A"
}
],
"thresholds": [],
"timeFrom": null,
"timeRegions": [],
"timeShift": null,
"title": "Request growth over the last 5 minutes, grouped by route",
"tooltip": {
"shared": true,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": []
},
"yaxes": [
{
"$$hashKey": "object:1763",
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
},
{
"$$hashKey": "object:1764",
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
}
],
"yaxis": {
"align": false,
"alignLevel": null
}
}
],
"schemaVersion": 27,
"style": "dark",
"tags": [],
"templating": {
"list": [
{
"allValue": null,
"current": {
"selected": true,
"text": "api-open-service",
"value": "api-open-service"
},
"datasource": null,
"definition": "label_values(nodejs_version_info, service)",
"description": null,
"error": null,
"hide": 0,
"includeAll": false,
"label": "service",
"multi": false,
"name": "service",
"options": [],
"query": {
"query": "label_values(nodejs_version_info, service)",
"refId": "StandardVariableQuery"
},
"refresh": 1,
"regex": "",
"skipUrlSync": false,
"sort": 0,
"tagValuesQuery": "",
"tags": [],
"tagsQuery": "",
"type": "query",
"useTags": false
}
]
},
"time": {
"from": "now-3h",
"to": "now"
},
"timepicker": {},
"timezone": "",
"title": "Custom metrics monitoring for business services",
"uid": "oWFY5AxVz",
"version": 34
}
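When reviewing a JSON model like the one above, it can help to list each panel's title next to its PromQL expression. The following Python sketch does that for a small embedded sample; the function name list_queries and the sample content are hypothetical, and nested row panels are not handled.

```python
import json


def list_queries(dashboard: dict):
    """Return (panel title, PromQL expression) pairs from a dashboard JSON model."""
    rows = []
    for panel in dashboard.get("panels", []):
        for target in panel.get("targets", []):
            rows.append((panel.get("title", ""), target.get("expr", "")))
    return rows


# A tiny stand-in for an exported dashboard; in practice, load the exported file instead.
SAMPLE = json.loads("""
{
  "panels": [
    {"title": "QPS", "targets": [{"expr": "sum(rate(http_requests_total[1m]))", "refId": "A"}]},
    {"title": "Text only"}
  ]
}
""")

for title, expr in list_queries(SAMPLE):
    print(f"{title}: {expr}")
```

This makes it easy to spot which range windows the panels use before deciding on a scrape interval.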

Label Mapping

Display a label shown in a panel under a different name.

Reference: https://community.grafana.com/t/grafana-time-series-mqtt-legend-names-change/72506

1- Select Add field override at the bottom of the right-hand side

2- Select the option “Fields with name”

3- Select the field, e.g. in your case select Value 1

4- Then click on “Add override property”

5- Select Standard options → Display name

6- Give your custom name here, e.g. CPU 1

7- Click Save and Apply in the top right corner
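The steps above end up as an entry in the panel's fieldConfig.overrides list in the dashboard JSON. The Python sketch below builds such an entry for the example names used in the steps (Value 1 and CPU 1). The helper name display_name_override is hypothetical, and the exact JSON shape may vary between Grafana versions, so treat it as an illustration rather than a schema reference.

```python
import json


def display_name_override(field_name: str, display_name: str) -> dict:
    """Build an override that shows `field_name` under `display_name`."""
    return {
        # "Fields with name" corresponds to the byName matcher.
        "matcher": {"id": "byName", "options": field_name},
        # Standard options -> Display name corresponds to the displayName property.
        "properties": [{"id": "displayName", "value": display_name}],
    }


field_config = {"defaults": {}, "overrides": [display_name_override("Value 1", "CPU 1")]}
print(json.dumps(field_config, indent=2))
```

Comparing this output with the JSON model of a panel edited through the UI is a quick way to confirm the shape for your Grafana version.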

Value Mapping

Display a value shown in a panel as different text.

Reference: https://grafana.com/docs/grafana/latest/panels-visualizations/configure-value-mappings/
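Conceptually, a value mapping is a lookup from raw values (or value ranges) to display text. The toy Python function below illustrates the idea; it is not Grafana's implementation, and the function name map_value and the example mappings are hypothetical.

```python
def map_value(value, exact: dict, ranges: list) -> str:
    """Return the display text for a raw value, falling back to the value itself."""
    if value in exact:
        return exact[value]
    for low, high, text in ranges:
        if low <= value <= high:
            return text
    return str(value)


# Hypothetical mappings: 1/0 shown as UP/DOWN, 90..100 shown as "critical".
exact_mappings = {1: "UP", 0: "DOWN"}
range_mappings = [(90, 100, "critical")]

print(map_value(1, exact_mappings, range_mappings))
print(map_value(95, exact_mappings, range_mappings))
print(map_value(42, exact_mappings, range_mappings))
```

Exact matches are checked before ranges in this sketch; values without any mapping are displayed unchanged.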