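Chapter 2 covered custom Prometheus metrics with koa. This note records how to integrate Spring Boot with Prometheus to expose custom metrics.

To record QPS and response time in Spring Boot with micrometer-registry-prometheus, follow the steps below.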

Spring-boot-starter-actuator

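Spring Boot's spring-boot-starter-actuator dependency already integrates Micrometer, and much of the metrics endpoint is implemented on top of it. The prometheus endpoint is also enabled by default, although it still has to be exposed over HTTP (see the configuration further down). Under the hood, spring-boot-actuator-autoconfigure, which actuator depends on, ships out-of-the-box integrations for many frameworks, and its prometheus package provides the Prometheus support. With actuator on the classpath, a project can easily expose a prometheus endpoint and act as a scrape target, so that the Prometheus server can collect the application's Micrometer metrics through that endpoint.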

Integrating micrometer-registry-prometheus

Add the Micrometer and micrometer-registry-prometheus dependencies

A Spring Boot project needs the Micrometer and micrometer-registry-prometheus dependencies. Add the following to Maven's pom.xml:

<dependency>
    <groupId>io.micrometer</groupId>
    <artifactId>micrometer-core</artifactId>
</dependency>
<dependency>
    <groupId>io.micrometer</groupId>
    <artifactId>micrometer-registry-prometheus</artifactId>
</dependency>
<!-- In Spring Boot, this dependency exposes runtime metrics of the Java application, such as heap and stack information -->
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-actuator</artifactId>
</dependency>

Add configuration

The application must configure micrometer-registry-prometheus so that metrics are exposed to Prometheus. Add the following to application.properties or application.yml (shown here in YAML form):

management:
  endpoints:
    web:
      exposure:
        include: "*"
  metrics:
    export:
      prometheus:
        enabled: true
        step: 10s
  endpoint:
    prometheus:
      enabled: true

Write the code, using a filter to record metrics (here, QPS and response time)

import io.micrometer.prometheus.PrometheusMeterRegistry;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.boot.actuate.autoconfigure.metrics.MeterRegistryCustomizer;
import org.springframework.boot.autoconfigure.condition.ConditionalOnClass;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

/**
 * @author xiaowu
 */
@Configuration
@ConditionalOnClass(PrometheusMeterRegistry.class)
public class MicrometerConfig {

    @Value("${spring.application.name}")
    private String applicationName;

    @Bean
    public MeterRegistryCustomizer<PrometheusMeterRegistry> metricsCommonTags() {
        return registry -> {
            // Configure common tags here. Some dashboards expect an "application" tag by default, so it is best to add it.
            registry.config()
                    .commonTags("application", applicationName);
        };
    }
}

Use Micrometer's counter and timer to capture the QPS and response-time metrics. For example, the following filter creates both:

import io.micrometer.core.instrument.Counter;
import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.core.instrument.Timer;
import lombok.RequiredArgsConstructor;
import lombok.extern.slf4j.Slf4j;
import org.apache.commons.lang3.StringUtils;
import org.springframework.stereotype.Component;

import javax.servlet.*;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import java.io.IOException;

/**
 * Intercepts all requests in one place through a servlet Filter
 * @author xiaowu
 */
@Slf4j
@Component
@RequiredArgsConstructor
public class PrometheusFilter implements Filter {

    private final MeterRegistry meterRegistry;

    @Override
    public void doFilter(ServletRequest request, ServletResponse response, FilterChain chain) throws IOException, ServletException {
        HttpServletRequest servletRequest = (HttpServletRequest) request;
        String path = servletRequest.getRequestURI();
        if (StringUtils.contains(path, "/actuator")) {
            // Do not record metrics for /actuator requests
            chain.doFilter(request, response);
            return;
        }
        // Start a Timer sample to capture the request start time
        Timer.Sample sample = Timer.start();
        // Handle the request
        chain.doFilter(request, response);
        HttpServletResponse servletResponse = (HttpServletResponse) response;
        final Counter requestsCounter = Counter.builder("http_requests_total")
                .tag("path", path)
                .tag("method", servletRequest.getMethod())
                .tag("status", String.valueOf(servletResponse.getStatus()))
                .register(meterRegistry);
        final Timer requestsTimer = Timer.builder("http_request_duration_ms")
                .tag("path", path)
                .tag("method", servletRequest.getMethod())
                .tag("status", String.valueOf(servletResponse.getStatus()))
                .register(meterRegistry);
        // Stop the sample, recording the elapsed time into the timer
        sample.stop(requestsTimer);
        // Increment the request counter
        requestsCounter.increment();
    }
}

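In this example, MeterRegistry is injected into the filter, and the counter and timer are registered through it. In doFilter, the timer records each request's response time and the counter increments the request count.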
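To make the start/stop timing pattern in the filter more concrete, here is a minimal JDK-only sketch of the idea: take a timestamp, run the work, then record the elapsed time and bump a call count. SimpleTimer and its methods are hypothetical names used only for illustration. They are not Micrometer's API, and the sketch keeps nothing but a count and a total.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical, JDK-only stand-in for the start/stop timing pattern.
class SimpleTimer {
    private final AtomicLong count = new AtomicLong();
    private final AtomicLong totalNanos = new AtomicLong();

    // Runs the task and records its duration, even when the task throws.
    void record(Runnable task) {
        long start = System.nanoTime();
        try {
            task.run();
        } finally {
            totalNanos.addAndGet(System.nanoTime() - start);
            count.incrementAndGet();
        }
    }

    long count() {
        return count.get();
    }

    long totalNanos() {
        return totalNanos.get();
    }

    public static void main(String[] args) {
        SimpleTimer timer = new SimpleTimer();
        timer.record(() -> { /* simulated request handling */ });
        try {
            timer.record(() -> { throw new IllegalStateException("simulated failure"); });
        } catch (IllegalStateException expected) {
            // The failed call has still been timed and counted.
        }
        System.out.println(timer.count() + " calls, " + timer.totalNanos() + " ns in total");
    }
}
```

Recording inside finally is a design choice of this sketch: calls that end in an exception still contribute to the count and the total.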

Viewing the Prometheus metrics

The exposed metrics can now be viewed directly on the application's endpoint. Open the following URL in a browser:

http://{ip}:{port}/actuator/prometheus

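/actuator/prometheus is the endpoint that Prometheus scrapes; it lists every metric exposed to Prometheus. Once Prometheus has collected them, the Prometheus query language (PromQL) can be used to filter and visualize the metrics.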

......
# HELP http_server_requests_seconds Duration of HTTP server request handling
# TYPE http_server_requests_seconds summary
http_server_requests_seconds_count{application="sapi_wecom_sdkapi",exception="None",method="GET",outcome="SUCCESS",status="200",uri="/actuator/prometheus",} 1.0
http_server_requests_seconds_sum{application="sapi_wecom_sdkapi",exception="None",method="GET",outcome="SUCCESS",status="200",uri="/actuator/prometheus",} 0.032126591
http_server_requests_seconds_count{application="sapi_wecom_sdkapi",exception="None",method="POST",outcome="SUCCESS",status="200",uri="/conversation/image",} 2.0
http_server_requests_seconds_sum{application="sapi_wecom_sdkapi",exception="None",method="POST",outcome="SUCCESS",status="200",uri="/conversation/image",} 0.369479355
# HELP http_server_requests_seconds_max Duration of HTTP server request handling
# TYPE http_server_requests_seconds_max gauge
http_server_requests_seconds_max{application="sapi_wecom_sdkapi",exception="None",method="GET",outcome="SUCCESS",status="200",uri="/actuator/prometheus",} 0.032126591
http_server_requests_seconds_max{application="sapi_wecom_sdkapi",exception="None",method="POST",outcome="SUCCESS",status="200",uri="/conversation/image",} 0.26498721
# HELP http_requests_total
# TYPE http_requests_total counter
http_requests_total{application="sapi_wecom_sdkapi",code="200",method="POST",path="/conversation/image",status="200",} 2.0
# HELP jvm_gc_live_data_size_bytes Size of long-lived heap memory pool after reclamation
# TYPE jvm_gc_live_data_size_bytes gauge
jvm_gc_live_data_size_bytes{application="sapi_wecom_sdkapi",} 3.5658208E7
......

The output above includes our custom metric, http_requests_total.
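The sample output above only contains cumulative values (a _count and a _sum per series), so average latency and QPS have to be derived from them. The sketch below shows that arithmetic in plain Java. MetricMath is a hypothetical helper, and the two counter readings used for the QPS calculation are assumed values, not taken from the output. PromQL's rate() additionally handles counter resets and extrapolation, which this sketch does not attempt.

```java
// Hypothetical helper showing the arithmetic behind average latency and QPS.
class MetricMath {

    // Average latency = <metric>_sum / <metric>_count.
    static double averageSeconds(double sumSeconds, double count) {
        return count == 0 ? 0.0 : sumSeconds / count;
    }

    // Per-second rate between two readings of a cumulative counter.
    static double ratePerSecond(double earlier, double later, double intervalSeconds) {
        if (intervalSeconds <= 0) {
            throw new IllegalArgumentException("intervalSeconds must be positive");
        }
        return (later - earlier) / intervalSeconds;
    }

    public static void main(String[] args) {
        // _sum and _count for POST /conversation/image from the sample output.
        double avg = averageSeconds(0.369479355, 2.0);
        // Assumed readings: the counter grew from 2 to 32 within 10 seconds.
        double qps = ratePerSecond(2.0, 32.0, 10.0);
        System.out.printf("average = %.4f s, qps = %.1f%n", avg, qps);
    }
}
```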

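Custom Prometheus metrics with AOP

Excerpted from: https://www.51cto.com/article/653186.html

Most companies that build microservices on Spring Boot now use Prometheus for their metrics-based monitoring. The usual approach is to integrate a Prometheus metrics SDK into each service so that Spring Boot exposes a metrics scrape endpoint.

However, the number and types of metrics that Spring Boot exposes by default are limited. To monitor a service along richer dimensions (for example TP90/TP99 quantiles), we need to configure additional metric types provided by the Prometheus library (micrometer-registry-prometheus) on top of the metrics that Spring Boot already enables.

How can this be done elegantly in Spring Boot? Do we really have to scatter custom metric-reporting code through the business logic? The rest of this section shows how annotations combined with AOP give a cleaner way to define custom Prometheus metrics.

Annotations for configuring custom metrics

In Spring Boot applications, runtime information such as metrics and logs is commonly collected through Spring AOP proxies. Because intercepting method calls costs some performance, it makes sense to let developers decide which methods report metrics, and annotations are a convenient way to do that. For example: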

import java.lang.annotation.ElementType; 
import java.lang.annotation.Inherited;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

@Target({ElementType.METHOD})
@Retention(RetentionPolicy.RUNTIME)
@Inherited
public @interface Tp {

    String description() default "";
}

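The code above defines an annotation that marks a method for timer-type metrics. To collect quantile metrics such as TP90 or TP99 for an endpoint, annotate it with @Tp. Annotations for other metric types can be defined the same way, for example: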

import java.lang.annotation.ElementType; 
import java.lang.annotation.Inherited;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

@Target({ElementType.METHOD})
@Retention(RetentionPolicy.RUNTIME)
@Inherited
public @interface Count {

    String description() default "";
}

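The code above defines an annotation for counter-type metrics. To track figures such as an endpoint's request volume, annotate it with @Count (an average response time additionally needs the total time recorded by a timer).

If defining a separate annotation per metric type feels cumbersome, and some endpoints should report all of these metric types to Prometheus, a general-purpose annotation can report several metric types at once, for example: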

import java.lang.annotation.ElementType; 
import java.lang.annotation.Inherited;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

@Target({ElementType.METHOD})
@Retention(RetentionPolicy.RUNTIME)
@Inherited
public @interface Monitor {

    String description() default "";
}

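In short, whether you define one annotation per metric type or a single general-purpose one, the goal is to extend the monitoring metrics of a Spring Boot microservice in a more flexible way.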
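These annotations only take effect because something reads them at runtime, which is why they are declared with RetentionPolicy.RUNTIME. The following JDK-only sketch shows that lookup through reflection. TpLookupDemo, OrderService, and descriptionOf are hypothetical names for illustration, and the annotation is redeclared locally so that the example is self-contained.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;

class TpLookupDemo {

    // Local copy of the annotation so the example compiles on its own.
    @Target(ElementType.METHOD)
    @Retention(RetentionPolicy.RUNTIME)
    @interface Tp {
        String description() default "";
    }

    // Hypothetical service, used only to have something to annotate.
    static class OrderService {
        @Tp(description = "/order/create")
        public void create() {
        }

        public void untracked() {
        }
    }

    // Returns the @Tp description of a no-arg public method, or null if it is not annotated.
    static String descriptionOf(Class<?> type, String methodName) {
        try {
            Method method = type.getMethod(methodName);
            Tp tp = method.getAnnotation(Tp.class);
            return tp == null ? null : tp.description();
        } catch (NoSuchMethodException e) {
            throw new IllegalArgumentException("No such method: " + methodName, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(descriptionOf(OrderService.class, "create"));
        System.out.println(descriptionOf(OrderService.class, "untracked"));
    }
}
```

With RetentionPolicy.CLASS or SOURCE, getAnnotation would return null here, and an interceptor would have nothing to act on.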

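AOP proxy logic for the custom metric annotations

Having defined annotations for reporting different metric types, we can implement their behavior in a single, general-purpose AOP aspect, as follows: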

import com.wudimanong.monitor.metrics.Metrics; 
import com.wudimanong.monitor.metrics.annotation.Count;
import com.wudimanong.monitor.metrics.annotation.Monitor;
import com.wudimanong.monitor.metrics.annotation.Tp;
import io.micrometer.core.instrument.Counter;
import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.core.instrument.Tag;
import io.micrometer.core.instrument.Tags;
import io.micrometer.core.instrument.Timer;
import java.lang.reflect.Method;
import java.util.function.Function;
import org.aspectj.lang.ProceedingJoinPoint;
import org.aspectj.lang.annotation.Around;
import org.aspectj.lang.annotation.Aspect;
import org.aspectj.lang.reflect.MethodSignature;
import org.springframework.stereotype.Component;

@Aspect
@Component
public class MetricsAspect {

    /**
     * Prometheus metric registry
     */
    private MeterRegistry registry;

    private Function<ProceedingJoinPoint, Iterable<Tag>> tagsBasedOnJoinPoint;

    public MetricsAspect(MeterRegistry registry) {
        this.init(registry, pjp -> Tags
                .of(new String[]{"class", pjp.getStaticPart().getSignature().getDeclaringTypeName(), "method",
                        pjp.getStaticPart().getSignature().getName()}));
    }

    public void init(MeterRegistry registry, Function<ProceedingJoinPoint, Iterable<Tag>> tagsBasedOnJoinPoint) {
        this.registry = registry;
        this.tagsBasedOnJoinPoint = tagsBasedOnJoinPoint;
    }

    /**
     * Logic for the @Tp metric annotation
     */
    @Around("@annotation(com.wudimanong.monitor.metrics.annotation.Tp)")
    public Object timedMethod(ProceedingJoinPoint pjp) throws Throwable {
        Method method = ((MethodSignature) pjp.getSignature()).getMethod();
        method = pjp.getTarget().getClass().getMethod(method.getName(), method.getParameterTypes());
        Tp tp = method.getAnnotation(Tp.class);
        Timer.Sample sample = Timer.start(this.registry);
        String exceptionClass = "none";
        try {
            return pjp.proceed();
        } catch (Exception ex) {
            exceptionClass = ex.getClass().getSimpleName();
            throw ex;
        } finally {
            try {
                String finalExceptionClass = exceptionClass;
                // Create the timer and set its tags (the metric name can be customized)
                Timer timer = Metrics.newTimer("tp.method.timed",
                        builder -> builder.tags(new String[]{"exception", finalExceptionClass})
                                .tags(this.tagsBasedOnJoinPoint.apply(pjp)).tag("description", tp.description())
                                .publishPercentileHistogram().register(this.registry));
                sample.stop(timer);
            } catch (Exception exception) {
                // Ignore metric failures so that they never break the business call
            }
        }
    }

    /**
     * Logic for the @Count metric annotation
     */
    @Around("@annotation(com.wudimanong.monitor.metrics.annotation.Count)")
    public Object countMethod(ProceedingJoinPoint pjp) throws Throwable {
        Method method = ((MethodSignature) pjp.getSignature()).getMethod();
        method = pjp.getTarget().getClass().getMethod(method.getName(), method.getParameterTypes());
        Count count = method.getAnnotation(Count.class);
        String exceptionClass = "none";
        try {
            return pjp.proceed();
        } catch (Exception ex) {
            exceptionClass = ex.getClass().getSimpleName();
            throw ex;
        } finally {
            try {
                String finalExceptionClass = exceptionClass;
                // Create the counter and set its tags (the metric name can be customized)
                Counter counter = Metrics.newCounter("count.method.counted",
                        builder -> builder.tags(new String[]{"exception", finalExceptionClass})
                                .tags(this.tagsBasedOnJoinPoint.apply(pjp)).tag("description", count.description())
                                .register(this.registry));
                counter.increment();
            } catch (Exception exception) {
                // Ignore metric failures so that they never break the business call
            }
        }
    }

    /**
     * Logic for the general-purpose @Monitor metric annotation
     */
    @Around("@annotation(com.wudimanong.monitor.metrics.annotation.Monitor)")
    public Object monitorMethod(ProceedingJoinPoint pjp) throws Throwable {
        Method method = ((MethodSignature) pjp.getSignature()).getMethod();
        method = pjp.getTarget().getClass().getMethod(method.getName(), method.getParameterTypes());
        Monitor monitor = method.getAnnotation(Monitor.class);
        // Start timing before the call; starting the sample in the finally block would measure almost nothing
        Timer.Sample sample = Timer.start(this.registry);
        String exceptionClass = "none";
        try {
            return pjp.proceed();
        } catch (Exception ex) {
            exceptionClass = ex.getClass().getSimpleName();
            throw ex;
        } finally {
            try {
                String finalExceptionClass = exceptionClass;
                // Timer metric
                Timer timer = Metrics.newTimer("tp.method.timed",
                        builder -> builder.tags(new String[]{"exception", finalExceptionClass})
                                .tags(this.tagsBasedOnJoinPoint.apply(pjp)).tag("description", monitor.description())
                                .publishPercentileHistogram().register(this.registry));
                sample.stop(timer);

                // Counter metric
                Counter counter = Metrics.newCounter("count.method.counted",
                        builder -> builder.tags(new String[]{"exception", finalExceptionClass})
                                .tags(this.tagsBasedOnJoinPoint.apply(pjp)).tag("description", monitor.description())
                                .register(this.registry));
                counter.increment();
            } catch (Exception exception) {
                // Ignore metric failures so that they never break the business call
            }
        }
    }
}

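The code above fully implements the logic behind the metric annotations defined earlier; the @Monitor logic is simply the @Tp and @Count logic combined. Other metric types can be added by extending it in the same way.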
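For readers less familiar with around advice, the following JDK-only sketch imitates the same flow with java.lang.reflect.Proxy: intercept the call, invoke the target, and update a counter in a finally block when the implementation method carries the annotation. It is an illustration under stated assumptions, not how the aspect above is wired. Greeter, GreeterImpl, COUNTS, and counted are hypothetical names, and a plain map stands in for a tagged counter.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

class AroundAdviceDemo {

    @Target(ElementType.METHOD)
    @Retention(RetentionPolicy.RUNTIME)
    @interface Count {
        String description() default "";
    }

    interface Greeter {
        String greet(String name);
    }

    static class GreeterImpl implements Greeter {
        @Count(description = "greet")
        @Override
        public String greet(String name) {
            return "hello " + name;
        }
    }

    // Call counts keyed by "Class.method"; stands in for a counter with class/method tags.
    static final Map<String, AtomicLong> COUNTS = new ConcurrentHashMap<>();

    // Wraps the target in a proxy that counts calls to @Count-annotated methods.
    static Greeter counted(Greeter target) {
        InvocationHandler handler = (proxy, method, args) -> {
            // Resolve the method on the target class so that annotations
            // declared on the implementation are visible.
            Method impl = target.getClass().getMethod(method.getName(), method.getParameterTypes());
            try {
                return method.invoke(target, args);
            } catch (InvocationTargetException e) {
                // Rethrow the original exception thrown by the target.
                throw e.getCause();
            } finally {
                if (impl.isAnnotationPresent(Count.class)) {
                    String key = target.getClass().getSimpleName() + "." + impl.getName();
                    COUNTS.computeIfAbsent(key, k -> new AtomicLong()).incrementAndGet();
                }
            }
        };
        return (Greeter) Proxy.newProxyInstance(
                Greeter.class.getClassLoader(), new Class<?>[]{Greeter.class}, handler);
    }

    public static void main(String[] args) {
        Greeter greeter = counted(new GreeterImpl());
        System.out.println(greeter.greet("prometheus"));
        System.out.println(COUNTS);
    }
}
```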

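Note that the Timer and Counter instances above are not built by calling the Micrometer builders directly; they go through custom helpers such as Metrics.newTimer(). The intent is to make metric reporting more concise and flexible. The helper class is defined as follows: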

import io.micrometer.core.instrument.Counter; 
import io.micrometer.core.instrument.Counter.Builder;
import io.micrometer.core.instrument.DistributionSummary;
import io.micrometer.core.instrument.Gauge;
import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.core.instrument.Timer;
import io.micrometer.core.lang.NonNull;
import java.util.function.Consumer;
import java.util.function.Supplier;
import org.springframework.beans.BeansException;
import org.springframework.context.ApplicationContext;
import org.springframework.context.ApplicationContextAware;

public class Metrics implements ApplicationContextAware {

    private static ApplicationContext context;

    @Override
    public void setApplicationContext(@NonNull ApplicationContext applicationContext) throws BeansException {
        context = applicationContext;
    }

    public static ApplicationContext getContext() {
        return context;
    }

    public static Counter newCounter(String name, Consumer<Builder> consumer) {
        MeterRegistry meterRegistry = context.getBean(MeterRegistry.class);
        return new CounterBuilder(meterRegistry, name, consumer).build();
    }

    public static Timer newTimer(String name, Consumer<Timer.Builder> consumer) {
        return new TimerBuilder(context.getBean(MeterRegistry.class), name, consumer).build();
    }
}

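The code above hooks into the Spring application context to obtain the MeterRegistry instance and uses it to build metric objects such as Counter and Timer. The factory methods are static mainly so that business code can reference them easily.

The CounterBuilder and TimerBuilder classes used above are defined as follows: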

import io.micrometer.core.instrument.Counter; 
import io.micrometer.core.instrument.Counter.Builder;
import io.micrometer.core.instrument.MeterRegistry;
import java.util.function.Consumer;

public class CounterBuilder {

    private final MeterRegistry meterRegistry;

    private Counter.Builder builder;

    private Consumer<Builder> consumer;

    public CounterBuilder(MeterRegistry meterRegistry, String name, Consumer<Counter.Builder> consumer) {
        this.builder = Counter.builder(name);
        this.meterRegistry = meterRegistry;
        this.consumer = consumer;
    }

    public Counter build() {
        consumer.accept(builder);
        return builder.register(meterRegistry);
    }
}

That is the CounterBuilder class. The TimerBuilder class is as follows:

import io.micrometer.core.instrument.MeterRegistry; 
import io.micrometer.core.instrument.Timer;
import io.micrometer.core.instrument.Timer.Builder;
import java.util.function.Consumer;

public class TimerBuilder {

    private final MeterRegistry meterRegistry;

    private Timer.Builder builder;

    private Consumer<Builder> consumer;

    public TimerBuilder(MeterRegistry meterRegistry, String name, Consumer<Timer.Builder> consumer) {
        this.builder = Timer.builder(name);
        this.meterRegistry = meterRegistry;
        this.consumer = consumer;
    }

    public Timer build() {
        this.consumer.accept(builder);
        return builder.register(meterRegistry);
    }
}

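The builders are defined as separate classes mainly to keep the code tidy. Builders for other metric types can be added in the same way.

Configuration class for the custom metric annotations

With the custom metric annotations and their implementation in place, one more configuration class is needed for them to run in a Spring Boot environment: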

import com.wudimanong.monitor.metrics.Metrics; 
import io.micrometer.core.instrument.MeterRegistry;
import org.springframework.boot.actuate.autoconfigure.metrics.MeterRegistryCustomizer;
import org.springframework.boot.autoconfigure.condition.ConditionalOnMissingBean;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.core.env.Environment;

@Configuration
public class CustomMetricsAutoConfiguration {

    @Bean
    @ConditionalOnMissingBean
    public MeterRegistryCustomizer<MeterRegistry> meterRegistryCustomizer(Environment environment) {
        return registry -> {
            registry.config()
                    .commonTags("application", environment.getProperty("spring.application.name"));
        };
    }

    @Bean
    @ConditionalOnMissingBean
    public Metrics metrics() {
        return new Metrics();
    }
}

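This configuration sets the application name carried in the metrics reported to Prometheus and registers the custom Metrics class as a bean.

Usage in business code and the result

Next, here is how business code reports Prometheus metrics: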

import com.wudimanong.monitor.metrics.annotation.Count; 
import com.wudimanong.monitor.metrics.annotation.Monitor;
import com.wudimanong.monitor.metrics.annotation.Tp;
import com.wudimanong.monitor.service.MonitorService;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;

@RestController
@RequestMapping("/monitor")
public class MonitorController {

    @Autowired
    private MonitorService monitorServiceImpl;

    // Using the metric annotations
    //@Tp(description = "/monitor/test")
    //@Count(description = "/monitor/test")
    @Monitor(description = "/monitor/test")
    @GetMapping("/test")
    public String monitorTest(@RequestParam("name") String name) {
        monitorServiceImpl.monitorTest(name);
        return "监控示范工程测试接口返回->OK!";
    }
}

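As the code shows, in day-to-day development an annotation is enough to configure which Prometheus metrics an endpoint reports. Start the application locally and open the service's "/actuator/prometheus" scrape endpoint to see the metrics, as in the figure below:

[Figure: custom metrics shown at the /actuator/prometheus endpoint]

Once Prometheus has scraped these custom metrics, a visualization tool such as Grafana can be used to build user-friendly, multi-dimensional monitoring views. Taking TP90/TP99 as an example:

[Figure: Grafana panel with quantile queries]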

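As shown above, Grafana can hold several PromQL queries at once to define different monitoring metrics. Here Prometheus's "histogram_quantile" function computes the TP90 and TP95 quantiles of the endpoint method "monitorTest", based on the custom "tp_method_timed_xx" metrics.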
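To give a feel for what a quantile estimated from histogram buckets means, here is a simplified plain-Java sketch of the idea behind histogram_quantile: find the bucket that contains the target rank and interpolate linearly inside it. HistogramQuantile is a hypothetical class, the bucket bounds and counts are made-up values, and the sketch assumes cumulative counts, at least two buckets, a final +Inf bucket, and 0 < q <= 1. It leaves out the rate() step and the edge cases that Prometheus itself handles.

```java
// Simplified, hypothetical sketch of quantile estimation over cumulative buckets.
class HistogramQuantile {

    // upperBounds: ascending bucket upper bounds ("le"); the last one is +Inf.
    // cumulativeCounts: number of observations <= the matching upper bound.
    static double quantile(double q, double[] upperBounds, double[] cumulativeCounts) {
        int n = upperBounds.length;
        double total = cumulativeCounts[n - 1];
        if (total == 0) {
            return Double.NaN;
        }
        double rank = q * total;
        // Find the first bucket whose cumulative count reaches the rank.
        int i = 0;
        while (i < n - 1 && cumulativeCounts[i] < rank) {
            i++;
        }
        if (i == n - 1) {
            // The rank falls into the +Inf bucket: report the highest finite bound.
            return upperBounds[n - 2];
        }
        double lowerBound = i == 0 ? 0.0 : upperBounds[i - 1];
        double lowerCount = i == 0 ? 0.0 : cumulativeCounts[i - 1];
        double inBucket = cumulativeCounts[i] - lowerCount;
        // Linear interpolation inside the selected bucket.
        return lowerBound + (upperBounds[i] - lowerBound) * (rank - lowerCount) / inBucket;
    }

    public static void main(String[] args) {
        // Made-up buckets in seconds: le=0.1, 0.5, 1.0, +Inf.
        double[] bounds = {0.1, 0.5, 1.0, Double.POSITIVE_INFINITY};
        double[] counts = {10, 60, 90, 100};
        System.out.println("TP50 ~ " + quantile(0.5, bounds, counts));
        System.out.println("TP90 ~ " + quantile(0.9, bounds, counts));
        System.out.println("TP95 ~ " + quantile(0.95, bounds, counts));
    }
}
```

Because the result is interpolated within a bucket, its accuracy depends on how finely the buckets are spaced around the latency range of interest.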