Spring Cloud Gateway failure retry mechanism

Global routing: provide a failure retry mechanism for all routed services. "Global" here means default-filters.

# Enable the Retry filter for all routes:
spring.cloud.gateway.default-filters[0].name=Retry

# With the default parameters only GET requests are retried; set methods so that POST is retried as well
spring.cloud.gateway.default-filters[0].args.methods[0]=GET
spring.cloud.gateway.default-filters[0].args.methods[1]=POST

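Retry in Spring Cloud Gateway does not happen automatically just because spring-retry has been added, nor because a few timeout parameters have been set.
Here is what the Spring documentation says: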

6.26. The Retry GatewayFilter Factory

The Retry GatewayFilter factory supports the following parameters:

  • retries: The number of retries that should be attempted.

  • statuses: The HTTP status codes that should be retried, represented by using org.springframework.http.HttpStatus.

  • methods: The HTTP methods that should be retried, represented by using org.springframework.http.HttpMethod.

  • series: The series of status codes to be retried, represented by using
    org.springframework.http.HttpStatus.Series.

  • exceptions: A list of thrown exceptions that should be retried.

  • backoff: The configured exponential backoff for the retries. Retries are performed after a backoff interval of firstBackoff * (factor ^ n), where n is the iteration. If maxBackoff is configured, the maximum backoff applied is limited to maxBackoff. If basedOnPreviousValue is true, the backoff is calculated by using prevBackoff * factor.

The following defaults are configured for Retry filter, if enabled:

  • retries: Three times

  • series: 5XX series

  • methods: GET method

  • exceptions: IOException and TimeoutException

  • backoff: disabled

The following listing configures a Retry GatewayFilter:

Example 55. application.yml

spring:
  cloud:
    gateway:
      routes:
      - id: retry_test
        uri: http://localhost:8080/flakey
        predicates:
        - Host=*.retry.com
        filters:
        - name: Retry
          args:
            retries: 3
            statuses: BAD_GATEWAY
            methods: GET,POST
            backoff:
              firstBackoff: 10ms
              maxBackoff: 50ms
              factor: 2
              basedOnPreviousValue: false

The configuration above is the example from the official documentation; it attaches the filter to one specific route.
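
To make the backoff values in that example concrete, here is a minimal sketch of the documented formula firstBackoff * (factor ^ n), capped at maxBackoff. It is not the gateway's own code. The class and method names are made up for illustration, and it assumes n is zero-based (the documentation only says n is the iteration) and that basedOnPreviousValue is false.

```java
import java.time.Duration;

public class RetryBackoffSketch {

    // Documented formula: firstBackoff * (factor ^ n), limited to maxBackoff when one is set.
    // Assumption: n is zero-based, so the first retry waits exactly firstBackoff.
    static Duration backoff(Duration firstBackoff, Duration maxBackoff, int factor, int n) {
        Duration delay = firstBackoff.multipliedBy((long) Math.pow(factor, n));
        if (maxBackoff != null && delay.compareTo(maxBackoff) > 0) {
            return maxBackoff;
        }
        return delay;
    }

    public static void main(String[] args) {
        Duration first = Duration.ofMillis(10); // firstBackoff: 10ms
        Duration max = Duration.ofMillis(50);   // maxBackoff: 50ms
        for (int n = 0; n < 4; n++) {
            System.out.println("n=" + n + " -> " + backoff(first, max, 2, n).toMillis() + "ms");
        }
    }
}
```

Under these assumptions the delays are 10ms, 20ms and 40ms for the three configured retries; a fourth iteration would be 80ms by the formula and is cut down to the 50ms maximum.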

The route configuration properties are defined in org.springframework.cloud.gateway.config.GatewayProperties;
the routes themselves are built by RouteDefinitionRouteLocator implements RouteLocator, BeanFactoryAware, ApplicationEventPublisherAware:

	private Route convertToRoute(RouteDefinition routeDefinition) {
		AsyncPredicate<ServerWebExchange> predicate = combinePredicates(routeDefinition);
		List<GatewayFilter> gatewayFilters = getFilters(routeDefinition);

		return Route.async(routeDefinition).asyncPredicate(predicate)
				.replaceFilters(gatewayFilters).build();
	}

	private List<GatewayFilter> getFilters(RouteDefinition routeDefinition) {
		List<GatewayFilter> filters = new ArrayList<>();

		// TODO: support option to apply defaults after route specific filters?
		if (!this.gatewayProperties.getDefaultFilters().isEmpty()) {
			filters.addAll(loadGatewayFilters(routeDefinition.getId(),
					new ArrayList<>(this.gatewayProperties.getDefaultFilters())));
		}

		if (!routeDefinition.getFilters().isEmpty()) {
			filters.addAll(loadGatewayFilters(routeDefinition.getId(),
					new ArrayList<>(routeDefinition.getFilters())));
		}

		AnnotationAwareOrderComparator.sort(filters);
		return filters;
	}
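
The effect of getFilters can be imitated with plain Java: default filters are added first, route-specific filters second, and the combined list is then sorted by order. The sketch below is only an illustration. NamedFilter is a hypothetical stand-in for a gateway filter, the rule that a filter's order is its 1-based position in its own list is an assumption, and StripPrefix and AddRequestHeader are just sample route filters.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class FilterMergeSketch {

    // Hypothetical stand-in for an ordered gateway filter: a name plus an order value.
    static final class NamedFilter {
        final String name;
        final int order;

        NamedFilter(String name, int order) {
            this.name = name;
            this.order = order;
        }
    }

    // Assumption: a filter's order is its 1-based position within its own list.
    static List<NamedFilter> load(List<String> names) {
        List<NamedFilter> loaded = new ArrayList<>();
        for (int i = 0; i < names.size(); i++) {
            loaded.add(new NamedFilter(names.get(i), i + 1));
        }
        return loaded;
    }

    // Same shape as getFilters: defaults first, then route filters, then a sort by order.
    static List<String> effectiveFilters(List<String> defaultFilters, List<String> routeFilters) {
        List<NamedFilter> filters = new ArrayList<>();
        filters.addAll(load(defaultFilters));
        filters.addAll(load(routeFilters));
        // List.sort is stable, so on equal order the default filter stays in front.
        filters.sort(Comparator.comparingInt((NamedFilter f) -> f.order));
        List<String> names = new ArrayList<>();
        for (NamedFilter f : filters) {
            names.add(f.name);
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(effectiveFilters(
                Arrays.asList("Retry"),
                Arrays.asList("StripPrefix", "AddRequestHeader")));
    }
}
```

With these inputs the sketch prints [Retry, StripPrefix, AddRequestHeader]: a Retry entry in default-filters ends up in the filter list of every route, whatever filters the route declares itself.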

The code that actually decides whether to retry:

ServerWebExchange exchange = context.applicationContext();
if (exceedsMaxIterations(exchange, retryConfig)) {
	return false;
}
// check the status code first; statuses take precedence over series
HttpStatus statusCode = exchange.getResponse().getStatusCode();

boolean retryableStatusCode = retryConfig.getStatuses()
		.contains(statusCode);

// null status code might mean a network exception?
// if the status code is not in the configured statuses, check the series next
if (!retryableStatusCode && statusCode != null) {
	// try the series
	retryableStatusCode = false;
	for (int i = 0; i < retryConfig.getSeries().size(); i++) {
		if (statusCode.series().equals(retryConfig.getSeries().get(i))) {
			retryableStatusCode = true;
			break;
		}
	}
}

final boolean finalRetryableStatusCode = retryableStatusCode;
trace("retryableStatusCode: %b, statusCode %s, configured statuses %s, configured series %s",
		() -> finalRetryableStatusCode, () -> statusCode,
		retryConfig::getStatuses, retryConfig::getSeries);

// check whether the HTTP method is one that should be retried
HttpMethod httpMethod = exchange.getRequest().getMethod();
boolean retryableMethod = retryConfig.getMethods().contains(httpMethod);

trace("retryableMethod: %b, httpMethod %s, configured methods %s",
		() -> retryableMethod, () -> httpMethod, retryConfig::getMethods);
// a retry happens only when both the status code and the request method match
return retryableMethod && finalRetryableStatusCode;

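The default value of series is SERVER_ERROR, so by default a retry is triggered by server-side errors, which covers every 5xx response. To retry on particular 4xx errors as well, add the statuses parameter.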
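
The decision rule can be condensed into a few lines of plain Java. This is a simplified sketch, not the filter's code: it leaves out the max-iterations check and the null status case, and it swaps HttpStatus, HttpStatus.Series and HttpMethod for numeric codes, the hundreds digit (5 for 5xx) and method-name strings. The class and method names are made up for illustration.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

public class RetryDecisionSketch {

    // A status is retryable if it is listed in statuses, or if its series is listed.
    // The request is retried only when the method is listed as well.
    static boolean shouldRetry(int status, String method,
            Set<Integer> statuses, Set<Integer> series, Set<String> methods) {
        boolean retryableStatus = statuses.contains(status) || series.contains(status / 100);
        boolean retryableMethod = methods.contains(method);
        return retryableStatus && retryableMethod;
    }

    public static void main(String[] args) {
        Set<Integer> noStatuses = Collections.emptySet();
        Set<Integer> notFound = new HashSet<>(Arrays.asList(404));
        Set<Integer> serverError = new HashSet<>(Arrays.asList(5)); // 5xx series
        Set<String> getOnly = new HashSet<>(Arrays.asList("GET"));
        Set<String> getAndPost = new HashSet<>(Arrays.asList("GET", "POST"));

        // Defaults: series 5xx, methods GET.
        System.out.println(shouldRetry(503, "GET", noStatuses, serverError, getOnly));     // true
        System.out.println(shouldRetry(503, "POST", noStatuses, serverError, getOnly));    // false
        System.out.println(shouldRetry(404, "GET", noStatuses, serverError, getOnly));     // false
        // POST added to methods, 404 added to statuses.
        System.out.println(shouldRetry(503, "POST", noStatuses, serverError, getAndPost)); // true
        System.out.println(shouldRetry(404, "GET", notFound, serverError, getOnly));       // true
    }
}
```

The second and fourth calls show why the methods list matters: a POST that fails with a 5xx is not retried until POST is added to methods.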

Addendum: why I went tinkering with retry

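We used to run Eureka and had no failure retry configured for services going up or down. The chain of delays involved when a service comes online or goes offline meant that APIs exposed through the gateway would fail for a while.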

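Later we switched to Nacos, which picks up services going up and down faster. Tuning a series of timeout and refresh-interval parameters on top of that did shrink the window in which API calls fail during a service going up or down (1 to 10 seconds).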

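# Shorten the interval so the gateway refreshes its server list quickly and keeps it up to date
# But a shorter refresh interval means threads sleep and wake up more often, which certainly hurts efficiency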
ribbon.ServerListRefreshInterval=1000

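I also went through other Ribbon parameters and tried to add a retry mechanism that way. Well... maybe the parameters I set were wrong; either way, nothing was retried.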

hystrix.command.default.execution.timeout.enabled=true
hystrix.command.default.execution.isolation.thread.timeoutInMilliseconds=25000
ribbon.ReadTimeout=20000
ribbon.ConnectTimeout=5000
ribbon.MaxAutoRetries=1
ribbon.MaxAutoRetriesNextServer=1

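However, while a service is going offline there is a period in which neither a connect timeout nor a read timeout occurs; the connection is simply refused.
In addition, if the service is not shut down gracefully, for example with kill -9, the gateway reports the following error: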

2021-09-07 11:52:51,446 [reactor-http-epoll-1] TRACE o.s.c.g.f.LoadBalancerClientFilter - LoadBalancerClientFilter url before: lb://xxx/xx-api/ext/wanyee/msg/list?pageNo=30&pageSize=2&beginTime=2020-09-07%2000:00:00&endTime=2021-09-07%2023:59:59&keyword=&suid=&uid=
2021-09-07 11:52:51,446 [reactor-http-epoll-1] TRACE o.s.c.g.f.LoadBalancerClientFilter - LoadBalancerClientFilter url chosen: http://192.168.2.1:9898/xx-api/list?pageNo=30&pageSize=2&beginTime=2020-09-07%2000:00:00&endTime=2021-09-07%2023:59:59&keyword=&suid=&uid=
2021-09-07 11:52:51,450 [reactor-http-epoll-1] ERROR o.s.b.a.w.r.e.AbstractErrorWebExceptionHandler - [dc72bc7f-127]  500 Server Error for HTTP GET "/xx-api/list?pageNo=30&pageSize=2&beginTime=2020-09-07%2000:00:00&endTime=2021-09-07%2023:59:59&keyword=&suid=&uid="
io.netty.channel.AbstractChannel$AnnotatedConnectException: finishConnect(..) failed: 拒绝连接: /192.168.2.56:9198
	Suppressed: reactor.core.publisher.FluxOnAssembly$OnAssemblyException: 
Error has been observed at the following site(s):
	|_ checkpoint ⇢ org.springframework.cloud.gateway.filter.WeightCalculatorWebFilter [DefaultWebFilterChain]
	|_ checkpoint ⇢ org.springframework.boot.actuate.metrics.web.reactive.server.MetricsWebFilter [DefaultWebFilterChain]
	|_ checkpoint ⇢ HTTP GET "/ams-api/ext/wanyee/msg/list?pageNo=30&pageSize=2&beginTime=2020-09-07%2000:00:00&endTime=2021-09-07%2023:59:59&keyword=&suid=&uid=" [ExceptionHandlingWebHandler]
Stack trace:
Caused by: java.net.ConnectException: finishConnect(..) failed: 拒绝连接
	at io.netty.channel.unix.Errors.throwConnectException(Errors.java:124)
	at io.netty.channel.unix.Socket.finishConnect(Socket.java:251)
	at io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe.doFinishConnect(AbstractEpollChannel.java:673)
	at io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe.finishConnect(AbstractEpollChannel.java:650)
	at io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe.epollOutReady(AbstractEpollChannel.java:530)
	at io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:465)
	at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:378)
	at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
	at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
2021-09-07 11:52:51,452 [reactor-http-epoll-1] TRACE o.s.c.g.f.GatewayMetricsFilter - gateway.requests tags: [tag(httpMethod=GET),tag(httpStatusCode=500),tag(outcome=SERVER_ERROR),tag(routeId=ams-api),tag(routeUri=lb://ams),tag(status=INTERNAL_SERVER_ERROR)]

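Searching the web for the error keywords turned up nothing useful.
But Spring surely has a way to deal with this, and that way is the retry mechanism.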
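
One reason the default Retry settings are relevant to the "connection refused" error in the log: the root cause there is java.net.ConnectException, which is a subclass of IOException, one of the two default entries under exceptions. The sketch below checks only that type relationship. The matching rule it uses (the error, or its direct cause, is an instance of a configured class) is an assumption about how such a list could be applied, and the class and method names are made up for illustration.

```java
import java.io.IOException;
import java.net.ConnectException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeoutException;

public class RetryableExceptionSketch {

    // The documented defaults for the exceptions parameter.
    static List<Class<? extends Throwable>> defaultExceptions() {
        List<Class<? extends Throwable>> defaults = new ArrayList<>();
        defaults.add(IOException.class);
        defaults.add(TimeoutException.class);
        return defaults;
    }

    // Assumption: an error counts as retryable when it, or its direct cause,
    // is an instance of one of the configured classes.
    static boolean isRetryable(Throwable error, List<Class<? extends Throwable>> configured) {
        for (Class<? extends Throwable> type : configured) {
            if (type.isInstance(error) || (error != null && type.isInstance(error.getCause()))) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<Class<? extends Throwable>> defaults = defaultExceptions();
        System.out.println(isRetryable(new ConnectException("connection refused"), defaults)); // true
        System.out.println(isRetryable(new IllegalStateException("other failure"), defaults)); // false
    }
}
```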
