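Apollo is an open-source configuration management center developed by Ctrip's framework department. It centrally manages an application's configuration across environments and clusters, pushes configuration changes to applications in real time, and provides well-defined permission and workflow governance. This article examines how the client obtains configuration and how ConfigService implements its long connection, to help users understand the Apollo configuration center in depth.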
I. Overall Apollo Architecture
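Apollo consists of four core modules and three auxiliary service-discovery modules.
The four core modules are:
1) ConfigService: provides the configuration retrieval API and the configuration push API, and serves the Apollo client.
2) AdminService: provides the configuration management API and the configuration modification and release API, and serves the management UI (Portal).
3) Client: fetches configuration for the application with real-time updates, obtains the ConfigService instance list through MetaServer, and calls ConfigService using client-side software load balancing (SLB).
4) Portal: the configuration management UI; it obtains the AdminService instance list through MetaServer and calls AdminService using client-side software load balancing (SLB).
The three auxiliary service-discovery modules are:
1) Eureka: handles service registration and discovery. ConfigService and AdminService register their instances with it and send periodic heartbeats. It is deployed together with ConfigService.
2) MetaServer: the Portal accesses MetaServer through a domain name to get the AdminService address list, and the Client does the same to get the ConfigService address list. It acts as a Eureka proxy and, as a logical role, is deployed together with ConfigService.
3) NginxLB: works with the domain name system to help the Portal reach MetaServer (for the AdminService address list), help the Client reach MetaServer (for the ConfigService address list), and help users reach the Portal to manage configuration.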
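The Apollo client accesses the configuration service to fetch configuration data and applies changes without a restart (hot reload). Native settings such as the port number, however, only take effect after the service is restarted.
The core client packages are apollo-core.jar and apollo-client.jar.
The dependency structure is shown in the figure below: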
II. How the Client Obtains Configuration
The main client classes are shown in the class diagram below:
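1.1. Service list request
Request 1 is the client fetching the ConfigService instance list from the registry. This list determines which available ConfigService node the long connection stays attached to, and which node the short requests fetch configuration from. When the registry cannot be reached, retries use binary exponential backoff: the first delay is the lower bound, each further failure doubles the previous delay, and the delay never exceeds the upper bound. The code is: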
public long fail() {
long delayTime = lastDelayTime;
if (delayTime == 0) {
delayTime = delayTimeLowerBound;
} else {
delayTime = Math.min(lastDelayTime << 1, delayTimeUpperBound);
}
lastDelayTime = delayTime;
return delayTime;
}
The request URL is http://apollo.meta/services/config?appId={appId}&ip={ip}
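To make the backoff sequence concrete, here is a minimal standalone sketch of the `fail()` logic above together with a `success()` reset (the client code in this article calls a `success()` method on its schedule policies). The class name and the bounds 1 and 120 are illustrative assumptions, not values taken from Apollo's configuration.

```java
class ExponentialBackoffSketch {
    private final long delayTimeLowerBound;
    private final long delayTimeUpperBound;
    private long lastDelayTime = 0; // 0 means "no failure since the last success"

    ExponentialBackoffSketch(long lowerBound, long upperBound) {
        this.delayTimeLowerBound = lowerBound;
        this.delayTimeUpperBound = upperBound;
    }

    /** Called after a failed attempt; returns how long to wait before retrying. */
    long fail() {
        long delayTime = (lastDelayTime == 0)
                ? delayTimeLowerBound                                    // first failure
                : Math.min(lastDelayTime << 1, delayTimeUpperBound);   // double, but cap
        lastDelayTime = delayTime;
        return delayTime;
    }

    /** Called after a successful attempt; the next failure starts from the lower bound again. */
    void success() {
        lastDelayTime = 0;
    }

    public static void main(String[] args) {
        // Assumed bounds for illustration only
        ExponentialBackoffSketch policy = new ExponentialBackoffSketch(1, 120);
        StringBuilder delays = new StringBuilder();
        for (int i = 0; i < 9; i++) {
            delays.append(policy.fail()).append(' ');
        }
        System.out.println(delays.toString().trim()); // 1 2 4 8 16 32 64 120 120
    }
}
```

The cap matters: without it, a registry outage of a few minutes would push the retry interval far beyond the point where the client notices recovery promptly.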
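1.2. Config fetch request
The client pulls configuration on a schedule, every five minutes by default. This periodic pull is a safety net for when the long connection fails to pick up a change: in the worst case, the client still obtains newly released configuration within five minutes of the release. Traffic to this short-request endpoint is throttled with Google Guava's RateLimiter, which is based on the token bucket algorithm and keeps the code simple. The token generation rate is configurable; here it is two tokens per second, and acquiring a token times out after five seconds (if no token is obtained in time, the code below sleeps another five seconds and then sends the request anyway).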
private void schedulePeriodicRefresh() {
logger.debug("Schedule periodic refresh with interval: {} {}",
m_configUtil.getRefreshInterval(), m_configUtil.getRefreshIntervalTimeUnit());
//Scheduled task: send one request every refresh interval (five minutes by default)
m_executorService.scheduleAtFixedRate(
new Runnable() {
@Override
public void run() {
Tracer.logEvent("Apollo.ConfigService", String.format("periodicRefresh: %s", m_namespace));
logger.debug("refresh config for namespace: {}", m_namespace);
trySync();
Tracer.logEvent("Apollo.Client.Version", Apollo.VERSION);
}
}, m_configUtil.getRefreshInterval(), m_configUtil.getRefreshInterval(),
m_configUtil.getRefreshIntervalTimeUnit());
}
//Fetch the configuration from ConfigService
private ApolloConfig loadApolloConfig() {
if (!m_loadConfigRateLimiter.tryAcquire(5, TimeUnit.SECONDS)) {
//wait at most 5 seconds
try {
TimeUnit.SECONDS.sleep(5);
} catch (InterruptedException e) {
}
}
String appId = m_configUtil.getAppId();
String cluster = m_configUtil.getCluster();
String dataCenter = m_configUtil.getDataCenter();
String secret = m_configUtil.getAccessKeySecret();
Tracer.logEvent("Apollo.Client.ConfigMeta", STRING_JOINER.join(appId, cluster, m_namespace));
int maxRetries = m_configNeedForceRefresh.get() ? 2 : 1;
long onErrorSleepTime = 0; // 0 means no sleep
Throwable exception = null;
List<ServiceDTO> configServices = getConfigServices();
String url = null;
retryLoopLabel:
for (int i = 0; i < maxRetries; i++) {
List<ServiceDTO> randomConfigServices = Lists.newLinkedList(configServices);
Collections.shuffle(randomConfigServices);
//Access the server which notifies the client first
if (m_longPollServiceDto.get() != null) {
randomConfigServices.add(0, m_longPollServiceDto.getAndSet(null));
}
for (ServiceDTO configService : randomConfigServices) {
if (onErrorSleepTime > 0) {
logger.warn( "Load config failed, will retry in {} {}. appId: {}, cluster: {}, namespaces: {}",onErrorSleepTime, m_configUtil.getOnErrorRetryIntervalTimeUnit(), appId, cluster, m_namespace);
try {
m_configUtil.getOnErrorRetryIntervalTimeUnit().sleep(onErrorSleepTime);
} catch (InterruptedException e) {
//ignore
}
}
url = assembleQueryConfigUrl(configService.getHomepageUrl(), appId, cluster, m_namespace, dataCenter, m_remoteMessages.get(), m_configCache.get());
logger.debug("Loading config from {}", url);
HttpRequest request = new HttpRequest(url);
if (!StringUtils.isBlank(secret)) {
Map<String, String> headers = Signature.buildHttpHeaders(url, appId, secret);
request.setHeaders(headers);
}
Transaction transaction = Tracer.newTransaction("Apollo.ConfigService", "queryConfig");
transaction.addData("Url", url);
try {
HttpResponse<ApolloConfig> response = m_httpUtil.doGet(request, ApolloConfig.class);
m_configNeedForceRefresh.set(false);
m_loadConfigFailSchedulePolicy.success();
transaction.addData("StatusCode", response.getStatusCode());
transaction.setStatus(Transaction.SUCCESS);
if (response.getStatusCode() == 304) {
logger.debug("Config server responds with 304 HTTP status code.");
return m_configCache.get();
}
ApolloConfig result = response.getBody();
logger.debug("Loaded config for {}: {}", m_namespace, result);
return result;
} catch (ApolloConfigStatusCodeException ex) {
ApolloConfigStatusCodeException statusCodeException = ex;
//config not found
if (ex.getStatusCode() == 404) {
String message = String.format(
"Could not find config for namespace - appId: %s, cluster: %s, namespace: %s, " +
"please check whether the configs are released in Apollo!",
appId, cluster, m_namespace);
statusCodeException = new ApolloConfigStatusCodeException(ex.getStatusCode(),
message);
}
Tracer.logEvent("ApolloConfigException", ExceptionUtil.getDetailMessage(statusCodeException));
transaction.setStatus(statusCodeException);
exception = statusCodeException;
if(ex.getStatusCode() == 404) {
break retryLoopLabel;
}
} catch (Throwable ex) {
Tracer.logEvent("ApolloConfigException", ExceptionUtil.getDetailMessage(ex));
transaction.setStatus(ex);
exception = ex;
} finally {
transaction.complete();
}
// if force refresh, do normal sleep, if normal config load, do exponential sleep
onErrorSleepTime = m_configNeedForceRefresh.get() ? m_configUtil.getOnErrorRetryInterval() :
m_loadConfigFailSchedulePolicy.fail();
}
}
String message = String.format(
"Load Apollo Config failed - appId: %s, cluster: %s, namespace: %s, url: %s",
appId, cluster, m_namespace, url);
throw new ApolloConfigException(message, exception);
}
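The rate limiting above calls Guava's `RateLimiter.tryAcquire(5, TimeUnit.SECONDS)`. To show the token bucket idea without the Guava dependency, here is a minimal sketch of a plain token bucket with two tokens per second. The class name, the burst capacity parameter, and the injectable clock are assumptions made for illustration; Guava's own smoothing algorithm differs in detail, and this sketch only offers a non-blocking `tryAcquire()`.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.LongSupplier;

class TokenBucketSketch {
    private final long nanosPerPermit;   // time needed to generate one token
    private final long capacity;         // maximum number of stored tokens (burst size)
    private final LongSupplier clock;    // nanosecond clock, injectable for testing
    private long tokens;
    private long lastRefill;

    TokenBucketSketch(long permitsPerSecond, long capacity, LongSupplier clock) {
        this.nanosPerPermit = TimeUnit.SECONDS.toNanos(1) / permitsPerSecond;
        this.capacity = capacity;
        this.clock = clock;
        this.tokens = capacity;          // start with a full bucket
        this.lastRefill = clock.getAsLong();
    }

    /** Adds the tokens generated since the last refill, never exceeding capacity. */
    private void refill() {
        long now = clock.getAsLong();
        long generated = (now - lastRefill) / nanosPerPermit;
        if (generated > 0) {
            tokens = Math.min(capacity, tokens + generated);
            lastRefill += generated * nanosPerPermit; // keep the unfinished partial interval
        }
    }

    /** Takes one token if one is available right now; never blocks. */
    synchronized boolean tryAcquire() {
        refill();
        if (tokens > 0) {
            tokens--;
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        long[] now = {0}; // fake clock in nanoseconds
        TokenBucketSketch bucket = new TokenBucketSketch(2, 2, () -> now[0]);
        System.out.println(bucket.tryAcquire() + " " + bucket.tryAcquire() + " " + bucket.tryAcquire()); // true true false
        now[0] = TimeUnit.MILLISECONDS.toNanos(500); // half a second later: one new token
        System.out.println(bucket.tryAcquire() + " " + bucket.tryAcquire()); // true false
    }
}
```

A bucket absorbs short bursts (up to `capacity` requests at once) while holding the long-run rate to the configured tokens per second, which is the property the client relies on to protect ConfigService.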
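Short requests alone cannot make configuration changes take effect in real time; they must work together with the long connection. The following sections explain how the long connection works and how it cooperates with short requests to apply configuration changes in real time.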
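1.3. How the long connection works
As the name suggests, a long connection is one that the client keeps open after connecting to the server; in Apollo it is realized as HTTP long polling. Each client holds a single long connection, not one per namespace. If you need to switch off the long connection for a particular appId dynamically, you can do so by modifying the code below and having the short request carry an on/off flag. So how exactly does Apollo implement the long connection?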
1.3.1. Client-side logic
private void doLongPollingRefresh(String appId, String cluster, String dataCenter, String secret) {
final Random random = new Random();
ServiceDTO lastServiceDto = null;
//Send requests in a loop; this loop is what implements the client's long connection
while (!m_longPollingStopped.get() && !Thread.currentThread().isInterrupted()) {
if (!m_longPollRateLimiter.tryAcquire(5, TimeUnit.SECONDS)) {
//wait at most 5 seconds
try {
TimeUnit.SECONDS.sleep(5);
} catch (InterruptedException e) {
}
}
Transaction transaction = Tracer.newTransaction("Apollo.ConfigService", "pollNotification");
String url = null;
try {
if (lastServiceDto == null) {
List<ServiceDTO> configServices = getConfigServices();
//Pick a random available node
lastServiceDto = configServices.get(random.nextInt(configServices.size()));
}
//Build the request URL
url = assembleLongPollRefreshUrl(lastServiceDto.getHomepageUrl(), appId, cluster, dataCenter, m_notifications);
logger.debug("Long polling from {}", url);
HttpRequest request = new HttpRequest(url);
request.setReadTimeout(LONG_POLLING_READ_TIMEOUT);
if (!StringUtils.isBlank(secret)) {
Map<String, String> headers = Signature.buildHttpHeaders(url, appId, secret);
request.setHeaders(headers);
}
transaction.addData("Url", url);
//Send the request
final HttpResponse<List<ApolloConfigNotification>> response =
m_httpUtil.doGet(request, m_responseType);
logger.debug("Long polling response: {}, url: {}", response.getStatusCode(), url);
if (response.getStatusCode() == 200 && response.getBody() != null) {
updateNotifications(response.getBody());
updateRemoteNotifications(response.getBody());
transaction.addData("Result", response.getBody().toString());
notify(lastServiceDto, response.getBody());
}
//try to load balance
//304: NOT_MODIFIED(304, "Not Modified"),
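//random.nextBoolean(): a pseudo-uniformly distributed boolean; on 304 the client re-picks a node half the time, so long connections spread evenly across nodes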
if (response.getStatusCode() == 304 && random.nextBoolean()) {
lastServiceDto = null;
}
m_longPollFailSchedulePolicyInSecond.success();
transaction.addData("StatusCode", response.getStatusCode());
transaction.setStatus(Transaction.SUCCESS);
} catch (Throwable ex) {
lastServiceDto = null;
Tracer.logEvent("ApolloConfigException", ExceptionUtil.getDetailMessage(ex));
transaction.setStatus(ex);
long sleepTimeInSecond = m_longPollFailSchedulePolicyInSecond.fail();
logger.warn(
"Long polling failed, will retry in {} seconds. appId: {}, cluster: {}, namespaces: {}, long polling url: {}, reason: {}",
sleepTimeInSecond, appId, cluster, assembleNamespaces(), url, ExceptionUtil.getDetailMessage(ex));
try {
TimeUnit.SECONDS.sleep(sleepTimeInSecond);
} catch (InterruptedException ie) {
//ignore
}
} finally {
transaction.complete();
}
}
}
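As the code shows, the long connection is simply an endless loop of requests. A 200 response means the configuration has changed and leads to the local configuration being updated; a 304 response means the configuration has not changed.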
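To isolate the control flow of `doLongPollingRefresh`, here is a simplified sketch. The HTTP call is replaced by a hypothetical `ConfigNode` interface whose `poll` method returns the newest notificationId (standing in for a 200 response) or -1 (standing in for 304); the loop runs a bounded number of rounds instead of until stopped; and the failure and backoff branch is omitted. None of these names are Apollo's API.

```java
import java.util.List;
import java.util.Random;

class LongPollClientSketch {
    /** Hypothetical stand-in for one ConfigService node's long-polling endpoint. */
    interface ConfigNode {
        /** Returns the newest notificationId ("200"), or -1 for "304 Not Modified". */
        long poll(long knownNotificationId);
    }

    private final List<ConfigNode> nodes;
    private final Random random;
    private long notificationId = -1; // newest notificationId this client has seen
    private int changeCount = 0;      // how many "200" responses were received

    LongPollClientSketch(List<ConfigNode> nodes, long seed) {
        this.nodes = nodes;
        this.random = new Random(seed);
    }

    /** Runs a bounded number of polling rounds instead of looping until stopped. */
    void run(int rounds) {
        ConfigNode lastNode = null;
        for (int i = 0; i < rounds; i++) {
            if (lastNode == null) {
                // Pick a random node, as doLongPollingRefresh does
                lastNode = nodes.get(random.nextInt(nodes.size()));
            }
            long result = lastNode.poll(notificationId);
            if (result >= 0) {
                // "200": record the new id; the real client would now notify the
                // repository, which fetches the configuration with a short request
                notificationId = result;
                changeCount++;
            } else if (random.nextBoolean()) {
                // "304": half the time, pick a random node again next round
                lastNode = null;
            }
        }
    }

    long notificationId() { return notificationId; }
    int changeCount() { return changeCount; }

    public static void main(String[] args) {
        long released = 5; // hypothetical id of the latest release on the server
        ConfigNode node = known -> known < released ? released : -1;
        LongPollClientSketch client = new LongPollClientSketch(List.of(node, node, node), 42L);
        client.run(10);
        System.out.println("notificationId=" + client.notificationId()
                + ", changes=" + client.changeCount()); // notificationId=5, changes=1
    }
}
```

Because the client sends the notificationId it already knows, a node only reports a change once; every later round is a 304 until the next release.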
1.3.2. Server-side logic
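When the Publish button is clicked for a namespace, a row is inserted into the ReleaseMessage table whose message is AppId+cluster+Namespace (for example apollo+default+application). The server-side handling of the long-polling request is as follows: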
DeferredResultWrapper deferredResultWrapper = new DeferredResultWrapper();
Set<String> namespaces = Sets.newHashSet();
Map<String, Long> clientSideNotifications = Maps.newHashMap();
Map<String, ApolloConfigNotification> filteredNotifications = filterNotifications(appId, notifications);
for (Map.Entry<String, ApolloConfigNotification> notificationEntry : filteredNotifications.entrySet()) {
String normalizedNamespace = notificationEntry.getKey();
ApolloConfigNotification notification = notificationEntry.getValue();
namespaces.add(normalizedNamespace);
clientSideNotifications.put(normalizedNamespace, notification.getNotificationId());
if (!Objects.equals(notification.getNamespaceName(), normalizedNamespace)) {
deferredResultWrapper.recordNamespaceNameNormalizedResult(notification.getNamespaceName(), normalizedNamespace);
}
}
if (CollectionUtils.isEmpty(namespaces)) {
throw new BadRequestException("Invalid format of notifications: " + notificationsAsString);
}
Multimap<String, String> watchedKeysMap =
watchKeysUtil.assembleAllWatchKeys(appId, cluster, namespaces, dataCenter);
Set<String> watchedKeys = Sets.newHashSet(watchedKeysMap.values());
//Fetch the latest ReleaseMessage data, then notify the client if its configuration has changed
List<ReleaseMessage> latestReleaseMessages =
releaseMessageService.findLatestReleaseMessagesGroupByMessages(watchedKeys);
/**
* Manually close the entity manager.
* Since for async request, Spring won't do so until the request is finished,
* which is unacceptable since we are doing long polling - means the db connection would be hold
* for a very long time
*/
entityManagerUtil.closeEntityManager();
List<ApolloConfigNotification> newNotifications = getApolloConfigNotifications(namespaces, clientSideNotifications, watchedKeysMap,latestReleaseMessages);
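The last line hands the watch keys and the latest release messages to `getApolloConfigNotifications`. The comparison it performs can be sketched as follows, assuming that for each namespace the newest release-message id over all of its watch keys is compared with the id the client reported, and that -1 marks "nothing seen yet". The class and method names are hypothetical. In this sketch, a non-empty result would be returned to the client immediately, while an empty result means the request has to wait.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class NotificationDiffSketch {
    /**
     * Returns the namespaces whose newest release-message id (over all of that
     * namespace's watch keys) is greater than the id the client last reported.
     */
    static Map<String, Long> changedNamespaces(
            Map<String, Long> clientSideIds,          // namespace -> notificationId reported by the client
            Map<String, List<String>> watchedKeys,    // namespace -> watch keys like "apollo+default+application"
            Map<String, Long> latestReleaseIds) {     // watch key -> newest ReleaseMessage id
        Map<String, Long> changed = new TreeMap<>();
        for (Map.Entry<String, Long> e : clientSideIds.entrySet()) {
            long latest = -1; // -1: no release seen for any of this namespace's keys
            for (String key : watchedKeys.getOrDefault(e.getKey(), List.of())) {
                latest = Math.max(latest, latestReleaseIds.getOrDefault(key, -1L));
            }
            if (latest > e.getValue()) {
                changed.put(e.getKey(), latest); // the client is behind for this namespace
            }
        }
        return changed;
    }

    public static void main(String[] args) {
        Map<String, Long> client = Map.of("application", 10L, "db", 20L);
        Map<String, List<String>> keys = Map.of(
                "application", List.of("apollo+default+application"),
                "db", List.of("apollo+default+db"));
        Map<String, Long> latest = Map.of(
                "apollo+default+application", 12L,
                "apollo+default+db", 20L);
        System.out.println(changedNamespaces(client, keys, latest)); // {application=12}
    }
}
```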
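The key to how the server holds long connections is the DeferredResultWrapper class, a wrapper around Spring's DeferredResult, which supports asynchronous request processing. The DeferredResult flow is:
1. The client sends an asynchronous request.
2. The request reaches ConfigService and is suspended.
3. The server responds to the client in one of two cases:
3.1 A new row is inserted into ReleaseMessage, meaning the configuration has changed. DeferredResult.setResult() is called, the request is woken up and returns its result, and response.getStatusCode() equals 200. The client then sends a short request to fetch the latest configuration.
3.2 The request times out after 60 seconds and returns the preset timeout result, and response.getStatusCode() equals 304.
4. The client receives the response and, inside its while loop, sends the next request.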
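Spring's DeferredResult is not required to see the mechanism at work. The sketch below imitates it with the JDK's `CompletableFuture`: each poll registers a future under a watch key, `onRelease` completes every waiting future with 200, and `completeOnTimeout` (available since Java 9) supplies 304 when nothing happens in time. The class, its method names, and the registry map are illustrative assumptions, not the API of Apollo's DeferredResultWrapper.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeUnit;

class DeferredPollSketch {
    static final int OK = 200;
    static final int NOT_MODIFIED = 304;

    // Hypothetical registry: watch key (e.g. "apollo+default+application") -> suspended requests
    private final Map<String, List<CompletableFuture<Integer>>> waiting = new ConcurrentHashMap<>();

    /** Suspends a poll: the future completes with 200 on a release, or with 304 on timeout. */
    CompletableFuture<Integer> poll(String watchKey, long timeoutMillis) {
        CompletableFuture<Integer> result = new CompletableFuture<>();
        // compute() runs atomically per key, so registration cannot race with onRelease()
        waiting.compute(watchKey, (k, list) -> {
            List<CompletableFuture<Integer>> l = (list == null) ? new ArrayList<>() : list;
            l.add(result);
            return l;
        });
        // Counterpart of the DeferredResult timeout result
        result.completeOnTimeout(NOT_MODIFIED, timeoutMillis, TimeUnit.MILLISECONDS);
        // Unregister the request however it completes
        result.whenComplete((code, error) -> waiting.computeIfPresent(watchKey, (k, list) -> {
            list.remove(result);
            return list.isEmpty() ? null : list;
        }));
        return result;
    }

    /** Called when a new ReleaseMessage for watchKey appears: wakes every waiting request. */
    void onRelease(String watchKey) {
        List<CompletableFuture<Integer>> list = waiting.remove(watchKey);
        if (list != null) {
            list.forEach(f -> f.complete(OK));
        }
    }

    public static void main(String[] args) {
        DeferredPollSketch server = new DeferredPollSketch();
        CompletableFuture<Integer> changed = server.poll("apollo+default+application", 60_000);
        CompletableFuture<Integer> idle = server.poll("apollo+default+other", 100);
        server.onRelease("apollo+default+application");
        System.out.println(changed.join() + " " + idle.join()); // 200 304
    }
}
```

The point of the design is that a suspended request holds no worker thread while it waits, which is also why the server code above closes the entity manager manually before suspending.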
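In summary, the client obtains the latest configuration in real time through this hybrid push-pull approach. While the configuration is unchanged, the long connection costs one HTTP request every 60 seconds; when the configuration changes, the suspended request returns immediately to tell the client, which then sends a short request to fetch the latest configuration.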