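I run a task periodically and, to keep the interval flexible, the next timeout is computed at the end of each task, converted from Instant.now() into a delay in milliseconds, and scheduled with ScheduledExecutorService#schedule.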
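A minimal sketch of the pattern described above, assuming a fixed 10-minute grid. The class name, INTERVAL, delayMillis and scheduleNext are hypothetical names chosen for illustration, and this nextTimeout is a stand-in for the real one, which reads its interval from configuration:

```java
import java.time.Duration;
import java.time.Instant;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

class SelfReschedulingTask {
    // Assumption: a fixed 10-minute grid; the production code derives this from config files.
    static final Duration INTERVAL = Duration.ofMinutes(10);

    /** First multiple of INTERVAL (counted from the epoch) strictly after the given instant. */
    static Instant nextTimeout(Instant from) {
        long step = INTERVAL.toMillis();
        long next = (Math.floorDiv(from.toEpochMilli(), step) + 1) * step;
        return Instant.ofEpochMilli(next);
    }

    /** Delay in milliseconds from now until target, clamped so it is never negative. */
    static long delayMillis(Instant now, Instant target) {
        return Math.max(0L, Duration.between(now, target).toMillis());
    }

    /** Runs the work once at the next grid point, then reschedules itself when the work ends. */
    static void scheduleNext(ScheduledExecutorService executor, Runnable work) {
        Instant target = nextTimeout(Instant.now());
        executor.schedule(() -> {
            try {
                work.run();
            } finally {
                // The next delay is computed only after the task has finished.
                scheduleNext(executor, work);
            }
        }, delayMillis(Instant.now(), target), TimeUnit.MILLISECONDS);
    }

    public static void main(String[] args) {
        // Timestamps taken from the startup log lines in this question.
        Instant now = Instant.parse("2018-06-07T22:08:50.586Z");
        Instant target = nextTimeout(now);
        System.out.println(target);                   // 2018-06-07T22:10:00Z
        System.out.println(delayMillis(now, target)); // 69414
    }
}
```

Calling scheduleNext(Executors.newScheduledThreadPool(1), someRunnable) would start the loop; main only prints the computed values so that it terminates.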
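This code usually works fine (the blue curve on the left), but on other days it does not work so well.
It seems to me that things sometimes go bad at startup (the machine reboots every night), and although the program should and does correct itself, ScheduledExecutorService#schedule does not recover and the scheduled tasks keep running late. A full JVM restart appears to be the only solution.
My initial thought was that this is a bug, and that things can go wrong depending on when the machine starts up. But the following log output suggests that the problem is related to my use of ScheduledExecutorService#schedule: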
// Log time in GMT+2, other times are in GMT
// The following lines are written following system startup (all times are correct)
08 juin 00:08:49.993 [main] WARN com.pgscada.webdyn.Webdyn - Scheduling next webdyn service time. Currently 2018-06-07T22:08:49.993Z, last connection null
08 juin 00:08:50.586 [main] INFO com.pgscada.webdyn.Webdyn - The next data sample at 2018-06-07T22:10:00Z and the next FTP connection at 2018-06-07T22:30:00Z
08 juin 00:08:50.586 [main] WARN com.pgscada.webdyn.Webdyn - Completed webdyn schedule in 9ms, next execution at 2018-06-07T22:10:00Z (in 69414 ms) will run as data-sample
// So we are expecting the next execution to occur at 00:10:00 (or in 69.4 seconds)
// Except that it runs at 00:11:21
08 juin 00:11:21.206 [pool-1-thread-4] INFO com.pgscada.webdyn.Webdyn - Executing Webdyn service, isDataSample=true, isFtpConnection=false, nextTimeout=2018-06-07T22:10:00Z, lastFtpConnection=null
// But that's OK because it should correct itself
08 juin 00:13:04.151 [pool-1-thread-4] WARN com.pgscada.webdyn.Webdyn - Scheduling next webdyn service time. Currently 2018-06-07T22:10:00Z, last connection null
08 juin 00:13:04.167 [pool-1-thread-4] INFO com.pgscada.webdyn.Webdyn - The next data sample at 2018-06-07T22:20:00Z and the next FTP connection at 2018-06-07T22:30:00Z
08 juin 00:13:04.167 [pool-1-thread-4] WARN com.pgscada.webdyn.Webdyn - Completed webdyn schedule in 0ms, next execution at 2018-06-07T22:20:00Z (in 415833 ms) will run as data-sample
// So now we are expecting the next execution to occur at 00:20:00 (or in 415.8 seconds)
// But it runs at 00:28:06
08 juin 00:28:06.145 [pool-1-thread-4] INFO com.pgscada.webdyn.Webdyn - Executing Webdyn service, isDataSample=true, isFtpConnection=false, nextTimeout=2018-06-07T22:20:00Z, lastFtpConnection=null
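To put numbers on how late the two executions in the log above were, here is a small sketch that subtracts the planned time from the actual start time. The class and method names are hypothetical, and the log timestamps (GMT+2) have been converted to UTC by hand:

```java
import java.time.Duration;
import java.time.Instant;

class LatenessCheck {
    /** Milliseconds between the planned execution time and the moment the task actually started. */
    static long latenessMillis(Instant planned, Instant actualStart) {
        return Duration.between(planned, actualStart).toMillis();
    }

    public static void main(String[] args) {
        // First execution: planned 22:10:00Z, logged at 00:11:21.206 GMT+2.
        System.out.println(latenessMillis(
                Instant.parse("2018-06-07T22:10:00Z"),
                Instant.parse("2018-06-07T22:11:21.206Z"))); // 81206

        // Second execution: planned 22:20:00Z, logged at 00:28:06.145 GMT+2.
        System.out.println(latenessMillis(
                Instant.parse("2018-06-07T22:20:00Z"),
                Instant.parse("2018-06-07T22:28:06.145Z"))); // 486145
    }
}
```

So the first run started about 81 seconds late and the second about 486 seconds late.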
Below is the actual production code for the scheduling function.
ScheduledExecutorService EXECUTORS = Executors.newScheduledThreadPool(10);

private void scheduleNextTimeout(Instant currentTime, Instant lastFtpConnection) {
    try {
        log.info("Scheduling next webdyn service time. Currently {}, last connection {}", currentTime, lastFtpConnection);

        // Parse config files first
        getConfigIni().parse();

        long time = System.nanoTime();
        final Instant earliestPossibleTimeout = Instant.now().plusSeconds(5);

        // Next data sample; recompute it if it falls in the past
        Instant nextDataSample = nextTimeout(currentTime);
        if (nextDataSample.isBefore(earliestPossibleTimeout)) {
            final Instant oldTime = nextDataSample;
            nextDataSample = nextTimeout(earliestPossibleTimeout);
            log.warn("Next data sample was calculated to a time in the past '{}', resetting to a future time: {}", oldTime, nextDataSample);
        }

        // Next FTP connection; recompute it if it falls in the past
        Instant nextFtp = nextFtpConnection(currentTime, lastFtpConnection);
        if (nextFtp.isBefore(earliestPossibleTimeout)) {
            final Instant oldTime = nextFtp;
            nextFtp = nextFtpConnection(earliestPossibleTimeout, lastFtpConnection);
            log.warn("Next FTP connection was calculated to a time in the past '{}', resetting to a future time: {}", oldTime, nextFtp);
        }

        final boolean isFtpConnection = !nextDataSample.isBefore(nextFtp);
        final boolean isDataSample = !isFtpConnection || nextDataSample.equals(nextFtp);
        log.info("The next data sample at {} and the next FTP connection at {}", nextDataSample, nextFtp);

        // Schedule whichever event comes first
        final Instant nextTimeout = nextDataSample.isBefore(nextFtp) ? nextDataSample : nextFtp;
        final long millis = Duration.between(Instant.now(), nextTimeout).toMillis();

        EXECUTORS.schedule(() -> {
            log.info("Executing Webdyn service, isDataSample={}, isFtpConnection={}, nextTimeout={}, lastFtpConnection={}",
                    isDataSample, isFtpConnection, nextTimeout, lastFtpConnection);
            long tme = System.nanoTime();
            try {
                connect(isDataSample, isFtpConnection, nextTimeout, lastFtpConnection);
                // Nanoseconds divided by 1,000,000 gives milliseconds
                log.warn("Completed webdyn service in {}ms", (System.nanoTime() - tme) / 1000000);
            } catch (final Throwable ex) {
                log.error("Failed webdyn service after {}ms : {}", (System.nanoTime() - tme) / 1000000, ex.getMessage(), ex);
            } finally {
                // Always reschedule, even if the service call failed
                scheduleNextTimeout(nextTimeout, isFtpConnection ? nextTimeout : lastFtpConnection);
            }
        }, millis, TimeUnit.MILLISECONDS);

        log.warn("Completed webdyn schedule in {}ms, next execution at {} (in {} ms) will run as {}",
                (System.nanoTime() - time) / 1000000, nextTimeout, millis, isFtpConnection ? "ftp-connection" : "data-sample");
    } catch (final Throwable ex) {
        log.error("Fatal error in webdyn schedule : {}", ex.getMessage(), ex);
    }
}
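Solution:
As I mentioned in the comments, the problem here is that a shared, mutable, non-thread-safe resource (the EXECUTORS attribute) is being changed by multiple threads.
It is changed by the main thread at startup, and then by whichever pool thread is used to execute the task.
Note that even if only one thread accesses the shared resource at a time (simply because only one task runs at a time), you still need to think about consistency. That is because, without synchronization, the Java memory model does not guarantee that changes made by one thread ever become visible to other threads, no matter how long they run.
So the solution is to make the method scheduleNextTimeout synchronized, which guarantees that changes do not stay local to the executing thread and are written to main memory.
You could also put a synchronized block (synchronizing on "this") around only the part that accesses the shared resource, but since the system does not seem to be heavily loaded and the rest of the code does not seem to take long, there is no need...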
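As a rough illustration of the two options above (a synchronized method versus a synchronized block on "this"), here is a minimal sketch. The class SynchronizedScheduler and the field lastScheduled are hypothetical and stand in for whatever state the real class shares between the main thread and the pool threads; it shows the locking pattern only and is not the production code:

```java
import java.time.Duration;
import java.time.Instant;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

class SynchronizedScheduler {
    private final ScheduledExecutorService executor = Executors.newScheduledThreadPool(1);

    // Hypothetical shared state, written by the main thread and by pool threads.
    private Instant lastScheduled;

    // Option 1: the whole method holds the lock on `this`, so writes made here
    // are visible to the next thread that acquires the same lock.
    synchronized ScheduledFuture<?> scheduleNextTimeout(Instant nextTimeout, Runnable work) {
        lastScheduled = nextTimeout;
        long millis = Math.max(0L, Duration.between(Instant.now(), nextTimeout).toMillis());
        return executor.schedule(work, millis, TimeUnit.MILLISECONDS);
    }

    // Option 2: lock only around the statements that touch the shared state.
    Instant lastScheduled() {
        synchronized (this) {
            return lastScheduled;
        }
    }

    void shutdown() {
        executor.shutdown();
    }

    public static void main(String[] args) {
        SynchronizedScheduler scheduler = new SynchronizedScheduler();
        scheduler.scheduleNextTimeout(Instant.now().plusMillis(100),
                () -> System.out.println("ran on " + Thread.currentThread().getName()));
        // With default settings, already-scheduled delayed tasks still run after shutdown().
        scheduler.shutdown();
    }
}
```

Both forms use the same monitor, so a read inside the block sees a write made inside the method by another thread.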
When I first ran into this kind of problem, I got the gist of it from this nice short article :)
https://www.cs.umd.edu/~pugh/java/memoryModel/jsr-133-faq.html#jsr133
I'm glad I could help.