1.
16313:20140109:110809.577 resuming IPMI checks on host [10.1.3.41]: connection restored
16337:20140109:113655.574 IPMI item [Current_1] on host [10.1.3.41] failed: first network error, wait for 15 seconds
16313:20140109:113717.981 IPMI item [Current_1] on host [10.1.3.41] failed: another network error, wait for 15 seconds
16313:20140109:113733.014 IPMI item [Inlet_Temp] on host [10.1.3.41] failed: another network error, wait for 15 seconds
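Symptom: data is sometimes retrieved successfully, but at other times this error is reported.
Fix: raise the Timeout value in the server configuration from 3s to 10s.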
2.
16315:20140109:140225.429 cannot send list of active checks to [10.192.0.5]: host [10.192.0.5] not found
16317:20140109:140425.532 cannot send list of active checks to [10.192.0.5]: host [10.192.0.5] not found
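This message appeared in the server log after host 10.192.0.5 was deleted directly from the server, while its agent kept running and kept asking for its active checks.
Stopping the agent on 10.192.0.5 makes it go away.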
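To find every deleted host whose agent is still running, the server log can be scanned for this message. The following is only a rough sketch: the function name `stale_agent_hosts` and the regular expression are my own, written against the two log lines shown above, and other Zabbix versions may word the message differently.

```python
import re

# Pattern derived from the two sample lines above (an assumption, not an
# official log format): capture the address in the first pair of brackets.
NOT_FOUND = re.compile(
    r"cannot send list of active checks to \[([^\]]+)\]: "
    r"host \[[^\]]+\] not found"
)

def stale_agent_hosts(log_text):
    """Return the sorted, de-duplicated addresses that still request active checks."""
    return sorted(set(NOT_FOUND.findall(log_text)))

sample = (
    "16315:20140109:140225.429 cannot send list of active checks to "
    "[10.192.0.5]: host [10.192.0.5] not found\n"
    "16317:20140109:140425.532 cannot send list of active checks to "
    "[10.192.0.5]: host [10.192.0.5] not found\n"
)

print(stale_agent_hosts(sample))
```

Each address printed is a machine where the agent should be stopped (or the host added back to the server).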
========================================================
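The fix given for the first problem turned out to be wrong: the error still appeared after the Timeout change. However, after I reduced the polling interval from 1800s to 60s and watched the log for several tens of minutes, the problem did not recur. The underlying cause is still unclear.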
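Since the cause is still unknown, it may help to tally the IPMI failures from the server log and look at which items fail and how far apart the failures are. Below is a minimal sketch that assumes the log format seen in the first excerpt; `ipmi_failures` and the regular expression are hypothetical helpers of mine, not part of Zabbix.

```python
import re
from collections import Counter
from datetime import datetime

# Pattern derived from the sample lines in problem 1 (an assumption):
# pid:YYYYMMDD:HHMMSS.mmm IPMI item [item] on host [host] failed: first|another network error
LINE = re.compile(
    r"(?P<pid>\d+):(?P<ts>\d{8}:\d{6}\.\d{3}) IPMI item \[(?P<item>[^\]]+)\] "
    r"on host \[(?P<host>[^\]]+)\] failed: (?P<kind>first|another) network error"
)

def ipmi_failures(log_text):
    """Return a list of (timestamp, host, item, kind) tuples, in log order."""
    events = []
    for m in LINE.finditer(log_text):
        ts = datetime.strptime(m.group("ts"), "%Y%m%d:%H%M%S.%f")
        events.append((ts, m.group("host"), m.group("item"), m.group("kind")))
    return events

sample = (
    "16313:20140109:110809.577 resuming IPMI checks on host [10.1.3.41]: connection restored\n"
    "16337:20140109:113655.574 IPMI item [Current_1] on host [10.1.3.41] failed: first network error, wait for 15 seconds\n"
    "16313:20140109:113717.981 IPMI item [Current_1] on host [10.1.3.41] failed: another network error, wait for 15 seconds\n"
    "16313:20140109:113733.014 IPMI item [Inlet_Temp] on host [10.1.3.41] failed: another network error, wait for 15 seconds\n"
)

events = ipmi_failures(sample)

# Failures per item: shows whether one sensor or all of them are affected.
per_item = Counter(item for _, _, item, _ in events)

# Seconds between consecutive failures: shows whether errors come in bursts.
gaps = [(b[0] - a[0]).total_seconds() for a, b in zip(events, events[1:])]

print(per_item)
print(gaps)
```

Running this over the log once with the 1800s interval and once with the 60s interval would at least show whether the failures cluster right after long idle periods, which is one possible explanation but not a confirmed one.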