Today Nagios raised an alert for 172.17.9.76. The agent log on that host showed the following:
(Agent-Handler-3:null) Connected to the server
Lost connection to the server. Dealing with the remai
I then consulted this article:
https://www.server110.com/cloudstack/201404/10553.html
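Restarting the agent and libvirtd services did not clear the error, and rebooting the host made no difference either.
The log shows that the exception is raised when management-server, having connected to cloud-agent, refreshes the VM states. At that point every VM except the vRouter was in the Stopped state, while the vRouter was still marked Running, which pinpoints the problem. For some unknown reason, virsh list on the host does not show the vRouter, yet management-server believes it is Running and tries to refresh its state. The lookup for the vRouter then fails and the exception is thrown. This looks like a bug that needs to be fixed.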
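Solution: delete the vRouter. Before deleting it, set its state to Stopped in the database with the SQL statement "update vm_instance set state = 'Stopped' where vm_type = 'DomainRouter';". Note that this statement has no further filter, so it changes every DomainRouter row.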
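The mismatch described above (the database says a vRouter is Running, but virsh list on the host does not show it) can be checked mechanically. The following is only a rough sketch in Python: the helper names `running_domains` and `stale_running` are hypothetical, `SAMPLE_VIRSH` is made-up output in the usual `virsh list` layout, and the set of names the database reports as Running is assumed to have been collected separately.

```python
def running_domains(virsh_output):
    """Return the domain names found in the text printed by `virsh list`."""
    names = set()
    for line in virsh_output.splitlines():
        parts = line.split()
        # Data rows start with a numeric Id; this skips the header and the dashed rule.
        if len(parts) >= 3 and parts[0].isdigit():
            names.add(parts[1])
    return names


def stale_running(db_running, virsh_output):
    """Names the database marks Running that libvirt does not list."""
    return set(db_running) - running_domains(virsh_output)


# Made-up sample output, for illustration only.
SAMPLE_VIRSH = """\
 Id    Name                           State
----------------------------------------------------
 2     i-150-568-VM                   running
"""

if __name__ == "__main__":
    # Assumed input: instance names whose state is Running in vm_instance.
    db_running = {"i-150-568-VM", "r-46-VM"}
    print(sorted(stale_running(db_running, SAMPLE_VIRSH)))
```

Any name this returns is a candidate for the kind of stale Running record discussed above. On a real host the text would come from running virsh list itself.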
################################################################################################################################
My approach:
Log in to the CloudStack database and query the VMs running on the host. Change the IP address in the query to match the host you are checking.
select c.account_name,a.instance_name,a.display_name,a.state,b.public_ip_address from vm_instance a,host b,account c where a.power_host=b.id and a.account_id=c.id and a.instance_name like "i%" and a.state not in ("Expunging","Destroyed") and b.public_ip_address="172.17.9.56";
The output looks similar to the following (this sample was taken for host 172.17.9.76):
+---------------------+---------------+--------------+---------+-------------------+
| account_name        | instance_name | display_name | state   | public_ip_address |
+---------------------+---------------+--------------+---------+-------------------+
| 8871_9639_3908_8088 | i-150-568-VM  | test2014002  | Running | 172.17.9.76       |
| 6455_1427_2201_7373 | i-162-613-VM  | yaojianedu   | Running | 172.17.9.76       |
| 4562_9860_0757_4566 | i-275-992-VM  | cloudHost01  | Running | 172.17.9.76       |
The following query lists the second-level VRs (virtual routers) that should be running on a host:
mysql> select c.account_name,a.instance_name,a.display_name,a.state,b.public_ip_address from vm_instance a,host b,account c where a.power_host=b.id and a.account_id=c.id and a.instance_name like "r%" and a.state not in ("Expunging","Destroyed") and b.public_ip_address="172.17.9.53";
+---------------------+---------------+--------------+---------+-------------------+
| account_name        | instance_name | display_name | state   | public_ip_address |
+---------------------+---------------+--------------+---------+-------------------+
| 5357_3036_2997_0118 | r-46-VM       | NULL         | Running | 172.17.9.53       |
| 1095_2254_5824_2083 | r-82-VM       | NULL         | Running | 172.17.9.53       |
| 5806_7846_8176_1902 | r-118-VM      | NULL         | Running | 172.17.9.53       |
| admin               | r-279-VM      | NULL         | Running | 172.17.9.53       |
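The two queries above differ only in the instance-name prefix ("i" for user VMs, "r" for routers) and the host IP, so they can be folded into one parameterized query. The sketch below is an illustration under assumptions, not the CloudStack schema: it runs against an in-memory SQLite database with a toy schema containing only the columns the queries use, the id values are invented, and `vms_on_host` and `demo_db` are hypothetical helper names. Against the real MySQL database a MySQL driver would be needed, and drivers such as PyMySQL use `%s` placeholders instead of `?`.

```python
import sqlite3

# Same join and filters as the queries above, with the prefix and IP as parameters.
QUERY = """
SELECT c.account_name, a.instance_name, a.display_name, a.state, b.public_ip_address
FROM vm_instance a
JOIN host b ON a.power_host = b.id
JOIN account c ON a.account_id = c.id
WHERE a.instance_name LIKE ?
  AND a.state NOT IN ('Expunging', 'Destroyed')
  AND b.public_ip_address = ?
"""


def vms_on_host(conn, prefix, host_ip):
    """Rows for instances whose name starts with `prefix` on the given host."""
    return conn.execute(QUERY, (prefix + "%", host_ip)).fetchall()


def demo_db():
    """Toy stand-in for the CloudStack tables; ids and the Destroyed row are invented."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE host (id INTEGER PRIMARY KEY, public_ip_address TEXT);
        CREATE TABLE account (id INTEGER PRIMARY KEY, account_name TEXT);
        CREATE TABLE vm_instance (instance_name TEXT, display_name TEXT,
                                  state TEXT, power_host INTEGER, account_id INTEGER);
        INSERT INTO host VALUES (1, '172.17.9.76'), (2, '172.17.9.53');
        INSERT INTO account VALUES (1, 'admin');
        INSERT INTO vm_instance VALUES
            ('i-150-568-VM', 'test2014002', 'Running', 1, 1),
            ('r-279-VM', NULL, 'Running', 2, 1),
            ('r-300-VM', NULL, 'Destroyed', 2, 1);
    """)
    return conn


if __name__ == "__main__":
    conn = demo_db()
    for row in vms_on_host(conn, "r", "172.17.9.53"):
        print(row)
```

Passing the IP as a parameter avoids editing the SQL text by hand for every host, which is where a mismatch between the query and the host being checked can creep in.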
Start the VRs on the CloudStack agent host, then restart the CloudStack management and agent services. After roughly 15 minutes the alert cleared.