Problem:
After a reboot, the agents showed this state:
Checked the detailed parameters:
Checked the agents table in the neutron database and found that it has no alive column.
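That is expected: alive is not stored. As far as I know, neutron-server derives it on the fly from the agent's heartbeat_timestamp column, treating an agent as down (xxx) once its last heartbeat is older than agent_down_time (75 seconds by default). A minimal sketch of that derivation, assuming rows shaped like (host, binary, heartbeat_timestamp); alive_flags and the hostname network-node are made up for illustration:

```python
from datetime import datetime, timedelta

AGENT_DOWN_TIME = 75  # seconds; the default agent_down_time in neutron.conf


def alive_flags(rows, now, agent_down_time=AGENT_DOWN_TIME):
    """Map (host, binary) to ':-)' or 'xxx' based on heartbeat age.

    rows: iterable of (host, binary, heartbeat_timestamp) tuples, as they
          might be read from the neutron.agents table (assumed layout).
    now:  the server's current UTC time as a naive datetime.
    """
    limit = timedelta(seconds=agent_down_time)
    flags = {}
    for host, binary, heartbeat in rows:
        # Down once the last heartbeat is strictly older than the limit.
        flags[(host, binary)] = 'xxx' if now - heartbeat > limit else ':-)'
    return flags


if __name__ == '__main__':
    now = datetime(2016, 6, 22, 21, 0, 0)
    rows = [
        # Hypothetical rows: a fresh heartbeat and a 5-minute-old one.
        ('compute1', 'neutron-openvswitch-agent', now - timedelta(seconds=20)),
        ('network-node', 'neutron-dhcp-agent', now - timedelta(minutes=5)),
    ]
    for key, flag in alive_flags(rows, now).items():
        print(key, flag)
```

So an agent can be "active (running)" in systemd and still show xxx: all that matters is whether its heartbeat was recorded recently.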
Yet the services themselves are actually active:
----1------
● neutron-l3-agent.service - OpenStack Neutron Layer 3 Agent
Loaded: loaded (/usr/lib/systemd/system/neutron-l3-agent.service; enabled; vendor preset: disabled)
Active: active (running) since Wed 2016-06-22 19:39:34 EDT; 1h 31min ago
Main PID: 2847 (neutron-l3-agen)
CGroup: /system.slice/neutron-l3-agent.service
-----2-----
● neutron-openvswitch-agent.service - OpenStack Neutron Open vSwitch Agent
Loaded: loaded (/usr/lib/systemd/system/neutron-openvswitch-agent.service; enabled; vendor preset: disabled)
Active: active (running) since Wed 2016-06-22 19:39:34 EDT; 1h 30min ago
Main PID: 2846 (neutron-openvsw)
CGroup: /system.slice/neutron-openvswitch-agent.service
----3------
● neutron-dhcp-agent.service - OpenStack Neutron DHCP Agent
Loaded: loaded (/usr/lib/systemd/system/neutron-dhcp-agent.service; enabled; vendor preset: disabled)
Active: active (running) since Wed 2016-06-22 19:39:34 EDT; 2h 12min ago
Main PID: 2848 (neutron-dhcp-ag)
CGroup: /system.slice/neutron-dhcp-agent.service
----4----
● neutron-metadata-agent.service - OpenStack Neutron Metadata Agent
Loaded: loaded (/usr/lib/systemd/system/neutron-metadata-agent.service; enabled; vendor preset: disabled)
Active: active (running) since Wed 2016-06-22 21:18:53 EDT; 34min ago
Main PID: 13505 (neutron-metadat)
CGroup: /system.slice/neutron-metadata-agent.service
---5-----
[root@compute1 ~]# systemctl status neutron-openvswitch-agent.service
● neutron-openvswitch-agent.service - OpenStack Neutron Open vSwitch Agent
Loaded: loaded (/usr/lib/systemd/system/neutron-openvswitch-agent.service; enabled; vendor preset: disabled)
Active: active (running) since Wed 2016-06-22 19:39:13 EDT; 2h 16min ago
Main PID: 1435 (neutron-openvsw)
CGroup: /system.slice/neutron-openvswitch-agent.service
=================
On the compute node, I stopped one service, started it again, and checked its log.
The log contained these messages:
Agent out of sync with plugin!
Agent tunnel out of sync with plugin!
----------------------Stop the service, then start it again----------------
[root@compute1 ~]# systemctl stop neutron-openvswitch-agent.service
[root@compute1 ~]# systemctl status neutron-openvswitch-agent.service
● neutron-openvswitch-agent.service - OpenStack Neutron Open vSwitch Agent
Loaded: loaded (/usr/lib/systemd/system/neutron-openvswitch-agent.service; enabled; vendor preset: disabled)
Active: inactive (dead) since Wed 2016-06-22 22:00:18 EDT; 1min 34s ago
[root@compute1 ~]# systemctl start neutron-openvswitch-agent.service
[root@compute1 ~]# systemctl status neutron-openvswitch-agent.service
● neutron-openvswitch-agent.service - OpenStack Neutron Open vSwitch Agent
Loaded: loaded (/usr/lib/systemd/system/neutron-openvswitch-agent.service; enabled; vendor preset: disabled)
Active: active (running) since Wed 2016-06-22 22:02:05 EDT; 38s ago
---------------log-------------------
[root@compute1 neutron]# tail -f openvswitch-agent.log
2016-06-22 22:02:05.933 17125 INFO neutron.common.config [-] Logging enabled!
2016-06-22 22:02:05.934 17125 INFO neutron.common.config [-] /usr/bin/neutron-openvswitch-agent version 2015.1.2
2016-06-22 22:02:05.943 17125 WARNING oslo_config.cfg [-] Option "lock_path" from group "DEFAULT" is deprecated. Use option "lock_path" from group "oslo_concurrency".
2016-06-22 22:02:06.952 17125 INFO oslo_messaging._drivers.impl_rabbit [-] Connecting to AMQP server on controller0:5672
2016-06-22 22:02:07.015 17125 INFO oslo_messaging._drivers.impl_rabbit [-] Connected to AMQP server on controller0:5672
2016-06-22 22:02:07.036 17125 INFO oslo_messaging._drivers.impl_rabbit [-] Connecting to AMQP server on controller0:5672
2016-06-22 22:02:07.075 17125 INFO oslo_messaging._drivers.impl_rabbit [-] Connected to AMQP server on controller0:5672
2016-06-22 22:02:07.705 17125 INFO oslo_messaging._drivers.impl_rabbit [req-a019d63c-ddf2-4df3-93a4-841fdda04cd7 ] Connecting to AMQP server on controller0:5672
2016-06-22 22:02:07.726 17125 INFO oslo_messaging._drivers.impl_rabbit [req-a019d63c-ddf2-4df3-93a4-841fdda04cd7 ] Connected to AMQP server on controller0:5672
2016-06-22 22:02:07.745 17125 INFO oslo_messaging._drivers.impl_rabbit [req-a019d63c-ddf2-4df3-93a4-841fdda04cd7 ] Connecting to AMQP server on controller0:5672
2016-06-22 22:02:07.763 17125 INFO oslo_messaging._drivers.impl_rabbit [req-a019d63c-ddf2-4df3-93a4-841fdda04cd7 ] Connected to AMQP server on controller0:5672
2016-06-22 22:02:07.778 17125 INFO oslo_messaging._drivers.impl_rabbit [req-a019d63c-ddf2-4df3-93a4-841fdda04cd7 ] Connecting to AMQP server on controller0:5672
2016-06-22 22:02:07.795 17125 INFO oslo_messaging._drivers.impl_rabbit [req-a019d63c-ddf2-4df3-93a4-841fdda04cd7 ] Connected to AMQP server on controller0:5672
2016-06-22 22:02:07.814 17125 INFO oslo_messaging._drivers.impl_rabbit [req-a019d63c-ddf2-4df3-93a4-841fdda04cd7 ] Connecting to AMQP server on controller0:5672
2016-06-22 22:02:07.835 17125 INFO oslo_messaging._drivers.impl_rabbit [req-a019d63c-ddf2-4df3-93a4-841fdda04cd7 ] Connected to AMQP server on controller0:5672
2016-06-22 22:02:07.852 17125 INFO oslo_messaging._drivers.impl_rabbit [req-a019d63c-ddf2-4df3-93a4-841fdda04cd7 ] Connecting to AMQP server on controller0:5672
2016-06-22 22:02:07.872 17125 INFO oslo_messaging._drivers.impl_rabbit [req-a019d63c-ddf2-4df3-93a4-841fdda04cd7 ] Connected to AMQP server on controller0:5672
2016-06-22 22:02:07.890 17125 INFO oslo_messaging._drivers.impl_rabbit [req-a019d63c-ddf2-4df3-93a4-841fdda04cd7 ] Connecting to AMQP server on controller0:5672
2016-06-22 22:02:07.907 17125 INFO oslo_messaging._drivers.impl_rabbit [req-a019d63c-ddf2-4df3-93a4-841fdda04cd7 ] Connected to AMQP server on controller0:5672
2016-06-22 22:02:07.925 17125 INFO oslo_messaging._drivers.impl_rabbit [req-a019d63c-ddf2-4df3-93a4-841fdda04cd7 ] Connecting to AMQP server on controller0:5672
2016-06-22 22:02:07.943 17125 INFO oslo_messaging._drivers.impl_rabbit [req-a019d63c-ddf2-4df3-93a4-841fdda04cd7 ] Connected to AMQP server on controller0:5672
2016-06-22 22:02:07.964 17125 INFO neutron.plugins.openvswitch.agent.ovs_neutron_agent [req-a019d63c-ddf2-4df3-93a4-841fdda04cd7 ] Agent initialized successfully, now running...
2016-06-22 22:02:07.976 17125 INFO neutron.plugins.openvswitch.agent.ovs_neutron_agent [req-a019d63c-ddf2-4df3-93a4-841fdda04cd7 ] Agent out of sync with plugin!
2016-06-22 22:02:08.106 17125 INFO neutron.plugins.openvswitch.agent.ovs_neutron_agent [req-a019d63c-ddf2-4df3-93a4-841fdda04cd7 ] Agent tunnel out of sync with plugin!
------------neutron-dhcp-agent.log-------
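This log shows that the agent could not send its state report: it logged "Failed reporting state!" (right after "Unable to sync network state."), and both tracebacks end in MessagingTimeout.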
2016-06-21 22:55:35.950 13361 INFO neutron.agent.dhcp.agent [-] Synchronizing state complete
2016-06-21 22:55:35.952 13361 INFO oslo_messaging._drivers.impl_rabbit [-] Connecting to AMQP server on controller0:5672
2016-06-21 22:55:35.970 13361 INFO neutron.agent.dhcp.agent [req-12dfeee8-c542-46ad-b3f5-2557c2fcbd2f ] Synchronizing state
2016-06-21 22:55:35.995 13361 INFO oslo_messaging._drivers.impl_rabbit [-] Connected to AMQP server on controller0:5672
2016-06-21 22:55:36.057 13361 INFO neutron.agent.dhcp.agent [-] DHCP agent started
2016-06-21 22:55:36.074 13361 INFO neutron.agent.dhcp.agent [req-12dfeee8-c542-46ad-b3f5-2557c2fcbd2f ] Synchronizing state complete
2016-06-22 19:39:35.733 2848 INFO neutron.common.config [-] Logging enabled!
2016-06-22 19:39:35.756 2848 INFO neutron.common.config [-] /usr/bin/neutron-dhcp-agent version 2015.1.2
2016-06-22 19:39:35.869 2848 WARNING oslo_config.cfg [req-8833acf1-c0ca-4783-bbfc-12ccfe4717d6 ] Option "lock_path" from group "DEFAULT" is deprecated. Use option "lock_path" from group "oslo_concurrency".
2016-06-22 19:39:35.884 2848 INFO oslo_messaging._drivers.impl_rabbit [req-fba865de-cb4e-44ee-9ccd-25ca25b23be9 ] Connecting to AMQP server on controller0:5672
2016-06-22 19:39:35.908 2848 INFO neutron.agent.dhcp.agent [-] Synchronizing state
2016-06-22 19:39:35.945 2848 INFO oslo_messaging._drivers.impl_rabbit [req-fba865de-cb4e-44ee-9ccd-25ca25b23be9 ] Connected to AMQP server on controller0:5672
2016-06-22 19:39:35.957 2848 INFO oslo_messaging._drivers.impl_rabbit [req-fba865de-cb4e-44ee-9ccd-25ca25b23be9 ] Connecting to AMQP server on controller0:5672
2016-06-22 19:39:35.973 2848 INFO oslo_messaging._drivers.impl_rabbit [-] Connecting to AMQP server on controller0:5672
2016-06-22 19:39:36.258 2848 INFO oslo_messaging._drivers.impl_rabbit [-] Connected to AMQP server on controller0:5672
2016-06-22 19:39:36.269 2848 INFO oslo_messaging._drivers.impl_rabbit [req-fba865de-cb4e-44ee-9ccd-25ca25b23be9 ] Connected to AMQP server on controller0:5672
2016-06-22 19:40:36.281 2848 ERROR neutron.agent.dhcp.agent [-] Unable to sync network state.
2016-06-22 19:40:36.281 2848 TRACE neutron.agent.dhcp.agent Traceback (most recent call last):
2016-06-22 19:40:36.281 2848 TRACE neutron.agent.dhcp.agent File "/usr/lib/python2.7/site-packages/neutron/agent/dhcp/agent.py", line 157, in sync_state
2016-06-22 19:40:36.281 2848 TRACE neutron.agent.dhcp.agent active_networks = self.plugin_rpc.get_active_networks_info()
2016-06-22 19:40:36.281 2848 TRACE neutron.agent.dhcp.agent File "/usr/lib/python2.7/site-packages/neutron/agent/dhcp/agent.py", line 417, in get_active_networks_info
2016-06-22 19:40:36.281 2848 TRACE neutron.agent.dhcp.agent host=self.host)
2016-06-22 19:40:36.281 2848 TRACE neutron.agent.dhcp.agent File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/client.py", line 156, in call
2016-06-22 19:40:36.281 2848 TRACE neutron.agent.dhcp.agent retry=self.retry)
2016-06-22 19:40:36.281 2848 TRACE neutron.agent.dhcp.agent File "/usr/lib/python2.7/site-packages/oslo_messaging/transport.py", line 90, in _send
2016-06-22 19:40:36.281 2848 TRACE neutron.agent.dhcp.agent timeout=timeout, retry=retry)
2016-06-22 19:40:36.281 2848 TRACE neutron.agent.dhcp.agent File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 350, in send
2016-06-22 19:40:36.281 2848 TRACE neutron.agent.dhcp.agent retry=retry)
2016-06-22 19:40:36.281 2848 TRACE neutron.agent.dhcp.agent File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 339, in _send
2016-06-22 19:40:36.281 2848 TRACE neutron.agent.dhcp.agent result = self._waiter.wait(msg_id, timeout)
2016-06-22 19:40:36.281 2848 TRACE neutron.agent.dhcp.agent File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 243, in wait
2016-06-22 19:40:36.281 2848 TRACE neutron.agent.dhcp.agent message = self.waiters.get(msg_id, timeout=timeout)
2016-06-22 19:40:36.281 2848 TRACE neutron.agent.dhcp.agent File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 149, in get
2016-06-22 19:40:36.281 2848 TRACE neutron.agent.dhcp.agent 'to message ID %s' % msg_id)
2016-06-22 19:40:36.281 2848 TRACE neutron.agent.dhcp.agent MessagingTimeout: Timed out waiting for a reply to message ID 699644629f1740e8b6013baba374bbc2
2016-06-22 19:40:36.281 2848 TRACE neutron.agent.dhcp.agent
2016-06-22 19:40:36.310 2848 INFO oslo_messaging._drivers.impl_rabbit [-] Connecting to AMQP server on controller0:5672
2016-06-22 19:40:36.320 2848 ERROR neutron.agent.dhcp.agent [req-fba865de-cb4e-44ee-9ccd-25ca25b23be9 ] Failed reporting state!
2016-06-22 19:40:36.320 2848 TRACE neutron.agent.dhcp.agent Traceback (most recent call last):
2016-06-22 19:40:36.320 2848 TRACE neutron.agent.dhcp.agent File "/usr/lib/python2.7/site-packages/neutron/agent/dhcp/agent.py", line 575, in _report_state
2016-06-22 19:40:36.320 2848 TRACE neutron.agent.dhcp.agent self.state_rpc.report_state(ctx, self.agent_state, self.use_call)
2016-06-22 19:40:36.320 2848 TRACE neutron.agent.dhcp.agent File "/usr/lib/python2.7/site-packages/neutron/agent/rpc.py", line 80, in report_state
2016-06-22 19:40:36.320 2848 TRACE neutron.agent.dhcp.agent return method(context, 'report_state', **kwargs)
2016-06-22 19:40:36.320 2848 TRACE neutron.agent.dhcp.agent File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/client.py", line 156, in call
2016-06-22 19:40:36.320 2848 TRACE neutron.agent.dhcp.agent retry=self.retry)
2016-06-22 19:40:36.320 2848 TRACE neutron.agent.dhcp.agent File "/usr/lib/python2.7/site-packages/oslo_messaging/transport.py", line 90, in _send
2016-06-22 19:40:36.320 2848 TRACE neutron.agent.dhcp.agent timeout=timeout, retry=retry)
2016-06-22 19:40:36.320 2848 TRACE neutron.agent.dhcp.agent File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 350, in send
2016-06-22 19:40:36.320 2848 TRACE neutron.agent.dhcp.agent retry=retry)
2016-06-22 19:40:36.320 2848 TRACE neutron.agent.dhcp.agent File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 339, in _send
2016-06-22 19:40:36.320 2848 TRACE neutron.agent.dhcp.agent result = self._waiter.wait(msg_id, timeout)
2016-06-22 19:40:36.320 2848 TRACE neutron.agent.dhcp.agent File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 243, in wait
2016-06-22 19:40:36.320 2848 TRACE neutron.agent.dhcp.agent message = self.waiters.get(msg_id, timeout=timeout)
2016-06-22 19:40:36.320 2848 TRACE neutron.agent.dhcp.agent File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 149, in get
2016-06-22 19:40:36.320 2848 TRACE neutron.agent.dhcp.agent 'to message ID %s' % msg_id)
2016-06-22 19:40:36.320 2848 TRACE neutron.agent.dhcp.agent MessagingTimeout: Timed out waiting for a reply to message ID 27290842d8d241e1b71757fb33c57f65
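Both errors mean the RPC call to neutron-server got no reply within the timeout. In a long log, a small filter helps pull out just the ERROR lines and the timed-out message IDs. A minimal sketch, assuming the line layout shown above (timestamp, PID, level, logger, an optional request-ID bracket, message); summarize_errors is a hypothetical helper, not part of Neutron:

```python
import re

# "<date> <time> <pid> <LEVEL> <logger> [<request id or ->] <message>";
# TRACE lines in the log above have no [...] bracket, so it is optional.
LINE_RE = re.compile(
    r'^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) (?P<pid>\d+) '
    r'(?P<level>[A-Z]+) (?P<logger>\S+) (?:\[[^\]]*\] ?)?(?P<msg>.*)$'
)
TIMEOUT_RE = re.compile(r'MessagingTimeout: .*message ID (\w+)')


def summarize_errors(lines):
    """Return (errors, timeouts).

    errors:   list of (timestamp, logger, message) for each ERROR line.
    timeouts: message IDs from MessagingTimeout lines in the tracebacks.
    """
    errors, timeouts = [], []
    for line in lines:
        m = LINE_RE.match(line.rstrip('\n'))
        if not m:
            continue  # continuation lines such as "Command: ..." or "Stderr: ..."
        if m.group('level') == 'ERROR':
            errors.append((m.group('ts'), m.group('logger'), m.group('msg').strip()))
        elif m.group('level') == 'TRACE':
            t = TIMEOUT_RE.search(m.group('msg'))
            if t:
                timeouts.append(t.group(1))
    return errors, timeouts
```

Usage, assuming the usual RDO log location (adjust the path to your install): with open('/var/log/neutron/dhcp-agent.log') as f: errors, timeouts = summarize_errors(f).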
----Searched Google----
It turns out that agents are out of sync when they first boot up, which is normal behaviour. They do synchronize afterwards, but no message about it is written to the logs.
So when an agent is restarted, the following messages are expected:
Agent tunnel out of sync with plugin!
Agent out of sync with plugin!
--------Found a possible lead via Google-------
In other words, agents report to neutron-server periodically (every report_interval seconds, 30 by default), and if neutron-server receives no report for agent_down_time seconds (75 by default), the agent is shown as XXX. Starting from this point, the suggestion is to check scheduled tasks.
Agents report their own status to neutron-server periodically. If neutron-server does not receive a report within 75 seconds, the agent's alive status becomes xxx, and it changes back to :-) after a new status report is received.
That is: if this happens all the time and causes scheduling problems, look into the load on the servers where the agents run, check the agents' logs, and see whether there are any issues with scheduled tasks.
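The 75 seconds is a deadline, not the reporting period, so the two defaults together decide how many lost reports an agent survives. A small sketch of that arithmetic, assuming reports go out exactly every report_interval seconds and the agent is marked down once the time since the last received report exceeds agent_down_time; the function name is illustrative:

```python
def tolerated_missed_reports(report_interval=30.0, agent_down_time=75.0):
    """Largest number of consecutive lost state reports that still keeps
    an agent alive.

    A negative result means the agent would flap between :-) and xxx
    even with no lost reports at all.
    """
    # Losing k reports in a row leaves a gap of (k + 1) * report_interval;
    # the agent stays alive while that gap is <= agent_down_time.
    return int(agent_down_time // report_interval) - 1


if __name__ == '__main__':
    print(tolerated_missed_reports())
```

With the defaults the result is 1: one lost report is absorbed (a 60-second gap), but two in a row leave a 90-second gap, which is enough to show XXX until the next report arrives.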
----Found some problems in the log; they appear to be caused by a missing qdhcp namespace---
2016-06-23 15:21:39.505 2876 INFO oslo_messaging._drivers.impl_rabbit [-] Connecting to AMQP server on controller0:5672
2016-06-23 15:21:39.535 2876 INFO neutron.agent.dhcp.agent [req-572ca88c-99d5-4ae9-8c06-86b62ee54745 ] Synchronizing state
2016-06-23 15:21:39.540 2876 INFO oslo_messaging._drivers.impl_rabbit [-] Connected to AMQP server on controller0:5672
2016-06-23 15:21:39.586 2876 INFO neutron.agent.dhcp.agent [-] DHCP agent started
2016-06-23 15:21:40.584 2876 ERROR neutron.agent.linux.utils [req-572ca88c-99d5-4ae9-8c06-86b62ee54745 ]
Command: ['sudo', 'neutron-rootwrap', '/etc/neutron/rootwrap.conf', 'ip', 'netns', 'delete', 'qdhcp-60cf7464-09c8-4c4a-ba8d-cd3004970bd6']
Exit code: 1
Stdin:
Stdout:
Stderr: Cannot remove namespace file "/var/run/netns/qdhcp-60cf7464-09c8-4c4a-ba8d-cd3004970bd6": No such file or directory
2016-06-23 15:21:40.584 2876 WARNING neutron.agent.linux.dhcp [req-572ca88c-99d5-4ae9-8c06-86b62ee54745 ] Failed trying to delete namespace: qdhcp-60cf7464-09c8-4c4a-ba8d-cd3004970bd6
2016-06-23 15:21:40.585 2876 INFO neutron.agent.dhcp.agent [req-572ca88c-99d5-4ae9-8c06-86b62ee54745 ] Synchronizing state complete
2016-06-23 15:21:45.586 2876 INFO neutron.agent.dhcp.agent [-] Synchronizing state
2016-06-23 15:21:45.693 2876 INFO neutron.agent.dhcp.agent [-] Synchronizing state complete
2016-06-23 16:50:22.062 2833 INFO neutron.common.config [-] Logging enabled!
2016-06-23 16:50:22.063 2833 INFO neutron.common.config [-] /usr/bin/neutron-dhcp-agent version 2015.1.2
2016-06-23 16:50:22.100 2833 WARNING oslo_config.cfg [req-606d77c7-5a86-4edc-b03e-2440cb627da1 ] Option "lock_path" from group "DEFAULT" is deprecated. Use option "lock_path" from group "oslo_concurrency".
2016-06-23 16:50:22.106 2833 INFO oslo_messaging._drivers.impl_rabbit [req-d6ceb7b0-dd30-45aa-bcb6-32e3d6f04be8 ] Connecting to AMQP server on controller0:5672
2016-06-23 16:50:22.114 2833 INFO neutron.agent.dhcp.agent [-] Synchronizing state
2016-06-23 16:50:22.135 2833 INFO oslo_messaging._drivers.impl_rabbit [req-d6ceb7b0-dd30-45aa-bcb6-32e3d6f04be8 ] Connected to AMQP server on controller0:5672
2016-06-23 16:50:22.155 2833 INFO oslo_messaging._drivers.impl_rabbit [req-d6ceb7b0-dd30-45aa-bcb6-32e3d6f04be8 ] Connecting to AMQP server on controller0:5672
2016-06-23 16:50:22.168 2833 INFO oslo_messaging._drivers.impl_rabbit [-] Connecting to AMQP server on controller0:5672
2016-06-23 16:50:22.208 2833 INFO oslo_messaging._drivers.impl_rabbit [req-d6ceb7b0-dd30-45aa-bcb6-32e3d6f04be8 ] Connected to AMQP server on controller0:5672
2016-06-23 16:50:22.210 2833 INFO oslo_messaging._drivers.impl_rabbit [-] Connected to AMQP server on controller0:5672
2016-06-23 16:50:22.370 2833 INFO neutron.agent.dhcp.agent [-] Synchronizing state complete
2016-06-23 16:50:22.402 2833 INFO oslo_messaging._drivers.impl_rabbit [-] Connecting to AMQP server on controller0:5672
2016-06-23 16:50:22.421 2833 INFO neutron.agent.dhcp.agent [req-d6ceb7b0-dd30-45aa-bcb6-32e3d6f04be8 ] Synchronizing state
2016-06-23 16:50:22.433 2833 INFO oslo_messaging._drivers.impl_rabbit [-] Connected to AMQP server on controller0:5672
2016-06-23 16:50:22.463 2833 INFO neutron.agent.dhcp.agent [-] DHCP agent started
2016-06-23 16:50:22.521 2833 INFO neutron.agent.dhcp.agent [req-d6ceb7b0-dd30-45aa-bcb6-32e3d6f04be8 ] Synchronizing state complete
2016-06-23 16:51:49.764 2833 INFO neutron.openstack.common.service [req-606d77c7-5a86-4edc-b03e-2440cb627da1 ] Caught SIGTERM, exiting
2016-06-23 16:51:49.819 2833 ERROR oslo_messaging._drivers.impl_rabbit [-] Failed to consume message from queue:
2016-06-23 16:51:50.416 3288 INFO neutron.common.config [-] Logging enabled!
2016-06-23 16:51:50.416 3288 INFO neutron.common.config [-] /usr/bin/neutron-dhcp-agent version 2015.1.2
2016-06-23 16:51:50.430 3288 WARNING oslo_config.cfg [req-98f04cfd-f5db-4ff7-957d-9b436ac4a7ad ] Option "lock_path" from group "DEFAULT" is deprecated. Use option "lock_path" from group "oslo_concurrency".
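The ip netns delete failure above only means the agent tried to remove a namespace whose file under /var/run/netns no longer existed; it is logged as a WARNING and synchronization still completes. On the node itself, ip netns list shows which namespaces exist. The same check in Python, a minimal sketch assuming named namespaces are files in /var/run/netns as the error message indicates (qdhcp_namespaces is a hypothetical helper):

```python
import os


def qdhcp_namespaces(netns_dir='/var/run/netns'):
    """Return the network IDs of existing qdhcp-<network id> namespaces."""
    if not os.path.isdir(netns_dir):
        return []  # no named namespaces have been created on this node
    prefix = 'qdhcp-'
    return sorted(name[len(prefix):] for name in os.listdir(netns_dir)
                  if name.startswith(prefix))


if __name__ == '__main__':
    print(qdhcp_namespaces())
```

If 60cf7464-09c8-4c4a-ba8d-cd3004970bd6 is absent from the result, the warning simply reflects that the namespace was already gone when the agent tried to delete it.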
===Final resolution===
Cause:
The clocks on the nodes were not synchronized, which kept the agents from working properly.
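Why would clock skew matter if heartbeats are stamped on the server? One plausible mechanism, based on my reading of Kilo-era Neutron (2015.1.2, the version in these logs) and worth verifying against the installed neutron/db/agents_db.py: each state report carries the agent's own clock time, and the server quietly drops any report stamped earlier than the moment neutron-server started. An agent whose clock lags the controller by more than the server's uptime then never gets a heartbeat recorded, which fits a problem that appears right after a reboot. A minimal model of that assumed check; report_accepted is an illustrative name, not Neutron's API:

```python
from datetime import datetime, timedelta


def report_accepted(server_start, server_now, agent_lag_seconds):
    """Model of the assumed report filter.

    The agent stamps its report with its own clock (server time minus
    its lag); the server drops the report if that stamp is earlier than
    the server's start time.
    """
    agent_time = server_now - timedelta(seconds=agent_lag_seconds)
    return agent_time >= server_start


if __name__ == '__main__':
    start = datetime(2016, 6, 22, 19, 39, 0)
    # 30 minutes after the server starts, an agent lagging by 1 hour is dropped...
    print(report_accepted(start, start + timedelta(minutes=30), 3600))
    # ...and is accepted only once server uptime exceeds that lag.
    print(report_accepted(start, start + timedelta(minutes=61), 3600))
```

Under this model, restarting agents alone does not help; the lag itself has to go, which is what syncing time does.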
[Solution]
Troubleshoot NTP or chrony so that time is synchronized across all nodes:
1. Set up a time server on controller0, and have the network and compute nodes sync with controller0.
2. Restart the neutron agents on the network node.
3. Restart the nova-compute service on the compute node.
4. Check the neutron agent services on the network node.
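To confirm step 1 took effect, chronyc tracking on each node prints the local offset from its NTP source in a line like "System time     : 0.000006523 seconds slow of NTP time" (format as given in the chrony documentation; confirm against your version). A minimal sketch that turns that line into a signed offset and flags nodes beyond a tolerance; the 1-second tolerance and the way outputs are collected are assumptions:

```python
import re

SYSTEM_TIME_RE = re.compile(
    r'^System time\s*:\s*([0-9.]+) seconds (fast|slow) of NTP time'
)


def system_offset(tracking_output):
    """Return the local clock offset in seconds (positive = fast)."""
    for line in tracking_output.splitlines():
        m = SYSTEM_TIME_RE.match(line.strip())
        if m:
            value = float(m.group(1))
            return value if m.group(2) == 'fast' else -value
    raise ValueError('no "System time" line found')


def unsynced_nodes(outputs, tolerance=1.0):
    """outputs: {node name: chronyc tracking text}.

    Returns the nodes whose absolute offset exceeds tolerance seconds,
    sorted by name.
    """
    return sorted(node for node, text in outputs.items()
                  if abs(system_offset(text)) > tolerance)
```

This measures each node against its own source, so it only shows the nodes agree with each other if they all use controller0 as that source, as step 1 sets up.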