SUN M8000 Server: IOU Board Hardware Replacement
I. Fault Symptoms
1. Log analysis
XSCF> showstatus
* IOU#0 Status:Degraded;
XSCF> showlogs -v error
Date: May 24 20:44:22 CST 2019 Code: 80002000-33010000-0167058a00000000
Status: Alarm Occurred: May 24 20:44:20.367 CST 2019
FRU: /IOU#0
Msg: DDC configuration error. DDC wrongly installed(detector=5)
Date: May 25 19:09:03 CST 2019 Code: 80006000-33010000-0167058a00000000
Status: Alarm Occurred: May 25 19:08:38.909 CST 2019
FRU: /IOU#0
Msg: DDC configuration error. DDC wrongly installed(detector=5)
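The showstatus output shows that IOU#0 is in the Degraded state. The IOU carries hard disks, HBA cards, network cards and other components. While IOU#0 is degraded it runs unstably and there is a risk of another unexpected outage; a failure would disrupt the operating system, so the IOU#0 module should be replaced promptly.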
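To make the log analysis above easier to repeat, the following minimal sketch parses showlogs-style text into structured entries and counts alarms per FRU. It assumes the four-line entry layout shown above (Date/Code, Status, FRU, Msg); the function names `parse_error_log` and `count_by_fru` are hypothetical helpers, not part of XSCF.

```python
import re
from collections import Counter

# Sample text in the same layout as the showlogs output above.
SAMPLE_LOG = """\
Date: May 24 20:44:22 CST 2019 Code: 80002000-33010000-0167058a00000000
Status: Alarm Occurred: May 24 20:44:20.367 CST 2019
FRU: /IOU#0
Msg: DDC configuration error. DDC wrongly installed(detector=5)
Date: May 25 19:09:03 CST 2019 Code: 80006000-33010000-0167058a00000000
Status: Alarm Occurred: May 25 19:08:38.909 CST 2019
FRU: /IOU#0
Msg: DDC configuration error. DDC wrongly installed(detector=5)
"""

DATE_LINE = re.compile(r"Date: (.+?) Code: (\S+)$")


def parse_error_log(text):
    """Return one dict per log entry with keys date, code, status, fru, msg."""
    entries = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        match = DATE_LINE.match(line)
        if match:
            # A "Date:" line starts a new entry.
            current = {"date": match.group(1), "code": match.group(2)}
            entries.append(current)
        elif current is not None and ":" in line:
            # Remaining lines are "Key: value"; split on the first colon only.
            key, _, value = line.partition(":")
            current[key.strip().lower()] = value.strip()
    return entries


def count_by_fru(entries):
    """Count how many entries name each FRU."""
    return dict(Counter(e["fru"] for e in entries if "fru" in e))


if __name__ == "__main__":
    for entry in parse_error_log(SAMPLE_LOG):
        print(entry["date"], entry["fru"], entry["msg"])
    print(count_by_fru(parse_error_log(SAMPLE_LOG)))
```

Repeated alarms against the same FRU, as here, support replacing that unit rather than treating the event as transient.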
2. Location of the faulty component
The figure below shows where IOU#0 is installed in the M8000 server (the component in the red box):
II. Procedure
1. Identify the Domain of the faulty unit
The current Domain layout of the M8000 server is shown in the table below:
Domain | CMU          | IOU   | Remarks
0      | CMU#0, CMU#1 | IOU#0 |
1      | CMU#2, CMU#3 | IOU#2 |
The Domain layout shows that the faulty IOU#0 belongs to Domain 0, which is built on CMU#0 and CMU#1.
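The lookup described above can be sketched as a small mapping. The dictionary below simply restates the Domain table; `DOMAINS` and `domain_of` are hypothetical names used for illustration only.

```python
# Domain layout copied from the table above.
DOMAINS = {
    0: {"cmu": ["CMU#0", "CMU#1"], "iou": ["IOU#0"]},
    1: {"cmu": ["CMU#2", "CMU#3"], "iou": ["IOU#2"]},
}


def domain_of(fru):
    """Return the Domain ID that owns the given board, or None if unassigned."""
    for domain_id, boards in DOMAINS.items():
        if fru in boards["cmu"] or fru in boards["iou"]:
            return domain_id
    return None


if __name__ == "__main__":
    print(domain_of("IOU#0"))
```

Knowing the owning Domain up front tells you which operating system instance is affected while the board is out of service.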
2. Confirm the XCP version
XSCF> version -c xcp
XSCF#0 (Active )
XCP0 (Current): 1121
XCP1 (Reserve): 1121
XSCF#1 (Standby)
XCP0 (Current): 1121
XCP1 (Reserve): 1121
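When recording the version, it can be convenient to check mechanically that every bank on both XSCF units reports the same XCP version, as it does in the output above. The sketch below assumes exactly the line layout shown; `parse_xcp_versions` and `all_same_version` are hypothetical helper names.

```python
import re

# Sample text in the same layout as the "version -c xcp" output above.
SAMPLE = """\
XSCF#0 (Active )
XCP0 (Current): 1121
XCP1 (Reserve): 1121
XSCF#1 (Standby)
XCP0 (Current): 1121
XCP1 (Reserve): 1121
"""

UNIT = re.compile(r"^(XSCF#\d+)\s*\(\s*(\w+)\s*\)")
BANK = re.compile(r"^(XCP\d+)\s*\(\s*(\w+)\s*\)\s*:\s*(\S+)")


def parse_xcp_versions(text):
    """Return {unit: {bank role: version}}, e.g. {"XSCF#0": {"Current": "1121"}}."""
    versions = {}
    unit = None
    for raw in text.splitlines():
        line = raw.strip()
        match = UNIT.match(line)
        if match:
            # An "XSCF#n (...)" line starts a new unit section.
            unit = match.group(1)
            versions[unit] = {}
            continue
        match = BANK.match(line)
        if match and unit is not None:
            versions[unit][match.group(2)] = match.group(3)
    return versions


def all_same_version(versions):
    """True when every bank on every unit reports one identical version."""
    seen = {v for banks in versions.values() for v in banks.values()}
    return len(seen) == 1


if __name__ == "__main__":
    parsed = parse_xcp_versions(SAMPLE)
    print(parsed)
    print("consistent:", all_same_version(parsed))
```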
3. Hardware replacement
XSCF> showstatus
* IOU#0 Status:Degraded;
XSCF> replacefru
----------------------------------------------------------------
Maintenance/Replacement Menu
Please select a type of FRU to be replaced.
1. CMU/IOU (CPU Memory Board Unit/IO Unit)
2. FAN (Fan Unit)
3. PSU (Power Supply Unit)
4. XSCFU (Extended System Control Facility Unit)
5. DDC_A (DDC for BP_A)
----------------------------------------------------------------------
Select [1-5|c:cancel] :1
-----------------------------------------------------------------
Maintenance/Replacement Menu
Please select whether to replace a CMU only, an IOU only,
or both a CMU and an IOU.
1. Replace CMU only.
2. Replace IOU only.
3. Replace both CMU and IOU.
----------------------------------------------------------------
Select [1-3|b:back] :2
----------------------------------------------------------------------
Maintenance/Replacement Menu
Please select an IOU to be replaced.
DomainID
No. FRU XSB#0 XSB#1 XSB#2 XSB#3 Power Status
--- ------------- ----------------------- ----- ---------------
1. IOU#0 0 0 0 0 Off Degraded
2. IOU#1 0 0 0 0 Off Not installed
3. IOU#2 1 1 1 1 On Normal
4. IOU#3 1 1 1 1 On Not installed
----------------------------------------------------------------------
Select [1-4|b:back] :1
----------------------------------------------------------------------
Maintenance/Replacement Menu
Status of the selected FRU.
FRU Status
------------- --------
IOU#0 Degraded
----------------------------------------------------------------------
You are about to replace IOU#0.
Do you want to continue?[r:replace|c:cancel] :r
Please confirm the Ready LED is not lit and that the Check LED is
blinking.
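If this is the case, please replace IOU#0.
# Pull out the faulty IOU board, move the hard disks, HBA cards and other components into the new IOU board in the same order, insert the new IOU board into the SUN M8000, then continue with the next step.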
After replacement has been completed, please select[f:finish] :f
To ensure correct operation, diagnostic tests should be run on
IOU#0.[d:diagnose|s:skip] :d
Diagnostic tests for IOU#0 have started.
[This operation may take up to 60 minute(s)]
(progress scale reported in seconds)
0..... 30..... 60..... 90..... 120..... 150..... 180..... 210.....done
----------------------------------------------------------------------
Maintenance/Replacement Menu
Status of the selected FRU.
FRU Status
------------- --------
IOU#0 Normal # IOU board status is normal
----------------------------------------------------------------------
The replacement of IOU#0 has completed normally.[f:finish] :f # component replacement complete
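A natural follow-up is to rerun showstatus and confirm that IOU#0 is no longer flagged. The sketch below is an assumption-laden helper: it only recognizes the "* UNIT Status:STATE;" line format seen earlier in this procedure, and it treats the absence of such lines as healthy, since the output of showstatus on a healthy system is not shown here. `faulted_units` is a hypothetical name.

```python
import re

# Matches lines such as "* IOU#0 Status:Degraded;"
STATUS_LINE = re.compile(r"^\*\s*(\S+)\s+Status:\s*(\w+);")


def faulted_units(showstatus_text):
    """Return (unit, status) pairs for every line flagged with '*'."""
    found = []
    for raw in showstatus_text.splitlines():
        match = STATUS_LINE.match(raw.strip())
        if match:
            found.append((match.group(1), match.group(2)))
    return found


if __name__ == "__main__":
    before = "* IOU#0 Status:Degraded;"
    print(faulted_units(before))
```

An empty result after the replacement is consistent with the Normal status reported by replacefru, but the actual showstatus text should still be read by the engineer.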