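Replacing a Hard Disk on a NetApp Storage System
I. Status Check
Run "disk show -v" to check the status of each disk and which controller (head) owns it. A failed disk is shown with the status "FAILED".
Then run "aggr status -s" to see whether the number of hot spares has dropped. If there is one fewer hot spare than before, a hot spare has already started taking over for the failed disk.
After a disk failure, do not replace the disk immediately. Wait until the hot spare has fully taken over for the failed disk (reconstruction has finished) before swapping it. The system logs can be used to confirm this: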
netapp-db-B>rdfile /etc/messages
netapp-db-B>rdfile /etc/messages.0
netapp-db-B>rdfile /etc/messages.1
netapp-db-B>rdfile /etc/messages.2
netapp-db-B>rdfile /etc/messages.3
netapp-db-B>rdfile /etc/messages.4
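The log will contain an entry similar to the following:
Tue Jan 1 03:51:44 CST [xxzx-netapp-db-B: raid.rg.recons.done:notice]: /aggr0/plex0/rg0: reconstruction completed for 2a.23 in 3:19:43.02
(the timestamp is the time at which reconstruction finished)
Once an entry like this appears, the disk can be replaced.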
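
Scanning several rotated message files by eye is error-prone, so the check above can be sketched as a small script. This is a minimal illustration only: it assumes the log text has already been copied off the filer (for example from saved "rdfile" output), and the function name find_completed_reconstructions is hypothetical. The pattern is modeled on the single sample entry shown above and may need adjusting for other Data ONTAP releases.

```python
import re

# Pattern modeled on the sample "raid.rg.recons.done" entry shown above.
# Assumption: other entries of this event follow the same layout.
RECONS_DONE = re.compile(
    r"raid\.rg\.recons\.done:\w+\]:\s+(?P<raid_group>\S+):\s+"
    r"reconstruction completed for (?P<disk>\S+) in (?P<elapsed>[\d:.]+)"
)


def find_completed_reconstructions(log_text):
    """Return (raid_group, disk, elapsed) for each completed reconstruction."""
    results = []
    for line in log_text.splitlines():
        match = RECONS_DONE.search(line)
        if match:
            results.append(
                (match["raid_group"], match["disk"], match["elapsed"])
            )
    return results


sample = (
    "Tue Jan 1 03:51:44 CST [xxzx-netapp-db-B: raid.rg.recons.done:notice]: "
    "/aggr0/plex0/rg0: reconstruction completed for 2a.23 in 3:19:43.02"
)
print(find_completed_reconstructions(sample))
```

An empty result means no completion entry was found in the text supplied, in which case the disk should stay in place until reconstruction is confirmed some other way.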
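II. Physical Disk Replacement
On the NetApp hardware, pull out the disk whose amber fault LED is lit, wait a few seconds, then insert the new disk. Watch for the LED to change from blinking amber to green.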
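III. System Configuration
- Status confirmation
Run "disk show -v" to check disk ownership. Disks 0a.00.18 and 0a.00.19 are shown as "Not Owned" and "NONE", which means neither disk is owned by any controller.
- System configuration
If there are multiple controllers, disks can be assigned to different controllers as needed. In that case, log in over the serial console to the controller that should own the disk and assign it with the commands below. (In this example there is only one controller and it manages every disk, so the question of which controller to assign to does not arise.)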
1) Enter advanced privilege mode
cdst-netapp> priv set advanced (switch to advanced privilege mode)
Warning: These advanced commands are potentially dangerous; use
them only when directed to do so by NetApp
personnel.
cdst-netapp*> (the prompt now carries a "*")
2) Assign the disks
cdst-netapp> disk assign 0a.00.19 (assign disk 0a.00.19)
Sat Jul 16 10:23:21 CST [cdst-netapp:diskown.changingOwner:info]: changing ownership for disk 0a.00.18 (S/N LXWH5A0M) from unowned (ID 4294967295) to cdst-netapp (ID 2014870888)
cdst-netapp> Sat Jul 16 10:23:21 CST [cdst-netapp:raid.assim.disk.nolabels:error]: Disk 0a.00.18 Shelf 0 Bay 18 [NETAPP X412_HVIPC560A15 NA01] S/N [LXWH5A0M] has no valid labels. It will be taken out of service to prevent possible data loss.
Sat Jul 16 10:23:21 CST [cdst-netapp:raid.config.disk.bad.label:error]: Disk 0a.00.18 Shelf 0 Bay 18 [NETAPP X412_HVIPC560A15 NA01] S/N [LXWH5A0M] has bad label.
Sat Jul 16 10:23:21 CST [cdst-netapp:callhome.dsk.label:CRITICAL]: Call home for DISK BAD LABEL
Sat Jul 16 10:23:21 CST [cdst-netapp:sfu.firmwareUpToDate:info]: Firmware is up-to-date on all disk shelves.
At this point disk "0a.00.19" has been assigned to the controller, while "0a.00.18" has not yet been assigned.
cdst-netapp> disk assign 0a.00.18
Sat Jul 16 10:23:21 CST [cdst-netapp:diskown.changingOwner:info]: changing ownership for disk 0a.00.18 (S/N LXWH5A0M) from unowned (ID 4294967295) to cdst-netapp (ID 2014870888)
cdst-netapp> Sat Jul 16 10:23:21 CST [cdst-netapp:raid.assim.disk.nolabels:error]: Disk 0a.00.18 Shelf 0 Bay 18 [NETAPP X412_HVIPC560A15 NA01] S/N [LXWH5A0M] has no valid labels. It will be taken out of service to prevent possible data loss.
Sat Jul 16 10:23:21 CST [cdst-netapp:raid.config.disk.bad.label:error]: Disk 0a.00.18 Shelf 0 Bay 18 [NETAPP X412_HVIPC560A15 NA01] S/N [LXWH5A0M] has bad label.
Sat Jul 16 10:23:21 CST [cdst-netapp:callhome.dsk.label:CRITICAL]: Call home for DISK BAD LABEL
Sat Jul 16 10:23:21 CST [cdst-netapp:sfu.firmwareUpToDate:info]: Firmware is up-to-date on all disk shelves.
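3) Check the disk labels
Run "sysconfig -r" to view the state of each RAID group, including the hot spares:
The two newly installed disks are flagged "bad label" and need to be converted to hot spares.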
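
The bad-label condition also shows up in the console messages printed during disk assignment (the "raid.config.disk.bad.label" event). As a rough sketch, the affected disks could be pulled out of captured console text as follows. The function name disks_with_bad_label is hypothetical, and the pattern is based only on the message layout seen in this procedure.

```python
import re

# Pattern modeled on the "raid.config.disk.bad.label" message shown in step 2.
# Assumption: the disk name follows "Disk " and the serial follows "S/N [".
BAD_LABEL = re.compile(
    r"raid\.config\.disk\.bad\.label:\w+\]:\s+Disk (?P<disk>\S+) "
    r".*?S/N \[(?P<serial>[^\]]+)\] has bad label"
)


def disks_with_bad_label(log_text):
    """Map each disk reported with a bad label to its serial number."""
    found = {}
    for line in log_text.splitlines():
        match = BAD_LABEL.search(line)
        if match:
            found[match["disk"]] = match["serial"]
    return found


sample = (
    "Sat Jul 16 10:23:21 CST [cdst-netapp:raid.config.disk.bad.label:error]: "
    "Disk 0a.00.18 Shelf 0 Bay 18 [NETAPP X412_HVIPC560A15 NA01] "
    "S/N [LXWH5A0M] has bad label."
)
print(disks_with_bad_label(sample))
```

Each disk in the returned mapping is a candidate for the "disk unfail -s" step that follows.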
4) Convert the disks to hot spares
cdst-netapp> disk unfail -s 0a.00.18 (convert disk 0a.00.18 to a hot spare)
disk unfail: unfailing disk 0a.00.18...
cdst-netapp> Sat Jul 16 10:37:56 CST [cdst-netapp:raid.disk.unfail.done:info]: Disk 0a.00.18 Shelf 0 Bay 18 [NETAPP X412_HVIPC560A15 NA01] S/N [LXWH5A0M] unfailed, and is now a spare
Sat Jul 16 10:38:05 CST [cdst-netapp:raid.disk.offline:notice]: Marking Disk 0a.00.18 Shelf 0 Bay 18 [NETAPP X412_HVIPC560A15 NA01] S/N [LXWH5A0M] offline.
Sat Jul 16 10:38:05 CST [cdst-netapp:bdfu.selected:info]: Disk 0a.00.18 [NETAPP X412_HVIPC560A15 NA01] S/N [LXWH5A0M] selected for background disk firmware update.
Sat Jul 16 10:38:05 CST [cdst-netapp:dfu.firmwareDownloading:info]: Now downloading firmware file /etc/disk_fw/X412_HVIPC560A15.NA02.LOD on 1 disk(s) of plex [Pool0]...
Sat Jul 16 10:38:21 CST [cdst-netapp:raid.disk.online:notice]: Onlining Disk 0a.00.18 Shelf 0 Bay 18 [NETAPP X412_HVIPC560A15 NA01] S/N [LXWH5A0M].
Sat Jul 16 10:38:21 CST [cdst-netapp:sfu.firmwareUpToDate:info]: Firmware is up-to-date on all disk shelves.
cdst-netapp> disk unfail -s 0a.00.19 (convert disk 0a.00.19 to a hot spare)
disk unfail: unfailing disk 0a.00.18...
cdst-netapp> Sat Jul 16 10:37:56 CST [cdst-netapp:raid.disk.unfail.done:info]: Disk 0a.00.18 Shelf 0 Bay 18 [NETAPP X412_HVIPC560A15 NA01] S/N [LXWH5A0M] unfailed, and is now a spare
Sat Jul 16 10:38:05 CST [cdst-netapp:raid.disk.offline:notice]: Marking Disk 0a.00.18 Shelf 0 Bay 18 [NETAPP X412_HVIPC560A15 NA01] S/N [LXWH5A0M] offline.
Sat Jul 16 10:38:05 CST [cdst-netapp:bdfu.selected:info]: Disk 0a.00.18 [NETAPP X412_HVIPC560A15 NA01] S/N [LXWH5A0M] selected for background disk firmware update.
Sat Jul 16 10:38:05 CST [cdst-netapp:dfu.firmwareDownloading:info]: Now downloading firmware file /etc/disk_fw/X412_HVIPC560A15.NA02.LOD on 1 disk(s) of plex [Pool0]...
Sat Jul 16 10:38:21 CST [cdst-netapp:raid.disk.online:notice]: Onlining Disk 0a.00.18 Shelf 0 Bay 18 [NETAPP X412_HVIPC560A15 NA01] S/N [LXWH5A0M].
Sat Jul 16 10:38:21 CST [cdst-netapp:sfu.firmwareUpToDate:info]: Firmware is up-to-date on all disk shelves.
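Both disks have now been converted to hot spares, but their state is "not zeroed", so they still need to be zeroed.
5) Zero the hot spares
Run "disk zero spares". Any hot spare that has not yet been zeroed will start zeroing, and progress can be monitored with "sysconfig -r".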
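
Before moving on, it can help to confirm that every disk passed to "disk unfail -s" actually produced a "raid.disk.unfail.done" message. The sketch below is only an illustration: it assumes the console output has been captured as text, and the helper name pending_spares is hypothetical.

```python
import re

# Pattern modeled on the "raid.disk.unfail.done" message shown in step 4.
UNFAIL_DONE = re.compile(
    r"raid\.disk\.unfail\.done:\w+\]:\s+Disk (?P<disk>\S+) "
    r".*unfailed, and is now a spare"
)


def pending_spares(log_text, expected_disks):
    """Return the expected disks with no 'unfail done' message in the log."""
    done = {match["disk"] for match in UNFAIL_DONE.finditer(log_text)}
    return sorted(set(expected_disks) - done)


sample = (
    "Sat Jul 16 10:37:56 CST [cdst-netapp:raid.disk.unfail.done:info]: "
    "Disk 0a.00.18 Shelf 0 Bay 18 [NETAPP X412_HVIPC560A15 NA01] "
    "S/N [LXWH5A0M] unfailed, and is now a spare"
)
print(pending_spares(sample, ["0a.00.18", "0a.00.19"]))
```

Any disk name returned here has no confirmation in the captured text and is worth checking again with "sysconfig -r".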
cdst-netapp*> disk zero spares