Locating disk failures with smartctl

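smartctl is the command-line tool for S.M.A.R.T. (Self-Monitoring, Analysis and Reporting Technology). It is used to inspect and test disk hardware: it can print the SMART self-test and error logs, enable or disable SMART automatic testing, and start device self-tests. In server environments, disks are usually attached through a RAID card. If the card is configured in pass-through mode, smartctl can query the disks directly; otherwise the query has to go through the interface that matches the RAID card.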

Querying disks in non-pass-through mode with smartctl

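  • smartctl --scan lists each disk's name, type, and interface.
  • smartctl -H -d megaraid,8 /dev/bus/0 queries the health of one disk behind the RAID card, using the device type and ID reported by --scan. The -d argument differs between RAID card models.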
[root@centos ~]# smartctl --scan
/dev/sda -d scsi # /dev/sda, SCSI device
/dev/sdb -d scsi # /dev/sdb, SCSI device
/dev/sdc -d scsi # /dev/sdc, SCSI device
/dev/sdd -d scsi # /dev/sdd, SCSI device
/dev/sde -d scsi # /dev/sde, SCSI device
/dev/sdf -d scsi # /dev/sdf, SCSI device
/dev/sdg -d scsi # /dev/sdg, SCSI device
/dev/bus/0 -d megaraid,1 # /dev/bus/0 [megaraid_disk_01], SCSI device
/dev/bus/0 -d megaraid,2 # /dev/bus/0 [megaraid_disk_02], SCSI device
/dev/bus/0 -d megaraid,3 # /dev/bus/0 [megaraid_disk_03], SCSI device
/dev/bus/0 -d megaraid,4 # /dev/bus/0 [megaraid_disk_04], SCSI device
/dev/bus/0 -d megaraid,5 # /dev/bus/0 [megaraid_disk_05], SCSI device
/dev/bus/0 -d megaraid,6 # /dev/bus/0 [megaraid_disk_06], SCSI device
/dev/bus/0 -d megaraid,7 # /dev/bus/0 [megaraid_disk_07], SCSI device
/dev/bus/0 -d megaraid,8 # /dev/bus/0 [megaraid_disk_08], SCSI device

[root@centos ~]# smartctl -H -d megaraid,8 /dev/bus/0
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.36.2.el7.x86_64] (local build)
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART Health Status: OK
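
The two steps above can be chained: parse the `smartctl --scan` output, then run the health check on every disk behind the RAID card. The following is a minimal sketch, assuming the scan output has the format shown above. The function names `list_megaraid_disks` and `check_all_megaraid` are hypothetical helpers, not part of smartmontools.

```shell
#!/bin/sh
# list_megaraid_disks: read `smartctl --scan` output on stdin and print
# one "<device> <type>" pair per disk behind a MegaRAID controller,
# e.g. "/dev/bus/0 megaraid,8". Other device types are skipped.
list_megaraid_disks() {
    awk '$2 == "-d" && $3 ~ /^megaraid,[0-9]+$/ { print $1, $3 }'
}

# check_all_megaraid: run the health check on every disk found by the scan.
# Needs root privileges and a controller that smartctl reports as megaraid.
check_all_megaraid() {
    smartctl --scan | list_megaraid_disks | while read -r dev dtype; do
        echo "== $dev ($dtype) =="
        smartctl -H -d "$dtype" "$dev"
    done
}
```

Calling `check_all_megaraid` as root prints one health report per disk, so no disk ID has to be typed by hand.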

Detecting disk errors with smartctl

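  • The disk below has been flagged as failing. It is still usable at this point, but some tracks are damaged, which lowers I/O throughput and makes it fluctuate heavily.
[root@centos ~]# smartctl -H -d megaraid,37 /dev/bus/15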
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1127.el7.x86_64] (local build)
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: FAILED!
Drive failure expected in less than 24 hours. SAVE ALL DATA.
Failed Attributes:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   001   001   016    Pre-fail  Always   FAILING_NOW 4294967295
  2 Throughput_Performance  0x0005   001   001   054    Pre-fail  Offline  FAILING_NOW 18967
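
The two reports in this article word the health result differently: the SCSI/SAS disk prints `SMART Health Status: OK`, while the ATA/SATA disk prints `SMART overall-health self-assessment test result: ...`. A monitoring script has to handle both. Below is a small sketch, assuming only these two wordings occur; `smart_health_state` is a hypothetical helper name.

```shell
# smart_health_state: read `smartctl -H` output on stdin and print
# OK, FAILED, or UNKNOWN. Handles the two wordings shown in this article:
#   SCSI/SAS: "SMART Health Status: OK"
#   ATA/SATA: "SMART overall-health self-assessment test result: PASSED"
smart_health_state() {
    awk '
        /^SMART Health Status:/ {
            state = ($4 == "OK") ? "OK" : "FAILED"
        }
        /^SMART overall-health self-assessment test result:/ {
            state = ($NF == "PASSED") ? "OK" : "FAILED"
        }
        END { print (state == "" ? "UNKNOWN" : state) }
    '
}
```

For example, `smartctl -H -d megaraid,37 /dev/bus/15 | smart_health_state` would print `FAILED` for the disk shown above.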

Disabling the disk cache with smartctl

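  • By default the disk cache is enabled on SATA disks and disabled on SAS disks. With the disk cache enabled, a power loss or a forced power cycle can lose data and leave the system unable to boot.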
# Query the write-cache setting of each physical disk (megaraid IDs 9 to 26 on this host)
for i in $(seq 9 26); do smartctl -g wcache -d megaraid,${i} /dev/bus/0; done
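
The loop above only reads the write-cache setting (`-g wcache`). smartctl also has a matching `-s wcache,off` option to change it. The sketch below assumes the same kind of MegaRAID disk IDs on `/dev/bus/0`; `wcache_cmds` is a hypothetical helper that only prints the commands so they can be reviewed before anything is changed. Whether the setting survives a power cycle depends on the drive.

```shell
# wcache_cmds FIRST LAST [BUS]: print one smartctl command per disk ID
# that would switch the drive's write cache off.
# Nothing is executed here; pipe the output to `sh` to apply it.
wcache_cmds() {
    first=$1 last=$2 bus=${3:-/dev/bus/0}
    i=$first
    while [ "$i" -le "$last" ]; do
        echo "smartctl -s wcache,off -d megaraid,$i $bus"
        i=$((i + 1))
    done
}
```

Running `wcache_cmds 9 26` lists the commands for the disks queried above, and `wcache_cmds 9 26 | sh` applies them.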

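# Disable the cache on the RAID virtual drives: pdcache=Off turns off the physical disk cache, wrcache=WT sets the RAID write cache to write-through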
for i in `seq 0 10`; do /opt/MegaRAID/storcli/storcli64 /c0/v${i} set pdcache=Off;done
for i in `seq 0 10`; do /opt/MegaRAID/storcli/storcli64 /c0/v${i} set wrcache=WT;done

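# Check the RAID cache state. The Cache column shows the RAID card cache: WT (write-through) means the write cache is off, WB (write-back) means it is on, AWB (always write-back) means it is always on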
/opt/MegaRAID/storcli/storcli64 /c0 show
---------------------------------------------------------------
DG/VD TYPE  State Access Consist Cache Cac sCC       Size Name
---------------------------------------------------------------
1/0   RAID0 Optl  RW     Yes     RWTD  -   ON    1.090 TB
2/1   RAID0 Optl  RW     Yes     RWTD  -   ON    1.090 TB
0/2   RAID1 Optl  RW     Yes     RWTD  -   ON  446.625 GB
3/3   RAID0 Optl  RW     Yes     RWTD  -   ON    1.090 TB
4/4   RAID0 Optl  RW     Yes     RWTD  -   ON    1.090 TB
5/5   RAID0 Optl  RW     Yes     RWTD  -   ON    1.090 TB
6/6   RAID0 Optl  RW     Yes     RWTD  -   ON    1.090 TB
7/7   RAID0 Optl  RW     Yes     RWTD  -   ON    1.090 TB
8/8   RAID0 Optl  RW     Yes     RWTD  -   ON    1.090 TB
9/9   RAID0 Optl  RW     Yes     RWTD  -   ON    1.090 TB
10/10 RAID0 Optl  RW     Yes     RWTD  -   ON    1.090 TB
---------------------------------------------------------------
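
With many virtual drives it is easy to miss one that is still in write-back mode. The following sketch filters the table above for rows whose Cache column does not contain WT. It assumes the column layout shown here (Cache is the sixth field); `vds_not_write_through` is a hypothetical helper name.

```shell
# vds_not_write_through: read `storcli64 /c0 show` output on stdin and print
# "<DG/VD> <Cache>" for every virtual drive whose Cache column does not
# contain WT (write-through), i.e. the RAID write cache is still enabled.
vds_not_write_through() {
    awk '$1 ~ /^[0-9]+\/[0-9]+$/ && $2 ~ /^RAID/ && $6 !~ /WT/ { print $1, $6 }'
}
```

`/opt/MegaRAID/storcli/storcli64 /c0 show | vds_not_write_through` prints nothing for the table above, because every virtual drive is already RWTD.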

Lighting the server identify LED

  • ipmitool chassis identify 30 turns on the server's identify LED. The default interval is 15 s; pass the number of seconds you need, here 30.
ipmitool chassis identify 30
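
According to the ipmitool man page, the interval argument can also be `0` to turn the LED off or `force` to keep it on indefinitely. A small sketch that validates the argument before building the command; `identify_cmd` is a hypothetical helper and only prints the command it would run.

```shell
# identify_cmd [SECONDS|force]: print the ipmitool command for the given
# interval. With no argument, ipmitool's default interval applies.
# Returns 1 (and prints nothing) for anything that is not a non-negative
# integer or the word "force".
identify_cmd() {
    case "${1-}" in
        "")        echo "ipmitool chassis identify" ;;
        force)     echo "ipmitool chassis identify force" ;;
        *[!0-9]*)  return 1 ;;
        *)         echo "ipmitool chassis identify $1" ;;
    esac
}
```

For example, `identify_cmd 30 | sh` lights the LED for 30 seconds, while `identify_cmd 3x` is rejected.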

Lighting a disk LED

# Find the RAID card's enclosure device (the "enclosu" row, /dev/sg0 in this output)
[root@centos-211 ~]# lsscsi -gt
[0:0:0:0]    enclosu                                 -          /dev/sg0
[0:0:3:0]    disk                                    /dev/sda   /dev/sg1
[0:0:4:0]    disk                                    /dev/sdb   /dev/sg2
[0:0:5:0]    disk                                    /dev/sdc   /dev/sg3
[0:0:6:0]    disk                                    /dev/sdd   /dev/sg4
[0:0:7:0]    disk                                    /dev/sde   /dev/sg5
[0:0:8:0]    disk                                    /dev/sdf   /dev/sg6
[0:0:9:0]    disk                                    /dev/sdg   /dev/sg7
[0:0:10:0]   disk                                    /dev/sdh   /dev/sg8
[0:0:11:0]   disk                                    /dev/sdi   /dev/sg9
[0:0:12:0]   disk                                    /dev/sdj   /dev/sg10
[0:0:13:0]   disk                                    /dev/sdk   /dev/sg11
[0:0:14:0]   disk                                    /dev/sdl   /dev/sg12
[0:2:0:0]    disk                                    /dev/sdm   /dev/sg13
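
The enclosure device can also be picked out of the listing automatically rather than read by eye. A minimal sketch, assuming the `lsscsi -gt` layout shown above, where the SCSI generic device is the last column; `enclosure_sg_devices` is a hypothetical helper name.

```shell
# enclosure_sg_devices: read `lsscsi -gt` output on stdin and print the
# SCSI generic device (last column) of every enclosure row.
# lsscsi truncates the type name, so match on the "enclosu" prefix.
enclosure_sg_devices() {
    awk '$2 ~ /^enclosu/ { print $NF }'
}
```

`lsscsi -gt | enclosure_sg_devices` prints `/dev/sg0` for the listing above; that path is what the sg_ses commands expect as their device argument.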

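# Read the ident (locate LED) state of slot index 1 on the enclosure. /dev/sg24 is the enclosure device on the host where this was run; use the device reported by lsscsi -gt on your host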
sg_ses --index=1 --get ident /dev/sg24

# Turn on the disk's locate LED
sg_ses --index=1 --set ident /dev/sg24

# List the supported field names (acronyms such as ident)
sg_ses -ee