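Locating disk faults with smartctl
smartctl (S.M.A.R.T.: Self-Monitoring, Analysis and Reporting Technology) is a tool for inspecting and testing disk hardware. It can print the SMART self-test and error logs, enable or disable SMART automatic testing, and start device self-tests. In server environments, disks are usually attached through a RAID controller. If the controller is configured in passthrough mode, smartctl can query the disks directly; otherwise the query has to go through the interface for that RAID controller.
Querying disks in non-passthrough mode with smartctl
- smartctl --scan lists each disk's device name, type, and interface.
- smartctl -H -d megaraid,8 /dev/bus/0 queries the health of one disk, using the device path and type reported by --scan. The -d argument depends on the RAID controller model, so the exact form differs between controllers.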
[root@centos ~]# smartctl --scan
/dev/sda -d scsi # /dev/sda, SCSI device
/dev/sdb -d scsi # /dev/sdb, SCSI device
/dev/sdc -d scsi # /dev/sdc, SCSI device
/dev/sdd -d scsi # /dev/sdd, SCSI device
/dev/sde -d scsi # /dev/sde, SCSI device
/dev/sdf -d scsi # /dev/sdf, SCSI device
/dev/sdg -d scsi # /dev/sdg, SCSI device
/dev/bus/0 -d megaraid,1 # /dev/bus/0 [megaraid_disk_01], SCSI device
/dev/bus/0 -d megaraid,2 # /dev/bus/0 [megaraid_disk_02], SCSI device
/dev/bus/0 -d megaraid,3 # /dev/bus/0 [megaraid_disk_03], SCSI device
/dev/bus/0 -d megaraid,4 # /dev/bus/0 [megaraid_disk_04], SCSI device
/dev/bus/0 -d megaraid,5 # /dev/bus/0 [megaraid_disk_05], SCSI device
/dev/bus/0 -d megaraid,6 # /dev/bus/0 [megaraid_disk_06], SCSI device
/dev/bus/0 -d megaraid,7 # /dev/bus/0 [megaraid_disk_07], SCSI device
/dev/bus/0 -d megaraid,8 # /dev/bus/0 [megaraid_disk_08], SCSI device
# smartctl -H -d megaraid,8 /dev/bus/0
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.36.2.el7.x86_64] (local build)
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF READ SMART DATA SECTION ===
SMART Health Status: OK
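The two steps above (scan, then query each disk behind the controller) can be combined in a loop. The following is a minimal sketch, assuming the scan output has the same layout as the listing above; the function names list_megaraid_targets and check_all_megaraid are hypothetical helpers, not part of smartmontools.

```shell
# Read `smartctl --scan` output on stdin and print "device type" pairs,
# keeping only the entries that sit behind a MegaRAID controller.
# Assumes each line looks like: /dev/bus/0 -d megaraid,8 # comment
list_megaraid_targets() {
    awk '$2 == "-d" && $3 ~ /^megaraid,[0-9]+$/ { print $1, $3 }'
}

# Hypothetical wrapper: run a health check on every MegaRAID target.
# Needs root and a host that actually has such a controller.
check_all_megaraid() {
    smartctl --scan | list_megaraid_targets | while read -r dev type; do
        echo "== $dev ($type)"
        smartctl -H -d "$type" "$dev"
    done
}
```

Running check_all_megaraid as root prints one health report per physical disk. Entries of type scsi, such as /dev/sda, are skipped by the filter because they are not addressed through the controller.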
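Detecting disk errors with smartctl
- When a disk is reported as failing it is still usable, but some of its tracks have gone bad, which lowers I/O throughput and makes it fluctuate heavily.
[root@centos ~]# smartctl -H -d megaraid,37 /dev/bus/15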
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1127.el7.x86_64] (local build)
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: FAILED!
Drive failure expected in less than 24 hours. SAVE ALL DATA.
Failed Attributes:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x000b 001 001 016 Pre-fail Always FAILING_NOW 4294967295
2 Throughput_Performance 0x0005 001 001 054 Pre-fail Offline FAILING_NOW 18967
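For monitoring, the health verdict can be extracted from the report text. The sketch below assumes the two verdict lines shown in this document ("SMART Health Status:" for SCSI/SAS disks and "SMART overall-health self-assessment test result:" for ATA disks); classify_health and disk_failing_bit are hypothetical names. smartctl also reports the result in its exit status, where bit 3 (value 8) means the SMART status check returned "DISK FAILING".

```shell
# Read `smartctl -H` output on stdin and print OK, FAILED, or UNKNOWN.
classify_health() {
    awk '
        /SMART overall-health self-assessment test result:/ || /SMART Health Status:/ {
            seen = 1
            # Anything other than a trailing PASSED or OK counts as a failure.
            if ($0 !~ /(PASSED|OK)[ \t\r]*$/) failed = 1
        }
        END {
            if (!seen) print "UNKNOWN"
            else if (failed) print "FAILED"
            else print "OK"
        }'
}

# Succeed if bit 3 (value 8, "DISK FAILING") is set in a smartctl exit status.
disk_failing_bit() {
    [ $(( $1 & 8 )) -ne 0 ]
}
```

A possible use is `smartctl -H -d megaraid,37 /dev/bus/15 | classify_health`, or saving the exit status of smartctl in a variable and passing it to disk_failing_bit.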
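Disabling the disk cache
- The disk cache is enabled by default on SATA disks and disabled by default on SAS disks. With the disk cache enabled, a power loss or a forced power cycle can lose data and leave the system unable to boot.
# Show the write-cache state of each physical disk behind the controller
for i in $(seq 9 26); do smartctl -g wcache -d megaraid,${i} /dev/bus/0; done
# Disable the disk cache on the RAID virtual drives
for i in $(seq 0 10); do /opt/MegaRAID/storcli/storcli64 /c0/v${i} set pdcache=Off; done
# Set the RAID controller write cache of each virtual drive to write-through
for i in $(seq 0 10); do /opt/MegaRAID/storcli/storcli64 /c0/v${i} set wrcache=WT; done
# Check the RAID cache state. The Cache column describes the RAID controller cache:
# WT (write-through) means the write cache is off, WB (write-back) means it is on,
# and AWB means always write-back.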
/opt/MegaRAID/storcli/storcli64 /c0 show
---------------------------------------------------------------
DG/VD TYPE State Access Consist Cache Cac sCC Size Name
---------------------------------------------------------------
1/0 RAID0 Optl RW Yes RWTD - ON 1.090 TB
2/1 RAID0 Optl RW Yes RWTD - ON 1.090 TB
0/2 RAID1 Optl RW Yes RWTD - ON 446.625 GB
3/3 RAID0 Optl RW Yes RWTD - ON 1.090 TB
4/4 RAID0 Optl RW Yes RWTD - ON 1.090 TB
5/5 RAID0 Optl RW Yes RWTD - ON 1.090 TB
6/6 RAID0 Optl RW Yes RWTD - ON 1.090 TB
7/7 RAID0 Optl RW Yes RWTD - ON 1.090 TB
8/8 RAID0 Optl RW Yes RWTD - ON 1.090 TB
9/9 RAID0 Optl RW Yes RWTD - ON 1.090 TB
10/10 RAID0 Optl RW Yes RWTD - ON 1.090 TB
---------------------------------------------------------------
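To confirm that no virtual drive is still in write-back mode, the table can be filtered. This is a minimal sketch, assuming the Cache value is the sixth whitespace-separated column as in the listing above; list_writeback_vds is a hypothetical helper name.

```shell
# Read `storcli64 /c0 show` output on stdin and print the DG/VD id of every
# virtual drive whose Cache column contains WB (this also matches AWB).
# Assumes the column order shown above: DG/VD TYPE State Access Consist Cache ...
list_writeback_vds() {
    awk '$1 ~ /^[0-9]+\/[0-9]+$/ && $6 ~ /WB/ { print $1 }'
}
```

Piping `/opt/MegaRAID/storcli/storcli64 /c0 show` into list_writeback_vds should print nothing once every virtual drive has been switched to WT.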
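Turning on the server identify LED
- ipmitool chassis identify 30 turns on the server's identify LED. The default duration is 15 s; pass the number of seconds you need, 30 s in this case.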
ipmitool chassis identify 30
Turning on a disk locate LED
# Find the RAID controller's enclosure device (/dev/sg0 in this listing)
[root@centos-211 ~]# lsscsi -gt
[0:0:0:0] enclosu - /dev/sg0
[0:0:3:0] disk /dev/sda /dev/sg1
[0:0:4:0] disk /dev/sdb /dev/sg2
[0:0:5:0] disk /dev/sdc /dev/sg3
[0:0:6:0] disk /dev/sdd /dev/sg4
[0:0:7:0] disk /dev/sde /dev/sg5
[0:0:8:0] disk /dev/sdf /dev/sg6
[0:0:9:0] disk /dev/sdg /dev/sg7
[0:0:10:0] disk /dev/sdh /dev/sg8
[0:0:11:0] disk /dev/sdi /dev/sg9
[0:0:12:0] disk /dev/sdj /dev/sg10
[0:0:13:0] disk /dev/sdk /dev/sg11
[0:0:14:0] disk /dev/sdl /dev/sg12
[0:2:0:0] disk /dev/sdm /dev/sg13
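# Query the enclosure: read the ident (locate LED) state of element index 1.
# /dev/sg24 is the enclosure device on the host where this was run; use the
# enclosure device that lsscsi reports on your host.
sg_ses --index=1 --get ident /dev/sg24
# Turn on the disk locate LED
sg_ses --index=1 --set ident /dev/sg24
# List the supported options
sg_ses -ee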
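The lookup and the LED command can be tied together so that the enclosure device is not typed by hand. The sketch below assumes lsscsi prints the type as "enclosu" with the sg device in the last column, as in the listing above; list_enclosure_sg and disk_led are hypothetical helper names.

```shell
# Read `lsscsi -g` output on stdin and print the sg device of every enclosure.
list_enclosure_sg() {
    awk '$2 == "enclosu" && $NF ~ /^\/dev\/sg[0-9]+$/ { print $NF }'
}

# Hypothetical wrapper: switch the ident LED of one slot on or off.
# usage: disk_led SG_DEV INDEX set|clear
disk_led() {
    case "$3" in
        set|clear) sg_ses --index="$2" --"$3" ident "$1" ;;
        *) echo "usage: disk_led SG_DEV INDEX set|clear" >&2; return 2 ;;
    esac
}
```

For example, `disk_led "$(lsscsi -g | list_enclosure_sg | head -n 1)" 1 set` lights the LED of slot index 1 on the first enclosure, and the same call with clear turns it off again.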