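Original work from the "深蓝的blog" blog: http://blog.csdn.net/huangyanlong/article/details/47277715
I recently handled an incident caused by an ASM disk group running out of space.
Brief notes follow.
I. The problem report
Report from the on-site engineer:
The on-site engineer reported the problem by email and stressed how urgent it was.
In short: during a routine inspection he found that a tablespace holding photos was full, and the attempt to extend it failed with ORA-15041: diskgroup "DATA" space exhausted. Last month's data is assessed at the start of each month and customers needed to upload photos, so the issue had to be fixed immediately.
The email attachment included the following query output: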
SQL> select group_number,name,total_mb,free_mb from v$ASM_DISKGROUP;
GROUP_NUMBER NAME TOTAL_MB FREE_MB
------------ ------------------------------ ---------- ----------
1 ARCH 860159 405817
2 CRS 30717 29791
3 DATA 1638394 238
SQL> select name,group_number,state,redundancy,total_mb,free_mb,path from v$asm_disk;
NAME       GROUP_NUMBER STATE    REDUNDANCY   TOTAL_MB    FREE_MB PATH
---------- ------------ -------- ---------- ---------- ---------- ------------------------------
ARCH_0000             1 NORMAL   UNKNOWN        860159     405817 /dev/oracleasm/disks/ARCH
CRS_0002              2 NORMAL   UNKNOWN         10239       9931 /dev/oracleasm/disks/VOTE_CRS3
CRS_0001              2 NORMAL   UNKNOWN         10239       9930 /dev/oracleasm/disks/VOTE_CRS2
DATA_0001             3 NORMAL   UNKNOWN        819197        112 /dev/oracleasm/disks/DATA2
DATA_0000             3 NORMAL   UNKNOWN        819197        126 /dev/oracleasm/disks/DATA1
CRS_0000              2 NORMAL   UNKNOWN         10239       9930 /dev/oracleasm/disks/VOTE_CRS1
6 rows selected.
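As a quick cross-check of the attachment, the free-space percentage of each disk group can be derived from the TOTAL_MB and FREE_MB columns. The following is a minimal shell sketch; the function name `dg_free_pct` and its three-column input format are assumptions made for illustration, and the figures are copied from the v$asm_diskgroup output above.

```shell
# Hypothetical helper (not part of any Oracle tool).
# Reads lines of "<name> <total_mb> <free_mb>" and prints "<name> <free %>".
dg_free_pct() {
  awk '$2 > 0 { printf "%s %.2f\n", $1, ($3 / $2) * 100 }'
}

# Figures copied from the v$asm_diskgroup query in the attachment.
dg_free_pct <<'EOF'
ARCH 860159 405817
CRS 30717 29791
DATA 1638394 238
EOF
```

For DATA this works out to roughly 0.01% free, which is consistent with the ORA-15041 error reported when extending the tablespace.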
II. Emergency handling
I connected to the production database and confirmed that ASM space was indeed critically low.
ASMCMD> lsdg
State Type Rebal Sector Block AU Total_MB Free_MB Req_mir_free_MB Usable_file_MB Offline_disks Voting_files Name
MOUNTED EXTERN N 512 4096 1048576 860159 405780 0 405780 0 N ARCH/
MOUNTED NORMAL N 512 4096 1048576 30717 29791 10239 9776 0 Y CRS/
MOUNTED EXTERN N 512 4096 1048576 1638394 238 0 238 0 N DATA/
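To get the application running again quickly, I decided to start with the immediate symptom: the tablespace could not be extended.
The first idea was to shrink tablespaces with low utilization.
So I checked tablespace usage:
1. The undo and temp tablespaces had been extended to a very large size and could be shrunk;
2. Several tablespaces had low utilization, for example GB-sized tablespaces holding only a few MB of data, and could also be shrunk.
I then ran a series of commands like the following: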
ALTER DATABASE
TEMPFILE '+DATA/xcky/xckytmp04.dbf'
RESIZE 1024M;
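These commands shrank each tablespace that could be reduced.
After this round of shrinking, I checked space usage again. There was now enough room, so I extended the tablespace that stores the application's photos, and the application returned to normal.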
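Because the same RESIZE command had to be repeated for several files, a small generator can reduce typing mistakes. This is only a sketch: `resize_stmt` is a hypothetical helper that prints a statement for review and does not execute anything. Note that Oracle refuses to shrink a datafile below its highest used block (ORA-03297), so the target size still has to be chosen per file.

```shell
# Hypothetical helper: prints (does not run) one RESIZE statement.
# $1: DATAFILE or TEMPFILE, $2: file path, $3: target size in MB.
resize_stmt() {
  case "$1" in
    DATAFILE|TEMPFILE) ;;
    *) echo "kind must be DATAFILE or TEMPFILE" >&2; return 1 ;;
  esac
  case "$3" in
    ''|*[!0-9]*) echo "size must be a whole number of MB" >&2; return 1 ;;
  esac
  printf "ALTER DATABASE %s '%s' RESIZE %sM;\n" "$1" "$2" "$3"
}

# The tempfile and size used in the command above.
resize_stmt TEMPFILE '+DATA/xcky/xckytmp04.dbf' 1024
```

The printed statements can be reviewed and then pasted into SQL*Plus one at a time.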
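III. Interim feedback
I promptly reported the status back to the on-site engineer.
Root cause: the ASM disk group ran out of space.
1. As a temporary measure, I shrank other tablespaces to free space in the DATA disk group (the undo tablespace, the temp tablespace, and other tablespaces with low utilization).
In addition, I created a new 10G autoextending datafile for photo storage, named photo_info47.dbf.
2. Recommendations going forward:
(1) Expand the storage.
Under this environment's ASM layout, the DATA disk group has used about 1.4T (of about 1.5T in total), and about 50G is currently free.
(2) Alternatively, re-plan the ASM storage and temporarily extend the tablespace into the ARCH disk group (about 400G currently free). However, ARCH is meant for archived logs, so this is not recommended: if archive log volume later surges, the database could hang.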
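IV. Fixing the underlying problem
I connected to the production database again to look for a better long-term fix.
First, a rough look at current space usage.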
SQL> conn sys/oracle as sysdba
Connected.
SQL> show user
USER is "SYS"
SQL> select path,total_mb,free_mb from v$asm_disk_stat;
PATH TOTAL_MB FREE_MB
-------------------------------------------------- ---------- ----------
/dev/oracleasm/disks/ARCH 860159 405780
/dev/oracleasm/disks/VOTE_CRS3 10239 9931
/dev/oracleasm/disks/VOTE_CRS2 10239 9930
/dev/oracleasm/disks/DATA2 819197 25466
/dev/oracleasm/disks/DATA1 819197 25480
/dev/oracleasm/disks/VOTE_CRS1 10239 9930
6 rows selected.
ASMCMD> lsdg
State Type Rebal Sector Block AU Total_MB Free_MB Req_mir_free_MB Usable_file_MB Offline_disks Voting_files Name
MOUNTED EXTERN N 512 4096 1048576 860159 404777 0 404777 0 N ARCH/
MOUNTED NORMAL N 512 4096 1048576 30717 29791 10239 9776 0 Y CRS/
MOUNTED EXTERN N 512 4096 1048576 1638394 49590 0 49590 0 N DATA/
Check the state of the disk groups:
SQL> select name,state from v$asm_diskgroup;
NAME STATE
------------------------------ -----------
ARCH CONNECTED
CRS MOUNTED
DATA CONNECTED
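Checking disk usage at the operating-system level turned up good news.
For some reason, one disk presented by the storage array was not in use at all. That was ideal: it could be given to ASM.
First, locate the disk: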
[root@gzxkdb1 ~]# fdisk -l
Disk /dev/emcpoweree: 2147.4 GB, 2147483648000 bytes
255 heads, 63 sectors/track, 261083 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk /dev/emcpoweree doesn't contain a valid partition table
From the output above, /dev/emcpoweree has no partition table and has not been put to use.
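On a host with many devices, the relevant line is easy to miss in a long `fdisk -l` listing. The sketch below filters for it. `unpartitioned_disks` is a hypothetical helper, and the message text is the one this fdisk version prints; other versions may word it differently. A missing partition table alone does not prove a device is unused (a whole device can be in use without one), so the result is only a list of candidates to verify.

```shell
# Hypothetical helper: list devices that `fdisk -l` reports as having no
# partition table. Reads fdisk output on stdin.
unpartitioned_disks() {
  sed -n "s|^Disk \(/dev/[^ ]*\) doesn't contain a valid partition table.*|\1|p"
}

# Sample lines copied from the fdisk output above.
unpartitioned_disks <<'EOF'
Disk /dev/emcpoweree: 2147.4 GB, 2147483648000 bytes
Disk /dev/emcpoweree doesn't contain a valid partition table
EOF
```

On a live system the input would come from `fdisk -l 2>&1 | unpartitioned_disks`, run as root.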
Partition the device:
[root@gzxkdb1 ~]# fdisk /dev/emcpoweree
Device contains neither a valid DOS partition table, nor Sun, SGI or OSF disklabel
Building a new DOS disklabel. Changes will remain in memory only,
until you decide to write them. After that, of course, the previous
content won't be recoverable.
The number of cylinders for this disk is set to 261083.
There is nothing wrong with that, but this is larger than 1024,
and could in certain setups cause problems with:
1) software that runs at boot time (e.g., old versions of LILO)
2) booting and partitioning software from other OSs
(e.g., DOS FDISK, OS/2 FDISK)
Warning: invalid flag 0x0000 of partition table 4 will be corrected by w(rite)
Command (m for help): n
Command action
e extended
p primary partition (1-4)
p
Partition number (1-4): 1
First cylinder (1-261083, default 1):
Using default value 1
Last cylinder or +size or +sizeM or +sizeK (1-261083, default 261083): +500G
Command (m for help): p
Disk /dev/emcpoweree: 2147.4 GB, 2147483648000 bytes
255 heads, 63 sectors/track, 261083 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Device Boot Start End Blocks Id System
/dev/emcpoweree1 1 60789 488287611 83 Linux
Command (m for help): n
Command action
e extended
p primary partition (1-4)
p
Partition number (1-4): 2
First cylinder (60790-261083, default 60790):
Using default value 60790
Last cylinder or +size or +sizeM or +sizeK (60790-261083, default 261083): +500G
Command (m for help): p
Disk /dev/emcpoweree: 2147.4 GB, 2147483648000 bytes
255 heads, 63 sectors/track, 261083 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Device Boot Start End Blocks Id System
/dev/emcpoweree1 1 60789 488287611 83 Linux
/dev/emcpoweree2 60790 121578 488287642+ 83 Linux
Command (m for help): m
Command action
a toggle a bootable flag
b edit bsd disklabel
c toggle the dos compatibility flag
d delete a partition
l list known partition types
m print this menu
n add a new partition
o create a new empty DOS partition table
p print the partition table
q quit without saving changes
s create a new empty Sun disklabel
t change a partition's system id
u change display/entry units
v verify the partition table
w write table to disk and exit
x extra functionality (experts only)
Command (m for help): n
Command action
e extended
p primary partition (1-4)
p
Partition number (1-4): 3
First cylinder (121579-261083, default 121579):
Using default value 121579
Last cylinder or +size or +sizeM or +sizeK (121579-261083, default 261083): +500G
Command (m for help): p
Disk /dev/emcpoweree: 2147.4 GB, 2147483648000 bytes
255 heads, 63 sectors/track, 261083 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Device Boot Start End Blocks Id System
/dev/emcpoweree1 1 60789 488287611 83 Linux
/dev/emcpoweree2 60790 121578 488287642+ 83 Linux
/dev/emcpoweree3 121579 182367 488287642+ 83 Linux
Command (m for help): n
Command action
e extended
p primary partition (1-4)
p
Selected partition 4
First cylinder (182368-261083, default 182368):
Using default value 182368
Last cylinder or +size or +sizeM or +sizeK (182368-261083, default 261083):
Using default value 261083
Command (m for help): p
Disk /dev/emcpoweree: 2147.4 GB, 2147483648000 bytes
255 heads, 63 sectors/track, 261083 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Device Boot Start End Blocks Id System
/dev/emcpoweree1 1 60789 488287611 83 Linux
/dev/emcpoweree2 60790 121578 488287642+ 83 Linux
/dev/emcpoweree3 121579 182367 488287642+ 83 Linux
/dev/emcpoweree4 182368 261083 632286270 83 Linux
Command (m for help): w
The partition table has been altered!
Calling ioctl() to re-read partition table.
Syncing disks.
That completes the partitioning: four primary partitions, three of 500G each and a fourth that takes the remaining space.
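The partition sizes can be checked from the cylinder ranges fdisk printed, since each cylinder is 16065 * 512 = 8225280 bytes. A minimal sketch follows; `part_bytes` and `part_gb` are hypothetical helper names, and sizes are in decimal GB (10^9 bytes), the same unit fdisk uses in its "2147.4 GB" line.

```shell
# Bytes per cylinder, from the "Units = cylinders of 16065 * 512" line.
CYL_BYTES=$((16065 * 512))

# Hypothetical helpers: size of a partition spanning cylinders $1..$2 inclusive.
part_bytes() { echo $(( ($2 - $1 + 1) * CYL_BYTES )); }
part_gb() {
  awk -v b="$(part_bytes "$1" "$2")" 'BEGIN { printf "%.1f\n", b / 1e9 }'
}

part_gb 1 60789          # first +500G partition
part_gb 182368 261083    # fourth partition (rest of the disk)
```

The first three partitions come out at about 500.0 GB each and the fourth at about 647.5 GB, which agrees with the 632286270 one-KB blocks fdisk lists for /dev/emcpoweree4.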
List the existing ASM disks:
[root@gzxkdb1 ~]# service oracleasm listdisks
ARCH
DATA1
DATA2
VOTE_CRS1
VOTE_CRS2
VOTE_CRS3
Create the ASM disks:
[root@gzxkdb1 ~]# service oracleasm createdisk DATA3 /dev/emcpoweree1
Marking disk "DATA3" as an ASM disk: [ OK ]
[root@gzxkdb1 ~]# service oracleasm createdisk DATA4 /dev/emcpoweree2
Marking disk "DATA4" as an ASM disk: [ OK ]
[root@gzxkdb1 ~]# service oracleasm createdisk DATA5 /dev/emcpoweree3
Marking disk "DATA5" as an ASM disk: [ OK ]
[root@gzxkdb1 ~]# service oracleasm createdisk DATA6 /dev/emcpoweree4
Marking disk "DATA6" as an ASM disk: [ OK ]
On the other node, scan for the newly added disks:
[root@gzxkdb2 ~]# service oracleasm scandisks    # rescan disks on node 2
[root@gzxkdb2 ~]# service oracleasm listdisks
ARCH
DATA4
DATA1
DATA2
DATA3
DATA5
DATA6
VOTE_CRS1
VOTE_CRS2
VOTE_CRS3
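Before adding the disks to a disk group in a RAC environment, it is worth confirming that every new label is visible on each node. The sketch below compares a list of expected labels against the `listdisks` output; `missing_disks` is a hypothetical helper name, not an oracleasm subcommand.

```shell
# Hypothetical helper: print every expected label that is absent from the
# `listdisks` output supplied on stdin.
# $1: space-separated list of expected ASM disk labels.
missing_disks() {
  local seen label
  seen="$(cat)"
  for label in $1; do
    printf '%s\n' "$seen" | grep -qxF -- "$label" || echo "$label"
  done
}

# Example, using labels from this article (run on node 2):
#   service oracleasm listdisks | missing_disks "DATA3 DATA4 DATA5 DATA6"
```

Empty output means all expected labels were found; any label that is printed still needs a `scandisks` or further investigation on that node.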
On node 1, log in to the ASM instance as sysasm:
[grid@gzxkdb1 ~]$ sqlplus '/as sysasm'
SQL*Plus: Release 11.2.0.3.0 Production on Mon Aug 3 17:48:58 2015
Copyright (c) 1982, 2011, Oracle. All rights reserved.
Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Real Application Clusters and Automatic Storage Management options
Check the ASM disks:
SQL> set linesize 200
SQL> set pagesize 200
SQL> col NAME for a30
SQL> col PATH for a50
SQL> r
1* select name,path,mode_status,state,disk_number,failgroup from v$asm_disk
NAME PATH MODE_ST STATE DISK_NUMBER FAILGROUP
------------------------------ -------------------------------------------------- ------- -------- ----------- ------------------------------
/dev/oracleasm/disks/DATA6 ONLINE NORMAL 0
/dev/oracleasm/disks/DATA5 ONLINE NORMAL 1
/dev/oracleasm/disks/DATA4 ONLINE NORMAL 2
/dev/oracleasm/disks/DATA3 ONLINE NORMAL 3
ARCH_0000 /dev/oracleasm/disks/ARCH ONLINE NORMAL 0 ARCH_0000
CRS_0002 /dev/oracleasm/disks/VOTE_CRS3 ONLINE NORMAL 2 CRS_0002
CRS_0001 /dev/oracleasm/disks/VOTE_CRS2 ONLINE NORMAL 1 CRS_0001
DATA_0001 /dev/oracleasm/disks/DATA2 ONLINE NORMAL 1 DATA_0001
DATA_0000 /dev/oracleasm/disks/DATA1 ONLINE NORMAL 0 DATA_0000
CRS_0000 /dev/oracleasm/disks/VOTE_CRS1 ONLINE NORMAL 0 CRS_0000
10 rows selected.
Add the new disks to the DATA disk group:
SQL> alter diskgroup DATA add disk '/dev/oracleasm/disks/DATA3' rebalance power 10;
Diskgroup altered.
SQL> alter diskgroup DATA add disk '/dev/oracleasm/disks/DATA4' rebalance power 10;
Diskgroup altered.
SQL> alter diskgroup DATA add disk '/dev/oracleasm/disks/DATA5' rebalance power 10;
Diskgroup altered.
SQL> alter diskgroup DATA add disk '/dev/oracleasm/disks/DATA6' rebalance power 10;
Diskgroup altered.
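ALTER DISKGROUP also accepts a comma-separated list of disks in a single ADD DISK clause, so all four disks could be added with one statement and one rebalance instead of four. The sketch below only builds and prints such a statement; `add_disks_stmt` is a hypothetical helper, and the power value of 10 is the one used in this article.

```shell
# Hypothetical helper: print one ALTER DISKGROUP statement for several disks.
# $1: disk group name, $2: rebalance power, remaining args: disk paths.
add_disks_stmt() {
  local dg="$1" power="$2" list="" p
  shift 2
  for p in "$@"; do
    list="${list:+$list, }'$p'"
  done
  printf 'ALTER DISKGROUP %s ADD DISK %s REBALANCE POWER %s;\n' "$dg" "$list" "$power"
}

add_disks_stmt DATA 10 \
  /dev/oracleasm/disks/DATA3 /dev/oracleasm/disks/DATA4 \
  /dev/oracleasm/disks/DATA5 /dev/oracleasm/disks/DATA6
```

The printed statement would then be run from a sysasm session, as in the steps above.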
SQL> select * from v$asm_operation;
GROUP_NUMBER OPERA STAT POWER ACTUAL SOFAR EST_WORK EST_RATE EST_MINUTES ERROR_CODE
------------ ----- ---- ---------- ---------- ---------- ---------- ---------- ----------- --------------------------------------------
3 REBAL RUN 10 10 59949 634963 5143 111
When a query on v$asm_operation returns no rows, the rebalance has finished:
SQL> select * from v$asm_operation;
no rows selected
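The EST_MINUTES figure in the earlier v$asm_operation row can be reproduced from the other columns. Assuming SOFAR and EST_WORK count allocation units and EST_RATE is allocation units per minute, as the column names suggest, the remaining time is (EST_WORK - SOFAR) / EST_RATE. The helper names below are hypothetical.

```shell
# Hypothetical helpers; arguments are SOFAR, EST_WORK and EST_RATE.
rebalance_eta_min() {
  [ "$3" -gt 0 ] || { echo "EST_RATE must be positive" >&2; return 1; }
  echo $(( ($2 - $1) / $3 ))
}
rebalance_pct_done() {
  awk -v s="$1" -v w="$2" 'BEGIN { if (w > 0) printf "%.1f\n", s / w * 100 }'
}

# Values from the v$asm_operation row shown above.
rebalance_eta_min 59949 634963 5143
rebalance_pct_done 59949 634963
```

With the values from the transcript this gives 111 minutes remaining, matching EST_MINUTES, at about 9.4% complete.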
Check the disk group space again:
ASMCMD> lsdg
State Type Rebal Sector Block AU Total_MB Free_MB Req_mir_free_MB Usable_file_MB Offline_disks Voting_files Name
MOUNTED EXTERN N 512 4096 1048576 860159 404170 0 404170 0 N ARCH/
MOUNTED NORMAL N 512 4096 1048576 30717 29791 10239 9776 0 Y CRS/
MOUNTED EXTERN Y 512 4096 1048576 3686390 2097561 0 2097561 0 N DATA/
The DATA disk group has been expanded and now has nearly 2T free, enough to cover business needs for some time.
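The capacity gained can be confirmed by comparing the Total_MB value for DATA before and after the change. A minimal sketch, with `dg_growth` as a hypothetical helper name and both readings taken from the lsdg outputs in this article:

```shell
# Hypothetical helper: growth between two Total_MB readings.
# $1: Total_MB before, $2: Total_MB after.
dg_growth() {
  awk -v before="$1" -v after="$2" 'BEGIN {
    d = after - before
    printf "%d MB (%.2f TiB)\n", d, d / 1048576
  }'
}

# DATA Total_MB before and after adding the four disks.
dg_growth 1638394 3686390
```

The difference is 2047996 MB, about 1.95 TiB, which is consistent with the 2147.4 GB device reported by fdisk.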
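V. Final feedback
Message sent:
Regarding yesterday's "ASM disk group out of space" problem in Guizhou: I later found about 2T of unused space on the storage array and have added it to ASM.
That should cover disk space needs for some time.
The on-site engineer expressed his thanks.
That concludes the record of this task.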