Archive area +RECO has -7440384 free KB remaining (Usable_file_MB is negative)
17 July 2011
Posted by David Alejo Marcos in ASM, Exadata, Oracle 11.2, RMAN.
Tags: ASM, Exadata, Oracle 11.2, RMAN
I must say, this has been a busy weekend.
We have been promoting a release to production, and a guaranteed restore point was created on Friday as a rollback strategy. On Sunday I was called because we started to receive alerts.
The problem:
Our monitoring system started to send emails and SNMP Traps with the following alerts:
OEM alert for Automatic Storage Management +ASM4_ssssss4: Disk group RECO has used 100% of safely usable free space. (Current Disk Group Used % of Safely Usable value: 100)
OEM alert for Automatic Storage Management +ASM2_ssssss2: Disk group RECO has used 100% of safely usable free space. (Current Disk Group Used % of Safely Usable value: 100)
OEM alert for Automatic Storage Management +ASM3_ssssss3: Disk group RECO has used 100% of safely usable free space. (Current Disk Group Used % of Safely Usable value: 100)
OEM alert for Automatic Storage Management +ASM7_ssssss7: Disk group RECO has used 100% of safely usable free space. (Current Disk Group Used % of Safely Usable value: 100)
OEM alert for Automatic Storage Management +ASM8_ssssss8: Disk group RECO has used 100% of safely usable free space. (Current Disk Group Used % of Safely Usable value: 100)
OEM alert for Automatic Storage Management +ASM6_ssssss6: Disk group RECO has used 100% of safely usable free space. (Current Disk Group Used % of Safely Usable value: 100)
OEM alert for Automatic Storage Management +ASM1_ssssss1: Disk group RECO has used 100% of safely usable free space. (Current Disk Group Used % of Safely Usable value: 100)
OEM alert for Automatic Storage Management +ASM5_ssssss5: Disk group RECO has used 100% of safely usable free space. (Current Disk Group Used % of Safely Usable value: 100)
OEM alert for Database Instance : Archive area +RECO has -7440384 free KB remaining. (Current Free Archive Area (KB) value: -7440384)
OEM alert for Database Instance : Archive area +RECO has -21725184 free KB remaining. (Current Free Archive Area (KB) value: -21725184)
Not very nice in any situation, and even less so when you are in the middle of a critical, highly visible release.
I had a look, and this is what I found.
The solution:
The first thing I did was to check the Flash Recovery Area, as it is configured to write to our +RECO diskgroup:
NAME                    USED_MB   LIMIT_MB   PCT_USED
-------------------- ---------- ---------- ----------
+RECO                   1630569    2048000      79.62

Elapsed: 00:00:00.12

FILE_TYPE            PERCENT_SPACE_USED PERCENT_SPACE_RECLAIMABLE NUMBER_OF_FILES
-------------------- ------------------ ------------------------- ---------------
CONTROL FILE                          0                         0               1
REDO LOG                              0                         0               0
ARCHIVED LOG                          0                         0               0
BACKUP PIECE                      75.66                     75.66              86
IMAGE COPY                            0                         0               0
FLASHBACK LOG                      3.96                         0             707
FOREIGN ARCHIVED LOG                  0                         0               0

7 rows selected.

Elapsed: 00:00:01.62
The numbers looked OK; some backup files could be reclaimed (Oracle should do that automatically). Let's have a look at ASM:
oracle@ssss (+ASM1)$ asmcmd lsdg
State    Type    Rebal  Sector  Block       AU  Total_MB   Free_MB  Req_mir_free_MB  Usable_file_MB  Offline_disks  Voting_files  Name
MOUNTED  NORMAL  N         512   4096  4194304  55050240  26568696          5004567        10782064              0  N             DATA/
MOUNTED  NORMAL  N         512   4096  4194304  35900928   3192844          3263720          -35438              0  N             RECO/
MOUNTED  NORMAL  N         512   4096  4194304   4175360   4061640           379578         1841031              0  N             SYSTEMDG/
Bingo, this is where our problem is: Usable_file_MB for the +RECO diskgroup is negative. This column indicates the amount of free space that can safely be used for new files, taking mirroring into account, while still being able to restore redundancy after a disk failure. A negative number in this column could be critical if a disk fails, as we might not have enough space to restore redundancy for all files on the surviving disks.
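The relationship between the lsdg columns can be checked with a little arithmetic. The sketch below assumes the usual formula for a NORMAL redundancy diskgroup, (Free_MB - Req_mir_free_MB) / 2, which reproduces every Usable_file_MB value shown in this post. The function name, the `copies` parameter and the truncation of fractional results are my own choices for illustration, not something taken from Oracle's code.

```python
def usable_file_mb(free_mb: int, req_mir_free_mb: int, copies: int = 2) -> int:
    """Estimate Usable_file_MB from the Free_MB and Req_mir_free_MB columns.

    copies=2 corresponds to the NORMAL redundancy diskgroups in this post.
    Using any other value is an assumption that is not verified here.
    """
    # Space left once the reserve needed to re-mirror after a failure is set aside.
    spare_mb = free_mb - req_mir_free_mb
    # Every file extent is stored `copies` times, so divide the raw space.
    # int() truncates towards zero; the exact rounding ASM uses is assumed.
    return int(spare_mb / copies)


# Figures taken from the lsdg output in this post.
print("DATA    ", usable_file_mb(26568696, 5004567))
print("RECO    ", usable_file_mb(3192844, 3263720))
print("SYSTEMDG", usable_file_mb(4061640, 379578))
```

For +RECO, Free_MB is smaller than Req_mir_free_MB, which is exactly why the result goes negative.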
Our backups go to ASM and we copy them to tape afterwards. Our retention policy on disk is 2 or 3 days, depending on the system.
When I checked the contents of the backupset directory on ASM, I found some old backups:
Type  Redund  Striped  Time  Sys  Name
                             Y    2011_07_17/
                             Y    2011_07_16/
                             Y    2011_07_15/
                             Y    2011_07_14/
                             Y    2011_07_13/
                             Y    2011_07_12/
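Spotting directories that have outlived the on-disk retention can be scripted. The following is only a sketch: the function name is hypothetical, the directory names are the ones listed above, and reading "retention of N days" as "keep anything dated within the last N days" is my assumption. It only reports candidates; the actual deletion should still go through RMAN so the catalog stays consistent.

```python
from datetime import date, datetime, timedelta


def expired_backup_dirs(dir_names, today, retention_days):
    """Return the backupset directories dated before the retention cutoff."""
    cutoff = today - timedelta(days=retention_days)
    expired = []
    for name in dir_names:
        # Directory names look like 2011_07_12/ (year_month_day).
        day = datetime.strptime(name.rstrip("/"), "%Y_%m_%d").date()
        if day < cutoff:
            expired.append(name)
    return sorted(expired)


dirs = ["2011_07_17/", "2011_07_16/", "2011_07_15/",
        "2011_07_14/", "2011_07_13/", "2011_07_12/"]

# Hypothetical run on the day of the incident with a 3-day retention.
for name in expired_backup_dirs(dirs, date(2011, 7, 17), 3):
    print("candidate for deletion:", name)
```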
To delete those old backups I executed the following script from RMAN:
RMAN> delete backup tag EOD_DLY_110712 device type disk;

allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=1430 instance=<instance> device type=DISK
allocated channel: ORA_DISK_2
channel ORA_DISK_2: SID=2138 instance=<instance> device type=DISK
allocated channel: ORA_DISK_3
channel ORA_DISK_3: SID=8 instance=<instance> device type=DISK
allocated channel: ORA_DISK_4
channel ORA_DISK_4: SID=150 instance=<instance> device type=DISK

List of Backup Pieces
BP Key  BS Key  Pc# Cp# Status      Device Type Piece Name
------- ------- --- --- ----------- ----------- ----------
8292    4020    1   1   AVAILABLE   DISK        +RECO//backupset/2011_07_12/sssss_eod_dly_110712_0.nnnn.nnnnn
8293    4021    1   1   AVAILABLE   DISK        +RECO//backupset/2011_07_12/sssss_eod_dly_110712_0.nnnn.nnnnn

Do you really want to delete the above objects (enter YES or NO)? yes
deleted backup piece
backup piece handle=+RECO//backupset/2011_07_12/sssss_eod_dly_110712_0.nnnn.nnnnnn RECID=8292 STAMP=nnnn
deleted backup piece
backup piece handle=+RECO//backupset/2011_07_12/sssss_eod_dly_110712_0.nnnn.nnnnnn RECID=8293 STAMP=nnnn
Deleted 2 objects
After deleting two more old backups, the numbers looked much better:
State    Type    Rebal  Sector  Block       AU  Total_MB   Free_MB  Req_mir_free_MB  Usable_file_MB  Offline_disks  Voting_files  Name
MOUNTED  NORMAL  N         512   4096  4194304  55050240  26568696          5004567        10782064              0  N             DATA/
MOUNTED  NORMAL  N         512   4096  4194304  35900928   3548260          3263720          142270              0  N             RECO/
MOUNTED  NORMAL  N         512   4096  4194304   4175360   4061640           379578         1841031              0  N             SYSTEMDG/
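A check like this could be automated by parsing the lsdg output. The sketch below is a rough illustration that assumes the 13-column layout shown in this post (other asmcmd versions may print different columns); the function names are hypothetical and only the RECO rows from before and after the clean-up are embedded as sample data.

```python
HEADER = ("State Type Rebal Sector Block AU Total_MB Free_MB "
          "Req_mir_free_MB Usable_file_MB Offline_disks Voting_files Name")

# RECO rows copied from the two lsdg outputs in this post.
BEFORE = HEADER + "\nMOUNTED NORMAL N 512 4096 4194304 35900928 3192844 3263720 -35438 0 N RECO/"
AFTER = HEADER + "\nMOUNTED NORMAL N 512 4096 4194304 35900928 3548260 3263720 142270 0 N RECO/"


def parse_lsdg(text):
    """Turn whitespace-separated lsdg output into {diskgroup: {column: value}}."""
    lines = [ln for ln in text.strip().splitlines() if ln.strip()]
    header = lines[0].split()
    groups = {}
    for ln in lines[1:]:
        fields = ln.split()
        if len(fields) != len(header):
            continue  # skip anything that does not match the assumed layout
        row = dict(zip(header, fields))
        groups[row["Name"].rstrip("/")] = row
    return groups


def negative_usable(groups):
    """Names of diskgroups whose Usable_file_MB is below zero."""
    return sorted(name for name, row in groups.items()
                  if int(row["Usable_file_MB"]) < 0)


before, after = parse_lsdg(BEFORE), parse_lsdg(AFTER)
freed_mb = int(after["RECO"]["Free_MB"]) - int(before["RECO"]["Free_MB"])
print("negative before:", negative_usable(before))
print("negative after: ", negative_usable(after))
print("raw MB freed in RECO:", freed_mb)
```

Note that the raw space freed is twice the gain in Usable_file_MB, which is consistent with the two-way mirroring of a NORMAL redundancy diskgroup.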
Note: There is another temporary fix. I could have changed db_recovery_file_dest to point to +DATA instead of +RECO but, as we have a guaranteed restore point, I thought releasing space from old backups was easier.
As always, comments are welcome.
David Alejo-Marcos.
David Marcos Consulting Ltd.