
Unable to delete archive log and FRA is filling 7 September 2010

Posted by David Alejo Marcos in ASM, Oracle 11.2, RMAN.

A couple of weeks ago we had a problem with one of our busiest databases.

The FRA was filling up quite rapidly and we just could not free enough space. What made this problem interesting was that Oracle was reporting more than 40% of the space as reclaimable.

The problem:

We have automatic alerts for when we are running out of space. When I had a look at the situation, this is what I saw:

SQL> select * from v$flash_recovery_area_usage;
FILE_TYPE PERCENT_SPACE_USED PERCENT_SPACE_RECLAIMABLE NUMBER_OF_FILES
------------ ------------------ ------------------------- ---------------
CONTROLFILE .01 0 1
ONLINELOG .89 0 9
ARCHIVELOG 51.29 49.67 1181
BACKUPPIECE 0 0 0
IMAGECOPY 0 0 0
FLASHBACKLOG 40.43 34.83 1658
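
The per-file-type breakdown above can be cross-checked against the summary view v$recovery_file_dest, which reports the FRA quota in bytes. A quick sketch (the /1024/1024/1024 arithmetic simply converts bytes to GB):

SQL> select name,
     space_limit/1024/1024/1024 as limit_gb,
     space_used/1024/1024/1024 as used_gb,
     space_reclaimable/1024/1024/1024 as reclaimable_gb,
     number_of_files
     from v$recovery_file_dest;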

My first thought was to remove old files using RMAN. After checking that all standby databases were in sync, I executed the following command (we have enough space to store 7 days' worth of archivelogs, but I wanted to play it safe, as we had made many changes over the weekend and one change in particular generated 60+ GB of archivelogs):

rman
RMAN> connect target /
RMAN> delete noprompt archivelog until time 'SYSDATE - 3';
List of Archived Log Copies
Key Thrd Seq S Low Time Name
------- ---- ------- - ------------------- ----
228321 1 71005 A 07.08.2010 14:27:00 +FRA/xxx/archivelog/2010_08_07/thread_1_seq_71005.670.726416889
228323 1 71006 A 07.08.2010 14:28:09 +FRA/xxx/archivelog/2010_08_07/thread_1_seq_71006.2112.726416957
228330 1 71007 A 07.08.2010 14:29:16 +FRA/xxx/archivelog/2010_08_07/thread_1_seq_71007.1210.726417025
.....
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03009: failure of delete command on ORA_DISK_1 channel at 08/09/2010 18:19:58
ORA-15028: ASM file '+FRA/xxx/archivelog/2010_08_07/thread_1_seq_71005.670.726416889' not dropped; currently being accessed

Bingo: what should have been a simple operation had just become more complicated...

The solution:

Technically we had plenty of space in the FRA, but Oracle was complaining that a file was in use, even though the file was old.

The only way I could fix the problem was by killing the process holding the file open, which had to be one of the arc (archiver) processes.

In this case the solution was quite simple. The steps I followed were (bearing in mind that arc? processes are restarted automatically by Oracle):
1.- Find the process id for arc:
ps -ef | grep -i ora_arc*
oracle 5607 1 1 19:02 ? 00:00:00 ora_arc9_prod1

2.- Kill the running process:
kill -9 5607
3.- Check that the process has been restarted before killing more:
ps -ef | grep -i ora_arc9_prod1
4.- Perform steps 2 and 3 for every arc? process running for your instance, as in the sketch below.
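
Since Oracle respawns each archiver shortly after it dies, steps 2 and 3 can be scripted. A minimal sketch, assuming the instance is prod1 as in the example above (the [o] in the grep pattern just stops grep from matching its own process):

for pid in $(ps -eo pid,cmd | grep '[o]ra_arc._prod1' | awk '{print $1}')
do
  kill -9 $pid
  sleep 10                        # give Oracle time to respawn the archiver
done
ps -ef | grep '[o]ra_arc'         # confirm all archivers are running again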

Once we had killed all the arc? processes and were sure they had been restarted by Oracle, I did the following:
1.- Performed a couple of redo log switches and made sure they were propagated to all standby databases:

SQL> alter system switch logfile;
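
Before touching anything else, it is worth confirming the switches actually reached the standby site. A sketch against v$archived_log, run on the primary and assuming dest_id 2 is the standby destination:

SQL> select thread#,
     max(sequence#) as last_archived,
     max(case when applied = 'YES' then sequence# end) as last_applied
     from v$archived_log
     where dest_id = 2
     group by thread#;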

2.- Connected to RMAN and performed a backup of the archivelog "being in use":
rman
RMAN> connect target /
RMAN> backup archivelog from sequence 71005 until sequence 71005 thread 1;

The output is:

Starting backup at 09.08.2010 18:50:02
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: sid=620 instance=prod1 devtype=DISK
channel ORA_DISK_1: starting archive log backupset
channel ORA_DISK_1: specifying archive log(s) in backup set
input archive log thread=1 sequence=71005 recid=228321 stamp=726416895
channel ORA_DISK_1: starting piece 1 at 09.08.2010 18:50:04
channel ORA_DISK_1: finished piece 1 at 09.08.2010 18:50:11
piece handle=+FRA/xxx/backupset/2010_08_09/annnf0_tag20100809t185002_0.1097.726605405 tag=TAG20100809T185002 comment=NONE
channel ORA_DISK_1: backup set complete, elapsed time: 00:00:08
Finished backup at 09.08.2010 18:50:11
Starting Control File and SPFILE Autobackup at 09.08.2010 18:50:11
piece handle=/u04/oradata/prod/rman_disk/auto/PROD/c-4255406167-20100809-00_PROD comment=NONE
Finished Control File and SPFILE Autobackup at 09.08.2010 18:50:15

Once the backup had completed, we could delete the old archivelogs to release some space:

RMAN> delete noprompt archivelog until time 'SYSDATE - 3';
....
deleted archive log
archive log filename=+FRA/xxx/archivelog/2010_08_07/thread_1_seq_71005.670.726416889 recid=228321 stamp=726416895
deleted archive log
archive log filename=+FRA/xxx/archivelog/2010_08_07/thread_1_seq_71006.2112.726416957 recid=228323 stamp=726416962
....

3.- Monitored the space in the FRA:
SQL> select * from v$flash_recovery_area_usage;
FILE_TYPE PERCENT_SPACE_USED PERCENT_SPACE_RECLAIMABLE NUMBER_OF_FILES
-------------------- ------------------ ------------------------- ---------------
CONTROLFILE .01 .00 1
ONLINELOG .89 .00 9
ARCHIVELOG 49.09 46.83 1168
BACKUPPIECE .00 .00 0
IMAGECOPY .00 .00 0
FLASHBACKLOG 40.22 33.91 1648

I ran the query a couple of times as space was freed:

SQL> select * from v$flash_recovery_area_usage;
FILE_TYPE PERCENT_SPACE_USED PERCENT_SPACE_RECLAIMABLE NUMBER_OF_FILES
-------------------- ------------------ ------------------------- ---------------
CONTROLFILE .01 .00 1
ONLINELOG .89 .00 9
ARCHIVELOG 3.10 .04 225
BACKUPPIECE .10 .00 1
IMAGECOPY .00 .00 0
FLASHBACKLOG 39.91 32.95 1633
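
Rather than re-typing the query, a small shell loop can poll it; a sketch (the 300-second interval is arbitrary):

while true
do
  sqlplus -s "/ as sysdba" <<'EOF'
select file_type, percent_space_used, percent_space_reclaimable, number_of_files
from v$flash_recovery_area_usage;
EOF
  sleep 300
done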

As always, comments are welcome.


Upgrade clusterware from 10gR2 to 11gR2 7 September 2010

Posted by David Alejo Marcos in ASM, Oracle 11.2, RAC.

With Oracle 10g becoming unsupported soon, more and more businesses will need to upgrade to 11g. My suggestion is to go for 11gR2, as it is more stable than 11gR1.

Most of our databases are running RAC. With this type of configuration, we need to upgrade the Clusterware before upgrading the databases.

The problem:

Upgrade Oracle Clusterware from 10.2.0.4 to 11.2.0.1.

The solution:

I was in charge of doing the upgrade on our DR environment (a 3-node RAC database). The same steps apply to the production clusterware.

As we are running Linux, the software we need is:

1.- linux.x64_11gR2_grid.zip

2.- p6880880_112000_Linux-x86-64.zip

3.- p9343627_112010_Linux-x86-64.zip

Note.- Please make sure the patches listed above are still valid; I am aware Oracle has released a newer PSU.

I followed the steps detailed below:

1.- Check the environment:

cd /u01/software/gridInfstructureUpgrade/grid
./runcluvfy.sh stage -pre crsinst -n server001a,server001b,server001c -r 11gR2 -verbose 2>&1 | tee /tmp/cluvfy.pre.crsinst_100807

2.- Make sure you have working public IP addresses for the SCAN.

IP Addresses for SCAN:
scancrssav.
xx.xx.x.218, xx.xx.x.219, xx.xx.x.220 mask 255.255.255.192.

Check that the name resolves:
[oracle@server001a ~]$ host scancrssav.
scancrssav.. has address xx.xx.x.220
scancrssav.. has address xx.xx.x.219
scancrssav.. has address xx.xx.x.218

3.- Stop services and databases:
srvctl stop service -d
srvctl stop database -d

4.- Stop ASM on all nodes:
srvctl stop asm -n server001a
srvctl stop asm -n server001b
srvctl stop asm -n server001c
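
With three nodes and several services, these stop commands are easy to script; a sketch, using prod as a stand-in for the database name (elided in the commands above):

srvctl stop service -d prod        # prod is a placeholder, not the real name
srvctl stop database -d prod
for node in server001a server001b server001c
do
  srvctl stop asm -n $node
done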

5.- If you are running dbconsole or Oracle Grid Agent, it should be stopped:
emctl stop dbconsole
or
emctl stop agent

6.- Create new CRS home on all nodes:
mkdir -p /u01/crs/oracle/product/11.2.0/grid
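
The directory has to exist on every node; from the first node this can be done over ssh (a sketch, assuming the passwordless oracle-to-oracle ssh that a RAC installation already requires):

for node in server001a server001b server001c
do
  ssh $node 'mkdir -p /u01/crs/oracle/product/11.2.0/grid'
done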

7.- Back up the OCR and voting disks:
cd /u01/crs/oracle/product/crs/bin

To find out the location of the OCR:
[oracle@server001a bin]$ ./ocrcheck
Status of Oracle Cluster Registry is as follows :
Version : 2
Total space (kbytes) : 505808
Used space (kbytes) : 28548
Available space (kbytes) : 477260
ID : 1254611171
Device/File Name : /dev/mapper/voting-ocrp6
Device/File integrity check succeeded
Device/File Name : /dev/mapper/voting-ocrp5
Device/File integrity check succeeded
Cluster registry integrity check succeeded

To find out the location of the voting disks:
[oracle@server001a bin]$ ./crsctl query css votedisk
0. 0 /dev/mapper/voting-ocrp2
1. 0 /dev/mapper/voting-ocrp3
2. 0 /dev/mapper/voting-ocrp1
located 3 votedisk(s).

Back up the OCR:
Connect as root.
cd /u01/crs/oracle/product/crs/bin
[root@server001a bin]# ./ocrconfig -export /u01/crs/backup/ocr_backup_100807
[root@server001a bin]# ls -lrt /u01/crs/backup/ocr_backup_100807
-rw-r--r-- 1 root root 752851 Aug 7 10:14 /u01/crs/backup/ocr_backup_100807
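
Besides the manual export, Clusterware keeps automatic OCR backups of its own (taken every four hours); listing them before the upgrade is a cheap safety net:

[root@server001a bin]# ./ocrconfig -showbackup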

Back up the voting disks:
[oracle@server001a bin]$ dd if=/dev/mapper/voting-ocrp1 of=/u01/crs/backup/voting-ocrp1_4k_100807
1012095+0 records in
1012095+0 records out
518192640 bytes (518 MB) copied, 9.39839 seconds, 55.1 MB/s

[oracle@server001a bin]$ dd if=/dev/mapper/voting-ocrp2 of=/u01/crs/backup/voting-ocrp2_4k_100807
1012095+0 records in
1012095+0 records out
518192640 bytes (518 MB) copied, 10.4548 seconds, 49.6 MB/s

[oracle@server001a bin]$ dd if=/dev/mapper/voting-ocrp3 of=/u01/crs/backup/voting-ocrp3_4k_100807
1012095+0 records in
1012095+0 records out
518192640 bytes (518 MB) copied, 9.85683 seconds, 52.6 MB/s
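
The three dd commands follow the same pattern, so a loop keeps it tidy (a sketch):

for vd in voting-ocrp1 voting-ocrp2 voting-ocrp3
do
  dd if=/dev/mapper/$vd of=/u01/crs/backup/${vd}_4k_100807
done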

8.- Prepare the environment:
unset ORA_CRS_HOME
unset ORACLE_BASE
unset ORACLE_HOME
unset ORACLE_SID

Set TEMP to a location with at least 1 GB of free space:
export TEMP=/u01/software/temp
export TMP=$TEMP

Set the new ORACLE_HOME environment:
export ORACLE_BASE=/u01/app/oracle
export ORACLE_HOME=/u01/crs/oracle/product/11.2.0/grid

9.- Execute runInstaller.

10.- Choose the option to upgrade Grid Infrastructure.

11.- Answer yes to "Existing ASM Instance detected. Are you sure you want to continue?".

12.- Choose the language (English is the default).

13.- Select the nodes in the cluster (all of them). Leave the tick on "Upgrade Cluster Automatic Storage Management (ASM)".

14.- Choose the SCAN name (scancrssav.).

15.- Enter the ASM sysdba password.

16.- Choose the groups for ASM authentication (we decided to use the same groups to avoid problems; this decision will be reviewed as part of our internal audit and segregation of duties).

17.- If you choose the same group, Oracle complains (you will need to acknowledge the alert).

18.- Choose the location for the software. Note: the image shows ORACLE_HOME as /u01/app; this should be /u01/crs, as the Clusterware home cannot share the same path as ORACLE_BASE.

19.- Oracle performs its checks.

20.- Oracle displays warnings.

21.- Oracle gives you a fixup script to run:

This was the output in my case:
[root@server001b CVU_11.2.0.1.0_oracle]# ./runfixup.sh
Response file being used is :./fixup.response
Enable file being used is :./fixup.enable
Log file location: ./orarun.log
Setting Kernel Parameters...
fs.file-max = 327679
fs.file-max = 6815744
net.ipv4.ip_local_port_range = 9000 65500
net.core.wmem_max = 262144
net.core.wmem_max = 1048576
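
The fixup script sets the kernel values; it is still worth verifying they are in /etc/sysctl.conf so they survive a reboot. A quick sketch for the parameters shown above:

sysctl -n fs.file-max                     # should report 6815744 or higher
sysctl -n net.ipv4.ip_local_port_range    # should report 9000 65500
grep -E 'file-max|ip_local_port_range|wmem_max' /etc/sysctl.conf
sysctl -p                                 # reload after any manual edits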

22.- Oracle provides a summary of the installation.
23.- The installation runs.

24.- Execute rootupgrade.sh on all nodes:

This is the output I had:
[root@server001a ~]# /u01/crs/oracle/product/11.2.0/grid/rootupgrade.sh
Running Oracle 11g root.sh script...

The following environment variables are set as:
ORACLE_OWNER= oracle
ORACLE_HOME= /u01/crs/oracle/product/11.2.0/grid

Enter the full pathname of the local bin directory: [/usr/local/bin]:
The file "dbhome" already exists in /usr/local/bin. Overwrite it? (y/n)
[n]:
The file "oraenv" already exists in /usr/local/bin. Overwrite it? (y/n)
[n]:
The file "coraenv" already exists in /usr/local/bin. Overwrite it? (y/n)
[n]:

Entries will be added to the /etc/oratab file as needed by
Database Configuration Assistant when a database is created
Finished running generic part of root.sh script.
Now product-specific root actions will be performed.
2010-08-07 11:08:17: Parsing the host name
2010-08-07 11:08:17: Checking for super user privileges
2010-08-07 11:08:17: User has super user privileges
Using configuration parameter file: /u01/crs/oracle/product/11.2.0/grid/crs/install/crsconfig_params
Creating trace directory
CSS appears healthy
EVM appears healthy
CRS appears healthy
Shutting down Oracle Cluster Ready Services (CRS):
Aug 07 11:08:46.458 | INF | daemon shutting down
Stopping resources. This could take several minutes.
Successfully stopped CRS resources.
Stopping CSSD.
Shutting down CSS daemon.
Shutdown request successfully issued.
Shutdown has begun. The daemons should exit soon.
LOCAL ADD MODE
Creating OCR keys for user 'root', privgrp 'root'..
Operation successful.
root wallet
root wallet cert
root cert export
peer wallet
profile reader wallet
pa wallet
peer wallet keys
pa wallet keys
peer cert request
pa cert request
peer cert
pa cert
peer root cert TP
profile reader root cert TP
pa root cert TP
peer pa cert TP
pa peer cert TP
profile reader pa cert TP
profile reader peer cert TP
peer user cert
pa user cert
Adding daemon to inittab
CRS-4123: Oracle High Availability Services has been started.
ohasd is starting
CRS-2672: Attempting to start 'ora.mdnsd' on 'server001a'
CRS-2676: Start of 'ora.mdnsd' on 'server001a' succeeded
CRS-2672: Attempting to start 'ora.gipcd' on 'server001a'
CRS-2676: Start of 'ora.gipcd' on 'server001a' succeeded
CRS-2672: Attempting to start 'ora.gpnpd' on 'server001a'
CRS-2676: Start of 'ora.gpnpd' on 'server001a' succeeded
CRS-2672: Attempting to start 'ora.cssdmonitor' on 'server001a'
CRS-2676: Start of 'ora.cssdmonitor' on 'server001a' succeeded
CRS-2672: Attempting to start 'ora.cssd' on 'server001a'
CRS-2672: Attempting to start 'ora.diskmon' on 'server001a'
CRS-2676: Start of 'ora.diskmon' on 'server001a' succeeded
CRS-2676: Start of 'ora.cssd' on 'server001a' succeeded
CRS-2672: Attempting to start 'ora.ctssd' on 'server001a'
CRS-2676: Start of 'ora.ctssd' on 'server001a' succeeded
CRS-2672: Attempting to start 'ora.crsd' on 'server001a'
CRS-2676: Start of 'ora.crsd' on 'server001a' succeeded
CRS-2672: Attempting to start 'ora.evmd' on 'server001a'
CRS-2676: Start of 'ora.evmd' on 'server001a' succeeded
clscfg: EXISTING configuration version 3 detected.
clscfg: version 3 is 10G Release 2.
Successfully accumulated necessary OCR keys.
Creating OCR keys for user 'root', privgrp 'root'..
Operation successful.

server001a 2010/08/07 11:12:45 /u01/crs/oracle/product/11.2.0/grid/cdata/server001a/backup_20100807_111245.olr
Preparing packages for installation...
cvuqdisk-1.0.7-1
Configure Oracle Grid Infrastructure for a Cluster … succeeded
Updating inventory properties for clusterware
Starting Oracle Universal Installer...

Checking swap space: must be greater than 500 MB. Actual 7997 MB Passed
The inventory pointer is located at /etc/oraInst.loc
The inventory is located at /u01/app/oracle/oraInventory
'UpdateNodeList' was successful.
Starting Oracle Universal Installer...

Checking swap space: must be greater than 500 MB. Actual 7997 MB Passed
The inventory pointer is located at /etc/oraInst.loc
The inventory is located at /u01/app/oracle/oraInventory
'UpdateNodeList' was successful.
[root@server001a ~]#

25.- After executing rootupgrade.sh, Oracle proceeds to upgrade ASM (if we left the tick on in the previous step):

26.- We had the following error while upgrading ASM:

This is what I could see from the logfile:
INFO: Found associated job
INFO: Starting 'Automatic Storage Management Configuration Assistant'
INFO: Starting 'Automatic Storage Management Configuration Assistant'
INFO: Executing ASMCA
INFO: Command /u01/crs/oracle/product/11.2.0/grid/bin/asmca -silent -upgradeASM -oui_internal
INFO: ... GenericInternalPlugIn.handleProcess() entered.
INFO: ... GenericInternalPlugIn: getting configAssistantParmas.
INFO: ... GenericInternalPlugIn: checking secretArguments.
INFO: ... GenericInternalPlugIn: starting read loop.
INFO: Read: ASMSNMP_PASSWORD_PROMPT
INFO: Processing: ASMSNMP_PASSWORD_PROMPT for argument tag -asmMonitorPassword
INFO: End of argument passing to stdin
INFO: Read:
INFO: Read: Failed to stop databases currently using ASM.
INFO: Read:
INFO: Completed Plugin named: Automatic Storage Management Configuration Assistant

This is because we had configured Clusterware to start the databases automatically, and rootupgrade.sh stops and starts CRS.

The solution in our case was to stop databases and ASM manually using srvctl.
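
A sketch of what we ran (prod again stands in for the real database name), after which the configuration assistant can be retried from the installer:

srvctl stop database -d prod       # prod is a placeholder
for node in server001a server001b server001c
do
  srvctl stop asm -n $node
done
# then click "Retry" in the installer so ASMCA runs again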

After fixing the problem with ASM, we had an error with OCR:

INFO: ERROR:
INFO: PRVF-4178 : Check for OCR location "/dev/mapper/voting-ocrp6" failed on the following nodes:
INFO: server001a:Owner of file "/dev/mapper/voting-ocrp6" did not match the expected value. [Expected = "root" ; Found = "oracle"];Permissions of file "/dev/mapper/voting-ocrp6" did not match the expected value. [Expected = "0640" ; Found = "0660"]
INFO: server001c:Owner of file "/dev/mapper/voting-ocrp6" did not match the expected value. [Expected = "root" ; Found = "oracle"];Permissions of file "/dev/mapper/voting-ocrp6" did not match the expected value. [Expected = "0640" ; Found = "0660"]
INFO: server001b:Owner of file "/dev/mapper/voting-ocrp6" did not match the expected value. [Expected = "root" ; Found = "oracle"];Permissions of file "/dev/mapper/voting-ocrp6" did not match the expected value. [Expected = "0640" ; Found = "0660"]
INFO: Checking OCR location "/dev/mapper/voting-ocrp5"...
INFO: ERROR:

There is a solution provided in Metalink: Solution as per PRVF-4178 Reported from CLUVFY [ID 1058576.1]:

The group must be one of the groups assigned for ASM:
chown root:oinstall /dev/mapper/voting-ocrp6
chown root:oinstall /dev/mapper/voting-ocrp5
chmod 0660 /dev/mapper/voting-ocrp6
chmod 0660 /dev/mapper/voting-ocrp5

But then we started to see a different error:

INFO: PRVF-4178 : Check for OCR location "/dev/mapper/voting-ocrp6" failed on the following nodes:
INFO: server001a:Permissions of file "/dev/mapper/voting-ocrp6" did not match the expected value. [Expected = "0640" ; Found = "0660"]
INFO: server001c:Permissions of file "/dev/mapper/voting-ocrp6" did not match the expected value. [Expected = "0640" ; Found = "0660"]
INFO: server001b:Permissions of file "/dev/mapper/voting-ocrp6" did not match the expected value. [Expected = "0640" ; Found = "0660"]
INFO: Checking OCR location "/dev/mapper/voting-ocrp5"...
INFO: ERROR:
INFO: PRVF-4178 : Check for OCR location "/dev/mapper/voting-ocrp5" failed on the following nodes:
INFO: server001a:Permissions of file "/dev/mapper/voting-ocrp5" did not match the expected value. [Expected = "0640" ; Found = "0660"]
INFO: server001c:Permissions of file "/dev/mapper/voting-ocrp5" did not match the expected value. [Expected = "0640" ; Found = "0660"]
INFO: server001b:Permissions of file "/dev/mapper/voting-ocrp5" did not match the expected value. [Expected = "0640" ; Found = "0660"]

So I decided to reach a compromise and leave the permissions at 0660, as we will be moving the OCR into ASM.

Click "Skip" to move on.

Applying patches to Grid Infrastructure has changed completely, and some extra security has been introduced (you now have to unlock and lock the environment).

I will post a new entry on how I patched the environment.

As always, comments are welcome.