
ORA-16072: a minimum of one standby database destination is required 27 August 2011

Posted by David Alejo Marcos in ASM, Exadata, Oracle 11.2, RAC, RMAN.

This is a quick post regarding the error on the subject. This is the second time it has happened to me, so I thought I would write a bit about it.

The problem:

I am refreshing one of my UAT environments (which happens to be a Full Rack Exadata) using the Oracle RMAN duplicate command, and then the following happens (it did on both occasions):

1.- The duplicate command fails (lack of space for restoring archivelogs, or any other error). This can be fixed quite easily.

2.- The following error appears while trying to open the database after the restore and recover have finished:

SQL> alter database open;
alter database open
*
ERROR at line 1:
ORA-03113: end-of-file on communication channel
Process ID: 13710
Session ID: 1250 Serial number: 5
SQL> exit

On the alert.log file we can read the following:

Wed Aug 24 13:32:48 2011
alter database open
Wed Aug 24 13:32:49 2011
LGWR: STARTING ARCH PROCESSES
Wed Aug 24 13:32:49 2011
ARC0 started with pid=49, OS id=13950
ARC0: Archival started
LGWR: STARTING ARCH PROCESSES COMPLETE
ARC0: STARTING ARCH PROCESSES
LGWR: Primary database is in MAXIMUM AVAILABILITY mode
LGWR: Destination LOG_ARCHIVE_DEST_1 is not serviced by LGWR
LGWR: Minimum of 1 LGWR standby database required
Errors in file /apps/oracle/server/diag/rdbms/xxxx04/xxxx041/trace/xxxx041_lgwr_13465.trc:
ORA-16072: a minimum of one standby database destination is required
Errors in file /apps/oracle/server/diag/rdbms/xxxx04/xxxx041/trace/xxxx041_lgwr_13465.trc:
ORA-16072: a minimum of one standby database destination is required
LGWR (ospid: 13465): terminating the instance due to error 16072
Wed Aug 24 13:32:50 2011
ARC1 started with pid=48, OS id=13952
Wed Aug 24 13:32:50 2011
System state dump is made for local instance
System State dumped to trace file /apps/oracle/server/diag/rdbms/xxxx04/xxxx041/trace/xxxx041_diag_13137.trc
Trace dumping is performing id=[cdmp_20110824133250]
Instance terminated by LGWR, pid = 13465

The Solution:

Quite simple:

1.- Start up database in mount mode:

SQL> startup mount

ORACLE instance started.

Total System Global Area 1.7103E+10 bytes
Fixed Size                  2230472 bytes
Variable Size            4731176760 bytes
Database Buffers         1.2180E+10 bytes
Redo Buffers              189497344 bytes
Database mounted.

SQL> select open_mode, DATABASE_ROLE, guard_status, SWITCHOVER_STATUS from v$database;

OPEN_MODE            DATABASE_ROLE    GUARD_S SWITCHOVER_STATUS
-------------------- ---------------- ------- --------------------
MOUNTED              PRIMARY          NONE    NOT ALLOWED
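
You can also confirm at this point the protection mode that is making LGWR demand a standby destination (a quick optional check; given the alert.log above it should report MAXIMUM AVAILABILITY before the fix):

SQL> select protection_mode, protection_level from v$database;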

2.- Execute the following command:

SQL> alter database set standby database to maximize performance;

Database altered.

SQL> select open_mode, DATABASE_ROLE, guard_status, SWITCHOVER_STATUS from v$database;

OPEN_MODE            DATABASE_ROLE    GUARD_S SWITCHOVER_STATUS
-------------------- ---------------- ------- --------------------
MOUNTED              PRIMARY          NONE    NOT ALLOWED

3.- Stop database:

SQL> shutdown immediate
ORA-01109: database not open
Database dismounted.
ORACLE instance shut down.

4.- Start up database mount mode:

SQL> startup mount
ORACLE instance started.

Total System Global Area 1.7103E+10 bytes
Fixed Size                  2230472 bytes
Variable Size            4731176760 bytes
Database Buffers         1.2180E+10 bytes
Redo Buffers              189497344 bytes
Database mounted.

5.- Open database:

SQL> alter database open;

Database altered.

SQL> select open_mode from v$database;

OPEN_MODE
--------------------
READ WRITE

SQL> select instance_name from v$instance;

INSTANCE_NAME
----------------
xxxx041

SQL>

As always, comments are welcome.

David Alejo-Marcos.

David Marcos Consulting Ltd.

How to transfer files from ASM to another ASM or filesystem or DBFS… 23 July 2011

Posted by David Alejo Marcos in ASM, Exadata, Oracle 11.2.

I had a requirement to transfer files from our PROD ASM to our UAT ASM, as DBFS was proving to be slow.

The problem:

We are currently refreshing UAT schemas using Oracle Datapump to DBFS and then transferring those files to UAT using SCP.

DBFS does not provide us with the performance we need, as the datapump files are quite big. The same export onto ASM or NFS proves to be much, much faster.

We are currently testing exports to ASM, but how do we move the dmp files from PROD ASM to UAT ASM?

The solution:

The answer for us is using DBMS_FILE_TRANSFER. It is very simple to set up (steps below) and it has proved to be fast.

DBMS_FILE_TRANSFER will copy files from one ORACLE DIRECTORY to another ORACLE DIRECTORY. The directory can point to a folder on ASM, DBFS, Filesystem, etc, so the transfer is “heterogeneous”.

I decided to go for GET_FILE, so most of the work will be done on UAT; the other option is PUT_FILE.

The syntax is as follows:

DBMS_FILE_TRANSFER.GET_FILE (
source_directory_object      IN  VARCHAR2,
source_file_name             IN  VARCHAR2,
source_database              IN  VARCHAR2,
destination_directory_object IN  VARCHAR2,
destination_file_name        IN  VARCHAR2);

Where:

source_directory_object: The directory object from which the file is copied at the source site. This directory object must exist at the source site.

source_file_name: The name of the file that is copied in the remote file system. This file must exist in the remote file system in the directory associated with the source directory object.

source_database: The name of a database link to the remote database where the file is located.

destination_directory_object: The directory object into which the file is placed at the destination site. This directory object must exist in the local file system.

destination_file_name: The name of the file copied to the local file system. A file with the same name must not exist in the destination directory in the local file system.

These are the steps for my test:

1.- Create directory on destination:

oracle@sssss (+ASM1)$ asmcmd
ASMCMD> mkdir +DATA/DPUMP
ASMCMD> mkdir +DATA/DPUMP/sid
ASMCMD> exit

2.- Create directory on database

SQL> create directory SID_ASM_DPUMP_DIR as '+DATA/DPUMP/sid';

Directory created.

SQL> grant read, write, execute on directory SID_ASM_DPUMP_DIR to public;

Grant succeeded.
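
Note.- GET_FILE also relies on a database link from the destination (UAT) database back to the source; 'TESTSRC' in step 3 below is such a link. If one does not already exist, it can be created along these lines (user, password and TNS alias are placeholders):

SQL> create database link TESTSRC connect to <username> identified by <password> using '<tns_alias_of_source>';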

3.- Transfer file

SQL> set timing on time on
17:22:13 SQL> BEGIN
17:22:17   2  dbms_file_transfer.get_file ('SID_ASM_DPUMP_DIR',
17:22:17   3                               'expdp.dmp',
17:22:17   4                               'TESTSRC',
17:22:17   5                               'SID_ASM_DPUMP_DIR',
17:22:17   6                               'expdp.dmp');
17:22:17   7  END;
17:22:17   8
17:22:17   9  /
PL/SQL procedure successfully completed.
Elapsed: 00:01:07.57
17:23:31 SQL>

4.- Check files is on destination:

ASMCMD [+DATA/DPUMP/sid] > ls -ls
Type     Redund  Striped  Time             Sys  Name
N    expdp.dmp => +DATA/sid/DUMPSET/FILE_TRANSFER_0_0.797.756840143

OK, so ls -ls does not show us the size of the file in the destination directory. The reason is that the file there is an alias; you need to perform ls -ls on the directory where the file is actually stored.

ASMCMD [+DATA/DPUMP/sid] > ls -ls +DATA/sid/DUMPSET/
Type     Redund  Striped  Time             Sys  Block_Size   Blocks       Bytes        Space  Name
DUMPSET  MIRROR  COARSE   JUL 18 17:00:00  Y          4096  1784576  7309623296  14633926656  FILE_TRANSFER_0_0.797.756840143

So, we have managed to transfer 6.8GB of data between two remote servers in 1 minute 8 seconds… not bad.
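
For reference, that is the 7,309,623,296 bytes reported above moving in roughly 68 seconds, i.e. a sustained transfer rate in the region of 100 MB/s over the database link.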

As always, comments are welcome.

David Alejo-Marcos.

David Marcos Consulting Ltd.

Archive area +RECO has -7440384 free KB remaining (Usable_file_MB is negative) 17 July 2011

Posted by David Alejo Marcos in ASM, Exadata, Oracle 11.2, RMAN.

I must say, this has been a busy weekend.

We had been promoting a release to production, and a guaranteed restore point was created on Friday as a rollback strategy. On Sunday I was called as we started to receive alerts.

The problem:

Our monitoring system started to send emails and SNMP Traps with the following alerts:

OEM alert for Automatic Storage Management +ASM4_ssssss4: Disk group RECO has used 100% of safely usable free space. (Current Disk Group Used % of Safely Usable value: 100)
OEM alert for Automatic Storage Management +ASM2_ssssss2: Disk group RECO has used 100% of safely usable free space. (Current Disk Group Used % of Safely Usable value: 100)
OEM alert for Automatic Storage Management +ASM3_ssssss3: Disk group RECO has used 100% of safely usable free space. (Current Disk Group Used % of Safely Usable value: 100)
OEM alert for Automatic Storage Management +ASM7_ssssss7: Disk group RECO has used 100% of safely usable free space. (Current Disk Group Used % of Safely Usable value: 100)
OEM alert for Automatic Storage Management +ASM8_ssssss8: Disk group RECO has used 100% of safely usable free space. (Current Disk Group Used % of Safely Usable value: 100)
OEM alert for Automatic Storage Management +ASM6_ssssss6: Disk group RECO has used 100% of safely usable free space. (Current Disk Group Used % of Safely Usable value: 100)
OEM alert for Automatic Storage Management +ASM1_ssssss1: Disk group RECO has used 100% of safely usable free space. (Current Disk Group Used % of Safely Usable value: 100)
OEM alert for Automatic Storage Management +ASM5_ssssss5: Disk group RECO has used 100% of safely usable free space. (Current Disk Group Used % of Safely Usable value: 100)
OEM alert for Database Instance : Archive area +RECO has -7440384 free KB remaining. (Current Free Archive Area (KB) value: -7440384)
OEM alert for Database Instance : Archive area +RECO has -21725184 free KB remaining. (Current Free Archive Area (KB) value: -21725184)

Not very nice in any situation, and even less so when you are in the middle of a critical, highly visible release.

I did have a look and this is what I found.

The solution:

The first thing I did was to check the Flash Recovery Area, as it is configured to write to our +RECO diskgroup:
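
The figures below come from queries along these lines (a sketch; the exact column formatting will differ):

SQL> select name, space_used/1024/1024 used_mb, space_limit/1024/1024 limit_mb,
            round(space_used/space_limit*100, 2) pct_used
       from v$recovery_file_dest;

SQL> select file_type, percent_space_used, percent_space_reclaimable, number_of_files
       from v$flash_recovery_area_usage;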

NAME                    USED_MB   LIMIT_MB   PCT_USED
-------------------- ---------- ---------- ----------
+RECO                   1630569    2048000      79.62
Elapsed: 00:00:00.12
FILE_TYPE            PERCENT_SPACE_USED PERCENT_SPACE_RECLAIMABLE NUMBER_OF_FILES
-------------------- ------------------ ------------------------- ---------------
CONTROL FILE                          0                         0               1
REDO LOG                              0                         0               0
ARCHIVED LOG                          0                         0               0
BACKUP PIECE                      75.66                     75.66              86
IMAGE COPY                            0                         0               0
FLASHBACK LOG                      3.96                         0             707
FOREIGN ARCHIVED LOG                  0                         0               0
7 rows selected.
Elapsed: 00:00:01.62

The numbers did look OK; some backup files could be reclaimed (Oracle should do it automatically). Let's have a look at ASM:

oracle@ssss (+ASM1)$ asmcmd lsdg
State    Type    Rebal  Sector  Block       AU  Total_MB   Free_MB  Req_mir_free_MB  Usable_file_MB  Offline_disks  Voting_files  Name
MOUNTED  NORMAL  N         512   4096  4194304  55050240  26568696          5004567        10782064              0             N  DATA/
MOUNTED  NORMAL  N         512   4096  4194304  35900928   3192844          3263720          -35438              0             N  RECO/
MOUNTED  NORMAL  N         512   4096  4194304   4175360   4061640           379578         1841031              0             N  SYSTEMDG/
Elapsed: 00:00:01.62

Bingo, this is where our problem is. USABLE_FILE_MB (for the +RECO diskgroup) indicates the amount of free space that can be safely utilized taking mirroring into account, while still leaving enough space to restore redundancy after a disk failure. A negative number in this column could be critical in the event of a disk failure, as we might not have enough space to restore all the files onto the surviving disks.
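
As a rough worked example: for a normal-redundancy diskgroup, Usable_file_MB is approximately (Free_MB - Req_mir_free_MB) / 2, so for +RECO above that is (3192844 - 3263720) / 2 = -35438 MB, exactly the negative figure lsdg reports.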

Our backups go to ASM and we copy them to tape afterwards. Our retention policy on disk is between 2 and 3 days, depending on the system.

When I checked the contents of the backupset directory on ASM, I found some old backups:

Type  Redund  Striped  Time             Sys  Name
Y    2011_07_17/
Y    2011_07_16/
Y    2011_07_15/
Y    2011_07_14/
Y    2011_07_13/
Y    2011_07_12/
Elapsed: 00:00:01.62
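
To map those dated directories back to RMAN backup tags (ours follow the EOD_DLY_<date> convention used below), something like this can be run first (a sketch):

RMAN> list backup summary;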

To delete those old backups I executed the following script from RMAN:

RMAN> delete backup tag EOD_DLY_110712 device type disk;
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=1430 instance=<instance> device type=DISK
allocated channel: ORA_DISK_2
channel ORA_DISK_2: SID=2138 instance=<instance> device type=DISK
allocated channel: ORA_DISK_3
channel ORA_DISK_3: SID=8 instance=<instance> device type=DISK
allocated channel: ORA_DISK_4
channel ORA_DISK_4: SID=150 instance=<instance> device type=DISK
List of Backup Pieces
BP Key  BS Key  Pc# Cp# Status      Device Type Piece Name
------- ------- --- --- ----------- ----------- ----------
8292    4020    1   1   AVAILABLE   DISK        +RECO//backupset/2011_07_12/sssss_eod_dly_110712_0.nnnn.nnnnn
8293    4021    1   1   AVAILABLE   DISK        +RECO//backupset/2011_07_12/sssss_eod_dly_110712_0.nnnn.nnnnn
Do you really want to delete the above objects (enter YES or NO)? yes
deleted backup piece
backup piece handle=+RECO//backupset/2011_07_12/sssss_eod_dly_110712_0.nnnn.nnnnnn RECID=8292 STAMP=nnnn
deleted backup piece
backup piece handle=+RECO//backupset/2011_07_12/sssss_eod_dly_110712_0.nnnn.nnnnnn RECID=8293 STAMP=nnnn
Deleted 2 objects
Elapsed: 00:00:01.62

After deleting two more old backups, the numbers looked much better:

State    Type    Rebal  Sector  Block       AU  Total_MB   Free_MB  Req_mir_free_MB  Usable_file_MB  Offline_disks  Voting_files  Name
MOUNTED  NORMAL  N         512   4096  4194304  55050240  26568696          5004567        10782064              0             N  DATA/
MOUNTED  NORMAL  N         512   4096  4194304  35900928   3548260          3263720          142270              0             N  RECO/
MOUNTED  NORMAL  N         512   4096  4194304   4175360   4061640           379578         1841031              0             N  SYSTEMDG/

Note.- There is another temporary fix. I could have changed db_recovery_file_dest to point to +DATA instead of +RECO but, as we have a guaranteed restore point, I thought releasing space from old backups was easier.
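
Had I gone down that route, it would have been a one-liner along these lines (kept here only for reference):

SQL> alter system set db_recovery_file_dest='+DATA' scope=both sid='*';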

As always, comments are welcome.

David Alejo-Marcos.

David Marcos Consulting Ltd.

Unable to delete archive log and FRA is filling 7 September 2010

Posted by David Alejo Marcos in ASM, Oracle 11.2, RMAN.

Couple of weeks ago we had a problem with one of our busiest databases.

The FRA was filling quite rapidly and we just could not free enough space. What made this problem interesting was the fact that Oracle was telling us more than 40% of the space was marked as reclaimable.

The problem:

We have automatic alerts for when we are running out of space. When I had a look at the situation, this is what I saw:

SQL> select * from v$flash_recovery_area_usage;
FILE_TYPE            PERCENT_SPACE_USED PERCENT_SPACE_RECLAIMABLE NUMBER_OF_FILES
-------------------- ------------------ ------------------------- ---------------
CONTROLFILE                         .01                         0               1
ONLINELOG                           .89                         0               9
ARCHIVELOG                        51.29                     49.67            1181
BACKUPPIECE                           0                         0               0
IMAGECOPY                             0                         0               0
FLASHBACKLOG                      40.43                     34.83            1658

My first thought was to remove old files using RMAN after checking all standby databases were in sync (we have enough space to store 7 days' worth of archivelogs, but I wanted to play safe as we had many changes during the weekend and one change in particular generated 60+ GB of archivelogs).
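
A quick way of confirming the standbys are in sync is to compare the highest sequence archived with the highest sequence applied, with a query along these lines on the primary (a sketch):

SQL> select thread#,
            max(sequence#) last_archived,
            max(decode(applied, 'YES', sequence#)) last_applied
       from v$archived_log
      group by thread#;

Once happy, I executed the following command: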

rman
RMAN> connect target /
RMAN> delete noprompt archivelog until time 'SYSDATE - 3';
List of Archived Log Copies
Key Thrd Seq S Low Time Name
------- ---- ------- - ------------------- ----
228321 1 71005 A 07.08.2010 14:27:00 +FRA/xxx/archivelog/2010_08_07/thread_1_seq_71005.670.726416889
228323 1 71006 A 07.08.2010 14:28:09 +FRA/xxx/archivelog/2010_08_07/thread_1_seq_71006.2112.726416957
228330 1 71007 A 07.08.2010 14:29:16 +FRA/xxx/archivelog/2010_08_07/thread_1_seq_71007.1210.726417025
.....
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03009: failure of delete command on ORA_DISK_1 channel at 08/09/2010 18:19:58
ORA-15028: ASM file '+FRA/xxx/archivelog/2010_08_07/thread_1_seq_71005.670.726416889' not dropped; currently being accessed

Bingo, what should have been a simple operation has just become more complicated…

The solution:

Technically we have plenty of space on the FRA, but Oracle is complaining that a file is being used. This file is old.

The only way I could fix the problem was by killing the process holding a lock on the file. This process had to be one of the ARC processes.

In this case the solution was quite simple. The steps I followed were (bearing in mind that arc? processes are restarted automatically by Oracle):
1.- Find the process id of an ARC process:
ps -ef | grep -i ora_arc*
oracle 5607 1 1 19:02 ? 00:00:00 ora_arc9_prod1

2.- Kill the running process:
kill -9 5607
3.- Check the process has been restarted before killing more:
ps -ef | grep -i ora_arc9_prod1
4.- Perform steps 2 and 3 for all arc? processes running for your instance.

Once we had killed all arc? processes and were sure they had been restarted by Oracle, I did the following:
1.- Perform a couple of redo log switches and make sure they are being propagated to all standby databases:

SQL> alter system switch logfile;

2.- Connect to RMAN and perform a backup of the archivelog "being in use":
rman
RMAN> connect target /
RMAN> backup archivelog from sequence 71005 until sequence 71005 thread 1;

The output is:

Starting backup at 09.08.2010 18:50:02
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: sid=620 instance=prod1 devtype=DISK
channel ORA_DISK_1: starting archive log backupset
channel ORA_DISK_1: specifying archive log(s) in backup set
input archive log thread=1 sequence=71005 recid=228321 stamp=726416895
channel ORA_DISK_1: starting piece 1 at 09.08.2010 18:50:04
channel ORA_DISK_1: finished piece 1 at 09.08.2010 18:50:11
piece handle=+FRA/xxx/backupset/2010_08_09/annnf0_tag20100809t185002_0.1097.726605405 tag=TAG20100809T185002 comment=NONE
channel ORA_DISK_1: backup set complete, elapsed time: 00:00:08
Finished backup at 09.08.2010 18:50:11
Starting Control File and SPFILE Autobackup at 09.08.2010 18:50:11
piece handle=/u04/oradata/prod/rman_disk/auto/PROD/c-4255406167-20100809-00_PROD comment=NONE
Finished Control File and SPFILE Autobackup at 09.08.2010 18:50:15

Once we have performed the backup, we can delete old archivelogs to release some space:

RMAN> delete noprompt archivelog until time 'SYSDATE - 3';
....
deleted archive log
archive log filename=+FRA/xxx/archivelog/2010_08_07/thread_1_seq_71005.670.726416889 recid=228321 stamp=726416895
deleted archive log
archive log filename=+FRA/xxx/archivelog/2010_08_07/thread_1_seq_71006.2112.726416957 recid=228323 stamp=726416962
....

3.- Monitor the space on the FRA:
SQL> select * from v$flash_recovery_area_usage;
FILE_TYPE            PERCENT_SPACE_USED PERCENT_SPACE_RECLAIMABLE NUMBER_OF_FILES
-------------------- ------------------ ------------------------- ---------------
CONTROLFILE                         .01                       .00               1
ONLINELOG                           .89                       .00               9
ARCHIVELOG                        49.09                     46.83            1168
BACKUPPIECE                         .00                       .00               0
IMAGECOPY                           .00                       .00               0
FLASHBACKLOG                      40.22                     33.91            1648

Run it a couple of times:

SQL> select * from v$flash_recovery_area_usage;
FILE_TYPE            PERCENT_SPACE_USED PERCENT_SPACE_RECLAIMABLE NUMBER_OF_FILES
-------------------- ------------------ ------------------------- ---------------
CONTROLFILE                         .01                       .00               1
ONLINELOG                           .89                       .00               9
ARCHIVELOG                         3.10                       .04             225
BACKUPPIECE                         .10                       .00               1
IMAGECOPY                           .00                       .00               0
FLASHBACKLOG                      39.91                     32.95            1633

As always, comments are welcome.

Upgrade clusterware from 10gR2 to 11gR2 7 September 2010

Posted by David Alejo Marcos in ASM, Oracle 11.2, RAC.

With Oracle 10g becoming unsupported soon, more and more businesses will need to upgrade to 11g. My suggestion is to go for 11gR2, as it is more stable than 11gR1.

Most of our databases are running RAC. With this type of configuration, we need to upgrade the Clusterware before upgrading the databases.

The problem:

Upgrade Oracle Clusterware from 10.2.0.4 to 11.2.0.1.

The solution:

I was in charge of doing the upgrade on our DR environment (a 3-node RAC database). The same steps apply to the production clusterware.

As we are running Linux the software we need is:

1.- linux.x64_11gR2_grid.zip

2.- p6880880_112000_Linux-x86-64.zip

3.- p9343627_112010_Linux-x86-64.zip

Note.- Please make sure the patches listed above are still valid. I am aware Oracle has released a new PSU.

The steps detailed below were followed:

1.- Check the environment:

cd /u01/software/gridInfstructureUpgrade/grid
./runcluvfy.sh stage -pre crsinst -n server001a,server001b,server001c -r 11gR2 -verbose 2>&1 | tee /tmp/cluvfy.pre.crsinst_100807

2.- Make sure you have a working public IP address for the SCAN.

IP Addresses for SCAN:
scancrssav.
xx.xx.x.218, xx.xx.x.219, xx.xx.x.220 mask 255.255.255.192.

Check it resolves:
[oracle@server001a ~]$ host scancrssav.
scancrssav.. has address xx.xx.x.220
scancrssav.. has address xx.xx.x.219
scancrssav.. has address xx.xx.x.218

3.- Stop services and databases:
srvctl stop service -d
srvctl stop database -d

4.- Stop ASM on all nodes:
srvctl stop asm -n server001a
srvctl stop asm -n server001b
srvctl stop asm -n server001c

5.- If you are running dbconsole or Oracle Grid Agent, it should be stopped:
emctl stop dbconsole
or
emctl stop agent

6.- Create new CRS home on all nodes:
mkdir -p /u01/crs/oracle/product/11.2.0/grid

7.- backup OCR and Voting Disks:
cd /u01/crs/oracle/product/crs/bin

To find out location of OCR:
[oracle@server001a bin]$ ./ocrcheck
Status of Oracle Cluster Registry is as follows :
Version : 2
Total space (kbytes) : 505808
Used space (kbytes) : 28548
Available space (kbytes) : 477260
ID : 1254611171
Device/File Name : /dev/mapper/voting-ocrp6
Device/File integrity check succeeded
Device/File Name : /dev/mapper/voting-ocrp5
Device/File integrity check succeeded
Cluster registry integrity check succeeded

To find out the location of the voting disks:
[oracle@server001a bin]$ ./crsctl query css votedisk
0. 0 /dev/mapper/voting-ocrp2
1. 0 /dev/mapper/voting-ocrp3
2. 0 /dev/mapper/voting-ocrp1
located 3 votedisk(s).

Backup OCR:
Connect as Root.
cd /u01/crs/oracle/product/crs/bin
[root@server001a bin]# ./ocrconfig -export /u01/crs/backup/ocr_backup_100807
[root@server001a bin]# ls -lrt /u01/crs/backup/ocr_backup_100807
-rw-r--r-- 1 root root 752851 Aug 7 10:14 /u01/crs/backup/ocr_backup_100807

Backup Voting Disks:
[oracle@server001a bin]$ dd if=/dev/mapper/voting-ocrp1 of=/u01/crs/backup/voting-ocrp1_4k_100807
1012095+0 records in
1012095+0 records out
518192640 bytes (518 MB) copied, 9.39839 seconds, 55.1 MB/s

[oracle@server001a bin]$ dd if=/dev/mapper/voting-ocrp2 of=/u01/crs/backup/voting-ocrp2_4k_100807
1012095+0 records in
1012095+0 records out
518192640 bytes (518 MB) copied, 10.4548 seconds, 49.6 MB/s

[oracle@server001a bin]$ dd if=/dev/mapper/voting-ocrp3 of=/u01/crs/backup/voting-ocrp3_4k_100807
1012095+0 records in
1012095+0 records out
518192640 bytes (518 MB) copied, 9.85683 seconds, 52.6 MB/s

8.- Prepare environment
unset ORA_CRS_HOME
unset ORACLE_BASE
unset ORACLE_HOME
unset ORACLE_SID

Set TEMP to a location with at least 1GB of free space:
export TEMP=/u01/software/temp
export TMP=$TEMP

Set the new ORACLE_HOME environment:
export ORACLE_BASE=/u01/app/oracle
export ORACLE_HOME=/u01/crs/oracle/product/11.2.0/grid

9.- Execute runInstaller.

10.- Upgrade Grid Infrastructure:

11.- Answer yes to the "Existing ASM Instance detected. Are you sure you want to continue?" prompt:

12.- Choose language (English is the default language):

13.- Select nodes on the Cluster (all). Leave the tick on Upgrade Cluster Automatic Storage Manager (ASM):

14.- choose SCAN name (scancrssav.):

15.- ASM sysdba password:

16.- Choose groups for ASM authentication (we decided to use the same groups to avoid problems; this decision will be reviewed as part of our Internal Audit and Segregation of Duties):

17.- If you choose the same group, Oracle complains (you will need to acknowledge the alert):

18.- Location for the software. Note, the image shows ORACLE_HOME as /u01/app; this should be /u01/crs, as the Clusterware home cannot share the same path as ORACLE_BASE:

19.- Oracle performs checks:

20.- Oracle warnings:

21.- Oracle gives you a script to fix:

This was the output in my case:
[root@server001b CVU_11.2.0.1.0_oracle]# ./runfixup.sh
Response file being used is :./fixup.response
Enable file being used is :./fixup.enable
Log file location: ./orarun.log
Setting Kernel Parameters...
fs.file-max = 327679
fs.file-max = 6815744
net.ipv4.ip_local_port_range = 9000 65500
net.core.wmem_max = 262144
net.core.wmem_max = 1048576

22.- Oracle will provide you with a summary of the installation.
23.- Running:

24.- Execute rootupgrade.sh on all nodes:

This is the output I had:
[root@server001a ~]# /u01/crs/oracle/product/11.2.0/grid/rootupgrade.sh
Running Oracle 11g root.sh script…

The following environment variables are set as:
ORACLE_OWNER= oracle
ORACLE_HOME= /u01/crs/oracle/product/11.2.0/grid

Enter the full pathname of the local bin directory: [/usr/local/bin]:
The file “dbhome” already exists in /usr/local/bin. Overwrite it? (y/n)
[n]:
The file “oraenv” already exists in /usr/local/bin. Overwrite it? (y/n)
[n]:
The file “coraenv” already exists in /usr/local/bin. Overwrite it? (y/n)
[n]:

Entries will be added to the /etc/oratab file as needed by
Database Configuration Assistant when a database is created
Finished running generic part of root.sh script.
Now product-specific root actions will be performed.
2010-08-07 11:08:17: Parsing the host name
2010-08-07 11:08:17: Checking for super user privileges
2010-08-07 11:08:17: User has super user privileges
Using configuration parameter file: /u01/crs/oracle/product/11.2.0/grid/crs/install/crsconfig_params
Creating trace directory
CSS appears healthy
EVM appears healthy
CRS appears healthy
Shutting down Oracle Cluster Ready Services (CRS):
Aug 07 11:08:46.458 | INF | daemon shutting down
Stopping resources. This could take several minutes.
Successfully stopped CRS resources.
Stopping CSSD.
Shutting down CSS daemon.
Shutdown request successfully issued.
Shutdown has begun. The daemons should exit soon.
LOCAL ADD MODE
Creating OCR keys for user ‘root’, privgrp ‘root’..
Operation successful.
root wallet
root wallet cert
root cert export
peer wallet
profile reader wallet
pa wallet
peer wallet keys
pa wallet keys
peer cert request
pa cert request
peer cert
pa cert
peer root cert TP
profile reader root cert TP
pa root cert TP
peer pa cert TP
pa peer cert TP
profile reader pa cert TP
profile reader peer cert TP
peer user cert
pa user cert
Adding daemon to inittab
CRS-4123: Oracle High Availability Services has been started.
ohasd is starting
CRS-2672: Attempting to start ‘ora.mdnsd’ on ‘server001a’
CRS-2676: Start of ‘ora.mdnsd’ on ‘server001a’ succeeded
CRS-2672: Attempting to start ‘ora.gipcd’ on ‘server001a’
CRS-2676: Start of ‘ora.gipcd’ on ‘server001a’ succeeded
CRS-2672: Attempting to start ‘ora.gpnpd’ on ‘server001a’
CRS-2676: Start of ‘ora.gpnpd’ on ‘server001a’ succeeded
CRS-2672: Attempting to start ‘ora.cssdmonitor’ on ‘server001a’
CRS-2676: Start of ‘ora.cssdmonitor’ on ‘server001a’ succeeded
CRS-2672: Attempting to start ‘ora.cssd’ on ‘server001a’
CRS-2672: Attempting to start ‘ora.diskmon’ on ‘server001a’
CRS-2676: Start of ‘ora.diskmon’ on ‘server001a’ succeeded
CRS-2676: Start of ‘ora.cssd’ on ‘server001a’ succeeded
CRS-2672: Attempting to start ‘ora.ctssd’ on ‘server001a’
CRS-2676: Start of ‘ora.ctssd’ on ‘server001a’ succeeded
CRS-2672: Attempting to start ‘ora.crsd’ on ‘server001a’
CRS-2676: Start of ‘ora.crsd’ on ‘server001a’ succeeded
CRS-2672: Attempting to start ‘ora.evmd’ on ‘server001a’
CRS-2676: Start of ‘ora.evmd’ on ‘server001a’ succeeded
clscfg: EXISTING configuration version 3 detected.
clscfg: version 3 is 10G Release 2.
Successfully accumulated necessary OCR keys.
Creating OCR keys for user ‘root’, privgrp ‘root’..
Operation successful.

server001a 2010/08/07 11:12:45 /u01/crs/oracle/product/11.2.0/grid/cdata/server001a/backup_20100807_111245.olr
Preparing packages for installation…
cvuqdisk-1.0.7-1
Configure Oracle Grid Infrastructure for a Cluster … succeeded
Updating inventory properties for clusterware
Starting Oracle Universal Installer…

Checking swap space: must be greater than 500 MB. Actual 7997 MB Passed
The inventory pointer is located at /etc/oraInst.loc
The inventory is located at /u01/app/oracle/oraInventory
‘UpdateNodeList’ was successful.
Starting Oracle Universal Installer…

Checking swap space: must be greater than 500 MB. Actual 7997 MB Passed
The inventory pointer is located at /etc/oraInst.loc
The inventory is located at /u01/app/oracle/oraInventory
‘UpdateNodeList’ was successful.
[root@server001a ~]#

25.- After executing rootupgrade.sh, Oracle proceeds to upgrade the ASM (if we left the tick on the previous step):

26.- We had the following error while upgrading ASM:

This is what I could see from the logfile:
INFO: Found associated job
INFO: Starting 'Automatic Storage Management Configuration Assistant'
INFO: Starting 'Automatic Storage Management Configuration Assistant'
INFO: Executing ASMCA
INFO: Command /u01/crs/oracle/product/11.2.0/grid/bin/asmca -silent -upgradeASM -oui_internal
INFO: ... GenericInternalPlugIn.handleProcess() entered.
INFO: ... GenericInternalPlugIn: getting configAssistantParmas.
INFO: ... GenericInternalPlugIn: checking secretArguments.
INFO: ... GenericInternalPlugIn: starting read loop.
INFO: Read: ASMSNMP_PASSWORD_PROMPT
INFO: Processing: ASMSNMP_PASSWORD_PROMPT for argument tag -asmMonitorPassword
INFO: End of argument passing to stdin
INFO: Read:
INFO: Read: Failed to stop databases currently using ASM.
INFO: Read:
INFO: Completed Plugin named: Automatic Storage Management Configuration Assistant

This is because we have configured Clusterware to start databases automatically, and the rootupgrade.sh stops and starts CRS.

The solution in our case was to stop databases and ASM manually using srvctl.
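
In practice that meant repeating the stop commands used before the upgrade and then retrying the ASM upgrade step (database name elided, node names as earlier):

srvctl stop database -d <db_name>
srvctl stop asm -n server001a
srvctl stop asm -n server001b
srvctl stop asm -n server001c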

After fixing the problem with ASM, we had an error with OCR:

INFO: ERROR:
INFO: PRVF-4178 : Check for OCR location "/dev/mapper/voting-ocrp6" failed on the following nodes:
INFO: server001a:Owner of file "/dev/mapper/voting-ocrp6" did not match the expected value. [Expected = "root" ; Found = "oracle"];Permissions of file "/dev/mapper/voting-ocrp6" did not match the expected value. [Expected = "0640" ; Found = "0660"]
INFO: server001c:Owner of file "/dev/mapper/voting-ocrp6" did not match the expected value. [Expected = "root" ; Found = "oracle"];Permissions of file "/dev/mapper/voting-ocrp6" did not match the expected value. [Expected = "0640" ; Found = "0660"]
INFO: server001b:Owner of file "/dev/mapper/voting-ocrp6" did not match the expected value. [Expected = "root" ; Found = "oracle"];Permissions of file "/dev/mapper/voting-ocrp6" did not match the expected value. [Expected = "0640" ; Found = "0660"]
INFO: Checking OCR location "/dev/mapper/voting-ocrp5"...
INFO: ERROR:

There is a solution provided in Metalink: Solution as per PRVF-4178 Reported from CLUVFY [ID 1058576.1]:

The group must be one of the groups assigned for ASM
chown root:oinstall /dev/mapper/voting-ocrp6
chown root:oinstall /dev/mapper/voting-ocrp5
chmod 0660 /dev/mapper/voting-ocrp6
chmod 0660 /dev/mapper/voting-ocrp5

but then we started to see a different error:

INFO: PRVF-4178 : Check for OCR location "/dev/mapper/voting-ocrp6" failed on the following nodes:
INFO: server001a:Permissions of file "/dev/mapper/voting-ocrp6" did not match the expected value. [Expected = "0640" ; Found = "0660"]
INFO: server001c:Permissions of file "/dev/mapper/voting-ocrp6" did not match the expected value. [Expected = "0640" ; Found = "0660"]
INFO: server001b:Permissions of file "/dev/mapper/voting-ocrp6" did not match the expected value. [Expected = "0640" ; Found = "0660"]
INFO: Checking OCR location "/dev/mapper/voting-ocrp5"...
INFO: ERROR:
INFO: PRVF-4178 : Check for OCR location "/dev/mapper/voting-ocrp5" failed on the following nodes:
INFO: server001a:Permissions of file "/dev/mapper/voting-ocrp5" did not match the expected value. [Expected = "0640" ; Found = "0660"]
INFO: server001c:Permissions of file "/dev/mapper/voting-ocrp5" did not match the expected value. [Expected = "0640" ; Found = "0660"]
INFO: server001b:Permissions of file "/dev/mapper/voting-ocrp5" did not match the expected value. [Expected = "0640" ; Found = "0660"]

So I decided to reach a compromise and leave it with 0660, as we will be moving it to ASM.

Click "Skip" to move on.

Applying patches in Grid Infrastructure has changed completely, and some security has been implemented (you now have to unlock and lock the environment).

I will post a new entry on how I did patch the environment.

As always, comments are welcome.

Move datafile to a different ASM diskgroup using RMAN 23 July 2010

Posted by David Alejo Marcos in ASM, Oracle 11.2, RMAN.

Yesterday we received an email from one of the developers with the following error:

ORA-00376: file 68 cannot be read at this time
ORA-01110: data file 68: ‘+DATA/xxx/datafile/yyy.366.724161769’

This error came to me for investigation and a fix as soon as possible as the database is critical for testing.

Earlier during the day, we did a SnapClone from a local standby running on the same server, but this should not have any impact on this other database.

As we are using SAN storage, I asked the SAN Manager and the SA to see if there was a chance something might have gone wrong.

They said no.

The problem:

ORA-00376: file 68 cannot be read at this time
ORA-01110: data file 68: ‘+DATA/xxx/datafile/yyy.366.724161769’

The status of the datafile in dba_data_files was AVAILABLE, but the online_status from v$recover_file was OFFLINE.
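
The checks were along these lines (file 68 being the datafile reported in the error):

SQL> select file_name, status from dba_data_files where file_id = 68;

SQL> select file#, online_status, error from v$recover_file where file# = 68;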

The solution:

The first thing I did was to check the alert.log file for any anomaly. I found the following entry:

Sat Jul 10 19:55:58 2010
ORA-1683: unable to extend index yyyy.FC_CDS_CNTR_CRV_ISCLCLTD_IDX partition FC_CDS_CNTR_CRV_FEB10 by 128 in tablespace yyy
......
ORA-1653: unable to extend table yyy.FC_JOB_STATUS by 128 in tablespace yyy
alter tablespace yyy add datafile '+DATA' size 10G AUTOEXTEND ON NEXT 1G MAXSIZE unlimited
Completed: alter tablespace yyy add datafile '+DATA' size 10G AUTOEXTEND ON NEXT 1G MAXSIZE unlimited
alter tablespace yyy add datafile size 10G AUTOEXTEND ON NEXT 1G MAXSIZE unlimited
Completed: alter tablespace yyy add datafile size 10G AUTOEXTEND ON NEXT 1G MAXSIZE unlimited
undo_tablespace = UNDOTBS1
Successfully onlined Undo Tablespace 1.
undo_tablespace = UNDOTBS1
Successfully onlined Undo Tablespace 1.
Mon Jul 12 11:55:37 2010
SUCCESS: diskgroup DATA was mounted
SUCCESS: diskgroup DATA was dismounted
SUCCESS: diskgroup DATA was mounted
SUCCESS: diskgroup DATA was dismounted
SUCCESS: diskgroup DATA was mounted
SUCCESS: diskgroup DATA was dismounted
SUCCESS: diskgroup DATA was mounted
SUCCESS: diskgroup DATA was dismounted
SUCCESS: diskgroup DATA was mounted
SUCCESS: diskgroup DATA was dismounted
Mon Jul 12 12:02:47 2010
alter tablespace yyy add datafile '+DATA' size 10G AUTOEXTEND ON NEXT 1G MAXSIZE unlimited
Mon Jul 12 12:02:47 2010
SUCCESS: diskgroup DATA was mounted
Mon Jul 12 12:03:49 2010
SUCCESS: diskgroup DATA was dismounted
Mon Jul 12 12:03:49 2010
SUCCESS: diskgroup DATA was mounted
SUCCESS: diskgroup DATA was dismounted
SUCCESS: diskgroup DATA was mounted
Mon Jul 12 12:03:49 2010
Completed: alter tablespace yyy add datafile '+DATA' size 10G AUTOEXTEND ON NEXT 1G MAXSIZE unlimited
Mon Jul 12 12:11:11 2010

And earlier that day:
Thu Jul 22 09:05:08 2010
Errors in file /u01/app/oracle/admin/whqaint/bdump/xxx_ckpt_522.trc:
ORA-01110: data file 68: '+DATA/xxx/datafile/yyy.366.724161769'
ORA-01115: IO error reading block from file 68 (block # 1)
ORA-15078: ASM diskgroup was forcibly dismounted
Thu Jul 22 09:05:23 2010
Errors in file /u01/app/oracle/admin/xxx/bdump/xxx_ckpt_522.trc:
ORA-01110: data file 68: '+DATA/xxx/datafile/yyy.366.724161769'
ORA-01115: IO error reading block from file 68 (block # 1)
ORA-15078: ASM diskgroup was forcibly dismounted
Thu Jul 22 09:05:38 2010
Errors in file /u01/app/oracle/admin/xxx/bdump/xxx_ckpt_522.trc:
ORA-01171: datafile 68 going offline due to error advancing checkpoint
ORA-01110: data file 68: '+DATA/xxx/datafile/xxx.366.724161769'
ORA-01115: IO error reading block from file 68 (block # 1)
ORA-15078: ASM diskgroup was forcibly dismounted

The key line is ORA-15078: ASM diskgroup was forcibly dismounted

As we are using ASM, the first step on our SnapClone procedure is to dismount several diskgroups, but technically, this operation should not impact the other database as both databases should use different diskgroups.

When the development DBA created the datafile to extend the tablespace, she did it on the wrong diskgroup (DATA instead of DATB). It is an easy mistake to make as the names are similar.

Now that we know the reason, it is fairly simple to fix it. The steps I followed are:
1.- Check the assumption is correct (datafile was created on a different diskgroup):
SQL> select file_name, status
from dba_data_files
where tablespace_name = 'xxx';
FILE_NAME STATUS
------------------------------------------------ ---------
+DATB/xxx/datafile/yyy.339.718314941 AVAILABLE
+DATB/xxx/datafile/yyy.358.718303181 AVAILABLE
+DATB/xxx/datafile/yyy.338.718314943 AVAILABLE
+DATB/xxx/datafile/yyy.309.718325181 AVAILABLE
+DATA/xxx/datafile/yyy.366.724161769 AVAILABLE
+DATB/xxx/datafile/yyy.281.724238729 AVAILABLE

2.- Change the status of the datafile to offline:
SQL> alter database datafile '+DATA/xxx/datafile/yyy.366.724161769' offline;

3.- Copy the datafile using RMAN:
[oracle@ssssssssssssss bdump]$ rman target /
Recovery Manager: Release 10.2.0.4.0 - Production on Thu Jul 22 15:52:52 2010
Copyright (c) 1982, 2007, Oracle. All rights reserved.
connected to target database: xxx (DBID=4255406167)
RMAN> copy datafile '+DATA/xxx/datafile/yyy.366.724161769' to '+DATB';

In the output of the command you can see the source and the destination of the datafile:
Starting backup at 22-JUL-10
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: sid=403 devtype=DISK
channel ORA_DISK_1: starting datafile copy
input datafile fno=00068 name=+DATA/xxx/datafile/yyy.366.724161769
output filename=+DATB/xxx/datafile/yyy.280.725039609
tag=TAG20100722T155329 recid=1 stamp=725039763
channel ORA_DISK_1: datafile copy complete, elapsed time: 00:02:35
Finished backup at 22-JUL-10

4.- Update the Oracle data dictionary with the new information:
SQL> alter database rename
file '+DATA/xxx/datafile/yyy.366.724161769'
to '+DATB/xxx/datafile/yyy.280.725039609';

Once the datafile has been renamed in the data dictionary, Oracle removes the old copy of the file in ASM.

5.- Rename the ASM datafile copy using RMAN.
[oracle@ssssssssssssss bdump]$ rman target /
Recovery Manager: Release 10.2.0.4.0 - Production on Thu Jul 22 15:58:31 2010
Copyright (c) 1982, 2007, Oracle. All rights reserved.
connected to target database: xxx (DBID=4255406167)
RMAN> switch datafile '+DATB/xxxx/datafile/yyy.280.725039609' to copy;

The output should be similar to:

using target database control file instead of recovery catalog
datafile 68 switched to datafile copy "+DATB/xxx/datafile/yyy.280.725039609"

6.- Perform media recovery of the datafile:
RMAN> recover datafile '+DATB/xxx/datafile/yyy.280.725039609';

7.- Make the datafile available by changing the status to online:
SQL> alter database datafile '+DATB/xxx/datafile/yyy.280.725039609' online;

8.- Check the status and location of the datafile:
SQL> select file_name, status from dba_data_files where tablespace_name = 'yyy';
FILE_NAME STATUS
------------------------------------------------ ---------
+DATB/xxx/datafile/yyy.339.718314941 AVAILABLE
+DATB/xxx/datafile/yyy.358.718303181 AVAILABLE
+DATB/xxx/datafile/yyy.338.718314943 AVAILABLE
+DATB/xxx/datafile/yyy.309.718325181 AVAILABLE
+DATB/xxx/datafile/yyy.280.725039609 AVAILABLE
+DATB/xxx/datafile/yyy.281.724238729 AVAILABLE

And job done.

As always, comments are welcome.

Creating a 3-node standby database. Part III 29 April 2010

Posted by David Alejo Marcos in ASM, RAC, Standby.

Time has arrived for us to test the results of our hard work by performing a switch between our single instance database to our three-node database.

I always recommend starting any test with a full backup, if the space and the time to do so do not jeopardize the deadlines.

Below is a list of some pre-checks worth doing:

1.- Check the full backup finished without errors.

2.- Make sure the init.ora file is the same on all 3 nodes (or pointing to the same spfile).

3.- The password file is correct (or recreate it) on all 3 nodes.

4.- Check the tnsnames.ora and listener.ora files are correct and the same on all 3 nodes.

5.- Services are correct and running.

6.- Perform a tnsping to and from everywhere to test connectivity and latency.

OK, so we are ready for the switchover… let's do it:

1.- check the status of primary and standby databases:

SELECT switchover_status, database_role, open_mode FROM v$database;

2.- Check the standby database is in sync (I normally perform a couple of log switches and check they are being applied).

alter system switch logfile;
alter system switch logfile;

Then you can check the alert.log, v$log_history, v$archived_log, etc.
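
For example, on the standby side something like this (a sketch) gives the last sequence applied per thread, which can be compared against the latest sequence on the primary:

SQL> select thread#, max(sequence#) last_applied
       from v$archived_log
      where applied = 'YES'
      group by thread#;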

3.- on Primary:

Disable job queue before the switchover on Primary Database (only in memory):

alter system set job_queue_processes=0;

4.- On primary:

Check for any active sessions apart from your own:

SELECT SID, PROCESS, PROGRAM
FROM GV$SESSION
WHERE TYPE = 'USER'
AND SID <> (SELECT DISTINCT SID FROM GV$MYSTAT);

5.- On primary

Perform switchover:

alter database commit to switchover to physical standby with session shutdown;
shutdown immediate;
startup mount;

On the alert.log of the new primary database you should see something like:

Sat Apr 24 12:13:17 2010
Media Recovery Log +EVB_RECOVERY/bvmprodsh/archivelog/2010_04_24/thread_1_seq_28879.314.717163985
Identified End-Of-Redo for thread 1 sequence 28879
Sat Apr 24 12:13:17 2010
Media Recovery End-Of-Redo indicator encountered
Sat Apr 24 12:13:17 2010
Media Recovery Applied until change 202225032180
Sat Apr 24 12:13:17 2010
MRP0: Media Recovery Complete: End-Of-REDO (bvmprod3)
Resetting standby activation ID 3970545885 (0xeca9b8dd)

6.- On all standby databases, check the status:

SELECT switchover_status, database_role, open_mode FROM v$database;

7.- Complete the switchover on the NEW Primary Database:

alter database commit to switchover to primary with session shutdown;
srvctl stop database -d bvmprodsh
srvctl start database -d bvmprodsh -i bvmprod1

8.- Enabling Block Change Tracking is recommended if you are using incremental backups:

alter database enable block change tracking using '+EVB_RECOVERY';

9.- On all standby databases (including the old production database), enable recovery:

alter database recover managed standby database disconnect;

10.- Perform several log switches and verify they are being shipped to and applied on the standby databases:

alter system switch logfile;
alter system switch logfile;

Now you are on a cluster database… but there is some work to be done:

1.- Create undo tablespaces for all instances (this only has to be done the first time you open the RAC database):

create undo tablespace undotbs2 datafile size 5g;
create undo tablespace undotbs3 datafile size 5g;

alter system set undo_tablespace='undotbs1' scope=spfile sid='bvmprod1';
alter system set undo_tablespace='undotbs2' scope=spfile sid='bvmprod2';

alter system set undo_tablespace='undotbs3' scope=spfile sid='bvmprod3';

2.- Add logfiles to threads 2 and 3:

ALTER DATABASE ADD LOGFILE THREAD 2
GROUP 21 '+EVB_REDO1' SIZE 250M,
GROUP 22 '+EVB_REDO1' SIZE 250M,
GROUP 23 '+EVB_REDO1' SIZE 250M,
GROUP 24 '+EVB_REDO1' SIZE 250M,
GROUP 25 '+EVB_REDO1' SIZE 250M;

ALTER DATABASE ADD LOGFILE THREAD 3
GROUP 31 '+EVB_REDO1' SIZE 250M,
GROUP 32 '+EVB_REDO1' SIZE 250M,
GROUP 33 '+EVB_REDO1' SIZE 250M,
GROUP 34 '+EVB_REDO1' SIZE 250M,
GROUP 35 '+EVB_REDO1' SIZE 250M;

alter system set thread=1 scope=spfile sid='bvmprod1';
alter system set thread=2 scope=spfile sid='bvmprod2';
alter system set thread=3 scope=spfile sid='bvmprod3';

3.- Define local_listeners (those entries must exist in the listener.ora on all 3 nodes):

alter system set local_listener='LISTENER_BVMPROD1' scope=spfile sid='bvmprod1';
alter system set local_listener='LISTENER_BVMPROD2' scope=spfile sid='bvmprod2';
alter system set local_listener='LISTENER_BVMPROD3' scope=spfile sid='bvmprod3';

4.- Define remote_listener (the entry must exist in the listener.ora on all 3 nodes):

alter system set remote_listener='listeners_bvmprod' scope=spfile sid='*';

5.- Some RAC parameters…

alter system set instance_number=1 scope=spfile sid='bvmprod1';
alter system set instance_number=2 scope=spfile sid='bvmprod2';
alter system set instance_number=3 scope=spfile sid='bvmprod3';

alter system set cluster_database=true scope=spfile sid='*';
alter system set cluster_database_instances=3 scope=spfile sid='bvmprod3';

alter database enable instance 'bvmprod2';
alter database enable instance 'bvmprod3';

alter database enable public thread 2;
alter database enable public thread 3;

6.- Stop the instance and start it as RAC:

srvctl stop database -d bvmprodsh -i bvmprod1
srvctl start database -d bvmprodsh
srvctl start service -d bvmprodsh
srvctl status database -d bvmprodsh
srvctl status service -d bvmprodsh

That's pretty much it; the only bit left is to test connectivity and keep monitoring that archivelogs (or redo logs) are being transferred to all standby databases and applied.

As always, comments are welcome.

Creating a 3-node standby database. Part II 15 March 2010

Posted by David Alejo Marcos in ASM, RAC, RMAN, Standby.

In Part I we installed the software and restored the database on a single node of a 3-node RAC Clusterware.

I will explain how I configured the database on that CRS and kept it in sync with the current production database. We have to bear in mind that the objective of this exercise is to move from the current production (single node) to an existing 3-node RAC using ASM.

Now we have a database that is an exact copy of our production database. We need to configure it as standby so we can keep it up to date until the migration.

1.- The first step is to create a copy of the current controlfile as standby:

$ rman target /
RMAN> backup current controlfile for standby format '/tmp/stdbyctl.bkp';

Once we have the controlfile for standby we copy it to the destination box using scp.

2.- We stop the database with shutdown immediate. It is important to check that LOG_FILE_NAME_CONVERT is defined using the correct directories to avoid problems in the following steps.

3.- Start the standby database as nomount.

4.- Restore the controlfile for standby:

$ rman nocatalog target /
RMAN> restore standby controlfile from '/tmp/stdbyctl.bkp';

5.- Mount standby database.

RMAN> alter database mount;

6.- Catalog datafiles of the standby database:

The following command will list all files and ask if you want them to be cataloged. It is important to review the list and only say YES if the list is correct. In our ASM configuration, the path for this database was '+EVB_DATA/BVMPRODSH/DATAFILE/'.

RMAN> catalog start with '+EVB_DATA/BVMPRODSH/DATAFILE/';

7.- Once the command has finished, we have to “commit” those changes on the controlfile:

RMAN> switch database to copy;
RMAN> EXIT;

8.- If you were using Flashback, it is highly recommended to disable and re-enable it to avoid problems:

$ sqlplus / as sysdba
SQL> alter database flashback off;

Database altered.

SQL> alter database flashback on;

9.- Now we have to perform some housekeeping… Starting with the redo logs:

9.1.- Find out how many groups we have on the database and clear them:

SQL> select group# from v$log;

GROUP#
———-
1
2
3

SQL> alter database clear logfile group 1;

Database altered.

SQL> alter database clear logfile group 2;

Database altered.

SQL> alter database clear logfile group 3;

Database altered.

9.2.- Now we have to do the same for the standby redo logs:

SQL> select group# from v$standby_log;

GROUP#
———-
4
5
6

SQL> alter database clear logfile group 4;

Database altered.

SQL> alter database clear logfile group 5;

Database altered.

SQL> alter database clear logfile group 6;

Database altered.

OR

9.3.- Create standby redo logs if there are none present:

SQL> select group# from v$standby_log;

no rows selected

SQL> alter database add standby logfile group 4 size 250m;

Database altered.

SQL> alter database add standby logfile group 5 size 250m;

Database altered.

SQL> alter database add standby logfile group 6 size 250m;

Database altered.

9.4.- Create spfile in ASM.

This step is not a must, but it is a good idea to use spfiles due to the flexibility they provide:

SQL> create spfile='+EVB_DATA' from pfile;

SQL> shutdown immediate

Edit the init.ora file and paste in the location of the spfile. This can be found using asmcmd under the diskgroup +EVB_DATA/BVMPRODSH/parameterfile.

$ vi $ORACLE_HOME/dbs/init.ora
spfile='+EVB_DATA/BVMPRODSH/parameterfile/spfile.309.712071991'

SQL> startup mount;

10.- Configure the standby parameters, tnsnames.ora,…

10.1.- create entries on the TNSNames file to point to the current production on the standby server. Check you can connect as sys and you can tnsping to the standby server.

10.2.- create entries on the TNSNames file to point to the standby server on the current production server. Check you can connect as sys and you can tnsping to the production server.
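
For both 10.1 and 10.2, a typical tnsnames.ora entry looks like the sketch below (alias, host and service name are placeholders to adapt to each site):

BVMPROD_STBY =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = <standby_host>)(PORT = 1521))
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = <db_unique_name>)
    )
  )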

10.3.- modify log_archive_config to have the primary database and ALL standby databases on all databases:

SQL> alter system set log_archive_config='DG_CONFIG=(<primary>,<standby1>,<standby2>...)' scope=both sid = '*';

10.4.- configure log_archive_dest_x to point to the new standby database on the primary database. Check log_archive_dest_state_x is enabled:

SQL> alter system set log_archive_dest_3='SERVICE= LGWR ASYNC VALID_FOR=(ALL_LOGFILES,PRIMARY_ROLE) DB_UNIQUE_NAME=' scope=both sid = '*';

10.5.- configure log_archive_dest_x to point to the production database on the new standby database. Check log_archive_dest_state_x is enabled:

SQL> alter system set log_archive_dest_3='SERVICE=<db_unique_name> LGWR ASYNC VALID_FOR=(ALL_LOGFILES,PRIMARY_ROLE) DB_UNIQUE_NAME=<db_unique_name>' scope=both sid = '*';

10.6.- Configure FAL_CLIENT and FAL_SERVER for gap resolution for primary and standby databases:

SQL> alter system set fal_client=<db_unique_name> scope=both sid='*';
SQL> alter system set fal_server=<db_unique_name for primary and standby databases> scope=both sid='*';

11.- We are good to start the recovery process on this standby database:

SQL> alter database recover managed standby database disconnect from session;

Database altered.

SQL> exit;

12.- Check the primary and standby databases are working properly by switching a couple of logfiles and checking they are being transferred and applied on all standby databases:

On primary database:

SQL> alter system switch logfile;

Now we have to register the database on the CRS…

1.- Add database
srvctl add database -d bvmprodsh -o -n evbprod

2.- Add instances
srvctl add instance -d bvmprodsh -i bvmprod1 -n
srvctl add instance -d bvmprodsh -i bvmprod2 -n
srvctl add instance -d bvmprodsh -i bvmprod3 -n

3.- Modify instance to add ASM
srvctl modify instance -d bvmprodsh -i bvmprod1 -s +ASM1
srvctl modify instance -d bvmprodsh -i bvmprod2 -s +ASM2
srvctl modify instance -d bvmprodsh -i bvmprod3 -s +ASM3

4.- Create and start services:
$ srvctl add service -d bvmprodsh -s evbprod -r <primary_node1>,<primary_node2>,... -a <available_node1><available_node2>,...
$ srvctl start service -d bvmprodsh -s evbprod
$ srvctl status service -d bvmprodsh

On part III, I will explain how we tested the configuration by doing a switchover from the current production single instance database to the 3-node RAC standby database.

As always, comments are welcome.

Creating a 3-node standby database. Part I 10 March 2010

Posted by David Alejo Marcos in ASM, Enterprise Manager, RAC, Standby.

As I promised, I will be writing about how I created a 3-node standby database.

The goal is to move from a single-instance database to a 3-node RAC. The steps we are following, as the project is still ongoing, are:

1.- Create a 3-node standby database in the UK. This will be our primary database.

2.- Create a 3-node standby database in the US. This will be our standby database.

3.- Keep current standby up to date.

4.- Test switchover from current single instance production database to 3-node RAC in the UK, then to the 3-node RAC in the US, back to 3-node RAC in the UK and, finally, back to current primary.

5.- Do the switchover for real.

After the switchover, we will keep current production and current DR up to date for rollback.

In this post I will cover the first point. The installation of the software and configuration as standby in the US is quite similar to the configuration we followed for the UK.

I did not have to install CRS as our target servers have a 3-node RAC database currently running. We will be sharing those servers for 2 different databases.

The first step is to download the software from Oracle. Our target was 10.2.0.4.1 for Linux x86-64.

Steps to follow:

Installation of Software:

1.- Install Oracle 10.2.0.1 (p4679769_10201_Linux-x86-64.zip).

If you choose the Advanced Installation, the installer will identify the server as part of a CRS and will prompt you to install on all servers. My suggestion is to accept; it will save you time and headaches.

2.- Install Oracle 10.2.0.4 (p6079224_10204_Linux-x86-64.zip).

3.- Install Opatch 11, necessary for patches and PSU.

4.- Install  PSU 10.2.0.4.1 (p8576156_10204_Linux-x86-64.zip).

Please, bear in mind that the PSU will be installed in rolling mode (all nodes on the CRS).
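
A minimal sketch of steps 3 and 4 on each node (directory names are placeholders; always follow the PSU README for the exact syntax):

$ unzip <opatch_11_zip> -d $ORACLE_HOME      # replaces the OPatch directory in the Oracle Home
$ cd <psu_unzip_dir>/8576156                 # the unzipped patch directory, typically named after the patch number
$ $ORACLE_HOME/OPatch/opatch apply           # repeat node by node for a rolling installation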

Once we have the required software on all three nodes we need the storage:

ASM:

We use ASM and ASMLib as standard for our Oracle databases. Once we have checked that we have the required LUNs we need to add them to ASMLib:

ls -lrt /dev/mapper to check the names and sizes:

ls -lr /dev/mapper:
.......
brw-rw---- 1 root disk 253, 16 Nov 28 11:34 bvm_fra
brw-rw---- 1 root disk 253, 15 Nov 28 11:34 bvm_data
brw-rw---- 1 root disk 253, 17 Nov 28 11:34 bvm_ctl_log

For ASMLib we need to connect as root. The first step is to create the disks:

# cd /etc/init.d

# ./oracleasm createdisk DATA_BVM_1 /dev/mapper/bvm_data_1
Marking disk “DATA_BVM_1” as an ASM disk:                  [  OK  ]

# ./oracleasm createdisk DATA_BVM_2 /dev/mapper/bvm_data_2
Marking disk “DATA_BVM_2” as an ASM disk:                  [  OK  ]

# ./oracleasm createdisk DATA_BVM_3 /dev/mapper/bvm_data_3
Marking disk “DATA_BVM_3” as an ASM disk:                  [  OK  ]

# ./oracleasm createdisk FRA_BVM /dev/mapper/bvm_fra
Marking disk “FRA_BVM” as an ASM disk:                     [  OK  ]

# ./oracleasm createdisk LOGCTL_BVM_1 /dev/mapper/bvm_ctl_log_1
Marking disk “LOGCTL_BVM_1” as an ASM disk:                [  OK  ]

Then we check those disks are visible from ALL 3 nodes. I prefer to execute the following commands on all 3 nodes to avoid problems:

# ./oracleasm scandisks
# ./oracleasm listdisks

If you prefer, you can list the contents of /dev/oracleasm/disks.

Now we need to create the diskgroups.

For this we need to connect as sys on the ASM instance. We use external redundancy for our databases:

1.- Set the environment to ASM using oraenv and connect as sys.

2.- Create diskgroups:

SQL> create diskgroup DATA_BVM external redundancy disk 'ORCL:DATA_BVM_1','ORCL:DATA_BVM_2','ORCL:DATA_BVM_3';

Diskgroup created.

SQL> create diskgroup FRA_BVM external redundancy disk 'ORCL:FRA_BVM';

Diskgroup created.

SQL> create diskgroup LOGCTL_BVM external redundancy disk 'ORCL:LOGCTL_BVM_1';

Diskgroup created.

set linesize 180
col path form a50
select name, header_status, state, path from v$asm_disk;

NAME                       HEADER_STATU STATE    PATH
-------------------------- ------------ -------- --------------------------------------------------
..........
DATA_BVM                   MEMBER       NORMAL   ORCL:DATA_BVM
FRA_BVM                    MEMBER       NORMAL   ORCL:FRA_BVM
LOGCTL_BVM                 MEMBER       NORMAL   ORCL:LOGCTL_BVM

17 rows selected.

3.- Mount diskgroups on ALL ASM instances on the CRS.

SQL> alter diskgroup LOGCTL_BVM mount;

….

Now we can create the database.

There are different ways to create a database, DBCA, export-import, RMAN, etc.

The way I decided to create this standby database was using RMAN (duplicate target command); it is very straight forward.

the syntax I used is:

rman target sys/xxxx@prod
connect auxiliary /
run {
set until time "to_date('06-MAR-2010 08:00:00','DD-MON-YYYY HH24:MI:SS')";
ALLOCATE AUXILIARY CHANNEL c1 DEVICE TYPE disk;
ALLOCATE AUXILIARY CHANNEL c2 DEVICE TYPE disk;
duplicate target database to preprod nofilenamecheck;
}

You will need to have an entry in the tnsnames.ora to connect to production. The full backup was copied from production to one of the nodes.

On Part II, I will start with the dataguard configuration and the DR test.

OUI-10150: Datafile storage location for RAC should be on shared file system. 5 March 2010

Posted by David Alejo Marcos in ASM, RAC.

I spent the last two weekends installing and configuring two new DRs for one of our main databases. I will cover this job in a different post, but the interesting part was solving a problem with the installation of the software itself.

Let's see the problem and how to solve it.

The problem:

We have a three-node cluster already running, and we want to install a brand new Oracle Home for a new 3-node database sharing the same CRS.

So far so good. We have the 10.2.0.1 binaries for Linux x86-64, the patch to upgrade to 10.2.0.4, the patch for OPatch 11 and PSU 1.

We connect to one of the nodes (a/b/c) and execute ./runInstaller.

The Oracle installer recognises the cluster and asks us if we want to install it on all 3 nodes… Sure, it will save us time… and then:

For some reason, the storage is not properly detected… Even more, why do we need to install the software on a shared file system?

I did not find much information regarding this problem, and the extract of the install.log does not say much:

localNode = uswhdrorc001c
INFO: Query Returned: false
INFO: *** Error Dialog: OUI-10150:Error: The datafile storage location for Oracle Real Application Clusters should be on a shared file system. in component Oracle Database 10g 10.2.0.1.0 . Installation cannot continue for this component. ***
INFO: User Selected: Stop installation of all products.

INFO: The datafile storage location for Oracle Real Application Clusters should be on a shared file system.
INFO: Setting variable ‘PREREQ_CONFIG_LOCATION’ to ”. Received the value from variable association.
INFO: This installation is being performed using response file /u01/software/database/install/response/ee.rsp.

We did not learn much about this message.

One of the guys suggested unsetting all Oracle variables in the environment, from ORACLE_SID and ORACLE_HOME to ORA_CRS_HOME.

It helped on the first installation (London data centre), but not on the US installation (New York data centre).

So, what was the difference?

The Solution:

Simple, very simple, like most riddles once you know the answer.

The way we tried to install the software was incorrect. The very first page you are presented with is:

Looks familiar, right? If you select the option “Basic Install” (with or without database) it will fail, regardless of the version you are trying to install (yes, including “Enterprise Edition”).

You have to choose "Advanced Installation" to deploy the software on more than one node, regardless of whether the session has ORACLE_SID, ORACLE_HOME, ORA_CRS_HOME or any other Oracle variable set:

As you can see in the figure above, for that session I had ORACLE_HOME and ORACLE_SID defined (in fact, the whole environment, including the library path) and it worked, simply by choosing "Advanced Installation".

Lesson learned? Working too many weekends in a row can and will take its toll, and you may make very simple mistakes.

As always, comments are welcome.