
Getting into Exadata. 25 June 2011

Posted by David Alejo Marcos in Exadata.

It has been some time since I wrote on my blog, mostly due to work commitments.

There have also been some important changes in my life, among them starting my own company, David Marcos Consulting Ltd.

I have also been lucky enough to start working with Oracle Exadata.

I will start posting very soon about my impressions, findings, etc. on Oracle Exadata; I must admit I am very excited to be able to work on Oracle's latest product.

As always, comments are welcome.

Testing your Oracle Standby configuration using Oracle Broker. 4 April 2011

Posted by David Alejo Marcos in Dataguard Broker, Oracle 11.2, Standby.

Several weeks ago I wrote a post regarding testing your standby database ( https://davidalejomarcos.wordpress.com/2011/03/01/how-to-test-your-dataguard-configuration-standby-database/ ). The database I tested did not have Oracle Broker configured.

Today I will show you how I performed a switchover with Oracle Broker configured.

The problem:

Perform a switchover using Oracle Broker.

The solution:

It proved to be quite simple and trouble-free.

Below are the steps I followed:

Note.- Primary is my production primary database throughout the exercise, regardless of the role it is running at any given time. PRMRY marks commands to be run on the database currently holding the primary role.

Note.- Standby is my production standby database throughout the exercise, regardless of the role it is running at any given time. STNDBY marks commands to be run on the database currently holding the standby role.

1.- Check primary and standby database are in sync:

PRMRY> select sequence# seq,thread# thread  from v$log where status = 'CURRENT' order by thread#;

SEQ     THREAD
---------- ----------
33388        1

PRMRY> alter system switch logfile;

System altered.

PRMRY> /

System altered.

PRMRY> /

System altered.

PRMRY> select sequence# seq,thread# thread  from v$log where status = 'CURRENT' order by thread#;

SEQ     THREAD
---------- ----------
33391        1

PRMRY> exit

STNDBY> select distinct max( sequence#) over (partition by thread# order by thread#) seq, thread# thread from gv$log_history;

SEQ     THREAD
---------- ----------
33390        1

STNDBY> exit

2.- Connect to Oracle Broker from the server where PRMRY is running and:

[oracle@sssss ~]$ dgmgrl
DGMGRL for Linux: Version 11.2.0.1.0 - 64bit Production

Copyright (c) 2000, 2009, Oracle. All rights reserved.

Welcome to DGMGRL, type "help" for information.
DGMGRL> connect /
Connected.

2.1.- Check the configuration for errors:

DGMGRL> show configuration

Configuration - ssss_DG

Protection Mode: MaxPerformance
Databases:
PRIMARY    - Primary database
STANDBY    - Physical standby database

Fast-Start Failover: DISABLED

Configuration Status:
SUCCESS

2.2.- Check the primary is in the "TRANSPORT-ON" state:


DGMGRL> show database primary

Database - primary

Enterprise Manager Name: sss_primary
Role:                    PRIMARY
Intended State:          TRANSPORT-ON
Instance(s):
sss

Database Status:
SUCCESS

2.3.- Check the standby database is in the "APPLY-ON" state:

DGMGRL> show database standby

Database - standby

Enterprise Manager Name: sss_standby
Role:                    PHYSICAL STANDBY
Intended State:          APPLY-ON
Transport Lag:           0 seconds
Apply Lag:               0 seconds
Real Time Query:         OFF
Instance(s):
sss

Database Status:
SUCCESS

3.- Perform the switchover using Oracle Broker:

DGMGRL> switchover to standby
Performing switchover NOW, please wait...
New primary database "standby" is opening...
Operation requires shutdown of instance "sss" on database "primary"
Shutting down instance "sss"...
ORA-01031: insufficient privileges

Warning: You are no longer connected to ORACLE.

Please complete the following steps to finish switchover:
shut down instance "sss" of database "primary"
start up and mount instance "sss" of database "primary"

DGMGRL>

4.- Restart the new standby database and register it with the listener:

[oracle@ssss ~]$ sqlplus '/as sysdba'

SQL*Plus: Release 11.2.0.1.0 Production on Sat Apr 2 09:02:20 2011

Copyright (c) 1982, 2009, Oracle.  All rights reserved.

Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production
With the Partitioning option

SQL> select open_mode from v$database;

OPEN_MODE
--------------------
MOUNTED

SQL> shutdown immediate

ORA-01109: database not open
Database dismounted.
ORACLE instance shut down.

SQL> startup mount

Total System Global Area 1.2827E+10 bytes
Fixed Size            2225336 bytes
Variable Size         1207962440 bytes
Database Buffers     1.1610E+10 bytes
Redo Buffers            7348224 bytes
Database mounted.

SQL> alter system register;

System altered.

SQL> exit

5.- Check log files are being sent and applied on the new standby database.
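One simple way to do this (a suggested check reusing the queries from step 1, not something from the original post; the PRMRY/STNDBY prompts follow the role convention from the notes above, so they now point at the new primary and new standby) is to switch a couple of log files on the new primary and confirm the sequence reaches the new standby:

PRMRY> alter system switch logfile;

PRMRY> select sequence# seq,thread# thread from v$log where status = 'CURRENT' order by thread#;

STNDBY> select distinct max( sequence#) over (partition by thread# order by thread#) seq, thread# thread from gv$log_history;

Alternatively, "show configuration" in DGMGRL should report SUCCESS once redo transport and apply are healthy again.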

As always, comments are welcome.

-------------------------

Appendix:

Below is the relevant section of the alert.log file of the primary database during the switchover to standby:

Sat Apr 02 08:58:37 2011
ALTER DATABASE COMMIT TO SWITCHOVER TO PHYSICAL STANDBY WITH SESSION SHUTDOWN
ALTER DATABASE COMMIT TO SWITCHOVER TO PHYSICAL STANDBY  [Process Id: 8432] (sss)
Sat Apr 02 08:58:37 2011
Thread 1 cannot allocate new log, sequence 33392
Private strand flush not complete
Current log# 3 seq# 33391 mem# 0: /u03/oradata/sss/redo03.rdo
Thread 1 advanced to log sequence 33392 (LGWR switch)
Current log# 1 seq# 33392 mem# 0: /u03/oradata/sss/redo01.rdo
Waiting for all non-current ORLs to be archived...
Waiting for the ORL for thread 1 sequence 33391 to be archived...
Sat Apr 02 08:58:44 2011
Archived Log entry 68782 added for thread 1 sequence 33391 ID 0x2a7ee921 dest 1:
Sat Apr 02 08:58:51 2011
ORL for thread 1 sequence 33391 has been archived...
All non-current ORLs have been archived.
Waiting for all FAL entries to be archived...
All FAL entries have been archived.
Waiting for dest_id 2 to become synchronized...
Sat Apr 02 08:59:01 2011
Active, synchronized Physical Standby  switchover target has been identified
Sat Apr 02 08:59:01 2011
Thread 1 cannot allocate new log, sequence 33393
Private strand flush not complete
Current log# 1 seq# 33392 mem# 0: /u03/oradata/sss/redo01.rdo
Thread 1 advanced to log sequence 33393 (LGWR switch)
Current log# 2 seq# 33393 mem# 0: /u03/oradata/sss/redo02.rdo
ARCH: Standby redo logfile selected for thread 1 sequence 33392 for destination LOG_ARCHIVE_DEST_2
Archived Log entry 68783 added for thread 1 sequence 33392 ID 0x2a7ee921 dest 1:
Sat Apr 02 08:59:01 2011
Stopping background process CJQ0
Sat Apr 02 08:59:01 2011
SMON: disabling tx recovery
Stopping background process QMNC
CLOSE: killing server sessions.
Sat Apr 02 08:59:14 2011
CLOSE: all sessions shutdown successfully.
Sat Apr 02 08:59:14 2011
SMON: disabling cache recovery
Sat Apr 02 08:59:15 2011
Shutting down archive processes
Archiving is disabled
Sat Apr 02 08:59:15 2011
ARCH shutting down
Sat Apr 02 08:59:15 2011
ARCH shutting down
ARC9: Archival stopped
ARC7: Archival stopped
Sat Apr 02 08:59:15 2011
ARCH shutting down
ARC5: Archival stopped
Sat Apr 02 08:59:15 2011
ARCH shutting down
Sat Apr 02 08:59:15 2011
ARCH shutting down
ARC3: Archival stopped
ARC0: Archival stopped
Sat Apr 02 08:59:15 2011
ARCH shutting down
Sat Apr 02 08:59:15 2011
ARCH shutting down
ARC4: Archival stopped
ARC6: Archival stopped
Sat Apr 02 08:59:15 2011
ARCH shutting down
ARC2: Archival stopped
Sat Apr 02 08:59:15 2011
ARCH shutting down
ARC8: Archival stopped
Sat Apr 02 08:59:15 2011
ARCH shutting down
ARC1: Archival stopped
Thread 1 closed at log sequence 33393
Successful close of redo thread 1
ARCH: Noswitch archival of thread 1, sequence 33393
ARCH: End-Of-Redo Branch archival of thread 1 sequence 33393
Archived Log entry 68785 added for thread 1 sequence 33393 ID 0x2a7ee921 dest 1:
ARCH: Archiving is disabled due to current logfile archival
Primary will check for some target standby to have received all redo
Final check for a synchronized target standby. Check will be made once.
LOG_ARCHIVE_DEST_2 is a potential Physical Standby  switchover target
Active, synchronized target has been identified
Target has also applied all redo
Backup controlfile written to trace file /opt/oracle/diag/rdbms/primary/sss/trace/sss_rsm0_8432.trc
Clearing standby activation ID 712960289 (0x2a7ee921)
The primary database controlfile was created using the
'MAXLOGFILES 12' clause.
There is space for up to 9 standby redo logfiles
Use the following SQL commands on the standby database to create
standby redo logfiles that match the primary database:
ALTER DATABASE ADD STANDBY LOGFILE 'srl1.f' SIZE 262144000;
ALTER DATABASE ADD STANDBY LOGFILE 'srl2.f' SIZE 262144000;
ALTER DATABASE ADD STANDBY LOGFILE 'srl3.f' SIZE 262144000;
ALTER DATABASE ADD STANDBY LOGFILE 'srl4.f' SIZE 262144000;
Archivelog for thread 1 sequence 33393 required for standby recovery
Switchover: Primary controlfile converted to standby controlfile succesfully.
Sat Apr 02 08:59:19 2011
MRP0 started with pid=18, OS id=10456
MRP0: Background Managed Standby Recovery process started (sss)
Serial Media Recovery started
Managed Standby Recovery not using Real Time Apply
Online logfile pre-clearing operation disabled by switchover
Media Recovery Log /u01/oradata/sss/arch/log1_33393_708427666.arc
Identified End-Of-Redo for thread 1 sequence 33393
Resetting standby activation ID 0 (0x0)
Media Recovery End-Of-Redo indicator encountered
Media Recovery Applied until change 241462639865
MRP0: Media Recovery Complete: End-Of-REDO (sss)
MRP0: Background Media Recovery process shutdown (sss)
Sat Apr 02 08:59:25 2011
Switchover: Complete - Database shutdown required (sss)
Completed: ALTER DATABASE COMMIT TO SWITCHOVER TO PHYSICAL STANDBY WITH SESSION SHUTDOWN
Using STANDBY_ARCHIVE_DEST parameter default value as /u01/oradata/sss/arch
ALTER SYSTEM SET log_archive_dest_2='' SCOPE=BOTH;
Using STANDBY_ARCHIVE_DEST parameter default value as /u01/oradata/sss/arch
ALTER SYSTEM SET log_archive_dest_state_2='ENABLE' SCOPE=BOTH;
Sat Apr 02 08:59:29 2011
RFS[1]: Assigned to RFS process 11886
RFS[1]: Identified database type as 'physical standby': Client is ARCH pid 5025
RFS[1]: Opened log for thread 1 sequence 33394 dbid 666702615 branch 708427666
Sat Apr 02 08:59:29 2011
RFS[2]: Assigned to RFS process 11894
RFS[2]: Identified database type as 'physical standby': Client is ARCH pid 5021
Sat Apr 02 08:59:29 2011
RFS[3]: Assigned to RFS process 11898
RFS[3]: Identified database type as 'physical standby': Client is ARCH pid 5029
RFS[3]: Opened log for thread 1 sequence 33395 dbid 666702615 branch 708427666
Archived Log entry 68787 added for thread 1 sequence 33394 rlc 708427666 ID 0x2cbe3767 dest 2:
Archived Log entry 68788 added for thread 1 sequence 33395 rlc 708427666 ID 0x2cbe3767 dest 2:
Sat Apr 02 08:59:33 2011
RFS[4]: Assigned to RFS process 12348
RFS[4]: Identified database type as 'physical standby': Client is ARCH pid 5037
RFS[4]: Opened log for thread 1 sequence 33396 dbid 666702615 branch 708427666
Archived Log entry 68789 added for thread 1 sequence 33396 rlc 708427666 ID 0x2cbe3767 dest 2:
Sat Apr 02 08:59:33 2011
RFS[5]: Assigned to RFS process 12342
RFS[5]: Identified database type as 'physical standby': Client is LGWR ASYNC pid 24051
Primary database is in MAXIMUM PERFORMANCE mode
RFS[5]: Opened log for thread 1 sequence 33397 dbid 666702615 branch 708427666
Archived Log entry 68790 added for thread 1 sequence 33397 rlc 708427666 ID 0x2cbe3767 dest 2:
RFS[5]: Opened log for thread 1 sequence 33398 dbid 666702615 branch 708427666
Sat Apr 02 09:00:17 2011
RFS[6]: Assigned to RFS process 18674
RFS[6]: Identified database type as 'physical standby': Client is ARCH pid 5021
Sat Apr 02 09:00:23 2011
Archived Log entry 68791 added for thread 1 sequence 33398 rlc 708427666 ID 0x2cbe3767 dest 2:
RFS[5]: Opened log for thread 1 sequence 33399 dbid 666702615 branch 708427666

and here is the section of the alert.log file of the production standby database during the switchover:

Sat Apr 02 08:58:41 2011
Media Recovery Waiting for thread 1 sequence 33392
Sat Apr 02 08:58:41 2011
Archived Log entry 34084 added for thread 1 sequence 33391 ID 0x2a7ee921 dest 1:
Sat Apr 02 08:58:51 2011
RFS[6]: Assigned to RFS process 23317
RFS[6]: Identified database type as 'physical standby': Client is ARCH pid 8452
Sat Apr 02 08:59:01 2011
RFS[7]: Assigned to RFS process 23326
RFS[7]: Identified database type as 'physical standby': Client is Foreground pid 8432
RFS[7]: Selected log 6 for thread 1 sequence 33392 dbid 666702615 branch 708427666
Sat Apr 02 08:59:01 2011
Archived Log entry 34085 added for thread 1 sequence 33392 ID 0x2a7ee921 dest 1:
Sat Apr 02 08:59:02 2011
Media Recovery Log /u01/oradata/sss/arch/log1_33392_708427666.arc
Media Recovery Waiting for thread 1 sequence 33393
Sat Apr 02 08:59:11 2011
RFS[8]: Assigned to RFS process 23340
RFS[8]: Identified database type as 'physical standby': Client is ARCH pid 8452
Sat Apr 02 08:59:18 2011
RFS[9]: Assigned to RFS process 24039
RFS[9]: Identified database type as 'physical standby': Client is Foreground pid 8432
RFS[9]: Opened log for thread 1 sequence 33393 dbid 666702615 branch 708427666
Archived Log entry 34086 added for thread 1 sequence 33393 rlc 708427666 ID 0x2a7ee921 dest 2:
Sat Apr 02 08:59:18 2011
Media Recovery Log /u01/oradata/sss/arch/log1_33393_708427666.arc
Identified End-Of-Redo for thread 1 sequence 33393
Resetting standby activation ID 712960289 (0x2a7ee921)
Media Recovery End-Of-Redo indicator encountered
Media Recovery Continuing
Resetting standby activation ID 712960289 (0x2a7ee921)
Media Recovery Waiting for thread 1 sequence 33394
Sat Apr 02 08:59:19 2011
RFS[10]: Assigned to RFS process 24045
RFS[10]: Identified database type as 'physical standby': Client is Foreground pid 8432
Sat Apr 02 08:59:25 2011
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE CANCEL
MRP0: Background Media Recovery cancelled with status 16037
Errors in file /opt/oracle/diag/rdbms/standby/sss/trace/sss_pr00_6126.trc:
ORA-16037: user requested cancel of managed recovery operation
Managed Standby Recovery not using Real Time Apply
Recovery interrupted!
Waiting for MRP0 pid 6122 to terminate
Errors in file /opt/oracle/diag/rdbms/standby/sss/trace/sss_pr00_6126.trc:
ORA-16037: user requested cancel of managed recovery operation
Sat Apr 02 08:59:26 2011
MRP0: Background Media Recovery process shutdown (sss)
Managed Standby Recovery Canceled (sss)
Completed: ALTER DATABASE RECOVER MANAGED STANDBY DATABASE CANCEL
ALTER DATABASE COMMIT TO SWITCHOVER TO PRIMARY WAIT WITH SESSION SHUTDOWN
ALTER DATABASE SWITCHOVER TO PRIMARY (sss)
Maximum wait for role transition is 15 minutes.
Backup controlfile written to trace file /opt/oracle/diag/rdbms/standby/sss/trace/sss_rsm0_5074.trc
SwitchOver after complete recovery through change 241462639865
Online log /u03/oradata/sss/redo01.rdo: Thread 1 Group 1 was previously cleared
Online log /u03/oradata/sss/redo02.rdo: Thread 1 Group 2 was previously cleared
Online log /u03/oradata/sss/redo03.rdo: Thread 1 Group 3 was previously cleared
Standby became primary SCN: 241462639863
Switchover: Complete - Database mounted as primary
Completed: ALTER DATABASE COMMIT TO SWITCHOVER TO PRIMARY WAIT WITH SESSION SHUTDOWN
Sat Apr 02 08:59:26 2011
ARC7: Becoming the 'no SRL' ARCH
Sat Apr 02 08:59:29 2011
ARC8: Becoming the 'no SRL' ARCH
ALTER SYSTEM SET log_archive_dest_2='service="horvitz"','LGWR ASYNC NOAFFIRM delay=0 optional compression=disable max_failure=0 max_connections=1 reopen=300 db_unique_name="horvitz" net_timeout=30','valid_for=(all_logfiles,primary_role)' SCOPE=BOTH;
ALTER DATABASE OPEN
Data Guard Broker initializing...
Switching redo format version from 11.1.0.0.0 to 11.2.0.0.0 at change 241462639868
Thread 1 advanced to log sequence 33395 (COMPATIBLE advance)
Sat Apr 02 08:59:29 2011
Assigning activation ID 750663527 (0x2cbe3767)
Thread 1 advanced to log sequence 33396 (thread open)
Sat Apr 02 08:59:29 2011
ARC9: Becoming the 'no SRL' ARCH
Thread 1 opened at log sequence 33396
Current log# 3 seq# 33396 mem# 0: /u03/oradata/sss/redo03.rdo
Successful open of redo thread 1
Sat Apr 02 08:59:29 2011
ARC0: Becoming the 'no SRL' ARCH
Archived Log entry 34087 added for thread 1 sequence 33394 ID 0x2cbe3767 dest 1:
Sat Apr 02 08:59:29 2011
NSA2 started with pid=32, OS id=24051
Sat Apr 02 08:59:29 2011
ARC1: Becoming the 'no SRL' ARCH
Archived Log entry 34088 added for thread 1 sequence 33395 ID 0x2cbe3767 dest 1:
Sat Apr 02 08:59:29 2011
SMON: enabling cache recovery
Thread 1 advanced to log sequence 33397 (LGWR switch)
Current log# 1 seq# 33397 mem# 0: /u03/oradata/sss/redo01.rdo
******************************************************************
LGWR: Setting 'active' archival for destination LOG_ARCHIVE_DEST_2
******************************************************************
Sat Apr 02 08:59:32 2011
Archived Log entry 34091 added for thread 1 sequence 33396 ID 0x2cbe3767 dest 1:
Successfully onlined Undo Tablespace 1.
Dictionary check beginning
Dictionary check complete
Verifying file header compatibility for 11g tablespace encryption..
Verifying 11g file header compatibility for tablespace encryption completed
SMON: enabling tx recovery
Database Characterset is AL32UTF8
No Resource Manager plan active
replication_dependency_tracking turned off (no async multimaster replication found)
Sat Apr 02 08:59:36 2011
Starting background process QMNC
Sat Apr 02 08:59:36 2011
QMNC started with pid=33, OS id=24060
LOGSTDBY: Validating controlfile with logical metadata
LOGSTDBY: Validation complete
Completed: ALTER DATABASE OPEN
ALTER SYSTEM SET log_archive_trace=0 SCOPE=BOTH SID='sss';
ALTER SYSTEM SET log_archive_format='log%t_%s_%r.arc' SCOPE=SPFILE SID='sss';
ALTER SYSTEM SET standby_file_management='AUTO' SCOPE=BOTH SID='*';
ALTER SYSTEM SET archive_lag_target=1800 SCOPE=BOTH SID='*';
ALTER SYSTEM SET log_archive_max_processes=10 SCOPE=BOTH SID='*';
ALTER SYSTEM SET log_archive_min_succeed_dest=1 SCOPE=BOTH SID='*';
ALTER SYSTEM ARCHIVE LOG
Thread 1 advanced to log sequence 33398 (LGWR switch)
Current log# 2 seq# 33398 mem# 0: /u03/oradata/sss/redo02.rdo
Sat Apr 02 08:59:37 2011
ARC7: STARTING ARCH PROCESSES
Archived Log entry 34094 added for thread 1 sequence 33397 ID 0x2cbe3767 dest 1:
Sat Apr 02 08:59:37 2011
ARCa started with pid=34, OS id=24064
ARCa: Archival started
ARC7: STARTING ARCH PROCESSES COMPLETE
Sat Apr 02 08:59:43 2011
Starting background process CJQ0
Sat Apr 02 08:59:43 2011
CJQ0 started with pid=39, OS id=24099
Setting Resource Manager plan SCHEDULER[0xFD16]:DEFAULT_MAINTENANCE_PLAN via scheduler window
Setting Resource Manager plan DEFAULT_MAINTENANCE_PLAN via parameter
Sat Apr 02 08:59:47 2011
Starting background process VKRM
Sat Apr 02 08:59:47 2011
VKRM started with pid=38, OS id=24118
Sat Apr 02 09:00:17 2011
Shutting down archive processes
Sat Apr 02 09:00:17 2011
ARCH shutting down
ARCa: Archival stopped
Sat Apr 02 09:00:20 2011
ALTER SYSTEM ARCHIVE LOG
Sat Apr 02 09:00:20 2011
Thread 1 cannot allocate new log, sequence 33399
Checkpoint not complete
Current log# 2 seq# 33398 mem# 0: /u03/oradata/sss/redo02.rdo
Thread 1 advanced to log sequence 33399 (LGWR switch)
Current log# 3 seq# 33399 mem# 0: /u03/oradata/sss/redo03.rdo
Archived Log entry 34096 added for thread 1 sequence 33398 ID 0x2cbe3767 dest 1:

How to test your Dataguard Configuration (Standby Database) 1 March 2011

Posted by David Alejo Marcos in Oracle 11.2, RAC, Standby.

In a couple of weeks we will have to test our DR configuration (based on standby databases) for our 3-node cluster.

This cluster has 3 databases running and, as part of the DR test, we will perform a controlled failover to DR, run some batches over there and fail back to production.

The problem:

Activate the physical standby for testing. As part of the test, we will run some batches to ensure DR is fit for purpose, flash back any changes, and fail back to production.

The solution:

Below are the steps we will follow.

— Create restore point on standby
STANDBY> SELECT flashback_on, log_mode FROM v$database;
STANDBY> ALTER DATABASE RECOVER MANAGED STANDBY DATABASE CANCEL;
STANDBY> CREATE RESTORE POINT before_dr_test GUARANTEE FLASHBACK DATABASE;
STANDBY> SELECT scn, database_incarnation#, guarantee_flashback_database, storage_size, time, name FROM gv$restore_point;
STANDBY> ALTER DATABASE RECOVER MANAGED STANDBY DATABASE USING CURRENT LOGFILE DISCONNECT;

— Perform a couple of logfile switches.
PRIMARY> ALTER SYSTEM SWITCH LOGFILE;
PRIMARY> ALTER SYSTEM SWITCH LOGFILE;
PRIMARY> ALTER SYSTEM SWITCH LOGFILE;

— Next step only if you are using Dataguard Broker.
PRIMARY> ALTER SYSTEM SET DG_BROKER_START=FALSE;

PRIMARY> ALTER SYSTEM SET LOG_ARCHIVE_DEST_STATE_2=DEFER;

— We will not ship any logfiles to the old primary due to the change of role.

STANDBY> ALTER DATABASE RECOVER MANAGED STANDBY DATABASE CANCEL;

STANDBY> ALTER SYSTEM SET LOG_ARCHIVE_DEST_STATE_2=DISABLE;

— Convert standby database to a primary database.
STANDBY> ALTER DATABASE ACTIVATE STANDBY DATABASE;

STANDBY> STARTUP MOUNT FORCE;

— OPTIONAL. Change the Data Guard protection mode to maximum performance if needed.

STANDBY> ALTER DATABASE SET STANDBY DATABASE TO MAXIMIZE PERFORMANCE;

— Open database

STANDBY> ALTER DATABASE OPEN;

— The standby database is now open for testing. No changes should be sent to the primary database as the destination is disabled.
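At this point a quick sanity check (an optional query, not part of the original runbook) confirms the role and open mode:

STANDBY> SELECT DATABASE_ROLE, OPEN_MODE FROM V$DATABASE;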

— After testing, we flash back the database to undo any changes and convert it back to a physical standby.
STANDBY> SHUTDOWN IMMEDIATE;

STANDBY> STARTUP MOUNT;

STANDBY> FLASHBACK DATABASE TO RESTORE POINT before_dr_test;

STANDBY> ALTER DATABASE CONVERT TO PHYSICAL STANDBY;

STANDBY> SHUTDOWN IMMEDIATE;

STANDBY> STARTUP MOUNT;

STANDBY> ALTER DATABASE RECOVER MANAGED STANDBY DATABASE USING CURRENT LOGFILE DISCONNECT;

— Enable logfile transfer on both primary and standby.
PRIMARY> ALTER SYSTEM SET LOG_ARCHIVE_DEST_STATE_2=ENABLE;

STANDBY> ALTER SYSTEM SET LOG_ARCHIVE_DEST_STATE_2=ENABLE;

— Perform a couple of logfile switches to verify the configuration is working.
PRIMARY> ALTER SYSTEM SWITCH LOGFILE;
PRIMARY> ALTER SYSTEM SWITCH LOGFILE;
PRIMARY> ALTER SYSTEM SWITCH LOGFILE;

PRIMARY> SELECT DEST_ID, DEST_NAME, STATUS, PROTECTION_MODE, DESTINATION, ERROR, APPLIED_THREAD#, APPLIED_SEQ#, SYNCHRONIZED, GAP_STATUS FROM V$ARCHIVE_DEST_STATUS;

— Drop the restore point on DR (this avoids running out of space in your FRA).
STANDBY> DROP RESTORE POINT before_dr_test;

As always, comments are welcome.

Sysdate returns “wrong” time (time with timezone). 25 February 2011

Posted by David Alejo Marcos in Oracle 11.2, SQL - PL/SQL.

We are all used to executing sysdate to get the date and time of the database; what we are not so used to is retrieving different values from Oracle depending on your time zone.

Sysdate, per the Oracle documentation, will do the following:

“SYSDATE returns the current date and time set for the operating system on which the database server resides. The data type of the returned value is DATE, and the format returned depends on the value of the NLS_DATE_FORMAT initialization parameter. The function requires no arguments. In distributed SQL statements, this function returns the date and time set for the operating system of your local database.”

Most people only read the first line, which says it will return the date and time set for the operating system on which the database server resides; but right at the end there is something we are all very familiar with: distributed SQL statements.

It is very hard to find a database that is not accessed from time zones other than the one the server is hosted in, either because hosting is cheaper elsewhere or because we do business in different countries.

The Problem:

If I execute a simple query to return the date and time from our US server:

SQL> alter session set nls_date_format='dd/mm/yyyy hh24:mi:ss';

Session altered.

SQL> select sysdate from dual;

SYSDATE
-------------------
24/02/2011 14:12:18

SQL> exit
Disconnected from Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - 64bit Production
With the Partitioning and Automatic Storage Management options

I receive the date and time in UTC:

[oracle@sssssssss ~]$ date
Thu Feb 24 14:14:18 GMT 2011

From sqlplus in the UK:

C:\Documents and Settings\david.marcos>sqlplus daviddba@dddd

SQL*Plus: Release 10.2.0.4.0 - Production on Thu Feb 24 14:14:30 2011

Copyright (c) 1982, 2007, Oracle.  All Rights Reserved.

Enter password:

Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - 64bit Production
With the Partitioning and Automatic Storage Management options

DAVIDDBA@dddd>

DAVIDDBA@dddd> alter session set nls_date_format='dd/mm/yyyy hh24:mi:ss';

Session altered.

Elapsed: 00:00:00.14
DAVIDDBA@dddd> select sysdate from dual;

SYSDATE
-------------------
24/02/2011 09:14:55

Elapsed: 00:00:00.20
DAVIDDBA@dddd>

It returns the date and time in NYC.
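A quick way to see which clock you are actually getting (a suggested diagnostic, not from the original post) is to compare SYSDATE with SYSTIMESTAMP, which includes the server's UTC offset, and with CURRENT_DATE, which follows the session time zone:

SQL> alter session set nls_date_format='dd/mm/yyyy hh24:mi:ss';
SQL> select sysdate, current_date, systimestamp, sessiontimezone from dual;

If SYSDATE and SYSTIMESTAMP do not match what you expect from the server clock, the TZ seen by the server process (or the listener) is the first thing to check.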

The Solution:

Quite simple: there is a file on the server called /etc/sysconfig/clock. This file configures the system clock to universal or local time.

[oracle@ssss ~]$ cat /etc/sysconfig/clock

ZONE="America/New_York"
UTC=true
ARC=false

The zone should be:

[oracle@sssssss ~]$ cat /etc/sysconfig/clock
ZONE="UTC"
UTC=true
ARC=false

so that clients also use UTC. After amending the value we were back in business.

There is another quick way to fix the problem, provided you do not have local_listener configured on your system, as you can manipulate the time zone of your sessions through the listener. Let me explain: if you connect from the server itself, you will probably not use the listener, so sysdate should be the same as your server's date and time.

By starting the listener with a different time zone (for example, using the TZ OS environment variable) you can end up with two different sysdate results (one if you select from the server and a different one if you connect through the listener).

So, to fix your problem you will need to do the following:

1.- Configure the TZ variable on your server (e.g. export TZ="UTC")

2.- Stop listener

3.- Start listener

Done.

If you have local_listener, you will need to stop/start your database with the right TZ environment variable to fix the issue.

As always, questions are welcome.

ORA-01555 caused by SQL statement below… on Active Dataguard after upgrade to 11.2.0.2 25 January 2011

Posted by David Alejo Marcos in Oracle 11.2, Standby, Tuning.

Sometimes I wonder why we do upgrade our databases…

The problem:

After upgrading from 11.2.0.1 to 11.2.0.2, our standby databases (we are using Active Data Guard) were recording ORA-01555 in the alert log file after running for some time.

We actively use our standby databases to run some queries for reporting.

The solution:

We have two standby databases, one local (less than 50 km away) and another in a remote location (250 km from the primary).

The first thing we did was to raise a ticket with Oracle and change one of the standby databases from "open read only" to "mounted" (I had experienced problems with Active Data Guard in the past); a sketch of the commands involved follows the list below.

This proved useful in several ways:

1.- We narrowed the problem down to Active Data Guard. The database in mount status was applying log files without issues and not logging any errors. The other standby started to log errors after a few hours of running (the time seems to depend on the workload the database is under).

2.- Once we stopped the database and opened it read only again, it would work for some time (not very elegant, but still useful).

3.- Oracle Support came back with a bug and a fix.
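For reference, this is roughly how a standby is moved from "open read only" to "mounted" with redo apply still running (the same commands used elsewhere on this blog; adapt to your environment):

STANDBY> ALTER DATABASE RECOVER MANAGED STANDBY DATABASE CANCEL;
STANDBY> SHUTDOWN IMMEDIATE
STANDBY> STARTUP MOUNT
STANDBY> ALTER DATABASE RECOVER MANAGED STANDBY DATABASE USING CURRENT LOGFILE DISCONNECT;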

The bug number is 10320455 and the new patch is 10018789 (to be applied on top of 11.2.0.2).

The problem itself seems to be a hang (lock) in the library cache. This lock causes the workload to accumulate without releasing resources, logging ORA-01555 after some time.

As the patch only updates binaries (546 KB in size), we might apply it to DR first and, once we secure some downtime over the weekend, apply it to production. This will help us minimize the time we impact the business by having the standby databases in mounted status.

As always, comments are welcome.

Saving current statistics before gathering. 24 January 2011

Posted by David Alejo Marcos in Oracle 11.2, Tuning.

One of the things I like most about my job is the constant learning. No matter how long you have been working with Oracle, there is always something new to learn, even from old releases.

The problem:

As part of our upgrade to 11.2.0.2, and following the subsequent performance degradation, it was decided to gather statistics at schema level.

As a precaution, I wanted to back up the current statistics, in case the new statistics did not perform as well as expected.

In the past I would have used DBMS_STATS.EXPORT_xxx_STATS to create a backup, where xxx is the object type (database, schema, table, index, etc.):

DBMS_STATS.EXPORT_SCHEMA_STATS (
 ownname VARCHAR2,
 stattab VARCHAR2,
 statid VARCHAR2 DEFAULT NULL,
 statown VARCHAR2 DEFAULT NULL);
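A typical export call (schema and table names here are just examples) would have looked like this, assuming the statistics table already exists:

begin
  dbms_stats.export_schema_stats(
    ownname => 'MYTEST',
    stattab => 'DAVID_STATS',
    statid  => 'BEFORE_GATHER',
    statown => 'MYTEST');
end;
/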

But that was in the past.

The solution:

Reading the 11g documentation, I noticed a few parameters I had never used before (from the Oracle documentation):

1.- stattab: User statistics table identifier describing where to save the current statistics.

2.- statid: Identifier (optional) to associate with these statistics within stattab.

3.- statown: Schema containing stattab (if different from ownname).

Backing up the current statistics to a table is as simple as:

1.- Create backup table:

SYS@xxx> begin
 2  dbms_stats.create_stat_table(
 3  ownname => 'MYTEST',
 4  stattab => 'DAVID_STATS',
 5  tblspace => 'USERS'
 6  );
 7  end;
 8  /

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.11
SYS@xxx>

2.- Gather stats:

SYS@xxx> BEGIN
 2  DBMS_STATS.GATHER_SCHEMA_STATS (
 3   ownname =>'TEST',
 4   degree => 12,
 5   cascade => true,
 6   method_opt=>'for all columns size repeat',
 7   stattab => 'DAVID_STATS',
 8   statid =>'JANUARY_110122',
 9   statown =>'MYTEST',
 10  no_invalidate =>false);
 11 END;
 12 /

PL/SQL procedure successfully completed.

Elapsed: 00:44:09.44
SYS@xxx>

If you decide to roll back the statistics using your backup table as the source, you will need to execute:

begin
 dbms_stats.import_schema_stats(ownname=>'TEST',
 stattab=>'DAVID_STATS',
 statown=>'MYTEST',
 statid=>'JANUARY_110122');
end;
/

These parameters have been there for some time now; I just never used them before, as I always used dbms_stats.import_xxx_stats and dbms_stats.export_xxx_stats. This shows me, yet again, that there are always different ways to achieve the same goal.

Note.- We are all aware that whenever statistics in the dictionary are modified, the old statistics are automatically saved for future restoring. But having the statistics backed up in your own table helps you keep track of them, and they will not be deleted unless you decide to do it.
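For completeness, those dictionary-kept statistics can be brought back with DBMS_STATS.RESTORE_SCHEMA_STATS (a standard call, not something we used in this exercise; the schema name and timestamp are examples):

begin
  dbms_stats.restore_schema_stats(
    ownname         => 'TEST',
    as_of_timestamp => systimestamp - interval '1' day);
end;
/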

As always, comments are welcome.

Removing Fatal NI connect error xxxx from your alert.log 18 January 2011

Posted by David Alejo Marcos in Oracle 11.2.

We have a script to monitor the alert.log file for any ORA-xxxx errors.

With 11g we found many entries of the form:

Fatal NI connect error

The problem:

The alert.log file has many entries for Fatal NI connect error.

Examples:

***********************************************************************

Fatal NI connect error 12514, connecting to:
 (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=xxxx)(PORT=nnn))(CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=sss)(CID=(PROGRAM=oracle)(HOST=xxx)(USER=oracle))))

 VERSION INFORMATION:
 TNS for Linux: Version 11.2.0.2.0 - Production
 TCP/IP NT Protocol Adapter for Linux: Version 11.2.0.2.0 - Production
 Time: 17-JAN-2011 10:20:45
 Tracing not turned on.
 Tns error struct:
 ns main err code: 12564

TNS-12564: TNS:connection refused
 ns secondary err code: 0
 nt main err code: 0
 nt secondary err code: 0
 nt OS err code: 0
Error 12514 received logging on to the standby
Mon Jan 17 10:21:45 2011


***********************************************************************

or

***********************************************************************

Fatal NI connect error 12170.

 VERSION INFORMATION:
 TNS for Linux: Version 11.2.0.2.0 - Production
 Oracle Bequeath NT Protocol Adapter for Linux: Version 11.2.0.2.0 - Production
 TCP/IP NT Protocol Adapter for Linux: Version 11.2.0.2.0 - Production
 Time: 18-JAN-2011 15:13:08
 Tracing not turned on.
 Tns error struct:
 ns main err code: 12535

TNS-12535: TNS:operation timed out
 ns secondary err code: 12560
 nt main err code: 505

TNS-00505: Operation timed out
 nt secondary err code: 110
 nt OS err code: 0
 Client address: (ADDRESS=(PROTOCOL=tcp)(HOST=xx.xx.x.xx)(PORT=nnnn))
Tue Jan 18 15:40:20 2011

The solution:

There will be an entry in the alert.log file every time a client connection of any kind fails to complete or is aborted.

One way to minimize the impact is to use the parameter SQLNET.INBOUND_CONNECT_TIMEOUT (defaulting to 60 seconds on 10gR2 and 11g), but sometimes this value is not adequate.

Oracle also mentions this error occurring if you use DB Console or Enterprise Manager to monitor your databases, as the emagent will try to connect to the target database repeatedly and, statistically, some attempts will fail (the frequency will depend on how busy your system is).

Most of the time (certainly for DB Console and Enterprise Manager Agent) the application will try to connect again and it will succeed.

To fix the problem you could increase the inbound connect timeout in both the listener.ora and the sqlnet.ora file on the server side.
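For example (the values are illustrative only, and LISTENER is assumed to be the listener name):

# sqlnet.ora on the database server
SQLNET.INBOUND_CONNECT_TIMEOUT = 120

# listener.ora
INBOUND_CONNECT_TIMEOUT_LISTENER = 110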

If you already have a value you consider adequate, you might want to add the following line to your listener.ora file:

DIAG_ADR_ENABLED_listener_name=OFF

This line will keep the error out of the alert.log file; the error will instead be posted to the $ORACLE_HOME/network/log/sqlnet.log file.

As always, comments are welcome.

ORA-00600: internal error code, arguments: [kkpo_rcinfo_defstg:objnotfound], [xxx] after upgrade to 11.2.0.2 18 January 2011

Posted by David Alejo Marcos in Oracle 11.2, SQL - PL/SQL.

First of all, I would like to say thanks to Coskan (Josh) Gundogar for his invaluable assistance with resolving this problem. You can find more about Coskan here.

On Saturday January 15th we had to upgrade one of our busiest and biggest databases to 11.2.0.2 to fix several critical bugs on 11.2.0.1 that affected our systems.

After upgrading the database and running the initial tests, sign-off was given for the upgrade.

On Sunday January 16th (only one day later) I had several emails and a couple of phone calls asking me to check the database, as we were getting ORA-00600 errors.

Please be aware that ORA-00600 errors should be raised with Oracle. The steps below worked for us, but you will need to confirm with Oracle, your business and your developers.

The problem:

The command generating the error was a very simple one:

ERROR at line 1:
ORA-00600: internal error code, arguments: [kkpo_rcinfo_defstg:objnotfound], [1002287], [], [], [], [], [], [], [], [], [], []
ORA-06512: at "xxx.DELETE_SOURCE", line 54
ORA-06512: at line 1

The command at line 54 is:

DELETE FROM xxx.SOURCE WHERE SOURCEID = v_source;

I could find very little information on Oracle Support regarding this error.

The solution:

The first thing we did was to raise a Sev. 1 with Oracle. Their suggestion was to restore the database from backup, meaning we would have had to downgrade the primary database plus 2 standby databases.

After successfully reproducing the error I tried different SQL statements, to find out whether something was wrong with the table.

I tried to execute select, insert and update statements, and all of them worked fine:

SQL> select *  FROM xxx.SOURCE WHERE SOURCEID = 49;

SOURCEID SOURCENAME  ABBREV
---------- ----------- ----------------
49 xxxxx yyyyy

SQL> select count(*) from xxx.source where sourceid  = 90000;

COUNT(*)
----------
0

SQL> delete from  xxx.source where sourceid  = 90000;

0 rows deleted.

SQL> rollback;

SQL> update xxx.source set sourcename = 'pr newswire' where sourceid = 49;

1 row updated.

SQL> rollback;

Rollback complete.

But the delete for an existing row failed:

SQL>   DELETE FROM xxx.SOURCE WHERE SOURCEID = 49;
DELETE FROM xxx.SOURCE WHERE SOURCEID = 49
*
ERROR at line 1:
ORA-00600: internal error code, arguments: [kkpo_rcinfo_defstg:objnotfound],
[1002300], [], [], [], [], [], [], [], [], [], []

After some trial and error and several not very successful attempts, we ran a trace on the SQL (something to remember for next time: it is one of the first things to do and well worth spending time reading).
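In case it is useful, this is roughly how such a trace can be switched on for the failing statement (event 10046 at level 12 records binds and waits; your preferred tracing method may differ):

SQL> alter session set tracefile_identifier = 'ora600_delete';
SQL> alter session set events '10046 trace name context forever, level 12';
SQL> DELETE FROM xxx.SOURCE WHERE SOURCEID = 49;
SQL> alter session set events '10046 trace name context off';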

The trace file showed the following:

=====================
PARSING IN CURSOR #47580064284056 len=283 dep=2 uid=0 oct=3 lid=0 tim=1295189617113218 hv=1612870922 ad='a06c7d880' sqlid='gnkrt49h24x8a'
select pctfree_stg, pctused_stg, size_stg,initial_stg, next_stg, minext_stg, maxext_stg, maxsiz_stg, lobret_stg,mintim_stg, pctinc_stg, initra_stg, maxtra_stg, optimal_stg, maxins_stg,frlins_stg, flags_stg, bfp_stg, enc_stg, cmpflag_stg, cmplvl_stg from deferred_stg$  where obj# =:1
END OF STMT
PARSE #47580064284056:c=1000,e=602,p=0,cr=0,cu=0,mis=1,r=0,dep=2,og=4,plh=0,tim=1295189617113216
BINDS #47580064284056:
Bind#0
oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
oacflg=00 fl2=0001 frm=00 csi=00 siz=24 off=0
kxsbbbfp=2b46190f4a70  bln=22  avl=04  flg=05
value=1002300
EXEC #47580064284056:c=1000,e=1571,p=0,cr=0,cu=0,mis=1,r=0,dep=2,og=4,plh=866645418,tim=1295189617114927
FETCH #47580064284056:c=0,e=34,p=0,cr=1,cu=0,mis=0,r=0,dep=2,og=4,plh=866645418,tim=1295189617114999
STAT #47580064284056 id=1 cnt=0 pid=0 pos=1 obj=1062363 op='TABLE ACCESS BY INDEX ROWID DEFERRED_STG$ (cr=1 pr=0 pw=0 time=49 us cost=1 size=31 card=1)'
STAT #47580064284056 id=2 cnt=0 pid=1 pos=1 obj=1062364 op='INDEX UNIQUE SCAN I_DEFERRED_STG1 (cr=1 pr=0 pw=0 time=35 us cost=0 size=0 card=1)'
CLOSE #47580064284056:c=0,e=6,dep=2,type=3,tim=1295189617115210

*** 2011-01-16 14:53:37.357
DDE: Problem Key 'ORA 600 [kkpo_rcinfo_defstg:objnotfound]' was flood controlled (0x6) (incident: 812751)
ORA-00600: internal error code, arguments: [kkpo_rcinfo_defstg:objnotfound], [1002300], [], [], [], [], [], [], [], [], [], []
=====================
PARSE ERROR#47580064285064:len=92 dep=1 uid=0 oct=3 lid=0 tim=1295189617357412 err=600
select /*+ all_rows */ count(1) from "xxx"."INSTRUMENT_INSTRUMENTTAG" where "SOURCEID" = :1
EXEC #47580064671128:c=4059383,e=4121486,p=5,cr=331742,cu=129,mis=0,r=0,dep=0,og=1,plh=3685465056,tim=1295189617357720
ERROR #14:err=600 tim=1295189617357745
STAT #47580064671128 id=1 cnt=0 pid=0 pos=1 obj=0 op='DELETE  SOURCE (cr=0 pr=0 pw=0 time=17 us)'
STAT #47580064671128 id=2 cnt=1 pid=1 pos=1 obj=1063845 op='INDEX UNIQUE SCAN PK_SOURCE (cr=1 pr=0 pw=0 time=39 us cost=0 size=12 card=1)'
WAIT #47580064671128: nam='SQL*Net break/reset to client' ela= 20 driver id=1413697536 break?=1 p3=0 obj#=-1 tim=1295189617358676
WAIT #47580064671128: nam='SQL*Net break/reset to client' ela= 9860 driver id=1413697536 break?=0 p3=0 obj#=-1 tim=1295189617368572
WAIT #47580064671128: nam='SQL*Net message to client' ela= 4 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=1295189617368700

*** 2011-01-16 14:53:48.940
WAIT #47580064671128: nam='SQL*Net message from client' ela= 11571422 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=1295189628940169
CLOSE #47580064285064:c=0,e=19,dep=1,type=0,tim=1295189628940357
CLOSE #47580064671128:c=0,e=96,dep=0,type=0,tim=1295189628940417
=====================

Note.- The above is just an extract; the trace file identified a few more tables with the same problem.

The error was actually being raised for a different table. This was happening because SOURCE is one of our key tables, with many references to it.

After performing a select on the failing table we had the following:

select * from xxx.instrumenttag
*
ERROR at line 1:
ORA-00600: internal error code, arguments: [kkpo_rcinfo_defstg:objnotfound], [1002298], [], [], [], [], [], [], [], [], [], []

After a conversation with the business, we confirmed those tables had been created on production a couple of weekends before the upgrade and were not being used.

As they only had referential integrity constraints and were not being used, we decided to recreate them using the DDL from QA.

Once all tables (40+) and associated indexes had been recreated successfully, the code worked:

SQL> begin
 2   xxx.DELETE_SOURCE(49);
 3  end;
 4  /

PL/SQL procedure successfully completed.

SQL>

Thoughts:

1.- This problem was a tough one. ORA-00600 errors are never good, but this was, in my humble opinion, one of the hardest to troubleshoot on production without restoring and downgrading a primary and 2 standby databases.

2.- It is my belief that the second argument of the error identifies the missing object (I cannot test it now):

ORA-00600: internal error code, arguments: [kkpo_rcinfo_defstg:objnotfound], [1002287], [], [], [], [], [], [], [], [], [], []

If my assumption is right, this would have helped while troubleshooting.

3.- Oracle states specifically that catupgrd.sql can be rerun (Note ID 1100828.1); I am not so sure…

catupgrd.sql is re-runnable and the bug states the errors can be ignored

4.- We did not have the same problem on QA as those empty objects were created after we did the upgrade. This proves how difficult testing can be.

5.- With 11.2 we have something new called "Deferred Segment Creation". It is my belief that new empty objects without segments might have been the root cause of the error during the upgrade.

This is how Oracle describes the new feature:

“When you create a non-partitioned heap table, the table segment creation is deferred to the first row insert. This functionality is enabled by default with the initialization parameter DEFERRED_SEGMENT_CREATION  set to TRUE.”

The default value for this parameter is TRUE.
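For illustration (the table names are made up), segment creation can also be controlled per table, regardless of the parameter:

SQL> create table mytest.demo_deferred (id number) segment creation deferred;
SQL> create table mytest.demo_immediate (id number) segment creation immediate;

The first table will not show up in dba_segments until its first row is inserted; the second gets its segment at create time.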

The failing command was:

PARSING IN CURSOR #47580064284056 len=283 dep=2 uid=0 oct=3 lid=0 tim=1295189617113218 hv=1612870922 ad='a06c7d880' sqlid='gnkrt49h24x8a'
select pctfree_stg, pctused_stg, size_stg,initial_stg, next_stg, minext_stg, maxext_stg, maxsiz_stg, lobret_stg,mintim_stg, pctinc_stg, initra_stg, maxtra_stg, optimal_stg, maxins_stg,frlins_stg, flags_stg, bfp_stg, enc_stg, cmpflag_stg, cmplvl_stg from deferred_stg$  where obj# =:1
END OF STMT

After recreating the object, the same query worked:

SYS@xxx> select object_id from dba_objects where object_name = 'INSTRUMENT_INSTRUMENTTAG';

 OBJECT_ID
----------
 1063905

Elapsed: 00:00:00.00
SYS@MFE> select pctfree_stg, pctused_stg, size_stg,initial_stg, next_stg, minext_stg, maxext_stg, maxsiz_stg, lobret_stg,mintim_stg, pctinc_stg, initra_stg, maxtra_stg, optimal_stg, maxins_stg,frlins_stg, flags_stg, bfp_stg, enc_stg, cmpflag_stg, cmplvl_stg
 2  from sys.deferred_stg$
 3  where obj# =1063905;

PCTFREE_STG PCTUSED_STG   SIZE_STG INITIAL_STG   NEXT_STG MINEXT_STG MAXEXT_STG MAXSIZ_STG LOBRET_STG MINTIM_STG PCTINC_STG INITRA_STG MAXTRA_STG OPTIMAL_STG MAXINS_STG FRLINS_STG
----------- ----------- ---------- ----------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ----------- ---------- ----------
 FLAGS_STG    BFP_STG    ENC_STG CMPFLAG_STG CMPLVL_STG
---------- ---------- ---------- ----------- ----------
 10           0                                                                                                   0          1        255
 8          0                      0          0

Elapsed: 00:00:00.02
SYS@xxx>

I am not entirely convinced catupgrd.sql will create the link between segments (tables, indexes and LOBs) and the associated entry in sys.deferred_stg$. Something to test on QA.

6.- Query to find tables without segments:

6.1.- Coskan’s version used while troubleshooting:

SELECT 'select count(*) from ' || owner || '.' || table_name || ' ;'
 FROM dba_tables
 WHERE segment_created = 'NO';

6.2.- My version to verify objects (valid for INDEX PARTITION, TABLE PARTITION, TABLE, INDEX although the version below only checks tables):

SELECT owner, object_name, object_type
 FROM sys.deferred_stg$, dba_objects
 WHERE obj# = object_id
 AND object_type = 'TABLE';

As always, comments are welcome.

Monitoring tablespaces with dynamic thresholds 13 December 2010

Posted by David Alejo Marcos in SQL - PL/SQL.

One of the problems we face while monitoring tablespaces is the difference in size. Some of our tablespaces are 10-50 GB, while others are close to 1 TB.

10% free space on a 10 GB tablespace is not much on a fast-growing system, while 10% of a 1 TB tablespace is enough to keep us going for a month.

The problem:

We need a query that performs really quickly, is not heavy on the system and is clever enough to tell us whether 20% free space is critical, a warning or can wait.

The solution:

I came up with the following query:

SELECT (CASE
 WHEN free_space_mb <=  DECODE (allocation_type, 'UNIFORM', min_extlen, maxextent) * free_extents THEN 'CRITICAL'
 WHEN free_space_mb <=  DECODE (allocation_type, 'UNIFORM', min_extlen, maxextent) * free_extents + 20 THEN 'WARNING'
 ELSE 'N/A'
 END)
 alert,
 tablespace_name,
 space_used_mb,
 extend_bytes/1024/1024 extend_mb,
 free_space/1024/1024 free_mb,
 pct_free,
 free_extents,
 free_space_mb,
 max_size_mb,
 maxextent
 FROM (SELECT c.tablespace_name,
 NVL (ROUND ( (a.extend_bytes + b.free_space) / (bytes + a.extend_bytes) * 100,2), 0)  pct_free,
 NVL ( ROUND ( (a.extend_bytes + b.free_space) / 1024 / 1024, 2), 0) free_space_mb,
 (CASE
 WHEN NVL ( ROUND ( (maxbytes) / power(1024,3), 2), 0) <=  30 THEN 60
 WHEN NVL ( ROUND ( (maxbytes) / power(1024,3), 2), 0) <= 100 THEN 120
 WHEN NVL ( ROUND ( (maxbytes) / power(1024,3), 2), 0) <= 300 THEN 200
 WHEN NVL ( ROUND ( (maxbytes) / power(1024,3), 2), 0) <= 800 THEN 300
 ELSE 340
 END) free_extents,
 a.extend_bytes, b.free_space,
 ROUND (maxbytes / 1024 / 1024, 2) max_size_mb,
 nvl (round(a.bytes - b.free_space ,2) /1024/1024, 0) space_used_mb,
 c.allocation_type,
 GREATEST (c.min_extlen / 1024 / 1024, 64) min_extlen,
--               (SELECT  NVL(MAX (bytes) / 1024 / 1024, 64) maxextent
--                   FROM dba_extents
--                  WHERE tablespace_name = a.tablespace_name) maxextent
 64 maxextent
 FROM (  SELECT tablespace_name,
 SUM(DECODE (SIGN (maxbytes - BYTES), -1, 0, maxbytes - BYTES)) AS extend_bytes,
 SUM (BYTES) AS BYTES,
 SUM (maxbytes) maxbytes
 FROM DBA_DATA_FILES
 GROUP BY tablespace_name) A,
 (  SELECT tablespace_name,
 SUM (BYTES) AS free_space,
 MAX (BYTES) largest
 FROM DBA_FREE_SPACE
 GROUP BY tablespace_name) b,
 dba_tablespaces c
 WHERE     c.contents not in ('UNDO','TEMPORARY')
 AND b.tablespace_name(+) = c.tablespace_name
 AND a.tablespace_name = c.tablespace_name
 AND NVL (ROUND ( (a.extend_bytes + b.free_space) / (bytes + a.extend_bytes) * 100,2), 0) < 20
) A
WHERE free_space_mb <= DECODE (allocation_type, 'UNIFORM', min_extlen, maxextent) * free_extents + 20;

and this is how it works:

1.- The predicate on c.contents means we do not raise alerts for UNDO or TEMP tablespaces.
2.- The outer join between dba_tablespaces and dba_free_space is necessary, or you will not receive alerts once a tablespace has no free space left at all.
3.- The filter on pct_free means we only evaluate tablespaces with less than 20% free space.
4.- For the maximum extent size I started by monitoring the largest extent of the tablespace being checked (query below), but we hit some bugs with dba_extents not being able to have statistics, generating a bad plan and running for up to 2 minutes. If you are using tablespaces with Extent Allocation = Automatic, the biggest extent will be 64 MB, so I decided to use a hard-coded value instead.

(SELECT  NVL(MAX (bytes) / 1024 / 1024, 64) maxextent
    FROM dba_extents
  WHERE tablespace_name = a.tablespace_name) maxextent

5.- The CASE expression on maxbytes is where the logic happens. Depending on the size of the tablespace (including max autoextend), we define thresholds via free_extents. These values might need to be adjusted for your systems.
6.- The outer CASE expression turns the result of point 5 into an alert of N/A (not applicable), WARNING or CRITICAL. It is worth mentioning the distinction between Extent Allocation UNIFORM and AUTOMATIC; I assume a maximum extent of 64 MB for both to keep things consistent.
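As a worked example of how the thresholds combine: a tablespace with AUTOALLOCATE extents and a maximum size of 500 GB falls in the 300-800 GB band, so free_extents is 300 and maxextent is the hard-coded 64 MB; the query will therefore report CRITICAL once free space (including autoextend headroom) drops to 64 * 300 = 19,200 MB, WARNING below 19,220 MB, and only for tablespaces already under 20% free.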

Please share your thoughts about this query; there is always more than one way to achieve the same goal using SQL.

As always, comments are welcome.

ORA-01548: active rollback segment ‘_SYSSMU1_xxxx’ found, terminate dropping tablespace while deleting UNDOTBS 9 November 2010

Posted by David Alejo Marcos in Oracle 11.2, SQL - PL/SQL.

Last week I had to solve a strange error on one of our production databases. Luckily for us, it was the RMAN catalog.

I say luckily for us because this database is internal to us. The same problem on a production database serving our clients could have been disastrous, as we have tight deadlines.

The problem:

The UNDOTBS for our RMAN catalog database was dropped. Any attempt to switch to a newly created UNDOTBS or to remove the old one failed with the following error:

SQL> drop tablespace UNDOTBS01 including contents;
drop tablespace UNDOTBS01 including contents
*
ERROR at line 1:
ORA-01548: active rollback segment '_SYSSMU1_1255349037$' found, terminate dropping tablespace

The solution:

Our RMAN catalog has a standby database for protection. The first thing I did was to stop the standby database from applying any more archivelogs while UNDOTBS01 still existed there. This would have allowed me to perform a failover if necessary.
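A minimal sketch of that step (run on the standby; deferring the shipping destination on the primary is optional, and the destination number here is an assumption for the example):

STANDBY> ALTER DATABASE RECOVER MANAGED STANDBY DATABASE CANCEL;

PRIMARY> ALTER SYSTEM SET LOG_ARCHIVE_DEST_STATE_2=DEFER;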

As I mentioned before, I created a new UNDOTBS3 to check if I could modify the undo_tablespace parameter, but it did not work:

SQL> ALTER SYSTEM SET UNDO_TABLESPACE = undotbs3;
System altered.

SQL> drop tablespace UNDOTBS01 including contents;
drop tablespace UNDOTBS01 including contents
*
ERROR at line 1:
ORA-01548: active rollback segment '_SYSSMU1_1255349037$' found, terminate dropping tablespace

I tried to drop the rollback segment, but it did not work:

SQL> drop rollback segment '_SYSSMU1_1255349037$';
drop rollback segment '_SYSSMU1_1255349037$'
*
ERROR at line 1:
ORA-02175: invalid rollback segment name

In fact, it did not even exist:

SQL> select * from v$rollname ;
USN NAME
---------- ------------------------------
0 SYSTEM
18 _SYSSMU18_650722988$
19 _SYSSMU19_176786070$
20 _SYSSMU20_2971238361$
21 _SYSSMU21_1337078425$
22 _SYSSMU22_378998127$
23 _SYSSMU23_2828850578$
24 _SYSSMU24_3342375805$
25 _SYSSMU25_1071547613$
26 _SYSSMU26_357778769$
27 _SYSSMU27_3256159541$

So, how to fix it?

I executed the following steps:

1.- Create a copy of the spfile:

create pfile from spfile;

so that I could rename the spfile and work with this init.ora file until the problem was fixed.

2.- Stop the database:

SQL> shutdown immediate
Database closed.
Database dismounted.
ORACLE instance shut down.
SQL> exit

3.- Amended the init.ora file with the following:

*.undo_management='MANUAL'
*.undo_retention=1800
*.undo_tablespace='UNDOTBS3'
*._offline_rollback_segments=_SYSSMU1_1255349037$

4.- Start the database up and try to drop UNDOTBS01:

SQL> drop tablespace undotbs01 including contents and datafiles;
drop tablespace undotbs01 including contents and datafiles
*
ERROR at line 1:
ORA-01548: active rollback segment '_SYSSMU3_1255349037$' found, terminate dropping tablespace

As you can see, the rollback segment is different; it changed from _SYSSMU1_1255349037$ to _SYSSMU3_1255349037$ (the difference is the digit after SYSSMU).

After several tries, I realised I had to add as many entries to _offline_rollback_segments as I had rows in v$rollname where NAME LIKE '%SYSSMU%' (10 in my case).

So I stopped the database again and made the necessary changes:

5.- Edit the init.ora file and add as many entries as there are rows in v$rollname where name like '%SYSSMU%':

*._offline_rollback_segments=(_SYSSMU1_1255349037$,_SYSSMU2_1255349037$,
_SYSSMU3_1255349037$,_SYSSMU4_1255349037$,
_SYSSMU5_1255349037$,_SYSSMU6_1255349037$,
_SYSSMU7_1255349037$,_SYSSMU8_1255349037$,
_SYSSMU9_1255349037$,_SYSSMU10_1255349037$)

6.- Start up the database and drop UNDOTBS01

SQL> drop tablespace undotbs01 including contents and datafiles;
Tablespace dropped.

7.- Create UNDOTBS01:

SQL> CREATE UNDO TABLESPACE "UNDOTBS01" DATAFILE '/u01/oradata/CATALOG/undo01.dbf' SIZE 1024M;
Tablespace created.

8.- Shut down the database, rename (or delete) the init.ora file and start up the database using the spfile.
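A minimal sketch of this step (once the temporary init.ora has been moved aside, startup picks up the spfile again):

SQL> shutdown immediate
SQL> startup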

9.- Set UNDOTBS01 as the UNDO tablespace:

SQL> alter system set undo_tablespace=undotbs01 scope=both;

10.- Check the parameters:

SQL> show parameter undo

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
undo_management                      string      AUTO
undo_retention                       integer     1800
undo_tablespace                      string      UNDOTBS01

11.- Drop UNDOTBS3:

SQL> drop tablespace undotbs3 including contents and datafiles;
Tablespace dropped.

As soon as the database was running as normal, I executed a backup to ensure all was OK.

After the verification, I re-enabled recovery to keep the standby database in sync with the primary.

Note.- It is not necessary to add the full name _SYSSMU1_1255349037$ to _offline_rollback_segments. Only the first part (_SYSSMU1) would have been enough.

_offline_rollback_segments=(_SYSSMU1_1255349037$,_SYSSMU2_1255349037$) is the same as

_offline_rollback_segments=(_SYSSMU1,_SYSSMU2)

As always, comments are welcome.