
ORA-01555 caused by SQL statement below… on Active Dataguard after upgrade to 11.2.0.2 25 January 2011

Posted by David Alejo Marcos in Oracle 11.2, Standby, Tuning.

Sometimes I wonder why we do upgrade our databases…

The problem:

After upgrading from 11.2.0.1 to 11.2.0.2, our standby databases (we are using Active Data Guard) were recording ORA-01555 errors in the alert log after running for some time.

We actively use our standby databases to run some queries for reporting.

The solution:

We have two standby databases, one local (less than 50km away) and another one on a remote location (250km) from primary.

The first thing we did was to raise a ticket with Oracle and change one of the standby databases from “open read only” to “mounted” (I had experience with problems on active dataguard in the past).

This proved useful in several ways:

1.- We narrowed the problem down to Active Data Guard. The database in mount state was applying log files without issues and not logging any errors. The other standby started to log errors after a few hours of running (the time seems to depend on the workload the database is under).

2.- Once we stopped the database and opened it read only again, it would work for some time (not very elegant, but still useful).

3.- Oracle Support came back with a bug and a fix.
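For reference, the bounce workaround in point 2 can be sketched as follows (a sketch assuming a physical standby managed from SQL*Plus with real-time apply, not the Data Guard broker; adapt to your environment):

```sql
-- Stop redo apply and bounce the standby
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE CANCEL;
SHUTDOWN IMMEDIATE;

-- Re-open it read only (Active Data Guard) and resume redo apply
STARTUP MOUNT;
ALTER DATABASE OPEN READ ONLY;
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE
  USING CURRENT LOGFILE DISCONNECT FROM SESSION;
```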

The bug number is 10320455 and the new patch is 10018789 (to be applied on top of 11.2.0.2).

The problem itself seems to be a hang (lock) in the library cache. This lock causes the workload to accumulate without releasing resources, and ORA-01555 is logged after some time.

As the patch only updates binaries (546 KB in size), we might apply it to DR first and, once we secure some downtime over the weekend, apply it to production. This will help us minimize the business impact of having the standby databases in mounted state.

As always, comments are welcome.

Saving current statistics before gathering. 24 January 2011

Posted by David Alejo Marcos in Oracle 11.2, Tuning.

One of the things I like the most about my job is the constant learning. No matter how long you have been working with Oracle, there is always something new to learn, even from old releases.

The problem:

As part of our upgrade to 11.2.0.2 and subsequent performance degradation, it was decided to gather statistics at schema level.

As a precaution, I wanted to back up the current statistics, in case the new statistics did not perform as well as expected.

In the past, I would have used DBMS_STATS.EXPORT_xxx_STATS to create a backup, where xxx is the object type (database, schema, table, index, etc.):

DBMS_STATS.EXPORT_SCHEMA_STATS (
 ownname VARCHAR2,
 stattab VARCHAR2,
 statid VARCHAR2 DEFAULT NULL,
 statown VARCHAR2 DEFAULT NULL);
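A backup with that interface would look something like this (a sketch; the schema name, stats table and label are illustrative, and the stats table must already exist, created with DBMS_STATS.CREATE_STAT_TABLE):

```sql
BEGIN
  DBMS_STATS.EXPORT_SCHEMA_STATS(
    ownname => 'TEST',          -- schema whose statistics we back up
    stattab => 'DAVID_STATS',   -- user statistics table to save into
    statid  => 'BEFORE_GATHER', -- optional label within the stats table
    statown => 'MYTEST');       -- owner of the stats table
END;
/
```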

But that was in the past.

The solution:

Reading the 11g documentation, I noticed a couple of parameters of GATHER_SCHEMA_STATS I had never used before (from the Oracle doc):

1.- stattab: User statistics table identifier describing where to save the current statistics.

2.- statid: Identifier (optional) to associate with these statistics within stattab.

3.- statown: Schema containing stattab (if different from ownname).

Backing up the current statistics to a table while gathering is as simple as:

1.- Create backup table:

SYS@xxx> begin
 2  dbms_stats.create_stat_table(
 3  ownname => 'MYTEST',
 4  stattab => 'DAVID_STATS',
 5  tblspace => 'USERS'
 6  );
 7  end;
 8  /

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.11
SYS@xxx>

2.- Gather stats:

SYS@xxx> BEGIN
 2  DBMS_STATS.GATHER_SCHEMA_STATS (
 3   ownname =>'TEST',
 4   degree => 12,
 5   cascade => true,
 6   method_opt=>'for all columns size repeat',
 7   stattab => 'DAVID_STATS',
 8   statid =>'JANUARY_110122',
 9   statown =>'MYTEST',
 10  no_invalidate =>false);
 11 END;
 12 /

PL/SQL procedure successfully completed.

Elapsed: 00:44:09.44
SYS@xxx>

If you decide to roll back the statistics using your backup table as the source, you will need to execute:

begin
 dbms_stats.import_schema_stats(ownname=>'TEST',
 stattab=>'DAVID_STATS',
 statown=>'MYTEST',
 statid=>'JANUARY_110122');
end;
/

These parameters have been there for some time now; I just never used them before, as I always did a dbms_stats.export_xxx_stats and dbms_stats.import_xxx_stats. This shows me, yet again, that there are always different ways to achieve the same goal.

Note.- We are all aware that whenever statistics in the dictionary are modified, the old statistics are automatically saved for future restoring. But having statistics backed up in your own table helps you keep track of them, and they will not be deleted automatically unless you decide to delete them.
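For completeness, that automatic history can be used with DBMS_STATS.RESTORE_SCHEMA_STATS; a sketch (the timestamp is illustrative, and the history is only kept for the retention period, 31 days by default):

```sql
BEGIN
  DBMS_STATS.RESTORE_SCHEMA_STATS(
    ownname         => 'TEST',
    as_of_timestamp => TO_TIMESTAMP('22-JAN-2011 09:00:00',
                                    'DD-MON-YYYY HH24:MI:SS'),
    no_invalidate   => FALSE);
END;
/
```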

As always, comments are welcome.

Removing Fatal NI connect error xxxx from your alert.log 18 January 2011

Posted by David Alejo Marcos in Oracle 11.2.

We have a script to monitor the alert.log file for any ORA-xxxx errors.

With 11g we found many entries along the lines of:

Fatal NI connect error

The problem:

The alert.log file has many entries for Fatal NI connect error.

Examples:

***********************************************************************

Fatal NI connect error 12514, connecting to:
 (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=xxxx)(PORT=nnn))(CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=sss)(CID=(PROGRAM=oracle)(HOST=xxx)(USER=oracle))))

 VERSION INFORMATION:
 TNS for Linux: Version 11.2.0.2.0 - Production
 TCP/IP NT Protocol Adapter for Linux: Version 11.2.0.2.0 - Production
 Time: 17-JAN-2011 10:20:45
 Tracing not turned on.
 Tns error struct:
 ns main err code: 12564

TNS-12564: TNS:connection refused
 ns secondary err code: 0
 nt main err code: 0
 nt secondary err code: 0
 nt OS err code: 0
Error 12514 received logging on to the standby
Mon Jan 17 10:21:45 2011


***********************************************************************

or

***********************************************************************

Fatal NI connect error 12170.

 VERSION INFORMATION:
 TNS for Linux: Version 11.2.0.2.0 - Production
 Oracle Bequeath NT Protocol Adapter for Linux: Version 11.2.0.2.0 - Production
 TCP/IP NT Protocol Adapter for Linux: Version 11.2.0.2.0 - Production
 Time: 18-JAN-2011 15:13:08
 Tracing not turned on.
 Tns error struct:
 ns main err code: 12535

TNS-12535: TNS:operation timed out
 ns secondary err code: 12560
 nt main err code: 505

TNS-00505: Operation timed out
 nt secondary err code: 110
 nt OS err code: 0
 Client address: (ADDRESS=(PROTOCOL=tcp)(HOST=xx.xx.x.xx)(PORT=nnnn))
Tue Jan 18 15:40:20 2011

The solution:

We will get an entry in the alert.log file every time a client connection of any kind fails to complete or is aborted.

One way to minimize the impact is the parameter SQLNET.INBOUND_CONNECT_TIMEOUT (default 60 seconds on 10gR2 and 11g) but, sometimes, this value is not adequate.

Oracle also mentions this error occurring if you use DB Console or Enterprise Manager to monitor your databases: the emagent will try to connect to the target database repeatedly and, statistically, some attempts will fail (the frequency will depend on how busy your system is).

Most of the time (certainly for DB Console and Enterprise Manager Agent) the application will try to connect again and it will succeed.

To fix the problem you could increase the value of SQLNET.INBOUND_CONNECT_TIMEOUT in the listener.ora and sqlnet.ora files on the server side.

If you already have a value you consider adequate, you might want to add the following line to your listener.ora file:

DIAG_ADR_ENABLED_listener_name=OFF

This line will hide the error from the alert.log file; the error will instead be posted to the $ORACLE_HOME/network/log/sqlnet.log file.
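Putting both changes together, the server-side files would look something like this (a sketch; the listener name LISTENER and the 120-second timeout are illustrative, not recommendations):

```
# sqlnet.ora (server side)
SQLNET.INBOUND_CONNECT_TIMEOUT = 120

# listener.ora
INBOUND_CONNECT_TIMEOUT_LISTENER = 120
DIAG_ADR_ENABLED_LISTENER = OFF
```

The listener must be reloaded (lsnrctl reload) for the listener.ora changes to take effect.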

As always, comments are welcome.

ORA-00600: internal error code, arguments: [kkpo_rcinfo_defstg:objnotfound], [xxx] after upgrade to 11.2.0.2 18 January 2011

Posted by David Alejo Marcos in Oracle 11.2, SQL - PL/SQL.

First of all, I would like to say thanks to Coskan (Josh) Gundogar for his invaluable assistance with resolving this problem. You can find more about Coskan here.

On Saturday January 15th we had to upgrade one of our busiest and biggest databases to 11.2.0.2 to fix several critical bugs on 11.2.0.1 that affected our systems.

After upgrading the database and initial tests, sign-off was given for the upgrade.

On Sunday January 16th (only one day after) I had several emails and a couple of phone calls asking me to check the database, as we were getting ORA-00600 errors.

Please be aware that ORA-00600 should always be raised with Oracle. The steps below worked for us, but you will need to confirm with Oracle, your business and your developers.

The problem:

The command generating the error was a very simple one:

ERROR at line 1:
ORA-00600: internal error code, arguments: [kkpo_rcinfo_defstg:objnotfound], [1002287], [], [], [], [], [], [], [], [], [], []
ORA-06512: at "xxx.DELETE_SOURCE", line 54
ORA-06512: at line 1

The command at line 54 is:

DELETE FROM xxx.SOURCE WHERE SOURCEID = v_source;

I could find very little information on Oracle Support regarding this error.

The solution:

The first thing we did was to raise a Sev. 1 with Oracle. Their suggestion was to restore the database from backup, meaning we would have had to downgrade the primary database plus 2 standby databases.

After successfully reproducing the error I tried a different SQL, to find out if it was something wrong with the table.

I tried select, insert and update, and all three worked fine:

SQL> select *  FROM xxx.SOURCE WHERE SOURCEID = 49;

SOURCEID SOURCENAME  ABBREV
---------- ----------- ----------------
49 xxxxx yyyyy

SQL> select count(*) from xxx.source where sourceid  = 90000;

COUNT(*)
----------
0

SQL> delete from  xxx.source where sourceid  = 90000;

0 rows deleted.

SQL> rollback;

SQL> update xxx.source set sourcename = 'pr newswire' where sourceid = 49;

1 row updated.

SQL> rollback;

Rollback complete.

But the delete for an existing row failed:

SQL>   DELETE FROM xxx.SOURCE WHERE SOURCEID = 49;
DELETE FROM xxx.SOURCE WHERE SOURCEID = 49
*
ERROR at line 1:
ORA-00600: internal error code, arguments: [kkpo_rcinfo_defstg:objnotfound],
[1002300], [], [], [], [], [], [], [], [], [], []

After some trial and error and several not very successful attempted solutions, we ran a trace on the SQL (something to remember for next time: it is one of the first things to do, and the output is well worth the time spent reading it).

The trace file showed the following:

=====================
PARSING IN CURSOR #47580064284056 len=283 dep=2 uid=0 oct=3 lid=0 tim=1295189617113218 hv=1612870922 ad='a06c7d880' sqlid='gnkrt49h24x8a'
select pctfree_stg, pctused_stg, size_stg,initial_stg, next_stg, minext_stg, maxext_stg, maxsiz_stg, lobret_stg,mintim_stg, pctinc_stg, initra_stg, maxtra_stg, optimal_stg, maxins_stg,frlins_stg, flags_stg, bfp_stg, enc_stg, cmpflag_stg, cmplvl_stg from deferred_stg$  where obj# =:1
END OF STMT
PARSE #47580064284056:c=1000,e=602,p=0,cr=0,cu=0,mis=1,r=0,dep=2,og=4,plh=0,tim=1295189617113216
BINDS #47580064284056:
Bind#0
oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
oacflg=00 fl2=0001 frm=00 csi=00 siz=24 off=0
kxsbbbfp=2b46190f4a70  bln=22  avl=04  flg=05
value=1002300
EXEC #47580064284056:c=1000,e=1571,p=0,cr=0,cu=0,mis=1,r=0,dep=2,og=4,plh=866645418,tim=1295189617114927
FETCH #47580064284056:c=0,e=34,p=0,cr=1,cu=0,mis=0,r=0,dep=2,og=4,plh=866645418,tim=1295189617114999
STAT #47580064284056 id=1 cnt=0 pid=0 pos=1 obj=1062363 op='TABLE ACCESS BY INDEX ROWID DEFERRED_STG$ (cr=1 pr=0 pw=0 time=49 us cost=1 size=31 card=1)'
STAT #47580064284056 id=2 cnt=0 pid=1 pos=1 obj=1062364 op='INDEX UNIQUE SCAN I_DEFERRED_STG1 (cr=1 pr=0 pw=0 time=35 us cost=0 size=0 card=1)'
CLOSE #47580064284056:c=0,e=6,dep=2,type=3,tim=1295189617115210

*** 2011-01-16 14:53:37.357
DDE: Problem Key 'ORA 600 [kkpo_rcinfo_defstg:objnotfound]' was flood controlled (0x6) (incident: 812751)
ORA-00600: internal error code, arguments: [kkpo_rcinfo_defstg:objnotfound], [1002300], [], [], [], [], [], [], [], [], [], []
=====================
PARSE ERROR#47580064285064:len=92 dep=1 uid=0 oct=3 lid=0 tim=1295189617357412 err=600
select /*+ all_rows */ count(1) from "xxx"."INSTRUMENT_INSTRUMENTTAG" where "SOURCEID" = :1
EXEC #47580064671128:c=4059383,e=4121486,p=5,cr=331742,cu=129,mis=0,r=0,dep=0,og=1,plh=3685465056,tim=1295189617357720
ERROR #14:err=600 tim=1295189617357745
STAT #47580064671128 id=1 cnt=0 pid=0 pos=1 obj=0 op='DELETE  SOURCE (cr=0 pr=0 pw=0 time=17 us)'
STAT #47580064671128 id=2 cnt=1 pid=1 pos=1 obj=1063845 op='INDEX UNIQUE SCAN PK_SOURCE (cr=1 pr=0 pw=0 time=39 us cost=0 size=12 card=1)'
WAIT #47580064671128: nam='SQL*Net break/reset to client' ela= 20 driver id=1413697536 break?=1 p3=0 obj#=-1 tim=1295189617358676
WAIT #47580064671128: nam='SQL*Net break/reset to client' ela= 9860 driver id=1413697536 break?=0 p3=0 obj#=-1 tim=1295189617368572
WAIT #47580064671128: nam='SQL*Net message to client' ela= 4 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=1295189617368700

*** 2011-01-16 14:53:48.940
WAIT #47580064671128: nam='SQL*Net message from client' ela= 11571422 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=1295189628940169
CLOSE #47580064285064:c=0,e=19,dep=1,type=0,tim=1295189628940357
CLOSE #47580064671128:c=0,e=96,dep=0,type=0,tim=1295189628940417
=====================

Note.- The above is just an extract, the trace file identified a few more tables with the same problem.

The error was now occurring on a different table. This happens because SOURCE is one of our key tables, with many references to it.

After performing a select on the failing table we had the following:

select * from xxx.instrumenttag
*
ERROR at line 1:
ORA-00600: internal error code, arguments: [kkpo_rcinfo_defstg:objnotfound], [1002298], [], [], [], [], [], [], [], [], [], []

After a conversation with the business, we confirmed those tables were created on production a couple of weekends before the upgrade and were not being used.

As the tables only had referential-integrity links and were not being used, we decided to recreate them using the DDL from QA.

Once all tables (40+) and associated indexes were created successfully, the code worked:

SQL> begin
 2   xxx.DELETE_SOURCE(49);
 3  end;
 4  /

PL/SQL procedure successfully completed.

SQL>

Thoughts:

1.- This problem was a tough one. ORA-00600 errors are never good, but this was one of the hardest to troubleshoot on production, in my humble opinion, without restoring and downgrading a primary and 2 standby databases.

2.- It is my belief that the second argument of the error identifies the missing object (I cannot test it now):

ORA-00600: internal error code, arguments: [kkpo_rcinfo_defstg:objnotfound], [1002287], [], [], [], [], [], [], [], [], [], []

If my assumption is right, this would have helped while troubleshooting.

3.- Oracle has a specific statement that catupgrd.sql can be rerun (Note ID 1100828.1); I am not so sure…

catupgrd.sql is re-runnable and the bug states the errors can be ignored

4.- We did not have the same problem on QA, as those empty objects were created there after we did the upgrade. This proves how difficult testing can be.

5.- With 11.2 we have something new called “Deferred Segment Creation”. It is my belief that new empty objects without segments might have been the root cause of the error during the upgrade.

This is the reason behind this new feature (From Oracle):

“When you create a non-partitioned heap table, the table segment creation is deferred to the first row insert. This functionality is enabled by default with the initialization parameter DEFERRED_SEGMENT_CREATION  set to TRUE.”

The default value for this parameter is TRUE (you can find more details in the Oracle documentation).
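The behaviour is easy to see (a sketch; the table name is illustrative, and assumes DEFERRED_SEGMENT_CREATION is left at its default of TRUE): a newly created empty table has no segment until the first insert.

```sql
CREATE TABLE t_deferred (id NUMBER);   -- no segment allocated yet

-- SEGMENT_CREATED should show NO, and no row appears in user_segments
SELECT segment_created FROM user_tables WHERE table_name = 'T_DEFERRED';

INSERT INTO t_deferred VALUES (1);     -- first insert materialises the segment
COMMIT;

-- SEGMENT_CREATED should now show YES
SELECT segment_created FROM user_tables WHERE table_name = 'T_DEFERRED';
```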

The failing command was:

PARSING IN CURSOR #47580064284056 len=283 dep=2 uid=0 oct=3 lid=0 tim=1295189617113218 hv=1612870922 ad='a06c7d880' sqlid='gnkrt49h24x8a'
select pctfree_stg, pctused_stg, size_stg,initial_stg, next_stg, minext_stg, maxext_stg, maxsiz_stg, lobret_stg,mintim_stg, pctinc_stg, initra_stg, maxtra_stg, optimal_stg, maxins_stg,frlins_stg, flags_stg, bfp_stg, enc_stg, cmpflag_stg, cmplvl_stg from deferred_stg$  where obj# =:1
END OF STMT

After recreating the object, the same query worked:

SYS@xxx> select object_id from dba_objects where object_name = 'INSTRUMENT_INSTRUMENTTAG';

 OBJECT_ID
----------
 1063905

Elapsed: 00:00:00.00
SYS@MFE> select pctfree_stg, pctused_stg, size_stg,initial_stg, next_stg, minext_stg, maxext_stg, maxsiz_stg, lobret_stg,mintim_stg, pctinc_stg, initra_stg, maxtra_stg, optimal_stg, maxins_stg,frlins_stg, flags_stg, bfp_stg, enc_stg, cmpflag_stg, cmplvl_stg
 2  from sys.deferred_stg$
 3  where obj# =1063905;

PCTFREE_STG PCTUSED_STG   SIZE_STG INITIAL_STG   NEXT_STG MINEXT_STG MAXEXT_STG MAXSIZ_STG LOBRET_STG MINTIM_STG PCTINC_STG INITRA_STG MAXTRA_STG OPTIMAL_STG MAXINS_STG FRLINS_STG
----------- ----------- ---------- ----------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ----------- ---------- ----------
 FLAGS_STG    BFP_STG    ENC_STG CMPFLAG_STG CMPLVL_STG
---------- ---------- ---------- ----------- ----------
 10           0                                                                                                   0          1        255
 8          0                      0          0

Elapsed: 00:00:00.02
SYS@xxx>

I am not entirely convinced catupgrd.sql will perform the link between segments (tables, indexes and LOBs) and the associated entry in sys.deferred_stg$. Something to test on QA.

6.- Query to find tables without segments:

6.1.- Coskan’s version used while troubleshooting:

SELECT 'select count(*) from ' || owner || '.' || table_name || ' ;'
 FROM dba_tables
 WHERE segment_created = 'NO';

6.2.- My version to verify objects (valid for INDEX PARTITION, TABLE PARTITION, TABLE and INDEX, although the version below only checks tables):

SELECT owner, object_name, object_type
 FROM sys.deferred_stg$, dba_objects
 WHERE obj# = object_id
 AND object_type = 'TABLE';
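If you want to check the other object types mentioned above as well, the same join can be widened (a sketch using table aliases; same dictionary views as above):

```sql
SELECT o.owner, o.object_name, o.object_type
  FROM sys.deferred_stg$ d, dba_objects o
 WHERE d.obj# = o.object_id
   AND o.object_type IN ('TABLE', 'INDEX',
                         'TABLE PARTITION', 'INDEX PARTITION');
```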

As always, comments are welcome.