ORA-01555 caused by SQL statement below… on Active Dataguard after upgrade to 11.2.0.2 25 January 2011
Posted by David Alejo Marcos in Oracle 11.2, Standby, Tuning.Tags: Oracle 11.2, Standby, Tuning
comments closed
Sometimes I wonder why we upgrade our databases…
The problem:
After upgrading from 11.2.0.1 to 11.2.0.2, our standby databases (we are using Active Data Guard) were recording ORA-01555 in the alert log file after running for some time.
We actively use our standby databases to run some queries for reporting.
The solution:
We have two standby databases: one local (less than 50 km away) and one at a remote location (250 km from primary).
The first thing we did was to raise a ticket with Oracle and change one of the standby databases from “open read only” to “mounted” (I had experience with problems on active dataguard in the past).
This proved useful in several ways:
1.- We narrowed the problem down to Active Data Guard. The database in mount status was applying log files without issues and not logging any errors. The other standby started to log errors after a few hours of running (the time seems to depend on the workload the database is under).
2.- Once we stopped the database and reopened it read only, it would work for some time (not very elegant, but still useful).
3.- Oracle Support came back with a bug number and a fix.
The bug number is 10320455 and the new patch is 10018789 (to be applied on top of 11.2.0.2).
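For reference, the stop/start workaround in point 2 can be sketched in SQL*Plus. This is a sketch only; it assumes the standby is managed with SQL*Plus rather than the Data Guard broker, and that you use real-time apply:

```sql
-- Workaround sketch (point 2): bounce the standby out of READ ONLY.
-- Assumes SQL*Plus management (not the broker); adjust to your setup.
SHUTDOWN IMMEDIATE
STARTUP MOUNT
-- resume redo apply while mounted
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE USING CURRENT LOGFILE DISCONNECT;

-- later, to go back to Active Data Guard for reporting:
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE CANCEL;
ALTER DATABASE OPEN READ ONLY;
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE USING CURRENT LOGFILE DISCONNECT;
```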
The problem itself seems to be a hang (lock) in the library cache. This lock causes work to accumulate without releasing resources, and ORA-01555 is logged after some time.
As the patch only updates binaries (546 KB in size), we might apply it to DR first and, once we secure some downtime over the weekend, apply it to production. This will help us minimize the impact on the business of having the standby databases in mounted status.
As always, comments are welcome.
Saving current statistics before gathering. 24 January 2011
Posted by David Alejo Marcos in Oracle 11.2, Tuning.Tags: Oracle 11.2, Tuning
comments closed
One of the things I like the most about my job is the constant learning. No matter how long you have been working with Oracle, there is always something new to learn, even from old releases.
The problem:
As part of our upgrade to 11.2.0.2 and subsequent performance degradation, it was decided to gather statistics at schema level.
As a precaution, I wanted to take a backup of the current statistics, in case the new statistics did not perform as well as expected.
In the past, I would have used DBMS_STATS.EXPORT_xxx_STATS to create a backup, where xxx is the object (database, schema, table, index, etc.)
DBMS_STATS.EXPORT_SCHEMA_STATS (
  ownname VARCHAR2,
  stattab VARCHAR2,
  statid  VARCHAR2 DEFAULT NULL,
  statown VARCHAR2 DEFAULT NULL);
But that was in the past.
The solution:
Reading the 11g documentation, I noticed a couple of parameters I had never used before (from the Oracle doc):
1.- stattab: User statistics table identifier describing where to save the current statistics.
2.- statid: Identifier (optional) to associate with these statistics within stattab.
3.- statown: Schema containing stattab (if different from ownname).
Backing up the current statistics to a table is as simple as:
1.- Create backup table:
SYS@xxx> begin
  2  dbms_stats.create_stat_table(
  3  ownname  => 'MYTEST',
  4  stattab  => 'DAVID_STATS',
  5  tblspace => 'USERS'
  6  );
  7  end;
  8  /

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.11
SYS@xxx>
2.- Gather stats:
SYS@xxx> BEGIN
  2  DBMS_STATS.GATHER_SCHEMA_STATS (
  3  ownname       => 'TEST',
  4  degree        => 12,
  5  cascade       => true,
  6  method_opt    => 'for all columns size repeat',
  7  stattab       => 'DAVID_STATS',
  8  statid        => 'JANUARY_110122',
  9  statown       => 'MYTEST',
 10  no_invalidate => false);
 11  END;
 12  /

PL/SQL procedure successfully completed.

Elapsed: 00:44:09.44
SYS@xxx>
If you decide to roll back the statistics using your backup table as the source, you will need to execute:
begin
  dbms_stats.import_schema_stats(
    ownname => 'TEST',
    stattab => 'DAVID_STATS',
    statown => 'MYTEST',
    statid  => 'JANUARY_110122');
end;
/
These parameters have been there for some time now; I had just never used them, as I always did a dbms_stats.export_xxx_stats followed by dbms_stats.import_xxx_stats. This shows me, yet again, that there are always different ways to achieve the same goal.
Note.- We are all aware that whenever statistics in the dictionary are modified, the old statistics are automatically saved for future restoring. But backing up statistics to your own table helps you keep track of them, and they will not be automatically purged unless you decide to delete them.
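To see which statistic sets are stored in the backup table, note that a stat table created by dbms_stats.create_stat_table exposes STATID and TYPE columns. A sketch, using the table and schema names from the examples above:

```sql
-- List the statistic sets kept in our backup table (names from this post).
SELECT statid, type, COUNT(*) AS rows_saved
FROM   mytest.david_stats
GROUP  BY statid, type
ORDER  BY statid, type;
```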
As always, comments are welcome.
Removing Fatal NI connect error xxxx from your alert.log 18 January 2011
Posted by David Alejo Marcos in Oracle 11.2.Tags: Oracle 11.2
comments closed
We have a script to monitor the alert.log file for any ORA-xxxx errors.
With 11g we found many entries along the lines of:
Fatal NI connect error
The problem:
alert.log file has many entries for Fatal NI connect error.
Examples:
***********************************************************************
Fatal NI connect error 12514, connecting to:
 (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=xxxx)(PORT=nnn))(CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=sss)(CID=(PROGRAM=oracle)(HOST=xxx)(USER=oracle))))

  VERSION INFORMATION:
        TNS for Linux: Version 11.2.0.2.0 - Production
        TCP/IP NT Protocol Adapter for Linux: Version 11.2.0.2.0 - Production
  Time: 17-JAN-2011 10:20:45
  Tracing not turned on.
  Tns error struct:
    ns main err code: 12564
    TNS-12564: TNS:connection refused
    ns secondary err code: 0
    nt main err code: 0
    nt secondary err code: 0
    nt OS err code: 0
Error 12514 received logging on to the standby
Mon Jan 17 10:21:45 2011
***********************************************************************
or
***********************************************************************
Fatal NI connect error 12170.

  VERSION INFORMATION:
        TNS for Linux: Version 11.2.0.2.0 - Production
        Oracle Bequeath NT Protocol Adapter for Linux: Version 11.2.0.2.0 - Production
        TCP/IP NT Protocol Adapter for Linux: Version 11.2.0.2.0 - Production
  Time: 18-JAN-2011 15:13:08
  Tracing not turned on.
  Tns error struct:
    ns main err code: 12535
    TNS-12535: TNS:operation timed out
    ns secondary err code: 12560
    nt main err code: 505
    TNS-00505: Operation timed out
    nt secondary err code: 110
    nt OS err code: 0
  Client address: (ADDRESS=(PROTOCOL=tcp)(HOST=xx.xx.x.xx)(PORT=nnnn))
Tue Jan 18 15:40:20 2011
The solution:
An entry is written to the alert.log file every time a client connection of any kind fails to complete or is aborted.
One way to minimize the impact is the parameter SQLNET.INBOUND_CONNECT_TIMEOUT (which defaults to 60 seconds on 10gR2 and 11g), but sometimes this value is not adequate.
Oracle also mentions that this error can occur if you use DB Console or Enterprise Manager to monitor your databases, as the emagent will repeatedly try to connect to the target database and, statistically, some attempts will fail (the frequency depends on how busy your system is).
Most of the time (certainly for DB Console and Enterprise Manager Agent) the application will try to connect again and it will succeed.
To fix the problem you can increase the value of SQLNET.INBOUND_CONNECT_TIMEOUT in the listener.ora and sqlnet.ora files on the server side.
If the value you already have is adequate, you might instead want to add the following line to your listener.ora file:
DIAG_ADR_ENABLED_listener_name=OFF
This line hides the error from the alert.log file; the error will be posted to the $ORACLE_HOME/network/log/sqlnet.log file instead.
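Putting both settings together, the server-side files might look like the sketch below. The 120-second timeout is an illustrative value, not a recommendation, and LISTENER is an assumed listener name:

```
# listener.ora (server side) -- LISTENER is the assumed listener name
INBOUND_CONNECT_TIMEOUT_LISTENER = 120
DIAG_ADR_ENABLED_LISTENER = OFF

# sqlnet.ora (server side)
SQLNET.INBOUND_CONNECT_TIMEOUT = 120
```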
As always, comments are welcome.
ORA-00600: internal error code, arguments: [kkpo_rcinfo_defstg:objnotfound], [xxx] after upgrade to 11.2.0.2 18 January 2011
Posted by David Alejo Marcos in Oracle 11.2, SQL - PL/SQL.Tags: Oracle 11.2, SQL - PL/SQL
comments closed
First of all, I would like to say thanks to Coskan (Josh) Gundogar for his invaluable assistance with resolving this problem. You can find more about Coskan here.
On Saturday January 15th we had to upgrade one of our busiest and biggest databases to 11.2.0.2 to fix several critical bugs on 11.2.0.1 that affected our systems.
After upgrading the database and initial test, sign off was given for the upgrade.
On Sunday January 16th (only one day after), I had several emails and a couple of phone calls asking me to check the database, as we were getting ORA-00600 errors.
Please be aware that an ORA-00600 should always be raised with Oracle. The steps below worked for us, but you will need to confirm with Oracle, your business and your developers.
The problem:
The command generating the error was a very simple one:
ERROR at line 1:
ORA-00600: internal error code, arguments: [kkpo_rcinfo_defstg:objnotfound], [1002287], [], [], [], [], [], [], [], [], [], []
ORA-06512: at "xxx.DELETE_SOURCE", line 54
ORA-06512: at line 1
The command at line 54 is:
DELETE FROM xxx.SOURCE WHERE SOURCEID = v_source;
I could find very little information on Oracle Support regarding this error.
The solution:
The first thing we did was to raise a Sev. 1 with Oracle. Their suggestion was to restore the database from backup, meaning we would have had to downgrade the primary database plus two standby databases.
After successfully reproducing the error I tried a different SQL, to find out if it was something wrong with the table.
I did try to execute select, insert and update, and all three worked fine:
SQL> select * FROM xxx.SOURCE WHERE SOURCEID = 49;

  SOURCEID SOURCENAME  ABBREV
---------- ----------- ----------------
        49 xxxxx       yyyyy

SQL> select count(*) from xxx.source where sourceid = 90000;

  COUNT(*)
----------
         0

SQL> delete from xxx.source where sourceid = 90000;

0 rows deleted.

SQL> rollback;

SQL> update xxx.source set sourcename = 'pr newswire' where sourceid = 49;

1 row updated.

SQL> rollback;

Rollback complete.
But the delete for an existing row failed:
SQL> DELETE FROM xxx.SOURCE WHERE SOURCEID = 49;
DELETE FROM xxx.SOURCE WHERE SOURCEID = 49
*
ERROR at line 1:
ORA-00600: internal error code, arguments: [kkpo_rcinfo_defstg:objnotfound], [1002300], [], [], [], [], [], [], [], [], [], []
After some trial and error and several unsuccessful attempts, we ran a trace on the SQL (something to remember for next time: tracing is one of the first things to do, and the output is well worth the time spent reading it).
The trace file showed the following:
=====================
PARSING IN CURSOR #47580064284056 len=283 dep=2 uid=0 oct=3 lid=0 tim=1295189617113218 hv=1612870922 ad='a06c7d880' sqlid='gnkrt49h24x8a'
select pctfree_stg, pctused_stg, size_stg, initial_stg, next_stg, minext_stg, maxext_stg, maxsiz_stg, lobret_stg, mintim_stg, pctinc_stg, initra_stg, maxtra_stg, optimal_stg, maxins_stg, frlins_stg, flags_stg, bfp_stg, enc_stg, cmpflag_stg, cmplvl_stg from deferred_stg$ where obj# =:1
END OF STMT
PARSE #47580064284056:c=1000,e=602,p=0,cr=0,cu=0,mis=1,r=0,dep=2,og=4,plh=0,tim=1295189617113216
BINDS #47580064284056:
 Bind#0
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=00 fl2=0001 frm=00 csi=00 siz=24 off=0
  kxsbbbfp=2b46190f4a70  bln=22  avl=04  flg=05
  value=1002300
EXEC #47580064284056:c=1000,e=1571,p=0,cr=0,cu=0,mis=1,r=0,dep=2,og=4,plh=866645418,tim=1295189617114927
FETCH #47580064284056:c=0,e=34,p=0,cr=1,cu=0,mis=0,r=0,dep=2,og=4,plh=866645418,tim=1295189617114999
STAT #47580064284056 id=1 cnt=0 pid=0 pos=1 obj=1062363 op='TABLE ACCESS BY INDEX ROWID DEFERRED_STG$ (cr=1 pr=0 pw=0 time=49 us cost=1 size=31 card=1)'
STAT #47580064284056 id=2 cnt=0 pid=1 pos=1 obj=1062364 op='INDEX UNIQUE SCAN I_DEFERRED_STG1 (cr=1 pr=0 pw=0 time=35 us cost=0 size=0 card=1)'
CLOSE #47580064284056:c=0,e=6,dep=2,type=3,tim=1295189617115210

*** 2011-01-16 14:53:37.357
DDE: Problem Key 'ORA 600 [kkpo_rcinfo_defstg:objnotfound]' was flood controlled (0x6) (incident: 812751)
ORA-00600: internal error code, arguments: [kkpo_rcinfo_defstg:objnotfound], [1002300], [], [], [], [], [], [], [], [], [], []
=====================
PARSE ERROR #47580064285064:len=92 dep=1 uid=0 oct=3 lid=0 tim=1295189617357412 err=600
select /*+ all_rows */ count(1) from "xxx"."INSTRUMENT_INSTRUMENTTAG" where "SOURCEID" = :1
EXEC #47580064671128:c=4059383,e=4121486,p=5,cr=331742,cu=129,mis=0,r=0,dep=0,og=1,plh=3685465056,tim=1295189617357720
ERROR #14:err=600 tim=1295189617357745
STAT #47580064671128 id=1 cnt=0 pid=0 pos=1 obj=0 op='DELETE SOURCE (cr=0 pr=0 pw=0 time=17 us)'
STAT #47580064671128 id=2 cnt=1 pid=1 pos=1 obj=1063845 op='INDEX UNIQUE SCAN PK_SOURCE (cr=1 pr=0 pw=0 time=39 us cost=0 size=12 card=1)'
WAIT #47580064671128: nam='SQL*Net break/reset to client' ela= 20 driver id=1413697536 break?=1 p3=0 obj#=-1 tim=1295189617358676
WAIT #47580064671128: nam='SQL*Net break/reset to client' ela= 9860 driver id=1413697536 break?=0 p3=0 obj#=-1 tim=1295189617368572
WAIT #47580064671128: nam='SQL*Net message to client' ela= 4 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=1295189617368700

*** 2011-01-16 14:53:48.940
WAIT #47580064671128: nam='SQL*Net message from client' ela= 11571422 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=1295189628940169
CLOSE #47580064285064:c=0,e=19,dep=1,type=0,tim=1295189628940357
CLOSE #47580064671128:c=0,e=96,dep=0,type=0,tim=1295189628940417
=====================
Note.- The above is just an extract, the trace file identified a few more tables with the same problem.
The error was occurring on a different table. This was happening as SOURCE is one of our key tables with many references to it.
After performing a select on the failing table we had the following:
select * from xxx.instrumenttag
*
ERROR at line 1:
ORA-00600: internal error code, arguments: [kkpo_rcinfo_defstg:objnotfound], [1002298], [], [], [], [], [], [], [], [], [], []
After a conversation with the business, we confirmed those tables had been created on production a couple of weekends before the upgrade and were not being used.
As the tables only had referential integrity constraints and were not being used, we decided to recreate them using the DDL from QA.
Once all tables (40+) and associated indexes were created successfully, the code worked:
SQL> begin
  2  xxx.DELETE_SOURCE(49);
  3  end;
  4  /

PL/SQL procedure successfully completed.

SQL>
Thoughts:
1.- This problem was a tough one. ORA-00600 errors are never good, but this was, in my humble opinion, one of the hardest to troubleshoot on production without restoring and downgrading a primary and two standby databases.
2.- It is my belief that the second argument of the error identifies the missing object (I cannot test it now):
ORA-00600: internal error code, arguments: [kkpo_rcinfo_defstg:objnotfound], [1002287], [], [], [], [], [], [], [], [], [], []
If my assumption is right, this would have helped while troubleshooting.
3.- Oracle has a specific statement that catupgrd.sql can be rerun (Note ID 1100828.1); I am not so sure…
catupgrd.sql is re-runnable and the bug states the errors can be ignored
4.- We did not have the same problem on QA because those empty objects were created there after we did the upgrade. This proves how difficult testing can be.
5.- With 11.2 we have something new called “Deferred Segment Creation”. It is my belief that new empty objects without segments might have been the root cause of the error during the upgrade.
This is the reason behind this new feature (From Oracle):
“When you create a non-partitioned heap table, the table segment creation is deferred to the first row insert. This functionality is enabled by default with the initialization parameter DEFERRED_SEGMENT_CREATION set to TRUE.”
The default value for this parameter is TRUE (find more here).
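A quick way to see deferred segment creation in action, on a scratch schema. This is a sketch; DEMO_DEFERRED is a made-up table name:

```sql
-- Sketch: with DEFERRED_SEGMENT_CREATION=TRUE, no segment exists
-- until the first row is inserted.
CREATE TABLE demo_deferred (id NUMBER);

SELECT segment_created FROM user_tables
WHERE  table_name = 'DEMO_DEFERRED';      -- expect 'NO'

INSERT INTO demo_deferred VALUES (1);
COMMIT;

SELECT segment_created FROM user_tables
WHERE  table_name = 'DEMO_DEFERRED';      -- expect 'YES'
```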
The failing command was:
PARSING IN CURSOR #47580064284056 len=283 dep=2 uid=0 oct=3 lid=0 tim=1295189617113218 hv=1612870922 ad='a06c7d880' sqlid='gnkrt49h24x8a'
select pctfree_stg, pctused_stg, size_stg, initial_stg, next_stg, minext_stg, maxext_stg, maxsiz_stg, lobret_stg, mintim_stg, pctinc_stg, initra_stg, maxtra_stg, optimal_stg, maxins_stg, frlins_stg, flags_stg, bfp_stg, enc_stg, cmpflag_stg, cmplvl_stg from deferred_stg$ where obj# =:1
END OF STMT
After recreating the object, the same query worked:
SYS@xxx> select object_id from dba_objects where object_name = 'INSTRUMENT_INSTRUMENTTAG';

 OBJECT_ID
----------
   1063905

Elapsed: 00:00:00.00

SYS@MFE> select pctfree_stg, pctused_stg, size_stg, initial_stg, next_stg, minext_stg, maxext_stg, maxsiz_stg, lobret_stg, mintim_stg, pctinc_stg, initra_stg, maxtra_stg, optimal_stg, maxins_stg, frlins_stg, flags_stg, bfp_stg, enc_stg, cmpflag_stg, cmplvl_stg
  2  from sys.deferred_stg$
  3  where obj# =1063905;

PCTFREE_STG PCTUSED_STG   SIZE_STG INITIAL_STG   NEXT_STG MINEXT_STG MAXEXT_STG MAXSIZ_STG LOBRET_STG MINTIM_STG PCTINC_STG INITRA_STG MAXTRA_STG OPTIMAL_STG MAXINS_STG FRLINS_STG
----------- ----------- ---------- ----------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ----------- ---------- ----------
 FLAGS_STG    BFP_STG    ENC_STG CMPFLAG_STG CMPLVL_STG
---------- ---------- ---------- ----------- ----------
         10                               0           1        255          8
                    0          0           0

Elapsed: 00:00:00.02
SYS@xxx>
I am not entirely convinced catupgrd.sql will create the link between segments (tables, indexes and LOBs) and the associated entries in sys.deferred_stg$. Something to test on QA.
6.- Query to find tables without segments:
6.1.- Coskan’s version used while troubleshooting:
SELECT 'select count(*) from ' || owner || '.' || table_name || ' ;'
FROM   dba_tables
WHERE  segment_created = 'NO';
6.2.- My version to verify objects (valid for INDEX PARTITION, TABLE PARTITION, TABLE and INDEX, although the version below only checks tables):
SELECT owner, object_name, object_type
FROM   sys.deferred_stg$, dba_objects
WHERE  obj# = object_id
AND    object_type = 'TABLE';
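An alternative to recreating the tables might have been to force segment creation on the affected objects, since ALTER TABLE ... ALLOCATE EXTENT materializes a deferred segment. We did not try this during the incident, so treat it as an untested sketch only:

```sql
-- Untested sketch: force segment creation instead of recreating the table
-- (INSTRUMENT_INSTRUMENTTAG was one of the affected tables in this post).
ALTER TABLE xxx.instrument_instrumenttag ALLOCATE EXTENT;
```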
As always, comments are welcome.