jump to navigation

ORA-16765 + ORA-16826: apply service state is inconsistent with the DelayMins property with Dataguard Broker 28 October 2010

Posted by David Alejo Marcos in Dataguard Broker, Oracle 11.2, Standby.
Tags: , ,
3 comments

Last Saturday I had to start-up our DR Database as the server went down 3 days before due to maintenance.

It should have been an easy task as the DR database was lagging “only” 400 files behind and all archivelogs were on the primary database.

The problem:

I had the following errors on the Dataguard Broker:

ORA-16765: Redo Apply is running

and

ORA-16826: apply service state is inconsistent with the DelayMins property

I first thought it was related to how much time we allow the standby database to be behind the primary, but it is not the case.

The solution:

I did copy all necessary archivelogs from the primary server to the standby server using SCP.

Once I had the necessary archivelogs, I registered using the following script:


spool /tmp/register.sql
begin
for i in 38010 .. 38060 loop
dbms_output.put_line ('alter database register logfile ''/u01/oradata/dbname/arch/1_'||i||'_697561259.arc'';');
end loop;
end;
/
@/tmp/register.sql

where:
1.- 38010 is the first archivelog file the standby was waiting for.
2.- 38060 is the last archivelog being generated by the primary database.

Once archivelogs were registered, the database started to apply them.

With only two logfiles behind, I decided to enable dataguard broker but the same error happened.

The only way I could get around was:

1.- To remove the standby database from the configuration:
DGMGRL> remove database 'dr_database';
Removed database "dr_database" from the configuration

2.- Add the database back to the configuration:
DGMGRL> add database 'dr_database' as connect identifier is 'dr_database';
Database "dr_database" added

3.- And then enable it again:
DGMGRL> enable database 'dr_database';
Enabled.

And voila, it started to work again.

As always, comments are welcome.

How to copy SQL Profile from one database to another one. 21 October 2010

Posted by David Alejo Marcos in Oracle 11.2, Tuning.
Tags: ,
add a comment

This is a very straight forward procedure.

The problem:

I need to copy SQL Profiles from one database to another database for stability and testing.

The solution:

6 simple steps:

1.- Create staging table to store the SQL Profiles to be copied on Source database:

MYUSER@MYDB> EXEC DBMS_SQLTUNE.CREATE_STGTAB_SQLPROF (table_name => 'PROFILE_STGTAB');

2.- Copy SQL Profiles from SYS to the staging table:

MYUSER@MYDB> EXEC DBMS_SQLTUNE.PACK_STGTAB_SQLPROF (profile_category => '%', staging_table_name => 'PROFILE_STGTAB');

As I needed to copy all SQL Profiles on my database ‘%’ value for profile_category was the best option.

3.- Export staging table.

4.- Create staging table on Destination Database:

MYUSER@MYDB> EXEC DBMS_SQLTUNE.CREATE_STGTAB_SQLPROF (table_name => 'PROFILE_STGTAB');

5.- Import data on Destination database.

6.- Create SQL Profiles on Destination database using data stored on staging table:

MYUSER@MYDB> EXEC DBMS_SQLTUNE.UNPACK_STGTAB_SQLPROF(replace => TRUE, staging_table_name => 'PROFILE_STGTAB');

I used replace = TRUE as I needed to have the same SQL Profiles on both databases.

Note.- Please, refer to http://download.oracle.com/docs/cd/B19306_01/appdev.102/b14258/d_sqltun.htm#CACFCAEC for a full list of parameters and options.

As always, comments are welcome.

How to retrieve the hash value for a password in Oracle 11g 20 October 2010

Posted by David Alejo Marcos in Oracle 11.2, SQL - PL/SQL.
Tags:
8 comments

As we all know, Oracle has decided to do not show the hash value for the password on the dba_users view.

The problem:

Saving your passwords to a file as below does not work anymore:

SELECT 'ALTER USER ' || username || ' identified by values '''|| password || ''';' FROM dba_users;

This is necessary, in our case, for releases, DEV/QA refreshes, etc…

The solution:

I first wrote the following query:

MYUSER@mydb> SELECT substr (trim(DBMS_METADATA.get_ddl ('USER','MYUSER')),
2 REGEXP_INSTR(trim(DBMS_METADATA.get_ddl ('USER','MYUSER')),'''',1,1,0,'m'), 20
3 ) PASSWD
4 FROM DUAL;
PASSWD
--------------------------------------------------------------------------------
'2YYYYY4DXXXXXX'

Note.- The hash value has been modified for security reasons.

I was working under the assumption Oracle will not change the algorithm any time soon (currently the length is 16 characters); but I also found “dangerous” to make this assumption.

So my next query is:

select REGEXP_SUBSTR(DBMS_METADATA.get_ddl ('USER','MYUSER'), '''[^'']+''') PASSWD from dual;

which I find more elegant.

MYUSER@MYDB> select REGEXP_SUBSTR(DBMS_METADATA.get_ddl ('USER','MYUSER'), '''[^'']+''') PASSWD from dual;
PASSWD
--------------------------------------------------------------------------------
'XXXXX4YYYYYZZZZZZ'

The code we need to use from now on is:

set long 150
set linesize 150
set longchunksize 150
select 'alter user '||username||' identified by values '||REGEXP_SUBSTR(DBMS_METADATA.get_ddl ('USER',USERNAME), '''[^'']+''')||';' from dba_users

As always, comments are welcome.

Enterprise Manager Grid Control does not send emails 20 October 2010

Posted by David Alejo Marcos in Enterprise Manager, Grid Control, Oracle 11.2.
Tags: , ,
4 comments

Have you ever wondered why EM Grid Control is not sending you email notifications for your alerts?

The problem:

I have been working with EM Grid Control for some time. We have some rules and templates created to monitor our databases and we normally receive emails when a Warning or Critical threshold have been crossed.

We did upgrade our OEM repository to 11g 1 month ago and I noticed we stopped receiving emails. I did review the alerts, tested emails and all was working fine, we just did not receive emails from our alerts.

The solution:

Quite simple, but it took me some time to find it.

As part of the upgrade, the parameter JOB_QUEUE_PROCESSES was set to 0.

SQL> show parameter job_queue_processes
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
job_queue_processes integer 0

as soon as I changed to a different number (please, bear in mind 10 is the minimum recommended by Oracle)

SQL> alter system set job_queue_processes=15 scope=memory;

I started to receive emails.

Notes:
1.- I have not tested SNMP Traps to feed our Nagios as it is still in development, but I will assume the fix is the same.

2.- Remember to change the parameter on your spfile.ora or init.ora file.

3.- As soon as you change the parameter you will start receiving all alerts (including old), I did receive over 1300 emails in 20 minutes, so you might consider doing the change on Friday by EOD and clear your inbox on Monday.

As always, comments are welcome.

appagent.bin + unable to fork new process 19 October 2010

Posted by David Alejo Marcos in Oracle 11.2, RAC.
Tags: ,
4 comments

Yesterday (18/10/2010)  we had a problem on our QA cluster.

After refreshing the database from production, several of our DBAs noticed they could not connect to the first node from time to time.

At this point, I did have a look.

The problem:

We had intermittent disconnections from node a (we have a 3-node cluster). I was told the error being raised was “unable to fork new process”.

I did have a look on Metalink, but I could not find any entry related to appagent.bin.

The solution:

We did have the same problem some time ago and we fixing by stopping the server. This type of fix is not our prefer option as we have 3 databases running on the same clusterware.

To identify the process with more threads we used the following command (Thanks to Coskan (Josh) Gundogar):

[oracle@ssssa trace]$ ps -eLf | grep -i appagent | wc -l
30793

we compared this number with node 2:

[oracle@ssssb trace]$  ps -eLf | grep appagent | wc -l
24

and it proved something was not right with the process.

Something interesting about this process on the first node was the date:

[oracle@ssssa trace]$ ps -eLf | grep appagent
oracle   14768     1  9159  0 30792 Oct15 ?       00:00:00 /u01/crs/oracle/product/11.2.0/grid/bin/appagent.bin
oracle   14768     1  9223  0 30792 Oct15 ?       00:00:00 /u01/crs/oracle/product/11.2.0/grid/bin/appagent.bin
oracle   14768     1  9317  0 30792 Oct15 ?       00:00:00 /u01/crs/oracle/product/11.2.0/grid/bin/appagent.bin

so all of them (30793) were 3 days old.

I found some errors on the alert.log file regarding this process:

[oracle@ssssssa]$ cd /u01/crs/oracle/product/11.2.0/grid/log/sssssssa
[oracle@sssssssa]$ tail -100f alertssssssssa.log

[client(17908)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /u01/crs/oracle/product/11.2.0/grid/log/ssssssa/client/css.log.
2010-10-18 14:16:25.223
[/u01/crs/oracle/product/11.2.0/grid/bin/oraagent.bin(9413)]CRS-5013:Agent "/u01/crs/oracle/product/11.2.0/grid/bin/oraagent.bin" failed to start process "/u01/crs/oracle/product/11.2.0/grid/bin/lsnrctl" for action "check": details at "(:CLSN00008:)" in "/u01/crs/oracle/product/11.2.0/grid/log/ssssssa/agent/crsd/oraagent_oracle/oraagent_oracle.log"
2010-10-18 14:16:37.512
[client(10936)]CRS-2313:Fatal error: failed to initialize GIPC communication layer. GIPC error: gipcretInitialization (2).
2010-10-18 14:16:37.512
[client(10936)]CRS-2316:Fatal error: cannot initialize GPnP, CLSGPNP_ERR (Generic GPnP error).
2010-10-18 14:16:37.539
[client(10936)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /u01/crs/oracle/product/11.2.0/grid/log/ssssssa/client/css.log.
2010-10-18 14:16:38.920
[/u01/crs/oracle/product/11.2.0/grid/bin/oraagent.bin(9413)]CRS-5013:Agent "/u01/crs/oracle/product/11.2.0/grid/bin/oraagent.bin" failed to start process "/u01/crs/oracle/product/11.2.0/grid/opmn/bin/onsctli" for action "check": details at "(:CLSN00008:)" in "/u01/crs/oracle/product/11.2.0/grid/log/ssssssa/agent/crsd/oraagent_oracle/oraagent_oracle.log"
2010-10-18 14:16:53.100
[/u01/crs/oracle/product/11.2.0/grid/bin/oraagent.bin(9413)]CRS-5013:Agent "/u01/crs/oracle/product/11.2.0/grid/bin/oraagent.bin" failed to start process "/u01/crs/oracle/product/11.2.0/grid/bin/lsnrctl" for action "check": details at "(:CLSN00008:)" in "/u01/crs/oracle/product/11.2.0/grid/log/ssssssa/agent/crsd/oraagent_oracle/oraagent_oracle.log"
2010-10-18 14:17:25.227
[/u01/crs/oracle/product/11.2.0/grid/bin/oraagent.bin(9413)]CRS-5013:Agent "/u01/crs/oracle/product/11.2.0/grid/bin/oraagent.bin" failed to start process "/u01/crs/oracle/product/11.2.0/grid/bin/lsnrctl" for action "check": details at "(:CLSN00008:)" in "/u01/crs/oracle/product/11.2.0/grid/log/sssssa/agent/crsd/oraagent_oracle/oraagent_oracle.log"
2010-10-18 14:17:38.921
[/u01/crs/oracle/product/11.2.0/grid/bin/oraagent.bin(9413)]CRS-5013:Agent "/u01/crs/oracle/product/11.2.0/grid/bin/oraagent.bin" failed to start process "/u01/crs/oracle/product/11.2.0/grid/opmn/bin/onsctli" for action "check": details at "(:CLSN00008:)" in "/u01/crs/oracle/product/11.2.0/grid/log/sssssa/agent/crsd/oraagent_oracle/oraagent_oracle.log"

and I was told one of the filesystem filled on Friday October 15th.

As we had an agreement with all 3 businesses to have a downtime I decided to kill the process:

[oracle@ssssssa trace]$ kill -9 14768

and it fixed the problem:

[oracle@ssssa trace]$ ps -eLf | grep appagent | wc -l
14

[oracle@ssssssa trace]$ ps -eLf | grep appagent
oracle   18655     1 18655  0   13 15:17 ?        00:00:00 /u01/crs/oracle/product/11.2.0/grid/bin/appagent.bin
oracle   18655     1 18657  0   13 15:17 ?        00:00:00 /u01/crs/oracle/product/11.2.0/grid/bin/appagent.bin
oracle   18655     1 18658  0   13 15:17 ?        00:00:00 /u01/crs/oracle/product/11.2.0/grid/bin/appagent.bin
oracle   18655     1 18659  0   13 15:17 ?        00:00:00 /u01/crs/oracle/product/11.2.0/grid/bin/appagent.bin
oracle   18655     1 18660  0   13 15:17 ?        00:00:00 /u01/crs/oracle/product/11.2.0/grid/bin/appagent.bin
oracle   18655     1 18661  0   13 15:17 ?        00:00:00 /u01/crs/oracle/product/11.2.0/grid/bin/appagent.bin
oracle   18655     1 18662  0   13 15:17 ?        00:00:00 /u01/crs/oracle/product/11.2.0/grid/bin/appagent.bin
oracle   18655     1 18663  0   13 15:17 ?        00:00:00 /u01/crs/oracle/product/11.2.0/grid/bin/appagent.bin
oracle   18655     1 18664  0   13 15:17 ?        00:00:00 /u01/crs/oracle/product/11.2.0/grid/bin/appagent.bin
oracle   18655     1 18665  0   13 15:17 ?        00:00:00 /u01/crs/oracle/product/11.2.0/grid/bin/appagent.bin
oracle   18655     1 18670  0   13 15:17 ?        00:00:00 /u01/crs/oracle/product/11.2.0/grid/bin/appagent.bin
oracle   18655     1 18673  0   13 15:17 ?        00:00:00 /u01/crs/oracle/product/11.2.0/grid/bin/appagent.bin
oracle   18655     1 18694  0   13 15:17 ?        00:00:00 /u01/crs/oracle/product/11.2.0/grid/bin/appagent.bin
oracle   19230 22249 19230  0    1 15:18 pts/4    00:00:00 grep appagent

Conclusions:

I am not sure if the problem with filesystems getting full is related to this issue or “The OCR location in an ASM disk group is inaccessible” logged on the alert.log.

For the above, we have an SR with Oracle.

As always, comments are welcome.

ORA-01187: cannot read from file because it failed verification tests + ORA-01110 on tempfiles 5 October 2010

Posted by David Alejo Marcos in Oracle 11.2, Standby.
Tags: ,
3 comments

Today I had a call from a Java developer asking me to help him with an error from our applications. The error being reported was:

java.sql.SQLException: ORA-01187: cannot read from file because it failed verification tests ORA-01110: data file 206: '/u03/oradata/xxx/tempapp22.dbf'

so I did have a look.

The problem:

ORA-01187 + ORA-01110 on the application logs. This part of the application runs on an Active Standby database.

This database has two temporary tablespaces called TEMPAPP and TEMPUSER. We assign TEMPAPP to application users and TEMPUSER to physical users.

The solution:

The first thing I did was to check the existence of the tempfile on the filesystem:

[oracle@sssss ~]$ ls -lrt /u03/oradata/xxx/tempapp22.dbf
-rw-r--r-- 1 oracle oinstall 26843553792 Oct 2 11:26 /u03/oradata/xxx/tempapp22.dbf

ok, this is a step forward as the file exists on the server. What did not look ok was the date of the file (October 2nd) and today is October 5th… maybe there is not much activity on the database?

Then I did have a look on the database:

SQL> select file_name from dba_temp_files;
ERROR:
ORA-01187: cannot read from file because it failed verification tests
ORA-01110: data file 205: '/u03/oradata/xxx/tempapp11.dbf'

This did not look good, specially because the tempfile being returned was a different one from the application. After some trial and error I decided the following strategy:

1.- Drop the tempfile from the database:
SQL> alter database tempfile '/u03/oradata/xxx/tempapp11.dbf' drop;

2.- Create a new tempfile for the temporary tablespace:
alter tablespace TEMPAPP1 add tempfile ‘/u03/oradata/xxx/tempapp11.dbf’ SIZE 25600M REUSE;

I did have to drop all tempfiles (from TEMPAPP and TEMPUSER) as all of them were giving problems when selecting from dba_temp_files

Once I did drop all tempfiles and created them again, the query did return the expected rows:

SQL> select file_name from dbA_temp_files;
FILE_NAME
--------------------------------------------------------------------------------
/u03/oradata/xxx/tempapp12.dbf
/u03/oradata/xxx/tempapp21.dbf
/u03/oradata/xxx/tempapp11.dbf
/u03/oradata/xxx/tempapp22.dbf
/u03/oradata/xxx/tempusers11.dbf
/u03/oradata/xxx/tempusers21.dbf

As always, comments are welcome.