RMAN backup fails (ORA-27211: Failed to load Media Management Library) 16 July 2010

Posted by David Alejo Marcos in Oracle 11.2, RMAN.

Some of our systems are currently backed up with RMAN through NetBackup.

We have been experiencing some problems on one of our databases. The interesting thing is that the backup from the standby database was working fine, while the same backup from production was failing.

The problem:

RMAN> crosscheck backup;
released channel: ORA_DISK_1
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of crosscheck command at 07/15/2010 11:41:52
ORA-19554: error allocating device, device type: SBT_TAPE, device name:
ORA-27211: Failed to load Media Management Library
Additional information: 2

The solution:

I spoke with our SAs and I was told we were running the right version of NetBackup. After some investigation, I decided to check the libraries myself, and this is what I found:

[oracle@standby dbhome_1]$ ls -lrt ./lib/libobk.so
lrwxrwxrwx 1 oracle oinstall 36 Jul 14 11:09 ./lib/libobk.so -> /usr/openv/netbackup/bin/libobk.so64
[oracle@primary dbhome_1]$ ls -lrt ./lib/libobk.so
lrwxrwxrwx 1 oracle oinstall 34 Jul 14 14:14 ./lib/libobk.so -> /usr/openv/netbackup/bin/libobk.so

On our primary database, the link in the ORACLE_HOME pointed to the 32-bit library, while on the standby it pointed to the 64-bit one. As our platform is 64-bit, I asked the SA to relink the library.
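The symlink name hints at the word size, but the ELF header of the target is the more reliable check. The following is a minimal sketch, not part of the original investigation: elf_class is a hypothetical helper that reads the first five bytes of the file behind the link and reports whether it is a 32-bit or 64-bit ELF object.

```shell
# Hypothetical helper: print 32, 64 or unknown for the file behind a path.
# Symlinks are followed because od opens the link target.
elf_class() {
    # The first four bytes of an ELF file are 7f 45 4c 46 ("\177ELF");
    # the fifth byte (EI_CLASS) is 01 for 32-bit and 02 for 64-bit.
    header=$(od -An -tx1 -N5 "$1" 2>/dev/null | tr -d ' \n')
    case "$header" in
        7f454c4601) echo 32 ;;
        7f454c4602) echo 64 ;;
        *)          echo unknown ;;
    esac
}

# Usage, with the path from the listing above:
#   elf_class "$ORACLE_HOME/lib/libobk.so"
```

If this reports 32 on a 64-bit ORACLE_HOME, the link can be repointed, for example with ln -sf /usr/openv/netbackup/bin/libobk.so64 "$ORACLE_HOME/lib/libobk.so", assuming the 64-bit library lives where the standby listing shows it.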

After this was done, I proceeded to run another test. The result was a different error:

channel c01: starting piece 1 at 15-07-2010 10:04:37
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03009: failure of backup command on c01 channel at 07/15/2010 12:05:42
RMAN-10038: database session for channel c01 terminated unexpectedly

I checked the alert.log on the primary database and this is what I found:

Thu Jul 15 10:31:59 2010
SERVER COMPONENT id=UTLRP_BGN: timestamp=2010-07-15 10:31:59
SERVER COMPONENT id=UTLRP_END: timestamp=2010-07-15 10:32:01
Thu Jul 15 10:32:32 2010
Exception [type: SIGSEGV, Address not mapped to object] [ADDR:0xF] [PC:0x3C2B078820, strcpy()+16] [flags: 0x0, count: 1]
Errors in file /opt/oracle/diag/rdbms/ssssss/yyyyyy/trace/yyyyy_ora_25177.trc  (incident=39130):
ORA-07445: exception encountered: core dump [strcpy()+16] [SIGSEGV] [ADDR:0xF] [PC:0x3C2B078820] [Address not mapped to object] []
Incident details in: /opt/oracle/diag/rdbms/ssssss/yyyyyy/incident/incdir_39130/yyyy_ora_25177_i39130.trc
Thu Jul 15 10:32:32 2010
Trace dumping is performing id=[cdmp_20100715103232]
Thu Jul 15 10:32:35 2010
Sweep [inc][39130]: completed
Sweep [inc2][39130]: completed

The following is an extract from /opt/oracle/diag/rdbms/sssssss/yyyyyy/trace/yyyyy_ora_25177.trc:

*** 2010-07-15 10:32:32.012
*** SESSION ID:(138.18967) 2010-07-15 10:32:32.012
*** CLIENT ID:() 2010-07-15 10:32:32.012
*** SERVICE NAME:(SYS$USERS) 2010-07-15 10:32:32.012
*** MODULE NAME:(rman@mandela.marketxs.com (TNS V1-V3)) 2010-07-15 10:32:32.012
*** ACTION NAME:(0000006 STARTED62) 2010-07-15 10:32:32.012
Exception [type: SIGSEGV, Address not mapped to object] [ADDR:0xF] [PC:0x3C2B078820, strcpy()+16] [flags: 0x0, count: 1]
Incident 39130 created, dump file: /opt/oracle/diag/rdbms/sssssss/yyyyy/incident/incdir_39130/yyy_ora_25177_i39130.trc
ORA-07445: exception encountered: core dump [strcpy()+16] [SIGSEGV] [ADDR:0xF] [PC:0x3C2B078820] [Address not mapped to object] []
ssexhd: crashing the process...
Shadow_Core_Dump = PARTIAL

I could not find any information regarding ORA-07445 and strcpy()+16, and the contents of /opt/oracle/diag/rdbms/ssssss/yyyyyy/incident/incdir_39130/yyyy_ora_25177_i39130.trc did not help much:

Dump continued from file: /opt/oracle/diag/rdbms/horvitz/UXS/trace/UXS_ora_25177.trc
ORA-07445: exception encountered: core dump [strcpy()+16] [SIGSEGV] [ADDR:0xF] [PC:0x3C2B078820][Address not mapped to object] []
========= Dump for incident 39130 (ORA 7445 [strcpy()+16]) ========
----- Beginning of Customized Incident Dump(s) -----
Exception [type: SIGSEGV, Address not mapped to object] [ADDR:0xF] [PC:0x3C2B078820, strcpy()+16] [flags: 0x0, count: 1]
Registers:
%rax: 0x3720393931393233 %rbx: 0x00007fff096551e0 %rcx: 0x0000000000000001
%rdx: 0x3720393931393233 %rdi: 0x3720393931393233 %rsi: 0x000000000000000f
%rsp: 0x00007fff09655138 %rbp: 0x00007fff09655160  %r8: 0x0000000000000004
%r9: 0x0000003c2b118760 %r10: 0x0000003c2b351a30 %r11: 0x0000000000000000
%r12: 0x000000001820db10 %r13: 0x0000000000000048 %r14: 0x000000000000000f
%r15: 0xffffffffffffffff %rip: 0x0000003c2b078820 %efl: 0x0000000000010213
> (0x3c2b078820) mov (%rsi),%al
(0x3c2b078822) test %al,%al
(0x3c2b078824) mov %al,(%rdx)
(0x3c2b078826) jz 0x3c2b0788e8
(0x3c2b07882c) inc %rsi

So I decided to run the backup with log and trace:

[oracle@xxxxx trace]$ cd /tmp
[oracle@xxxxxxx tmp]$ $ORACLE_HOME/bin/rman debug trace rman.trc log rman.log

The log file did not provide much information, but the trace file contained some interesting bits:

DBGMISC:      EXITED krmice [13:18:42.647] elapsed time [00:00:00:00.127]
Calling krmmpem from krmmexe
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03009: failure of crosscheck command on ORA_SBT_TAPE_1 channel at 07/15/2010 13:18:42
RMAN-10032: unhandled exception during execution of job step 1:
ORA-03113: end-of-file on communication channel
ORA-06512: at line 223
RMAN-10031: RPC Error: ORA-03113  occurred during call to DBMS_BACKUP_RESTORE.VALIDATEBACKUPPIECE
DBGMISC:      ENTERED krmkursr [13:18:42.647]
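An RMAN debug trace is mostly DBGMISC noise, so it can help to pull out only the error stack. This is a small sketch under that assumption; rman_errors is a hypothetical helper name, and the file name rman.trc is the one used in the command above.

```shell
# Hypothetical helper: print the RMAN-nnnnn and ORA-nnnnn lines of a trace
# or log file, skipping the RMAN-00569/RMAN-00571 banner lines.
rman_errors() {
    grep -E '^(RMAN|ORA)-[0-9]{5}:' "$1" | grep -Ev '^RMAN-005(69|71):'
}

# Usage:
#   rman_errors /tmp/rman.trc
```

Run against the extract above, this would leave only the RMAN-03009, RMAN-10032, ORA-03113, ORA-06512 and RMAN-10031 lines.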

I knew the DBMS_BACKUP_RESTORE package was compiled (I had checked it), so my guess was that the problem was at the NetBackup level.

The SA confirmed we were running the latest version (6.5.5), so I decided to have a look at those libraries myself and I spotted the problem:

-r-xr-xr-x 1 oracle oinstall    89873 May  1  2009 libobk.so64

The patch was deployed in early June, yet the library was still dated May 2009: for some reason it had not been updated when the SAs deployed the patch.
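A stale library like this can be caught without relying on file dates by comparing checksums of the copies on the two servers. The sketch below assumes the standby's copy has already been transferred to a local path (for example with scp); same_library is a hypothetical helper name.

```shell
# Hypothetical helper: succeed (exit 0) when two files have the same cksum
# CRC and size, fail with 1 when they differ, 2 when a file cannot be read.
same_library() {
    sum_a=$(cksum < "$1") || return 2
    sum_b=$(cksum < "$2") || return 2
    [ "$sum_a" = "$sum_b" ]
}

# Usage, assuming the standby copy was fetched to /tmp first:
#   same_library /usr/openv/netbackup/bin/libobk.so64 /tmp/libobk.so64.standby \
#       && echo "libraries match" || echo "libraries differ"
```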

As soon as the SA copied libobk.so64 from the standby server, everything started to work as expected:

Starting backup at 15-07-2010 18:26:01
channel c01: starting full datafile backup set
channel c01: specifying datafile(s) in backup set
including current SPFILE in backup set
channel c01: starting piece 1 at 15-07-2010 16:26:02
channel c01: finished piece 1 at 15-07-2010 16:26:37
piece handle=83lis1oq_1_1 tag=TAG20100715T162601 comment=API Version 2.0,MMS Version 5.0.0.0
channel c01: backup set complete, elapsed time: 00:00:35
Finished backup at 15-07-2010 18:26:37
released channel: c01
RMAN>

So the problem was fixed.

Note: Maybe I should have checked the NetBackup libraries right after finding out the link was wrong; this would have saved me 3 hours of troubleshooting.

On the other hand, it was a lesson learned, and it was quite interesting to follow the RMAN trace file.

As always, comments are welcome.

Comments»

1. www.rmanbackup.com - 5 October 2010

Thank you so much for sharing this precious information with us.

