jump to navigation

Extending Oracle Enterprise Manager (EM) monitoring. 29 January 2012

Posted by David Alejo Marcos in Grid Control, Oracle 11.2, SQL - PL/SQL.
Tags: , , ,
trackback

I always found Oracle Enterprise Manager (EM) to be an interesting tool for different reasons. The only thing I missed was an easy way to create my own alerts.

It is very simple to create a KSH, Perl, etc script to do some customised monitoring and notify you by email, Nagios, NetCool, etc.

By integrating your scripts with OEM, you will have an easy way to enhance your monitoring and still have notification by email, SNMP traps, etc. as you would currently have if your company is using OEM for monitoring your systems.

The problem:
Develop an easy way to integrate your monitoring scripts with OEM.

The solution:
I decided to use an Oracle type and an Oracle function to accomplish this goal. Using the steps described below, we can monitor pretty much whatever aspect of the database providing you can put the logic into a function.

As example, I had added the steps to create two new User-Defined SQL Metrics, as Oracle calls them:
1.- Long Running Sessions (LRS).
2.- Tablespace Monitoring.

The reason to have my own TBS monitoring is to enhance the existing as it has “hard-coded” thresholds. I might have tablespaces in your database which are 6TBS in size and other with only 2Gb, so raising an alert at 95% for both of them is, in my opinion, not adequate.

You can find more about the query I developed here.

The steps to create a script to monitor long running sessions (LRS) are:

1.- create types

CREATE OR REPLACE TYPE lrs_obj as OBJECT
(
user_name VARCHAR2(256),
error_message varchar(2000));
/

CREATE OR REPLACE TYPE lrs_array AS TABLE OF lrs_obj;
/

2.- create function.

CREATE OR REPLACE FUNCTION lrs RETURN lrs_array IS
long_running_data lrs_array := lrs_array();
ln_seconds_active number := 300;
BEGIN
SELECT lrs_obj(username||' Sec: '||sec_running, ', Inst_id: '||inst_id||', SID: '||sid||', Serial: '|| serial||', Logon: '||session_logon_time||', sql_id: '||sql_id)
BULK COLLECT
INTO long_running_data
FROM
(SELECT
/*+ FIRST_ROWS USE_NL(S,SQ,P) */
s.inst_id inst_id,
s.sid sid,
s.serial# serial,
s.last_call_et sec_running,
NVL(s.username, '(oracle)') AS username,
to_char(s.logon_time, 'DD-MM-YYYY HH24:MI:SS') session_logon_time,
s.machine,
NVL(s.osuser, 'n/a') AS osuser,
NVL(s.program, 'n/a') AS program,
s.event,
s.seconds_in_wait,
s.sql_id sql_id,
sq.sql_text
from gv$session s, gv$sqlarea sq
where s.sql_id = sq.sql_id
and s.inst_id = sq.inst_id
and s.status = 'ACTIVE'
and s.last_call_et > ln_seconds_active
and s.paddr not in ( select paddr from gv$bgprocess where paddr != '00' )
and s.type != 'BACKGROUND'
and s.username not in ( 'SYSTEM', 'SYS' )
AND s.event != 'SQL*Net break/reset to client'
) CUSTOMER_QUERY;
RETURN long_running_data;
END lrs;
/

3.- Grant privileges to the users will be executing monitoring scripts:

grant execute on lrs_obj to public;
grant execute on lrs_array to public;
grant execute on lrs to public;

4.- create synonyms

Create public synonym lrs_obj for lrs_obj;
Create public synonym lrs_array for lrs_array;
Create public synonym lrs for lrs;

5.- Query to monitor

SELECT user_name, error_message FROM TABLE(CAST(lrs as lrs_array));

Once we are satisfied with the thresholds (300 seconds on the script), we are ready to add it to EM.

1.- Navigate to User-Defined SQL Metrics (you need to navigate to your database and you will find the link at bottom).
2.- Create new User-Defined SQL Metric and fill the gaps (I have attached some values for reference). The most important thing right now is to make sure metric_Type = String, Sql Query Output = two columns, comparison Operator = CONTAINS and warning has a value returned by the query (i did decide to go for Inst_id).

The only thing left now is to add this monitoring to your templates and to your rules so notifications are being sent.

Once all has been configure, you should start seeing alerts like this:

Target Name=lab01 
Target type=Cluster Database 
Host=mylab 
Occurred At=Jan 22, 2012 14:35:47 PM GMT 
Message=LRS alert: key = DAVIDLAB Sec: 530, value = , Inst_id: 1, SID: 153, Serial: 1597, Logon: 22-01-2012 12:21:46, sql_id: 3m72fjep12w8r 
Metric=StrValue 
Metric value=, Inst_id: 1, SID: 153, Serial: 1597, Logon: 22-01-2012 12:21:46, sql_id: 3m72fjep12w8r 
Metric ID=lrs
Key=DAVIDLAB Sec: 530
Severity=Warning 
Acknowledged=No 
Notification Rule Name=david alerts 
Notification Rule Owner=DAVIDLAB 

For Tablespace monitoring the steps will be the same as described above:

1.- create types

CREATE OR REPLACE TYPE tbs_obj as OBJECT
(
tablespace_name VARCHAR2(256),
error_message varchar(2000));
/

CREATE OR REPLACE TYPE tbs_array AS TABLE OF tbs_obj;
/

2.- create function.

CREATE OR REPLACE FUNCTION calc_tbs_free_mb RETURN tbs_array IS
tablespace_data tbs_array := tbs_array();
BEGIN
SELECT tbs_obj(tablespace_name, alert||', Used_MB: '||space_used_mb||', PCT_Free: '||pct_free||', FreeMB: '|| free_space_mb) BULK COLLECT
INTO tablespace_data
FROM
(SELECT (CASE
WHEN free_space_mb <= DECODE (allocation_type, 'UNIFORM', min_extlen, maxextent) * free_extents THEN 'CRITICAL'
WHEN free_space_mb <= DECODE (allocation_type, 'UNIFORM', min_extlen, maxextent) * free_extents + 20 THEN 'WARNING'
ELSE 'N/A'
END)
alert,
tablespace_name,
space_used_mb,
ROUND (free_space/power(1024,3), 2) free_gb,
free_space_mb,
pct_free,
ROUND (extend_bytes/power(1024,3), 2) extend_gb,
free_extents,
max_size_gb,
maxextent
FROM (SELECT c.tablespace_name,
NVL (ROUND ( (a.extend_bytes + b.free_space) / (bytes + a.extend_bytes) * 100,2), 0) pct_free,
NVL ( ROUND ( (a.extend_bytes + b.free_space) / 1024 / 1024, 2), 0) free_space_mb,
(CASE
WHEN NVL ( ROUND ( (maxbytes) / power(1024,3), 2), 0) <= 30 THEN 60
WHEN NVL ( ROUND ( (maxbytes) / power(1024,3), 2), 0) <= 100 THEN 120
WHEN NVL ( ROUND ( (maxbytes) / power(1024,3), 2), 0) <= 300 THEN 200
WHEN NVL ( ROUND ( (maxbytes) / power(1024,3), 2), 0) <= 800 THEN 300
ELSE 340
END) free_extents,
a.extend_bytes, b.free_space,
ROUND (maxbytes /power(1024,3), 2) max_size_gb,
nvl (round(a.bytes - b.free_space ,2) /1024/1024, 0) space_used_mb, 
c.allocation_type,
GREATEST (c.min_extlen / 1024 / 1024, 64) min_extlen,
64 maxextent
FROM ( SELECT tablespace_name,
SUM(DECODE (SIGN (maxbytes - BYTES), -1, 0, maxbytes - BYTES)) AS extend_bytes,
SUM (BYTES) AS BYTES,
SUM (maxbytes) maxbytes
FROM DBA_DATA_FILES
GROUP BY tablespace_name) A,
( SELECT tablespace_name,
SUM (BYTES) AS free_space,
MAX (BYTES) largest
FROM DBA_FREE_SPACE
GROUP BY tablespace_name) b,
dba_tablespaces c
WHERE c.contents not in ('UNDO','TEMPORARY') and
b.tablespace_name(+) = c.tablespace_name
AND a.tablespace_name = c.tablespace_name
) 
) CUSTOMER_QUERY;
RETURN tablespace_data;
END calc_tbs_free_mb;
/

3.- Grant privileges to the users will be executing monitoring scripts:

grant execute on tbs_obj to public;
grant execute on tbs_array to public;
grant execute on calc_tbs_free_mb to public;

4.- create synonyms

Create public synonym tbs_obj for tbs_obj;
Create public synonym tbs_array for tbs_array;
Create public synonym calc_tbs_free_mb for calc_tbs_free_mb;

5.- Query to monitor

SELECT * FROM TABLE(CAST(calc_tbs_free_mb as tbs_array));

Please, remember to use comparison operator = CONTAINS, warning = WARNING and critical=CRITICAL

As always, comments are welcome.

David Alejo-Marcos.

David Marcos Consulting Ltd.

About these ads

Comments»

1. jim boles - 29 January 2012

Thx David –

I have this on my to-do stack to try out. May be a while before I get to it. I’ll let you know how it goes.

thx again…

2. Carl - 10 April 2012

Hi. I am getting an error when I try to test the metric.

Received unexpected response text :

That’s all it gives me. I know for sure the query isn’t returning any rows however. Is this expected?

thx,
Carl

3. Carl - 10 April 2012

oh. forgot to mention that I have created a udm for the lrs function.
thx

David Alejo Marcos - 18 May 2012

Hello Carl,

Thank you for your comments.

I have not experienced this error. If the query does not return any rows, no alert will be raised.

The following link might help

https://forums.oracle.com/forums/thread.jspa?threadID=518881

Please, post any errors on the em log files as they might be helpful.

Regards,

David Marcos.


Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Follow

Get every new post delivered to your Inbox.

%d bloggers like this: