How to transfer files from ASM to another ASM or filesystem or DBFS… 23 July 2011
Posted by David Alejo Marcos in ASM, Exadata, Oracle 11.2.Tags: ASM, Exadata, Oracle 11.2
2 comments
I had a requirement of transferring files from our PROD ASM to our UAT ASM as DBFS is proving to be slow.
The problem:
We are currently refreshing UAT schemas using Oracle Datapump to DBFS and then transferring those files to UAT using SCP.
DBFS does not provided us with the performance we need as datapump files are quite big. Same export onto ASM or NFS proves to be much, much faster.
We are currently testing exports to ASM, but, how to move dmp files from PROD ASM to UAT ASM?
The solution:
The answer for us is using DBMS_FILE_TRANSFER. It is very simple to set up (steps below) and it has proved to be fast.
DBMS_FILE_TRANSFER will copy files from one ORACLE DIRECTORY to another ORACLE DIRECTORY. The directory can point to a folder on ASM, DBFS, Filesystem, etc, so the transfer is “heterogeneous”.
I decided to go for GET_FILE, so most of the work will be done on UAT, the other option is PUT_FILE.
The syntax is as follows:
DBMS_FILE_TRANSFER.GET_FILE ( source_directory_object IN VARCHAR2, source_file_name IN VARCHAR2, source_database IN VARCHAR2, destination_directory_object IN VARCHAR2, destination_file_name IN VARCHAR2);
Where:
source_directory_object: The directory object from which the file is copied at the source site. This directory object must exist at the source site.
source_file_name: The name of the file that is copied in the remote file system. This file must exist in the remote file system in the directory associated with the source directory object.
source_database: The name of a database link to the remote database where the file is located.
destination_directory_object: The directory object into which the file is placed at the destination site. This directory object must exist in the local file system.
destination_file_name: The name of the file copied to the local file system. A file with the same name must not exist in the destination directory in the local file system.
These are the steps for my test:
1.- Create directory on destination:
oracle@sssss (+ASM1)$ asmcmd ASMCMD> mkdir +DATA/DPUMP ASMCMD> mkdir +DATA/DPUMP/sid ASMCMD> exit
2.- Create directory on database
SQL> create directory SID_ASM_DPUMP_DIR as '+DATA/DPUMP/sid'; Directory created. SQL> grant read, write, execute on directory SID_ASM_DPUMP_DIR to public; Grant succeeded.
3.- Transfer file
SQL> set timing on time on
17:22:13 SQL> BEGIN
17:22:17 2 dbms_file_transfer.get_file ('SID_ASM_DPUMP_DIR',
17:22:17 3 'expdp.dmp',
17:22:17 4 'TESTSRC',
17:22:17 5 'SID_ASM_DPUMP_DIR',
17:22:17 6 'expdp.dmp');
17:22:17 7 END;
17:22:17 8
17:22:17 9 /
PL/SQL procedure successfully completed.
Elapsed: 00:01:07.57
17:23:31 SQL>
4.- Check files is on destination:
ASMCMD [+DATA/DPUMP/sid] > ls -ls Type Redund Striped Time Sys Name N expdp.dmp => +DATA/sid/DUMPSET/FILE_TRANSFER_0_0.797.756840143
OK, so ls -ls does not work show us the size of the file on the destination directory. The reason is because it is an alias, you need to perform ls -s on the directory where the file is stored.
ASMCMD [+DATA/DPUMP/sid] > ls -ls +DATA/sid/DUMPSET/ Type Redund Striped Time Sys Block_Size Blocks Bytes Space Name DUMPSET MIRROR COARSE JUL 18 17:00:00 Y 4096 1784576 7309623296 14633926656 FILE_TRANSFER_0_0.797.756840143
So, we have managed to transfer 6.8GB of data between two remote servers in 1 minute 8 seconds… not bad.
As always, comments are welcome.
David Alejo-Marcos.
David Marcos Consulting Ltd.
Archive area +RECO has -7440384 free KB remaining (Usable_file_MB is negative) 17 July 2011
Posted by David Alejo Marcos in ASM, Exadata, Oracle 11.2, RMAN.Tags: ASM, Exadata, Oracle 11.2, RMAN
add a comment
I must say, this has been a busy weekend.
We have been promoting a release to production and a guaranteed restore point was created on Friday as rollback strategy. On Sunday I was called as we started to receive alerts.
The problem:
Our monitoring system started to send emails and SNMP Traps with the following alerts:
OEM alert for Automatic Storage Management +ASM4_ssssss4: Disk group RECO has used 100% of safely usable free space. (Current Disk Group Used % of Safely Usable value: 100) OEM alert for Automatic Storage Management +ASM2_ssssss2: Disk group RECO has used 100% of safely usable free space. (Current Disk Group Used % of Safely Usable value: 100) OEM alert for Automatic Storage Management +ASM3_ssssss3: Disk group RECO has used 100% of safely usable free space. (Current Disk Group Used % of Safely Usable value: 100) OEM alert for Automatic Storage Management +ASM7_ssssss7: Disk group RECO has used 100% of safely usable free space. (Current Disk Group Used % of Safely Usable value: 100) OEM alert for Automatic Storage Management +ASM8_ssssss8: Disk group RECO has used 100% of safely usable free space. (Current Disk Group Used % of Safely Usable value: 100) OEM alert for Automatic Storage Management +ASM6_ssssss6: Disk group RECO has used 100% of safely usable free space. (Current Disk Group Used % of Safely Usable value: 100) OEM alert for Automatic Storage Management +ASM1_ssssss1: Disk group RECO has used 100% of safely usable free space. (Current Disk Group Used % of Safely Usable value: 100) OEM alert for Automatic Storage Management +ASM5_ssssss5: Disk group RECO has used 100% of safely usable free space. (Current Disk Group Used % of Safely Usable value: 100) OEM alert for Database Instance : Archive area +RECO has -7440384 free KB remaining. #Current Free Archive Area #KB# value: -7440384# OEM alert for Database Instance : Archive area +RECO has -21725184 free KB remaining. (Current Free Archive Area (KB) value: -21725184)
Not very nice in any situation, but not when you are in the middle of a critical, high visible release.
I did have a look and this is what I found.
The solution:
The first thing I did was to check the Flash Recovery Area, as it is configured to write to our +RECO diskgroup:
NAME USED_MB LIMIT_MB PCT_USED -------------------- ---------- ---------- ---------- +RECO 1630569 2048000 79.62 Elapsed: 00:00:00.12 FILE_TYPE PERCENT_SPACE_USED PERCENT_SPACE_RECLAIMABLE NUMBER_OF_FILES -------------------- ------------------ ------------------------- --------------- CONTROL FILE 0 0 1 REDO LOG 0 0 0 ARCHIVED LOG 0 0 0 BACKUP PIECE 75.66 75.66 86 IMAGE COPY 0 0 0 FLASHBACK LOG 3.96 0 707 FOREIGN ARCHIVED LOG 0 0 0 7 rows selected. Elapsed: 00:00:01.62
Numbers did look ok, some backup files could be reclaimed (Oracle should do it automatically). Lets have a look the ASM:
oracle@ssss (+ASM1)$ asmcmd lsdg State Type Rebal Sector Block AU Total_MB Free_MB Req_mir_free_MB Usable_file_MB Offline_disks Voting_files Name MOUNTED NORMAL N 512 4096 4194304 55050240 26568696 5004567 10782064 0 N DATA/ MOUNTED NORMAL N 512 4096 4194304 35900928 3192844 3263720 -35438 0 N RECO/ MOUNTED NORMAL N 512 4096 4194304 4175360 4061640 379578 1841031 0 N SYSTEMDG/ Elapsed: 00:00:01.62
Bingo, this is where our problem is. USABLE_FILE_MB (+RECO diskgroup) indicates the amount of free space that can be utilized, including the mirroring space, and being able to restore redundancy after a disk failure. A negative number on this column, could be critical in case of disk failure for the system as we might not have enough space perform a restore of all files to the surviving of disk.
Our backups goes to ASM and we copy them to tape afterwards. Our retention policy on disk is between 2 or 3 days, depending of the systems.
When I did check the contents of the backupset on ASM I found some old backups:
Type Redund Striped Time Sys Name Y 2011_07_17/ Y 2011_07_16/ Y 2011_07_15/ Y 2011_07_14/ Y 2011_07_13/ Y 2011_07_12/ Elapsed: 00:00:01.62
To delete those old backups I executed the following script from RMAN:
RMAN> delete backup tag EOD_DLY_110712 device type disk; allocated channel: ORA_DISK_1 channel ORA_DISK_1: SID=1430 instance=<instance> device type=DISK allocated channel: ORA_DISK_2 channel ORA_DISK_2: SID=2138 instance=<instance> device type=DISK allocated channel: ORA_DISK_3 channel ORA_DISK_3: SID=8 instance=<instance> device type=DISK allocated channel: ORA_DISK_4 channel ORA_DISK_4: SID=150 instance=<instance> device type=DISK List of Backup Pieces BP Key BS Key Pc# Cp# Status Device Type Piece Name ------- ------- --- --- ----------- ----------- ---------- 8292 4020 1 1 AVAILABLE DISK +RECO//backupset/2011_07_12/sssss_eod_dly_110712_0.nnnn.nnnnn 8293 4021 1 1 AVAILABLE DISK +RECO//backupset/2011_07_12/sssss_eod_dly_110712_0.nnnn.nnnnn Do you really want to delete the above objects (enter YES or NO)? yes deleted backup piece backup piece handle=+RECO//backupset/2011_07_12/sssss_eod_dly_110712_0.nnnn.nnnnnn RECID=8292 STAMP=nnnn deleted backup piece backup piece handle=+RECO//backupset/2011_07_12/sssss_eod_dly_110712_0.nnnn.nnnnnn RECID=8293 STAMP=nnnn Deleted 2 objects Elapsed: 00:00:01.62
After deleting a two more old backups, the number looked much better:
State Type Rebal Sector Block AU Total_MB Free_MB Req_mir_free_MB Usable_file_MB Offline_disks Voting_files Name MOUNTED NORMAL N 512 4096 4194304 55050240 26568696 5004567 10782064 0 N DATA/ MOUNTED NORMAL N 512 4096 4194304 35900928 3548260 3263720 142270 0 N RECO/ MOUNTED NORMAL N 512 4096 4194304 4175360 4061640 379578 1841031 0 N SYSTEMDG/
Note.- There is another temporary fix. I could changed db_recovery_file_dest to point to +DATA instead of +RECO, but as we have a guaranteed restore point, I thought releasing space from old backups was easier.
As always, comments are welcome.
David Alejo-Marcos.
David Marcos Consulting Ltd.
Exadata Administration – CellCLI 16 July 2011
Posted by David Alejo Marcos in Exadata, Oracle 11.2.add a comment
One of the big differences between Exadata and Non-Exadata systems is the necessity to administer the Exadata Storage Server.
The first time you have to configure the Server side, it has to be done through KVM (Keyboard, Video, Mouse), meaning you will need to be physically near your server. Once the initial configuration steps have been performed, we shall be able to administer the Exadata Storage Servers over the network (i.e. SSH protocol or redirect the KVM console to your desktop using the Sun Integrated Lights Out Management – ILOM – remote client).
Once you are in the server, some tools will be available to us, including Cell Command Line Interface (CellCLI).
CellCLI is only available on the Storage Server. If we need to execute it from outside the storage server (or on more than one storage server), we will need to use the Distributed Command Line Interface (dcli).
CellCLI will allow us to perform administration task like startup, shut down, monitoring activities as well as maintenance.
The syntax to follow is:
[admin command / object command] [options];
Being:
1.- Admin Commands: Administration actions like START, QUIT, HELP, SPOOL.
2.- Object Command: Administration action to be performed on the cell objects like ALTER, CREATE, LIST.
3.- Options: Will allow us to specify additional parameters to the command.
[celladmin@ssssss ~]$ cellcli CellCLI: Release 11.2.2.2.0 - Production on Sat Jul 16 17:28:08 BST 2011 Copyright (c) 2007, 2009, Oracle. All rights reserved. Cell Efficiency Ratio: 17 CellCLI> list cell ssss_admin online CellCLI> exit quitting
If you prefer to execute the command from command-line, we will have to use -e:
[celladmin@ssssss ~]$ cellcli -e list cell ssss_admin online [celladmin@ssssss ~]$
Or
[celladmin@sssss ~]$ cellcli -e help HELP [topic] Available Topics: ALTER ALTER ALERTHISTORY ALTER CELL ALTER CELLDISK ALTER GRIDDISK ALTER IORMPLAN ALTER LUN ALTER PHYSICALDISK ALTER QUARANTINE ALTER THRESHOLD ASSIGN KEY CALIBRATE CREATE CREATE CELL CREATE CELLDISK CREATE FLASHCACHE CREATE GRIDDISK CREATE KEY CREATE QUARANTINE CREATE THRESHOLD DESCRIBE DROP DROP ALERTHISTORY DROP CELL DROP CELLDISK DROP FLASHCACHE DROP GRIDDISK DROP QUARANTINE DROP THRESHOLD EXPORT CELLDISK IMPORT CELLDISK LIST LIST ACTIVEREQUEST LIST ALERTDEFINITION LIST ALERTHISTORY LIST CELL LIST CELLDISK LIST FLASHCACHE LIST FLASHCACHECONTENT LIST GRIDDISK LIST IORMPLAN LIST KEY LIST LUN LIST METRICCURRENT LIST METRICDEFINITION LIST METRICHISTORY LIST PHYSICALDISK LIST QUARANTINE LIST THRESHOLD SET SPOOL START [celladmin@ssssss ~]$
so, what about security?, CellCLI does rely on OS authentication.
There are three predefined users on the Exadata Storage Server:
1.- cellmonitor: This user can perform monitoring tasks.
2.- cellamin: This user can perform most of the administration task such CREATE, ALTER, MODIFY cell objects (cannot perform CALIBRATE).
3.- root: This user has super-user privileges.
I plan my next post to be about dcli and how to minimise work by executing the same CellCLI command on more than one Exadata Storage Server without having to log in on all of them.
As always, comments are welcome.
Smart scan on Exadata and direct path reads. 10 July 2011
Posted by David Alejo Marcos in Uncategorized.4 comments
Couple of days ago I was called to investigate a performance problem on one of our developement databases. People complained of slowness without much indication of what was slow.
Production, DR and UAT are running on a full rack Exadata machine while development runs on a single server.
The problem:
Database is slow. This is what I was given to start with. After monitoring the system I did notice several sessions where performing direct path reads, db file scattered read and some other events.
I am aware of some people having experienced problems with Oracle doing direct path reads instead of db scattered reads so I thought we might be having the same problem.
The solution:
To cut a long story short, direct path reads was not our problem, at least not this time. I managed to find a long running session consuming many of the database resources. The session was running for 7 days, details below:
SQL Text ------------------------------ UPDATE .... Global Information ------------------------------ Status : EXECUTING Duration : 613427s Global Stats ====================================================================================== | Elapsed | Cpu | IO | Concurrency | Buffer | Read | Read | Write | Write | | Time(s) | Time(s) | Waits(s) | Waits(s) | Gets | Reqs | Bytes | Reqs | Bytes | ====================================================================================== | 617320 | 64631 | 552688 | 0.82 | 619M | 52M | 6TB | 47M | 2TB | ====================================================================================== SQL Plan Monitoring Details (Plan Hash Value=2108859552) ============================================================================================================================================================================================ | Id | Operation | Name | Rows | Cost | Time | Start | Execs | Rows | Read | Read | Write | Write | Mem | Temp | Activity | Activity Detail | | | | | (Estim) | | Active(s) | Active | | (Actual) | Reqs | Bytes | Reqs | Bytes | | | (%) | (# samples) | ============================================================================================================================================================================================ | 0 | UPDATE STATEMENT | | | | | | 1 | | | | | | | | | | | 1 | UPDATE | TABLE1 | | | | | 1 | | | | | | | | | | | 2 | FILTER | | | | 613362 | +63 | 1 | 0 | | | | | | | | | | 3 | NESTED LOOPS | | | | 613362 | +63 | 1 | 6905 | | | | | | | | | | 4 | NESTED LOOPS | | 9 | 27146 | 613362 | +63 | 1 | 108K | | | | | | | | | | 5 | SORT UNIQUE | | 839 | 24624 | 613405 | +20 | 1 | 6905 | 38 | 7MB | 1243 | 257MB | 10M | 333M | | | | 6 | TABLE ACCESS FULL | TABLE1 | 839 | 24624 | 44 | +20 | 1 | 5M | 1029 | 499MB | | | | | | | | -> 7 | INDEX RANGE SCAN | INDEX01 | 6 | 2 | 613364 | +63 | 6905 | 108K | 792 | 6MB | | | | | | | | -> 8 | TABLE ACCESS BY INDEX ROWID | TABLE1 | 1 | 6 | 613364 | +63 | 108K | 6905 | 48839 | 382MB | | | | | | | | 9 | FILTER | | | | 613334 | +91 | 6905 | 6904 | | | | | | | | | | 10 | HASH GROUP BY | | 798 | 24625 | 252603 | +65 | 6905 | 4G | 24M | 1TB | 47M | 488GB | 14M | 8M | | | | -> 11 | TABLE ACCESS FULL | TABLE1 | 839 | 24624 | 613364 | +63 | 6905 | 4G | 29M | 416GB | | | | | | | ============================================================================================================================================================================================
impressive, right?
The query was terminated and it is being reviewed on development but, why didn’t we have problems on prod?
It is my believe the reason is called smart scan on the Exadata storage
.
Lets have a look how the query looks on Prod:
1.- is it enabled?
PROD> show parameter cell_offload_processing NAME TYPE VALUE ----------------------- -------- ------------------------------ cell_offload_processing boolean TRUE
2.- Check the explain plan:
SQL> SELECT * FROM table(DBMS_XPLAN.DISPLAY_CURSOR(sql_id =>'4xxxxxxxj',format =>'ALLSTATS LAST'));
PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------
----------------------------------------------------------------------------------------
| Id | Operation | Name | E-Rows | OMem | 1Mem | Used-Mem |
----------------------------------------------------------------------------------------
| 0 | UPDATE STATEMENT | | | | | |
| 1 | UPDATE | TABLE1 | | | | |
|* 2 | HASH JOIN ANTI | | 9 | 3957K| 1689K| 4524K (0)|
| 3 | NESTED LOOPS | | | | | |
| 4 | NESTED LOOPS | | 9 | | | |
| 5 | SORT UNIQUE | | 839 | 2037K| 607K| 1810K (0)|
|* 6 | TABLE ACCESS STORAGE FULL| TABLE1 | 839 | | | |
|* 7 | INDEX RANGE SCAN | INDEX01 | 6 | | | |
|* 8 | TABLE ACCESS BY INDEX ROWID| TABLE1 | 1 | | | |
| 9 | VIEW | VW_NSO_1 | 839 | | | |
| 10 | SORT GROUP BY | | 839 | 2250K| 629K| 1999K (0)|
|* 11 | TABLE ACCESS STORAGE FULL | TABLE1 | 839 | | | |
----------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("COL_ID"=TO_NUMBER("COL_ID"))
PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------
6 - storage(("COL01"=:B2 AND NVL("COL02",'N')='N' AND "COL04"=:B3
AND "COL03"=TO_NUMBER(:B1)))
filter(("COL01"=:B2 AND NVL("COL02",'N')='N' AND "COL04"=:B3
AND "COL03"=TO_NUMBER(:B1)))
7 - access("COL05"="COL05" AND "COL06"="COL06" AND
"COL07"="COL07")
8 - filter((NVL("COL02",'N')='N' AND "COL08"="COL08"))
11 - storage(("COL01"=:B2 AND NVL("COL02",'N')='N' AND "COL04"=:B3
AND "COL03"=TO_NUMBER(:B1)))
filter(("COL01"=:B2 AND NVL("COL02",'N')='N' AND "COL04"=:B3
AND "COL03"=TO_NUMBER(:B1)))
On the plan_table_output, we see storage, indicating smart scan was being used, being the runtime 73 seconds:
PROD> select sql_id, executions, elapsed_time/1000000 from gv$sql 2 where sql_id = '4xxxxxxxj'; SQL_ID EXECUTIONS ELAPSED_TIME/1000000 ------------- ---------- -------------------- '4xxxxxxxj' 1 73.166979
I must say, I am quite impressed with Exadata but, would it make it more difficult to spot bad queries?
As always, comments are welcome.
Check status voting disk. 6 July 2011
Posted by David Alejo Marcos in Exadata, Oracle 11.2, RAC.Tags: Exadata, Oracle 11.2, RAC
add a comment
This is a quick blog as to how to check the status of Voting Disks.
The problem:
You receive a call/email from you 1st line support with something similar to:
---- [cssd(9956)]CRS-1604:CSSD voting file is offline: o/xxx.xxx.xx.xx/SYSTEMDG_CD_05_SSSSSin; details at (:CSSNM00058:) in /apps/oracle/grid_11.2/log/sssss/cssd/ocssd.log. ----
The solution:
Easy to check using crsctl:
oracle$ crsctl query css votedisk ## STATE File Universal Id File Name Disk group -- ----- ----------------- --------- --------- 1. ONLINE a0a559213xxxxxxffa67c2df0fdc12 (o/xxx.xxx.xx.xx/SYSTEMDG_CD_05_SSSSSin) [SYSTEMDG] 2. ONLINE 121231203xxxxxxfc4523c5c34d900 (o/xxx.xxx.xx.xx/SYSTEMDG_CD_05_SSSSSin) [SYSTEMDG] 3. ONLINE a6b3c0281xxxxxxf3f6f9f1fd230ea (o/xxx.xxx.xx.xx/SYSTEMDG_CD_05_SSSSSin) [SYSTEMDG] Located 3 voting disk(s).
As always, comments are welcome.