Steven Edwards' Blog: 2006

Tuesday, October 03, 2006

Opatch with RAC

I got to use Opatch for RAC today to install some one-off patches. That is a subject for later. Man, what a lot of bugs in 10.2. Opatch is cluster aware and will propagate the changes to the other nodes and relink Oracle on the other nodes. There is an issue however. You cannot have UNIX banners being used with ssh. This is the same issue that we ran into on the Clusterware install. So, we had to rename the /etc/issue.net in order to get Opatch to propagate the changes to the other nodes. Opatch actually works really well.

Wednesday, September 27, 2006

ORA-29701 Cluster Manager

I had something strange occur today. A user query kept returning ORA-29701: unable to connect to Cluster Manager. Therefore, I searched MetaLink for this error. But, I couldn't find anything that applied to my situation (10gR2 RAC on HP-UX Itanium). This error only showed up on one of the nodes. The other nodes and instances were fine.

After stopping everything on the node, the ASM instance would not start and immediately barked at me with the same ORA-29701 error. At this point I asked someone else that has more experience with Oracle Clusterware than I do.

They checked it out and found that somehow the ASM /dev/rdsks and special files for the OCR and voting disks had changed ownership. Someone with root access must have run insf -e to reinstall the special files. Oh, great!

A sys admin had already created a shell script to change the ownership of the /dev/rdsks to oracle:dba and chmod them to 660. So, all we had to do was ask one of our sysadmins to run the script. They also had to manually chown root:oinstall /dev/voting and chown oracle:oinstall /dev/ocr.

So, if you get an ORA-29701 and can't figure it out, check the owner and permissions of your Oracle devices.

Sunday, September 10, 2006

Database Control RMAN Wizard Bug

Here is a bug that I found in our 10gR1 systems. This bug has to do with Enterprise Manager (EM) Database Control 10.1.0.5 and scheduling backups with the Recovery Manager (RMAN) wizard. Whenever you do a custom RMAN backup and choose to only backup certain tablespaces, the UNDO tablespace is never given as an option.

Now, without the undo tablespace, database recovery is not possible. However, the EM RMAN wizard will not allow you the option of adding the undo tablespace to the list of tablespaces to be selected for backup.

So, here is the workaround:

Modify the RMAN script that is created and manually include the undo tablespace in your list of tablespaces to be backed up. Then submit your job.

I will be creating some RMAN backups for a 10.2 database soon and will let you know if this bug is also in 10gR2.

Saturday, September 09, 2006

Old Dog New Tricks

With all of the assistance I've had lately, this old dog has learned some new tricks. Now, everybody I'm sure already knows these; however, in my career I've had to learn things by the seat of my pants and on the job in production.

tnsnames.ora changes do not require restart of the listener. Kind of cool. And may favorite is "lsnrctl reload". The reload resets the listener without stopping and starting it.

Friday, September 08, 2006

HP EVA8000 Autopath Tuning

Here is some information that might be useful to you. Following our RAC cluster setup and install, the EVA 8000 performance was very disappointing. First, our Oracle performance was disturbing. Then we did some more I/O benchmarks using "dd" and found that I/O performance was actually worse than the first baseline benchmark. So, this indicated a problem with the underlying storage and not the Oracle setup.

The UNIX administrators, HP, and my peers began looking into the problem. I wasn't totally engaged but want to post the information anyway. Props to my colleages. This is more system administration territory; however, as DBAs we need to know the impact of EVA tuning.

HP checked the load balancing policy on the nodes. It was set to “No Load Balancing” which greatly impacts performance. Now, how it was changed to "No Load Balancing" after the first I/O benchmarks is still a mystery.

To see the LUNs, issue the command “autopath display” as root. This will list all the LUNs and show the HP Autopath load balancing policy.

root@hostname:/ # autopath display
...
Load Balancing Policy : No Load Balancing
...

So, HP and my peers recommended that the HP Autopath load balancing policy be set to round robin for all LUNs. The autopath set_lbpolicy command sets the load balancing policy for the specified device path.

autopath set_lbpolicy <{policy name} {path}>
description: sets load balancing policy
usage: autopath set_lbpolicy <{policy name} {path}>
Policy name: The load balancing policy to set
Valid policies are
RR : Round Robin.
SST : Shortest Service Time.
SQL : Shortest Queue Length.
NLB/OFF : No load Balancing.
Path : Device Special File e.g./dev/dsk/c#t#d#

Example:
# autopath set_lbpolicy RR /dev/dsk/c0t0d0
The example above sets the policy to Round Robin.

Here is a little more information about what the policies mean:

RR : Round Robin. - I/O is routed through the paths of the active controller in round-robin order
SST : Shortest Service Time. - is a measurement against the average service time of the IOs on a path
SQL : Shortest Queue Length. - is a calculation on the device queue depth when it is on a certain path.
NLB/OFF : No load Balancing. - No consideration given to service times or queue lengths, typically impacts performance.

Friday, August 25, 2006

ASM Instance Changes

Here are some changes that we had to make after using DBCA to create our ASM instances:

First, we had to increase the large pool to 100M. Next, we had to increase the number of processes. The number of processes should be the default (40) times the number of nodes in the Oracle cluster (40 * # of nodes)

Two MetaLink Notes - Bookmark

Here are two more MetaLink notes to bookmark:

368055.1 Deployment of very large databases (10TB to PB range) with Automatic Storage Management (ASM)
139272.1 HP-UX: Asynchronous i/o

Both of these notes, I found useful during our recent VLDB 10g RAC HP-UX Itanium install. A friend of mine at work found the second note and told me about it. Thanks.

Tuesday, August 22, 2006

We Have An Oracle Cluster

Now, that we have an Oracle cluster and RAC database. Here is a quick high-level touchbase on the RAC environment from the Oracle documentation:

As a DBA, after installation your tasks are to administer your RAC environment at three levels:

Instance Administration
Database Administration
Cluster Administration

For administering Real Application Clusters, use the following tools to perform administrative tasks in RAC:

Cluster Verification Utility (CVU)—Install and use CVU before you install RAC to ensure that your configuration meets the minimum RAC installation requirements. Also use the CVU for on-going administrative tasks, such as node addition and node deletion.
Enterprise Manager—Oracle recommends that you use Enterprise Manager to perform administrative tasks whenever feasible.
Task-specific GUIs such as the Database Configuration Assistant (DBCA) and the Virtual Internet Protocol Configuration Assistant (VIPCA)
Command-line tools such as SQL*Plus, Server Control (SRVCTL), the Oracle Clusterware command-line interface, and the Oracle Interface Configuration tool (OIFCFG)

I've got plenty of posts saved up and will enter them either tonight or soon. It has been crazy at work. Funny, how you learn so much more while in a storm than you do when things are calm. It is like that in life as well as at work. God has a way of tempering you in the storms that you face and bringing you thru them stronger and better than before. Man, I could preach on that especially with things going on in my personal life right now.

Tuesday, August 15, 2006

ORA-00600 [KGHALO4]

We got an ORA-00600 [KGHALO4] error today every time we tried to create the ASM instance with DBCA. We found that this is a bug in 10.2.0.1. See MetaLink Note: 340976.1.

Solutions:

Apply the one-off patch for bug 4414666 or set the _enable_NUMA_optimization=FALSE parameter in the init.ora as a workaround. For an ASM instance, two parameters are needed:
_enable_NUMA_optimization = FALSE _disable_instance_params_check = TRUE. Another solution is to apply the 10.2.0.2 patchset which fixes this problem.

Because this bug happens whenever you are trying to create the ASM Instance, it would be advisable to just apply the one-off patch for the bug and then install the 10.2.0.2 patchset as normal once everything is created and working.

Monday, August 14, 2006

OUI Trace

Here is how we traced the Oracle Universal Installer. Execute the following and redirect the output to a file:

./runInstaller -J-DTRACING.ENABLED=true -J-DTRACING.LEVEL=2

We used this method to review what the OUI was doing and determine that the OUI saw ServiceGuard (SG) and SG libraries on the cluster nodes. See my previous posts for more information.

HP Autopath and Oracle

We saw with the OCR and voting raw disks that we had to create a common device special filename (alias) that mapped to the multiple paths to the disks.

/dev/rdsk/ocr
/dev/rdsk/voting

It was all because of the HP Autopaths and virtual storage platform (EVA8000). Oracle Installer only allows entry of one path to the OCR and voting disks. Therefore, we had to create the special filenames in order to use OUI. mksf is the command. See HP portion in Note: 293819.1

We are now wondering about the creation of the ASM disk groups. Will HP autopathing cause us a problem? We should know soon.

Saturday, August 12, 2006

Cleanup Failed CRS Install on HP-UX Itanium

Since we had to do this several times, the following is how to cleanup after a failed CRS Install on HP-UX Itanium. (Refer to MetaLink Note: 239998.1). There were problems with the rootdelete.sh and rootdeinstall.sh scripts so we always had to cleanup the failed install manually:

srvctl stop nodeapps -n (we didn't have to do this because our failed installs never got this far).
As root:
rm /sbin/init.d/init.cssd
rm /sbin/init.d/init.crs
rm /sbin/init.d/init.crsd
rm /sbin/init.d/evmd
rm /sbin/rc2.d/K001init.crs
rm /sbin/rc2.d/K960init.crs
rm /sbin/rc3.d/K001init.crs
rm /sbin/rc3.d/K960init.crs
rm /sbin/rc3.d/S960init.crs
rm -Rf /var/opt/oracle/scls_scr
rm -Rf /var/opt/oracle/oprocd
rm /etc/inittab.crs
cp /etc/inittab.orig /etc/inittab
If they are not already down, kill the EVM, CRS, and CSS processes.
rm -Rf /var/tmp/.oracle
rm -Rf /tmp/.oracle
remove the ocr.loc file
rm -Rf /* CRS Install Location */
De-install the CRS home in the OUI
Clean out the OCR and voting files with dd commands Example:
dd if=/dev/zero of=/dev/rdsk/voting bs=8192 count=2560
dd if=/dev/zero of=/dev/rdsk/ocr bs=8192 count=12800
rm -Rf /app/oracle/oraInventory

Once those are done, you can restart the OUI install of Clusterware at the very beginning. Oracle has also just released a new cleanup utility for failed CRS installs here is the link:

http://download-west.oracle.com/otndocs/products/clustering/deinstall/clusterdeconfig.zip

The new script didn't work for us either. Of course, I didn't try it after our system admins removed ServiceGuard so it may work now.

Friday, August 11, 2006

Clusterware Install Tips HP-UX Itanium

OK, we finally have a working Oracle Cluster. Praise God! I wanted to add some of the new things that have been discovered since the last post.

Make sure if you are not going to use HP ServiceGuard on your RAC cluster, that all of ServiceGuard has been stopped and uninstalled. Don't leave any libraries laying around. We found out this the hard way, after spending over a week trying to figure out why the Oracle Installer was acting so crazy and bizarre.

Here are the symptoms: First, if the OUI does not give you the option to add the other nodes and you have to use a configuration file, this is a red flag that the OUI thinks that you are using some vendor cluster software (in this case HP ServiceGuard) instead of using Oracle's. Secondly, if you have some variables that are not assigned (see previous post) in the rootconfig script this indicates that it is not really trying to install rather it is trying to upgrade/update the OCR.

If for some reason, you get the clusterware services running on one of the nodes but it doesn't start on the others and locks up. It probably means that your removal of ServiceGuard was incomplete and left a few SG libraries laying around.

We found all of this out the hard way because HP installed and started ServiceGuard when they installed the HP 9000 Superdome!

Finally, here is an undocumented procedure for HP-UX Itanium Clusterware 10gR2 installation: Shutdown the VIP interface BEFORE beginning the install. If you don't, an error message will appear saying that the VIP interface is being used by another system. Then you have to shut the VIPs down before continuing the install. This is weird because the VIP must be up in order for cluvfy nodecon to work.

Friday, August 04, 2006

CVU Shared Storage Accessibility Check

We are almost ready to reinstall Clusterware. However our cluvfy comp ssa check is returning the following in the cluvfy trace file:

ERROR>/tmp/9999//bin/lsnodes: cannot get local node number

Although Cluster Verify Utility is an excellent tool for prerequisites, there is room for improvement. Since we did not run cluvfy as thoroughly on our first attempt, we did not encounter Bug 4714708 - Cvu Cannot See Shared Drives.

It turns out that CVU currently does not work with devices other than SCSI devices.

Thursday, August 03, 2006

Failed To Upgrade OCR Message

Now, we are getting a "Failed to upgrade Oracle Cluster Registry Configuration" message in the rootconfig shell script that is ran by root.sh at the end of the Clusterware installation. A friend of mine found most of the problem and described it as this:

"The rootconfig script has a few problems. The CRS_HOST_NAME_LIST and CRS_NODE_NAME_LIST are not populated. Script failed at this point:

if $CH/bin/ocrconfig -upgrade $CRS_ORACLE_OWNER $CRS_DBA_GROUP;
then$ECHO "Oracle Cluster Registry configuration upgraded successfully"
else$ECHO "Failed to upgrade Oracle Cluster Registry configuration"
exit 1
fi

libocr10.so is missing(running in debug mode)Commented this out and added environment variables and everything began to run. Script is trying to start init.crs but never starts the daemon."

So, once we talked to Oracle Support about this they suggested backing out the Clusterware install. Note 239998.1. Then re-running cluvfy and checking problems. In reviewing the CLUVFY output and installation documentation, we discovered missing OS patches and missing applications. HP C and C++ are not installed on all of the nodes.

We are canceling the re-install until all HP-UX Itanium patches (and those which are superseded) are installed. We are also waiting on a HP C and C++ install on the nodes which are missing them.

We are also asking Oracle Support to research whether or not GNU HP-UX Itanium C and C++ software can be substituted for the more costly HP licensed versions.

Wednesday, August 02, 2006

Clusterware Install Stopped at OCR Location

The Oracle Universal Installer (OUI) stopped while installing Clusterware with the following error :

"The location /dev/dsk/xxx, entered for the Oracle Cluster Registry (OCR) is not shared across all the nodes in the cluster. Specify a shared raw partition or cluster file system file that is visible by the same name on all nodes of the cluster."

This is caused because the /dev/dsk is a block device and OUI does not recognize or use block devices. So in order to get it to continue, we had to point to the /dev/rdsk/xxx raw character device instead.

You need to bind raw devices and use these raw devices within OUI. See MetaLink Note: 363995.1 for a little more detail.

We also had to do the same thing for the voting disk.

A Pattern is Developing...

I'm beginning to see a pattern. This matches what I've been told by other DBAs and Mike Ault during a RAC class. You spend days working on just the clusterware install before beginning the database install. Looks like Oracle would have these difficulties fixed especially by Release 2.

CLUVFY User Equivalence Failure

The runcluvfy.sh comp nodecon kept failing the User Equivalence Test. Turns out the cause was two different issues.

The first issue was with the path to ssh. To solve this issue we had to reference Oracle MetaLink Note 36598.1. Here is an excerpt from the Note:

"Generate a trace file using the cluvfy debugging environment variable SRVM_TRACE=TRUE as in Note 316817.1 and the resulting trace file will show something like:checkRemoteExecutionSetup:: Error checking user equivalence using Secured Shell'/usr/local/bin/ssh'; Fri May 12 15:38:18 CEST 2006. Cluvfy was looking in the wrong location for ssh. As it can be seen in the trace file, the location where the utility is searched is/usr/local/bin/, whilst it resides in /usr/bin.
To implement the solution, please execute the following steps:

add the symbolic links in /usr/local/bin pointing to the real ssh location, ie /usr/bin. Do the same for the scp command.
generate the ssh keys to all the cluster locations.
Run cluvfy again"

The second issue was with the banner. "ssh node date" would return a security banner in addition to the date. In order to get nodecon to work, I had to rename the /etc/issue.net (in HP-UX Itanium) to keep a banner from displaying. See MetaLink note: 338045.1. Of course, once we are finished with the install, I will rename it back.

Monday, July 31, 2006

ASM File Locations

From Mike Ault's book Oracle 10g Grid & Real Application Cluster, the following shows the files locatable on the ASM instance and files that have to reside on the file system:

On the ASM Instance:

Data Files
Redo Files
Control Files
Archive Log Files

On Raw Partitions:

Voting Disk
OCR File

On the Local File System:

Oracle Home Files
Clusterware Home Files
ASM Instance Home Files
Alert Log, Trace Files
Files for External Tables
utl_file_dir location

X11 Forwarding From SUDO

X11 forwarding is not working using PuTTY and ssh for users once you "become" them using SUDO. Here is the error message:

$ Xlib: connection to "xxx.xx.xx.xxx" refused by server
Xlib: PuTTY X11 proxy: wrong authentication protocol attempted
Error: Can't open display: xxx.xx.xx.xxx:xx.x

Below are the steps used to successfully transfer xauth information to another user (Oracle in this case)

Enable X11 forwarding on your terminal application and login as you (as stated above)
‘chmod 644 .Xauthority’ (needs to be done every time service account needs access, it will reset when you log out)
‘become service account x’
‘xauth merge ~username/.Xauthority’ (needs to be done every time service account access is needed)

Once you get a copy of the .Xauthority file to /home/oracle it should work.

Sunday, July 30, 2006

10gR2 RAC Installation Guide

Here is something interesting. The HP-UX Itanium platform doesn't have it's own configuration guide in the Oracle documentation. You have to use the Oracle Clusterware and Oracle Real Application Clusters Installation and Configuration Guide for hp HP-UX PA-RISC (64-Bit). However, the Itanium platform instructions are included in the document also.

Here is my advice. Before beginning the install of RAC read all of the Installation Guides specific to the release and OS that you will using. Don't just rely on the one installation guide. Read the Release Notes also. And read the Oracle Database Oracle Clusterware and Oracle Real Application Clusters Administration and Deployment Guide. Check for documentation updates.

I plan on following the installation guide step by step. Will create other posts for the sections of the install that are different for HP-UX Itanium or are not clear in the Installation Guide.

Here are some notes that I gathered from the RAC Installation Guides which I felt were noteworthy:

You must install Oracle Clusterware and Oracle Database in separate home directories. If you will use multiple Oracle Database homes with ASM, then you should install a separate Oracle Database home for ASM. You should create the listener in the Oracle Database Oracle home.
With Oracle Database 10g Release 2 (10.2) with RAC, CSS has been modified to allow you to configure CSS with multiple voting disks. In 10g Release 1 (10.1), you could configure only one voting disk. By enabling multiple voting disk configuration, the redundant voting disks allow you to configure a RAC database with multiple voting disks on independent shared physical disks. If you intend to use multiple voting disks managed by Oracle Clusterware, then you must have at least three disks to provide sufficient redundancy, and you must ensure that each voting disk is located on physically independent storage.
For OCR: Configure one disk if you have existing redundancy support. If you intend to use OCR mirroring managed by Oracle Clusterware, then you must have two OCR locations, and you must ensure that each OCR is located on physically independent storage.
Although you can specify a logical volume as a device in an ASM disk group, Oracle does not recommend their use. Because logical volume managers can hide the physical disk architecture, ASM may not operate effectively when logical volumes are specified as disk group devices. (Got to find out about this one because we are going to use logical volumes with ASM)
Cluster Verification Utility (CVU) is a tool that performs system checks. CVU is used to assist you with confirming that your system is properly configured for Oracle Clusterware and Oracle Real Application Clusters installation. CVU does not check kernel parameter settings. This issue is tracked with Oracle bug 4565046.

Oracle recommends that you use the following Oracle Database 10g features to simplify RAC database management:

Oracle Enterprise Manager—Use Enterprise Manager to administer your entire processing environment, not just the RAC database. Enterprise Manager lets you manage a RAC database with its instance targets, listener targets, host targets, and a cluster target, as well as ASM targets if you are using ASM storage for your database.
Automatic undo management—This feature automatically manages undo processing.
Automatic segment-space management—This feature automatically manages segment freelists and freelist groups.
Locally managed tablespaces—This feature enhances space management performance.

In addition to this being my first RAC install, this is also my first time to use ASM (Automatic Storage Management). Am I a glutton for punishment or what? New to RAC and ASM at the same time.

Friday, July 28, 2006

Setting up the VIPs with HP APA

Well, we had our first question - setting up the VIPs along with HP Auto Port Aggregation (APA). This is even before we attempt to install Oracle.

First some background, VIP stands for Virtual IP address. And for more on HP APA, see a previous post of mine. When installing 10g RAC you need at least a minimum of 3 network interfaces for each node in the RAC cluster.

A public interface for normal network communications to the node or partition.
A virtual (public) interface which will be used for failover in case the primary public interface fails. Also used for RAC managment.
A private interface for the cluster interconnect.

We have 4 software APA NICs (Network Interface Cards) and two regular Gb NICs on each node of the RAC cluster. So, here are our questions:

How do we use VIP with APA?
Do they have to be on the same network subnet?

As always, the answer to the first question can be found in a MetaLink document. (296874.1) Configuring the HP-UX Operating System for the Oracle 10g VIP. As stated, there are 2 ways of configuring HP-UX systems for network redundancy to be used for the Virtual IP:

Oracle VIP with MC/ServiceGuard configured networks only, via multiple physical interfaces on many redundant networks.
Oracle VIP with APA(i.e. NIC teaming) only via a single logical interface on many redundant networks.

HP Auto Port Aggregation (APA) is a software product that creates link aggregates, often called "trunks," which provide a logical grouping of two or more physical ports into a single "Fat-Pipe". The link aggregates can be active/active(APA aggregate) or active/standby(hot standby mode). An IP is configured only on the single logical interface (usually lan90X), and failure to a single NIC would be transparent to applications that are dependent on a specific interface name.

Auto Port Aggregation(APA) is a NIC teaming solution provided by HP. Although APA is not required when using MC/ServiceGuard (since MC/ServiceGuard has its own network redundancy solution), it is worthwhile to note that a NIC teaming solution can provide highly available VIPs. APA will configure 2 physical NIC's to 1 logical NIC interface. It is usually configured to be lan90x. All that needs to be done for the VIP is to configure it on that 1 logical NIC, similar to what is done on a single NIC configuration.

The answer to the second question is a simple "Yes". The public interface and the virtual interface must both be on the same subnet.

Thursday, July 27, 2006

10gR2 RAC Install -- Soon

Soon, I will get to install 10gR2 RAC for the first time. I will be working with another DBA who has RAC experience. The install will be on 3 nodes with HP-UX Itanium operating systems. Of course the first thing to install is the Oracle Clusterware. With Oracle Database 10g Release 2 (10.2), Cluster Ready Services, or CRS, is now called Oracle Clusterware.

With the large number of bugs in 10gR2 RAC, I expect some real challenges and multiple patches. There should be lots of entries on this blog to document the "fun".

Wednesday, July 26, 2006

ORA-12516 or TNS-12516

I found an error that is new to me: ORA-12516 TNS:listener could not find instance with matching protocol stack

Every new remote connection into the database from GUI tools like TOAD and SQL Developer received this error. The listener was blocking connections. The problem went away on its own after the server and Oracle became less busy. The server was being slammed at the time of the errors.

(A friend of mine calls GUI tools point-click and drool environments)

After further research and a Service Request with Oracle Support, this is what I found. The error can happen when resources are low. Apparently, when resources are at a premium the listener can block new connections and return this ORA-12516. By increasing the initialization parameter PROCESSES, I hope to keep this from happening again.

Kind of a weird error and response from Oracle whenever resources are low. Oh well, strange error messages keep us gainfully employed.

See Oracle Note: 240710.1 on MetaLink for more information.

Monday, July 17, 2006

Security Checklist

Ok, I admit it. I'm stuck on Audit and Security right now. But, we as oracle DBAs are being bombarded by requirements from everybody because of the malicious attacks by hackers and other countries on our technology infrastructure. During my last audit, an auditor wanted to see my security checklist. So, I gave it to them. In order to continue the theme of this blog in providing help, here is one of my favorite security checklists:

http://www.oracle.com/technology/deploy/security/pdf/twp_security_checklist_db_database.pdf

Again, another white paper from Oracle. Use this checklist as a basis for your own and in accordance with your company's security requirements. Don't you just love audits and security?

How often do you apply the quarterly security patches? Did you know that each patchset released (e.g. 10.1.0.5) also incorporates the last security patch available during that time?

Food for thought and to chew on.

Saturday, July 15, 2006

OS Authentication

I found an excellent SECURITY resource white paper at Oracle Technology Network. It is called "Project Lockdown" by Oracle Ace Arup Nanda. Here is the link:

http://www.oracle.com/technology/pub/articles/project_lockdown/index.html

In particular, I was interested in what Mr Nanda had to say about OS Authentication because it is becoming more of a requirement in companies to keep users from imbeding user and password information in UNIX and SQL applications.

I used to think that OS Authentication removed an additional security control by not forcing an Oracle login and password. However, with the more secure Operating Systems (particularly the big three Sun, HP, and IBM), it becomes a valid option. If your OS is not rock solid and very secure, then I would definitely not use it. If you do use it, then remote OS authentication should be disabled. Disabling remote OS authentication should be a requirement.

This article is definitely required reading. All the procedures in the article should be implemented for the systems you manage. I plan on following these procedures and using this article in my next ET audit.

Thank-you Arup Nanda

Friday, July 14, 2006

9iR2 Datafile Queries

Found that I had some 9iR2 tablespaces which the space was not being evenly distributed across all of the datafiles. Some datafiles had over 50% space utilized and others had < 20% utilized. Seems to be related to the use of uniform extents. However, I haven't proved it yet. Strange how the data is not evenly distributed across the datafiles. Used 9i OEM to catch it. Didn't have any SQL but found the following. This is for those that prefer not to use the point and click GUI approach.

select substr(a.file_name,1,50) as file_name,
substr(a.file_id,1,4) as file_id,
a.bytes as allocated,
nvl(b.free,0) as free,
a.bytes-nvl(b.free,0) as used
from dba_data_files a,
( select file_id, sum(bytes) free
from dba_free_space
group by file_id ) b
where a.file_id = b.file_id (+)
and a.tablespace_name = 'TABLESPACENAME'
order by 1;

Here is a query to show the contents of a particular datafile. The input is the datafile path and name:

REM ------------------------------------------------------------------------
REM REQUIREMENTS:
REM SELECT on DBA_EXTENTS and DBA_DATA_FILES
REM ------------------------------------------------------------------------
REM AUTHOR:
REM Cary Millsap, Oracle Corporation
REM Copyright (c) 1991 by Oracle Corporation
REM ------------------------------------------------------------------------
REM PURPOSE:
REM The following script prints a record of each ORACLE database
REM segment (table, index, cluster, rollback segment, temporary
REM segment or cache segment) physically stored in a datafile.
REM This information is useful to determine the contents of a
REM database file identified as particularly hot in a SQL*DBA
REM monitor of ORACLE file I/O.
REM ------------------------------------------------------------------------
REM DISCLAIMER:
REM This script is provided for educational purposes only. It is NOT
REM supported by Oracle World Wide Technical Support.
REM The script has been tested and appears to work as intended.
REM You should always run new scripts on a test instance initially.
REM
REM ------------------------------------------------------------------------
REM Main text of script follows:
set verify off
define fname = &&1

ttitle left 'Database Segments in File &fname' skip 2

col stype format a12 justify c heading 'Segment Type'
col sname format a66 justify c heading 'Segment Name'

break on stype skip 1

select
segment_type stype,
owner'.'segment_name sname
from
dba_extents ext,
dba_data_files fil
where
fil.file_name = '&fname'
and
ext.file_id = fil.file_id
order by
segment_type,
segment_name
/

undef fname
set verify on

Using tnsping

tnsping is useful in determining whether your Orace NET is working without logging in to the remote database. Here are some Oracle notes on it:

The TNS ping utility (usually named tnsping.exe) can be used to determine whether or not a service on a TNS network can be successfully reached. The service being contacted is an Oracle Listener. If tnsping successfully pings, it will display an estimate of the round trip time to reach the service. If it fails, it will display a message describing the error that occurred. This allows you to see the network error that is occurring without the overhead of a database connection. Tnsping is typically invoked on the command line as follows: tnsping service count

Even though tnsping shows time estimates, it should not be used to test network performance. A very good Oracle note on Oracle NET performance is 67983.1

Tuesday, July 11, 2006

Encrypting Oracle Net Traffic

I've gotten some questions about encryption of the Oracle Net data thru the network and thought that I would post some information. The following contains information that I got from a conversation with Oracle Support.

Without encryption, (using the Oracle Advanced Security Option OAS or network encryption) the data leaving the server or client via Oracle Net will be transmitted in the clear. And, although switched and routed networks in the industry today minimize eavesdropping on broadcast data, the possibility of eavesdropping still exists without encryption. Everything beyond the local zone, data center and DMZ (or secured zone) to the outside should be encrypted by your routers. Additionally, physical security prevents entry to the secured zones.

These days, to actually sniff packets from a routed/switched network, a network analyst needs to shadow a port in a router to even hookup a packet capturing tool like a Network General (<====OLD COMPANY) Sniffer. However, it is possible to shadow/mirror a router port at any router along the path of traffic and reach some clear text data from within the packets as long as the data is not encrypted using the OAS and as long as another method of encryption has not yet been employed along the traffic path.

Physical security and Human Resources screening are more important to companies than worrying about Oracle Net traffic to the listener port.

Bottom line:

I can’t imagine the expertise required to find and enter a secured zone network much less creating a shadow port. Sounds like something from Mission Impossible to me. If you have some network paths which are not encrypted, then install Oracle Advanced Security (OAS) which comes free with the Oracle Enterprise Edition software.

Back From Vacation

Well, I'm back in the saddle now. Been on vacation and just been too busy with work to keep my blog up-to-date. Went on vacation at Cape San Blas in Florida. I would highly recommend it. The beach was not crowded at all and was beautiful. What a great way to clear out the DBA cobwebs by enjoying the Gulf of Mexico beach. Now for some more posts...

Monday, June 19, 2006

Moving Objects To Another Tablespace

We had some tablespaces that were full and I moved some tables to another tablespace and thought it would be neat to document it here. Moving objects between tablespaces can be useful whenever you need to reorganize tables. We've seen where moving objects from one tablespace to another and then back again helps keep the number of extents down. Also, whenever a power user puts data in a tablespace they shouldn't have, then I can move it back to where it should be. If you decide to do some tablespace consolidatation, reorganization or other changes, moving the objects is also useful. Beware there are some limitations to the alter table move tablespace mainly around data types. Please consult the Oracle documentation. I will document some SQL examples below moving a table, an index, a partition, and a sub-partition. If you need to move lots of objects, then you will want to write dynamic SQL which pulls information from the DBA views.

alter table OWNER.TABLE_NAME move tablespace NEW_TBSP parallel 8 nologging;

alter index OWNER.INDEX_NAME rebuild tablespace NEW_TBSP online;

alter table OWNER.TABLE_NAME move partition XX tablespace NEW_TBSP parallel 16;

alter table OWNER.TABLE_NAME move subpartition SUB_PART_NAME tablespace NEW_TBSP parallel 8;

Note: The degree of parallelism above indicates the degree used to do the move not the default degree of parallelism for the object.

Tuesday, June 13, 2006

Aggregate NIC Ports For Better Performance and Redundancy

We are going to aggregate Network Interface Card (NIC) 1 GB ports in order to get a faster connection between servers. A better aggregate throughput is just one of the advantages. Another advantage is redundancy. If one interface goes down, the other interfaces in the aggregate will still work. This is very similar to bonding NICs. We will be using HP Auto Port Aggregation (APA) software in combination with the Cisco PAgP protocol.

From the HP document APA Support Guide at http://www.docs.hp.com :

"HP Auto Port Aggregation (APA) is a software product that creates link aggregates, often called 'trunks,' which provide a logical grouping of two or more physical ports into a single 'fat pipe.' This port arrangement provides more data bandwidth than would otherwise be available. Two additional features are automatic link failure detection and recovery; it also offers optional support for load balancing of network traffic across all of the links in the aggregation. This enables you to build large bandwidth "logical" links into the server that are highly available and completely transparent to the client and server applications."

Now, how does this help our Oracle configurations? Simple, I will use the aggregated ports as the host for a database link between servers. This way, we will be able to move data faster between Oracle databases.

Our plan is to aggregate 4 ports which will give us the capability to transfer between 3.7 and 3.9 GBbit/sec. APA supports up to 8 aggregated ports. When you go over 2 aggregated ports, you begin to see diminishing returns which is why we will not get a total throughput of 4 GBit/sec.

The Da Vinci Code

For anyone that believes the book "The Da Vinci Code" contains anything remotely true, please consider the following links:

http://www.enrichmentjournal.ag.org

http://www.enrichmentjournal.ag.org/top/DaVinciCode_article.cfm

By the way, I did read the book and it was FICTION.

Friday, June 09, 2006

Importance of Table Statistics

The importance of table statistics was driven home to me today. I've always known this, but sometimes you forget how important it is to keep fresh statistics on your objects. We are using an optimization mode of choose which resolves to the cost based optimizer. There was a large join operation that usually takes 2-3 hours which was finishing in 8 hours. So, we took a few minutes trying to figure out why the increase in runtime. The problem with analyzing this change was with the current load on the server. We were hammering the box. Our first thought was the server was overloaded which caused the job to run longer than normal.

Then I had an epiphany from God. It could be stale table statistics for both or one of the tables in the join condition. So this is what I did:

ANALYZE TABLE OWNER.TABLENAME DELETE STATISTICS;

BEGIN
DBMS_STATS.GATHER_TABLE_STATS ('OWNER', 'TABLENAME', NULL, 0.5);
END;
/

I did this for both tables in the join. Now, the runtime for the job is back to 2 hours. Now, you more experienced DBAs will comment about using DBMS_STATS to automate statistics gathering. However, we always assumed our processing cycle and the size of our databases ruled out the automatic statistics gathering option.

Yes, I do need to visit the automatic statistics gathering options available in 10g. It has been quite a few years since I looked at gathering object statistics. Expect this to be another post later on ;-)

Saturday, June 03, 2006

Don't Update Compressed Objects

We created a huge table (over 1.7 billion rows) that was hash partitioned with each partition compressed that had horrible performance during an update. Here is an explanation as to why updating a compressed table is not a good idea.

Oracle table compression is only useful for tables where data is inserted and not updated. Upon update the row is uncompressed and not re-compressed. Compression occurs when data is inserted with a bulk (direct-path) insert operation. A table can consist of compressed and uncompressed blocks transparently. Any DML operation can be applied to a table storing compressed blocks. However, conventional DML operations cause records to be stored uncompressed after the operations and the operations themselves are subject to a small performance overhead due to the nature of the table compression.

Bottom line: Consider using table compression when your data is mostly read only. Do not use table compression for tables that are updated frequently. UPDATE operations are 10-20% slower for compressed tables on average, mainly due to some complex optimizations that have been implemented for uncompressed tables, and not yet implemented for compressed tables.

In our case the update to just one of these huge partitions was running over 12 hours. So, the update was actually never finishing. IMHO, don't compress any table that may be updated later.

Friday, June 02, 2006

Using Tables in Views With a DBLink

I have dealt a lot this week with views. We were trying to create a view from a table on a distributed database joined to a table on the local database. The view creation failed and I assumed it was because we were trying to create a view using a dblink. The error message received during the view creation was ORA-01031 insufficient privileges. I was wrong.

Turns out the problem was in the way the owner of the view was given select privileges to the underlying tables. See document 271587.1 on MetaLink - "Cannot create a view on a table granted via a role". So because the owner didn't have select access to the underlying tables explicitly granted to the view creator and owner, they couldn't create the view. Once select access was granted to the user, the view creation succeeded.

So, the view creation worked. However, the users of the view kept getting an error saying table not found. They didn't have select access to the table that was used to create the view on the remote database (across the dblink). We couldn't grant select on the remote table because the user community couldn't legally see all of the data on the remote table. Some users had direct access to the remote database.

To create a view, you must meet the following requirements (From Oracle 9i Documentation):

To create a view in your schema, you must have the CREATE VIEW privilege. To create a view in another user's schema, you must have the CREATE ANY VIEW system privilege. You can acquire these privileges explicitly or through a role.
The owner of the view (whether it is you or another user) must have been explicitly granted privileges to access all objects referenced in the view definition. The owner cannot have obtained these privileges through roles. Also, the functionality of the view is dependent on the privileges of the view's owner. For example, if the owner of the view has only the INSERT privilege for Scott's emp table, the view can only be used to insert new rows into the emp table, not to SELECT, UPDATE, or DELETE rows.
If the owner of the view intends to grant access to the view to other users, the owner must have received the object privileges to the base objects with the GRANT OPTION or the system privileges with the ADMIN OPTION.

I think the problem was the third one. The owner of the view did not have object privs with the GRANT OPTION or ADMIN OPTION. However, I never got to test it.

Looks like this was a combination problem with views that use DBLINKS to get remote data and permissions. So, we decided to just get around it, by placing the remote table on the local database taking the dblink out of the equation. I will let you know if this worked.

Wednesday, May 31, 2006

Hang Analyze

I got an email from a user about incredibly slow response times and never ending queries. When I attempted to check for users holding locks, the query would lock up and not run. So, I decided to run a hang analyze in order to get some trace dumps for Oracle Support. Here is how:

Login as sysdba

oradebug hanganalyze 3;

wait 2 minutes and note the location of the trace file

oradebug hanganalyze 3;

Start another sqlplus session as sysdba

ALTER SESSION SET EVENTS '10998 trace name context forever, level 1';

alter session set events 'immediate trace name SYSTEMSTATE level 10';

After generating the dumps, used the UNIX top command to look for Oracle processes that were hung. Found an Oracle user process that was using 99% CPU and all of the other Oracle processes were "waiting". So, I decided to kill the process. First I did ps -ef pipe to grep 99999 (where 99999 is the PID) to see what this process was. Then I determined that the process could be safely killed. Next, I killed it and the PID (Process Id) that was associated with the rogue process. [su - oracle followed by kill -9 99999] Once the PID was killed, all other Oracle processes began "running" and the crisis was over. My next step is to open a SR with Oracle Support and upload the trace files.

Check out the comments for more instructions by me.

Tuesday, May 30, 2006

Grouping Objects Into Tablespaces

We need to revisit our tablespaces for migration to a new set of hardware. So, I am thinking about how best to carve up our space into tablespaces. We have let our tablespaces and objects get out of hand. With applications and users putting data everywhere. For our new system, I plan on enforcing tablespace quotas. However, we still need to rethink how to logically divide up the Oracle database into tablespaces. Of course, the required tablespaces will be present: SYSTEM, TEMP, UNDOTBS, SYSAUX, etc. I prefer to keep everything system managed. No more manual extent and FREESP management for me.

The new system is a 10g data warehouse/DSS hybrid. So, tables with similar characteristics will be grouped into a tablespace. For example, all tables that are read-only could be grouped into a single, read-only tablespace. Tables with random I/O patterns could also be grouped together; all small tables should be grouped together. Segregate Indexes in separate tablespaces away from their tables. Partition really large table and indexes into separate tablespaces. Keep working or intermediate objects that will be dropped following a processing cycle in separate tablespaces. These are just a few examples.

Now how large is really large? For my databases, any tables over 100GB would be considered really large. Of course, what constitutes really large in your environment will depend upon the underlying hardware and applications of your database server.

Saturday, May 27, 2006

My First Blog

This is my first blog and first post. If you have any suggestions, please leave a comment. This blog will mainly deal with Oracle DBA topics. However, other topics will be covered as well. Work keeps me so busy that until now didn't think I needed a blog. Then, I attended a class by Mike Ault on RAC and saw his blog. Also, one of my best friends has his own DBA blog (SAMDBA). Thought it was a good idea because Mike said that it is useful for him to leave things on his blog that he can access from anywhere in the world. It would also be awesome to have feedback from other DBAs and provide information that can help other DBAs. I have a ton of notes that I keep and thought it would be better to just keep things that I use often in a blog. The tough part will be to not disclose any proprietary information from my real job. Again, if you have any questions or comments please leave them.