Wednesday, September 27, 2006

ORA-29701 Cluster Manager

I had something strange occur today. A user query kept returning ORA-29701: unable to connect to Cluster Manager. Therefore, I searched MetaLink for this error. But, I couldn't find anything that applied to my situation (10gR2 RAC on HP-UX Itanium). This error only showed up on one of the nodes. The other nodes and instances were fine.

After stopping everything on the node, the ASM instance would not start and immediately barked at me with the same ORA-29701 error. At this point I asked someone else that has more experience with Oracle Clusterware than I do.

They checked it out and found that somehow the ASM /dev/rdsks and special files for the OCR and voting disks had changed ownership. Someone with root access must have run insf -e to reinstall the special files. Oh, great!

A sys admin had already created a shell script to change the ownership of the /dev/rdsks to oracle:dba and chmod them to 660. So, all we had to do was ask one of our sysadmins to run the script. They also had to manually chown root:oinstall /dev/voting and chown oracle:oinstall /dev/ocr.

So, if you get an ORA-29701 and can't figure it out, check the owner and permissions of your Oracle devices.

Sunday, September 10, 2006

Database Control RMAN Wizard Bug

Here is a bug that I found in our 10gR1 systems. This bug has to do with Enterprise Manager (EM) Database Control 10.1.0.5 and scheduling backups with the Recovery Manager (RMAN) wizard. Whenever you do a custom RMAN backup and choose to only backup certain tablespaces, the UNDO tablespace is never given as an option.

Now, without the undo tablespace, database recovery is not possible. However, the EM RMAN wizard will not allow you the option of adding the undo tablespace to the list of tablespaces to be selected for backup.

So, here is the workaround:

  • Modify the RMAN script that is created and manually include the undo tablespace in your list of tablespaces to be backed up. Then submit your job.

I will be creating some RMAN backups for a 10.2 database soon and will let you know if this bug is also in 10gR2.

Saturday, September 09, 2006

Old Dog New Tricks

With all of the assistance I've had lately, this old dog has learned some new tricks. Now, everybody I'm sure already knows these; however, in my career I've had to learn things by the seat of my pants and on the job in production.

tnsnames.ora changes do not require restart of the listener. Kind of cool. And may favorite is "lsnrctl reload". The reload resets the listener without stopping and starting it.

Friday, September 08, 2006

HP EVA8000 Autopath Tuning

Here is some information that might be useful to you. Following our RAC cluster setup and install, the EVA 8000 performance was very disappointing. First, our Oracle performance was disturbing. Then we did some more I/O benchmarks using "dd" and found that I/O performance was actually worse than the first baseline benchmark. So, this indicated a problem with the underlying storage and not the Oracle setup.

The UNIX administrators, HP, and my peers began looking into the problem. I wasn't totally engaged but want to post the information anyway. Props to my colleages. This is more system administration territory; however, as DBAs we need to know the impact of EVA tuning.

HP checked the load balancing policy on the nodes. It was set to “No Load Balancing” which greatly impacts performance. Now, how it was changed to "No Load Balancing" after the first I/O benchmarks is still a mystery.

To see the LUNs, issue the command “autopath display” as root. This will list all the LUNs and show the HP Autopath load balancing policy.

root@hostname:/ # autopath display
...
Load Balancing Policy : No Load Balancing
...

So, HP and my peers recommended that the HP Autopath load balancing policy be set to round robin for all LUNs. The autopath set_lbpolicy command sets the load balancing policy for the specified device path.

autopath set_lbpolicy <{policy name} {path}>
description: sets load balancing policy
usage: autopath set_lbpolicy <{policy name} {path}>
Policy name: The load balancing policy to set
Valid policies are
RR : Round Robin.
SST : Shortest Service Time.
SQL : Shortest Queue Length.
NLB/OFF : No load Balancing.
Path : Device Special File e.g./dev/dsk/c#t#d#

Example:
# autopath set_lbpolicy RR /dev/dsk/c0t0d0
The example above sets the policy to Round Robin.

Here is a little more information about what the policies mean:
  • RR : Round Robin. - I/O is routed through the paths of the active controller in round-robin order
  • SST : Shortest Service Time. - is a measurement against the average service time of the IOs on a path
  • SQL : Shortest Queue Length. - is a calculation on the device queue depth when it is on a certain path.
  • NLB/OFF : No load Balancing. - No consideration given to service times or queue lengths, typically impacts performance.