Tuesday, May 12, 2009

VIP Address Fun

During a recent RAC install, we were given an IP address to use for the VIP created at install time. However, the address we got from the network team was a "private" address, not a "public" one, and during the Clusterware install vipca failed, complaining that the interface is not public. As per RFC 1918, networks in the following ranges are considered private. For more details, please review http://tools.ietf.org/html/rfc1918

10.0.0.0 - 10.255.255.255 (10/8 prefix)
172.16.0.0 - 172.31.255.255 (172.16/12 prefix)
192.168.0.0 - 192.168.255.255 (192.168/16 prefix)

Here is another link: http://www.faqs.org/rfcs/rfc1918.html

When the vipca silent installation fails because of the address range, you have to run vipca manually or issue the following command:

$ORA_CRS_HOME/bin/srvctl add nodeapps -n crmnode1 -o /ora/ora10 -A 1.2.3.4/255.255.255.0/lan0
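
Once the add succeeds, a quick sanity check is to display the nodeapps configuration; this is just a sketch, and the node name is the one from the example above:

$ORA_CRS_HOME/bin/srvctl config nodeapps -n crmnode1 -a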

See Note 316583.1 VIPCA FAILS COMPLAINING THAT INTERFACE IS NOT PUBLIC for more information.

Just follow the instructions in the note and you will be fine. However, when I first followed the instructions in the note, I couldn't get the add nodeapps for the VIP to work. It turned out the note did not have the correct syntax. I pointed it out to Oracle Support and the author of the note updated the syntax. So, it is always good to verify syntax against a separate source if you have problems with Oracle documents.

Also, cluvfy will report the following:

WARNING:
Could not find a suitable set of interfaces for VIPs.
Result: Node connectivity check failed.

This can be ignored: everything is OK, you are just using a non-standard (private) address for the VIP.
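
If you want to see the warning in isolation, you can re-run just the node connectivity component; a rough sketch, with placeholder node names:

cluvfy comp nodecon -n node1,node2 -verbose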

Sunday, May 03, 2009

Legacy Import Tricks

This blog entry is for those unfortunate enough to be stuck with the old legacy exp/imp instead of expdp/impdp (Data Pump).

We are importing a ton of data, and here are some import tricks documented in Oracle MetaLink doc id 93763.1. The ones I'm listing here only apply to imports from direct exports; we had already done the other things in the note.

1. Increase the BUFFER size for the import.
2. Create some huge redo log files to minimize log switches (examples of both are sketched below).
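
As a rough sketch of both items (the file names, sizes and paths below are purely illustrative), the BUFFER setting goes on the imp command line and the big redo logs are added before the import starts:

imp system/password file=big_export.dmp full=y buffer=20971520 log=big_import.log

alter database add logfile group 5 ('/u01/oradata/ORCL/redo05a.log') size 2048m;
alter database add logfile group 6 ('/u01/oradata/ORCL/redo06a.log') size 2048m;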

It was amazing. This MetaLink note actually works :-)

Cluvfy Open File Descriptors Failed

If your cluvfy OS check fails with the following:

Check: Hard resource limit for "open file descriptors"
Node Name Available Required Comment
------------ ------------------------ ------------------------ ----------
coronet 4096 65536 failed
vail 4096 65536 failed
Result: Hard resource limit check failed for "open file descriptors".

Increase these kernel parameters in HP-UX ia64:

nfile=131072
maxfiles_lim=65536

Use kctune nfile and kctune maxfiles_lim to verify the correct values after the UNIX administrator corrects them.
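
The fix and the verification are both done with kctune as root; a quick sketch using the required values from the cluvfy output above (if a tunable cannot be changed on the fly, kctune will say so and the new value will only take effect after a reboot):

kctune nfile=131072
kctune maxfiles_lim=65536
kctune nfile
kctune maxfiles_lim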

More HP-UX ia64 VIP Fun

Here is a tip that will save you a headache if you are installing Clusterware on a new server.
Before you begin a new Clusterware install, make sure that when you receive your servers from the UNIX admins the VIPs are not already configured and active. You should, however, already have the VIP hostname and IP address in the DNS server and in the local hosts file.

Run netstat -inw at a UNIX prompt to list the interfaces. You should not see the VIP interface.

If it is configured or active, you can unplumb it as long as it is NOT part of an aggregate (e.g. APA):

ifconfig lan0:1 unplumb
or if it is APA:
ifconfig lan900:1 0.0.0.0

Setting the IP address to 0.0.0.0 disables the secondary (aliased) interface.
The /etc/rc.config.d/netconf file lists the interfaces that are configured at boot; comment out the VIP entries there so the interface does not come back after a reboot.
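
A quick way to see what will be plumbed at boot (the interface name here is just the APA example from above):

grep -n "lan900" /etc/rc.config.d/netconf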

Friday, April 24, 2009

HP-UX 11.31 IA64 Memory / Filecache

Following this last install, we noticed that our HP-UX 11.31 IA64 servers were using a lot of system memory. Over half of the available memory was being eaten up by sys memory and file cache. We determined that the default settings for the file cache were way too high, so we made some adjustments.

The filecache_min and filecache_max kernel parameters are the ones to check; these should be tuned to leave more memory for Oracle. You can also mount the file system with direct I/O, which bypasses the file buffer cache. If you are keeping your Oracle datafiles on a file system, you want those mount points mounted with direct I/O (see the example below).
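
For example, with OnlineJFS (VxFS) the direct I/O behavior comes from the mincache/convosync mount options; the volume and mount point below are placeholders, and the exact option set should be checked against your HP-UX and Oracle documentation:

mount -F vxfs -o delaylog,nodatainlog,mincache=direct,convosync=direct /dev/vg01/lvol_oradata /u01/oradata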

So, we solved it by setting constant values for filecache_min and filecache_max instead of taking the default percentages. By lowering them significantly, we were able to give memory back to Oracle and the other applications.
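
The change itself is just a couple of kctune calls as root; the byte values here are purely illustrative and need to be sized for your own server:

kctune filecache_min=1073741824
kctune filecache_max=4294967296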

We still have quite a bit of sys memory being utilized. However, the filecache issue is solved. I'll update this post if we figure out why so much sys memory is utilized. Or, it could be "just the way it is".

Sunday, April 19, 2009

Bug 7171446 - NUMA Issues from 10.2.0.4 Patchset

Here is a NUMA bug introduced in 10.2.0.4. If you are running HP-UX IA64 and your system starts locking up, or you see a number of unexpected problems such as skewed CPU usage and ORA-600 errors, review note 7171446.8 on MetaLink for the details.

Unless the system has been specifically set up and tuned for NUMA at both the OS and database level, disable the Oracle NUMA optimizations by setting the following in the pfile / spfile / init.ora used to start the instances:

_enable_NUMA_optimization=FALSE
_db_block_numa=1
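
If you are using an spfile, the underscore parameters can be staged ahead of the restart with something like the following (note the double quotes required around underscore parameter names):

alter system set "_enable_NUMA_optimization"=FALSE scope=spfile sid='*';
alter system set "_db_block_numa"=1 scope=spfile sid='*';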

At some point, Oracle should release a patch for bug 7171446. Once it is released, we will install the patch and remove the hidden parameters from the parameter file.

CRS Log Directory Permissions

During this last RAC install, we ran into an issue where the CRS home log directory for one of the nodes had the wrong owner, group and permissions, so CRS couldn't log an issue that it was having.

Here is a hint: if you are having CRS issues, always first check that all of the directories the CRS processes need for their log files exist and have the correct ownership and permissions.
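
A quick comparison between the good node and the bad node is usually enough; the paths below assume the standard 10.2 CRS log tree under $ORA_CRS_HOME/log/<hostname>:

ls -ld $ORA_CRS_HOME/log
ls -l $ORA_CRS_HOME/log/`hostname`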

What was totally crazy was that on the other node the owner, group and permissions were right. I don't know if the ownership got goofed up during the 10.2.0.4 patchset install or during a Clusterware merge patch; we never did figure out the why of it.

11.1 CLUVFY HP-UX ia64 Issues

For our latest install of 10.2 RAC on HP-UX 11.31 Itanium 64-bit, we used the 11g version of Cluster Verify (cluvfy). Here are the issues that we had. First, it complained about the MTU values being different between the private and public interfaces.

Different MTU values used across network interface(s).

These different values are intentional: we are using InfiniBand as the private interconnect, and you want the larger Maximum Transmission Unit (MTU) on the higher-bandwidth interconnect, while the public interface is a standard gigabit network, so a lower value makes sense there. So, we basically ignored that error, because lowering the InfiniBand MTU just to get a clean cluvfy run before installing the Clusterware is not practical. For more info on MTU and Oracle RAC, see MetaLink note 341788.1.

This is a known issue discussed in MetaLink note 758102.1; the root cause is bug 7493420, fixed in 11.2. The configuration is valid: as long as the interfaces across the nodes have the same MTU, you are good to go.
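
To confirm that the interfaces really do match across the nodes, the MTU column from netstat on each node is enough:

netstat -in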

The next issue had to do with shared storage. On HP-UX ia64 11.31, we created shared raw LUNs for ASM and then created aliases to those LUNs for the ASM diskstring. The storage is shared between the two nodes on an EVA8000 array. Cluvfy does not recognize that the shared storage is available and working correctly, and the failure message you get can be ignored. Here is the message:

Shared storage check failed on nodes "xxxxx"

In the known limitations section of the cluvfy readme, it clearly states the following:

"Sharedness check of SCSI disks is currently not supported."

If these are SCSI disks, then this error is expected, as cluvfy cannot handle this check. As long as these disks can be seen from both nodes and have the correct permissions and ownership, ASM should install and work fine (a quick manual check is sketched below).

As long as your storage is working correctly, you can ignore the shared storage check, because cluvfy is not able to verify multipath/autopath-type software like that built into HP-UX 11.31 with virtual disk devices on EVA8000 storage.
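
Since cluvfy cannot do the sharedness check here, a manual sanity check from each node is enough; the device path below is just a placeholder for one of your ASM LUNs or aliases, and you would run the same commands on both nodes:

ls -l /dev/rdisk/disk10
dd if=/dev/rdisk/disk10 of=/dev/null bs=8192 count=1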

HP-UX Async IO

After we installed a 10.2.0.4 RAC database on an HP-UX Itanium 64-bit platform, we noticed some errors related to asynchronous IO in the database trace files. Here is the message:

Ioctl ASYNC_CONFIG error, errno = 1

After further analysis on MetaLink, and with assistance from Support, we determined that asynchronous I/O was not configured. The following are the steps we took as root to resolve the issue:

  • Created /etc/privgroup and added the following entries to the file:
  • dba RTPRIO RTSCHED MLOCK
  • oinstall RTPRIO RTSCHED MLOCK
  • /usr/sbin/setprivgrp -f /etc/privgroup
  • getprivgrp dba
  • getprivgrp oinstall
  • cd /dev
  • chown oracle:dba async
  • chmod 660 async
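
A quick check after the change: /dev/async should now be owned by oracle:dba with mode 660, and the dba group should show the RTPRIO, RTSCHED and MLOCK privileges:

ls -l /dev/async
getprivgrp dba
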
This was an interesting issue because of the Oracle 10gR2 documentation: the Oracle Clusterware and Oracle Real Application Clusters Installation Guide for HP-UX doesn't include these procedures, but the Administrator's Reference for UNIX-Based Operating Systems does, in an appendix for HP-UX.

We just had to say "Isn't that interesting..."