Steven Edwards' thoughts on various topics, Oracle related and not. Note: I reserve the right to delete comments that are not contributing to the overall theme of the Blog or are insulting or demeaning to anyone.
Friday, August 25, 2006
ASM Instance Changes
First, we had to increase the large pool to 100M. Next, we had to increase the number of processes. The number of processes should be the default (40) times the number of nodes in the Oracle cluster (40 * # of nodes).
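A minimal sketch of that arithmetic, assuming the values go into the ASM instance's parameter file as large_pool_size and processes. The function names are hypothetical helpers, not part of any Oracle tool, and the node count of 4 is only an example.

```shell
#!/bin/sh
# Hypothetical helper: PROCESSES = default of 40 (the figure cited above) * number of nodes.
asm_processes() {
    nodes=$1
    default_per_node=40
    echo $((default_per_node * nodes))
}

# Print both settings as init.ora-style lines for a given node count.
asm_init_params() {
    echo "large_pool_size=100M"
    echo "processes=$(asm_processes "$1")"
}

# Example: a 4-node cluster (assumed node count).
asm_init_params 4
```

For a 4-node cluster this prints processes=160. Check the result against your own environment before putting it in a parameter file.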
Two MetaLink Notes - Bookmark
- 368055.1 Deployment of very large databases (10TB to PB range) with Automatic Storage Management (ASM)
- 139272.1 HP-UX: Asynchronous i/o
I found both of these notes useful during our recent VLDB 10g RAC HP-UX Itanium install. A friend of mine at work found the second note and told me about it. Thanks.
Tuesday, August 22, 2006
We Have An Oracle Cluster
As a DBA, after installation your tasks are to administer your RAC environment at three levels:
- Instance Administration
- Database Administration
- Cluster Administration
For administering Real Application Clusters, use the following tools to perform administrative tasks in RAC:
- Cluster Verification Utility (CVU)—Install and use CVU before you install RAC to ensure that your configuration meets the minimum RAC installation requirements. Also use the CVU for on-going administrative tasks, such as node addition and node deletion.
- Enterprise Manager—Oracle recommends that you use Enterprise Manager to perform administrative tasks whenever feasible.
- Task-specific GUIs such as the Database Configuration Assistant (DBCA) and the Virtual Internet Protocol Configuration Assistant (VIPCA)
- Command-line tools such as SQL*Plus, Server Control (SRVCTL), the Oracle Clusterware command-line interface, and the Oracle Interface Configuration tool (OIFCFG)
I've got plenty of posts saved up and will enter them either tonight or soon. It has been crazy at work. Funny, how you learn so much more while in a storm than you do when things are calm. It is like that in life as well as at work. God has a way of tempering you in the storms that you face and bringing you thru them stronger and better than before. Man, I could preach on that especially with things going on in my personal life right now.
Tuesday, August 15, 2006
ORA-00600 [KGHALO4]
Solutions:
Apply the one-off patch for bug 4414666 or set the _enable_NUMA_optimization=FALSE parameter in the init.ora as a workaround. For an ASM instance, two parameters are needed:
_enable_NUMA_optimization = FALSE
_disable_instance_params_check = TRUE
Another solution is to apply the 10.2.0.2 patchset, which fixes this problem.
Because this bug happens whenever you are trying to create the ASM Instance, it would be advisable to just apply the one-off patch for the bug and then install the 10.2.0.2 patchset as normal once everything is created and working.
Monday, August 14, 2006
OUI Trace
./runInstaller -J-DTRACING.ENABLED=true -J-DTRACING.LEVEL=2
We used this method to review what the OUI was doing and determine that the OUI saw ServiceGuard (SG) and SG libraries on the cluster nodes. See my previous posts for more information.
HP Autopath and Oracle
/dev/rdsk/ocr
/dev/rdsk/voting
It was all because of the HP Autopaths and virtual storage platform (EVA8000). The Oracle Installer only allows entry of one path each for the OCR and voting disks. Therefore, we had to create the special file names with the mksf command in order to use OUI. See the HP portion in Note: 293819.1
We are now wondering about the creation of the ASM disk groups. Will HP autopathing cause us a problem? We should know soon.
Saturday, August 12, 2006
Cleanup Failed CRS Install on HP-UX Itanium
- srvctl stop nodeapps -n
(we didn't have to do this because our failed installs never got this far).
- As root:
- rm /sbin/init.d/init.cssd
- rm /sbin/init.d/init.crs
- rm /sbin/init.d/init.crsd
- rm /sbin/init.d/init.evmd
- rm /sbin/rc2.d/K001init.crs
- rm /sbin/rc2.d/K960init.crs
- rm /sbin/rc3.d/K001init.crs
- rm /sbin/rc3.d/K960init.crs
- rm /sbin/rc3.d/S960init.crs
- rm -Rf /var/opt/oracle/scls_scr
- rm -Rf /var/opt/oracle/oprocd
- rm /etc/inittab.crs
- cp /etc/inittab.orig /etc/inittab
- If they are not already down, kill the EVM, CRS, and CSS processes.
- rm -Rf /var/tmp/.oracle
- rm -Rf /tmp/.oracle
- remove the ocr.loc file
- rm -Rf <CRS install location>
- De-install the CRS home in the OUI
- Clean out the OCR and voting files with dd commands. Example:
- dd if=/dev/zero of=/dev/rdsk/voting bs=8192 count=2560
- dd if=/dev/zero of=/dev/rdsk/ocr bs=8192 count=12800
- rm -Rf /app/oracle/oraInventory
Once those are done, you can restart the OUI install of Clusterware at the very beginning. Oracle has also just released a new cleanup utility for failed CRS installs. Here is the link:
http://download-west.oracle.com/otndocs/products/clustering/deinstall/clusterdeconfig.zip
The new script didn't work for us either. Of course, I didn't try it after our system admins removed ServiceGuard so it may work now.
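Since most of the steps above are destructive, here is a rough dry-run sketch for a few of them. It prints each command instead of running it, so the plan can be reviewed first. The run wrapper, the DRY_RUN variable, and the function names are my own hypothetical helpers, not part of Oracle's procedure. The device paths are the ones from our install.

```shell
#!/bin/sh
# DRY_RUN=1 only prints commands. Change it deliberately, as root, to really run them.
DRY_RUN=1

# Hypothetical wrapper: print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# Size in MB that dd writes for a given block size (bytes) and block count.
dd_size_mb() {
    echo $(( $1 * $2 / 1024 / 1024 ))
}

# A few of the steps from the list above, routed through the wrapper.
wipe_plan() {
    run rm -Rf /var/tmp/.oracle
    run rm -Rf /tmp/.oracle
    run dd if=/dev/zero of=/dev/rdsk/voting bs=8192 count=2560
    run dd if=/dev/zero of=/dev/rdsk/ocr bs=8192 count=12800
}

wipe_plan
echo "voting wipe: $(dd_size_mb 8192 2560) MB, ocr wipe: $(dd_size_mb 8192 12800) MB"
```

The dd counts work out to 20 MB for the voting disk and 100 MB for the OCR, so make sure the raw devices are at least that large before zeroing them.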
Friday, August 11, 2006
Clusterware Install Tips HP-UX Itanium
If you are not going to use HP ServiceGuard on your RAC cluster, make sure that all of ServiceGuard has been stopped and uninstalled. Don't leave any libraries lying around. We found this out the hard way, after spending over a week trying to figure out why the Oracle Installer was acting so crazy and bizarre.
Here are the symptoms: First, if the OUI does not give you the option to add the other nodes and you have to use a configuration file, this is a red flag that the OUI thinks that you are using some vendor cluster software (in this case HP ServiceGuard) instead of using Oracle's. Secondly, if you have some variables that are not assigned (see previous post) in the rootconfig script, this indicates that it is not really trying to install; rather, it is trying to upgrade/update the OCR.
If for some reason you get the clusterware services running on one of the nodes but they don't start on the others and lock up, it probably means that your removal of ServiceGuard was incomplete and left a few SG libraries lying around.
We found all of this out the hard way because HP installed and started ServiceGuard when they installed the HP 9000 Superdome!
Finally, here is an undocumented procedure for HP-UX Itanium Clusterware 10gR2 installation: Shut down the VIP interface BEFORE beginning the install. If you don't, an error message will appear saying that the VIP interface is being used by another system. Then you have to shut the VIPs down before continuing the install. This is weird because the VIP must be up in order for cluvfy nodecon to work.
Friday, August 04, 2006
CVU Shared Storage Accessibility Check
ERROR>/tmp/9999//bin/lsnodes: cannot get local node number
Although Cluster Verify Utility is an excellent tool for prerequisites, there is room for improvement. Since we did not run cluvfy as thoroughly on our first attempt, we did not encounter Bug 4714708 - Cvu Cannot See Shared Drives.
It turns out that CVU currently does not work with devices other than SCSI devices.
Thursday, August 03, 2006
Failed To Upgrade OCR Message
"The rootconfig script has a few problems. The CRS_HOST_NAME_LIST and CRS_NODE_NAME_LIST are not populated. The script failed at this point:
if $CH/bin/ocrconfig -upgrade $CRS_ORACLE_OWNER $CRS_DBA_GROUP;
then
  $ECHO "Oracle Cluster Registry configuration upgraded successfully"
else
  $ECHO "Failed to upgrade Oracle Cluster Registry configuration"
  exit 1
fi
libocr10.so is missing (running in debug mode). Commented this out and added environment variables and everything began to run. The script is trying to start init.crs but never starts the daemon."
Once we talked to Oracle Support about this, they suggested backing out the Clusterware install (Note 239998.1), then re-running cluvfy and checking for problems. In reviewing the CLUVFY output and installation documentation, we discovered missing OS patches and missing applications. HP C and C++ are not installed on all of the nodes.
We are canceling the re-install until all HP-UX Itanium patches (and those which are superseded) are installed. We are also waiting on an HP C and C++ install on the nodes which are missing them.
We are also asking Oracle Support to research whether or not GNU HP-UX Itanium C and C++ software can be substituted for the more costly HP licensed versions.
Wednesday, August 02, 2006
Clusterware Install Stopped at OCR Location
"The location /dev/dsk/xxx, entered for the Oracle Cluster Registry (OCR) is not shared across all the nodes in the cluster. Specify a shared raw partition or cluster file system file that is visible by the same name on all nodes of the cluster."
This happens because /dev/dsk/xxx is a block device, and OUI does not recognize or use block devices. So in order to get it to continue, we had to point to the /dev/rdsk/xxx raw character device instead.
You need to bind raw devices and use these raw devices within OUI. See MetaLink Note: 363995.1 for a little more detail.
We also had to do the same thing for the voting disk.
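A quick way to tell which kind of device file a path is before typing it into OUI is the standard shell test operators -c (character device) and -b (block device). This is only a sketch. device_kind is a hypothetical helper name, and /dev/null stands in for a real raw device so the example runs anywhere.

```shell
#!/bin/sh
# Report whether a path is a character (raw) device, a block device, or neither.
device_kind() {
    if [ -c "$1" ]; then
        echo "character"
    elif [ -b "$1" ]; then
        echo "block"
    else
        echo "other"
    fi
}

# /dev/null is a character device on any Unix system; substitute your own
# /dev/rdsk/... and /dev/dsk/... paths when checking OCR and voting disk candidates.
device_kind /dev/null
```

A path under /dev/rdsk should report "character", and that is the one OUI accepted for us.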
A Pattern is Developing...
CLUVFY User Equivalence Failure
The first issue was with the path to ssh. To solve this issue, we had to reference Oracle MetaLink Note 36598.1. Here is an excerpt from the Note:
"Generate a trace file using the cluvfy debugging environment variable SRVM_TRACE=TRUE as in Note 316817.1 and the resulting trace file will show something like: checkRemoteExecutionSetup:: Error checking user equivalence using Secured Shell '/usr/local/bin/ssh'; Fri May 12 15:38:18 CEST 2006. Cluvfy was looking in the wrong location for ssh. As it can be seen in the trace file, the location where the utility is searched is /usr/local/bin/, whilst it resides in /usr/bin.
To implement the solution, please execute the following steps:
- add the symbolic links in /usr/local/bin pointing to the real ssh location, i.e. /usr/bin. Do the same for the scp command.
- generate the ssh keys to all the cluster locations.
- Run cluvfy again"
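The symbolic link step from the note can be sketched as follows. link_ssh_tools is a hypothetical helper name. The two directories are passed in as arguments so the function can be tried in a scratch directory first. On the real nodes it would be run as root with /usr/bin and /usr/local/bin.

```shell
#!/bin/sh
# Create ssh and scp symlinks in the directory cluvfy searches,
# pointing at the directory where the binaries really live.
link_ssh_tools() {
    src_dir=$1   # real location, /usr/bin in the note
    dst_dir=$2   # location cluvfy looks in, /usr/local/bin in the note
    for tool in ssh scp; do
        # Leave anything that already exists at the destination alone.
        if [ ! -e "$dst_dir/$tool" ]; then
            ln -s "$src_dir/$tool" "$dst_dir/$tool"
        fi
    done
}

# As root on each node:
#   link_ssh_tools /usr/bin /usr/local/bin
```

Skipping existing files keeps the function safe to re-run on nodes that were already fixed.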
The second issue was with the banner. "ssh node date" would return a security banner in addition to the date. In order to get nodecon to work, I had to rename the /etc/issue.net (in HP-UX Itanium) to keep a banner from displaying. See MetaLink note: 338045.1. Of course, once we are finished with the install, I will rename it back.
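One way to spot the banner problem ahead of time is to check that "ssh node date" returns exactly one line. The sketch below assumes that any extra line (a banner, a login message) is what trips up the check. is_single_line is a hypothetical helper, and "node" is a placeholder host name.

```shell
#!/bin/sh
# Succeed only when the captured output is exactly one non-empty line.
is_single_line() {
    [ -n "$1" ] && [ "$(printf '%s\n' "$1" | wc -l | tr -d ' ')" = "1" ]
}

# Usage on a cluster node (node is a placeholder):
#   out=$(ssh node date 2>&1)
#   is_single_line "$out" || echo "extra output from ssh, check for a banner"
```

If the check fails, look at what else came back before blaming user equivalence itself.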
Monday, July 31, 2006
ASM File Locations
On the ASM Instance:
- Data Files
- Redo Files
- Control Files
- Archive Log Files
On Raw Partitions:
- Voting Disk
- OCR File
On the Local File System:
- Oracle Home Files
- Clusterware Home Files
- ASM Instance Home Files
- Alert Log, Trace Files
- Files for External Tables
- utl_file_dir location
X11 Forwarding From SUDO
$ Xlib: connection to "xxx.xx.xx.xxx" refused by server
Xlib: PuTTY X11 proxy: wrong authentication protocol attempted
Error: Can't open display: xxx.xx.xx.xxx:xx.x
Below are the steps used to successfully transfer xauth information to another user (Oracle in this case)
- Enable X11 forwarding on your terminal application and log in as yourself (as stated above)
- ‘chmod 644 .Xauthority’ (needs to be done every time service account needs access, it will reset when you log out)
- ‘become service account x’
- ‘xauth merge ~username/.Xauthority’ (needs to be done every time service account access is needed)
Once you get a copy of the .Xauthority file to /home/oracle it should work.
Sunday, July 30, 2006
10gR2 RAC Installation Guide
Here is my advice. Before beginning the install of RAC, read all of the Installation Guides specific to the release and OS that you will be using. Don't just rely on the one installation guide. Read the Release Notes also. And read the Oracle Database Oracle Clusterware and Oracle Real Application Clusters Administration and Deployment Guide. Check for documentation updates.
I plan on following the installation guide step by step. I will create other posts for the sections of the install that are different for HP-UX Itanium or are not clear in the Installation Guide.
Here are some notes that I gathered from the RAC Installation Guides which I felt were noteworthy:
- You must install Oracle Clusterware and Oracle Database in separate home directories. If you will use multiple Oracle Database homes with ASM, then you should install a separate Oracle Database home for ASM. You should create the listener in the Oracle Database Oracle home.
- With Oracle Database 10g Release 2 (10.2) with RAC, CSS has been modified to allow you to configure CSS with multiple voting disks. In 10g Release 1 (10.1), you could configure only one voting disk. By enabling multiple voting disk configuration, the redundant voting disks allow you to configure a RAC database with multiple voting disks on independent shared physical disks. If you intend to use multiple voting disks managed by Oracle Clusterware, then you must have at least three disks to provide sufficient redundancy, and you must ensure that each voting disk is located on physically independent storage.
- For OCR: Configure one disk if you have existing redundancy support. If you intend to use OCR mirroring managed by Oracle Clusterware, then you must have two OCR locations, and you must ensure that each OCR is located on physically independent storage.
- Although you can specify a logical volume as a device in an ASM disk group, Oracle does not recommend their use. Because logical volume managers can hide the physical disk architecture, ASM may not operate effectively when logical volumes are specified as disk group devices. (Got to find out about this one because we are going to use logical volumes with ASM)
- Cluster Verification Utility (CVU) is a tool that performs system checks. CVU is used to assist you with confirming that your system is properly configured for Oracle Clusterware and Oracle Real Application Clusters installation. CVU does not check kernel parameter settings. This issue is tracked with Oracle bug 4565046.
Oracle recommends that you use the following Oracle Database 10g features to simplify RAC database management:
- Oracle Enterprise Manager—Use Enterprise Manager to administer your entire processing environment, not just the RAC database. Enterprise Manager lets you manage a RAC database with its instance targets, listener targets, host targets, and a cluster target, as well as ASM targets if you are using ASM storage for your database.
- Automatic undo management—This feature automatically manages undo processing.
- Automatic segment-space management—This feature automatically manages segment freelists and freelist groups.
- Locally managed tablespaces—This feature enhances space management performance.
In addition to this being my first RAC install, this is also my first time to use ASM (Automatic Storage Management). Am I a glutton for punishment or what? New to RAC and ASM at the same time.
Friday, July 28, 2006
Setting up the VIPs with HP APA
First, some background: VIP stands for Virtual IP address. For more on HP APA, see a previous post of mine. When installing 10g RAC, you need at least 3 network interfaces for each node in the RAC cluster.
- A public interface for normal network communications to the node or partition.
- A virtual (public) interface which will be used for failover in case the primary public interface fails. Also used for RAC management.
- A private interface for the cluster interconnect.
We have 4 software APA NICs (Network Interface Cards) and two regular Gb NICs on each node of the RAC cluster. So, here are our questions:
- How do we use VIP with APA?
- Do they have to be on the same network subnet?
As always, the answer to the first question can be found in a MetaLink document: 296874.1, Configuring the HP-UX Operating System for the Oracle 10g VIP. As stated there, there are 2 ways of configuring HP-UX systems for network redundancy to be used for the Virtual IP:
- Oracle VIP with MC/ServiceGuard configured networks only, via multiple physical interfaces on many redundant networks.
- Oracle VIP with APA (i.e. NIC teaming) only, via a single logical interface on many redundant networks.
HP Auto Port Aggregation (APA) is a software product that creates link aggregates, often called "trunks," which provide a logical grouping of two or more physical ports into a single "Fat-Pipe". The link aggregates can be active/active (APA aggregate) or active/standby (hot standby mode). An IP is configured only on the single logical interface (usually lan90X), and failure of a single NIC would be transparent to applications that are dependent on a specific interface name.
Auto Port Aggregation (APA) is a NIC teaming solution provided by HP. Although APA is not required when using MC/ServiceGuard (since MC/ServiceGuard has its own network redundancy solution), it is worthwhile to note that a NIC teaming solution can provide highly available VIPs. APA will configure 2 physical NICs to 1 logical NIC interface. It is usually configured to be lan90X. All that needs to be done for the VIP is to configure it on that 1 logical NIC, similar to what is done on a single NIC configuration.
The answer to the second question is a simple "Yes". The public interface and the virtual interface must both be on the same subnet.
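To double-check the subnet requirement before the install, the public and virtual addresses can be compared after applying the netmask. This is a small sketch for IPv4 only. The function names are hypothetical, and the addresses in the usage comment are made-up examples, not our real ones.

```shell
#!/bin/sh
# Convert a dotted-quad IPv4 address (no leading zeros) to a single integer.
ip_to_int() {
    old_ifs=$IFS
    IFS=.
    set -- $1          # split the address into its four octets
    IFS=$old_ifs
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# Succeed when both addresses fall in the same subnet for the given netmask.
same_subnet() {
    a=$(ip_to_int "$1")
    b=$(ip_to_int "$2")
    m=$(ip_to_int "$3")
    [ $(( a & m )) -eq $(( b & m )) ]
}

# Usage (example addresses only):
#   same_subnet 192.168.1.10 192.168.1.110 255.255.255.0 && echo "same subnet"
```

If the check fails for the public address and the planned VIP, sort that out with the network admins before starting the Clusterware install.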
Thursday, July 27, 2006
10gR2 RAC Install -- Soon
With the large number of bugs in 10gR2 RAC, I expect some real challenges and multiple patches. There should be lots of entries on this blog to document the "fun".
Wednesday, July 26, 2006
ORA-12516 or TNS-12516
Every new remote connection into the database from GUI tools like TOAD and SQL Developer received this error. The listener was blocking connections. The problem went away on its own after the server and Oracle became less busy. The server was being slammed at the time of the errors.
(A friend of mine calls GUI tools point-click and drool environments)
After further research and a Service Request with Oracle Support, this is what I found. The error can happen when resources are low. Apparently, when resources are at a premium the listener can block new connections and return this ORA-12516. By increasing the initialization parameter PROCESSES, I hope to keep this from happening again.
Kind of a weird error and response from Oracle whenever resources are low. Oh well, strange error messages keep us gainfully employed.
See Oracle Note: 240710.1 on MetaLink for more information.
Monday, July 17, 2006
Security Checklist
http://www.oracle.com/technology/deploy/security/pdf/twp_security_checklist_db_database.pdf
Again, another white paper from Oracle. Use this checklist as a basis for your own and in accordance with your company's security requirements. Don't you just love audits and security?
How often do you apply the quarterly security patches? Did you know that each patchset released (e.g. 10.1.0.5) also incorporates the last security patch available during that time?
Food for thought and to chew on.