Steven Edwards' thoughts on various topics, Oracle related and not. Note: I reserve the right to delete comments that are not contributing to the overall theme of the Blog or are insulting or demeaning to anyone.
Wednesday, February 01, 2017
Oracle ASM Header Backup -- Update
There are multiple ways to skin this cat. Another option for backing up your ASM file headers is to use "kfed". Yes, I know, Oracle Support doesn't recommend that we use it. However, if you ever need to recover the header, you will be very glad you did. Below is a partial example of a backup using kfed:
/u01/app/oracle/product/12.1.0/grid_1/bin/kfed read ${ASM_DEV_DIR}/${i} text=${HDR_BACKUP_DIR}/${i}.${TODAY}.bkp
Utilizing "kfed read" will allow you to recover the header in case of corruption. If you have header corruption and your other methods of header backup (like md_backup and dd's) do not work, then you can tell Oracle Support that you have a kfed file of the header from before the corruption.
As with any corruption, make sure you open an SR with Oracle Support before attempting any repair yourself.
For more information, review the information in My Oracle Support (MOS) regarding kfed.
Thursday, December 08, 2016
Wow, 6 Years Without A Post
[http://www.EdwinMSarmiento.com]
So, what is a 19 year experienced Oracle DBA doing learning Microsoft SQL Server? I'm Starting to work with different RDBMSs and NOSQL technologies. Which is really a blast after being limited to only one vendor. I'm also getting to work with DataStax [Cassandra]. Who said you can't teach an old dog new tricks?
In Edwin's presentation, he talked about the 3M's with blogging:
MASTERY
- Writing helps CLARIFY your thinking
- Writing forces you to VALIDATE your assumptions
- Writing helps REINFORCE your knowledge
MARKETING
- Writing helps ESTABLISH CREDIBILITY
- Blogging builds your ONLINE REPUTATION (e.g. Brent Ozar, Pinal Dave, ...)
- Writing builds CONNECTIONS with readers
- Writing helps you see other's PERSPECTIVE
- Writing develops valuable RELATIONSHIPS
Thursday, December 09, 2010
10g Impdp ORA-00959 $deleted$ Fix
Import: Release 10.2.0.4.0 - 64bit Production on Monday, 06 December, 2010 17:57:27 Copyright (c) 2003, 2007, Oracle. All rights reserved. ;;; Connected to: Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bit Production With the Partitioning, Real Application Clusters, OLAP and Real Application Testing options Master table "SYSTEM"."SYS_IMPORT_FULL_01" successfully loaded/unloaded Starting "SYSTEM"."SYS_IMPORT_FULL_01": parfile=hm_camp_m010.par Processing object type TABLE_EXPORT/TABLE/TABLE ORA-39083: Object type TABLE failed to create with error: ORA-00959: tablespace '_$deleted$16$0' does not exist
Basically, bombed because the tablespace _$deleted$16$0 doesn't exist. Looked it up on MetaLink and found this document:
Import Partition Table Fails With Ora-00959: Tablespace '_$Deleted$0$0' [ID 798142.1]
There is an open unpublished bug where basic problem is that a tablespace is renamed to one that used to exist on the database. Here are the steps that will cause this problem.
create tablespace aa
create tablespace bb
drop tablespace aa
rename bb to aa
Then why it occurs is because when a tablespace is dropped, its ts$ entry is set to an invalid state, but it is not deleted. When a rename is done, if the target name already has a ts$ entry, then during the rename, because we cannot have two ts$ entries with the same name, the _$deleted$ construct is introduced. The change is not taken care of properly, and is flushed to other dictionary tables causing this problem.The note didn't have good instructions to fix my issue. So, I'm recording what I did here:
- Run impdp with this parameter SQLFILE=dir:name to pull the DDL for the object.
- Modify and execute the SQL created in step 1.
- Run impdp with this parameter CONTENT=DATA_ONLY to bypass the Metadata portion of impdp steps.
In step 2 you take the create table SQL and modify it by creating the table in a tablespace that does exist.
Hopefully, this will help you.
Oracle ASM Header Backup
One of my DBA friends shared this information with me which I reviewed and researched. Kudos to my friend.
A situation popped up recently for one team where a drive went into predictive failure, and a controller hung, corrupting the ASM headers upon reboot. This resulted them having to completely rebuild the disk groups and restore the database from tape. If the team would have had a backup of their ASM headers, all of that wouldn’t have been necessary.
ASM metadata corruption is a clear and present danger that we as DBAs have to recognize and prepare for. It is nothing to be embarrassed about, and nothing to fear, as long as you are ready for it.
Oracle 11g
md_backup
md_backup is ran from the ASMCMD> prompt. It very simply creates a file that is a backup of your metadata, for all of your disk groups, or just a few if you want to specify. By default, it writes to the current directory and backs up all of the mounted disk groups header files.
Arguments: -b and –g
-b Specifies a location to store the backup file
-g Specifies the disk group to back up
Example: ASMCMD> md_backup –b /mydir/somebackupfile –g DG1 –g DG3
This will backup disk groups DG1 and DG3 to the file /mydir/somebackupfile.
md_restoremd_restore is a bit more involved than md_backup. As you may have guessed, it restores previously backed up metadata to disk groups you specifiy.
Arguments: -b –i –t –f –g –o
-b Specifies the backup file to read
-i Ignores errors. (Normally, if md_restore hits an error, it stops. This argument suppressed that.
-t Specifies the type of disk group to be created/restored. “full” creates a disk group and restores the metadata. “nodg” restores metadata only, and “newdg” creates a new disk group with a different name using in conjunction with the –o argument.
-o Renames the disk group
-f writes the SQL commands to be performed in the restore to a file instead of executing them (sql_script_file).
-g selects the disk groups to be restored. If none are specified, it restores all of them.
Example: ASMCMD> md_restore –t full –g DG1 –i backup_file
This will restore disk group DG1 using the specified file.
Example: ASMCMD> md_restore –t nodg –g DG1 –i backup_file
This will restore an existing disk group’s metadata.
Oracle 10g
When implementing ASM backups should not only be made of the data within ASM but the disk headers should also be backed up. Oracle 10g does not provide a way out of the box to do this but Oracle 11g provides two new commands in the asmcmd utility to backup disk headers. The header of a disk is comprised of the first 4096 bytes. While not supported it is possible to use the UNIX dd to read and write these bytes to a backup file.
dd if=device name of=unique header file bs=4096 count=1
That will capture the first 4k block of the ASM file. You would do this for each file, creating a separate backup for each. A simple dd will write it back in to the header, if you need to restore from your backup. Example:
dd if=/dev/rdisk/asmvol1 of=/u01/ora_backup/asmvo1_header.bak bs=4096 count=1
This would capture the first 4k of the given ASM file. Example:
dd if=/u01/ora_backup/asmvol1_header.bak of=/dev/rdisk/asmvol1
This would write in the backed up header back into the file it was backed up from.
It is always a good idea to keep backups of your disk group metadata, because no one can know when the corruption gremlin might strike.
Tuesday, May 12, 2009
VIP Address Fun
10.0.0.0 - 10.255.255.255 (10/8 prefix)
172.16.0.0 - 172.31.255.255 (172.16/12 prefix
192.168.0.0 - 192.168.255.255 (192.168/16 prefix
Here is another link: http://www.faqs.org/rfcs/rfc1918.html
When the vipca silent installation fails because of the address range, you have to run it manually or issue the command:
$ORA_CRS_HOME/bin/srvctl add nodeapps -n crmnode1 -o /ora/ora10 -A 1.2.3.4/255.255.255.0/lan0
See Note 316583.1 VIPCA FAILS COMPLAINING THAT INTERFACE IS NOT PUBLIC for more information.
Just follow the instructions in the note and you will be fine. However, when I first followed the instructions of the note, I couldn't get the add nodeapps for the VIP to work. Turns out the note did not have the correct syntax. I pointed it out to Oracle Support and the author of the note updated the syntax. So, it is always good to verify syntax from a separate source if you have problems with Oracle documents.
Also, the cluvfy will indicate the following:
WARNING:
Could not find a suitable set of interfaces for VIPs.
Result: Node connectivity check failed.
Even though everything is OK, you are just using a non-standard private VIP.
Sunday, May 03, 2009
Legacy Import Tricks
We are importing a ton of data and here are some import tricks that are documented in Oracle MetaLink doc id 93763.1. The ones I'm listing here only apply to imports from direct exports. We had already done the other things in the note.
1. Increase the BUFFER size of the import
2. Create some huge redo log files to minimize the LGWR switches.
It was amazing. This MetaLink note actually works :-)
Cluvfy Open File Descriptors Failed
Check: Hard resource limit for "open file descriptors"
Node Name Available Required Comment
------------ ------------------------ ------------------------ ----------
coronet 4096 65536 failed
vail 4096 65536 failed
Result: Hard resource limit check failed for "open file descriptors".
Increase these kernel parameters in HP-UX ia64:
nfile=131072
maxfiles_lim=65536
Use kctune nfile and kctune maxfiles_lim to verify the correct values after the UNIX administrator corrects them.
More HP-UX ia64 VIP Fun
Before you begin a new Clusterware install, make sure when you receive your servers from the UNIX admins that the VIPs are not already configured and active. However, you should have the host and IP for the Virtual IP in the domain server and local hosts file.
Do netstat -inw at a UNIX prompt to see the interfaces. You should not see the VIP interface.
If it is configured or active, you can unplumb it if it is NOT aggregated (e.g. APA):
ifconfig lan0:1 unplumb
or if it is APA:
ifconfig lan900:1 0.0.0.0
By setting the IP address to 0.0.0.0 it disables the secondary interface.
/etc/rc.config.d/netconf file has the interfaces which are started at boot up. They can be commented out to remove an interface following a reboot.
Friday, April 24, 2009
HP-UX 11.31 IA64 Memory / Filecache
The filecache_min and filecache_max kernel parameters are the ones to check. These should be tuned to leave more memory for Oracle. You can also mount the file system with direct IO which bypasses file buffer cache. If you are using file system for your Oracle datafiles then you want those mount points mounted with direct IO.
So, we solved it by setting constant values for filecache_min and max instead of taking the default %. By lowering them significantly, we were able to give more memory back to Oracle and the other applications.
We still have quite a bit of sys memory being utilized. However, the filecache issue is solved. I'll update this post if we figure out why so much sys memory is utilized. Or, it could be "just the way it is".
Sunday, April 19, 2009
Bug 7171446 - NUMA Issues from 10.2.0.4 Patchset
Unless the system has been specifically set up and tuned for NUMA at both OS and database level then disable Oracle NUMA optimizations by setting the following in the pfile / spfile / init.ora used to start the instances:
_enable_NUMA_optimization=FALSE
_db_block_numa=1
At some point, Oracle should release a patch for bug 7171446. Once it is released, we will install the patch and remove the hidden parameters from the parameters.
CRS Log Directory Permissions
Here is a hint, if you are having CRS issues, always first check to make sure all of the directories exist for the CRS processes to create log files.
What was totally crazy, the other node owner, group and permissions were right. Don't know if the owner, etc. got goofed up during the patchset install for 10.2.0.4. or if it was another Clusterware merge patch. Never did figure out the why of it.
11.1 CLUVFY HP-UX ia64 Issues
Different MTU values used across network interface(s).
These different values are necessary because we are using Infiniband as the private interconnect and you want the larger Maximum Transmission Unit (MTU) value for the more robust interconnect. The public interface is a standard gigabit network so a lower value makes sense. So, we basically ignored that error because changing the Infiniband to a lower MTU value is not practical just to get a clean cluvfy before installing the Clusterware. For more info on MTU and Oracle RAC see MetaLink note: 341788.1
This is a known bug discussed in metalink note 758102.1. Root cause is BUG
7493420 fixed in 11.2. This configuration is valid since the interfaces across nodes have the same MTU. As long as the interfaces across the nodes have the same MTU, you are good to go.
The next issue had to do with shared storage. With HP-UX ia64 11.31, we created shared raw ASM LUNs then created aliases to those LUNs for the ASM diskstring. This storage is shared between the two nodes using EVA8000 storage. Cluvfy does not recognize that shared storage is available and it is working correctly. The failure message you get can be ignored. Here is the message:
Shared storage check failed on nodes "xxxxx"
In the known limitations section of the cluvfy readme, it clearly states the following:
"Sharedness check of SCSI disks is currently not supported."
If these are SCSI disks, then this error is expected as cluvfy cannot handle this check. As long as these disks can be seen from both nodes and has the correct permissions and ownership, ASM should install/work fine.
As long as your storage is working correctly, you can ignore the shared storage check because cluvfy is not able to verify multipath / autopath type software like that built in to HP-UX 11.31 using virtual disk devices on EVA8000 storage.
HP-UX Async IO
Ioctl ASYNC_CONFIG error, errno = 1
After further analysis on MetaLink, and assistance from Support, we determined that asynch IO was not configured. The following are the steps that we did as root to resolve the issue:
- created /etc/privgroup. Added the following entries in the file:
- dba RTPRIO RTSCHED MLOCK
- oinstall RTPRIO RTSCHED MLOCK
- /usr/sbin/setprivgrp -f /etc/privgroup
- getprivgrp dba
- getprivgroup oinstall
- cd /dev/async
- chown oracle:dba async
- chmod 660 async
We just had to say "Isn't that interesting..."
Saturday, July 07, 2007
ASM Disk Group and Securepath
The best practice is to ignore the individual devices in the path and use the secure virtual device file instead when creating your ASM disk groups. For example /hpap/dsk/hpap??
The secure virtual device takes away the confusion of the underlying devices which are just paths to the same device. Securepath handles the rest. Also, by using the secure virtual device file, the ASM disk group automatically shows up on all instances that use the same virtual device file.
Cannot Lock /etc/mnttab
bdf: cannot lock /etc/mnttab; still trying ...
bdf: cannot lock /etc/mnttab; still trying ...
bdf: cannot lock /etc/mnttab; still trying ...
bdf: cannot lock /etc/mnttab; still trying ...
The /etc/mnttab is the mounted file system table. Our SAN was showing that 3 disks had failed in the same 7 disk group. Oracle was trying to expdp to the mount points and NetBackup was trying to read from the same mount points. We couldn't get Oracle to shutdown or NetBackup either.
HP physically went and pulled out one of the bad disks. When the bad disk was physically ejected, Oracle came down and so did NetBackup. Turns out that there was only one bad disk in the Data Replication Group. The bad disk did not eject itself cleanly and hung up the other 2 disks. So, it looked like there were 3 bad disks. Once the bad disk was replaced, the disk group began leveling as normal. The /etc/mnttab became available and all data was present. There was no data loss. We started Oracle and everything looked fine. So, what looked like a bad issue, turned out to be a disk that was not ejected cleanly from the disk group that also locked up the /etc/mnttab file.
Go figure, still not impressed with HP storage especially the EVA5000. We have had lots of bad issues with EVA5000 storage everything from strange things like above to data corruption of the Oracle database to loosing mount points.
Tuesday, October 03, 2006
Opatch with RAC
Wednesday, September 27, 2006
ORA-29701 Cluster Manager
After stopping everything on the node, the ASM instance would not start and immediately barked at me with the same ORA-29701 error. At this point I asked someone else that has more experience with Oracle Clusterware than I do.
They checked it out and found that somehow the ASM /dev/rdsks and special files for the OCR and voting disks had changed ownership. Someone with root access must have run insf -e to reinstall the special files. Oh, great!
A sys admin had already created a shell script to change the ownership of the /dev/rdsks to oracle:dba and chmod them to 660. So, all we had to do was ask one of our sysadmins to run the script. They also had to manually chown root:oinstall /dev/voting and chown oracle:oinstall /dev/ocr.
So, if you get an ORA-29701 and can't figure it out, check the owner and permissions of your Oracle devices.
Sunday, September 10, 2006
Database Control RMAN Wizard Bug
Now, without the undo tablespace, database recovery is not possible. However, the EM RMAN wizard will not allow you the option of adding the undo tablespace to the list of tablespaces to be selected for backup.
So, here is the workaround:
- Modify the RMAN script that is created and manually include the undo tablespace in your list of tablespaces to be backed up. Then submit your job.
I will be creating some RMAN backups for a 10.2 database soon and will let you know if this bug is also in 10gR2.
Saturday, September 09, 2006
Old Dog New Tricks
tnsnames.ora changes do not require restart of the listener. Kind of cool. And may favorite is "lsnrctl reload". The reload resets the listener without stopping and starting it.
Friday, September 08, 2006
HP EVA8000 Autopath Tuning
The UNIX administrators, HP, and my peers began looking into the problem. I wasn't totally engaged but want to post the information anyway. Props to my colleages. This is more system administration territory; however, as DBAs we need to know the impact of EVA tuning.
HP checked the load balancing policy on the nodes. It was set to “No Load Balancing” which greatly impacts performance. Now, how it was changed to "No Load Balancing" after the first I/O benchmarks is still a mystery.
To see the LUNs, issue the command “autopath display” as root. This will list all the LUNs and show the HP Autopath load balancing policy.
root@hostname:/ # autopath display
...
Load Balancing Policy : No Load Balancing
...
So, HP and my peers recommended that the HP Autopath load balancing policy be set to round robin for all LUNs. The autopath set_lbpolicy command sets the load balancing policy for the specified device path.
autopath set_lbpolicy <{policy name} {path}>
description: sets load balancing policy
usage: autopath set_lbpolicy <{policy name} {path}>
Policy name: The load balancing policy to set
Valid policies are
RR : Round Robin.
SST : Shortest Service Time.
SQL : Shortest Queue Length.
NLB/OFF : No load Balancing.
Path : Device Special File e.g./dev/dsk/c#t#d#
Example:
# autopath set_lbpolicy RR /dev/dsk/c0t0d0
The example above sets the policy to Round Robin.
Here is a little more information about what the policies mean:
- RR : Round Robin. - I/O is routed through the paths of the active controller in round-robin order
- SST : Shortest Service Time. - is a measurement against the average service time of the IOs on a path
- SQL : Shortest Queue Length. - is a calculation on the device queue depth when it is on a certain path.
- NLB/OFF : No load Balancing. - No consideration given to service times or queue lengths, typically impacts performance.