Login as sysdba
oradebug hanganalyze 3;
wait 2 minutes and note the location of the trace file
oradebug hanganalyze 3;
Start another sqlplus session as sysdba
ALTER SESSION SET EVENTS '10998 trace name context forever, level 1';
alter session set events 'immediate trace name SYSTEMSTATE level 10';
After generating the dumps, used the UNIX top command to look for Oracle processes that were hung. Found an Oracle user process that was using 99% CPU and all of the other Oracle processes were "waiting". So, I decided to kill the process. First I did ps -ef pipe to grep 99999 (where 99999 is the PID) to see what this process was. Then I determined that the process could be safely killed. Next, I killed it and the PID (Process Id) that was associated with the rogue process. [su - oracle followed by kill -9 99999] Once the PID was killed, all other Oracle processes began "running" and the crisis was over. My next step is to open a SR with Oracle Support and upload the trace files.
Check out the comments for more instructions by me.
2 comments:
I heard back from Oracle Support.
A true hang condition was NOT found. The database has only STUCK sessions which are not hung in a cycle deadlock but are unable to continue working due to a long wait condition. (e.g. A transactional lock wait).
Wait chains may reflect a performance problem scenario, which is what appears to have
happened in your case. Wait chains may be resolved by killing the session that is on top of
the waiting list, or by forcing it to release the resources that other processes require to keep
working (Commit or Rollback the transaction).
Today, I had a hang so severe that it was taking hours to create the trace files using the procedures that I outlined in this post. Decided to go ahead and stop the trace dumps after an hour,so we could get the database back in business. Had to do a shutdown abort to get the instance to shutdown.
Here is a note to check out on MetaLink 121779.1 Taking Systemstate Dumps when You cannot Connect to Oracle. Use that note whenever the system will not respond at all during a hang up.
Post a Comment