Monday, October 21, 2013

R12: Application server craziness

A few days ago I let you know that we were moving to R12, but I never actually said on the blog whether we were successful.  Well, we were!  Yay us!  That brings new challenges, obviously, so I'm going to spend a week or two sharing some brand new content directly related to my experiences with 12.1.3.

The first thing I noticed in the hours following our validation and sign-off was that the application servers were running "hot," with OS load averages a bit higher than I'd normally seen.  I wasn't too surprised by that: our DBA group had held a Knowledge Transfer session with us, and one of the issues raised was a known bug, described in MOS Note 745711.1, that required us to set FORMS_RECORD_GROUP_MAX to 100,000.  The next morning, though, I thought differently.  Load averages were already FOUR times normal on EACH server, and nobody was really in and doing anything yet.  What really puzzled me was that the processes on the OS didn't track back to anything I could find in gv$session; not just INACTIVE sessions, they simply weren't there at all when I tried to match each OS PID against the gv$session.process column.  I figured maybe there was something new in how R12 handles session information.
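That PID cross-check can be scripted.  Below is a minimal sketch, assuming you've already collected the OS PIDs (e.g. from `ps`) and the gv$session.process values (from a query) separately; the function name and the sample PIDs are made up for illustration:

```python
# Hypothetical sketch: cross-check OS process IDs against the client PIDs
# recorded in gv$session.process.  Both lists are assumed to have been
# collected separately (ps output on the app server, and a query against
# gv$session on the database).

def unmatched_pids(os_pids, session_process_values):
    """Return OS PIDs with no corresponding gv$session.process entry.

    gv$session.process stores the client process ID as a string, sometimes
    with a suffix (e.g. '12345:67'), so compare on the leading portion.
    """
    known = {str(p).split(":")[0] for p in session_process_values}
    return sorted(pid for pid in os_pids if str(pid) not in known)

# Example with made-up PIDs:
print(unmatched_pids([1001, 1002, 1003], ["1001", "1003:2"]))  # → [1002]
```

Any PIDs the function returns are OS processes that, like ours, have no matching database session.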

I alerted the DBAs and kept digging while they did the same, and we came to the realization that not all of the application servers had been rebooted.  Why is that a critical issue?  A reboot of the application servers is required after the RPM patching that was applied to them for R12!  The DBAs also felt we needed to limit the number of rows returned, so we changed FORMS_RECORD_GROUP_MAX to 60,000 and did a rolling bounce of the application servers, just in time.  Why just in time?  When we started, load was FIFTY times normal, and by the time we finished it was north of SEVENTY times normal, with occasional intermittent failures appearing on the nodes still waiting for their reboot.
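For reference, the setting itself is just an environment variable in the Forms server environment.  A sketch of the change, assuming the environment-file form; the exact file (and whether AutoConfig manages it in your install) varies, so this is illustrative only:

```
# Hypothetical fragment of the Forms server environment file.
# Lowered from the 100,000 we set per MOS Note 745711.1 down to 60,000.
FORMS_RECORD_GROUP_MAX=60000
```

A bounce of the Forms/application tier is needed for the new value to take effect, which is why we paired it with the rolling reboot.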

After the reboots and the parameter change, the load never again went over our normal limits, and it has been a WEEK now with everything just fine!
