Thursday, June 28, 2012

Users do what users do

If I had to write a book, the first chapter would be titled after this post.  Maybe more accurately, users do what users can do.  Last week I was going to write something up about this type of thing, but it makes SO much more sense after this week and introduces what I'll call Main Tenants of Support.

These Main Tenants are things we support individuals hold to be self evident, even if developers and project personnel don't understand a damn thing about them.

Our case:  We were given scripts from Oracle to drop and recreate a queue table that we were told by our developers wasn't being used since it had built up over 10 years of non-use.

rem stop queue: exec dbms_aqadm.stop_queue(‘AR.AR_REV_REC_Q’);

rem drop queue: exec dbms_aqadm.drop_queue(‘AR.AR_REV_REC_Q’);

rem drop queue table: exec dbms_aqadm.drop_queue_table(‘AR.AR_REV_REC_QT’);

rem create new queue table: begin DBMS_AQADM.CREATE_QUEUE_TABLE( Queue_table => ‘AR.AR_REV_REC_QT’, Queue_payload_type => ‘SYSTEM.AR_REV_REC_TYP’, storage_clause => ‘TABLESPACE AR’, Sort_list => ‘ENQ_TIME’, Multiple_consumers => TRUE, Compatible => ‘10.2’, comment => ‘Revenue Recognition Table’); end; /

rem create queue: exec dbms_aqadm.create_queue(‘AR.AR_REV_REC_Q’,’AR.AR_REV_REC_QT’);

rem start queue: exec dbms_aqadm.start_queue(‘AR.AR_REV_REC_Q’);

So instead of just purging it once and then letting another 10 years go by before we purge it again, we decided to wrap it into a concurrent program and let it run under 2 parameters:
  • drop and recreate queue
  • delete records from the queue
In DEV and TEST the report was run in 60-70 hours which seemed like an acceptable time frame for deleting 45-50 million records and rebuilding the queue, since this is something we've never had to do at this type of scale.

What we didn't know:  The queue table is used in base Oracle functionality for invoice processing.  So what you say?  The API for the drop queue was designed to delete all the rows AFTER it locked the object.

Error: Unable to create invoice due to an error: ORA-04021: timeout occurred while waiting to lock object AR.AR_REV_REC_Q
ORA-06512: at "APPS.RA_CUSTOMER_TRX


Why was this a problem:  That meant when we kicked off the concurrent program 10 PM Friday night, it prevented receipts from getting created in Financials from our web payment front end until we saw an alert on Monday shortly before noon that was telling us we had a backlog in receipt creation.  Killing the concurrent request didn't do anything because the process was still alive, so we had to mark the session for kill but the session was still alive.  It did stop what it was doing after getting past the 9 millionth record, but then it popped out to our shell of the concurrent program which was a problem because it found itself in a loop that it had no way out of.  I researched a bit and found a good article from Tanel Poder breaking down a situation a lot like ours, and I had some tricks in my bag from a previous issue involving the Revenue Recognition, to realize that the OS process needed to be killed manually.

What this shows us about our Main Tenant:  We were able to make the mistake of locking the queue.  Oracle told us to do this, so why would we believe it would have an issue?  We did exactly what the application allowed us to do, so we did it.  The same can be said of forms, fields, and just about anything else in the application so we need to be aware of how developers and project analysts (and Oracle) allow users to interact with the application.  Did we build a form that allows blind queries?  Does a report spend 3 days figuring out a Hash Join before running the actual meat and potatoes of the report in 15 minutes time?  It is the aftermath of many of these unasked, or unanswered, questions prior to implementation that I have to deal with on a daily basis.

Edit: 8/16/12 - Needed to change the link to Tanel's article since it was somehow linking the search engine URL instead.

No comments:

Post a Comment