Following up on yesterday's post, where I suggested another reason to have alerting in place, today I explain what happened and the errors we saw in our system, which may help you down the road. A scheduled ASM disk maintenance didn't complete as planned because the system never completed (or never accepted) the commands issued, and several hours later the database cluster seemed to panic because of ASM, with the following message:
ORA-00020: maximum number of processes 100 exceeded
ORA-20 errors will not be written to the alert log for
the next minute. Please look at trace files to see all
the ORA-20 errors.
As a result, the cluster had to be restarted. What did that mean for us? Well, this cluster is remote to our local database cluster, and overnight the business had the new R12 Create Accounting report scheduled to run during the remote cluster outage. This report (and several others) errored with several messages:
ORA-00060: deadlock detected while waiting for resource
ORA-01591: lock held by in-doubt distributed transaction <ID>
This ID and error message were also found in our local database log at the point the transaction became in-doubt, so this is an opportunity to improve our alerting: catching it the next time it happens would help us avoid extended locking in the application.
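As a rough idea of what that alerting could look like, here is a minimal sketch in Python that scans log lines for the two errors quoted in this post (ORA-00020 and ORA-01591) and pulls out the in-doubt transaction ID when one is present. This is not what we actually run: the function name, the `Alert` record, and the assumption that the transaction ID appears as three dot-separated numbers are all mine, so adjust them to whatever your own log lines look like.

```python
import re
from typing import Iterable, List, NamedTuple, Optional

# Error codes taken from the messages quoted in this post.
WATCHED_CODES = ("ORA-00020", "ORA-01591")

# Matches a full five-digit ORA code, so the short form "ORA-20" is ignored.
ORA_CODE = re.compile(r"\bORA-\d{5}\b")

# Assumption: the in-doubt transaction ID looks like three dot-separated numbers.
TXN_ID = re.compile(r"in-doubt distributed transaction\s+(\d+\.\d+\.\d+)")


class Alert(NamedTuple):
    line_no: int           # 1-based position of the line in the input
    code: str              # the ORA code that triggered the alert
    txn_id: Optional[str]  # in-doubt transaction ID, if the line carries one
    text: str              # the log line, stripped of surrounding whitespace


def scan_log(lines: Iterable[str], watched=WATCHED_CODES) -> List[Alert]:
    """Return one Alert per line whose first ORA code is in `watched`."""
    alerts = []
    for line_no, line in enumerate(lines, start=1):
        match = ORA_CODE.search(line)
        if match is None or match.group(0) not in watched:
            continue
        txn = TXN_ID.search(line)
        alerts.append(
            Alert(line_no, match.group(0), txn.group(1) if txn else None, line.strip())
        )
    return alerts
```

To use it, pass an open file (or any iterable of lines) to `scan_log` and hand the resulting alerts to whatever notification tool you already have. The point is only to notice the in-doubt transaction when it is logged, rather than hours later when reports start failing.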
Obviously, when these reports ran the next day, after the remote database cluster was back up, they ran successfully. For the most part. I'll tell you more about that later.