Showing posts with label statement suspended. Show all posts
Showing posts with label statement suspended. Show all posts

Friday, February 28, 2014

OBIEE and why TEMP space is really important

Several months ago I shared that the DBAs were proactive in creating a TEMP space alert so we could see dangerous activity before it caused our EBS application a problem.  Releasing a new version of OBIEE to run against our Financials database, nightly they were doing "test" runs for several months of data to iron out any potential problems with their schemas, etc. before they turn the dashboards on in a few weeks.  Each night, we would see our TEMP space alert go off and while it did not reach extreme levels, we were concerned enough to let the OBIEE and DBA teams know about it before they did the entire dataload.

A few days/weeks go by, and they finally are ready to do the entire dataload.  What do you think happens?  That's right, we ran out of TEMP space so the database just paused like I've reported before on "statement suspended, wait error to be cleared" wait events until TEMP space was added so the dataload could be completed over to the Financials database.  Not only is this another chance to reiterate how important TEMP is, but ask the right questions about applications that hook into yours even if you are told it is just to extract data.

Friday, August 2, 2013

Database FLASH exhaustion

Have you seen a database exhaust FLASH before?  Well if not, here's an introduction!

New errors detected in "/u01/app/oracle/diag/rdbms/db/dbnode/trace/dbname_arc3_17851.trc":
===========================================================
ORA-19816: WARNING: Files may exist in db_recovery_file_dest that are not known to database.
ORA-17502: ksfdcre:4 Failed to create file +FLASH
ORA-15041: diskgroup "FLASH" space exhausted

During this time, we have flashback logs which are using up more than a TB of space which our DBA found by going into ASMCMD to see that FLASH is consuming a lot of space:

[oracle@server trace]$ asmcmd
ASMCMD> lsdg
State    Type    Rebal  Sector  Block       AU  Total_MB  Free_MB  Req_mir_free_MB  Usable_file_MB  Offline_disks  Voting_files  Name
MOUNTED  NORMAL  N         512   4096  4194304  15593472  6107124          5197824          454650              0             N  DATA/
MOUNTED  NORMAL  N         512   4096  4194304    894720   893272           298240          297516              0             Y  DBFS_DG/
MOUNTED  NORMAL  N         512   4096  4194304   3896064   626852          1298688          
-335918              0             N  FLASH/
ASMCMD> exit

Here the DBA can see how much space is used by the flashback logs:

ASMCMD> du FLASHBACK/
Used_MB      Mirror_used_MB
891468             1782936

What does this mean to us?  Well, this is a typical snapshot of our system at this time (with our handy dandy alert we created months earlier):

Waits                                                                 Wait Time
  Backup: MML write backup piece                   5482.197026
  SQL*Net message from dblink                        4727.746732
  TCP Socket (KGAS)                                      762.1687

This is a snapshot of our system when it starts running out of FLASH:

Waits                                                                 Wait Time
  statement suspended, wait error to be cleared   212154.60987
  inactive transaction branch                                11951.353872
  SQL*Net message from dblink                         2742.039774
  TCP Socket (KGAS)                                       1175.376008
  SQL*Net break/reset to client                          418.624794

The DBA cleaned up archive logs older than 24 hours, and now we have space to grow and those sessions which were suspended are now able to resume activity:

ASMCMD> lsdg
State    Type    Rebal  Sector  Block       AU  Total_MB  Free_MB  Req_mir_free_MB  Usable_file_MB  Offline_disks  Voting_files  Name
MOUNTED  NORMAL  N         512   4096  4194304  15593472  6107108          5197824          454642              0             N  DATA/
MOUNTED  NORMAL  N         512   4096  4194304    894720   893272           298240          297516              0             Y  DBFS_DG/
MOUNTED  NORMAL  N         512   4096  4194304   3896064  1393468          1298688           47390              0             N  FLASH/
ASMCMD>

As the DBA tells us, if we use our archive application and delete 300 GB deletion of data from the DATA diskgroup, it gets into the FLASH diskgroup in the flashback logs as 600 gigs with mirroring.  This results in the 1.7TB of usage by flashback logs which is abnormal and as you can see takes up too much space.

Thursday, March 7, 2013

Journal Import: Suspended!

Off of our wait event alert, there was a session that just sitting in "statement suspended, wait error to be cleared" which was a new wait event I had not seen before.  Doing some research I found it to be a Journal Import report which was in the process of deleting rows per this statement:

delete from GL_INTERFACE
where (user_je_source_name, nvl(group_id, -1)) in (listing of ledgers) and status = 'PROCESSED'
and request_id = user request
and rownum <= 5000
and (user_je_source_name, nvl(group_id, -1)) IN (SELECT src.user_je_source_name, nvl(group_id,-1) FROM gl_interface_control ic, gl_je_sources src WHERE ic.je_source_name = src.je_source_name
AND   ic.interface_run_id = run id
AND   ic.set_of_books_id = set of books id
AND   ic.status = 'I'
AND   nvl(ic.processed_table_code,'D') != 'S')

I can see in the table dba_outstanding_alerts information about this session as such:

Operation on resumable session Session 1579, Instance 1 session id 1579 suspended because of errors in rollback segment _SYSSMU1260_2942144267$. Error message is ORA-01628: max # extents (32765) reached for rollback segment _SYSSMU1260_2942144267$

Now the error message seems to be straightforward, and My Oracle Support notes ID 761848.1 & 788060.1 give some anecdotal evidence for what is going on, but we weren't able to get the root cause addressed in time to allow the report to complete successfully as it timed out internally and resulted in the report completing in Error.  This means I'm not exactly sure how we could have resolved it, yet I have some suspicion that if we could have expanded the maximum number of extents for this segment we would have seen the session resume and move along.  I'll put this down under as a mystery which needs research, because we need to learn more about it for next time when the report in question is something a lot more important and we can't go back to Oracle with an SR to fix.