Several months ago I shared that the DBAs were proactive in creating a TEMP space alert so we could see dangerous activity before it caused our EBS application a problem. Releasing a new version of OBIEE to run against our Financials database, nightly they were doing "test" runs for several months of data to iron out any potential problems with their schemas, etc. before they turn the dashboards on in a few weeks. Each night, we would see our TEMP space alert go off and while it did not reach extreme levels, we were concerned enough to let the OBIEE and DBA teams know about it before they did the entire dataload.
A few days/weeks go by, and they finally are ready to do the entire dataload. What do you think happens? That's right, we ran out of TEMP space so the database just paused like I've reported before on "statement suspended, wait error to be cleared" wait events until TEMP space was added so the dataload could be completed over to the Financials database. Not only is this another chance to reiterate how important TEMP is, but ask the right questions about applications that hook into yours even if you are told it is just to extract data.
Showing posts with label statement suspended. Show all posts
Showing posts with label statement suspended. Show all posts
Friday, February 28, 2014
Friday, August 2, 2013
Database FLASH exhaustion
Have you seen a database exhaust FLASH before? Well if not, here's an introduction!
New errors detected in "/u01/app/oracle/diag/rdbms/db/dbnode/trace/dbname_arc3_17851.trc":
===========================================================
ORA-19816: WARNING: Files may exist in db_recovery_file_dest that are not known to database.
ORA-17502: ksfdcre:4 Failed to create file +FLASH
ORA-15041: diskgroup "FLASH" space exhausted
During this time, we have flashback logs which are using up more than a TB of space which our DBA found by going into ASMCMD to see that FLASH is consuming a lot of space:
Here the DBA can see how much space is used by the flashback logs:
ASMCMD> du FLASHBACK/
What does this mean to us? Well, this is a typical snapshot of our system at this time (with our handy dandy alert we created months earlier):
Waits Wait Time
Backup: MML write backup piece 5482.197026
SQL*Net message from dblink 4727.746732
TCP Socket (KGAS) 762.1687
This is a snapshot of our system when it starts running out of FLASH:
Waits Wait Time
statement suspended, wait error to be cleared 212154.60987
inactive transaction branch 11951.353872
SQL*Net message from dblink 2742.039774
TCP Socket (KGAS) 1175.376008
SQL*Net break/reset to client 418.624794
The DBA cleaned up archive logs older than 24 hours, and now we have space to grow and those sessions which were suspended are now able to resume activity:
ASMCMD> lsdg
State Type Rebal Sector Block AU Total_MB Free_MB Req_mir_free_MB Usable_file_MB Offline_disks Voting_files Name
MOUNTED NORMAL N 512 4096 4194304 15593472 6107108 5197824 454642 0 N DATA/
MOUNTED NORMAL N 512 4096 4194304 894720 893272 298240 297516 0 Y DBFS_DG/
MOUNTED NORMAL N 512 4096 4194304 3896064 1393468 1298688 47390 0 N FLASH/
ASMCMD>
As the DBA tells us, if we use our archive application and delete 300 GB deletion of data from the DATA diskgroup, it gets into the FLASH diskgroup in the flashback logs as 600 gigs with mirroring. This results in the 1.7TB of usage by flashback logs which is abnormal and as you can see takes up too much space.
New errors detected in "/u01/app/oracle/diag/rdbms/db/dbnode/trace/dbname_arc3_17851.trc":
===========================================================
ORA-19816: WARNING: Files may exist in db_recovery_file_dest that are not known to database.
ORA-17502: ksfdcre:4 Failed to create file +FLASH
ORA-15041: diskgroup "FLASH" space exhausted
During this time, we have flashback logs which are using up more than a TB of space which our DBA found by going into ASMCMD to see that FLASH is consuming a lot of space:
[oracle@server trace]$ asmcmd
ASMCMD> lsdg
State Type Rebal Sector Block AU Total_MB Free_MB Req_mir_free_MB Usable_file_MB Offline_disks Voting_files Name
MOUNTED NORMAL N 512 4096 4194304 15593472 6107124 5197824 454650 0 N DATA/
MOUNTED NORMAL N 512 4096 4194304 894720 893272 298240 297516 0 Y DBFS_DG/
MOUNTED NORMAL N 512 4096 4194304 3896064 626852 1298688
-335918 0 N FLASH/
ASMCMD> exit
Here the DBA can see how much space is used by the flashback logs:
ASMCMD> du FLASHBACK/
Used_MB Mirror_used_MB
891468 1782936
What does this mean to us? Well, this is a typical snapshot of our system at this time (with our handy dandy alert we created months earlier):
Waits Wait Time
Backup: MML write backup piece 5482.197026
SQL*Net message from dblink 4727.746732
TCP Socket (KGAS) 762.1687
This is a snapshot of our system when it starts running out of FLASH:
Waits Wait Time
statement suspended, wait error to be cleared 212154.60987
inactive transaction branch 11951.353872
SQL*Net message from dblink 2742.039774
TCP Socket (KGAS) 1175.376008
SQL*Net break/reset to client 418.624794
The DBA cleaned up archive logs older than 24 hours, and now we have space to grow and those sessions which were suspended are now able to resume activity:
ASMCMD> lsdg
State Type Rebal Sector Block AU Total_MB Free_MB Req_mir_free_MB Usable_file_MB Offline_disks Voting_files Name
MOUNTED NORMAL N 512 4096 4194304 15593472 6107108 5197824 454642 0 N DATA/
MOUNTED NORMAL N 512 4096 4194304 894720 893272 298240 297516 0 Y DBFS_DG/
MOUNTED NORMAL N 512 4096 4194304 3896064 1393468 1298688 47390 0 N FLASH/
ASMCMD>
As the DBA tells us, if we use our archive application and delete 300 GB deletion of data from the DATA diskgroup, it gets into the FLASH diskgroup in the flashback logs as 600 gigs with mirroring. This results in the 1.7TB of usage by flashback logs which is abnormal and as you can see takes up too much space.
Thursday, March 7, 2013
Journal Import: Suspended!
Off of our wait event alert, there was a session that just sitting in "statement suspended, wait error to be cleared" which was a new wait event I had not seen before. Doing some research I found it to be a Journal Import report which was in the process of deleting rows per this statement:
delete from GL_INTERFACE
where (user_je_source_name, nvl(group_id, -1)) in (listing of ledgers) and status = 'PROCESSED'
and request_id = user request
and rownum <= 5000
and (user_je_source_name, nvl(group_id, -1)) IN (SELECT src.user_je_source_name, nvl(group_id,-1) FROM gl_interface_control ic, gl_je_sources src WHERE ic.je_source_name = src.je_source_name
AND ic.interface_run_id = run id
AND ic.set_of_books_id = set of books id
AND ic.status = 'I'
AND nvl(ic.processed_table_code,'D') != 'S')
I can see in the table dba_outstanding_alerts information about this session as such:
Operation on resumable session Session 1579, Instance 1 session id 1579 suspended because of errors in rollback segment _SYSSMU1260_2942144267$. Error message is ORA-01628: max # extents (32765) reached for rollback segment _SYSSMU1260_2942144267$
Now the error message seems to be straightforward, and My Oracle Support notes ID 761848.1 & 788060.1 give some anecdotal evidence for what is going on, but we weren't able to get the root cause addressed in time to allow the report to complete successfully as it timed out internally and resulted in the report completing in Error. This means I'm not exactly sure how we could have resolved it, yet I have some suspicion that if we could have expanded the maximum number of extents for this segment we would have seen the session resume and move along. I'll put this down under as a mystery which needs research, because we need to learn more about it for next time when the report in question is something a lot more important and we can't go back to Oracle with an SR to fix.
delete from GL_INTERFACE
where (user_je_source_name, nvl(group_id, -1)) in (listing of ledgers) and status = 'PROCESSED'
and request_id = user request
and rownum <= 5000
and (user_je_source_name, nvl(group_id, -1)) IN (SELECT src.user_je_source_name, nvl(group_id,-1) FROM gl_interface_control ic, gl_je_sources src WHERE ic.je_source_name = src.je_source_name
AND ic.interface_run_id = run id
AND ic.set_of_books_id = set of books id
AND ic.status = 'I'
AND nvl(ic.processed_table_code,'D') != 'S')
I can see in the table dba_outstanding_alerts information about this session as such:
Operation on resumable session Session 1579, Instance 1 session id 1579 suspended because of errors in rollback segment _SYSSMU1260_2942144267$. Error message is ORA-01628: max # extents (32765) reached for rollback segment _SYSSMU1260_2942144267$
Now the error message seems to be straightforward, and My Oracle Support notes ID 761848.1 & 788060.1 give some anecdotal evidence for what is going on, but we weren't able to get the root cause addressed in time to allow the report to complete successfully as it timed out internally and resulted in the report completing in Error. This means I'm not exactly sure how we could have resolved it, yet I have some suspicion that if we could have expanded the maximum number of extents for this segment we would have seen the session resume and move along. I'll put this down under as a mystery which needs research, because we need to learn more about it for next time when the report in question is something a lot more important and we can't go back to Oracle with an SR to fix.
Subscribe to:
Posts (Atom)