Friday, August 2, 2013

Database FLASH exhaustion

Have you seen a database exhaust its FLASH diskgroup before?  If not, here's an introduction!

New errors detected in "/u01/app/oracle/diag/rdbms/db/dbnode/trace/dbname_arc3_17851.trc":
===========================================================
ORA-19816: WARNING: Files may exist in db_recovery_file_dest that are not known to database.
ORA-17502: ksfdcre:4 Failed to create file +FLASH
ORA-15041: diskgroup "FLASH" space exhausted

At the time, flashback logs were using more than a TB of space. Our DBA found this by going into ASMCMD, where lsdg shows the FLASH diskgroup with a negative Usable_file_MB:

[oracle@server trace]$ asmcmd
ASMCMD> lsdg
State    Type    Rebal  Sector  Block       AU  Total_MB  Free_MB  Req_mir_free_MB  Usable_file_MB  Offline_disks  Voting_files  Name
MOUNTED  NORMAL  N         512   4096  4194304  15593472  6107124          5197824          454650              0             N  DATA/
MOUNTED  NORMAL  N         512   4096  4194304    894720   893272           298240          297516              0             Y  DBFS_DG/
MOUNTED  NORMAL  N         512   4096  4194304   3896064   626852          1298688         -335918              0             N  FLASH/
ASMCMD> exit
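
The negative Usable_file_MB for FLASH follows from the other columns. For a NORMAL (two-way mirrored) diskgroup, usable space is the free space minus the required mirror reserve, divided by two. Here is a minimal Python sketch of that arithmetic; the helper name usable_file_mb is made up for illustration, and the numbers are the ones from the lsdg output above:

```python
def usable_file_mb(free_mb: int, req_mir_free_mb: int, copies: int = 2) -> int:
    """Usable space for a mirrored diskgroup (copies=2 for NORMAL redundancy)."""
    return (free_mb - req_mir_free_mb) // copies


# (name, Free_MB, Req_mir_free_MB) taken from the lsdg output above
diskgroups = [
    ("DATA", 6107124, 5197824),
    ("DBFS_DG", 893272, 298240),
    ("FLASH", 626852, 1298688),
]

for name, free_mb, req_mb in diskgroups:
    print(name, usable_file_mb(free_mb, req_mb))
# DATA 454650
# DBFS_DG 297516
# FLASH -335918
```

A negative result means the diskgroup no longer has enough free space to restore full mirroring after a failure, even though Free_MB itself is still above zero.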

Here the DBA can see how much space is used by the flashback logs:

ASMCMD> du FLASHBACK/
Used_MB      Mirror_used_MB
891468             1782936
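
The two columns differ by exactly the mirroring factor. As a quick sanity check, here is a small Python sketch (the names are illustrative, and it assumes the MB figures are binary megabytes) that doubles Used_MB and converts the result to terabytes:

```python
MB_PER_TB = 1024 * 1024  # assumption: du reports binary megabytes


def mirrored_mb(used_mb: int, copies: int = 2) -> int:
    """Space consumed on disk once every extent is stored `copies` times."""
    return used_mb * copies


used_mb = 891468  # Used_MB from the du output above
mirror_used_mb = mirrored_mb(used_mb)

print(mirror_used_mb)                        # 1782936
print(round(mirror_used_mb / MB_PER_TB, 2))  # 1.7
```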

What does this mean for us?  Well, here is a typical snapshot of our system's waits (from the handy-dandy alert we created months earlier):

Waits                                                                 Wait Time
  Backup: MML write backup piece                   5482.197026
  SQL*Net message from dblink                        4727.746732
  TCP Socket (KGAS)                                      762.1687

This is a snapshot of our system when it starts running out of FLASH:

Waits                                                                 Wait Time
  statement suspended, wait error to be cleared   212154.60987
  inactive transaction branch                                11951.353872
  SQL*Net message from dblink                         2742.039774
  TCP Socket (KGAS)                                       1175.376008
  SQL*Net break/reset to client                          418.624794

The DBA cleaned up archive logs older than 24 hours. We now have room to grow, and the sessions that were suspended are able to resume activity:

ASMCMD> lsdg
State    Type    Rebal  Sector  Block       AU  Total_MB  Free_MB  Req_mir_free_MB  Usable_file_MB  Offline_disks  Voting_files  Name
MOUNTED  NORMAL  N         512   4096  4194304  15593472  6107108          5197824          454642              0             N  DATA/
MOUNTED  NORMAL  N         512   4096  4194304    894720   893272           298240          297516              0             Y  DBFS_DG/
MOUNTED  NORMAL  N         512   4096  4194304   3896064  1393468          1298688           47390              0             N  FLASH/
ASMCMD>
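
A check like this could be scripted rather than eyeballed. The sketch below is only one way to do it: it assumes the lsdg output has already been captured as text with one diskgroup per line, and the function name and threshold are hypothetical. It flags any diskgroup whose Usable_file_MB falls below a chosen minimum:

```python
def low_space_diskgroups(lsdg_text: str, min_usable_mb: int = 0) -> list:
    """Return names of diskgroups whose Usable_file_MB is below min_usable_mb."""
    lines = [ln for ln in lsdg_text.splitlines() if ln.strip()]
    header = lines[0].split()
    flagged = []
    for line in lines[1:]:
        fields = line.split()
        if len(fields) != len(header):
            continue  # skip wrapped or malformed rows
        row = dict(zip(header, fields))
        if int(row["Usable_file_MB"]) < min_usable_mb:
            flagged.append(row["Name"].rstrip("/"))
    return flagged


# Sample text copied from the first lsdg output in this post
sample = """\
State Type Rebal Sector Block AU Total_MB Free_MB Req_mir_free_MB Usable_file_MB Offline_disks Voting_files Name
MOUNTED NORMAL N 512 4096 4194304 15593472 6107124 5197824 454650 0 N DATA/
MOUNTED NORMAL N 512 4096 4194304 894720 893272 298240 297516 0 Y DBFS_DG/
MOUNTED NORMAL N 512 4096 4194304 3896064 626852 1298688 -335918 0 N FLASH/
"""

print(low_space_diskgroups(sample))  # ['FLASH']
```

Raising min_usable_mb above zero would give an earlier warning, before sessions start to suspend.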

As the DBA tells us, if we use our archive application to delete 300 GB of data from the DATA diskgroup, that change lands in the flashback logs in the FLASH diskgroup as 600 GB with mirroring.  This is how the flashback logs reached 1.7 TB of mirrored usage, which is abnormal and, as you can see, takes up too much space.
