Wednesday, July 27, 2005

UNDO stats watching...

After we had some UNDO woes here at work, I had to go through the ugly process of figuring out how much UNDO space we could use and how to set our UNDO_RETENTION parameter. Oh, is this a confusing and frustrating process.

Here's some statistics of our current systems without the gory detail:

17 GB of UNDO tablespace and no room to grow.
18000 for the UNDO_RETENTION parameter.
A 3K+ second MAX(V$UNDOSTAT.MAXQUERYLEN)
Data from 8 days ago (rolling) to current in V$UNDOSTAT.
We periodically get ORA-01555 (snapshot too old) and ORA-30036 (unable to extend undo segment) errors.

Here's what I have discovered so far:
  1. Never read too far in posts on the internet. Flame wars distract you too much. No matter how entertaining it is to read Howard J. Rogers argue with Joel Garry, find what you need and get out. Time's 'a wastin'.

  2. UNDO is an inexact science.

  3. Oracle has a handle on things quite well.

  4. Oracle has a funky way of figuring out how much UNDO space you will need for a given UNDO_RETENTION parameter.

  5. Everyone has interesting ways to calculate UNDO space and the "optimal" UNDO_RETENTION parameter.

From Swiss company Akadia AG, I was able to derive some interesting statistics using some of the scripts they provided.
There formula for calculating optimal UNDO_RETENTION is this:
OPTIMAL UNDO RETENTION = ACTUAL UNDO SIZE / ( DB_BLOCK_SIZE * UNDO_BLOCK_PER_SEC )
This script will show the actual undo size:
SELECT SUM(a.bytes) "UNDO_SIZE"
FROM v$datafile a,
v$tablespace b,
dba_tablespaces c
WHERE c.contents = 'UNDO'
AND c.status = 'ONLINE'
AND b.name = c.tablespace_name
AND a.ts# = b.ts#;

In my case, this is the result:
   UNDO_SIZE
----------
1.7826E+10
(I think that equals 17GB...)

Next we get our UNDO blocks per second:
SELECT MAX(undoblks/((end_time-begin_time)*3600*24))
"UNDO_BLOCK_PER_SEC"
FROM v$undostat;

And again, my results:
UNDO_BLOCK_PER_SEC
------------------
651.855
And finally, the DB_BLOCK_SIZE:
SELECT TO_NUMBER(value) "DB_BLOCK_SIZE [KByte]"
FROM v$parameter
WHERE name = 'db_block_size';

And of course, my results one more time:
DB_BLOCK_SIZE [KByte]
---------------------
8192
Now using their formula, I get (I fixed the UNDO_SIZE number for this calculation):
 17825792000 / ( 651.855 * 8192 ) = 3338.17

Here's the script to do all of this at once:
SELECT d.undo_size/(1024*1024) "ACTUAL UNDO SIZE [MByte]",
SUBSTR(e.value,1,25) "UNDO RETENTION [Sec]",
ROUND((d.undo_size / (to_number(f.value) *
g.undo_block_per_sec))) "OPTIMAL UNDO RETENTION [Sec]"
FROM ( SELECT SUM(a.bytes) undo_size
FROM v$datafile a,
v$tablespace b,
dba_tablespaces c
WHERE c.contents = 'UNDO'
AND c.status = 'ONLINE'
AND b.name = c.tablespace_name
AND a.ts# = b.ts#
) d,
v$parameter e,
v$parameter f,
( SELECT MAX(undoblks/((end_time-begin_time)*3600*24))
undo_block_per_sec
FROM v$undostat
) g
WHERE e.name = 'undo_retention'
AND f.name = 'db_block_size'
/
And the output from this:
ACTUAL UNDO SIZE [MByte]
------------------------
17000

UNDO RETENTION [Sec]
-------------------------
18000

OPTIMAL UNDO RETENTION [Sec]
----------------------------
3338
Now, after all that, I still am nowhere. If I set the UNDO_RETENTION to 3338, we will get ORA-01555 errors. If I leave it at 18000, eventually we will get ORA-30036 as the UNDO tablespace fills up.

I would like to know how I can associate the current AND active data in V$UNDOSTAT with the current ACTIVE and UNEXPIRED data in DBA_UNDO_EXTENTS. How can I tell what is active and what is expired in V$UNDOSTAT? Or am I looking in the wrong place?

I'm giving this to Tom...