Wednesday, March 5, 2014

Hotsos Symposium 2014 - Day 3

Following the usual trend, I had breakfast and then it was on to the show!

8:30 AM - Leveraging in-memory storage to overcome Oracle Database PGA memory limits - Alex Fatkulin - @AlexFatkulin

Alex starts with the observation that data trends have changed over the years, and bloat in the database is not unexpected.
The idea of legacy Oracle code vs. 64-bit OS innovation.
"Kill it with iron" is a great mantra!
A short PGA memory overview which honestly had some new stuff for me, but don't worry, I'll be tackling learning more about Oracle internals this year!
Review of SQL work areas - hash area, sort area, bitmap area.
ORA-02017 was an error code he received when setting memory values.  Look it up, the message will surprise you!
Alex showed us that he was on a 64-bit system, but Oracle will only accept a 32-bit value for a particular memory parameter.
Why can Oracle accept 1TB for another memory parameter?  There are a bunch of hidden parameters related to PGA_AGGREGATE_TARGET that get set.
I thought that Alex had a really good graph to tell his story as well.
Importantly, he pointed out that per-process limits are multiplied by the Degree of Parallelism.  In that case, an Exadata X3-8 system would need a DOP of 768 to use the full 1.5TB of PGA across its processes.
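To put that DOP figure in perspective, here's a back-of-the-envelope sketch (my own arithmetic; the ~2 GiB per-process cap is an assumption based on the 32-bit limit discussed above):

```python
# Rough math: the DOP needed so that parallel processes, each capped at a
# per-process work-area limit, can consume a given total PGA.
PER_PROCESS_LIMIT_GIB = 2      # assumed per-process cap (the 32-bit limit)
TOTAL_PGA_TIB = 1.5            # Exadata X3-8 figure from the talk

total_pga_gib = TOTAL_PGA_TIB * 1024
dop_needed = int(total_pga_gib / PER_PROCESS_LIMIT_GIB)
print(dop_needed)  # -> 768
```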
Review MOS note 453540.1 / patch 17951233 for how to override memory limits.
Can we accept memory spilling as a fact of life?
There are ways to radically improve TEMP I/O:
  In-memory file system
  SAN/NAS LUN with write-back cache
  Linux tmpfs via loop device
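As a rough illustration of the tmpfs-via-loop-device idea (my own sketch, not from the slides; sizes, paths, and device names are made up, and it requires root):

```shell
# Create a tmpfs mount, back a file on it with a loop device, and that
# loop device can then host a filesystem for TEMP files
mount -t tmpfs -o size=32g tmpfs /mnt/ramtmp
dd if=/dev/zero of=/mnt/ramtmp/temp.img bs=1M count=30720
losetup /dev/loop0 /mnt/ramtmp/temp.img
mkfs.ext4 /dev/loop0
mount /dev/loop0 /u01/oratemp
```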
InfiniBand SRP with ZFS storage device had incredible payoffs so I wonder if we can replicate that in our space.
I loved seeing the spilling to temp!

9:45 AM - Profiling the Logwriter and Databasewriter - Frits Hoogland - @fritshoogland

lgwr and dbwr are the process names at the database level, and this presentation is about what their wait events tell you at a pretty detailed technical level.
lgwr manages the redolog buffer and Frits was able to run oradebug on the lgwr process.
"rdbms ipc message" indicates a sleep/idle event.
Every 60 seconds, lgwr checks that PMON still exists; if it doesn't find it, the instance gets terminated.
I really enjoyed his expectations vs. actual graph because he was very honest about the difference!
A semctl system call wakes up lgwr, which then submits an io_submit call.
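A hedged sketch of how you might watch these calls yourself (my own commands, not necessarily Frits's exact method; the `ora_lgwr_ORCL` process name assumes an instance called ORCL, and this needs root or the oracle user):

```shell
# Attach strace to the log writer and watch the system calls discussed above
strace -p $(pgrep -f ora_lgwr_ORCL) -e trace=semctl,io_submit,io_getevents,pwrite64
```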
Frits was genuinely excited about the DB not working as intended, which was plain as day and I love that he feels that sense of thrill just as I and others do when we have problems or can't explain something!
When he put overhead on the process, he created an edge case which made me think of Schrödinger's cat.
The parameter "_use_adaptive_log_file_sync" is automatically set to TRUE in later releases.
The wait event "log file parallel write" is a sign lgwr is waiting!
It was really wonderful seeing latches I've never seen before!
pwrite64 is the system call behind the parallel write I/O.
I thought it was awesome that Frits might have found a bug, which may already be fixed in a later release!
Logswitch events!
"_disable_logging" is a forbidden switch, but all Frits could see as a result was a lack of writing.
The calls oss_write (stands for Oracle Storage System) and oss_wait are what you will see on an Exadata system.
DBWn writes dirty buffers and looks for PMON to be alive too!
"DB file async I/O submit" was a new event he wasn't expecting.  It did not exist in earlier releases, and while he thought it may have been instrumentation of io_submit, it wasn't.
"DB file parallel write" is not a physical I/O indicator or an I/O latency timing; it is the timing of what has gone on before.

This was a really great presentation and gave me some new insight on how to dive deeper into the mechanics of what Oracle is doing behind the scenes!

11:00 AM - Journey to the Center of the Database - Ric Van Dyke - @RicVanDyke

Not knowing a whole lot about database internals, I wanted to attend this session to pick up what might be common knowledge while Ric dove into a bigger issue with his presentation.

First he starts off by asking what a data block is, and the answer that comes back is that most are blocks representing table data.
The size of your blocks depends on the DB_BLOCK_SIZE parameter, and different block sizes are designed for different purposes such as OLTP versus DW.
Block overhead is about 100 bytes which contains a header, table directory, and row directory.
What is a rowid really?  Ric points out that it isn't what we think it is.
He then takes us into the row!  I thought it was pretty awesome how he was doing a dump of the blocks and then putting it back together again to see what it said.
Trailing nulls are not stored in a block, so nullable columns should be placed at the end of your rows to save space.
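A hedged sketch of the idea (my own example, not from the talk): with trailing-null suppression, declaring the often-null columns last means those NULLs simply aren't stored in the row piece.

```sql
-- Hypothetical table: put frequently-NULL columns at the end so that
-- trailing NULLs take no space in the stored row
CREATE TABLE orders (
  order_id    NUMBER,
  order_date  DATE,
  notes       VARCHAR2(4000),   -- often NULL: cheapest at the end
  extra_info  VARCHAR2(4000)    -- often NULL
);
```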
Test scripts!
Number format doesn't show up in block dump.
Character data is stored in reverse order…sort of
Did you know the old row of data still exists in the block after an update?  The whole row gets re-written rather than just the single changed column, and it was pretty cool to see him prove it out!  Because of this, there are redo log considerations.
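The block-dump technique he used can be sketched like this (my own hedged example; the table name and the file/block numbers are made up):

```sql
-- Locate the file and block holding a given row...
SELECT DBMS_ROWID.ROWID_RELATIVE_FNO(rowid) AS file_no,
       DBMS_ROWID.ROWID_BLOCK_NUMBER(rowid) AS block_no
FROM   my_table
WHERE  id = 1;

-- ...then checkpoint and dump that block to a trace file
-- (file/block numbers below are illustrative)
ALTER SYSTEM CHECKPOINT;
ALTER SYSTEM DUMP DATAFILE 4 BLOCK 171;
```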
Ric had a good breakdown of the differences of the pros for block sizes!
He keeps underscoring how you need to test your environment to see how your indexes (and data) run with different block sizes.
There is a new 12c feature, batched access by index rowid, that grabs a set of rowids and then fetches the associated data!  Very cool!
His example showed block size being increased from 4k to 16k, and as expected the consistent gets went down, but the time taken went up.
From the audience, Ric was taught something that he taught the audience member before!  Priceless!
PCTFREE vs PCTUSED - I think this is likely why my Workflow purging hit the HWM issues it did!

1:00 PM - Understanding How Adaptive Cursor Sharing (ACS) Produces Multiple Optimal Plans - Carlos Sierra - @csierra_usa

Carlos started out by showing us the allies of plan flexibility, and several plan stability tools which stand in sharp contrast to it.
He reiterates that with cursor sharing you use bind variables to reduce hard parsing, but the problem is that different bind values change your result selectivity, so plans eventually drift from good to bad.
With ACS, introduced in 11g, you can have multiple optimal plans per SQL depending on the selectivity of the predicates.
ACS has overhead, so it shouldn't be used "globally" and bind variable statements are candidates which are eventually categorized as "Bind Aware".
SQL becomes Bind Aware when the number of rows processed changes substantially between executions or oscillates significantly.
V$SQL - columns: is_shareable, is_bind_sensitive, is_bind_aware
V$SQL_CS_STATISTICS - rows processed
V$SQL_CS_HISTOGRAM - buckets S/M/L (S - less than 1k, M - 1k to 1m, L - 1m and up)
V$SQL_CS_SELECTIVITY - selectivity profile
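As a hedged sketch (my own query, not from Carlos's scripts; `:sql_id` is a placeholder), you might inspect a cursor's ACS state with those views like this:

```sql
-- Which child cursors are bind-sensitive / bind-aware / shareable?
SELECT child_number, is_bind_sensitive, is_bind_aware, is_shareable
FROM   v$sql
WHERE  sql_id = :sql_id;

-- How are executions distributed across the S/M/L buckets?
SELECT child_number, bucket_id, count
FROM   v$sql_cs_histogram
WHERE  sql_id = :sql_id
ORDER BY child_number, bucket_id;
```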
Live demo!  This was really interesting, and I can't wait to play around with the scripts Carlos will share!
The hints /*+ BIND_AWARE */ and /*+ NO_BIND_AWARE */ help to force ACS to do what we want if we can't change the code.
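For example (my own hedged snippet; the table and bind variable are made up), embedding the hint forces the cursor to be bind-aware from its first execution:

```sql
-- Mark this cursor bind-aware immediately instead of waiting for ACS
-- to observe varying row counts
SELECT /*+ BIND_AWARE */ *
FROM   sales
WHERE  region_id = :region;
```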
With ACS, there is the idea of plan selection but it is not permanent and SQL Plan Management (SPM) gives you that permanency.

2:15 PM - Creative Problem Solving - Kerry Osborne - @KerryOracleGuy

There was SO much that Kerry brought up in his presentation, it feels like if I told you everything, it'd give away the ending and you wouldn't get as much out of it when you get a chance to hear him talk.  As a result, I'm going to narrow his talk down to just a few points that were really relevant to me.

Can we teach people to be better problem solvers?  I believe so, but I'm probably biased because I've trained a lot of people and I like to think I've helped to make some of them better problem solvers.
Kerry loves working with people, but hates working with people which I think is very insightful about him, but speaks a lot about our profession choices as well because we can choose how we act even though we know our preferences.
What makes a good problem solver?  Kerry boiled this down to 4 pillars, but one I disagree with is needing to be optimistic (if you know me, you'll know why I disagree).  I think that depending on your job/role you can absolutely be pessimistic and still be an effective problem solver: you can be optimistic about yourself or your skills while being pessimistic about situations or the actions others take.
I thought his statement that it takes 10,000 hours of practice before you start getting good at something very interesting.  That's 5 years if you do the same thing for 40 hours a week for 50 weeks out of the year.  Who does the exact same thing over and over like that?  Especially in my line of work of Support where I do a little of this, and a little of that, but not a lot of it the same at any one time to get repetitive practice at it.  Maybe time to find something for my toolbox that I can get practice at, and get better at huh?
Problems are complex and have multiple solutions.  Sounds a lot like what Cary Millsap says, right?
Can we overcome who we naturally are, to get better?  Yes, and there are mind tricks we can use to improve our odds.
The effects of stress are typically negative, but it is hard to totally avoid stress so there are several techniques that can be used to lessen that effect on your work product.

It may sound like I disagree with Kerry, but honestly he makes a lot of good points and it's made me look at some things in a different light.
