Exadata – Storage Indexes: what we must know?

One of the biggest advantages of Exadata are the Storage Indexes. Unfortunately Oracle is not providing us a lot of information about them and if you dig a bit in the net, you will find a lot creepy stories on how they are supposed to work. So instead of making yet another story, I’ll demonstrate everything with examples.

STORAGE INDEXES BASICS

Storage indexes are in memory structures which you can find in Exadata or SPARC SuperCluster storage nodes. They are created and maintained from cellsrv process. The process which makes Exadata special. Storage Indexes are fully automated and enabled by default. One of the most important things that you shall know about them is that they are never flushed to the disk, so they are not persistent between restarts. When I say never flushed to the disk I mean even to the swap, so there is no place for worries that you will end with I/O for reading Storage Index region.
Cellsrv uses them to eliminate I/O operation in case of Full Table or Fast Full Index scans. This can happen only when smart scan /offloading/ is in action and the query predicate is one or combination of the following =<,>,IS NULL,IS NOT NULL.


STORAGE INDEX STRUCTURE

The structure of Storage Indexes is basic, but different from the conventional structure of the B*tree or BitMap indexes. It’s important first to understand their structure in order to understand how they actually work.

The Storage Index is a logical structures made from range of Index Regions. The Index Region is in memory structure which is tracking the data distribution in a Storage Region. And the Storage Region is a logical division of the storage server space. All Storage Regions are 1MB in size and all ASM AU are aligned to them. So if we have AU with size of 4MB /The Exa recommendation/, we will have 4 Storage Regions with 4 Index Regions per AU.
So from here we can understand that the IRs /Indexes Region/ are independent from the database segment and that the IRs are only related with the SRs /Storage Region/.
Each IR tracks the data distribution for a maximum of 8 columns. And for each column it stores statistics for the MIN and the MAX value that it contain. Those statistics are based only on the first 8 characters of the value. Additionally it contains NULL flag for each column.

To make it easier for understanding I made a picture:
SI

So when we run a query and we have valid predicate on SI indexed column, first we check the IR for the SR and then we perform read, only if the desired value is between the MIN and the MAX value or satisfy the NULL clause.


STORAGE INDEXES CREATION

As we know, the Storage Indexes are created automatically by cellsrv when the data is read from the disk.
Lets just see it in action:

00000656: BEFORE UPDATING RIDX SUMMARY
00000658: SQLID 0ury85nym5qkr Dumping RIDX for Read before update
RIDX(0x7f2a5eeb8530) : st 1(RIDX_INVALID) validBitMap 0 tabn 0 id {0 0 0}
RIDX: strt 0 end 0 offset 0 size 0 rgnIdx 0 RgnOffset 0 scn: 0x0000.00000000 hist: 0
RIDX validation history:
0:Undef 1:Undef 2:Undef 3:Undef 4:Undef 5:Undef 6:Undef 7:Undef 8:Undef 9:Undef
OCL_RIDX: 0x7f2a5eeb8528 magic:caf1 rgnidx:2149 rgnhdlid:1state: 32 invalidatereason:7f2a5eeb8530 RIDX:2014-02-09 20:43:18.760255 :0000065A: SQLID 0ury85nym5qkr ridxp 0x7f2a5eeb8530 [st: 1] (0, 2048) [io strt 0 end 2048] isRead 1 newColCached 1 newColFound 0 someColNotFound 0 isRidxValid 0 overWriteSummary 1 colInRidxNotInPredCol 0 objd 87427 RIDX:diskOffset 2253389824 RIDXCtx:diskOffset 2253389824 RIDXCtx:ioSize 1048576 regionSize 1048576
0000065B: AFTER UPDATING RIDX SUMMARY
RIDX(0x7f2a5eeb8530) : st 2(RIDX_VALID) validBitMap 0 tabn 0 id {87427 7 2087871281}
RIDX: strt 0 end 2048 offset 2253389824 size 1048576 rgnIdx 2149 RgnOffset 0 scn: 0x0000.0011fdcb hist: 2
RIDX validation history:
0:FullRead 1:Undef 2:Undef 3:Undef 4:Undef 5:Undef 6:Undef 7:Undef 8:Undef 9:Undef
Col id [2] numFilt 4 flg 2 (HASNONNULLVALUES):
lo: 41 43 43 45 53 53 24 0
hi: 74 69 74 6c 65 39 33 5f

/* For information how to dump indexes check this post. */

Nothing surprising, an empty RI has been updated with the MIN/MAX/NULL values for the column 2 for object 87427 thanks to sqlid 0ury85nym5qkr.

Now lets take a look at this one:

RIDX(0x7ffd3aecb474) : st 2(RIDX_VALID) validBitMap 0 tabn 0 id {87410 4 2087871281}
RIDX: strt 0 end 2048 offset 1874853888 size 1048576 rgnIdx 1788 RgnOffset 0 scn: 0x0000.0011b157 hist: 2
RIDX validation history:
0:FullRead 1:Undef 2:Undef 3:Undef 4:Undef 5:Undef 6:Undef 7:Undef 8:Undef 9:Undef
Col id [4] numFilt 4 flg 2 (HASNONNULLVALUES):
lo: 43 4c 55 53 54 45 52 0
hi: 54 59 50 45 32 20 55 4e
RIDX(0x7ffd3aec98f4) : st 2(RIDX_VALID) validBitMap 0 tabn 0 id {87410 4 2087871281}
RIDX: strt 0 end 2048 offset 1876951040 size 1048576 rgnIdx 1790 RgnOffset 0 scn: 0x0000.0011b157 hist: 92
RIDX validation history:
0:FullRead 1:FullRead 2:FullRead 3:Undef 4:Undef 5:Undef 6:Undef 7:Undef 8:Undef 9:Undef
Col id [1] numFilt 3 flg 2 (HASNONNULLVALUES):
lo: 41 50 45 58 5f 30 33 30
hi: 58 44 42 0 0 0 0 0
Col id [2] numFilt 2 flg 2 (HASNONNULLVALUES):
lo: 41 4c 45 52 54 5f 51 54
hi: 78 64 62 2d 6c 6f 67 31
Col id [4] numFilt 4 flg 2 (HASNONNULLVALUES):
lo: 43 4c 55 53 54 45 52 0
hi: 54 41 42 4c 45 20 53 55

Now that’s quite interesting, here we can see 2 different IR: 0x7ffd3aecb474 and 0x7ffd3aec98f4 for the same object id: 87410 in the same tablesapce: 4 in the same database: 2087871281, but different columns have been indexed? The second IR: 0x7ffd3aec98f4 have 2 columns more than the first one.
That’s seems to prove our theory about the SI structure. But why is this happening?
Oracle says that SI/IR creation/update is based on an algorithm which is checking the density of your data in a particular column, and based on that it decides whether to index the column or not. So in other words, well ordered data can have more benefit from SI. And actually there are also other important benefits in other situations from ordered data. Indeed not only from storage indexes..


STORAGE INDEXES MAINTENANCE

The SI are maintained automatically. So if we have a RI with 2 columns and we want to add another it shall be the same as the creation:

:00160A68: BEFORE UPDATING RIDX SUMMARY
00160A69: SQLID ccbbkshjv9rkj Dumping RIDX for Read before update
RIDX(0x7ffd3aece728) : st 2(RIDX_VALID) validBitMap 0 tabn 0 id {87427 7 2087871281}
RIDX: strt 0 end 2048 offset 2303721472 size 1048576 rgnIdx 2197 RgnOffset 0 scn: 0x0000.0011fd4a hist: 12
RIDX validation history:
0:FullRead 1:FullRead 2:Undef 3:Undef 4:Undef 5:Undef 6:Undef 7:Undef 8:Undef 9:Undef
Col id [1] numFilt 3 flg 2 (HASNONNULLVALUES):
lo: 41 50 45 58 5f 30 33 30
hi: 58 44 42 0 0 0 0 0
Col id [2] numFilt 4 flg 2 (HASNONNULLVALUES):
lo: 41 43 43 45 53 53 24 0
hi: 78 64 62 2d 6c 6f 67 31
OCL_RIDX: 0x7ffd3aece720 magic:caf1 rgnidx:2197 rgnhdlid:2state: 32 invalidatereason:7ffd3aece728 RIDX:2014-02-09 20:29:05.620827 :00160A6C: SQLID ccbbkshjv9rkj ridxp 0x7ffd3aece728 [st: 2] (0, 2048) [io strt 0 end 2048] isRead 1 newColCached 1 newColFound 1 someColNotFound 0 isRidxValid 1 overWriteSummary 1 colInRidxNotInPredCol 1 objd 87427 RIDX:diskOffset 2303721472 RIDXCtx:diskOffset 2303721472 RIDXCtx:ioSize 1048576 regionSize 1048576
00160A72: AFTER UPDATING RIDX SUMMARY
RIDX(0x7ffd3aece728) : st 2(RIDX_VALID) validBitMap 0 tabn 0 id {87427 7 2087871281}
RIDX: strt 0 end 2048 offset 2303721472 size 1048576 rgnIdx 2197 RgnOffset 0 scn: 0x0000.0011fd4a hist: 92
RIDX validation history:
0:FullRead 1:FullRead 2:FullRead 3:Undef 4:Undef 5:Undef 6:Undef 7:Undef 8:Undef 9:Undef
Col id [1] numFilt 2 flg 2 (HASNONNULLVALUES):
lo: 41 50 45 58 5f 30 33 30
hi: 58 44 42 0 0 0 0 0
Col id [2] numFilt 3 flg 2 (HASNONNULLVALUES):
lo: 41 43 43 45 53 53 24 0
hi: 78 64 62 2d 6c 6f 67 31
Col id [8] numFilt 4 flg 2 (HASNONNULLVALUES):
lo: c2 2 1d 0 0 0 0 0
hi: c3 a c 1d 0 0 0 0

And if we have an update to some of the columns in the SR…

--the RI before update of column[1] /not indexed/

RIDX(0x7fad85ecfe1c) : st 2(RIDX_VALID) validBitMap 0 tabn 0 id {87427 7 2087871281}
RIDX: strt 32 end 2048 offset 2156937216 size 1032192 rgnIdx 2057 RgnOffset 16384 scn: 0x0000.0012262f hist: 1
RIDX validation history:
0:PartialRead 1:Undef 2:Undef 3:Undef 4:Undef 5:Undef 6:Undef 7:Undef 8:Undef 9:Undef
Col id [2] numFilt 4 flg 2 (HASNONNULLVALUES):
lo: 41 43 43 45 53 53 24 0
hi: 78 64 62 2d 6c 6f 67 31

--the RI after update of column[1] /not indexed/

0006B4E5: BEFORE UPDATING RIDX SUMMARY
0006B4EB: SQLID dy3s2ghkxdasu Dumping RIDX for Read before update
RIDX(0x7fad85ecfe1c) : st 1(RIDX_INVALID) validBitMap 0 tabn 0 id {0 0 0}
RIDX: strt 0 end 0 offset 0 size 0 rgnIdx 0 RgnOffset 0 scn: 0x0000.00000000 hist: 0
RIDX validation history:
0:Undef 1:Undef 2:Undef 3:Undef 4:Undef 5:Undef 6:Undef 7:Undef 8:Undef 9:Undef
OCL_RIDX: 0x7fad85ecfe14 magic:caf1 rgnidx:2057 rgnhdlid:1state: 1 invalidatereason:7fad85ecfe1c RIDX:2014-02-09 22:06:03.932207 :0006B4F0: SQLID dy3s2ghkxdasu ridxp 0x7fad85ecfe1c [st: 1] (32, 2048) [io strt 32 end 2048] isRead 1 newColCached 1 newColFound 0 someColNotFound 0 isRidxValid 0 overWriteSummary 1 colInRidxNotInPredCol 0 objd 87427 RIDX:diskOffset 2156937216 RIDXCtx:diskOffset 2156937216 RIDXCtx:ioSize 1032192 regionSize 1032192
0006B4F3: AFTER UPDATING RIDX SUMMARY
RIDX(0x7fad85ecfe1c) : st 2(RIDX_VALID) validBitMap 0 tabn 0 id {87427 7 2087871281}
RIDX: strt 32 end 2048 offset 2156937216 size 1032192 rgnIdx 2057 RgnOffset 16384 scn: 0x0000.0012262f hist: 1
RIDX validation history:
0:PartialRead 1:Undef 2:Undef 3:Undef 4:Undef 5:Undef 6:Undef 7:Undef 8:Undef 9:Undef
Col id [2] numFilt 4 flg 2 (HASNONNULLVALUES):
lo: 41 43 43 45 53 53 24 0
hi: 78 64 62 2d 6c 6f 67 31

We will find that the whole RI is invalidated just from update to a column which is not even included in the RI! Then the next time when you run your query the RI is updated again. Crappy…


Kovachev

About these ads

4 thoughts on “Exadata – Storage Indexes: what we must know?

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s