ALL about HCC

If you try to find out what HCC is and how it works, you could start by reading the documentation, then some books and blog posts, and at the end you would have to put it all together. In this post I’ll do exactly that: put it all together, starting with the basics and going through the internals with examples.

HCC stands for Hybrid Columnar Compression. This is the kick-ass compression feature of the Oracle Database, which can give you up to a 15x compression ratio. It can be used only on SPARC SuperCluster, Exadata, Pillar Axiom and the ZFS Storage Appliance (ZFSSA).
Well, that was the marketing part. Now let’s see how it works.


The most important thing to know about HCC tables is that they store the data in columnar format. When we insert rows into the table, they are transformed into columns and written to disk in logical units named Compression Units (CUs). A CU is typically 32KB, and consistency/locking is implemented at the CU level. In other words, if we want to update a row, the whole CU is locked, which means locking 32KB of compressed rows. That sounds bad, but in general updating compressed data is a bad idea anyway; for exceptional cases it can be fine. Additionally, if we update a row, or insert without direct path, the rows are stored in row format, in separate blocks, and compressed with OLTP compression. So if we want HCC to actually be applied, we have to use direct path operations.
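As a quick sketch /the table and column names here are made up for the example, not from any particular system/, this is how HCC is enabled and how a direct path load differs from a conventional insert:

```sql
-- Create the table with HCC (QUERY HIGH level in this example)
CREATE TABLE hcc (a VARCHAR2(30), b NUMBER)
  COMPRESS FOR QUERY HIGH;

-- Direct path load: the APPEND hint makes the rows go through
-- direct path insert and land in compression units
INSERT /*+ APPEND */ INTO hcc
  SELECT object_name, object_id FROM all_objects;
COMMIT;

-- A conventional insert like this one is stored in row format
-- in a separate block and compressed with OLTP compression
INSERT INTO hcc VALUES ('asdf', 42);
```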

To make it easier I made a picture:

Here we can see that the CU is built from four 8KB /the default size/ DB blocks, and each block contains different columns, or even just part of a column /if it is a big one/, in compressed format.


HCC provides us 4 levels of compression:
QUERY LOW – lowest
QUERY HIGH
ARCHIVE LOW
ARCHIVE HIGH – highest
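The level is chosen with the COMPRESS FOR clause at table creation, and an existing table can be rebuilt at another level with ALTER TABLE ... MOVE:

```sql
CREATE TABLE t_ql (c NUMBER) COMPRESS FOR QUERY LOW;
CREATE TABLE t_qh (c NUMBER) COMPRESS FOR QUERY HIGH;
CREATE TABLE t_al (c NUMBER) COMPRESS FOR ARCHIVE LOW;
CREATE TABLE t_ah (c NUMBER) COMPRESS FOR ARCHIVE HIGH;

-- Recompress an existing table at a different level
ALTER TABLE t_ql MOVE COMPRESS FOR ARCHIVE HIGH;
```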

Let’s take a closer look. Profiling a QUERY LOW load and then a QUERY HIGH load shows which functions are doing the compression:

oracle  oracle             [.] lzopro_lzo1x_1_12_compress_core
--- lzopro_lzo1x_1_12_compress_core

oracle  oracle             [.] deflate_slow
--- deflate_slow
   |--------- deflate
   |          kgcczlibdo
   |          kgcczlibpseudodo
   |          kgccdo

From here we can see that QUERY LOW is using LZO and QUERY HIGH is using ZLIB.


oracle  oracle              [.] deflate_slow
--- deflate_slow
   |--------- deflate
   |          kgcczlibdo
   |          kgcczlibpseudodo
   |          kgccdo

oracle  oracle             [.] fallbackSort
--- fallbackSort

And from here we can see that ARCHIVE LOW is again using ZLIB, just like QUERY HIGH /actually the results with QUERY HIGH and ARCHIVE LOW are the same, or almost the same/, and that ARCHIVE HIGH is using BZIP2.

Now if we take a closer look at ARCHIVE HIGH, we will notice something interesting. Before compressing, Oracle is additionally sorting the data. When I say sorting, I don’t mean the whole segment; it is sorting just the block /check line 15/.


One of the biggest problems with HCC is single row access. When we want to read just a single row from the table, based on an index range or unique scan, we have to decompress the whole CU. That sounds like a lot of work for a single row.
There is another issue when an index range scan returns many rows. In this case we can end up decompressing the same CU more than once. This is because the index range scan returns the rowids ordered by the value of the column, not by rowid. So the first row can be in CU[1], the second in CU[5], the third in CU[1] again and so on. In that case we can have a lot of overhead.
There are a few solutions for such situations. The first one is sorting our data before compressing it. This is my recommendation, because we can also achieve a better compression ratio if the data is sorted. As we already know, ARCHIVE HIGH is doing this per block, so we can even “exchange” some CPU between the operations 🙂
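Sorting before compressing can be as simple as a CTAS with an ORDER BY on the column we usually filter on /source_table here is hypothetical/:

```sql
-- Rows with close values of A end up in the same or
-- neighbouring CUs, so fewer CUs are decompressed per query
CREATE TABLE hcc
  COMPRESS FOR QUERY HIGH
  AS SELECT * FROM source_table ORDER BY a;
```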
The world is not “pinky”, though; unfortunately it’s not always possible to sort our data before compressing it. However, in such situations we can use a hint:

NAME                             INVERSE                          VERSION_OUTLINE
-------------------------------- -------------------------------- -------------------------
CLUSTER_BY_ROWID

This hint adds one additional step to our execution plan, which orders the rows returned from the index by rowid:

SQL> select /*+ CLUSTER_BY_ROWID(t) */ count(b)  from hcc t where a='asdf';


Execution Plan
Plan hash value: 2611202939

| Id  | Operation                    | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
|   0 | SELECT STATEMENT             |         |     1 |     8 |   532   (0)| 00:00:07 |
|   1 |  SORT AGGREGATE              |         |     1 |     8 |            |          |
|   2 |   TABLE ACCESS BY INDEX ROWID| HCC     |   336 |  2688 |   532   (0)| 00:00:07 |
|   3 |    SORT CLUSTER BY ROWID     |         |   336 |       |     5   (0)| 00:00:01 |
|*  4 |     INDEX RANGE SCAN         | HCC_IDX |   336 |       |     5   (0)| 00:00:01 |

However, in version 12.1 we don’t need the hint. There is a parameter named “_optimizer_cluster_by_rowid” which controls this feature, and in 12c it is enabled by default.

Of course, if we are running a lower version, we can use a query which first reads and orders the rowids returned by the index, and then accesses the table.
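A sketch of such a rewrite for the example query above /the NO_MERGE and INDEX hints are just one way to keep the optimizer from merging the subquery; adjust to taste/:

```sql
SELECT count(t.b)
  FROM (SELECT /*+ INDEX(h hcc_idx) NO_MERGE */ rowid rid
          FROM hcc h
         WHERE a = 'asdf'
         ORDER BY rid) r,
       hcc t
 WHERE t.rowid = r.rid;
```

This way the rowids arrive already grouped by CU, so consecutive rows tend to come from the same, just-decompressed CU.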


