8 reasons to sort your data

Time. The most precious resource in the world.
Everyone measures with it. Even the optimizer.

In this post I will talk only about it, the time. Or more specifically, our database time, and how
spending a bit more of it now can save you a lot in the future.

Most people don't sort their data because they think of it as "pay now or pay later".
That's obvious and true. But what matters is how many times you'll pay later, and whether you'd
benefit from paying now.

In Oracle Database you can gain performance benefits from sorted data in many situations. Here I'm presenting 8 of them.
Maybe the most important ones. But rest assured there are more.

Before we begin, there are a few things you should know. The first is our
test environment. It's based on four tables: BIG, SMALL, BIG_SORTED and SMALL_SORTED.
The content is randomly generated and the only difference between them is the order of the data.
The BIG% tables have 10 000 000 rows each and the SMALL% ones 5000 rows each.
The BIG% tables also have around 400 000 unique values in each column and the SMALL% tables around 4900. You can find the script to create them here. I also hit a quite interesting issue while preparing these examples: I had a lot of block gaps in one of my tables, and the result was a huge performance difference. So if the gap counts differ, use the "gap query" from the script to check, and then move/rebuild your objects.
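
For reference, a minimal sketch of how such tables could be generated (this is not the original script – the value range and the row generator here are assumptions, chosen only to roughly match the statistics shown below):

-- not the original script: random data, then a sorted copy of it
create table big as
select trunc(dbms_random.value(1, 500000)) a,
       trunc(dbms_random.value(1, 500000)) b,
       trunc(dbms_random.value(1, 500000)) c,
       trunc(dbms_random.value(1, 500000)) d,
       trunc(dbms_random.value(1, 500000)) e
  from (select level from dual connect by level <= 10000),
       (select level from dual connect by level <= 1000);

create table big_sorted as
select * from big order by a, b, c, d, e;

exec dbms_stats.gather_table_stats(user, 'BIG')
exec dbms_stats.gather_table_stats(user, 'BIG_SORTED')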

The second thing is being able to read and analyze tkprof-ed trace files. Most developers are not familiar with them, which is normal. In general it's just a SQL execution plan with a bit more statistics. The most important statistics are explained here. I'm sure most people will manage to analyze the output without extra reading, but if there are questions, check the link or ask.
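
In case you want to produce such traces yourself, a minimal sketch looks like this (the trace file name below is only an example – the real one depends on your instance name and process id):

alter session set tracefile_identifier = 'sort_test';
alter session set events '10046 trace name context forever, level 8';
-- run the statement of interest, then stop tracing:
alter session set events '10046 trace name context off';

-- format the raw trace file on the database server:
-- $ tkprof ORCL_ora_12345_sort_test.trc sort_test.txt sys=no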

Now, for the record :)

SQL> /

OWNER TABLE_NAME               COLUMN_NAME NUM_DISTINCT   NUM_ROWS     BLOCKS
----- ------------------------ ----------- ------------ ---------- ----------
X     BIG                      A                 452128   10000000      41761
X     BIG                      B                 452128   10000000      41761
X     BIG                      C                 452128   10000000      41761
X     BIG                      D                 452128   10000000      41761
X     BIG                      E                 452128   10000000      41761
X     BIG_SORTED               A                 452128   10000000      41761
X     BIG_SORTED               B                 452128   10000000      41761
X     BIG_SORTED               C                 452128   10000000      41761
X     BIG_SORTED               D                 452128   10000000      41761
X     BIG_SORTED               E                 452128   10000000      41761
X     SMALL                    A                   4975       5000         25
X     SMALL                    B                   4973       5000         25
X     SMALL                    C                   4972       5000         25
X     SMALL                    D                   4968       5000         25
X     SMALL                    E                   4977       5000         25
X     SMALL_SORTED             A                   4975       5000         25
X     SMALL_SORTED             B                   4973       5000         25
X     SMALL_SORTED             C                   4972       5000         25
X     SMALL_SORTED             D                   4968       5000         25
X     SMALL_SORTED             E                   4977       5000         25

20 rows selected.

SQL>

From here we can see that our tables' blocks, rows and distinct values are equal.

SQL> /

SEGMENT_NAME                  NUM_GAPS
--------------------------- ----------
BIG_SORTED_IDX                      91
BIG                                109
SMALL                                4
BIG_SORTED                         109
BIG_IDX                             92
SMALL_SORTED                         4

6 rows selected.

SQL>

And from here we can see that our objects have more or less the same gaps.

Other details:
Instance version 11.2.0.3 /no patches applied/ with memory_target=384M;
The parameter "_serial_direct_read" is set to ALWAYS;
The buffer cache is flushed whenever it was used.
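
For completeness, this is how those settings were applied between the tests (a short sketch; it assumes a user with the needed ALTER SESSION/ALTER SYSTEM privileges):

alter session set "_serial_direct_read" = always;
alter system flush buffer_cache;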

Enough introduction, let's start with joins!


HASH JOINS

In general, a hash join works in the following way: the optimizer scans the first data source and builds a hash table from it, which it tries to hold in memory. Of course, if the hash table is too big, the optimizer flushes it to the temporary tablespace, which means you will spend additional I/O writing and re-reading your hash table.
After the hash table is built and ready for use, the optimizer reads the second data source and probes the hash table for matches for each returned row.

Now let’s try it with sorted and unsorted data:

SQL> select count(*) from big b1,big b2 where b1.a=b2.a;

  COUNT(*)
----------
 228891028

Elapsed: 00:01:38.47
SQL>
--trace:
call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        2     45.49      98.45     113649      82996          0           1
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        4     45.49      98.45     113649      82996          0           1
--
--
Rows (1st) Rows (avg) Rows (max)  Row Source Operation
---------- ---------- ----------  ---------------------------------------------------
         1          1          1  SORT AGGREGATE (cr=82996 pr=113649 pw=30660 time=98457260 us)
 228891028  228891028  228891028   HASH JOIN  (cr=82996 pr=113649 pw=30660 time=157443762 us cost=40042 size=2211763040 card=221176304)
  10000000   10000000   10000000    TABLE ACCESS FULL BIG (cr=41498 pr=41495 pw=0 time=3540666 us cost=11398 size=50000000 card=10000000)
  10000000   10000000   10000000    TABLE ACCESS FULL BIG (cr=41498 pr=41494 pw=0 time=3526096 us cost=11398 size=50000000 card=10000000)
SQL> select count(*) from big_sorted b1, big_sorted b2 where b1.a=b2.a;

  COUNT(*)
----------
 228891028

Elapsed: 00:01:17.27
SQL>
--trace:
call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        2     32.20      77.26     112839      82996          0           1
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        4     32.20      77.26     112839      82996          0           1
--
--
Rows (1st) Rows (avg) Rows (max)  Row Source Operation
---------- ---------- ----------  ---------------------------------------------------
         1          1          1  SORT AGGREGATE (cr=82996 pr=112839 pw=29850 time=77260163 us)
 228891028  228891028  228891028   HASH JOIN  (cr=82996 pr=112839 pw=29850 time=106631807 us cost=40042 size=2211763040 card=221176304)
  10000000   10000000   10000000    TABLE ACCESS FULL BIG_SORTED (cr=41498 pr=41495 pw=0 time=3560414 us cost=11398 size=50000000 card=10000000)
  10000000   10000000   10000000    TABLE ACCESS FULL BIG_SORTED (cr=41498 pr=41494 pw=0 time=3703476 us cost=11398 size=50000000 card=10000000)

As we can see, we have more than 15% improvement with ordered data. If you compare the Fetch lines of the two traces, you'll notice the same consistent gets in both cases, but fewer physical reads and writes and less CPU time with the ordered data.
The reason for that is the following:
In our case the optimizer cannot hold the hash table in memory, so it splits the hash table into hash partitions and flushes them to disk. After that the optimizer reads the second table and, for each returned row, probes the appropriate hash partition, but only if that partition is in memory. If the appropriate partition is not in memory, the optimizer writes the row from the second table to disk, split into hash partitions again. After the second table is fully read, the optimizer reads the second table's hash partitions and joins them with the matching hash partitions from the first table. In that way the optimizer minimizes partition swapping even when joining unordered data. But even with this optimization, less work is done with ordered data.
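
If you want to see how much of the hash join ran in memory and how much spilled to temp, one way (a sketch, assuming access to the V$ views and that you know the statement's SQL_ID) is to query the workarea statistics:

select operation_type, policy,
       optimal_executions, onepass_executions, multipasses_executions,
       last_memory_used/1024/1024  last_mem_mb,
       last_tempseg_size/1024/1024 last_temp_mb
  from v$sql_workarea
 where sql_id = '&sql_id';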


SORT MERGE JOINS

With a sort merge join, the optimizer takes both data sources, sorts them if they are unsorted, and then joins them by reading the first data source and scanning the second data source for matches. If we have a duplicated join key in the first data source, instead of reading the second data source from the beginning, the optimizer seeks back to the last match for that value.

Let's try it:

SQL>  select /*+ use_merge(b1,b2) */ count(*) from big b1,big b2 where b1.a=b2.a;

  COUNT(*)
----------
 228891028

Elapsed: 00:02:20.91
SQL>
--trace
call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      2      0.00       0.00          0          0          0           0
Fetch        2     70.58     140.86     137103      82996        108           1
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        5     70.58     140.87     137103      82996        108           1
--
--
Rows (1st) Rows (avg) Rows (max)  Row Source Operation
---------- ---------- ----------  ---------------------------------------------------
         1          1          1  SORT AGGREGATE (cr=82996 pr=137103 pw=54114 time=140866843 us)
 228891028  228891028  228891028   MERGE JOIN  (cr=82996 pr=137103 pw=54114 time=229097320 us cost=85246 size=2211763040 card=221176304)
  10000000   10000000   10000000    SORT JOIN (cr=41498 pr=56252 pw=14757 time=26190152 us cost=42121 size=50000000 card=10000000)
  10000000   10000000   10000000     TABLE ACCESS FULL BIG (cr=41498 pr=41495 pw=0 time=3011777 us cost=11398 size=50000000 card=10000000)
 228891028  228891028  228891028    SORT JOIN (cr=41498 pr=80851 pw=39357 time=129382717 us cost=42121 size=50000000 card=10000000)
  10000000   10000000   10000000     TABLE ACCESS FULL BIG (cr=41498 pr=41494 pw=0 time=4089417 us cost=11398 size=50000000 card=10000000)
SQL>  select /*+ use_merge(b1,b2) */ count(*) from big_sorted b1,big_sorted b2 where b1.a=b2.a;

  COUNT(*)
----------
 228891028

Elapsed: 00:01:36.51
SQL>
--trace
call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      2      0.00       0.00          0          0          0           0
Fetch        2     63.46      96.51     112489      82996          4           1
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        5     63.46      96.51     112489      82996          4           1
--
--
Rows (1st) Rows (avg) Rows (max)  Row Source Operation
---------- ---------- ----------  ---------------------------------------------------
         1          1          1  SORT AGGREGATE (cr=82996 pr=112489 pw=29500 time=96514764 us)
 228891028  228891028  228891028   MERGE JOIN  (cr=82996 pr=112489 pw=29500 time=190302417 us cost=85246 size=2211763040 card=221176304)
  10000000   10000000   10000000    SORT JOIN (cr=41498 pr=56245 pw=14750 time=61411512 us cost=42121 size=50000000 card=10000000)
  10000000   10000000   10000000     TABLE ACCESS FULL BIG_SORTED (cr=41498 pr=41495 pw=0 time=3315670 us cost=11398 size=50000000 card=10000000)
 228891028  228891028  228891028    SORT JOIN (cr=41498 pr=56244 pw=14750 time=91734768 us cost=42121 size=50000000 card=10000000)
  10000000   10000000   10000000     TABLE ACCESS FULL BIG_SORTED (cr=41498 pr=41494 pw=0 time=3753990 us cost=11398 size=50000000 card=10000000)

Well, that was quite obvious. Cheaper sorting, less I/O, less CPU and less time.
Around 30% faster with ordered data.


NESTED LOOP JOINS & Index Clustering Factor

Now, nested loop joins are quite simple: for each row in the first data source, scan the second data source. They don't seem to have any special algorithm that benefits from sorted data. But nested loop joins are mostly useful when the second data source is an index, and with indexes it's quite a different story. In general, when you scan an index, you are actually locating the data blocks in the table that hold the value. So if your data is ordered, the rows for a given value will be stored next to each other, probably in the same data block. The optimizer is aware of that and has a metric named "Index Clustering Factor", which tracks how ordered your table is with respect to the indexed column. Or in other words, "on average, how many table data blocks do I have to read for each value in the index" – you can calculate that value by dividing CLUSTERING_FACTOR by NUM_DISTINCT.

Let's create indexes and take a look at our clustering factor:

SQL> create index big_sorted_idx on big_sorted(a);

Index created.

SQL> create index big_idx on big(a);

Index created.

SQL> select index_name,clustering_factor,t.blocks table_blocks,s.blocks index_blocks
  2  from user_indexes i,user_segments s,dba_tables t
  3  where i.index_name=s.segment_name
  4  and i.table_name=t.table_name;

INDEX_NAME                     CLUSTERING_FACTOR TABLE_BLOCKS INDEX_BLOCKS
------------------------------ ----------------- ----------- -----------
BIG_IDX                                  9997464       41761       23552
BIG_SORTED_IDX                             41494       41761       23552

SQL>

The clustering factor is better when it is closer to the number of table blocks (and worse when it approaches the number of rows).
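
As a rough rule of thumb, you can turn the clustering factor into "average table blocks visited per key" like this (a sketch that assumes single-column indexes):

select i.index_name,
       round(i.clustering_factor / c.num_distinct, 2) avg_blocks_per_key
  from user_indexes i, user_ind_columns ic, user_tab_columns c
 where ic.index_name  = i.index_name
   and c.table_name   = ic.table_name
   and c.column_name  = ic.column_name;

For BIG_IDX that gives roughly 22 table blocks per key, while for BIG_SORTED_IDX it is well below 1.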

The result is obvious again, but let's try it anyway:

SQL> select /*+ use_nl(s1,b1)*/ count(*) from small s1, big b1 where b1.a=s1.a and s1.b=b1.b;

  COUNT(*)
----------
      5000

Elapsed: 00:02:20.32
SQL> 
--trace
call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        2     16.31     140.31      95781     124450          0           1
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        4     16.32     140.31      95781     124450          0           1
--
--
Rows (1st) Rows (avg) Rows (max)  Row Source Operation
---------- ---------- ----------  ---------------------------------------------------
         1          1          1  SORT AGGREGATE (cr=124450 pr=95781 pw=0 time=140311741 us)
      5000       5000       5000   NESTED LOOPS  (cr=124450 pr=95781 pw=0 time=171671570 us)
    114207     114207     114207    NESTED LOOPS  (cr=10261 pr=5232 pw=0 time=23666325 us cost=125054 size=100000 card=5000)
      5000       5000       5000     TABLE ACCESS FULL SMALL (cr=23 pr=22 pw=0 time=6796 us cost=9 size=50000 card=5000)
    114207     114207     114207     INDEX RANGE SCAN BIG_IDX (cr=10238 pr=5210 pw=0 time=41274737 us cost=2 size=0 card=22)(object id 80682)
      5000       5000       5000    TABLE ACCESS BY INDEX ROWID BIG (cr=114189 pr=90549 pw=0 time=98802146 us cost=25 size=10 card=1)
SQL> select /*+ use_nl(s1,b1)*/ count(*) from small_sorted s1, big_sorted b1 where b1.a=s1.a and s1.b=b1.b;

  COUNT(*)
----------
      5000

Elapsed: 00:00:03.98
SQL>
--trace
call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        2      1.87       3.97       9591      15390          0           1
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        4      1.87       3.97       9591      15390          0           1
--
--
Rows (1st) Rows (avg) Rows (max)  Row Source Operation
---------- ---------- ----------  ---------------------------------------------------
         1          1          1  SORT AGGREGATE (cr=15390 pr=9591 pw=0 time=3971954 us)
      5000       5000       5000   NESTED LOOPS  (cr=15390 pr=9591 pw=0 time=4011114 us)
    114207     114207     114207    NESTED LOOPS  (cr=10262 pr=4467 pw=0 time=1617366 us cost=15016 size=100000 card=5000)
      5000       5000       5000     TABLE ACCESS FULL SMALL_SORTED (cr=23 pr=22 pw=0 time=5250 us cost=9 size=50000 card=5000)
    114207     114207     114207     INDEX RANGE SCAN BIG_SORTED_IDX (cr=10239 pr=4445 pw=0 time=1867884 us cost=2 size=0 card=22)(object id 80681)
      5000       5000       5000    TABLE ACCESS BY INDEX ROWID BIG_SORTED (cr=5128 pr=5124 pw=0 time=2107968 us cost=3 size=10 card=1)

It was obvious, but it's still dramatic. Times faster, not percentages – times.

Does the benefit come only from the clustering factor? Well, no.
Here we are also benefiting a lot from the buffer cache: because our small table is ordered too, we keep rereading the same big-table block from the buffer cache many times, until the next value arrives.
With unordered data that can be almost impossible when the buffer cache is small and the volume of reads/writes is high, as it is in our case.


GROUPING

Grouping and aggregating are strongly related to the order of the data. So in theory, ordered
data should group much faster.

Let's find out how much faster:

SQL> select avg(count)
from (
select count(*) count from big group by a
union all
select count(*) count from big group by a
union all
select count(*) count from big group by a
);

AVG(COUNT)
----------
21.8829873

Elapsed: 00:01:04.11
SQL>
--trace:
call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        2     20.39      64.08     172905     124494          0           1
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        4     20.39      64.08     172905     124494          0           1
--
--
Rows (1st) Rows (avg) Rows (max)  Row Source Operation
---------- ---------- ----------  ---------------------------------------------------
         1          1          1  SORT AGGREGATE (cr=124494 pr=172905 pw=48422 time=64080198 us)
   1370928    1370928    1370928   VIEW  (cr=124494 pr=172905 pw=48422 time=8630328 us cost=70221 size=17632992 card=1356384)
   1370928    1370928    1370928    UNION-ALL  (cr=124494 pr=172905 pw=48422 time=8116053 us)
    456976     456976     456976     HASH GROUP BY (cr=41498 pr=41495 pw=0 time=7460701 us cost=23407 size=2260640 card=452128)
  10000000   10000000   10000000      TABLE ACCESS FULL BIG (cr=41498 pr=41495 pw=0 time=4018842 us cost=11398 size=50000000 card=10000000)
    456976     456976     456976     HASH GROUP BY (cr=41498 pr=65705 pw=24211 time=25247428 us cost=23407 size=2260640 card=452128)
  10000000   10000000   10000000      TABLE ACCESS FULL BIG (cr=41498 pr=41494 pw=0 time=4452664 us cost=11398 size=50000000 card=10000000)
    456976     456976     456976     HASH GROUP BY (cr=41498 pr=65705 pw=24211 time=25186680 us cost=23407 size=2260640 card=452128)
  10000000   10000000   10000000      TABLE ACCESS FULL BIG (cr=41498 pr=41494 pw=0 time=4398624 us cost=11398 size=50000000 card=10000000)
SQL> select avg(count)
from (
select count(*) count from big_sorted group by a
union all
select count(*) count from big_sorted group by a
union all
select count(*) count from big_sorted group by a
);

AVG(COUNT)
----------
21.8829873

Elapsed: 00:00:11.93
SQL> 
--trace:
call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        2      7.84      11.91     127087     124494          0           1
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        4      7.84      11.91     127087     124494          0           1
--
--
Rows (1st) Rows (avg) Rows (max)  Row Source Operation
---------- ---------- ----------  ---------------------------------------------------
         1          1          1  SORT AGGREGATE (cr=124494 pr=127087 pw=2604 time=11916323 us)
   1370928    1370928    1370928   VIEW  (cr=124494 pr=127087 pw=2604 time=4957463 us cost=70221 size=17632992 card=1356384)
   1370928    1370928    1370928    UNION-ALL  (cr=124494 pr=127087 pw=2604 time=4432693 us)
    456976     456976     456976     HASH GROUP BY (cr=41498 pr=41495 pw=0 time=3630661 us cost=23407 size=2260640 card=452128)
  10000000   10000000   10000000      TABLE ACCESS FULL BIG_SORTED (cr=41498 pr=41495 pw=0 time=4433780 us cost=11398 size=50000000 card=10000000)
    456976     456976     456976     HASH GROUP BY (cr=41498 pr=42796 pw=1302 time=3825611 us cost=23407 size=2260640 card=452128)
  10000000   10000000   10000000      TABLE ACCESS FULL BIG_SORTED (cr=41498 pr=41494 pw=0 time=3324196 us cost=11398 size=50000000 card=10000000)
    456976     456976     456976     HASH GROUP BY (cr=41498 pr=42796 pw=1302 time=4281550 us cost=23407 size=2260640 card=452128)
  10000000   10000000   10000000      TABLE ACCESS FULL BIG_SORTED (cr=41498 pr=41494 pw=0 time=3526446 us cost=11398 size=50000000 card=10000000)

Again, times faster. Take a closer look at the stats and you'll see that the time difference comes from the "SORT AGGREGATE" and "HASH GROUP BY" steps.


ORDERING QUERY RESULTS

Well, I won't make an example for this one, it's pointless. But I want to point out something really important here. I often come across developers who wonder in which order they will present the query output to the client, and based on that decide on which column to order their data while inserting. That's actually wrong. Wrong, because the optimizer orders your output data at the end. In most cases you'll have a query that returns a few, or maybe tens of thousands of, rows. Sorting them is nothing compared to filtering, joining and aggregating the raw data source. Yes, of course, you can include the column you order the output by, but just don't put it first.

Always think about which columns you'll join on, and put them in the first positions of your ORDER BY clause.
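
For example, a load that keeps the join column leading could look something like this (the table and column names here are hypothetical):

insert /*+ append */ into orders_hist
select *
  from orders_stage
 order by customer_id,   -- the join column used most often goes first
          order_date;    -- then the usual ORDER BY / GROUP BY columns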


BASIC COMPRESSION

For basic table compression, Oracle Database deduplicates repeated column values within each block (HCC, which we'll look at in a moment, additionally uses algorithms like LZO, ZLIB and BZIP2).
Either way, if your data is sorted, similar values land in the same blocks and you get a better compression ratio:

SQL> create table big_comp compress as select * from big;

Table created.

Elapsed: 00:01:01.80
SQL> create table big_sort_comp compress as select * from big_sorted;

Table created.

Elapsed: 00:00:56.76
SQL> select segment_name,sum(bytes)/1024/1024 MB from user_segments where segment_name in ('BIG_COMP','BIG_SORT_COMP','BIG','BIG_SORTED') group by segment_name;

SEGMENT_NAME                        MB
--------------------------- ----------
BIG                                328
BIG_COMP                           272
BIG_SORTED                         328
BIG_SORT_COMP                      248

Not as good as expected. Nearly 10% better compression ratio with sorted data. OLTP compression will give similar results.
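
If you want to check the OLTP claim yourself, the test is the same with a different compression clause (a sketch – on 11g this needs the Advanced Compression option):

create table big_oltp_comp compress for oltp as select * from big;
create table big_sort_oltp_comp compress for oltp as select * from big_sorted;

select segment_name, sum(bytes)/1024/1024 mb
  from user_segments
 where segment_name like 'BIG%OLTP%'
 group by segment_name;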

Let's see how HCC handles it ;)


HYBRID COLUMNAR COMPRESSION – HCC

Our BIG tables are transferred to an Exadata instance with Data Pump. Now we can play with HCC and Storage Indexes :)

SQL> create table big_comp compress for query as select * from big;

Table created.

SQL> create table big_sort_comp compress for query as select * from big_sorted;

Table created.

SQL> select segment_name,sum(bytes)/1024/1024 MB from user_segments where segment_name in ('BIG_COMP','BIG_SORT_COMP','BIG','BIG_SORTED') group by segment_name;

SEGMENT_NAME                     MB
------------------------ ----------
BIG                             331
BIG_COMP                        128
BIG_SORTED                   332.75
BIG_SORT_COMP                102.75

SQL>

About 20% better compression ratio with ordered data – double the gain we saw with basic compression.


STORAGE INDEXES

Storage Indexes are simple in-memory structures that are created and maintained on the Exadata storage servers. These indexes are used to eliminate data regions instead of locating them: for each storage region they keep the minimum and maximum values of the columns, so regions that cannot possibly contain the searched value are skipped. Here we have a situation similar to the one with the clustering factor and ordered data.

If your data is sorted, you will read fewer blocks:

SQL> select count(*) from big where a='xxxx';

  COUNT(*)
----------
        20

Elapsed: 00:00:01.50
SQL>
SQL> select count(*) from big_sorted where a='xxxx';

  COUNT(*)
----------
        20

Elapsed: 00:00:00.24
SQL>

From here we can see that the execution time is far lower. I haven't included traces, because explaining how to trace storage indexes would require a whole post by itself, and I think this one is big enough. Actually, anyone who reads this far gets a beer from me :D
If you wonder how I am so sure that storage indexes were created on those columns and tables, check my other post here.
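
If you still want a quick sanity check on your own system, one way (a sketch) is to compare the session statistic 'cell physical IO bytes saved by storage index' before and after running the query:

select n.name, s.value/1024/1024 mb_saved
  from v$mystat s, v$statname n
 where s.statistic# = n.statistic#
   and n.name = 'cell physical IO bytes saved by storage index';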


INSERTING YOUR DATA

In general you can benefit a lot from sorting your data. But what's the price? How much additional time will you spend inserting your data?

Let's just try and see:

SQL> alter system flush buffer_cache;

System altered.

Elapsed: 00:00:01.20
SQL> create table test1 as select * from big;

Table created.

Elapsed: 00:00:50.40
SQL> alter system flush buffer_cache;

System altered.

Elapsed: 00:00:02.48
SQL> create table test2 as select * from big order by a;

Table created.

Elapsed: 00:02:08.16
SQL> alter system flush buffer_cache;

System altered.

Elapsed: 00:00:02.57
SQL> create table test3 as select * from big order by a,b;

Table created.

Elapsed: 00:02:09.31
SQL> create table test4 as select * from big order by a,b,c,d,e;

Table created.

Elapsed: 00:02:12.91
SQL>

As we can see, sorting adds extra load – in this case, about twice slower. Around 80 seconds more; seconds you'll get back by hitting just half of the situations explained so far, just once. And what happens if you hit them many times? Also pay attention to the difference between sorting by 1, 2 or 5 columns: it's minimal.
So if you're going to order your data, order it well! Include the join columns that are used most often and start with them, then continue with the ORDER BY and GROUP BY columns. Order your data well; don't waste the effort on just a single column. Of course, don't forget to find the balance. Actually, I think this is the MOST important step: finding the balance.
If you are inserting rarely accessed data, sorting is pointless. Pointless because you'll lose time sorting data that you'll never read, join, group and so on – time you won't get back and won't benefit from. That's true only if you are not compressing that data to save space, though :)
So the balance is important and you have to take care of it.

PP: Don’t forget that index range and full scan is returning sorted data, also you can use index organized table to achieve that.


Reason number 9:
The final reason.
To make your system faster. The main reason I wrote this post.

Many people look at performance as a "bonus", or say "Anyway, it's running…" /my favorite/, and act as if server resources were unlimited. And that's so wrong. Wrong, wrong, wrong. Just wrong!
Server resources are your credits, your "server-side money". If your application performs better and uses fewer resources, you'll be able to scale it, build on it and expand it more easily, without burning all of your resources for nothing. But they are "unlimited", right… And what if you don't have any more!? What if your server is already whining!? Then you'll start thinking and caring about performance! You'll start rewriting and fixing your code, losing your time!
And how important is that? How important is your time?

Cheers,
Kovachev
