Home > Oracle, Performance > Indexing subset of table data – Compact Indexes

Indexing subset of table data – Compact Indexes


Think about you need to create an index on a huge table with skewed data.  Say 98% of the data is same value and remaining 2% may be different values. For example, you got a employee table, where nationality as

Indian : 98%

Japan: 0.5%

USA: 0.5%

Russia: 0.1%

China: 0.9%

This is highly skewed data and you want to create an index for this nationality column.  Interestingly this index is not going to be useful to access 98% of the ‘INDIAN’ nationalities, but they need to exist in the index.  Really they need to exist in the index?  Can I avoid the 98% of the INDIANS so that the index can be very small – just 2% of the data size? 

SQL> desc indextest
 Name                        Null?    Type
 —————           ——– —————–
 OWNER                                  VARCHAR2(30)
 OBJECT_NAME                  VARCHAR2(128)
 OBJECT_ID                           NUMBER

SQL> select count (*) from indextest;

  COUNT(*)
———-
    459327

SQL> alter table indextest add (country varchar2(20));

Table altered.

SQL> update indextest set COUNTRY=’INDIA’;

459327 rows updated.

I have created a table  INDEXTEST from DBA_OBJECTS with 459327 rows and added a new column COUNTRY.  Now I have 459327 employees in my test table with default nationality as INDIA.  Updated the table with COUNTRY as USA and JAPAN just for 24 for employees each.

SQL> update indextest set COUNTRY=’USA’ where rownum < 25 and COUNTRY=’INDIA’;

24 rows updated.

SQL> update indextest set COUNTRY=’JAPAN’ where rownum < 25 and COUNTRY=’INDIA’;

24 rows updated.

SQL> commit ;

Commit complete.

My current data is highly skewed and creating an index for all data is really a waste.  And being a very big index,  you need to manage so many other factors like –  maintenance, buffer cache usage, fragmentation and the impacts on other  DML activities.  Still let me create the standard index – just to capture the behaviour and size of the index.

SQL> create index i_country on indextest(COUNTRY);

Index created.

SQL> set autotrace traceonly exp stat
SQL> select * from indextest where COUNTRY=’JAPAN’;

24 rows selected.
Execution Plan
———————————————————-
Plan hash value: 4287388471

—————————————————————————————–
| Id  | Operation                   | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
—————————————————————————————–
|   0 | SELECT STATEMENT            |           |    24 |  2592 |     4   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| INDEXTEST |    24 |  2592 |     4   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | I_COUNTRY |    24 |       |     3   (0)| 00:00:01 |
—————————————————————————————–

Predicate Information (identified by operation id):
—————————————————

   2 – access(“COUNTRY”=’JAPAN’)

Note
—–
   – dynamic sampling used for this statement
Statistics
———————————————————-
         70  recursive calls
          0  db block gets
         79  consistent gets
          1  physical reads
          0  redo size
       1603  bytes sent via SQL*Net to client
        535  bytes received via SQL*Net from client
          3  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
         24  rows processed

SQL> select * from indextest where COUNTRY=’INDIA’;

459279 rows selected.
Execution Plan
———————————————————-
Plan hash value: 61661468

——————————————————————————-
| Id  | Operation         | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
——————————————————————————-
|   0 | SELECT STATEMENT  |           |   419K|    43M|   691   (3)| 00:00:09 |
|*  1 |  TABLE ACCESS FULL| INDEXTEST |   419K|    43M|   691   (3)| 00:00:09 |
——————————————————————————-

Predicate Information (identified by operation id):
—————————————————

   1 – filter(“COUNTRY”=’INDIA’)

Note
—–
   – dynamic sampling used for this statement
Statistics
———————————————————-
          7  recursive calls
          0  db block gets
      33066  consistent gets
       1007  physical reads
          0  redo size
   18312042  bytes sent via SQL*Net to client
     337322  bytes received via SQL*Net from client
      30620  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
     459279  rows processed

 As expected, index is picking when you are refering  JAPAN and INDIA is going for a FTS.  The sizse of the index is 9MB with 24 extents.

SQL> select bytes/1024/1024,extents from dba_segments where segment_name=’I_COUNTRY’;

BYTES/1024/1024              EXTENTS
—————                                ———-
              9                                       24

So, we know the size of the inedex in 9MB, just because the index is filled with 99% unwanted entries.  Now let us create an index excluding for the nationalities ‘INDIA’.

SQL> drop index I_COUNTRY;

Index dropped.

SQL> create index i_country on indextest(decode(COUNTRY,’INDIA’,NULL,’USA’,’USA’,’JAPAN’,’JAPAN’));

Index created.

SQL>  select bytes/1024/1024,extents from dba_segments where segment_name=’I_COUNTRY’;

BYTES/1024/1024    EXTENTS
—————                    ———-
          .0625                     1

I just put a DECODE function to make NULL where it found a COUNTRY is INDIA and the size of the index is just 0.06 MB fit into a single small extent – index for 48 rows while your table contains 459327 rows.  Now your baby index is ready for use.

But the optimizer will not detect your idex directly as you used DECODE function while creating the index.  So, you need to use DECODE while refering the COUNTRY column in the WHERE clause – in a single word this is a function based index.

SQL> select * from indextest where decode(COUNTRY,’INDIA’,NULL,’USA’,’USA’,’JAPAN’,’JAPAN’)=’JAPAN’;

24 rows selected.
Execution Plan
———————————————————-
Plan hash value: 4287388471

—————————————————————————————–
| Id  | Operation                   | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
—————————————————————————————–
|   0 | SELECT STATEMENT            |           |  4192 |   462K|     2   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| INDEXTEST |  4192 |   462K|     2   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | I_COUNTRY |  1677 |       |     1   (0)| 00:00:01 |
—————————————————————————————–

Predicate Information (identified by operation id):
—————————————————

   2 – access(DECODE(“COUNTRY”,’INDIA’,NULL,’USA’,’USA’,’JAPPAN’,’JAPAN’)=’JAPAN’)

Note
—–
   – dynamic sampling used for this statement
Statistics
———————————————————-
         49  recursive calls
          0  db block gets
         75  consistent gets
          0  physical reads
          0  redo size
       1340  bytes sent via SQL*Net to client
        535  bytes received via SQL*Net from client
          3  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
         24  rows processed

 Now your baby index is functional with just subset of data.  I felt this is very useful with highly skewed data – you are not really playing with the whole data, but only on required data.

Advertisements
  1. No comments yet.
  1. No trackbacks yet.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: