Faster Primary Key creation or enabling

Today, I have seen an application running a script to enable primary key constraint for a long time.   The question is how to make it faster.  In Data warehousing systems, it is normal that we disable Constraints, drop indexes and re-enable them after loading data.  So, making them faster is very important as the other jobs are dependent on the initial data load.  

I know this is nothing new for a DBAs, but many developers tends to over look this.

Here is my test case, initially prepared a new table with 7 million records in it.

SQL> create sequence my_num;

SQL> create table my_table as select my_num.nextval, OWNER,OBJECT_NAME,OBJECT_ID,DATA_OBJECT_ID,

OBJECT_TYPE,CREATED   from dba_objects;

SQL> insert into my_table select my_num.nextval,OWNER,OBJECT_NAME,OBJECT_ID,

138236 rows created.

SQL> commit ;

SQL> select count(*) from my_table;


Now my_table got 7.2 million records, good enough to show slow primary key creation 🙂 

SQL> alter table my_table add constraint pk_my_Pkey primary key  (NEXTVAL,OBJECT_ID);

Table altered.

Elapsed: 00:06:35.15

The normal primary key creation took 6 minutes and 35.15 seconds.  To create primary key it took 6.35 minutes!, I felt that is way too high.  So, how to improve the constraint creation timings.  The primary key creation to got two steps

  • A unique index creation
  • Enabling the constraint

   The maximum time took was to create the index not to enable the primary key.  If we make the index creation faster, all set.  So, I decided to break the constraint creation into 2 parts a) Index creation b) Enabling primary key.  Considering the type of tables – daily data load, can be re-created easily etc we need not keep recovery is in mind – if we lose the table we will create it.  Also, we have enough CPU resources while other jobs are waiting to complete the data load.

  • As the system resources under utilised, I will use enough parallelism
  • As the table can be re-crated easily, I will disable redo generation while index creation. 

 With the above assumtpions in hand, I had created the index and primary key as follows.

SQL> create index my_index on my_table (NEXTVAL,OBJECT_ID) NOLOGGING parallel 8;

Index created.

Elapsed: 00:02:06.75

SQL> alter table my_table enable constraint pk_my_Pkey using index my_index;

Table altered.

Elapsed: 00:00:05.02

Perfect, the constraint creation completed more efficiently with in 2.06.83 minutes, means 33% of the actual time!

  1. Chandra
    April 15, 2014 at 7:24 am

    Good topic

  2. March 26, 2015 at 5:26 pm

    Great idea!!!

  3. Feng
    July 18, 2016 at 9:19 pm

    A Primary Key constraint and a Clustered Index is not really the same in SQL Server. By default SQL Server enforces the Primary Key constraint with a Unique Clustered Index. But if you want you could instead enforce it with a Unique Non-Clustered Index. But that approach wouldn’t really make sense as a default, because you need to have specific problems to justify that approach.

    • July 19, 2016 at 6:26 am

      Not sure about SQL Server as I have tested this approach in Oracle. Sorry, I should have mentioned about it.

  4. August 14, 2017 at 2:24 pm

    Have you tested this with dropping the index and recreating it instead of merely enabling it?

    • August 24, 2017 at 1:50 am

      Yes, it took more than 6 minutes – in all those cases, it create a new index serially

  5. Vasiliy
    August 23, 2017 at 5:27 am

    It is worth noting that Oracle ( still reads the whole table serially when it enables a primary key constraint despite the existence of an appropriate unique index. I don’t know why it does this. So on a huge table enabling a constraint is still long enough. But I approve that the whole process is faster when you have pre-created an index in parallel though.

    • August 24, 2017 at 1:52 am

      I will check this, not sure why oracle has to scan the table when unique index is existing.

      • Vasiliy
        August 24, 2017 at 3:01 am

        By the way, in your example the index is not unique, so Oracle has to scan something to detect duplicates. I bet it scans the table, but it would be cheaper to scan the index. So would it be in the case of the unique index (but still it’d be more expensive than doing nothing).

      • Vasiliy
        August 24, 2017 at 8:31 am

        I am sorry. My bad. It was a flawed schema design. I blindly believed that primary key’s target columns had been constrained as not null (but they really should have been). I am currently in the middle of this process. I’ll tell if it doesn’t help (I believe it does).

        In my own example when the index is not unique Oracle reads the entire index (FFS) and checks it for duplicates (SORT). As expected.

  6. August 24, 2017 at 11:36 am

    Index FFS (scan through the leaf nodes) for non- unique indexes is expected to validate the data. I agree.

  1. No trackbacks yet.

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: