People – not just People rather they are your resources!

April 11, 2011 Leave a comment

Other people are your greatest resources! They give birth to you; they feed you, dress you, provide you with money, make you laugh and cry.  They comfort you, heal you, invest your money, service your car and bury you.  We can’t live without them.  We can’t even die without them.

Categories: Oracle, Performance

ORA-00600: internal error code, arguments: [ktprhtnew6]

February 23, 2013 Leave a comment

While recovering a database, stuck with ORA-00600 and database crashed.  We have an Oracle version 11.1 and were hitting a bug: 8310931 and fixed in 11.2.  The bug says, this problem will happen usually on high number of CPUs while doing transaction recovery.  We got 126 CPU in the DB server.


Errors in file /opt/oracle/diag/rdbms/xxxx/xxx/trace/xxx_smon_29786.trc (incident=120257):

ORA-00600: internal error code, arguments: [ktprhtnew6], [], [], [], [], [], [], [], [], [], [], []
Incident details in: /opt/oracle/diag/rdbms/xxx/xxxx/incident/incdir_120257/xxxx_smon_29786_i120257.trc
Fri Feb 22 22:49:14 2013
Trace dumping is performing id=[cdmp_20130222224914]
Fatal internal error happened while SMON was doing active transaction recovery.
Errors in file /opt/oracle/diag/rdbms/xxxxx/xxxxx/trace/xxxx_smon_29786.trc:
ORA-00600: internal error code, arguments: [ktprhtnew6], [], [], [], [], [], [], [], [], [], [], []
SMON (ospid: 29786): terminating the instance due to error 474

The only information we could see the trace file was, it was doing transaction recovery while crashing with 256 servers.


*** 2013-02-22 22:49:06.125
Dead transaction 0x00df.019.0002e351 recovered by 256 server(s)
Incident 120257 created, dump file: /opt/oracle/diag/rdbms/xxx/xxx/incident/incdir_120257/xxxx_smon_29786_i120257.trc
ORA-00600: internal error code, arguments: [ktprhtnew6], [], [], [], [], [], [], [], [], [], [], []

Parallel Transaction recovery caught exception 600
Parallel Transaction recovery caught error 600
Fatal internal error happened while SMON was doing active transaction recovery.
error 474 detected in background process
ORA-00600: internal error code, arguments: [ktprhtnew6], [], [], [], [], [], [], [], [], [], [], []

*** 2013-02-22 22:49:14.570
SMON (ospid: 29786): terminating the instance due to error 474
ksuitm: waiting up to [5] seconds before killing DIAG

stuck with the issue – we thought to hide the high number of  CPUs from Oracle.  Set the CPU_COUNT to  64 and started the database.  That resolved the issue and database came up cleanly.

Note: I felt reducing parallel servers for rollback (by adjusting the parameter fast_start_parallel_rollback) were also could have been resolved the issue with the CPU_COUNT 126.  In our case it was ‘HIGH’.  Basically Oracle should not use 256 servers for recovery to hit this bug.  Will test that case when I get an opportunity, hurry to release the system now!

Who dropped my procedure?

February 21, 2013 Leave a comment

In past two posts ( from current schema and as SYS user ) , I have explained how we can restore a package.  Some one posed me a question – how can I identify who dropped the package.  Even though we are seeing a procedure or package as database object, it is stored as a table rows in the SOURCE$ table just like other tables in the database.  So, any CREATE or DROP procedure / package are just INSERT or DELETE or UPDATE operations to the SOURCE$ table.  We can use this along with FLASHBACK feature to identify who dropped the procedure, it is simple.

First thing is to identify what are various transactions happened to the SOURCE$ table.  Login as SYS user and use FLASHBACK query to identify the transactions to the SOURCE$.

<pre>SQL> set linesize 200
SQL> col TABLE_NAME for a15
SQL> col TABLE_OWNER for a15
SQL> SELECT distinct start_scn, commit_scn, operation, table_name, table_owner,xid,logon_user
FROM flashback_transaction_query
where table_name='&Table_name';
Enter value for table_name: SOURCE$
old 3: where table_name='&Table_name'
new 3: where table_name='SOURCE$'

 START_SCN COMMIT_SCN OPERATION TABLE_NAME TABLE_OWNER XID LOGON_USER
---------- ---------- -------------------------------- --------------- --------------- ---------------- ------------------------------
 7335974 7335984 DELETE SOURCE$ SYS 0002000B00000E1E APPLE
 7347338 7347344 INSERT SOURCE$ SYS 0005001D0000140F APPLE
 7347355 7347365 DELETE SOURCE$ SYS 0008000300000E0B THOMAS
 7317378 7317385 INSERT SOURCE$ SYS 0008001900000E05 APPLE</pre>

In fact I am missing a PROCEDURE from APPLE schema.  So there are two DELETE and two INSERT statements happened to the SOURCE$ table.  I am not going to worry about the INSERT operation as  the expected operation is DELETE.  So, I took the XID  ’0008000300000E0B’ and checked what are the UNDO SQLs for that transaction.

<pre>SQL> set linesize 200
SQL> SELECT undo_sql
 2 FROM flashback_transaction_query
 3 WHERE xid = HEXTORAW('&xid');
Enter value for xid: 0008000300000E0B
old 3: WHERE xid = HEXTORAW('&xid')
new 3: WHERE xid = HEXTORAW('0008000300000E0B')
UNDO_SQL
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
insert into "SYS"."OBJERROR$"("OBJ#") values ('22781');
update "SYS"."OBJ$" set "OBJ#" = '22781', "DATAOBJ#" = NULL, "TYPE#" = '7', "CTIME" = TO_DATE('21-FEB-13', 'DD-MON-RR'), "MTIME" = TO_DATE('21-FEB-13', 'DD-MON-RR'), "STIME" = TO_DATE('21-FEB-13', 'DD
-MON-RR'), "STATUS" = '3', "FLAGS" = '32768', "OID$" = NULL, "SPARE1" = '6', "SPARE2" = '65535' where ROWID = 'AAAAASAABAAAGTIAAY';
insert into "SYS"."DEPENDENCY$"("D_OBJ#","D_TIMESTAMP","ORDER#","P_OBJ#","P_TIMESTAMP","D_OWNER#","PROPERTY","D_ATTRS","D_REASON") values ('22781',TO_DATE('21-FEB-13', 'DD-MON-RR'),'0','1177',TO_DATE(
'18-APR-06', 'DD-MON-RR'),NULL,'1',NULL,NULL);
insert into "SYS"."ERROR$"("OBJ#","SEQUENCE#","LINE","POSITION#","TEXTLENGTH","TEXT","PROPERTY","ERROR#") values ('22781','4','5','1','29','PL/SQL: SQL Statement ignored','0','0');
insert into "SYS"."ERROR$"("OBJ#","SEQUENCE#","LINE","POSITION#","TEXTLENGTH","TEXT","PROPERTY","ERROR#") values ('22781','3','5','8','47','PL/SQL: ORA-00942: table or view does not exist','0','0');
insert into "SYS"."ERROR$"("OBJ#","SEQUENCE#","LINE","POSITION#","TEXTLENGTH","TEXT","PROPERTY","ERROR#") values ('22781','2','4','1','29','PL/SQL: SQL Statement ignored','0','0');
insert into "SYS"."ERROR$"("OBJ#","SEQUENCE#","LINE","POSITION#","TEXTLENGTH","TEXT","PROPERTY","ERROR#") values ('22781','1','4','8','47','PL/SQL: ORA-00942: table or view does not exist','0','0');
insert into "SYS"."SETTINGS$"("OBJ#","PARAM","VALUE") values ('22781','plsql_compiler_flags','INTERPRETED,NON_DEBUG');
insert into "SYS"."SETTINGS$"("OBJ#","PARAM","VALUE") values ('22781','plscope_settings','IDENTIFIERS:NONE');
insert into "SYS"."SETTINGS$"("OBJ#","PARAM","VALUE") values ('22781','plsql_ccflags',NULL);
insert into "SYS"."SETTINGS$"("OBJ#","PARAM","VALUE") values ('22781','plsql_warnings','DISABLE:ALL');
insert into "SYS"."SETTINGS$"("OBJ#","PARAM","VALUE") values ('22781','nls_length_semantics','BYTE');
insert into "SYS"."SETTINGS$"("OBJ#","PARAM","VALUE") values ('22781','plsql_debug','FALSE');
insert into "SYS"."SETTINGS$"("OBJ#","PARAM","VALUE") values ('22781','plsql_code_type','INTERPRETED');
insert into "SYS"."SETTINGS$"("OBJ#","PARAM","VALUE") values ('22781','plsql_optimize_level','2');
insert into "SYS"."SOURCE$"("OBJ#","LINE","SOURCE") values ('22781','7','end;');
insert into "SYS"."SOURCE$"("OBJ#","LINE","SOURCE") values ('22781','6','commit;
');
insert into "SYS"."SOURCE$"("OBJ#","LINE","SOURCE") values ('22781','5','update emp set comm=sal*0.10 where empno = v_empno;
');
insert into "SYS"."SOURCE$"("OBJ#","LINE","SOURCE") values ('22781','4','update emp set sal=sal*(1+v_percent/100) where empno = v_empno;
');
insert into "SYS"."SOURCE$"("OBJ#","LINE","SOURCE") values ('22781','3','begin
');
insert into "SYS"."SOURCE$"("OBJ#","LINE","SOURCE") values ('22781','2','is
');
insert into "SYS"."SOURCE$"("OBJ#","LINE","SOURCE") values ('22781','1','procedure Update_emp (v_empno number,v_percent number)
');
insert into "SYS"."PROCEDURE$"("OBJ#","AUDIT$","STORAGESIZE","OPTIONS") values ('22781','--------------------------------------',NULL,'0');</pre>

There you go, I am particularly interested in the below statements, which is undoing the ‘DROP’

<pre>insert into "SYS"."SOURCE$"("OBJ#","LINE","SOURCE") values ('22781','5','update emp set comm=sal*0.10 where empno = v_empno;
');
insert into "SYS"."SOURCE$"("OBJ#","LINE","SOURCE") values ('22781','4','update emp set sal=sal*(1+v_percent/100) where empno = v_empno;
');
insert into "SYS"."SOURCE$"("OBJ#","LINE","SOURCE") values ('22781','3','begin
');
insert into "SYS"."SOURCE$"("OBJ#","LINE","SOURCE") values ('22781','2','is
');
insert into "SYS"."SOURCE$"("OBJ#","LINE","SOURCE") values ('22781','1','procedure Update_emp (v_empno number,v_percent number)
');</pre>

From this, it is clear that the user THOMAS dropped the procedure.

buffer busy waits

February 12, 2013 Leave a comment

Ordering data in a table is important to avoid bottlenecks.  In the previous post I had explained how to avoid ITL waits by ordering the data.  This demo will shows how we can remove “buffer busy waits” by ordering the data.  I ran the below 5 similar sql scripts on a table ABC concurrently.


$ more 5.sql

update abc set owner='ABC' where mod(object_id,10)=5;

commit;

exit

$ more 4.sql

update abc set owner='ABC' where mod(object_id,10)=4;

commit;

exit

$ more 3.sql

update abc set owner='ABC' where mod(object_id,10)=3;

commit;

exit

$ more 2.sql

update abc set owner='ABC' where mod(object_id,10)=2;

commit;

exit

$ more 1.sql

update abc set owner='ABC' where mod(object_id,10)=1;

commit;

exit

$ more 0.sql

update abc set owner='ABC' where mod(object_id,10)=0;

commit;

exit

nohup sqlplus apple/apple @0.sql &

nohup sqlplus apple/apple @1.sql &

nohup sqlplus apple/apple @2.sql &

nohup sqlplus apple/apple @3.sql &

nohup sqlplus apple/apple @4.sql &

nohup sqlplus apple/apple @5.sql &

All the 6 UPDATE sessions were almost hanging on a mix of “buffer busy waits” and “free buffer waits”.   Being a small buffer cache and considering the the size of the server, I was not worried about the “free buffer waits”, but “buffer busy waits” which I do not like to see – CBC latch contention due to 6 parallel updates on the ABC table.


SQL> set linesize 160

SQL col event for a40

SQL> col username for a10

SQL> col OSUSER for a12

SQL> select v.seq#,v.sid,username,v.event,program

2  from v$session_wait v, v$session s  where  s.sid=v.sid and username='APPLE'

3  order by last_call_et;

SEQ#        SID USERNAME   EVENT                                    PROGRAM

---------- ---------- ---------- ---------------------------------------- ------------------------------------------------

552        115 APPLE      free buffer waits                        sqlplus@bandevsol01 (TNS V1-V3)

563         81 APPLE      buffer busy waits                        sqlplus@bandevsol01 (TNS V1-V3)

600         92 APPLE      buffer busy waits                        sqlplus@bandevsol01 (TNS V1-V3)

432         90 APPLE      buffer busy waits                        sqlplus@bandevsol01 (TNS V1-V3)

414         80 APPLE      free buffer waits                        sqlplus@bandevsol01 (TNS V1-V3)

551         88 APPLE      buffer busy waits                        sqlplus@bandevsol01 (TNS V1-V3)

6 rows selected.

Looking at the amount of data I was updating, creating an index was a good idea.


SQL> select count(*) from abc where mod(object_id,10)=0;

COUNT(*)

----------

37174

SQL> select count(*) from abc where mod(object_id,10)=1;

COUNT(*)

----------

37175

SQL>  select count(*) from abc where mod(object_id,10)=2;

COUNT(*)

----------

37175

SQL>  select count(*) from abc where mod(object_id,10)=3;

COUNT(*)

----------

37175

SQL> select count(*) from abc where mod(object_id,10)=4;

COUNT(*)

----------

37175

SQL> select count(*) from abc where mod(object_id,10)=5;

COUNT(*)

----------

37174

SQL> select count(*) from  abc;

COUNT(*)

----------

371744

In all conditions, I was updating only 10% of the table data, so a function based index on mod(object_id,10)  should restrict my UPDTAE access only limited rows, so that I will not hammer the CBC, will minimize  CBC latch contention.


SQL> create index i_objid on abc(mod(object_id,10));

Index created.

SQL> exec dbms_stats.gather_table_stats('APPLE','ABC',cascade=>true);

PL/SQL procedure successfully completed.

To my surprise, Oracle did not pick up the index, rather opted FTS for the updates.  Usually developer community rush to use hints to force the index.  But I decide to see why the index is not getting used.  Using a select statement, compared the blocks gets for both with and without index.


Execution Plan

----------------------------------------------------------

Plan hash value: 3116099409

--------------------------------------------------------------------------

| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |

--------------------------------------------------------------------------

|   0 | SELECT STATEMENT  |      | 37174 |  3920K|  1461   (3)| 00:00:18 |

|*  1 |  TABLE ACCESS FULL| ABC  | 37174 |  3920K|  1461   (3)| 00:00:18 |

--------------------------------------------------------------------------

&nbsp;

Predicate Information (identified by operation id):

---------------------------------------------------

1 - filter(MOD("OBJECT_ID",10)=1)

&nbsp;

Statistics

----------------------------------------------------------

0  recursive calls

0  db block gets

7727  consistent gets

5276  physical reads

0  redo size

2829186  bytes sent via SQL*Net to client

27781  bytes received via SQL*Net from client

2480  SQL*Net roundtrips to/from client

0  sorts (memory)

0  sorts (disk)

37175  rows processed

&nbsp;

SQL> select /*+ index (a ,I_OBJID) */ * from abc a where mod(object_id,10)=1;

37175 rows selected.

Execution Plan

----------------------------------------------------------

Plan hash value: 2443312682

---------------------------------------------------------------------------------------

| Id  | Operation                   | Name    | Rows  | Bytes | Cost (%CPU)| Time     |

---------------------------------------------------------------------------------------

|   0 | SELECT STATEMENT            |         | 37174 |  3920K|  4811   (1)| 00:00:58 |

|   1 |  TABLE ACCESS BY INDEX ROWID| ABC     | 37174 |  3920K|  4811   (1)| 00:00:58 |

|*  2 |   INDEX RANGE SCAN          | I_OBJID | 37174 |       |    76   (2)| 00:00:01 |

---------------------------------------------------------------------------------------

Predicate Information (identified by operation id):

---------------------------------------------------

2 - access(MOD("OBJECT_ID",10)=1)

Statistics

----------------------------------------------------------

1  recursive calls

0  db block gets

9354  consistent gets

3695  physical reads

0  redo size

2829186  bytes sent via SQL*Net to client

27781  bytes received via SQL*Net from client

2480  SQL*Net roundtrips to/from client

0  sorts (memory)

0  sorts (disk)

37175  rows processed

So, that is the reason why oracle is not using the index – the data distribution (clustering factor) is not favoring the index usage.  This is where we need to  worry about the developers who use INDEX hints to force index access.  Data is always dynamic, so keep your SQLs also dynamic – let CBO build an execution plan based on the current environment.  By keeping INDEX hint, you are telling Oracle to use a single standard plan which may not be suitable for the same data in different distributions.    Looking at the CLUSTERING_FACTOR columns explained clearly why oracle is choosing FTS, skipping the available index.


SQL> select index_name,CLUSTERING_FACTOR,table_name

from user_indexes where table_name='ABC';

INDEX_NAME                     CLUSTERING_FACTOR TABLE_NAME

------------------------------ ----------------- ------------------------------

I_OBJID                                    47277 ABC

SQL> SELECT BLOCKS,NUM_ROWS FROM USER_TABLES WHERE TABLE_NAME='ABC';

BLOCKS   NUM_ROWS

---------- ----------

5276     371744

So the clustering factor is 47,277 while the total number of blocks in the table is 5,276.  Oracle will favor index access only if the clustering factor is close to the number of BLOCKS in the table which is not the case here.  Forcing Oracle for an index access will drive up cost, execution time, consistent gets due to the inefficient plan.   So I decided to order the data by rebuilding the table using mod(object_id,10) order clause.  Created a backup table and INSERT back to ABC with ORDER BY  mod(object_id,10).

 SQL> insert into abc select * from abc_bak order  by  mod(object_id,10);

371744 rows created.

SQL> exec dbms_stats.gather_table_stats('APPLE','ABC',cascade=>true);

PL/SQL procedure successfully completed.

SQL> select index_name,CLUSTERING_FACTOR,table_name from user_indexes where table_name='ABC';

INDEX_NAME                     CLUSTERING_FACTOR TABLE_NAME

------------------------------ ----------------- ------------------------------

I_OBJID                                     5269 ABC

SQL> SELECT BLOCKS,NUM_ROWS FROM USER_TABLES WHERE TABLE_NAME='ABC';

BLOCKS   NUM_ROWS

---------- ----------

5272     371744

Now the CLUSTERING_FACTOR come down 5,269 and the number of blocks in the table is 5,272.  My UPDATE statements should go well now using the index.   CLUSTERING_FACTOR also plays a vital role in EXADATA for the effective use of Storage Indexes (SI).  Storage Indexes are the in memory objects built for a table and there were no indexes present.


Execution Plan

----------------------------------------------------------

Plan hash value: 3453688465

----------------------------------------------------------------------------------------

| Id  | Operation                    | Name    | Rows  | Bytes | Cost (%CPU)| Time     |

----------------------------------------------------------------------------------------

|   0 | UPDATE STATEMENT             |         | 34165 |   734K|   556   (1)| 00:00:07 |

|   1 |  UPDATE                      | ABC     |       |       |            |          |

|   2 |   TABLE ACCESS BY INDEX ROWID| ABC     | 34165 |   734K|   556   (1)| 00:00:07 |
<div>

|*  3 |    INDEX RANGE SCAN          | I_OBJID | 34165 |       |    70   (2)| 00:00:01 |

This index will help to reduce the CBC latch contention and “buffer busy waits” will be a history.  A well sized buffer cache in a good system will reduce the  “free buffer waits” as well.


SQL> /

SEQ#        SID USERNAME   EVENT                                    PROGRAM

---------- ---------- ---------- ---------------------------------------- ------------------------------------------------

214         77 APPLE      db file sequential read                  sqlplus@bandevsol01 (TNS V1-V3)

242         72 APPLE      db file sequential read                  sqlplus@bandevsol01 (TNS V1-V3)

208         85 APPLE      db file sequential read                  sqlplus@bandevsol01 (TNS V1-V3)

895         83 APPLE      db file sequential read                  sqlplus@bandevsol01 (TNS V1-V3)

162         71 APPLE      db file sequential read                  sqlplus@bandevsol01 (TNS V1-V3)

245         75 APPLE      db file sequential read                  sqlplus@bandevsol01 (TNS V1-V3)

&nbsp;

6 rows selected.

&nbsp;

SQL> /

&nbsp;

SEQ#        SID USERNAME   EVENT                                    PROGRAM

---------- ---------- ---------- ---------------------------------------- ------------------------------------------------

499         77 APPLE      free buffer waits                        sqlplus@bandevsol01 (TNS V1-V3)

523         72 APPLE      free buffer waits                        sqlplus@bandevsol01 (TNS V1-V3)

501         85 APPLE      free buffer waits                        sqlplus@bandevsol01 (TNS V1-V3)

1175         83 APPLE      free buffer waits                        sqlplus@bandevsol01 (TNS V1-V3)

445         71 APPLE      free buffer waits                        sqlplus@bandevsol01 (TNS V1-V3)

529         75 APPLE      free buffer waits                        sqlplus@bandevsol01 (TNS V1-V3)

6 rows selected.

So, data distribution is an important factor, and well neglected almost in all environments.  A well physically organised table can save you from many performance issues – remember to order based on a more frequently used column(s).    Also, developers think twice before you put INDEX hint, you are not really increasing the performance, rather it is a liability.

ITL Waits – 2

January 21, 2013 Leave a comment

As I said in the previous post, the straight forward method to resolve ITL wait is to increase the ITL slots while creating the table or modifying an existing table.  As this is a standard way, I am not going explain this method.  But, sometimes the data distribution in a table can contribute to the ITL waits.  From my test table I will create two tables using the same data.


SQL> create table abc_unorg as select * from abc where 1 = 2 ;

Table created.

SQL> alter table abc_unorg  pctfree 1 ;

Table altered.

SQL> create table abc_org as select * from abc where 1 = 2 ;

Table created.

SQL> alter table abc_org  pctfree 1 ;

Table altered.

SQL> select TABLE_NAME,PCT_FREE,INI_TRANS from user_tables;

TABLE_NAME                       PCT_FREE  INI_TRANS

------------------------------ ---------- ----------

ABC_UNORG                               1          1

ABC                                     2          1

ABC_ORG                                 1          1

Then I will populate both table from the same source in two different ways – data organized and unorganized as below.  Unorganized table will use a simple  INSERT.


SQL> insert into ABC_UNORG select * from abc;

371744 rows created.

SQL> commit ;

Commit complete.

While for the second table, ABC_ORG,  I will ORDER the data and load.


SQL> insert into ABC_ORG select * from abc order by mod(object_id,10);

371744 rows created.

SQL> commit ;

Commit complete.

So we have the test tables are ready with same configuration and same data, except the data distribution is different.  Then I run 7 different  SQL  UPDATEs as 7 different transactions ( from 7 different sessions simultaneously)  against the first unorganized table.


update abc_unorg set owner='XxxXXXXXXXXXXXXXXXXXxxxxxxXXXXXXXXXXXX' where mod(object_id,10)=1;

update abc_unorg set owner='XxxXXXXXXXXXXXXXXXXXxxxxxxXXXXXXXXXXXX' where mod(object_id,10)=2;

update abc_unorg set owner='XxxXXXXXXXXXXXXXXXXXxxxxxxXXXXXXXXXXXX' where mod(object_id,10)=3;

update abc_unorg set owner='XxxXXXXXXXXXXXXXXXXXxxxxxxXXXXXXXXXXXX' where mod(object_id,10)=4;

update abc_unorg set owner='XxxXXXXXXXXXXXXXXXXXxxxxxxXXXXXXXXXXXX' where mod(object_id,10)=5;

update abc_unorg set owner='XxxXXXXXXXXXXXXXXXXXxxxxxxXXXXXXXXXXXX' where mod(object_id,10)=6;

update abc_unorg set owner='XxxXXXXXXXXXXXXXXXXXxxxxxxXXXXXXXXXXXX' where mod(object_id,10)=7;

As expected I got ITL waits as the pctfree was just  1% and INITTRANS were 1.    Only 3 session were able proceed with the UPDTAE statement while the 5 session waiting on ‘enq: TX – allocate ITL entry ‘


SEQ#        SID USERNAME             EVENT                                  P1         P2         P3  SQL_HASH_VALUE SECONDS_IN_WAIT SQL_ID

---------- ---------- -------------------- ------------------------------ ---------- ---------- ----------  -------------- --------------- -------------

28         91 APPLE                enq: TX - allocate ITL entry   1415053316     589839       3040     2346547447              14 bmgrgky5xuy7r

28         74 APPLE                enq: TX - allocate ITL entry   1415053316     393242       3188     3997157940              14 dtw3dcbr3zkjn

27         81 APPLE                enq: TX - allocate ITL entry   1415053316     589839       3040      764038200              20 8k203rwqsnm1s

2693         92 APPLE                free buffer waits                       4      17673          3     3143069066               0 5tsz18yxpfvca

28         87 APPLE                enq: TX - allocate ITL entry   1415053316      65546       2424     2491327179              14 26ht7d6a7x8qb

2636         86 APPLE                buffer busy waits                       4      16071          1     2664539258               0 4tn3m36gd393u

2658         82 APPLE                free buffer waits                       4      16071          0     2618831903               0 dsxvrhyf1hd0z

Since I have configured with the table with  PCTFREE 1%, dynamic ITITTRANS allocation is not happening for concurrent transactions as there is no space left in the block.    Then I run the same UPDATE transactions against ABC_ORG table and  the waits were,


SEQ#        SID USERNAME             EVENT                                  P1         P2         P3 LAST_CALL_ET SQL_HASH_VALUE SECONDS_IN_WAIT SQL_ID

---------- ---------- -------------------- ------------------------------ ---------- ---------- ---------- ------------ -------------- --------------- ------------- ----------------

2602         74 APPLE                free buffer waits                       4      22921          3           28     1294515348               0 1xfxu856kjg4n

1677         86 APPLE                free buffer waits                       4      22021          0           28     2345621301               0 017jxka5wyptp

2184        115 APPLE                free buffer waits                       3       2256          3           28     3530279707               0 8pnr4sg96rksv

2524         92 APPLE                free buffer waits                       4      22921          3           28     2822052462               0 aqsb0f6n3a6mf

2572         83 APPLE                free buffer waits                       4      22921          3           28     1753863190               0 3pdu5b1n8mp0q

2594         84 APPLE                free buffer waits                       4      22636          3           28     2561380704               0 58tppz6car4b0

2637         87 APPLE                free buffer waits                       4      22921          3           28     2118971024               0 1mwba09z4tvnh

Updates to the second table going well, not even a single “enq: TX – allocate ITL entry   ”, all the session were able to progress, though there are too many “free buffer waits” which I will cover in the next post.   So, the first table got too many ITL waits and transactions were hanging while transactions are fine with the second table.  Let us take a random ROWID and get the BLOCK_ID for that row.


SQL> select object_id,mod(object_id,10),dbms_rowid.ROWID_BLOCK_NUMBER('AAAE3pAAEAAAD1uAAP')

from abc_unorg where rowid = 'AAAE3pAAEAAAD1uAAP' ;

OBJECT_ID MOD(OBJECT_ID,10) DBMS_ROWID.ROWID_BLOCK_NUMBER('AAAE3PAAEAAAD1UAAP')

---------- ----------------- ---------------------------------------------------

5321                 1                                               15726

SQL> select object_id,mod(object_id,10) from abc_unorg where dbms_rowid.ROWID_BLOCK_NUMBER(rowid) = 15726 ;

OBJECT_ID MOD(OBJECT_ID,10)

---------- -----------------

5302                 2

5304                 4

5305                 5

5306                 6

== Removed the rows for clarity ==

5383                 3

5385                 5

5386                 6

&nbsp;

73 rows selected.

SQL> select mod(object_id,10),count(*) from abc_unorg

where dbms_rowid.ROWID_BLOCK_NUMBER(rowid) = 15726

group by mod(object_id,10) ;

MOD(OBJECT_ID,10)   COUNT(*)

----------------- ----------

5          8

6          8

8          8

7          6

2          6

4          7

9          8

0          8

1          7

3          7

10 rows selected.

That means the WHERE clause condition MOD(object_ID,10)  satisfies minimum of 6 rows for each value  in each block, so our transaction forced to create an ITL slot in all the blocks.  While the second table that is not the case, remember I have ordered the data.


SQL> select object_id,mod(object_id,10) from abc_org

where dbms_rowid.ROWID_BLOCK_NUMBER(rowid) = 15856 ;

OBJECT_ID MOD(OBJECT_ID,10)

---------- -----------------

157460                 0

157470                 0

157480                 0

== removed the rows for clarity ====

60240                 0

60250                 0

80 rows selected.

Since I have ordered the data, each block will have similar data so that only one transaction slot will be created almost in all the blocks.   There will be some blocks may contain different values for mod(object_id,10), but chances are very less to get a ITL waits.   Data distribution can leads to various CONCURRENCY waits, not just ITL waits alone.

ITL Waits

December 8, 2012 Leave a comment

ITL waits are so common and sometimes it will kill the concurrency of the application.  In general there  are 1 ITL slot (caused by INITRANS) for tables and 2 slots for indexes.    When a data block is formatted   the ITL slot is created in the variable part of the block header as specified by INITRANS for that segment.  As long as free space is available in the block, ITL slots can be grown up to MAXTRANS for any future requirements.   Here plays the important role PCTFREE which can impact negatively.  With the default setting 10% can fill up the blocks 100%, and that will stop creating new ITL slots for any second transaction happening to that block.

I have created my test table ABC from DBA_OBJECTS. Initiated 7 parallel update sessions to the table such way that each UPDATE will go to all the data blocks.


SQL> select PCT_FREE,INI_TRANS from dba_tables where table_name='ABC';

PCT_FREE  INI_TRANS

---------- ----------

2          1

update abc set owner='XXXXXXXXXX' where mod(object_id,10)=1;

update abc set owner='XXXXXXXXXX' where mod(object_id,10)=2;

update abc set owner='XXXXXXXXXX' where mod(object_id,10)=3;

update abc set owner='XXXXXXXXXX' where mod(object_id,10)=4;

update abc set owner='XXXXXXXXXX' where mod(object_id,10)=5;

update abc set owner='XXXXXXXXXX' where mod(object_id,10)=6;

update abc set owner='XXXXXXXXXX' where mod(object_id,10)=7;

As expected, I got ‘enq: TX – allocate ITL entry” waits for 3 UPDATES out of 7 sessions.


SQL> l

1  select v.seq#,username,v.event,last_call_et,v.SECONDS_IN_WAIT

2  from v$session_wait v, v$session s  where v.event not like '%message%' and s.sid=v.sid and v.event not like '%jobq slave wait%' and s.username='APPLE'

3* order by last_call_et;

SEQ# USERNAME             EVENT                          LAST_CALL_ET SECONDS_IN_WAIT

---------- -------------------- ------------------------------ ------------ ---------------

79 APPLE                enq: TX - allocate ITL entry            151             127

1262 APPLE                enq: TX - allocate ITL entry            155             132

47464 APPLE                buffer busy waits                       161               1

18677 APPLE                enq: TX - allocate ITL entry            169             140

52066 APPLE                free buffer waits                       174               1

27078 APPLE                db file sequential read                 179               0

45204 APPLE                read by other session                   184               0

7 rows selected.

Interestingly the sessions were waiting up to 140 seconds just to move out of the current SEQ#.  That’s not all – even after 3 sessions completed the UPDATES, still the other 3 sessions continued to wait as I did not commit the transactions, so that it will free the ITL slots..


SEQ# USERNAME             EVENT                          LAST_CALL_ET SECONDS_IN_WAIT

---------- -------------------- ------------------------------ ------------ ---------------

79 APPLE                enq: TX - allocate ITL entry            442             411

1262 APPLE                enq: TX - allocate ITL entry            446             416

18677 APPLE                enq: TX - allocate ITL entry            460             424

As soon as I commit, the hanging sessions were able to resume the transactions


SEQ# USERNAME             EVENT                          LAST_CALL_ET SECONDS_IN_WAIT

---------- -------------------- ------------------------------ ------------ ---------------

483 APPLE                read by other session                   496               0

1684 APPLE                db file sequential read                 500               0

19986 APPLE                read by other session                   514               0

So, application concurrency can be in danger if there is no space in a data block to create ITL slots which governed by a neglected table level parameter PCTFREE.   Now how to fix this ITL wait problem?

Option 1 : Allocate enough PCTFREE while creating the table.

Make sure you have allocated enough free space while creating the table by properly sizing the PCTFREE table level parameter.  I will re-build ABC table after changing the PCTFREE parameter to   10% using table MOVE


SQL> alter table abc pctfree 10;

Table altered.

SQL> alter table abc move ;

Table altered.

Now all my UPDATES doing the concurrently – and definitely I am going to ignore the “free buffer waits” for my current test.


SQL> /

SEQ# USERNAME             EVENT                          LAST_CALL_ET SECONDS_IN_WAIT

---------- -------------------- ------------------------------ ------------ ---------------

14991 APPLE                free buffer waits                         4               0

12087 APPLE                free buffer waits                        10               0

56614 APPLE                free buffer waits                        16               0

40377 APPLE                free buffer waits                        21               0

60609 APPLE                free buffer waits                        25               0

33776 APPLE                free buffer waits                        29               0

52940 APPLE                free buffer waits                        34               0

7 rows selected.

So, keeping right PCTFREE is important not only for avoiding row migration, but will help the application concurrency as well.  I will add two more ways to avoid this while one just adding more INITRANS and the other one is more interesting.

VCS upgrade issue

November 26, 2012 Leave a comment

After upgrading VCS to 5.1 CRS came up normally but database startup failed with error,


ORA-27504: IPC error creating OSD context

ORA-27300: OS system dependent operation:lcin: LMXIOC_GETVERSION failed: failed with status: 122
 ORA-27301: OS failure message: Operation not supported on transport endpoint
 ORA-27302: failure occurred at: vcsipc_lmxci
 ORA-27303: additional information: lcin: LMXIOC_GETVERSION failed: errno 122[Operation not supported on transport endpoint], fd 13, c0x1069017d8

MOS was not able to troubleshoot the issues, but the fix came from Veritas. The solution is here http://sfdoccentral.symantec.com/sf/5.1SP1/linux/html/sfrac_install/ch16s01.htm

I am sure this will be helpful for someone in the future, so that they will save their weekend.

Categories: Oracle Tags: ,

Chained or migrated rows

October 26, 2012 Leave a comment

What is the easiest way to identify chained or migrated rows, specially you want to just check few rows, they are migrated / migrated or not.  Here is the test.  Prepare the test table with PCTFREE 0.

SQL> create table t1 (c1 varchar2(4000)) pctfree 0 ;
Table created.

SQL> select table_name,PCT_FREE from user_tables;
TABLE_NAME PCT_FREE
------------------------------ ----------
T1 0
SQL> insert into t1 values('A');
1 row created.
SQL> commit ;
Commit complete.
SQL> insert into t1 select * from t1;
1 row created.
..........
SQL> insert into t1 select * from t1;
1024 rows created.
SQL> commit ;
Commit complete.

SQL> select count(*) from t1 ;
COUNT(*)
----------
 2048

Identify few ROWIDs and read directly from the table using the ROWID.

SQL> select rowid from t1 where rownum < 5;</pre>
ROWID
------------------
AAAVFWAAEAAARCrAAA
AAAVFWAAEAAARCrAAB
AAAVFWAAEAAARCrAAC
AAAVFWAAEAAARCrAAD
select /*+gather_plan_statistics */ count(*) from t1 where rowid='AAAVFWAAEAAARCrAAD';
SQL> select /*+gather_plan_statistics */ count(*) from t1 where rowid='AAAVFWAAEAAARCrAAD';
COUNT(*)
----------
 1
SQL> select * from table(dbms_xplan.display_cursor(null,null,'ALLSTATS LAST '));
PLAN_TABLE_OUTPUT
-----------------------------------
SQL_ID 4y8yk9tzffmt1, child number 0
-------------------------------------
select /*+gather_plan_statistics */ count(*) from t1 where
rowid='AAAVFWAAEAAARCrAAD'
Plan hash value: 1355481156
----------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers |
----------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1 |00:00:00.01 | 1 |
| 1 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:00.01 | 1 |
| 2 | TABLE ACCESS BY USER ROWID| T1 | 1 | 1 | 1 |00:00:00.01 | 1 |
----------------------------------------------------------------------------------------------
Note
-----
 - automatic DOP: Computed Degree of Parallelism is 1 because of parallel threshold

19 rows selected.
<pre>

It just access one block since I am reading using the ROWID with just one buffer get.  Make sure no one else doing a transaction against the table,  as a consistent read can drive up buffer gets.  Since I am visiting only one block, it is clear that my row is not chained and is self contained in the same block.  So, how can I prove otherwise.  I am going to update the with a bigger value and that will lead into migrated rows as the PCTFREE is 0.

SQL> update t1 set c1='AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA';
2048 rows updated.
SQL> commit ;
Commit complete.
SQL> select /*+gather_plan_statistics */ count(*) from t1 where rowid='AAAVFWAAEAAARCrAAD';
COUNT(*)
----------
 1
SQL> select * from table(dbms_xplan.display_cursor(null,null,'ALLSTATS LAST '));
PLAN_TABLE_OUTPUT
-------------------------------------------------------------
SQL_ID 4y8yk9tzffmt1, child number 0
-------------------------------------
select /*+gather_plan_statistics */ count(*) from t1 where
rowid='AAAVFWAAEAAARCrAAD'
Plan hash value: 1355481156
----------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time <strong><span style="color: #ff0000;">| Buffers |</span></strong>
----------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1 |00:00:00.01 | 2 |
| 1 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:00.01 | 2 |
| 2 | TABLE ACCESS BY USER ROWID| T1 | 1 | 1 | 1 |00:00:00.01<span style="color: #ff0000;"><strong> | 2 |</strong></span>
----------------------------------------------------------------------------------------------
Note
-----
 - automatic DOP: Computed Degree of Parallelism is 1 because of parallel threshold</pre>
Now a single row read visiting two blocks, just because the row is migrated. But I wanted to make sure, that the rows are really migrated.
<pre>SQL> analyze table t1 list chained rows into chained_rows;
analyze table t1 list chained rows into chained_rows
*
ERROR at line 1:
ORA-01495: specified chain row table not found</pre>

Wow my chained row table is not existing, let me create it.

<pre>SQL> create table CHAINED_ROWS (
 2 owner_name varchar2(30),
 3 table_name varchar2(30),
 4 cluster_name varchar2(30),
 5 partition_name varchar2(30),
 6 subpartition_name varchar2(30),
 7 head_rowid rowid,
 8 analyze_timestamp date
 9 );
Table created.

SQL> analyze table t1 list chained rows into chained_rows;
Table analyzed.
SQL> select count(*) from chained_rows ;
COUNT(*)
----------
 2033
SQL> select count(*) from t1 ;
COUNT(*)
----------
 2048</pre>

Indeed, rows are chained – 2033 rows are chained out of 2048.    I am going to fix the migrated rows by moving the table.

SQL> alter table t1 move ;
Table altered.
SQL> select rowid from t1 where rownum < 5;
ROWID
------------------
AAAVGpAAEAAARDrAAA
AAAVGpAAEAAARDrAAB
AAAVGpAAEAAARDrAAC
AAAVGpAAEAAARDrAAD
SQL> select /*+gather_plan_statistics */ count(*) from t1 where rowid='AAAVGpAAEAAARDrAAD';
COUNT(*)
----------
 1
SQL> select * from table(dbms_xplan.display_cursor(null,null,'ALLSTATS LAST '));
PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------------------------------
SQL_ID 2bjscjdbg8x2a, child number 0
-------------------------------------
select /*+gather_plan_statistics */ count(*) from t1 where
rowid='AAAVGpAAEAAARDrAAD'
Plan hash value: 1355481156
----------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time <strong><span style="color: #ff0000;">| Buffers |</span></strong>
----------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1 |00:00:00.01 | 1 |
| 1 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:00.01 | 1 |
| 2 | TABLE ACCESS BY USER ROWID| T1 | 1 | 1 | 1 |00:00:00.01 <strong><span style="color: #ff0000;">| 1 |</span></strong>
----------------------------------------------------------------------------------------------
Note
-----
 - automatic DOP: Computed Degree of Parallelism is 1 because of parallel threshold</pre>

Yes, my table is back to normal – ROWID access going with just one buffer get.  Let me confirm that.

SQL> truncate table chained_rows ;
Table truncated.
SQL> analyze table t1 list chained rows into chained_rows;
Table analyzed.
SQL> select count(*) from chained_rows ;
COUNT(*)
----------
 0

I felt, this is an easy way to check the chained / migrated rows.  Not sure there are exceptions.

Follow

Get every new post delivered to your Inbox.