Oracle database internals by Riyaj

Discussions about Oracle performance tuning, RAC, Oracle internal & E-business suite.

Systemstate dump analysis: Nocache on high intensive sequences in RAC

Posted by Riyaj Shamsudeen on March 10, 2009

Sequence values are cached in instance memory and by default 20 values are cached. As instance cache is transient, loss of an instance can result in loss of cached sequence values. Permanent record of highest possible value from any instance is kept track in SEQ$ table.

SQLs accessing this sequence within an instance will access instance SGA. As each instance caches its own sequence values it is highly likely that SQLs accessing this sequence from different instance will create gaps in sequence values.

Common knee-jerk reaction to this issue is to set nocache or cache 1 for these sequences. In a single instance environment, this approach will backfire due to massive updates to SEQ$ tables, buffer busy waits, latch free waits etc. In a RAC environment, this issue is magnified and almost hangs the instance. I had the privilege of working with a client to resolve one of their performance issues.

Problem

Instances were hung. It was not possible to login to the database. Many existing connections are working fine though. We were lucky enough that one of the DBAs had active connection to the database. So, we took systemstate dump from that session to see whey there is an hang (or slowness).

Alert log was printing following lines at this time frame. So, we know that there is a problem with row cache enqueue.

Wed Feb 18 06:31:31 2008
>>> WAITED TOO LONG FOR A ROW CACHE ENQUEUE LOCK! pid=45
Wed Feb 18 08:59:31 2008
>>> WAITED TOO LONG FOR A ROW CACHE ENQUEUE LOCK! pid=44
Wed Feb 18 08:59:31 2008
>>> WAITED TOO LONG FOR A ROW CACHE ENQUEUE LOCK! pid=43
Wed Feb 18 09:01:00 2008
>>> WAITED TOO LONG FOR A ROW CACHE ENQUEUE LOCK! pid=46
Wed Feb 18 09:01:00 2008
>>> WAITED TOO LONG FOR A ROW CACHE ENQUEUE LOCK! pid=27

systemstate dump

We took a systemstate dump with following command.

alter session set events 'immediate trace name systemstate level 4';

I won’t bore you with all the details. But let us review just necessary sections of the systemstate dump file. We know that processes were waiting for row cache enqueue and so I searched for row cache enqueue in the trace file. From the trace file, we see that many sessions are holding NULL mode row cache enqueue and requesting X mode.

row cache enqueue: session: 6126dfea0, mode: N, request: X
  row cache parent object: address=65f5a2ce8 cid=13(dc_sequences) <<<<<<
  hash=5a31f94b typ=9 transaction=0 flags=00000000
  own=65f5a2dc8[65f5a2dc8,65f5a2dc8] wat=65f5a2dd8[63c841c18,63c841e48] mode=N
  status=-/-/-/-/-/-/-/-/-
  request=X release=FALSE flags=2
  instance lock id=QN 0010b020 00000000 <<<<<<<
  data=
  0010b020 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000

Of course, cid=13 indicating that it is for a sequence [ dc_sequence is the row cache area with cache_id=13]. It is also visible that row cache enqueue is maped to instance lock with QN as type and managed by Global enqueue Services. Wait is for that row cache enqueue lock protecting dc_sequences. We need to identify specific sequence though.

Looking for state objects in the same session shows one sequence.

owners:
       SO: 63e1d04b8, type: 46, owner: 63c29edb8, flag: INIT/-/-/0x00
          Sequence State Object:
          kdnssflg =
          kdnsshiw = INACTIVE
          kdnsssqn = 0x64cdeba58
          KGL Sequence Object #1093664: <<<<<<<<<
          kdnsqflg  =
          kqdsnflg  =                   KQDSNNOC
          kdnsqcus  = 0
          kdnsqisp  INACTIVE
          Increment =                              +1
          Minvalue  =                              +1
          Maxvalue  =    +999999999999999999999999999
          Cachesize =                              +0
          Highwater =                      +323649926
          Nextvalue = ###############################

Clearly above section shows that these lines are for sequence with object id 1093664. Querying dba_objects for that object_id shows sequence_name.

select owner, object_name , object_type, created from dba_objects where object_id=1093664
SQL> /

OWNER         OBJECT_NAME  OBJECT_TYPE        CREATED
------------- -----------  ------------       ----------
SCOTT         STG__SEQ     SEQUENCE           12-OCT-07

There is also library cache object accessing that sequence, confirming our analysis.

  SO: 640cc9230, type: 51, owner: 6136d45b8, flag: INIT/-/-/0x00
  LIBRARY OBJECT LOCK: lock=640cc9230 handle=64cdf5868 mode=N
  call pin=63c28f6f8 session pin=0
  htl=640cc92a0[643a01130,643a01130] htb=643a01130
  user=6136d45b8 session=6136d45b8 count=1 flags=[00] savepoint=58
  LIBRARY OBJECT HANDLE: handle=64cdf5868
  name=SELECT SCOTT.STG__SEQ.NEXTVAL FROM DUAL
  hash=5dbaf014 timestamp=02-18-2008 08:00:46
  namespace=CRSR flags=RON/KGHP/TIM/PN0/SML/[12010000]
  kkkk-dddd-llll=0000-0001-0001 lock=N pin=0 latch#=20
  lwt=64cdf5898[64cdf5898,64cdf5898] ltm=64cdf58a8[64cdf58a8,64cdf58a8]
  pwt=64cdf58c8[64cdf58c8,64cdf58c8] ptm=64cdf5958[64cdf5958,64cdf5958]
  ref=64cdf5878[64cdf5878, 64cdf5878] lnd=64cdf5970[64cdb02f8,64ce05748]
    LIBRARY OBJECT: object=64cdf4b40
    type=CRSR flags=EXS[0001] pflags= [00] status=VALD load=0

Sequence cache

In summary, we know that sessions were waiting for row cache enqueues with QN type. That row cache enqueue was protecting a parent row cache entry for dc_sequences. SO [State Objects] for both sequences and cursors accessing that sequence is also exists in the systemstate dump file.

Let’s query properties of that sequence.


SQL> select * from dba_sequences where sequence_name='STG__SEQ';

SEQUENCE_OWNER                 SEQUENCE_NAME                   MIN_VALUE
------------------------------ ------------------------------ ----------
 MAX_VALUE INCREMENT_BY C O CACHE_SIZE LAST_NUMBER
---------- ------------ - - ---------- -----------
SCOTT                         STG__SEQ                        1
1.0000E+27            1 N N          0   338585872

Evidently cache size is set to 0. High last_number also indicates that this is an heavily accessed sequence.

Summary

Essentially, keeping sequence with nocache in RAC for highly accessed sequences is pretty bad idea. Login to the database accesses sequences too, so if row cache enqueues are not available, then logins will hang. This is why existing connections were working fine, but new logins were simply hung. We are able to pinpoint all these using systemstate dump.

14 Responses to “Systemstate dump analysis: Nocache on high intensive sequences in RAC”

  1. Yas said

    Riyaj, nice post. The sequence AUDSES$ is accessed on login as far as I know. The default cache value for that sequence also may not be enough even in a single instance database. I fixed some contention problems by increasing its cache value in some cases.

  2. Hello yas
    Thank you for reading my blog and your kind words. That’s right, audses$ is one of them. In environments with spurious connection requests, it is better to trace one session logins (using sqltrace) and increase all those sequence cache values to higher value, if shared_server configuration is not feasible for implementation. In RAC environments, this issue gets magnified.

    Of course, there are other issues such as slower shared memory attach calls and semaphore get calls etc associated with spurious logins, which can cause side effects paralyzing database server [ one such issue documented in http://orainternals.wordpress.com/2009/03/10/systemstate-dump-analysis-nocache-on-high-intensive-sequences-in-rac/ ].

    Cheers
    Riyaj

  3. radino said

    “Essentially, keeping sequence with nocache in RAC for highly accessed sequences is pretty bad idea”

    I think, it could be even worse with “order” specified :).

  4. Dion Cho said

    Riyaji. Thanks for the sharing experiences.

    v$rowcache_parent view is another good source of troubleshooting sequence contention.

    http://download.oracle.com/docs/cd/B19306_01/server.102/b14237/dynviews_2062.htm#REFRN30212

  5. The AUDSES$ sequence’s cache default has been changed to 10000 from version 10.2.0.3 or 10.2.0.4.

  6. Hi Tanel
    Thanks for visiting and of course, audses$ shouldn’t be an issue in 10g as you pointed out.

    Cheers
    Riyaj

  7. Yasser said

    Excellent article ….

    To say promptly some times this type of articles motivates for Junior Oracle DBA’s like us.

    -Yasser

  8. >>Common knee-jerk reaction to this issue is to set nocache or cache 1 for these sequences. In a single instance environment, this approach will backfire due to massive updates to SEQ$ tables, buffer busy waits, latch free waits etc. In a RAC environment, this issue is magnified and almost hangs the instance.<<

    Excellent wording! Made me really laugh.

    Btw, the cache size is already visible from the state object dump you have shown above:

    Maxvalue = +999999999999999999999999999
    Cachesize = +0
    Highwater = +323649926
    Nextvalue = ###############################

    Cheers

  9. Hi,

    We ran with exactly same problem as you discussed here, i want to know that the command you mentioned “alter session set events ‘immediate trace name systemstate level 4′;” was ran on which node? because in our case the malfunctioning node was hung and we were unable to pass any sort of SQL even with SYS.

    Thanks

    • Hi Nayyar
      Not sure exactly, what your symptoms were, but taking systemstate dump in any node should provide details about waits, if it is for sequence cache issues. If you still want to take systemstate dump from hung node, try using oradebug command.
      oradebug setmypid
      oradebug dump systemstate 4
      This should generate a trace file with systemstate..

      Cheers
      Riyaj

  10. Hi Riyej

    the error i am seeing in the trace is similar but not exactly which you mentioned.
    our instance is single node 10.2.0.5
    we are facing this error.in alert log.

    System State dumped to trace file /u12/app/oracle/admin/pck11/udump/pck11_ora_4145244.trc
    Wed Sep 28 09:12:19 CEST 2011
    System State dumped to trace file /u12/app/oracle/admin/pck11/udump/pck11_ora_3457118.trc
    Wed Sep 28 09:26:11 CEST 2011
    >>> WAITED TOO LONG FOR A ROW CACHE ENQUEUE LOCK! pid=21
    System State dumped to trace file /u12/app/oracle/admin/pck11/udump/pck11_ora_5378194.trc
    Wed Sep 28 09:27:23 CEST 2011
    System State dumped to trace file /u12/app/oracle/admin/pck11/udump/pck11_ora_3477674.trc

    trace file contents:
    ————————–

    row cache parent object: address=700000024f43540 cid=15(dc_database_links)
    row cache parent object: address=70000002edd6038 cid=0(dc_tablespaces)
    row cache parent object: address=70000002ed772a8 cid=7(dc_users)
    row cache parent object: address=700000024ef1860 cid=16(dc_histogram_defs)
    row cache parent object: address=7000000293294f8 cid=11(dc_object_ids)
    row cache parent object: address=70000002eec41e0 cid=8(dc_objects)
    row cache parent object: address=700000024f43540 cid=15(dc_database_links)

    SO: 70000002aaf6af8, type: 53, owner: 70000002f385900, flag: INIT/-/-/0×00
    LIBRARY OBJECT LOCK: lock=70000002aaf6af8 handle=70000002ece9f90 mode=N
    call pin=0 session pin=0 hpc=0000 hlc=0000
    htl=70000002aaf6b78[70000002aaf6ac0,70000002aaac9e8] htb=70000002aaac9e8 ssga=70000002aaac0f0
    user=70000002f385900 session=70000002f381898 count=1 flags=BRO/[0100] savepoint=0×0
    LIBRARY OBJECT HANDLE: handle=70000002ece9f90 mtx=70000002ecea0c0(0) lct=118 pct=913 cdp=0
    namespace=CRSR flags=RON/KGHP/PN0/EXP/[10010100]
    kkkk-dddd-llll=0000-0001-0001 lock=0 pin=0 latch#=9 hpc=ffde hlc=ffde
    lwt=70000002ecea038[70000002ecea038,70000002ecea038] ltm=70000002ecea048[70000002aaf6b28,70000002aaf6b28]
    pwt=70000002ecea000[70000002ecea000,70000002ecea000] ptm=70000002ecea010[70000002ecea010,70000002ecea010]
    ref=70000002ecea068[700000029dcd310,700000029dcd310] lnd=70000002ecea080[70000002ecea080,70000002ecea080]
    LIBRARY OBJECT: object=700000029dccc78
    DATA BLOCKS:
    data# heap pointer status pins change whr
    —– ——– ——– ——— —- —— —
    0 70000002ece9ed0 700000029dccd90 I/P/A/-/- 0 NONE 00
    —————————————-
    SO: 70000002aaf6a40, type: 53, owner: 70000002f385900, flag: INIT/-/-/0×00
    LIBRARY OBJECT LOCK: lock=70000002aaf6a40 handle=70000002edf2cf0 mode=N
    call pin=0 session pin=0 hpc=0000 hlc=0000
    htl=70000002aaf6ac0[70000002aaac9e8,70000002aaf6b78] htb=70000002aaac9e8 ssga=70000002aaac0f0
    user=70000002f385900 session=70000002f381898 count=1 flags=[0000] savepoint=0x4e83f697
    the rest of the object was already dumped
    —————————————-
    SO: 70000002d4345e0, type: 41, owner: 70000002f385900, flag: INIT/-/-/0×00
    (dummy) nxc=0, nlb=0

    SQL> select PARAMETER from v$rowcache where cache# in (0,7,8,11,15,16);

    PARAMETER
    ——————————–
    dc_tablespaces
    dc_users
    dc_objects
    dc_object_ids
    dc_database_links
    dc_histogram_defs
    dc_users
    dc_object_grants
    dc_histogram_data
    dc_histogram_data
    dc_users
    dc_users

    12 rows selected.

    SQL>

    with this information i cant able to figure out which database object is facing issue.
    every time when ever this error is coming we need to kill all the db processes and start it no new connections are allowed.

    appreciate your help if you can help me to understand this error and how to over come from this error.

    Regards
    Javeedkaleem.

    • Hello Syed
      The data you have provided is not sufficient to do further analysis. Can you tell me what that PID 21 is waiting on? That would tell you what specific row cache parent it was waiting on. Let me know if I can access to your systemstate dump for further analysis.
      (or) try searching for “waiting” in the systemstate dump file and see the waited events. Then you can use p1 and p2 of the ‘row cache’ waited event to understand the issue.

  11. Mohammad Rasheeduddin said

    Hello Riyaj,

    Nice article. may i know what measures have we to take in order to avoid this issue. As per “keeping sequence with nocache in RAC for highly accessed sequences is pretty bad idea” shows that we should not use nocache in RAC for highly accessed sequence. Correct me if i’m wrong. So what option is best suited for this situation.

    regards,
    Mohammad Rasheeduddin

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

 
Follow

Get every new post delivered to your Inbox.

Join 179 other followers

%d bloggers like this: