Troubleshooting ‘DFS lock handle’ waits
Posted by Riyaj Shamsudeen on November 8, 2011
Waits for ‘DFS lock handle’ can cause massive performance issues in a busy RAC cluster. In this blog entry, we will explore the DFS lock handle wait event, and understand how to troubleshoot the root cause of these waits. I am also going to use locks and resources interchangeably in this blog, but internally, they are two different types of structures.
A little background
DFS (stands for Distributed File System) is an ancient name, associated with cluster file system operations, in a Lock manager supplied by vendors in Oracle Parallel Server Environment (prior name for RAC). But, this wait event has morphed and is now associated with waits irrelevant to database files also. Hence, it is imperative to understand the underlying details to debug the ‘DFS lock handle’ waits.
How does it work?
I have no access to the code, so read this paragraph with caution, as I may have misunderstood my test results: A process trying to acquire a lock on a global GES resource sends a AST(Asynchronous Trap) or BAST (Blocking Asynchronous Trap) message to LCK process, constructing the message with (lock pointer, resource pointer, and resource name) information. If the resource is not available, then the LCK process sends a message to the lock holder for a lock downgrade.
Now, let’s imagine that we need to ask remote background process to do some operations, say, DBWR to do an object checkpoint in a remote instance. Further, the requestor must wait for the background process to complete the task. What better way is there to implement these waits, than using a lock? It seems that acquiring a lock on a specific resource in a specific mode, triggers a predefined action in that background process. For example, acquiring locks on a specific CI resource triggers the DBWR to object level checkpoint (details later).
While the background process is working, the requesting process is instrumented to wait on the wait event ‘DFS lock handle’, in essence, the requesting process is waiting for one or more background process to perform a specific action. BTW, in some cases, background processes also can wait for ‘DFS lock handle’. Excessive waits by the foreground or background process for the DFS lock handle wait event can lead to instance freeze or application freeze.
Every resource has a value block and can be used to send more information about the resource, such as object_id for a sequence, object_id for a table partition etc.
Type of Lock
First, we need to identify the type of lock associated with this event. Fortunately, type|event in the trace file (or p1 column in v$session when the event waited is DFS lock handle) has this information coded and can be retrieved using the following SQL statement.
From a trace file:
nam='DFS lock handle' ela= 4362 type|mode=1398145029 id1=86033 id2=0 obj#=-1 tim=23065806692
SQL> select chr(bitand(&&p1,-16777216)/16777215) ||
mod(&&p1, 16) md
1398145029 = 0x53560005; 53 is ASCII for S and 56 is ASCII for V.
Now, we know that, lock type is SV and we can get more details about this lock type accessing v$lock_type view. SV enqueue is acquired to co-ordinate the retrieval values from sequences created with order and cache. There are many other enqueue types though, this is just an example
SQL> exec rs.print_table
('select * from v$lock_type where type=''SV''');
TYPE : SV
NAME : Sequence Ordering
ID1_TAG : object #
ID2_TAG : 0
IS_USER : NO
DESCRIPTION : Lock to ensure ordered sequence allocation in RAC mode
In Global Enqueue Services layer, each resource is uniquely identified by a resource name. For example, for the sequence with object_id=86033, resource name is [0x15011][0x0],[SV][ext 0x0,0x0]. As you see, this resource name is essentially, [id1][id2],[lock type][extension]. In this example, id1=86033=0x15011. Also, p2, p3 columns in v$session for this DFS lock handle wait event is enumerating id1 and id2 columns of the resource. As you probably guessed, many instance level locks are not visible in gv$lock family of views, of course, gv$lock is a global view of local locks and gv$ges_enqueue is global view of global locks :-)
Let’s probe further to understand few lock types commonly encountered.
If the enqueue type is SV, then that enqueue is used to co-ordinate sequence values in a RAC cluster, for sequences declared with order and cache attributes. For the SV locks, id1 column will give you the object# of the sequence. In the example line below
nam='DFS lock handle' ela= 4362 type|mode=1398145029 id1=86033 id2=0 obj#=-1 tim=23065806692
id1 is set to 86033. That’s the object_id of a sequence declared with cache and order. Sequences with these attributes require the values to be in ORDER. As we know that in RAC, it is likely to have sequences return values in an order not correlated with time, due to the fact that each instance will cache sequence values. With ORDER attribute, we specifically require these values to be in a strict ascending/descending order even if the sequences are accessed in different instances. To accommodate this requirement, RAC code must ensure that next value retrieved from the sequence will be in an ORDER in any instance and the co-ordination mechanism is by transferring couple of GES messages between the instances. Master node of the GES resource for that sequence object will keep track of the latest value. It looks like, Value block of the GES resource is updated to reflect the latest sequence value.
You might be wondering what if there is more than a sequence. How is it handled? Each sequence gets its own resource. For example, the resource name for sequence with object_id=86033 will be [0x15011][0x0],[SV] and another sequence with object_id=86034 will be maintained by a resource [0x15012][0x0],[SV] etc. Value block of these resources will contain the latest value of the sequence and kept in the master node of this resource.( As a shameless plug, I go in to much details about GES and GRD in my Advanced RAC troubleshooting class. )
Of course, if the wait event is DFS lock handle, and if the enqueue is SV type, then the solution probably would be that, ORDER attribute is not a suitable attribute for that heavily accessed sequence. My recommendation is always been (a) Use bigger cache >1000 for the sequences frequently accessed, hundreds per second (b) Use default cache of 20 if the sequence is not frequently accessed (c) if you really require ORDERed values, know that this could be a problem in RAC, and use ORDER cache (d) Only in exceptional situations use nocache.
CI stands for Cross Invocation, and mainly used to trigger an action in a background process such as DBWR, CKPT process etc, in a remote instance. id1 and id2 columns of this CI enqueue has special significance. Depending upon the values of these id1,id2 fields a specific resource is manipulated to trigger a specific action in a background process.
For example, truncating a table leads to foreground process marking a segment as temporary and requesting the background process to clean up the segments from their cache. Essentially, a co-ordination mechanism to ensure that all buffers about a segment is flushed out by DBW. This operation is triggered by acquiring a GES resource of CI type in a specific format: [0xd][0x2],[CI][ext 0x0,0x0]. From the GES resource name [0xd][0x2],[CI], we can decode that lock name is CI, id1=0xd=13, id2=0x2=2 (CI-13-2).
You can see the process holding this specific resource in gv$ges_enqueue;
select resource_name1 ,inst_id, pid,blocker, blocked from gv$ges_enqueue where resource_name1 like '%[0xd][0x2],[CI]%' and pid!=0
RESOURCE_NAME1 INST_ID PID BLOCKER BLOCKED
------------------------------ ---------- ---------- ---------- ----------
[0xd][0x2],[CI][ext 0x0,0x0] 1 1481 0 0
[0xd][0x2],[CI][ext 0x0,0x0] 2 1486 0 0
In this case, PID 1486 is DBW0
$ ps -ef |grep 1486|grep -v grep
oracle 1486 1 0 14:23:36 ? 0:02 ora_dbw0_solrac2
In fact, similar resource with a different id2 combinations are accessed prior to waiting for id2=2. For example, resources, [0xd][0x1],[CI], [0xd][0x3],[CI] are accessed prior to acquiring the resource [0xd][0x2],[CI] for a partition. Value block of the resource [0xd][0x1],[CI] is used to send the object_id of the segments. First two resources ([0xd][0x1],[CI], [0xd][0x3],[CI] ) are accessed as AST(Asynchronous Trap), and the resource [0xd][0x2],[CI] is accessed with BAST (Blocking Asynchronous Trap) per partition (or segment).
If you encounter high number of waits for ‘DFS lock handle’ with CI enqueue, then find the object_id of the event. Find the object details from the object_id, and understand the activity on that segment. Of course, excessive truncate and DDL statement on objects can cause issues in RAC due to global lock activity.
There are few more examples of CI locks. For example, starting parallel query servers in an inter-instance parallelism will trigger few access to CI locks. Three access to the resource to allocate one parallel slave is printed below: Essentially, resource is [0xa][0x1],[CI] for the first wait below.
nam='DFS lock handle' ela= 554 type|mode=1128857605 id1=10 id2=1 obj#=79568
nam='DFS lock handle' ela= 523 type|mode=1128857605 id1=10 id2=3 obj#=79568
nam='DFS lock handle' ela= 129970 type|mode=1128857605 id1=10 id2=2 obj#=79568
Note 34631.1 documents some of the meaning for id1 and id2 column. For example, id1=13 “drop sort segments”. I am not certain this note is accurate enough for 11.2 though.
IV enqueues are acquired for Invalidation Lock. When a DDL is encountered on an object, then the dependent objects need to be invalidated. For example, cursors (SQL statements) that have dependency on a table need to be invalidated so that next execution of that cursor will reparse the cursor, after a DDL on that table. (Of course, I do understand that not all DDL statements will invalidate the cursors).
only the permanent objects such as tables and indexes are globalised and the SQL statements need not be protected by global resources. SQL statements are in turn invalidated by the dependency mechanisms on the dependent objects. For example, if you alter a table to drop a column, then that DDL statement will trigger invalidation of dependent objects of that table from all instances, an indirect invalidation of cursors from all instances occurs. In some cases, library cache locks and pin waits can occur due to indirect dependency validation mechanisms.
nam='DFS lock handle' ela= 414 type|mode=1230372869 id1=1398361667 id2=59 obj#=82512 tim=11542441219
In the example above, object# 82512 is an associated with table partition. Essentially, all dependent object of that partition must be invalidated from all instances. There are potential bugs in this area, for example, IV waits are higher for shared server connections (bug 8215444)
XA enabled applications can connect to more than one instance in RAC from 11.2 onwards. Prior to 11.2, XA transaction can only connect to just one instance of RAC. GTX processes implement these global transactions and due to bug 13003033, GTX process waits for BB enqueue longer, leading to more DFS lock handle waits in 11.2. Only workaround at this point is to disable global transactions (126.96.36.199).
Enqueue related to ASM
There are many ASM related operations that use DFS lock handles to co-ordinate the operations. For example, ASM map operations when a file extended or added will trigger actions in the other nodes via DFS lock handle mechanism.
Here is the list of ASM based waits:
38 ASM diskgroup discovery wait
39 ASM diskgroup release
40 ASM push DB updates
41 ASM add ACD chunk
42 ASM map resize message
43 ASM map lock message
44 ASM map unlock message (phase 1)
45 ASM map unlock message (phase 2)
46 ASM generate add disk redo marker
47 ASM check of PST validity
48 ASM offline disk CIC
DFS lock handle is a wait event is associated with many different areas of database software code. It is important to identify the lock type, id1,id2 and understand the problem to resolve the root cause. Of course, many of these issues can be a new or already-identified software bugs too.