Oracle database internals by Riyaj

Discussions about Oracle performance tuning, RAC, Oracle internal & E-business suite.

CPU profiling using perf utility in Linux

Posted by Riyaj Shamsudeen on June 9, 2016

After reading my blog entry about a performance issue due to excessive HCC decompression ( Accessing HCC compressed objects using index access path, a reader asked me about the CPU profiling method I mentioned in that blog entry. I started responding to that comment, and realized that the response was too big for a comment. So, in this blog entry, I will cover basics of the CPU profiling in Linux. Other platform provides similar utilities, for example, Solaris provides an utility dtrace.

Tool Box

There are many ways to do CPU profiling, and a few techniques are:
(a) Using Linux perf utility.
(b) Using pstack output and grouping the output by top functions in the call stack. Tanel has a great blog entry about the variant of this method: Tanel’s blog entry

I will discuss the perf utility briefly in this entry.

perf utility

There are two parts to perf utility usage: perf record to sample the function calls executed by a process or processes. perf record command writes the output to a file perf.data, by default, in the current directory. perf report reads the perf.data file and prints a human readable output of top function calls. Think of the functionality of the perf report command as “group by function_call, order by count desc” operation.

You can use perf utility to watch the server, a process, or a subset of processes. In this example, I recreated the issue in my database session and so we will use the command: perf record -p to record the function call samples on the dedicated server process associated with my session.

  -- Login to the database using your favorite tool.
  -- Find the pid of the Oracle dedicated server process. (You could skip this step if you know the pid of the process that you want to probe. Note that, you can use this method to probe an application performance issue also, not just database performance issue.)
  select spid from v$process where addr=(select paddr from v$session where sid=(select sid from v$mystat where rownum=1));
  
  -- In another window, login to the server and become root (sudo, direct login etc)
  cd /tmp; perf report -p 

  -- Recreate the issue in the first window.
  @lu.sql
  exit;

  -- Control+c the perf record command also.

perf report

So, perf record command mentioned above would have written the file perf.data. We can use perf report command to identify the function calls frequently seen executed by the process. ( Partial output shown to improve brevity)

# Events: 427K cycles
#
# Overhead          Command                 Shared Object                               Symbol
# ........  ...............  ............................  ...................................
#
    34.84%  oracle_76112_ua  oracle                        [.] BZ2_decompress
    25.52%  oracle_76112_ua  oracle                        [.] unRLE_obuf_to_output_FAST
     5.44%  oracle_76112_ua  oracle                        [.] kxhrPUcompare
     4.03%  oracle_76112_ua  oracle                        [.] qerhjWalkHashBucket
     4.00%  oracle_76112_ua  oracle                        [.] qerhjSplitProbe
     3.24%  oracle_76112_ua  oracle                        [.] qeruaRowProcedure
     3.18%  oracle_76112_ua  oracle                        [.] qerhjInnerProbeHashTable
     2.20%  oracle_76112_ua  oracle                        [.] kdzdcol_get_vals_rle_one
     2.05%  oracle_76112_ua  oracle                        [.] qervwRowProcedure
     2.03%  oracle_76112_ua  oracle                        [.] kdzt_acmp_project
     1.61%  oracle_76112_ua  oracle                        [.] evareo
     1.13%  oracle_76112_ua  oracle                        [.] kxhrHash
..

Analysis

From the output of perf report command, you can infer that the process was seen executing the BZ2_decompress function calls in ~35% of samples. That BZ2_decompress function call is executed to decompress the segments, compressed with HCC archive compression. Next function call in the list(25% of samples) unRLE_obuf_to_output_FAST is also associated with bzip2 decompression function (Refer unRLE_obuf_to_output_FAST ).

We were able to identify the function calls consuming the CPU cycles quickly using perf tool. In this case, it is HCC archive high decompression. So, we can correlate this data with the SQL Monitor output, identify the line number spending time, and understand the reasoning behind high CPU usage (and high elapsed time).

perf tool has many options. Read the wiki for further details: perf wiki

One Response to “CPU profiling using perf utility in Linux”

  1. Anand said

    Thank you for the post!!

Leave a comment