Monday, October 17, 2011

Analyse DB2 deadlocks



Introduction
Deadlock detection is important in any database environment for application concurrency. In complex Web environments, just like any other application, you need to be able to determine where any deadlocks are originating. This article explains how to configure DB2 for os/390′s deadlock trace settings to enable deadlock analysis. It explains how to specify the related DB2 for z/OS trace to get enough information, then how to analyze these trace reports and pinpoint the offending SQL statements causing the deadlocks for the complex Web applications on DB2 for the z/OS environment. The article assumes familiarity with the basic operations of z/OS.
Configure trace using the DB2 Performance Monitor
Locking trace
To enable Locking trace, first open the IBM DB2 Performance Monitor application. Then, perform the following steps:
  1. Configure Collect Task A to collect the deadlock trace. Set Trigger by 4=Immediate Start to active the task immediately.
  2. Select LockingData TypeIFICDRequesting Location, Plan name and Authid, then press Enter.
    Figure 1. Configure Collect Task A
    Configure Collect Task A
  3. Select the data type Lockout, and press Enter.
  4. On the IFCID Selection panel, select the following IFICDs, then press Enter.
    105    DBID/OBID for database and tablespace translation
    107    Data set open/close information
    172    Deadlock
  5. On the Trace Qualification panel, fill in the DB username (TUSER03 in this example), and the DB schema (TGUSER03 in this example), then press Enter.
    Figure 2. Trace qualification panel
    Trace qualification panel
  6. On the Trigger Immediately panel, fill in the output DataSet for DB2 trace data (for example, TGUSER03.DB2PM.TRACE01). Set the Disposition to Overwrite.
    Note: You can configure the DB2 trace stop trigger using other methods (for example, after a certain time has elapsed).
    Figure 3. Trigger immediately panel
    Trigger immediately panel
  7. Press Enter, and finish the Locking trace configuration.
  8. Activate Collect Task A to collect the deadlock information, if there is a deadlock when the Web application is running.
SQL Activity trace
Follow these steps to configure an SQL Activity trace:
  1. Configure Collect Task B to collect the SQL Activity trace. Set Trigger by 4=Immediate Start to active the task immediately.
  2. Select SQL ActivityData TypeIFCIDRequesting Location, Plan name and Authid, then press Enter.
  3. Select all data types for the collection, then press Enter.
  4. On the IFCID Selection panel, select the following IFCIDs, then press Enter:
    16    Start of the first insert
    20    Lock summary
    53    Describe, SQL commit/rollback or error before SQL analyzed
    58    End of SQL statement execution
    59    Start of FETCH
    60    Start of SELECT
    61    Start of INSERT, UPDATE or DELETE
    63    SQL statement to be parsed
    64    Start of PREPARE
    65    Start of OPEN CURSOR for static or dynamic SQL
    66    Start of CLOSE CURSOR for static or dynamic SQL
    68    Start of ROLLBACK
    69    End of ROLLBACK
    70    Start of COMMIT phase 2
    71    End of COMMIT phase 2
    88    Start of synchronous request (commit phase 1)
    89    End of synchronous request (commit phase 1)
    105    DBID/OBID for database and tablespace translation
  5. On the Trace Qualification panel, fill in the DB username (for example, TUSER03) and the DB schema (for example, TGUSER03), then press Enter.
  6. On the Trigger Immediately panel, fill in the output DataSet for DB2 trace data (for example, TGUSER03.DB2PM.TRACE02). Set the Disposition to OverwriteNote: You can configure the DB2 trace stop trigger using other methods (for example, after a certain time has elapsed).
  7. Press Enter to finish the SQL Activity configuration.
  8. Activate the Collect Task B to collect the SQL statements during the timeframe that the Web application is running.
Analyze the trace reports to determine the offending SQL statements
Fundamentals of DB2 locks in Web applications
Usually Web applications have page and row locks. You can determine the types of locks being used from the Data Definition Language (DDL) schema files used to create the database. There are three modes for row locks: S(Share), U(Update) and X(Exclusive). The effects of locks that you always want to minimize are suspension, timeout, and deadlock.
A deadlock occurs when two or more application processes each hold locks on resources that the others need and without which they cannot proceed. Here is a detailed explanation of how a deadlock situation arises:
  1. Jobs JobOne and JobTwo are two transactions. JobOne accesses table M, and acquires an X (exclusive) lock for page B, which contains record 000300.
  2. JobTwo accesses table N, and acquires an X (exclusive) lock for page A, which contains record 000010.
  3. JobOne requests a lock for page A of table N, while still holding the lock on page B of table M. The job is suspended, because JobTwo is holding an X lock on page A.
  4. JobTwo requests a lock for page B of table M, while still holding the lock on page A of table N. The job is suspended, because JobOne is holding an X lock on page B. This situation is a deadlock.
In order to improve an application’s concurrency, you need to figure out which SQL statements are causing the deadlocks. Then, optimize the SQL statements to eliminate the deadlocks.
Analyze the lock information from the deadlock report
As an example, assume there are deadlocks when multiple shoppers log on and register a shop concurrently. You have obtained a deadlock trace report and an SQL statement report.
First you should check the deadlock trace report (TGUSER03.DB2PM.LOCKS in this article).
Here are some explanations of key parameters in the trace report to help you understand the process:
Figure 4. Trace parameters
Trace parameters
Analyze the first deadlock on table USERS (Figure 5 and Figure 6). In Figure 5, you can see two resources were involved in the deadlock. They are row X'2B', page X'00004E', page USERS, and DB SW03DB1; and row X'2B', page X'00004C', table USERS, and DB SW03DB1WAITERS =2 indicates there are two waiters (0CC544053119 and 0E26A4053107) involved in the deadlock. And the deadlock occurred at 12/05/05 06:30:09.40.
From Figure 6, you see that the resource holder and waiter are the opposite of Figure 5. The waiter (actually it is the holder in Figure 5) was requesting the resource that was held by the holder (actually it is the waiter in Figure 5). In this situation, the deadlock occurred according to the deadlock definition.
Now, summarize the lock relationship using Figures 5 and 6.
From Figure 5, you can see the lock held by LUW 0CC544053119 is a row lock on row X'2B', page X'00004E', table USERS, and DB SW03DB1, and is held in X state. The waiter LUW instance 0E26A4053107 was requesting a mode of lock S on the same resource. While in Figure 6, the lock held by LUW 0E26A4053107 is a row lock on row X'2B', page X'00004C', table USERS, and DB SW03DB1, and is held in X state. The waiter LUW instance 0CC544053119 was requesting a mode of locks on the same resource. Then the deadlock occurred.
Finally, note that the entry in the Figure 5, identified as BLOCKER is HOLDER --*VICTIM* is the thread (the “victim”) whose work will be rolled back to let the other proceed.
Figure 5. Locking trace – deadlock report
Locking trace - deadlock report
Figure 6. Locking trace – deadlock report
Locking trace - deadlock report
Table 1 summarizes the deadlock analysis:
Table 1. Deadlock analysis
LUW instanceResource held (X lock)Resource requested (S lock)Deadlock timestamp
0CC544053119SW03DB1.USERS.X’00004E’.X’2B’SW03DB1.USERS.X’00004C’.X’2B’06:30:09.41044991
0E26A4053107SW03DB1.USERS.X’00004C’.X’2B’SW03DB1.USERS.X’00004E’.X’2B’06:30:09.41044991
Analyze the SQL information from the SQL activities report
Print SQL ACTIVITY trace (TGUSER03.DB2PM.SQL in this article) filtering with LUW INSTANCE numbers (0CC544053119 and 0E26A4053107) both involved in the deadlock. And find the latest commit operation, which should be completed just before the time of the deadlock appearance (06:30:09.41044991). Then, search the SQL statements only executing after the latest commit operation is completed.
COMMIT processing in SQL ACTIVITY trace for INSTANCE 0CC544053119
COMMIT RECEIVED      06:28:50.72
COMMIT RECEIVED      06:28:50.85
COMMIT RECEIVED      06:28:50.97
COMMIT RECEIVED      06:28:51.04  the latest commit before the deadlock occurred.
COMMIT RECEIVED      06:30:09.61
COMMIT RECEIVED      06:30:09.64
COMMIT RECEIVED      06:30:09.73
COMMIT RECEIVED      06:30:09.77
COMMIT RECEIVED      06:30:09.80

So, you should investigate the SQL statements that accessed SW03DB1.USERS and were executed between 06:28:51.04 and 06:30:09.41044991. They are shown in Figure 7.
Figure 7. SQL report
SQL report
According to the lock state X and S, there should be an INSERT statement followed by a SELECT statement on resource SW03DB1.USERS.
In the same way, you could find the correct latest commit time before deadlock appearance for INSTANCE 0E26A4053107.
COMMIT processing in SQL ACTIVITY trace for INSTANCE 0E26A4053107
COMMIT RECEIVED     06:28:50.65  the latest commit before the deadlock occurred.
COMMIT RECEIVED     06:30:49.67

Then, you should investigate the SQL statements that accessed SW03DB1.USERS and were executed between 06:28:50.65 and 06:30:09.41044991. They are shown in Figure 8.
Figure 8. SQL report 
SQL report
From Figure 5 and Figure 6, you can see both the instance 0CC544053119 and 0E26A4053107 are trying to submit the INSTER INTO USERS and SELECT FROM USERS SQL statement. Since there is no COMMIT between the INSERT and SELECT statement, potentially the deadlock is caused by a table scan. So the deadlock occurred when concurrent threads were run.
Conclusion
This article described how to use the DB2 Performance Monitor facility to collect both deadlock and SQL Activity traces. In addition, it provided an example showing how to find the SQL statements involved in one deadlock situation by analyzing the traces. Using this approach both the developer and tester can find the offending SQL statements and finish the first step of the concurrency performance problem resolution.

No comments:

Post a Comment