Monday, October 17, 2011

DB2 Lock


When several DB2® users access a database concurrently, lockwait situations can cause bad response times. Lockwaits tend to be temporary in nature and thus hard to catch. Nevertheless, when lockwait situations occur, it is the database administrator’s responsibility to determine the cause of the lockwait times. This article demonstrates, by example, how to use the db2pd and db2pdcfg utilities for DB2 for Linux®, UNIX®, and Windows® to accomplish that task.

db2pd options for lock monitoring
db2pd is a utility for monitoring and troubleshooting all kinds of DB2 database activity. It is a standalone utility shipped with the DB2 engine starting with the DB2 V8.2 release and was developed to look, feel, and function like the Informix onstat utility.db2pd is executed from the command line with an optional interactive mode. The utility runs very quickly as it is not acquiring any locks or latches and runs outside of the engine’s resources (meaning it even works on a hung engine). A lot of the monitor data provided by db2pd can also be gathered by snapshot monitoring, but the output format of db2pd and snapshot monitoring are quite different. This allows the DBA to choose the monitoring alternative that is more suitable for her or his needs. This article focuses on db2pd‘s options for lock monitoring. There is a developerWorks article by Sam Poon (see the Resources section) that provides a broader introduction to db2pd‘s monitoring capabilities.
The following diagram shows db2pd‘s options for lock monitoring:
Figure 1. db2pd options for lock monitoring
db2pd lock options
  • TranHdl: This is a specification of a transaction handle that allows for monitoring only locks held by a specific transaction.
  • showlocks: This suboption expands the lockname into meaningful explanations. For a row lock, the following information is shown: tablespace ID, table ID, partition ID, page and slot. Tablespace ID and table ID can easily be mapped to the corresponding tablename using a query on the catalog view SYSCAT.TABLES:
    Listing 1. Map tablespace ID, table ID to tableschema, tablename
    WHERE TBSPACEID = tbspaceid AND TABLEID = tableid
  • wait: Specifying the wait suboption, db2pd only shows locks that transactions are currently waiting on, and the locks responsible for the wait situation. This suboption greatly simplifies lockwait analysis because it limits the output to the locks participating in the lockwait situation.
  • The db2pd database and file options are not specific for lock monitoring, but apply to (nearly) all db2pd calls. Thedatabase option limits monitor data returned by db2pd to that of a certain database. The file option let’s you define a file to which db2pd output is written.
A lockwait analysis scenario
Next, let’s start to analyze a sample lockwait situation using the db2pd options introduced. For this purpose, let’s create the DB2SAMPLE database as follows:
Listing 2. Create SAMPLE database

Transaction A is executed by user A to give each manager a 10% bonus depending on the salary of the manager:
Listing 3. Update operation performed by transaction A

While transaction A is still running (because user A has not ended the transaction using COMMIT or ROLLBACK), transaction B is executed by user B to give each employee a 2% salary raise:
Listing 4. Update operation performed by transaction B

As transaction B does not finish, user B calls the DBA to determine the cause of the problem. The DBA in turn calls db2pd to look for a possible lockwait situation:
Listing 5. Check for lockwait situation
db2pd -db sample -locks wait showlocks

Database Partition 0 -- Database SAMPLE -- Active -- Up 3 days 08:33:05

Address    TranHdl    Lockname                   Type       Mode Sts Owner      Dur
0x050A0240 6          02000600050040010000000052 Row        ..X  W   2          1
0x050A0DB0 2          02000600050040010000000052 Row        ..X  G   2          1   

HoldCount  Att  ReleaseFlg
0          0x00 0x40000000   TbspaceID 2  TableID 6  PartitionID 0 Page 320 Slot 5
0          0x00 0x40000000   TbspaceID 2  TableID 6  PartitionID 0 Page 320 Slot 5

db2pd reports a lockwait situation for a row lock occurring on table with ID 6 in tablespace with ID 2. Checking againstSYSCAT.TABLES, the DBA determines that there is indeed a lockwait situation on table EMPLOYEE.
Listing 6. Determine the table involved in the lockwait situation

TABSCHEMA                               TABNAME
FECHNER                                 EMPLOYEE

  1 record(s) selected.

The status column of the db2pd -locks output (Sts) shows a “G” for transaction 2 (column TranHdl). G stands for “granted” and means that the transaction with transaction handle 2 owns the row lock. Additionally, column Mode indicates that it is an X lock that is held by transaction 2. The waiting transaction (state W = “wait” in column Sts) is the transaction with handle 6. That transaction is requesting an X lock on the same row as transaction 2. You can see this by looking at the Owner column (showing transaction 2 as the lock owner) and comparing the Lockname (identical for both entries in the db2pd -locks section).
Next, the DBA maps the transaction handles to applications. This is done using another db2pd option, -transactions:
Listing 7. Map transaction handles to applications
db2pd -db sample -transactions

Database Partition 0 -- Database SAMPLE -- Active -- Up 3 days 08:34:47

Address    AppHandl [nod-index] TranHdl    Locks      State   Tflag      Tflag2
0x05141880 30       [000-00030] 2          9          WRITE   0x00000000 0x00000
0x05144880 34       [000-00034] 6          5          WRITE   0x00000000 0x00000

The output from this db2pd call shows that transaction 2 (column TranHdl) is executed by application 30 (column AppHandl) and transaction 6 by application 34, respectively. Both transactions are in the process of writing changes to the database (column State = WRITE). So the DBA now knows that application 30 is holding the lock on which application 34 is waiting.
To get more information about the applications participating in the lockwait situation, db2pd is called with the -agents option. This option prints information about the agents working on behalf of the applications. Note that -agents is an instance level option, that means specifying a database is not necessary (indeed when a database is specified, db2pd prints a warning and ignores the database option).
Listing 8. Get information about applications and corresponding agents
db2pd -agents

Database Partition 0 -- Active -- Up 3 days 08:35:42

Current agents:      2
Idle agents:         0
Active coord agents: 2
Active agents total: 2
Pooled coord agents: 0
Pooled agents total: 0

Address    AppHandl [nod-index] AgentTid   Priority   Type     State
0x04449BC0 34       [000-00034] 3392       0          Coord    Inst-Active
0x04449240 30       [000-00030] 2576       0          Coord    Inst-Active

ClientPid  Userid   ClientNm Rowsread   Rowswrtn   LkTmOt DBName
3916       USER_B   db2bp.ex 43         43         NotSet SAMPLE
2524       USER_A   db2bp.ex 153        14         NotSet SAMPLE

In the db2pd -agents output, the DBA can see the IDs of the users working with application 30 and 34 (column Userid): application 30 is executed by USER_A, and application 34 is executed by USER_B. Such a mapping between application and user ID is only possible if each user has a separate database authorization ID. Generally, it is not possible for applications running on an application server because these applications use connection pooling so that connections are not personalized.
More information about each application is provided by the db2pd option -applications:
Listing 9. Get more information about applications
db2pd -db sample -applications

Database Partition 0 -- Database SAMPLE -- Active -- Up 3 days 08:36:14

Address    AppHandl [nod-index] NumAgents  CoorTid    Status
0x04AF8080 34       [000-00024] 1          3940       Lock-wait
0x03841960 30       [000-00020] 1          2548       UOW-Waiting             

C-AnchID C-StmtUID  L-AnchID L-StmtUID  Appid
195      1          0        0          *LOCAL.DB2.061122195637
0        0          60       1          *LOCAL.DB2.061122195609

The Status column confirms something the DBA already knows: application 34 is in lock-wait state. But that is nothing new, so the DBA concentrates on columns C-AnchID/C-StmtUID and L-AnchID/L-StmtUID, respectively. “C” stands for current and “L” stands for last anchor ID/statement UID. These IDs can be used to identify the last SQL statement executed by an application as well as the statement currently executed by an application. To do so, db2pd with the -dynamic option is called. This option shows the contents of the database’s dynamic statement cache:
Listing 10. Examine contents of dynamic statement cache
db2pd -db sample -dynamic

Database Partition 0 -- Database SAMPLE -- Active -- Up 3 days 08:37:39

Dynamic Cache:
Current Memory Used           187188
Total Heap Size               1271398
Cache Overflow Flag           0
Number of References          2
Number of Statement Inserts   3
Number of Statement Deletes   0
Number of Variation Inserts   2
Number of Statements          3

Dynamic SQL Statements:
Address    AnchID StmtUID    NumEnv     NumVar     NumRef     NumExe
0x056CEBD0 60     1          1          1          1          1
0x056CE850 180    1          0          0          0          0
0x056CFEA0 195    1          1          1          1          1          


Dynamic SQL Environments:
Address    AnchID StmtUID    EnvID Iso QOpt Blk
0x056CECD0 60     1          1     CS  5    B
0x056D30A0 195    1          1     CS  5    B

Dynamic SQL Variations:
Address    AnchID StmtUID    EnvID VarID      NumRef     Typ
0x056CEEB0 60     1          1     1          1          4
0x056D3220 195    1          1     1          1          4   


The mapping between the -applications output and the -dynamic output is straightforward:
Application 34 (in lock-wait state) is currently executing the SQL statement identified by current anchor ID 195 and current statement ID 1. In the Dynamic SQL Statements section of the db2pd -dynamic output, those IDs can be mapped to the following SQL statement:
Listing 11. SQL statement executed by application 34

The last SQL statement executed by application 30 that is holding the lock was the statement with last anchor ID 60 and last statement ID 1. Those IDs can be mapped to the following SQL statement:
Listing 12. SQL statement executed by application 30

Note that the db2pd -dynamic output contains another interesting piece of information that is normally difficult to find out: The isolation level of a dynamic SQL statement that was executed is shown in column Iso of the Dynamic SQL Environmentssection (UR = Uncommitted Read, CS = Cursor Stability, RS = Read Stability, RR = Repeatable Read).
Let’s summarize what the DBA discovered regarding the cause for the hang of user B’s application:
  • The hang is caused by an exclusive row lock on table EMPLOYEE.
  • The transaction owning the lock belongs to an application executed by user A. The transaction of user B is waiting on that lock.
  • The two statements colliding are both UPDATE statements on table EMPLOYEE.
With that information, the DBA can initiate the necessary steps to resolve the lockwait situation, such as advising user A to end the transaction, or forcing user A’s application, among other options. In addition, measures can be taken to avoid such situations in the future, for example configuring the DB2 governor to automatically end transactions that run too long.
In the sample scenario, db2pd was executed several times one after the other, each time with a single option. In a real-world scenario, that would not be the case. Instead, db2pd would be called once with all the options introduced above:
Listing 13. Single db2pd call with all options required to analyze a lockwait situation
db2pd -db sample -locks wait showlocks -transactions -agents -applications -dynamic
      -file db2pd.out -repeat 15 40

The resulting output consists of the outputs for each option in the order the options appear in the db2pd call. Also, note the two additional options at the end of the db2pd call:
  • -file indicates that db2pd output should be written to a file. In the sample call, output is written to file db2pd.out.
  • -repeat indicates that db2pd should be executed every 15 seconds for 40 times (that means for 10 minutes in intervals of 15 seconds). The output of each execution is appended to the end of the file specified by the -file option.
The -file and -repeat options are useful to monitor database activity for a certain period of time. For lockwait analysis, those options help to catch lockwait situations that only exist for a short period of time. For example, if the database parameterLOCKWAIT is set to 20 seconds, a transaction waiting for a lock is rolled back after 20 seconds of wait time. To catch such a lockwait situation, the db2pd interval must be set to an interval shorter than 20 seconds, such as 15 seconds in the example.
Catching locktimeouts that occur only sporadically
Sometimes lockwait situations lead to locktimeouts that cause transactions to be rolled back. The period of time until a lockwait leads to a locktimeout is specified by the database configuration parameter LOCKTIMEOUT. One of the biggest problems concerning locktimeout analysis is the fact that it is not known when the next locktimeout situation will occur. For catching deadlocks, you can create an event monitor for deadlocks. Such a deadlock event monitor writes an entry each time a deadlock occurs. But there is no equivalent event monitor for locktimeouts. So until DB2 9®, the only way to catch a locktimeout was by continuous db2pd or snapshot monitoring (for db2pd, the -file and -repeat options can be used for continuous lock monitoring as explained earlier).
DB2 9 comes with a new mechanism to collect monitor data when database failures or events occur: the db2cos script. For catching locktimeout events, we can configure the database to start the db2cos script each time a locktimeout occurs. Within thedb2cos script, db2pd can be called with the same options as already discussed. Let’s walk through a sample scenario demonstrating the use of the db2cos script to catch locktimeouts.
For this scenario, assume the DBA set the database locktimeout value to 10 seconds:
Listing 14. Update locktimeout setting

To start the db2cos script each time a locktimeout occurs, the DBA calls the db2pdcfg utility as follows:
Listing 15. Configure db2cos script invocation using db2pdcfg 
db2pdcfg -catch locktimeout count=1

The -catch option specifies the failure or event that should automatically lead to an invocation of the db2cos script. For locktimeout events, the string locktimeout can be specified. As an alternative, you could specify the corresponding SQL error code and reason code for locktimeouts:
Listing 16. Alternative db2pdcfg call for catching locktimeouts
db2pdcfg -catch 911,68 count=1

Besides certain string values and SQL codes, db2pdcfg also accepts internal DB2 error codes. So there are quite a lot database failures and events that you can catch this way. Locktimeout events are only one example of using of db2pdcfg anddb2cos.
The value 1 for the count suboption indicates that the db2cos script should only be executed once a locktimeout event occurs.
db2pdcfg confirms the setting of the error catch with the following output:
Listing 17. db2pdcfg‘s confirmation of error catch setting
Error Catch #1
   Sqlcode:        0
   ReasonCode:     0
   ZRC:            -2146435004
   ECF:            0
   Component ID:   0
   LockName:       Not Set
   LockType:       Not Set
   Current Count:  0
   Max Count:      1
   Bitmap:         0x4A1
   Action:         Error code catch flag enabled
   Action:         Execute sqllib/db2cos callout script
   Action:         Produce stack trace in db2diag.log

The setting of the error catch is also reported in the db2diag.log. Instead of opening the db2diag.log in a text editor, thedb2diag.log file can be filtered using the db2diag utility (a useful utility for examining the contents of the db2diag.log):
Listing 18. Confirmation of error catch setting in db2diag.log 
db2diag -g funcname:=pdErrorCatch

2006-12-18- I727480H285        LEVEL: Event
PID     : 4648                 TID  : 3948        PROC : db2syscs.exe
INSTANCE: DB2                  NODE : 000
FUNCTION: DB2 UDB, RAS/PD component, pdErrorCatch, probe:30
START   : Error catch set for ZRC -2146435004

ZRC -2146435004 is the DB2 internal error code for locktimeouts. These can be checked with the following db2diag call:
Listing 19. Check meaning of DB2 internal error codes using db2diag 
db2diag -rc -2146435004

Using db2pdcfg, the database engine has now been configured to call the db2cos script each time a locktimeout occurs. Thedb2cos script gathers all the required monitor information to determine the cause of the locktimeout. To do so, the DBA has to modify the db2cos script to call db2pd with the options already known. The db2cos script can be found in the following subdirectory:
  • Windows: DB2 install directory\BIN\db2cos.bat, for example C:\Program Files\IBM\SQLLIB\BIN\db2cos.bat
  • UNIX/Linux: Instance owner home/sqllib/bin/db2cos
The default db2cos.bat script on Microsoft Windows® looks like this:
Listing 20. Contents of default db2cos.bat on Windows


if %0. == . goto iterdone
   if /i %0. == INSTANCE. set INSTANCE=%1
   if /i %0. == DATABASE. set DATABASE=%1
   if /i %0. == TIMESTAMP. set TIMESTAMP=%1
   if /i %0. == APPID. set APPID=%1
   if /i %0. == PID. set PID=%1
   if /i %0. == TID. set TID=%1
   if /i %0. == DBPART. set DBPART=%1
   if /i %0. == PROBE. set PROBE=%1
   if /i %0. == FUNCTION. set FUNCTION=%1
   if /i %0. == REASON. set REASON=%1
   if /i %0. == DESCRIPTION. set DESCRIPTION=%1
   if /i %0. == DiAGPATH. set DIAGPATH=%1
goto iterargs


if %DATABASE%. == . goto no_database
   db2pd -db %DATABASE% -inst >> %DIAGPATH%\db2cos%PID%%TID%.%DBPART%
   goto exit

   db2pd -inst >> %DIAGPATH%\db2cos%PID%%TID%.%DBPART%


In the case of an event or failure at the database level, the default db2cos script calls db2pd with the -db and -inst options. The DBA replaces the corresponding line with a db2pd call that collects the monitor data required for locktimeout analysis:
Listing 21. Change db2cos script to collect data for locktimeout analysis
if %DATABASE%. == . goto no_database
   db2pd -db %DATABASE% -locks wait -transactions -agents -applications -dynamic
         >> %DIAGPATH%\db2cos%PID%%TID%.%DBPART%
   goto exit

Now the db2cos script is prepared and the DBA can wait for the next locktimeout event to occur.
Let’s assume the same lock situation, between user A and user B as described before, happens. But this time LOCKTIMEOUT is set and therefore the transaction of user B is rolled back after 10 seconds (LOCKTIMEOUT = 10). User B notifies the DBA that his or her transaction was rolled back with SQL error message -911 and reason code 68 (SQL code -911 / reason code 68 = locktimeout). In turn, the DBA checks the monitor data that was gathered by the automatic invocation of the db2cos script.
First, the DBA calls db2diag with the internal error code for locktimeout to determine the exact time the locktimeout happened:
Listing 22. Check point-in-time for locktimeout event in db2diag.log 
db2diag -g data:=-2146435004

2006-12-18- I6857H409          LEVEL: Event
PID     : 2968                 TID  : 2932        PROC : db2syscs.exe
INSTANCE: DB2                  NODE : 000         DB   : SAMPLE
APPHDL  : 0-21                 APPID: *LOCAL.DB2.061226132544
FUNCTION: DB2 UDB, lock manager, sqlplnfd, probe:999
DATA #1 : <preformatted>
Caught rc -2146435004.  Dumping stack trace.

The db2diag.log entry shows that a locktimeout occurred at 2006-12-18- As the db2cos script writes its output to a file named db2cos%PID%%TID%.%DBPART% in the %DIAGPATH%, the DBA can expect to find a file db2cos29682932.0in the instance’s diagnostic path:
  • %DIAGPATH% = instance’s diagnostic path = on Windows by default C:\Program Files\IBM\SQLLIB\DB2
  • %PID% = process ID = 2968 (as shown in the db2diag.log entry)
  • %TID% = thread ID = 2932 (also shown in the db2diag.log entry)
  • %DBPART% = database partition = 0 (in a non-partitioned database environment)
The contents of that file will resemble the db2pd monitor output examined step-by-step in the first part of this article and will allow the DBA to identify the cause of the locktimeout.
Having caught the locktimeout, the DBA can disable the db2cos script invocation by calling db2pdcfg with the -catch clearoption:
Listing 23. Clear error catch settings using db2pdcfg again
db2pdcfg -catch clear

All error catch flag settings cleared.
This article demonstrates how to use the db2pd utility for lockwait monitoring. Going through a sample scenario, it shows how a DBA can identify the cause for concurrency problems by examining the output of different db2pd options. Starting with DB2 9,db2pd can be used in conjunction with the new db2cos script to catch locktimeout events as they happen. You learned how to configure automatic invocation of the db2cos script in the case of locktimeout events. The db2diag utility was introduced — a useful tool for checking the contents of the db2diag.log.

No comments:

Post a Comment