Monday, October 17, 2011

Unlocking the secrets surrounding the locking behavior of the DB2 UDB data movement utilities

Summary:  The DB2 Universal Database data movement utilities -- export, import, and load -- support various levels of concurrency, which is primarily enforced through the placement of different types of locks on resources such as database tables or rows. This article describes the locking behaviour of these utilities and shows you, through working examples, how to use tools such as the DB2 UDB snapshot monitor to view and acquire a better understanding of this behavior.

To prevent uncontrolled data access, and provide concurrency control, the IBM® DB2® Universal DatabaseTM (DB2 UDB) database manager places locks on database resources such as table rows or entire tables. A lock associates the resource with an application, called the lock owner, to control how other applications can access the same resource. In short, locking is the means by which data integrity can be maintained when two or more application processes simultaneously request access to the same data.
Locks prevent uncommitted changes made by one application process from being accessed by other processes. The locks are released when the process ends or when a commit operation releases locks that were acquired during a unit of work (UOW). A UOW is a recoverable sequence of operations within an application process; it ends as a result of a commit operation, a rollback operation, or the end of the application process.
The principal DB2 UDB data movement utilities are export, import, and load. These utilities can be used to move data between databases. The export utility copies data from database tables to files, and those files can then be used to import or load the data into another database table.

Isolation levels

The isolation level that is associated with an application process defines the degree to which that process is isolated from other concurrently executing application processes. The isolation level is specified as an attribute of the package that the application process uses. A package is a database object that is created during execution of the BIND, PRECOMPILE, or REBIND command; it contains output from the SQL compiler and is used to execute SQL statements. An appropriate isolation level can be specified when you precompile (prepare) an application, or when you bind an application to a database.
  • When an application is precompiled, embedded SQL statements are replaced with source code that can be submitted to the SQL compiler.
  • When an application is bound to a database, output from the SQL compiler is converted into a package.
DB2 UDB supports the following isolation levels:
  • Repeatable read (RR)ensures that any table row read during a unit of work (UOW) is not changed by other application processes until the UOW completes. Similarly, any row changed by another application process cannot be read until it is committed by that application process. An application process running at the RR level is completely isolated from the effects of concurrent application processes.
  • Read stability (RS) is similar to RR. However, an application process running at the RS level is not completely isolated from the effects of concurrent application processes. If such an application process issues the same query more than once, it might see changed data or new ("phantom") rows that have been added by other application processes.
  • Cursor stability (CS) also ensures that any row changed by another application process cannot be read until it is committed by that application process. However, the CS isolation level only ensures that the current row of every updatable cursor is not changed by other application processes; rows that were read during a UOW can be changed by other application processes.
  • Uncommitted read (UR) allows, for certain operations, any row read during a UOW to be changed by other application processes, and any row changed by another application process to be read, even if that change has not been committed. For other operations, UR is similar to CS.
Prior to Version 8, the DB2 UDB load utility needed exclusive access to the table spaces that were associated with the target table. That is no longer the case. As of Version 8, the load utility places locks on the table itself, and you can enjoy concurrent access to other table objects in the same table spaces while a data load operation is running. In fact, the load utility, which is much faster than the import utility, can still take quite a long time to run when huge amounts (gigabytes or more) of data are being loaded. In circumstances such as this, concurrent access to pre-existing data in target tables of load operations is very important.
This article outlines the locking behavior of the DB2 UDB data movement utilities, and provides an overview of how concurrent access to data is managed while these utilities are running.
The DB2 UDB export utility writes data from a database table to one or more files that are stored outside of the database. The export utility uses SQL to read the data from the table. The files to which the export utility writes data can be in one of several standard file formats, most commonly PC/IXF or DEL. The format of a PC/IXF file is a version of an adaptation of the Integration Exchange Format (IXF) data interchange architecture. A delimited ASCII (DEL) file is a stream of ASCII characters; rows in the data stream are separated by row delimiters, and column values within a row are separated by column delimiters. For more information about these file formats, see Exploit file type modifiers that are built into the DB2 data movement utilities. Besides specifying the format of the data that is to be written to the output file, you must also provide the name of that file, and an SQL SELECT statement to specify the data that is to be exported. That is all the information that is required to initiate a basic DB2 export operation. Figure 1 shows a partial syntax diagram for the EXPORT command that includes only the required parameters.
A DB2 export operation requires a database connection. Other applications that want to access data in the same table concurrently do not need to disconnect from the database, and are free to pursue read or write operations in that table. That is because the export utility is bound with the cursor stability (CS) isolation level (for a description of the DB2 isolation levels, see the sidebar). During an export operation, row-level locking is applied to the exporting table, which means that only one lock is held at a time. That lock is the row lock that is associated with the row being exported. When the isolation level is cursor stability, the lock is released as the cursor moves from row to row.

Figure 1. Partial syntax of the DB2 EXPORT command
EXPORT command
We can demonstrate this quite easily, using the following scenario. We will use a delimited ASCII data file named staffdata.del. This file contains a substantial amount of data (22767 records), which is necessary for this type of exercise. Such a file can quickly be assembled by pasting many copies of exported data from the STAFF table (in the SAMPLE database that comes with DB2 UDB) into a text file, and then overwriting the ID (SMALLINT) column data with sequential numbers that have been generated in a spreadsheet. This ensures unique values in the ID column.
Suppose we were to create a new table, NEWSTAFF, using the table definition for the STAFF table. After connecting to the SAMPLE database, we create the NEWSTAFF table by issuing the following SQL statement: create table newstaff like staff. Using the DB2 IMPORT command, we then populate this new table with data contained in the large DEL file: import from staffdata.del of del modified by chardel"" coldel, decpt. insert into newstaff.
We are now ready to export the data from the NEWSTAFF table, and to simultaneously capture a database lock snapshot. The DB2 UDB snapshot monitor lets you capture information about the database and connected applications at the time the snapshot is taken. You can capture a snapshot from the command line processor (CLP) interface, from SQL table functions, or by using the snapshot monitor administrative application programming interface (API) functions in an application, such as a C or a C++ application. There are a number of snapshot request types, and each type returns its own set of monitor data. For example, a lock snapshot captures and returns database lock information.
We will execute two CLP scripts, one from each of two DB2 command windows (sessions). The first script will be referred to as Application 1, and the second script will be referred to as Application 2. Application 1 will execute the export operation and Application 2 will get a lock snapshot while Application 1 is running.
An example of the kind of output you can expect from this type of exercise is shown in Listing 1. The lock snapshot includes database summary information, application information and, of course, the lock information. Information is captured about every lock that was held at the time that the snapshot was taken. The one of interest here is the intent share (IS) lock held by the NEWSTAFF table. With this type of lock, concurrent applications can read from or write to the locked table.

Listing 1. A database lock snapshot reveals that IS locking is used during a data export operation
Application 1:

connect to sample

export to newstaffdata.del of del select * from newstaff

SQL3104N  The Export utility is beginning to export data to file "newstaffdata.del".
SQL3105N  The Export utility has finished exporting "22767" rows.

Number of rows exported: 22767

connect reset



Application 2:

connect to sample

get snapshot for locks on sample

            Database Lock Snapshot

...

List of Locks
...
 Lock Name                   = 0x0002000D000000000000000054
 Lock Attributes             = 0x00000000
 Release Flags               = 0x00000001
 Lock Count                  = 1
 Hold Count                  = 0
 Lock Object Name            = 13
 Object Type                 = Table
 Tablespace Name             = USERSPACE1
 Table Schema                = MELNYK
 Table Name                  = NEWSTAFF
 Mode                        = IS

connect reset

The DB2 UDB import utility writes data from an input file to a database table or updatable view. The import utility uses the SQL INSERT statement (or the SQL UPDATE statement) to write data to the target table. The file from which the import utility reads data can be in one of several standard file formats, most commonly PC/IXF, DEL, or ASC. A non-delimited ASCII (ASC) file is a stream of ASCII characters; rows in the data stream are separated by row delimiters, and each column within a row is defined by a beginning and ending location. For more information about these file formats, see Exploit file type modifiers that are built into the DB2 data movement utilities. Besides specifying the format of the data that is to be read from the input file, you must also provide the name of that file, the target table name, and the mode under which the import utility is to execute. That is all the information that is required to initiate a basic DB2 import operation. Figure 2 shows a partial syntax diagram for the IMPORT command that includes the required parameters.

Figure 2. Partial syntax of the DB2 IMPORT command
IMPORT command
The main DB2 import modes include:
  • INSERT adds the input data to the table without changing any existing data.
  • INSERT_UPDATE adds new input data to the table, or updates existing rows with input data that has matching primary key values.
  • REPLACE deletes all existing data from the table (by truncating the data object) and then inserts the input data.
  • REPLACE_CREATE deletes all data from an existing table and then inserts the input data. If the table does not exist, this mode creates the table prior to inserting the input data. This mode is only valid with PC/IXF input files.
By default, the import utility is bound with the read stability (RS) isolation level (for a description of the DB2 isolation levels, see the sidebar). A DB2 import operation requires a database connection. Other applications that want to access data in the same table concurrently do not need to disconnect from the database; such applications can have both read and write access to the target table if the ALLOW WRITE ACCESS clause has been specified on the IMPORT command. On the other hand, ALLOW NO ACCESS (the default) prevents concurrent applications from accessing target table data. The import utility accomplishes this by requesting an exclusive (X) lock on the table at the beginning of an ALLOW NO ACCESS import operation. Listing 2 shows part of a lock snapshot taken during such an import operation into a table named NEWSTAFF2, which was created to be like the NEWSTAFF table. The listing does not show information about locks that are not specific to an import operation. Some of these are locks held by system catalog objects, or are so-called internal variation or plan locks that are more generally related to insert or update operations.

Listing 2. A database lock snapshot reveals that X locking is used during an ALLOW NO ACCESS data import operation
Application 1:

connect to sample

import from newstaffdata.del of del modified by chardel"" coldel, decpt.
  insert into newstaff2
  
SQL3109N  The utility is beginning to load data from file "newstaffdata.del".
SQL3110N  The utility has completed processing.
          "22767" rows were read from the input file.

SQL3221W  ...Begin COMMIT WORK. Input Record Count = "22767".
SQL3222W  ...COMMIT of any database changes was successful.

SQL3149N  "22767" rows were processed from the input file.
          "22767" rows were successfully inserted into the table.
              "0" rows were rejected.

Number of rows read         = 22767
Number of rows skipped      = 0
Number of rows inserted     = 22767
Number of rows updated      = 0
Number of rows rejected     = 0
Number of rows committed    = 22767

connect reset



Application 2:

connect to sample

get snapshot for locks on sample

            Database Lock Snapshot

...

List of Locks
...

 Lock Name                   = 0x0002000E000000000000000054
 Lock Attributes             = 0x00000000
 Release Flags               = 0x40000000
 Lock Count                  = 255
 Hold Count                  = 0
 Lock Object Name            = 14
 Object Type                 = Table
 Tablespace Name             = USERSPACE1
 Table Schema                = MELNYK
 Table Name                  = NEWSTAFF2
 Mode                        = X

select lock_mode, lock_object_type, substr(table_name, 1, 16) as table_name
  from table(sysproc.snapshot_lock('SAMPLE',-1)) as lockinfo
  where lock_object_type in (1,2,4,5,6,11,18,19)

connect reset

Listing 2 also shows a sample query against the table that is returned by the SNAPSHOT_LOCK table function, an SQL administrative routine that provides convenient access to snapshot monitor information. The information in Table 1 is based on sample output from such a query. Lock modes and lock object types are defined in the sqlmon.h include file, which will help you to decipher the numerical equivalents that the table function returns.
Lockg ModeLock Object TypeTable Name
Next-key Share (NS)RowSYSTABLES
Intent Share (IS)TableSYSTABLES
Intent Share (IS)TableSYSCOLUMNS
Exclusive (X)TableNEWSTAFF2
In the case of an ALLOW WRITE ACCESS import operation (valid only in conjunction with the INSERT or the INSERT_UPDATE import mode), the import utility requests an intent exclusive (IX) lock on the target table when the first row of data is ready to be inserted. When a table holds an IX lock, the lock owner and concurrent applications can read and update data in that table.The import utility periodically commits inserted data (see Listing 3). Because the acquired lock does not persist across transaction boundaries, the import utility loses its IX table lock and must request another lock after each commit. If a concurrently connected application happens to be holding an incompatible row lock, the import utility pauses until that application commits or rolls back its transaction.

Listing 3. A database lock snapshot reveals that IX locking is used during an ALLOW WRITE ACCESS (INSERT_UPDATE) data import operation
Application 1:

connect to sample
create table newstaff like staff
alter table newstaff add primary key (id)

import from staff_data.del of del modified by chardel"" coldel, decpt.
  insert into newstaff
...
Number of rows committed    = 35

import from newstaffdata.del of del modified by chardel"" coldel, decpt.
  allow write access insert_update into newstaff

SQL3109N  The utility is beginning to load data from file "newstaffdata.del".
SQL3186W  Data was not loaded into the database, because the log was full or
the lock space was exhausted.  SQLCODE "-912" was returned.  A commit will be
attempted and the operation will continue if the commit is successful.
SQL0912N  The maximum number of lock requests has been reached for the
database.  SQLSTATE=57011
SQL3221W  ...Begin COMMIT WORK. Input Record Count = "1020".
SQL3222W  ...COMMIT of any database changes was successful.
...
SQL3186W
SQL0912N
SQL3221W  ...Begin COMMIT WORK. Input Record Count = "22356".
SQL3222W  ...COMMIT of any database changes was successful.

SQL3110N  The utility has completed processing.  "22767" rows were read from
the input file.
SQL3221W  ...Begin COMMIT WORK. Input Record Count = "22767".
SQL3222W  ...COMMIT of any database changes was successful.
...
Number of rows committed    = 22767

connect reset



Application 2:

connect to sample

select lock_mode, lock_object_type, substr(table_name, 1, 16) as table_name
  from table(sysproc.snapshot_lock('SAMPLE',-1)) as lockinfo
  where lock_object_type in (1,2,4,5,6,11,18,19)

connect reset

This example includes two import operations: one to populate the NEWSTAFF table with some data, and the other one to import more data in INSERT_UPDATE mode, which requires that a primary key be defined on the target table. Table 2, based on sample output from a query against the table that is returned by the SNAPSHOT_LOCK table function, shows that IX table locking is used during an ALLOW WRITE ACCESS data import operation.
Lockh ModeLock Object TypeTable Name
Next-key Share (NS)RowSYSTABLES
Exclusive (X)RowNEWSTAFF
Intent Share (IS)TableSYSTABLES
Intent Exclusive (IX)TableNEWSTAFF
The DB2 UDB load utility very efficiently writes data from one or more input files to a database table. The load utility also accepts input from named pipes, devices (such as tape devices), or user-defined cursors. The load utility, unlike the import utility, does not use SQL to write data to the target table; it writes formatted pages directly to the table and consequently, is much faster than the import utility. The file from which the load utility reads data can be in one of several standard file formats, most commonly PC/IXF, DEL, or ASC. For more information about these file formats.. Besides specifying the format of the data that is to be read from the input file, you must also provide the name of that file, the target table name, and the mode under which the load utility is to execute. INSERT mode adds the input data to the table without changing any existing data. REPLACE mode deletes all existing data from the table (by truncating the data object) and then inserts the input data. (There is also a RESTART mode and a TERMINATE mode, but we will not discuss them further here. For information about these modes.  That is all the information that is required to initiate a basic DB2 load operation. Figure 3 shows a partial syntax diagram for the LOAD command that includes the required parameters.

Figure 3. Partial syntax of the DB2 LOAD command
LOAD command
A DB2 load operation requires a database connection. Other applications that want to access data in the same table concurrently do not need to disconnect from the database; such applications can have read access to the target table if the ALLOW READ ACCESS clause has been specified on the LOAD command. The access is restricted to data that existed prior to the start of the load operation. On the other hand, ALLOW NO ACCESS (the default) prevents concurrent applications from accessing target table data. The load utility accomplishes this by requesting a super exclusive (Z) lock on the table and retaining this lock for the duration of the load operation (see Listing 4). When a table holds a Z lock, no concurrent application can read or update data in that table.

Listing 4. A database lock snapshot reveals that Z locking is used during an ALLOW NO ACCESS data load operation
Application 1:

connect to sample

load from newstaffdata.del of del modified by chardel"" coldel, decpt.
  insert into newstaff2

SQL3501W  The table space(s) in which the table resides will not be placed in
backup pending state since forward recovery is disabled for the database.

SQL3109N  The utility is beginning to load data from file
"/home/melnyk/newstaffdata.del".

SQL3500W  The utility is beginning the "LOAD" phase at time "12/07/2004
17:39:18.311340".

SQL3519W  Begin Load Consistency Point. Input record count = "0".

SQL3520W  Load Consistency Point was successful.

SQL3110N  The utility has completed processing.  "22767" rows were read from
the input file.

SQL3519W  Begin Load Consistency Point. Input record count = "22767".

SQL3520W  Load Consistency Point was successful.

SQL3515W  The utility has finished the "LOAD" phase at time "12/07/2004
17:39:19.630882".

Number of rows read         = 22767
Number of rows skipped      = 0
Number of rows loaded       = 22767
Number of rows rejected     = 0
Number of rows deleted      = 0
Number of rows committed    = 22767

connect reset



Application 2:

connect to sample

get snapshot for locks on sample

            Database Lock Snapshot

...

List of Locks
...

 Lock Name                   = 0x0002000E000000000000000054
 Lock Attributes             = 0x00000000
 Release Flags               = 0x40000000
 Lock Count                  = 255
 Hold Count                  = 0
 Lock Object Name            = 14
 Object Type                 = Table
 Tablespace Name             = USERSPACE1
 Table Schema                = MELNYK
 Table Name                  = NEWSTAFF2
 Mode                        = Z

connect reset

In the case of an ALLOW READ ACCESS load operation, the load utility maintains a share (S) lock for the duration of the load operation, and then upgrades the S lock to a Z lock when data is being committed. The S lock allows concurrent applications that use read-only queries to access the target table, but applications attempting to write data to the table are prevented from doing so. Of course, you cannot request a REPLACE and ALLOW READ ACCESS load operation simultaneously: Because a load replace operation truncates all existing table data before loading new data, there is no pre-existing data to query!
ALLOW READ ACCESS operations are great when you have a huge amount of data to load. Such operations can take a substantial amount of time, and you will appreciate having virtually uninterrupted access to pre-existing table data while the utility is running. Moreover, if an ALLOW READ ACCESS load operation fails or is aborted, pre-existing table data can still be read.
You can use the LOCK WITH FORCE option on the LOAD command to force applications holding conflicting locks on the target table off the system; in this way, your load operation avoids risking a timeout while waiting for a lock. At the start of an ALLOW NO ACCESS load operation, the utility may force concurrent applications that are trying to read from or write to the target table. At the start of an ALLOW READ ACCESS load operation, however, the utility may force applications that are trying to write to the table, and towards the end of the load operation (when data is being committed), the utility may force applications that are trying to read from or write to the target table.
Because locks are not persistent, the load utility also uses table states to control access to tables; these states remain in effect even if a load operation aborts. For more information about table states, see Demystifying table and table space states.

No comments:

Post a Comment