Synchronizing data in real time is a big problem to solve in business applications. It’s difficult to ensure that data is updated correctly and never ends up in a “zombie” status (a status that should never happen), because there are always differences between the cases we program for, the cases we unit-test, and the cases that occur in production. Although we try to cover every possible case, unexpected problems still happen when the program runs with many users in different environments (location, internet speed, latency, network cabling, hardware, how the user uses the program…). It is an art of combining programming technique with MS SQL Server administration so that even if an error happens, we can still hold the ACID (Atomicity, Consistency, Isolation, Durability) properties of our databases and our programs don’t simply crash.
What this combination looks like depends on balancing the data integrity requirements of the application against the overhead of each isolation level. Every program is different; there is no single solution for all of them. But I think that if we understand how the isolation levels of MS SQL Server work, we can find the best solution for our case. Let’s start with the concept of a “transaction”. From Wikipedia:
A transaction comprises a unit of work performed within a database management system (or similar system) against a database, and treated in a coherent and reliable way independent of other transactions. Transactions in a database environment have two main purposes:
- To provide reliable units of work that allow correct recovery from failures and keep a database consistent even in cases of system failure, when execution stops (completely or partially) and many operations upon a database remain uncompleted, with unclear status.
- To provide isolation between programs accessing a database concurrently. If this isolation is not provided, the programs’ outcomes are possibly erroneous.
In this blog post, we will focus on the second purpose of transactions. I will use MS SQL Server to make examples for the isolation levels. I would like to emphasize that the explanations below apply to MS SQL Server; they may not be correct for other kinds of database servers.
When you open a transaction, it is given a default isolation level. However, no matter which isolation level is applied, the isolation level does not affect the locks acquired to protect data modifications. That means that, regardless of the isolation level that was set, a transaction that updates data always acquires an exclusive lock on any data it modifies and holds that lock until the modification completes. Therefore, an isolation level matters mostly for read operations, where it defines the level of protection from the effects of modifications made by other transactions.
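For example (a sketch, using the “Product” table that we will create below), two UPDATE statements on the same row block each other even under the lowest isolation level, because the write lock is always exclusive:

```sql
-- First query window: update a row and hold the transaction open
BEGIN TRAN
UPDATE dbo.Product SET COL_VALUE = 500 WHERE COL_KEY = 'PlayStation4'
WAITFOR DELAY '00:00:13'
ROLLBACK

-- Second query window: even READ UNCOMMITTED does not help here;
-- the UPDATE blocks until the first transaction releases its
-- exclusive lock
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED
UPDATE dbo.Product SET COL_VALUE = 600 WHERE COL_KEY = 'PlayStation4'
```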
So a transaction isolation level controls:
– How long the read locks are held.
– Whether a read operation on rows being modified by another transaction:
- Blocks until the exclusive lock (from updating transaction) on the row is released.
- Retrieves the committed version of the row that existed at the time the statement or transaction started.
- Reads the uncommitted data modification.
Based on the read behaviors above, MS SQL Server offers the following transaction isolation levels (from lowest to highest):
- Read uncommitted
- Read committed (default)
- Repeatable read
- Snapshot
- Serializable
A lower isolation level increases the ability of many users to access data at the same time, but it also increases the number of concurrency effects users might encounter (such as dirty reads or lost updates). Conversely, a higher isolation level reduces the types of concurrency effects that users may encounter, but it requires more system resources and increases the possibility that one transaction will block another (up to deadlocks). Enough with descriptions; let’s make some examples to understand how these isolation levels work. In MS SQL Server, create a new database called “isolationlevel” and a table “Product”:
USE [isolationlevel]
CREATE TABLE [dbo].[Product](
    [ID] [int] IDENTITY(1,1) NOT NULL,
    [COL_KEY] [nvarchar](50) NULL,
    [COL_VALUE] [int] NULL)
INSERT INTO [dbo].[Product]([COL_KEY],[COL_VALUE]) VALUES ('PlayStation4',400)
INSERT INTO [dbo].[Product]([COL_KEY],[COL_VALUE]) VALUES ('SamsungGalaxyTab2',169)
INSERT INTO [dbo].[Product]([COL_KEY],[COL_VALUE]) VALUES ('KindleFireHD',169)
1. Read uncommitted
Specifies that statements can read rows that have been modified by other transactions but not yet committed.
In Management Studio, open a new query window and paste the first SQL query into it:
BEGIN TRAN
UPDATE dbo.Product SET COL_VALUE = 401 WHERE COL_KEY = 'PlayStation4'
WAITFOR DELAY '00:00:13'
ROLLBACK
Open a second query window and paste this second SQL query into it:
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED
SELECT * FROM dbo.Product
Execute the first SQL query and then immediately execute the second one; you will see the concurrency effect of “read uncommitted”.
The first query rolls back after 13 seconds, setting the value of “PlayStation4” back to 400, but the second query doesn’t wait for this rollback. It just reads the current value in the table, which is 401. This value is exactly a “dirty read”: it never officially exists in the database. The first query rolls back its change, but the second query doesn’t care about that; it simply reads the current value in the database.
“Read uncommitted” blocks nothing and can cause “dirty reads” (reading tentative data) or “lost updates”.
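As a side note, the same dirty-read behavior can also be requested for a single table reference with the NOLOCK table hint, without changing the session’s isolation level. A minimal sketch:

```sql
-- Equivalent to reading this table under READ UNCOMMITTED,
-- but scoped to this one table reference only
SELECT * FROM dbo.Product WITH (NOLOCK)
```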
2. Read committed
Specifies that statements cannot read data that has been modified but not committed by other transactions. This option is the SQL Server default. Compared to read uncommitted, this prevents dirty reads. However, data can still be changed by other transactions, resulting in phantom data.
We use the example above again, but now change the isolation level to read committed.
SET TRANSACTION ISOLATION LEVEL READ COMMITTED
SELECT * FROM dbo.Product
When we execute both queries (the first query first, then the second), you’ll see that the second query doesn’t return immediately but waits until the first query finishes. As described above, “read committed” affects only the READ operation. Other operations like UPDATE and INSERT won’t be affected by this isolation level. For example, let’s now change the first query to
BEGIN TRAN
UPDATE dbo.Product SET COL_VALUE = 401 WHERE ID > 2
WAITFOR DELAY '00:00:13'
COMMIT
SELECT * FROM dbo.Product WHERE ID > 2
and the second query to
SET TRANSACTION ISOLATION LEVEL READ COMMITTED
INSERT INTO [dbo].[Product]([COL_KEY],[COL_VALUE]) VALUES ('SeagateDrive',74)
Executing the two queries one after the other, you see that the second query returns immediately; the INSERT operation was executed without waiting.
After the first query, we expect that all records with an ID greater than 2 are updated. But in the meantime the second query inserts a new record that the first transaction never saw, and therefore creates phantom data.
“Read committed” blocks READ (of uncommitted data) and can cause non-repeatable reads (reading changed data) or phantom data.
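The non-repeatable read can be reproduced with a sketch like the following (reusing the example table): the first window reads the same row twice, and a committed update from the second window slips in between.

```sql
-- First query window: read the same row twice, 13 seconds apart
SET TRANSACTION ISOLATION LEVEL READ COMMITTED
BEGIN TRAN
SELECT COL_VALUE FROM dbo.Product WHERE COL_KEY = 'PlayStation4'
WAITFOR DELAY '00:00:13'
-- Under READ COMMITTED the read lock is released after each
-- statement, so this second SELECT can return a different value:
SELECT COL_VALUE FROM dbo.Product WHERE COL_KEY = 'PlayStation4'
COMMIT

-- Second query window: commit an update between the two reads
UPDATE dbo.Product SET COL_VALUE = 450 WHERE COL_KEY = 'PlayStation4'
```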
3. Repeatable read
Specifies that statements cannot read data that has been modified but not yet committed by other transactions and that no other transactions can modify data that has been read by the current transaction until the current transaction completes.
For this example, the first query looks like the following:
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ
BEGIN TRAN
SELECT * FROM dbo.Product
WAITFOR DELAY '00:00:13'
SELECT * FROM dbo.Product
COMMIT
and the second query is
UPDATE dbo.Product SET [COL_VALUE] = 403 WHERE ID > 2
SELECT * FROM dbo.Product
Because of this isolation level, the first query has completely locked the read data range against any UPDATE operations. The second query has to wait until the first query finishes before it can start.
As described in the definition, this isolation level protects the data read in the current transaction from being modified (UPDATE) by other transactions. However, INSERT and DELETE operations are still permitted, so phantom reads are possible at this isolation level. Although other transactions cannot modify any rows that have been read by the current transaction, they can insert new rows that match the search conditions of statements in the current transaction. If the current transaction executes the statement again, it retrieves the new rows, which results in phantom reads.
“Repeatable read” blocks READ + UPDATE and can cause “phantom reads”.
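The phantom-read behavior can be demonstrated with a sketch like this: the first window queries a range twice, while the second window inserts a new row matching that range in between (the product name and price here are just made-up example data).

```sql
-- First query window: read a value range twice, 13 seconds apart
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ
BEGIN TRAN
SELECT * FROM dbo.Product WHERE COL_VALUE > 100
WAITFOR DELAY '00:00:13'
-- The row inserted by the second window appears here: a phantom read
SELECT * FROM dbo.Product WHERE COL_VALUE > 100
COMMIT

-- Second query window: the INSERT is not blocked and returns
-- immediately, because REPEATABLE READ only locks the rows
-- that were already read
INSERT INTO [dbo].[Product]([COL_KEY],[COL_VALUE]) VALUES ('NintendoSwitch',299)
```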
4. Snapshot
Specifies that data read by any statement in a transaction will be the transactionally consistent version of the data that existed at the start of the transaction.
This isolation level uses row versioning to protect the transaction’s reads from modifications made by other transactions. At the start of the current transaction, the current versions of the rows are stored in tempdb (which of course costs more resources). Data modifications made by other transactions after the start of the current transaction are not visible to statements executing in the current transaction. The effect is as if the statements in a transaction get a snapshot of the committed data as it existed at the start of the transaction.
“Snapshot” blocks nothing and uses row versioning.
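A sketch of how to try this out on the example database (snapshot isolation must first be enabled at the database level):

```sql
-- Run once: allow snapshot isolation in the example database
ALTER DATABASE isolationlevel SET ALLOW_SNAPSHOT_ISOLATION ON

-- First query window: both SELECTs return the same rows, even if
-- another transaction commits a change in between
SET TRANSACTION ISOLATION LEVEL SNAPSHOT
BEGIN TRAN
SELECT * FROM dbo.Product
WAITFOR DELAY '00:00:13'
SELECT * FROM dbo.Product
COMMIT

-- Second query window: this UPDATE is not blocked and commits
-- immediately; the first window simply does not see it yet
UPDATE dbo.Product SET COL_VALUE = 404 WHERE COL_KEY = 'PlayStation4'
```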
5. Serializable
“Serializable” is the highest isolation level of MS SQL Server and specifies the following:
- Statements cannot read data that has been modified but not yet committed by other transactions.
- No other transactions can modify data that has been read by the current transaction until the current transaction completes.
- Other transactions cannot insert new rows with key values that would fall in the range of keys read by any statements in the current transaction until the current transaction completes.
In short, this isolation level blocks any operation on the queried data range.
For example, the first query is
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE
BEGIN TRAN
SELECT * FROM dbo.Product
WAITFOR DELAY '00:00:13'
SELECT * FROM dbo.Product
COMMIT
and the second query is
INSERT INTO [dbo].[Product]([COL_KEY],[COL_VALUE]) VALUES ('CanonEOS',519)
You see that the second query has to wait until the first query finishes. You can also try it with a DELETE operation.
“Serializable” blocks all operations of other transactions.
I hope that this post helps you understand more about how isolation levels work in MS SQL Server and find a consistent solution for your application when you have to handle many users accessing and modifying the same data resources.
Note: the italic paragraphs (mostly definitions) were quoted from Wikipedia or the Microsoft website.