THE SQL Server Blog Spot on the Web

Welcome to SQLblog.com - The SQL Server blog spot on the web Sign in | |
in Search

Alexander Kuznetsov

Learning PostgreSql: READ COMMITTED and Data Integrity

As we have just discussed, READ COMMITTED isolation level behaves very much like Sql Server's READ_COMMITTED_SNAPSHOT. As such, we need to be very careful with data integrity - lots of code that just works on Sql Server under its default isolation level, READ COMMITTED, does not work on PostgreSql under its default isolation level, which is also READ COMMITTED, but behaves differently.

Note: whenever we use READ_COMMITTED_SNAPSHOT on Sql Server, we need to be just as careful with data integrity  - otherwise we can introduce lots of subtle bugs. Hugo Kornelis described these bugs in his blog series entitled "Snapshot isolation: a threat for integrity". Also I described the same effects in the book entitled "Defensive Database Programming". This is why we are going to discuss how to enforce data integrity when readers do not block writers.

I am well aware that there is a lot of advice to just enable READ_COMMITTED_SNAPSHOT and boost performance, without mentioning the side effects impacting data integrity. We need to be very careful with such advice: performance is important, of course, but I am usually working on systems where sacrificing data integrity to boost performance is not acceptable.

Demonstrating the problem.

Suppose that we have the following business rule: high priority tickets cannot be assigned to developers who are on vacation. Also suppose that to enforce this business rule, we are using subqueries whenever we modify data in Developers and Tickets tables. It does not matter if we add subqueries directly to our DML commands or wrap them in triggers - the problem stays the same.

Let us set up the tables and some test data:

CREATE TABLE Developers(
  
Name VARCHAR NOT NULL PRIMARY KEY,
  
Status VARCHAR NOT NULL
  );
INSERT INTO Developers(Name, Status)
VALUES('Jeff', 'Active');  
CREATE TABLE Tickets(
  
Problem VARCHAR NOT NULL PRIMARY KEY,
  
Priority VARCHAR NOT NULL,
  
AssignedTo VARCHAR NULL,
  
CONSTRAINT FK_devlopers FOREIGN KEY(AssignedTo)
  
REFERENCES Developers(Name),
  
CONSTRAINT CHK_high_priority_always_assigned 
    
CHECK((Priority='high' AND AssignedTo IS NOT NULL)
      OR 
Priority<>'high')
  );
INSERT INTO Tickets(Problem, Priority, AssignedTo)
VALUES('TPS report hangs','low','Jeff');

We are ready to reproduce the problem. In one session, let us begin a transaction, update Jeff's status to "Vacation", but not commit it yet:

BEGIN TRANSACTION;
UPDATE Developers SET Status='Vacation'
WHERE Name='Jeff'
AND NOT EXISTS(SELECT * FROM Tickets AS t 
WHERE t.AssignedTo = Developers.Name
   
AND t.Priority='high');

In another session, let us begin a transaction, increase the ticket's priority to high, but not commit it yet:

BEGIN TRANSACTION;
UPDATE 
Tickets SET Priority='high'
WHERE Problem='TPS report hangs'
AND (SELECT Status FROM Developers WHERE Name=Tickets.AssignedTo)<>'Vacation';

Apparently the update succeeds:

SELECT * FROM Tickets;

"TPS report hangs";"high";"Jeff"

The reason why the update succeeded is simple: this session does not see the uncommitted data from the other session, and it is not blocked, because writers do not block readers. This session read Jeff's status, which is still active:

SELECT * FROM Developers ;

"Jeff";"Active"

We can commit both transactions. Now we have data that violates our data integrity rules: a high priority ticket is assigned to a developer who is on vacation. Let us restore the original data:

UPDATE Developers SET Status='Active'
WHERE Name='Jeff';

UPDATE Tickets SET Priority='low'
WHERE Problem='TPS report hangs';

Let us try out a higher isolation level. Our attempts to use REPEATABLE READ and even SERIALIZABLE make no difference. However, after a short investigation, we realize that the following script works differently on PostgreSql and Sql Server:

SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;
BEGIN TRANSACTION;
On PostgreSql, the SET command does not affect the isolation level of the following transaction. 
The following section explains why.

SET TRANSACTION works differently

Unlike in SQL Server, a standalone SET TRANSACTION command has no effect whatsoever. The following script shows that:

SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;
select current_setting('transaction_isolation');

"read committed"

The reason is simple: in PostgreSql, SET TRANSACTION command affects only the current transaction. The standalone SET TRANSACTION runs in its own transaction, so when it completes, its transaction completes too, restoring the default isolation level. The correct way to set isolation level is as follows:

BEGIN TRANSACTION;
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;
select current_setting('transaction_isolation');    

"repeatable read"

Also we can use an equivalent, but more concise, method:

BEGIN ISOLATION LEVEL REPEATABLE READ;
select current_setting('transaction_isolation');    
"repeatable read"

Trying out higher isolation levels

Now that we know how to set isolation levels properly, let us rerun the same scenario under higher isolation levels.

Under REPEATABLE READ, we still get the same problem, and we can commit both transactions.

After restoring the original data, let us replay the scenario under SERIALIZABLE isolation level. As under lower isolation levels, both modifications complete - this is completely different from Sql Server, where one modification would be blocked. However, when we try to commit both transactions, only one will succeed, while another will fail with the following error message:

ERROR:  could not serialize access due to read/write dependencies among transactions

So, essentially PostreSql implementation is optimistic - it only detects conflicts at commit time.

Mimicking Sql Server's behavior

We can make sure that subqueries that enforce data integrity are blocked by writers, which mimics Sql Server's implementation of isolation levels. The following scripts shows how to do that - note the FOR UPDATE clause in the subquery. In one session, run this:

--restore original data
UPDATE Developers SET Status='Active'
WHERE Name='Jeff';

UPDATE Tickets SET Priority='low'
WHERE Problem='TPS report hangs'; 

-- enforce data integrity Sql Server style
BEGIN TRANSACTION;
UPDATE Developers SET Status='Vacation'
WHERE Name='Jeff'
AND NOT EXISTS(SELECT * FROM Tickets AS t 
WHERE t.AssignedTo = Developers.Name
   
AND t.Priority='high' FOR UPDATE); 

In another session, run this:

BEGIN TRANSACTION;
UPDATE Tickets SET Priority='high'
WHERE Problem='TPS report hangs'
AND (SELECT Status FROM Developers WHERE Name=Tickets.AssignedTo FOR UPDATE)<>'Vacation'; 

This command hangs, exactly like it does on SQL Server. Let us return to the first session and commit. The update in the second session immediately completes, but the ticket's priority is not updated:

SELECT * FROM Developers;
"Jeff""Vacation" 
SELECT * FROM Tickets;
"TPS report hangs""low""Jeff" 

As we have seen, FOR UPDATE clause allows us to replicate SQL Server's pessimistic implementation of isolation levels on PostgreSql.

Conclusion

As we have seen, PostgreSql implementation of isolation levels is substantially different. As such, we need to be very careful whenever we port DML that uses subqueries to enforce data integrity - otherwise we may end up with dirty data, just like when we turn on READ_COMMITTED_SNAPSHOT on Sql Server. There is much more about the differences in the implementation of isolation levels - we shall continue next time.

Note that we could use constraints to enforce this business rule and save ourselves a lot of trouble. I've described the solution in my free eBook "Defensive Database Programming", in the chapter entitled "Advanced Use of Constraints".

 

Published Monday, December 02, 2013 12:34 PM by Alexander Kuznetsov

Comment Notification

If you would like to receive an email when updates are made to this post, please register here

Subscribe to this post's comments using RSS

Comments

 

tobi said:

The bug could still occur in SQL Server under true READ COMMITTED if both transactions happened to complete the validation part of the query while nothing was written/U-locked yet. I think that's possible. READ COMMITTED is just not the right isolation level for most cases where you need to think carefully about concurrency. It is too weak for almost everything.

December 3, 2013 4:00 AM
 

Alexander Kuznetsov said:

Tobi,

I tried your suggestion out, but could not reproduce. Can you post a repro?

December 3, 2013 12:27 PM
 

tobi said:

If its not possible for you to repro then I was probably mistaken.

December 3, 2013 12:57 PM
 

Bob Henkel said:

If data integrity is of utmost importnace try serial level 13.2.3. Serializable Isolation Level -> http://www.postgresql.org/docs/9.1/static/transaction-iso.html

While there are downsides this is the safest and easiest for keeping the data true.  Might be nice to compare SQL Server vs PostgreSQL as far as Serializable Isolation Level is concerned.

December 10, 2013 11:28 AM
 

Alexander Kuznetsov said:

Bob,

I concur: I do plan to compare SQL Server vs PostgreSQL as far as Serializable Isolation Level is concerned - but I plan to do the same thing for REPEATABLE READ first. Data integrity and bug-free development are really important for the project, so we shall analyze some of the differences in higher isolation levels in detail.

December 10, 2013 12:45 PM
 

Chip said:

Very clear article, thank you.

This command will set and keep the isolation level for a connection to PostgreSQL.

set session characteristics as transaction isolation level read committed;

February 20, 2014 8:44 AM

Leave a Comment

(required) 
(required) 
Submit

About Alexander Kuznetsov

Alex Kuznetsov has been working with object oriented languages, mostly C# and C++, as well as with databases for more than a decade. He has worked with Sybase, SQL Server, Oracle and DB2. He regularly blogs on sqlblog.com, mostly about database unit testing, defensive programming, and query optimization. Alex has written a book entitled "Defensive Database Programming with Transact-SQL" and several articles on simple-talk.com and devx.com. Currently he works as an agile developer.

This Blog

Syndication

Powered by Community Server (Commercial Edition), by Telligent Systems
  Privacy Statement