
Alexander Kuznetsov

  • Learning PostgreSql: bulk loading data

    In this post we shall start loading data in bulk.

    For better performance of inserts, we shall load data into a table without constraints and indexes. This sounds familiar. There is a bulk copy utility, and it is very easy to invoke from C#. The following code feeds the output from a T-SQL stored procedure into a PostgreSql table:

    using (var pgTableTarget = new PgTableTarget(PgConnString, "Data.MyPgTable", GetColumns()))
    using (var conn = new SqlConnection(connectionString))
    {
        conn.Open();
        using (var command = conn.CreateCommand())
        {
            command.CommandText = "EXEC MyStoredProc";
            command.CommandType = CommandType.Text;
            command.CommandTimeout = 0;
            using (var dr = command.ExecuteReader())
            {
                var columnTypes = SetFields(dr.GetSchemaTable());
                var adapter = new SqlDataReaderToPgTableAdapter(pgTableTarget);
                while (dr.Read())
                {
                    for (var columnIndex = 0; columnIndex < columnTypes.Count; columnIndex++)
                    {
                        adapter.AddValue(dataReader: dr, columnIndex: columnIndex, columnType: columnTypes[columnIndex]);
                    }
                    pgTableTarget.EndRow();
                }
            }
        }
    }

    (snip)

    private const int TYPE = 24;

    internal static List<string> SetFields(DataTable schema)
    {
        return (from DataRow dataRow in schema.Rows select dataRow[TYPE].ToString().ToLower()).ToList();
    }
     

    This code uses a couple of helper classes we developed ourselves. Here is a simple wrapper around the COPY command:

    using System;
    using System.Collections.Generic;
    using System.Data;
    using Npgsql;

    namespace Drw.Qr.FinDb.PostgresDataLoad
    {
        public class PgTableTarget : IDisposable
        {
            private readonly NpgsqlConnection _conn;
            private readonly NpgsqlCommand _command;
            private readonly NpgsqlCopySerializer _serializer;
            private readonly NpgsqlCopyIn _copyIn;

            public PgTableTarget(string connString, string tableName, IEnumerable<string> columns)
            {
                _conn = new NpgsqlConnection(connString);
                _conn.Open();
                _command = _conn.CreateCommand();
                var copyStr = string.Format("COPY {0}({1}) FROM STDIN", tableName, string.Join(",", columns));
                _command.CommandText = copyStr;
                _command.CommandType = CommandType.Text;
                _serializer = new NpgsqlCopySerializer(_conn);
                _copyIn = new NpgsqlCopyIn(_command, _conn, _serializer.ToStream);
                _copyIn.Start();
            }

            public void AddString(string value)
            {
                _serializer.AddString(value);
            }

            public void AddNull()
            {
                _serializer.AddNull();
            }

            public void AddInt32(int value)
            {
                _serializer.AddInt32(value);
            }

            public void AddNumber(double value)
            {
                _serializer.AddNumber(value);
            }

            public void EndRow()
            {
                _serializer.EndRow();
                _serializer.Flush();
            }

            public void Dispose()
            {
                _copyIn.End();
                _serializer.Flush();
                _serializer.Close();
                _command.Dispose();
                _conn.Dispose();
            }
        }
    }
      

    The following code is a straightforward adapter:


    using System;
    using System.Data.SqlClient;

    namespace Drw.Qr.FinDb.PostgresDataLoad
    {
        public class SqlDataReaderToPgTableAdapter
        {
            private readonly PgTableTarget _pgTableTarget;

            public SqlDataReaderToPgTableAdapter(PgTableTarget pgTableTarget)
            {
                _pgTableTarget = pgTableTarget;
            }

            public void AddValue(SqlDataReader dataReader, int columnIndex, string columnType)
            {
                if (dataReader.IsDBNull(columnIndex))
                {
                    _pgTableTarget.AddNull();
                    return;
                }
                switch (columnType)
                {
                    case "varchar":
                    case "char":
                        _pgTableTarget.AddString(dataReader.GetString(columnIndex));
                        break;
                    case "decimal":
                        _pgTableTarget.AddNumber((double)dataReader.GetDecimal(columnIndex));
                        break;
                    case "int":
                        _pgTableTarget.AddInt32(dataReader.GetInt32(columnIndex));
                        break;
                    case "smallint":
                        _pgTableTarget.AddInt32(dataReader.GetInt16(columnIndex));
                        break;
                    case "real":
                        _pgTableTarget.AddNumber(dataReader.GetFloat(columnIndex));
                        break;
                    case "float":
                        _pgTableTarget.AddNumber(dataReader.GetDouble(columnIndex));
                        break;
                    default:
                        throw new ArgumentException("Not supported type: " + columnType);
                }
            }
        }
    }
      

    Although neither class supports all the available types, they do support all the types we need for this small project.

    As I was typing this post, the code had already moved several million rows.

  • Learning PostgreSql: Functions and refcursors

    In this post we shall create a function that returns data, and invoke it from our C# client. There are no stored procedures in PostgreSql, only functions. This is different from T-SQL, but consistent with many other languages, such as C#.

    Creating a function

    Functions can return many different types. Learning all the available options might take some time.

    However, for the project we are working on, we need to replicate several T-SQL stored procedures which take a column list as a parameter and use dynamic SQL to return the requested columns. So, we need the ability to return a result set from PostgreSql without having to specify its exact structure. Because refcursors fit the bill, we shall learn how to use them, and that should be enough for now.

    The following function returns a result set, although it does not use dynamic SQL yet:

    CREATE OR REPLACE FUNCTION public.SelectTest (pTicker VARCHAR)
    RETURNS refcursor AS
    $body$
    DECLARE
      ref1 refcursor;
    BEGIN

    OPEN ref1 FOR SELECT * FROM Prices WHERE Ticker = pTicker;
    RETURN ref1;

    END;
    $body$
    LANGUAGE 'plpgsql'

    This function explicitly returns a result set. It is not enough to just issue a SELECT command. Again, this is different from T-SQL, but consistent with many object-oriented languages.
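
    To see why the explicit OPEN ... FOR and RETURN are needed, consider a hedged counter-example (not from the original post; the function name SelectTestBroken is made up for illustration). In plpgsql, a bare SELECT in the body is rejected because its result has no destination:

    -- Assumed counter-example: executing this function raises
    -- "query has no destination for result data", because a plain SELECT
    -- inside plpgsql must go INTO a variable, be PERFORMed, or be exposed
    -- through a cursor that is RETURNed, as in SelectTest above.
    CREATE OR REPLACE FUNCTION public.SelectTestBroken (pTicker VARCHAR)
    RETURNS refcursor AS
    $body$
    BEGIN
      SELECT * FROM Prices WHERE Ticker = pTicker;
      RETURN NULL;
    END;
    $body$
    LANGUAGE 'plpgsql'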

    Calling the function from C#

    Let us invoke our function from C#:

    [Test, Explicit]
    public void ReadFromProc2()
    {
        using (var conn = new Npgsql.NpgsqlConnection(ConnectionString: ConnString))
        {
            conn.Open();
            using (var tran = conn.BeginTransaction())
            using (var command = conn.CreateCommand())
            {
                command.CommandText = "SelectTest";
                command.CommandType = CommandType.StoredProcedure;
                command.Parameters.Add(new NpgsqlParameter());
                command.Parameters[0].NpgsqlDbType = NpgsqlDbType.Varchar;
                command.Parameters[0].Value = "AAPL";
                using (var dr = command.ExecuteReader())
                {
                    var rc = 0;
                    while (dr.Read())
                    {
                        Console.WriteLine((rc++) + " " + dr.GetString(0) + " " + dr.GetDateTime(1) + " " + dr.GetDecimal(2));
                    }
                }
            }
        }
    }

    Note: this code snippet does not handle null values. This is done to keep the example short.

    Note: it is essential to keep the transaction open for the whole time we are consuming the data. This is documented here: http://npgsql.projects.pgfoundry.org/docs/manual/UserManual.html

    Using dynamic SQL

    We need to make sure we can call functions that use dynamic SQL from C# - that is part of the functionality we are going to replicate. The following function uses a dynamic column list:

    CREATE OR REPLACE FUNCTION public.SelectTestWithColumnList (Ticker VARCHAR, ColumnList VARCHAR)
    RETURNS refcursor AS
    $body$
    DECLARE
      ref1 refcursor;
    BEGIN

    OPEN ref1 FOR EXECUTE
      'select ' || ColumnList || ' from Prices WHERE Ticker=$1;'
      USING Ticker;
    RETURN ref1;

    END;
    $body$
    LANGUAGE 'plpgsql'

    Of course, this is all completely different from T-SQL.

    The following code calls this function from C#:

    [Test, Explicit]
    public void ReadFromProcWithColumnList()
    {
        using (var conn = new Npgsql.NpgsqlConnection(ConnectionString: ConnString))
        {
            conn.Open();
            using (var tran = conn.BeginTransaction())
            using (var command = conn.CreateCommand())
            {
                command.CommandText = "SelectTestWithColumnList";
                command.CommandType = CommandType.StoredProcedure;
                command.Parameters.Add(new NpgsqlParameter());
                command.Parameters[0].NpgsqlDbType = NpgsqlDbType.Varchar;
                command.Parameters[0].Value = "AAPL";
                command.Parameters.Add(new NpgsqlParameter());
                command.Parameters[1].NpgsqlDbType = NpgsqlDbType.Varchar;
                command.Parameters[1].Value = "AsOfDate,Price";
                using (var dr = command.ExecuteReader())
                {
                    var rc = 0;
                    while (dr.Read())
                    {
                        Console.WriteLine((rc++) + " " + dr.GetDateTime(0) + " " + dr.GetDecimal(1));
                    }
                }
            }
        }
    }

    This code works as expected.

    Next steps

    There is much more to be learned about functions. We shall get back to them later.

    However, first we want to move the data over to our new database. We need some meaningful data to practice with. While the data is being migrated, we shall continue our learning.

  • Learning PostgreSql: Reading and Writing From .Net

    In this post we shall do some setup tasks, save a few rows of data from a .Net client to PostgreSql, and read it back.

    Setting up

    We have set up a virtual machine running Red Hat Linux, installed PostgreSql 9.3 on it, and made sure there is enough disk space. 9.3 is a very recent version, released this September. Because PostgreSql is not known for releasing before testing is complete, we did not have to wait for the next service pack or anything like that.

    Smoke test

    On the client machine running Windows 7, we installed SQL Manager Lite, which is a lightweight GUI for simple tasks. We had no problems installing it and connecting to the server. Our first query was cut and pasted from the following article: "Postgres SQL Injection Cheat Sheet".

    We ran this:

    SELECT version()

    followed by this

    SELECT 1;

    Creating a table

    The following script was issued from SQL Manager Lite:

    CREATE TABLE prices (
      ticker VARCHAR(20),
      asofdate TIMESTAMP WITHOUT TIME ZONE,
      price NUMERIC(6,2)
    )

    Note: the default schema in PostgreSql is named public, not dbo.
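
    For illustration, a small sketch using the prices table created above: since public is on the default search_path, the table can be referenced with or without the schema prefix.

    -- Both statements address the same table: unqualified names are
    -- resolved through the search_path, which includes public by default.
    SELECT COUNT(*) FROM prices;
    SELECT COUNT(*) FROM public.prices;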

    Connecting from .Net

    We have chosen to use a free open source .Net provider named Npgsql, documented here: http://npgsql.projects.pgfoundry.org/docs/manual/UserManual.html. We have reviewed the source code (it was developed in C#) and built the binaries for our project.

    The following code wrote a few rows into our table:

    private const string ConnString = "Server=myserver;Port=5432;Database=AkTest;User Id=myuser;Password=mypassword";

    [Test, Explicit]
    public void WriteTest()
    {
        using (var conn = new Npgsql.NpgsqlConnection(ConnectionString: ConnString))
        {
            conn.Open();
            using (var command = conn.CreateCommand())
            {
                command.CommandText =
                    "INSERT INTO Prices(Ticker, AsOfDate, Price) VALUES('GOOG', '2013-10-23', 1001.00), ('AAPL', '2013-10-23', 517.49)";
                command.CommandType = CommandType.Text;
                command.ExecuteNonQuery();
            }
        }
    }

    We were able to select these rows from SQL Manager Lite. The following code read these rows from .Net:

    [Test, Explicit]
    public void ReadTest()
    {
        using (var conn = new Npgsql.NpgsqlConnection(ConnectionString: ConnString))
        {
            conn.Open();
            using (var command = conn.CreateCommand())
            {
                command.CommandText =
                    "SELECT Ticker, AsOfDate, Price FROM Prices ORDER BY Ticker, AsOfDate;";
                command.CommandType = CommandType.Text;
                using (var dr = command.ExecuteReader())
                {
                    while (dr.Read())
                    {
                        Console.WriteLine(string.Format("Ticker: {0}, AsOfDate: {1}, Price: {2}", dr.GetString(0), dr.GetTimeStamp(1), dr.GetDecimal(2)));
                    }
                }
            }
        }
    }

    In the next post we shall create a stored procedure which reads data, and invoke it from our C# client.

  • Learning PostgreSql: First Steps

    In this series of blog posts we shall migrate some functionality from SQL Server to PostgreSql 9.2. The emphasis of these blog posts will be on what PostgreSql does differently from Sql Server - I assume that the reader has considerable knowledge of Sql Server, but might know nothing of PostgreSql. Also we shall concentrate on development, not administration. In a true agile fashion, we shall learn only what we need to get this particular job done, and nothing else, but we shall strive to learn it very well.

    The reason for migration

    Prior to SQL Server 2012, in many cases it used to be cheaper to add more memory to solve SQL Server performance problems than to spend too much time tuning queries. That was especially true for agile development: if we retire or modify a query, all the effort invested in its tuning may be lost, but the investment in hardware is not lost - the hardware used to speed up obsolete queries stays with us.

    So I was wondering if it may be cheaper to port at least some functionality to PostgreSql rather than upgrade the whole system to Enterprise Edition of SQL Server 2012.

    I have a system which is a good candidate to try this idea out: it reads from just a few summary tables, using about a dozen stored procedures. These summary tables use up a considerable amount of storage, and they are not used by any other applications.

    So, we are going to migrate a few tables, rewrite just a few stored procedures, and redirect a considerable amount of workload off our SQL Server database to the new PostgreSql one.

    This is a very well-defined little project. We are not going to encounter the most difficult problem - discovering the requirements. All we need to do is replicate a working system on a new platform. As such, we shall be able to concentrate on easier problems, such as learning new technologies and ensuring stable and acceptable performance.

    In the next post we shall create a table, write to it from a client written in C#, and read the data back into a C# client.

  • Inline UDFs do not prevent parallel execution plans

    Lots of things can prevent parallel execution plans, but inline UDFs are not one of them. Inline UDFs are essentially macros - they are at least as fast as CTEs. As such, the optimizer can produce exactly the same parallel execution plans whether our queries are wrapped in inline UDFs or copied and pasted into the main query.

    As usual, I will provide a repro script.

    Prerequisites

    Let us set up a table with 2 million rows - that should be enough to qualify for a parallel plan:


    CREATE TABLE dbo.Numbers(n INT NOT NULL PRIMARY KEY);
    GO

    INSERT INTO dbo.Numbers
            ( n )
    VALUES  ( 1 );
    GO
    DECLARE @i INT;
    SET @i = 0;
    WHILE @i < 21
        BEGIN
            INSERT INTO dbo.Numbers
                    ( n )
            SELECT  n + POWER(2, @i)
            FROM    dbo.Numbers;
            SET @i = @i + 1;
        END;
    GO

    SELECT  n ,
            'asdfasdfas' AS Filler
    INTO    dbo.ParallelTest
    FROM    dbo.Numbers ;
    GO

    ALTER TABLE dbo.ParallelTest ADD CONSTRAINT PK_ParallelTest PRIMARY KEY(n) ;
    GO
     

    Running a query with parallel execution plan

    The following query runs with a parallel execution plan (number of executions is 12):

    SET STATISTICS IO ON;
    SET STATISTICS TIME ON;
    GO
    SELECT COUNT(*) FROM dbo.ParallelTest
    WHERE Filler LIKE '%nosuchstring%'

    Let us wrap it in an inline UDF:

    CREATE FUNCTION dbo.MatchCountInParallelTestByPattern ( @Pattern VARCHAR(10) )
    RETURNS TABLE
    AS RETURN
      ( SELECT  COUNT(*) AS MatchCount
        FROM    dbo.ParallelTest
        WHERE   Filler LIKE @Pattern
      ) ;

    The following query invokes the inline UDF, yet still runs with exactly the same parallel execution plan (number of executions is the same: 12):


    SELECT  MatchCount
    FROM    dbo.MatchCountInParallelTestByPattern('%nosuchstring%') ;
     

    Conclusion

    At the time of this writing, reusing code wrapped in inline UDFs does not prevent parallel execution plans, at least on the following version:

    Microsoft SQL Server 2008 R2 (RTM) - 10.50.1600.1 (X64)   Apr  2 2010 15:48:46   Copyright (c) Microsoft Corporation  Developer Edition (64-bit) on Windows NT 6.1 <X64> (Build 7600: ) 

     

  • Using constraints to enforce uniqueness of ordered sets of rows

    Even the simplest data integrity rules are not easy to implement in the database if instead of individual rows we need to deal with groups or subsets.

    For example, making sure that a column value is unique in a table is as trivial as creating a unique constraint or index. However, what are we supposed to do if we do not want to store a group of values twice? What if we store cooking recipes as sequences of steps (sets of values), and we do not want to store one and the same recipe (one set of values) more than once?
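
    For contrast, the trivial single-column case looks like this (a generic sketch; the table and column names are placeholders, not part of the recipe schema used below):

    -- Uniqueness of individual values is a one-line constraint:
    ALTER TABLE dbo.SomeTable
    ADD CONSTRAINT UQ_SomeTable_SomeColumn UNIQUE ( SomeColumn ) ;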

    In the following sections we shall solve this problem using only constraints and indexed views, just for fun, as a SQL puzzle. (Of course, we could use a trigger or some other approach involving subqueries - but that would be a different puzzle.)

    At the time of this writing I have not yet used the technique I am going to describe in production. I think it is quite complex and looks tricky, and the reason is simple: in my opinion RDBMS have not been designed to deal with these kinds of problems. As such, we have to come up with complex workarounds.

    Enforcing Uniqueness of Ordered Sets, of Limited Size Only

    In this solution we shall deal with sets that have at most five elements.

    In some cases, the order of elements in a subset matters. For example, the following two sequences of instructions yield different results.

    Cooking a hard-boiled egg:

     

    • Put eggs into pan, add water
    • Bring water to boiling, boil 4 minutes;
    • Remove eggs and cool them down in cold water

     

    Keeping the egg uncooked:

     

    • Put eggs into pan, add water
    • Remove eggs and cool them down in cold water
    • Bring water to boiling, boil 4 minutes

     

    These two recipes are clearly different, and they yield different results. Of course, the second recipe makes no sense, but we want the database to store them both.

    Suppose, however, that the database should not be able to store one and the same recipe more than once.

    Setting up tables and test data

    Before implementing this requirement, let us create the tables and add test data. Both recipes discussed above involve the same three steps:


    CREATE TABLE dbo.RecipeSteps
        (
          RecipeStepId INT NOT NULL ,
          Intsructions VARCHAR(50) NOT NULL ,
          CONSTRAINT PK_RecipeSteps PRIMARY KEY ( RecipeStepId )
        ) ;
    GO
    INSERT  INTO dbo.RecipeSteps
            ( RecipeStepId, Intsructions )
    VALUES  ( 1, 'Put eggs into pan, add water' ),
            ( 2, 'Bring water to boiling, boil 4 minutes' ),
            ( 3, 'Remove eggs and cool them down in cold water' ) ;

     

    Let us upload the first recipe, as follows:

     

    CREATE TABLE dbo.RecipesStepByStep
        (
          RecipeId INT NOT NULL ,
          StepNumber SMALLINT NOT NULL ,
          RecipeStepId INT NOT NULL ,
          CONSTRAINT PK_RecipesStepByStep PRIMARY KEY ( RecipeId, StepNumber ) ,
          CONSTRAINT FK_RecipesStepByStep_RecipeSteps FOREIGN KEY ( RecipeStepId ) REFERENCES dbo.RecipeSteps ( RecipeStepId )
        ) ;
    GO

    INSERT  INTO dbo.RecipesStepByStep
            ( RecipeId, StepNumber, RecipeStepId )
    VALUES  ( 1, 1, 1 ),
            ( 1, 2, 2 ),
            ( 1, 3, 3 ) ;

    SELECT  StepNumber ,
            Intsructions
    FROM    dbo.RecipeSteps AS rs
            JOIN dbo.RecipesStepByStep AS sbs ON rs.RecipeStepId = sbs.RecipeStepId
    WHERE   sbs.RecipeId = 1 ;

    StepNumber Intsructions
    ---------- --------------------------------------------------
    1          Put eggs into pan, add water
    2          Bring water to boiling, boil 4 minutes
    3          Remove eggs and cool them down in cold water

     

    Let us also add the second recipe:



    INSERT  INTO dbo.RecipesStepByStep
            ( RecipeId, StepNumber, RecipeStepId )
    VALUES  ( 2, 1, 1 ),
            ( 2, 2, 3 ),
            ( 2, 3, 2 ) ;

    SELECT  StepNumber ,
            Intsructions
    FROM    dbo.RecipeSteps AS rs
            JOIN dbo.RecipesStepByStep AS sbs ON rs.RecipeStepId = sbs.RecipeStepId
    WHERE   sbs.RecipeId = 2 ;

    StepNumber Intsructions
    ---------- --------------------------------------------------
    1          Put eggs into pan, add water
    2          Remove eggs and cool them down in cold water
    3          Bring water to boiling, boil 4 minutes

     

    Right now nothing prevents us from adding a second copy of the first recipe. You can run the following script and see for yourself that it succeeds:

     


    BEGIN TRANSACTION ;

    INSERT  INTO dbo.RecipesStepByStep
            ( RecipeId, StepNumber, RecipeStepId )
    VALUES  ( 3, 1, 1 ),
            ( 3, 2, 2 ),
            ( 3, 3, 3 ) ;

    SELECT  StepNumber ,
            Intsructions
    FROM    dbo.RecipeSteps AS rs
            JOIN dbo.RecipesStepByStep AS sbs ON rs.RecipeStepId = sbs.RecipeStepId
    WHERE   sbs.RecipeId = 3 ;

    ROLLBACK ;

    To enforce uniqueness of recipes, we shall use an indexed view and a unique index on top of it.

    To keep the scripts simple, let us assume that recipes cannot consist of more than five steps:

    ALTER TABLE dbo.RecipesStepByStep 
    ADD CONSTRAINT RecipesStepByStepMax5 CHECK (StepNumber BETWEEN 1 AND 5) ;

    Let us also create an indexed view that stores all the steps of a recipe in one row.

    Once we have all the steps in one row, we can use a unique index to make sure that a sequence of steps is unique. We want the contents of the indexed view to look as follows, with the first line representing the first recipe:

     


    RecipeId    Step1       Step2       Step3       Step4       Step5
    ----------- ----------- ----------- ----------- ----------- -----------
    1           1           2           3           0           0
    2           1           3           2           0           0

     

    Note: To get around the limitations of indexed views, we need to make sure that all the pivoted columns are not nullable. This is why columns Step4 and Step5 are zeroes instead of nulls – otherwise we would not be able to create the unique index RecipesStepByStepsPivoted_UniqueRecipes in the following script:

     


    CREATE VIEW dbo.RecipesStepByStepsPivoted
    WITH SCHEMABINDING
    AS
    SELECT  RecipeId ,
            COUNT_BIG(*) AS Cnt ,
            SUM(CASE WHEN StepNumber = 1 THEN RecipeStepId ELSE 0 END) AS Step1 ,
            SUM(CASE WHEN StepNumber = 2 THEN RecipeStepId ELSE 0 END) AS Step2 ,
            SUM(CASE WHEN StepNumber = 3 THEN RecipeStepId ELSE 0 END) AS Step3 ,
            SUM(CASE WHEN StepNumber = 4 THEN RecipeStepId ELSE 0 END) AS Step4 ,
            SUM(CASE WHEN StepNumber = 5 THEN RecipeStepId ELSE 0 END) AS Step5
    FROM    dbo.RecipesStepByStep
    GROUP BY RecipeId ;
    GO

    CREATE UNIQUE CLUSTERED INDEX RecipesStepByStepsPivoted_CI
    ON dbo.RecipesStepByStepsPivoted(RecipeId) ;
    GO

    CREATE UNIQUE INDEX RecipesStepByStepsPivoted_UniqueRecipes
    ON dbo.RecipesStepByStepsPivoted(Step1, Step2, Step3, Step4, Step5) ;
    GO

    SELECT  RecipeId ,
            Step1 ,
            Step2 ,
            Step3 ,
            Step4 ,
            Step5
    FROM    dbo.RecipesStepByStepsPivoted ;


    RecipeId    Step1       Step2       Step3       Step4       Step5
    ----------- ----------- ----------- ----------- ----------- -----------
    1           1           2           3           0           0
    2           1           3           2           0           0

    It is easy to verify that the unique index built on top of the indexed view does enforce the uniqueness of recipes:


    BEGIN TRANSACTION ;

    INSERT  INTO dbo.RecipesStepByStep
            ( RecipeId, StepNumber, RecipeStepId )
    VALUES  ( 3, 1, 1 ),
            ( 3, 2, 2 ),
            ( 3, 3, 3 ) ;

    SELECT  StepNumber ,
            Intsructions
    FROM    dbo.RecipeSteps AS rs
            JOIN dbo.RecipesStepByStep AS sbs ON rs.RecipeStepId = sbs.RecipeStepId
    WHERE   sbs.RecipeId = 3 ;

    ROLLBACK ;

    Cannot insert duplicate key row in object 'dbo.RecipesStepByStepsPivoted' with unique index 'RecipesStepByStepsPivoted_UniqueRecipes'.
    The statement has been terminated.


    RecipeId    Step1       Step2       Step3       Step4       Step5
    ----------- ----------- ----------- ----------- ----------- -----------
    1           1           2           3           0           0
    2           1           3           2           0           0

     

    Note: because we are using an index to enforce uniqueness, we must set a limit on the number of elements. It does not have to be as low as five – we set it that low just to keep the examples short.

    Let us close a couple of loopholes in this limited solution and move on to another problem.

    First, we used zeroes in the indexed view to indicate that there is no such step in the recipe. In our example both recipes had three steps, so the indexed view had zeroes in the Step4 and Step5 columns of both rows. To distinguish between missing steps and valid step IDs, we need to make sure that zero is not a valid ID:

     

    ALTER TABLE dbo.RecipeSteps
    ADD CONSTRAINT CHK_RecipeSteps_PositiveId CHECK ( RecipeStepId > 0 );

    Also we need to make sure that step numbers in the recipe have no gaps, otherwise we can still store the same recipe twice. The following script demonstrates the loophole:

     

    BEGIN TRANSACTION ;

    INSERT  INTO dbo.RecipesStepByStep
            ( RecipeId, StepNumber, RecipeStepId )
    VALUES  ( 3, 1, 1 ),
            ( 3, 3, 2 ),
            ( 3, 5, 3 ) ;

    SELECT  StepNumber ,
            Intsructions
    FROM    dbo.RecipeSteps AS rs
            JOIN dbo.RecipesStepByStep AS sbs ON rs.RecipeStepId = sbs.RecipeStepId
    WHERE   sbs.RecipeId = 3 ;

    ROLLBACK ;

    StepNumber Intsructions
    ---------- --------------------------------------------------
    1          Put eggs into pan, add water
    3          Bring water to boiling, boil 4 minutes
    5          Remove eggs and cool them down in cold water
     

     

    To close the loophole, we need to make sure that step numbers have no gaps. The following script enforces the rule:

     

    ALTER TABLE dbo.RecipesStepByStep
    ADD PreviousStepNumber AS CAST(CASE WHEN StepNumber > 1 THEN StepNumber-1 END AS SMALLINT) PERSISTED ;
    GO

    ALTER TABLE dbo.RecipesStepByStep
    ADD CONSTRAINT FK_RecipesStepByStep_PreviousStep
    FOREIGN KEY(RecipeId, PreviousStepNumber) 
    REFERENCES dbo.RecipesStepByStep(RecipeId, StepNumber);

     

    Conclusion

    As we have seen, enforcing the uniqueness of ordered groups of rows with constraints is doable, although somewhat complex. I am not sure if I would use this approach with a large production table. I think that an RDBMS is not designed to natively solve these kinds of problems - this is why solving this apparently simple problem is so complicated.

    If you are using some platform other than an RDBMS to solve such problems, please comment on this post and share your experiences.

     

     

     

  • The client code that handles timeouts

    After timeouts we need to make sure that active transactions, if any, are rolled back. All timeout handling must be done on the client. This post provides the implementation and unit tests.

    Implementation

    The following static class provides extension methods for SqlCommand that roll back active transactions after timeouts:


        
    public static class SqlCommandExtentions
    {
        public static void ExecuteNonQueryWithErrorHandling(this SqlCommand command)
        {
            try
            {
                command.ExecuteNonQuery();
            }
            catch (SqlException e)
            {
                HandleSqlException(command, e);
                throw;
            }
        }

        public static SqlDataReader ExecuteReaderWithErrorHandling(this SqlCommand command)
        {
            try
            {
                return command.ExecuteReader();
            }
            catch (SqlException e)
            {
                HandleSqlException(command, e);
                throw;
            }
        }

        private static void HandleSqlException(SqlCommand command, SqlException e)
        {
            const int timeoutCode = -2;
            if (e.Number != timeoutCode) return;
            var rollbackCommand = command.Connection.CreateCommand();
            rollbackCommand.CommandText = "IF @@TRANCOUNT > 0 BEGIN ; ROLLBACK ; END ;";
            rollbackCommand.CommandType = CommandType.Text;
            rollbackCommand.CommandTimeout = 0;
            rollbackCommand.ExecuteNonQueryWithErrorHandling();
        }
    }
      

    Testing

    We need to unit test the two methods. For each one, we need to test the following four cases:

    • Command succeeds 
    • Command blows up with some other error, which must not be caught by our handling
    • Command times out, active transaction
    • Command times out, no active transaction

    The following tests cover these four cases for ExecuteReader. The other four tests are very similar, so there is no need to post them.

     
    [Test]
    public void ExecuteReaderWithErrorHandling_WorksOnSuccess()
    {
        using (var dr = ExecuteReaderWithErrorHandling(sqlConn, "SELECT 1 AS n"))
        {
            Assert.IsTrue(dr.Read());
            Assert.AreEqual(1, dr.GetInt32(0));
        }
    }

    [Test]
    public void ExecuteReaderWithErrorHandling_HandlesTimeout_ActiveTransaction()
    {
        try
        {
            ExecuteReaderWithErrorHandling(sqlConn, "EXEC dbo.IWillTimeOut", timeoutInSeconds: 1);
            Assert.Fail("Must throw exception");
        }
        catch (SqlException e)
        {
            Assert.AreEqual(-2, e.Number);
            VerifyNoOpenTransaction(sqlConn);
        }
    }

    [Test]
    public void ExecuteReaderWithErrorHandling_HandlesTimeout_NoActiveTransaction()
    {
        try
        {
            ExecuteReaderWithErrorHandling(sqlConn, "EXEC dbo.IWillTimeOutWithoutTransaction", timeoutInSeconds: 1);
            Assert.Fail("Must throw exception");
        }
        catch (SqlException e)
        {
            Assert.AreEqual(-2, e.Number);
            VerifyNoOpenTransaction(sqlConn);
        }
    }

    [Test]
    public void ExecuteReaderWithErrorHandling_ThrowsOtherExceptions()
    {
        try
        {
            ExecuteReaderWithErrorHandling(sqlConn, "EXEC dbo.IThrowError8134");
            Assert.Fail("Must throw exception");
        }
        catch (SqlException e)
        {
            Assert.AreEqual(8134, e.Number);
            VerifyNoOpenTransaction(sqlConn);
        }
    }

    These unit tests utilize a few database objects:

     
    CREATE FUNCTION dbo.IWillCompleteInNSeconds ( @NumSeconds INT )
    RETURNS INT
    AS
        BEGIN ;
          DECLARE @StartTime DATETIME ,
                  @Toggle INT ;
          SELECT  @StartTime = GETDATE() ,
                  @Toggle = 1 ;
          WHILE DATEDIFF(SECOND, @StartTime, GETDATE()) < @NumSeconds
            BEGIN ;
              SET @Toggle = 1 - @Toggle ;
            END ;
          RETURN 0 ;
        END ;
    GO

    CREATE PROCEDURE dbo.IWillTimeOut
    AS
      BEGIN ;
        BEGIN TRANSACTION ;
        SELECT  dbo.IWillCompleteInNSeconds( 5 ) AS SomeNumber ;
        COMMIT ;
      END ;
    GO

    CREATE PROCEDURE dbo.IWillTimeOutWithoutTransaction
    AS
      BEGIN ;
        WAITFOR DELAY '00:00:05' ;
      END ;
    GO

    CREATE PROCEDURE dbo.IThrowError8134
    AS
    BEGIN ;
        SET XACT_ABORT ON ;
        DECLARE @i INT ;
        SELECT @i = 1/0 ;
        SELECT 1 AS n ;
    END ;
     

  • After the timeout

    After the timeout, we may sometimes, but not always, end up with an outstanding transaction. If this is the case, we need to roll back the transaction ourselves. Otherwise subsequent commands that use the same connection might not work as expected.
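
    For reference, the rollback itself is a one-liner that the client can issue over the same connection (the same statement used by the extension methods in the timeout-handling post above):

    IF @@TRANCOUNT > 0 BEGIN ; ROLLBACK ; END ;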

    Let us debug through some client code and see what is happening in more detail.

    Prerequisites

    We shall need a few database objects, as follows:

    CREATE TABLE dbo.Log1(InTran VARCHAR(20));
    GO
    CREATE PROCEDURE dbo.TestTran
    AS
    BEGIN
       IF @@TRANCOUNT > 0 BEGIN
           INSERT dbo.Log1(InTran) VALUES('In Tran');
       END
       ELSE BEGIN
           INSERT dbo.Log1(InTran) VALUES('Not In Tran');
       END ;
    END ;
    GO

    CREATE FUNCTION dbo.IWillCompleteInNSeconds ( @NumSeconds INT )
    RETURNS INT
    AS
        BEGIN ;
          DECLARE @StartTime DATETIME ,
                  @Toggle INT ;
          SELECT  @StartTime = GETDATE() ,
                  @Toggle = 1 ;
          WHILE DATEDIFF(SECOND, @StartTime, GETDATE()) < @NumSeconds
            BEGIN ;
              SET @Toggle = 1 - @Toggle ;
            END ;
          RETURN 0 ;
        END ;
    GO

    CREATE PROCEDURE dbo.IWillTimeOut
    AS
      BEGIN ;
        BEGIN TRANSACTION ;
        SELECT  dbo.IWillCompleteInNSeconds( 5 ) AS SomeNumber ;
        COMMIT ;
      END ;
    GO
     

    Running the example

    In the following test, add three breakpoints at the placeholder statements:

    [Test]
    public void TimeoutTest()
    {
        for (var i = 0; i < 1000; i++)
        {
            using (var conn = new SqlConnection(localConnString))
            {
                conn.Open();
                try
                {
                    const string iwilltimeout = "EXEC dbo.IWillTimeOut";
                    ExecuteSql(conn, iwilltimeout, 1);
                }
                catch (SqlException e)
                {
                    System.Console.WriteLine(e.Message + " " + e.Number);
                }
                const string saveTrancount = "EXEC dbo.TestTran";
                ExecuteSql(conn, saveTrancount);
                System.Console.WriteLine("Breakpoint placeholder 1");
            }
            using (var conn = new SqlConnection(localConnString))
            {
                conn.Open();
                const string saveTrancount = "EXEC dbo.TestTran";
                System.Console.WriteLine("Breakpoint placeholder 2");
                ExecuteSql(conn, saveTrancount);
                System.Console.WriteLine("Breakpoint placeholder 3");
            }
        }
    }
      

    Let us start a profiling session that captures individual SP:StmtStarting and SP:StmtCompleted events. Also let us debug through this code.

    First breakpoint - outstanding transaction

    When we stop at the first breakpoint, let us run the following query:

    SELECT * FROM dbo.Log1 WITH(NOLOCK)

    It will return "In Tran", which means that after the timeout dbo.TestTran was called in the middle of an outstanding transaction.

    Note: not all timeouts result in outstanding transactions. For example, later you can rerun this example using a different stored procedure:


    ALTER PROCEDURE dbo.IWillTimeOut
    AS
      BEGIN ;
        BEGIN TRANSACTION ;
        WAITFOR DELAY '00:00:05' ;
        COMMIT ;
      END ;
    GO

    You can see for yourself that when this procedure times out, there will be no outstanding transaction.

    Second breakpoint - still outstanding transaction

    Let us debug to the second breakpoint. Technically the first connection went out of scope and has been reused by the second using block, but we can still see the same uncommitted data:

    SELECT * FROM dbo.Log1 WITH(NOLOCK)

    The connection has not been reset yet - we do not see sp_reset_connection in the Profiler yet.

    Third breakpoint - connection reset

    Only when we actually run a command against the reused connection is the connection reset. We can observe the sp_reset_connection call in the Profiler, and the uncommitted data inserted by the first invocation of dbo.TestTran is gone.

    Note: to my best knowledge, there is no guarantee that the same connection will be reused. It consistently happens on my box, but it could work differently on yours. As usual with closed source systems, one can never know for sure.

    Consequences of continuing to use the same connection after the timeout

    As we have seen, if we catch a timeout and later modify the database using the same connection, all the modifications done after the timeout may be eventually rolled back and lost, without us getting any error messages.

    Conclusion

    We need to roll back the outstanding transaction ourselves. We must also be aware that not all timeouts result in outstanding transactions. I have written two extension methods that implement this, and I will post them soon.

    You may want to read more about connection pools here.

  • Book Review: Pro SQL Server 2008 Relational Database Design and Implementation

    Investing in proper database design is a very efficient way to cut maintenance costs. If we expect a system to last, we need to make sure it has a good solid foundation - high quality database design.

    Surely we can and sometimes do cut corners and save on database design to get things done faster. Unfortunately, such cutting corners frequently comes back and bites us: we may end up spending a lot of time solving issues caused by poor design. 

    So, solid understanding of relational database design is absolutely essential for database developers, including Agile folks like myself. Let me put it stronger: we Agile developers especially need solid understanding of relational database design, because we are quite likely to cut corners and spend just enough effort to pass unit tests.

    This is why I think the book "Pro SQL Server 2008 Relational Database Design and Implementation" by Louis Davidson, Kevin Kline, and others is a must-read.

    The following is a chapter-by-chapter synopsis.

    Introduction to Database Concepts

    This chapter introduces several crucial theoretical concepts, and each concept is demonstrated with a short crisp example. The chapter must be read before proceeding any further.

    The Language of Data Modeling

    This chapter describes various aspects of traditional waterfall methodology. As such, it can safely be skipped by Agile developers.

    Conceptual Data Modeling

    This chapter deals with business rules and requirements, and provides very good practical examples. As in the previous chapter, all the examples are described in the context of waterfall methodology. However, this should not turn us off. Agile developers need to identify business rules and requirements, and implement them via database design, as much as anyone else.

    This chapter is a must read for everyone, even if some of us do not use Erwin or Visio.

    The Normalization Process

    This chapter builds on the concepts introduced in the previous one. As such, we shall get better results if we read about Conceptual Data Modeling first.

    As before, all theoretical concepts are illustrated with succinct practical examples. 

    In my opinion, this chapter is the most important one in the book - it gives a great description of a complex technique that is not always described well. If it were the last chapter in the book, the book would still be a highly useful one.

    Of course, there are other books describing normalization. This one really stands out because it provides clear in-depth explanations with good concrete practical examples.

    The rest of the book

    The next chapters of the book describe the various steps needed to implement a complete RDBMS solution, including developing T-SQL, securing access, indexing, and so on. All these chapters are useful and practical as well.

    Although these chapters are very useful and a pleasure to read, I am not going into more detail here. The reason: many other books describe similar things as well. In my opinion, it is the first chapters that really make this book unique.

    Conclusion

    We may be using very different methodologies to develop systems built on top of an RDBMS. However, professionally developed systems that are meant to last should eventually have proper database design.

    The book provides a practical description of what proper database design is. As such, it will be really useful for all developers who want to develop robust systems using an RDBMS.

    Enjoy!

     

  • Awesome videos on Agile

    The following videos are a must-see for anyone interested in Agile methodology:

     

    Decisions, Decisions

     

    Embracing Uncertainty

     

    Patterns of Effective Delivery

    Enjoy!

     

  • Yet another use of OUTER APPLY in defensive programming

    When a SELECT is used to populate variables from a subquery, it fails to change them if the subquery returns nothing - and that can lead to subtle bugs. We shall use OUTER APPLY to eliminate this problem.

    Prerequisites

    All we need is the following mock function that imitates a subquery:

    CREATE FUNCTION dbo.BoxById ( @BoxId INT )
    RETURNS TABLE
    AS RETURN
      ( SELECT  CAST(1 AS INT) AS [Length] ,
                CAST(2 AS INT) AS [Width] ,
                CAST(3 AS INT) AS [Height]
        WHERE   @BoxId = 1
      ) ;

    Let us assume that this subquery is slow and we do not want to run it more than absolutely necessary.

    Short recap on SET vs. SELECT

    The following approach uses SET to populate variables:

    DECLARE @Length INT ,
            @Width INT ,
            @BoxId INT ;

    SELECT  @Length = 0 ,
            @Width = 0 ;

    SET @BoxId = 1 ;

    SET @Length = ( SELECT  [Length]
                    FROM    dbo.BoxById(@BoxId)
                  ) ;
    SET @Width = ( SELECT  [Width]
                   FROM    dbo.BoxById(@BoxId)
                 ) ;

    This script will always change the values of variables, even when the subquery returns nothing, which is good. However, it runs a possibly slow/expensive subquery twice. Also it duplicates code. We want to avoid both these disadvantages.

    The following script uses SELECT, it has no duplication, and it runs the subquery only once:

    DECLARE @Length INT ,
            @Width INT ,
            @BoxId INT ;

    SELECT  @Length = 0 ,
            @Width = 0 ;

    SET @BoxId = 1 ;

    SELECT  @Length = [Length] ,
            @Width = [Width]
    FROM    dbo.BoxById(@BoxId) ;

    SELECT  @Length AS [Length] ,
            @Width AS [Width] ;
     

    This code looks cleaner and performs better, but it has the following problem: if the subquery returns nothing, the variables are not changed. You can change @BoxId from 1 to any other number, rerun the script, and see for yourself.

    When the variables are not changed by an assignment, that can lead to subtle bugs, like this one.

    To avoid this vulnerability, we can use OUTER APPLY, as follows:

    DECLARE @Length INT ,
            @Width INT ,
            @BoxId INT ;

    SELECT  @Length = 0 ,
            @Width = 0 ;

    SET @BoxId = 2 ;

    SELECT  @Length = [Length] ,
            @Width = [Width]
    FROM    (SELECT 1 AS PlaceHolder) AS AlwaysOneRow
            OUTER APPLY dbo.BoxById(@BoxId) ;

    SELECT  @Length AS [Length] ,
            @Width AS [Width] ;

    As a result, we still have no duplication, run an expensive subquery only once, and also always change the variables, making the code less vulnerable to bugs.
  • Catching multiple exceptions on the client is robust and easy

    Maria Zakourdaev has just demonstrated that if our T-SQL throws multiple exceptions, ERROR_MESSAGE() in a TRY...CATCH block will only expose one.

    When we handle errors in C#, we have very easy access to all errors.

    The following procedure throws two exceptions:

    CREATE PROCEDURE dbo.ThrowsTwoExceptions
    AS
      BEGIN ;
        RAISERROR('Error 1', 16, 1) ;
        RAISERROR('Error 2', 16, 1) ;
      END ;
    GO

    EXEC dbo.ThrowsTwoExceptions ;

    Both exceptions are shown by SSMS:

    Msg 50000, Level 16, State 1, Procedure ThrowsTwoExceptions, Line 4
    Error 1
    Msg 50000, Level 16, State 1, Procedure ThrowsTwoExceptions, Line 5
    Error 2

    The following C# code shows how easily we can access both exceptions:

    try
    {
        var command = connection.CreateCommand();
        command.CommandText = "dbo.ThrowsTwoExceptions";
        command.CommandType = CommandType.StoredProcedure;
        command.ExecuteNonQuery();
    }
    catch (SqlException e)
    {
        Console.WriteLine(e.ToString());
        foreach (var error in e.Errors)
        {
            var exception = error as SqlError;
            if (exception != null)
                Console.WriteLine(string.Format("Number: {0} Message: {1}", exception.Number, exception.Message));
        }
    }

    Its output is as follows:

    Number: 50000 Message: Error 1
    Number: 50000 Message: Error 2

    As we have seen, catching more than one exception is easy if we use C#.

  • Avoiding nested transactions might not improve performance.

    Beginning a transaction only when @@TRANCOUNT = 0 might not improve performance at all. At least, I did not notice any difference whatsoever, whether I use this pattern:

    BEGIN TRAN ;
    -- (snip)
    COMMIT ;

    or a more complex one:

    DECLARE @trancount INT ;
    SET @trancount = @@TRANCOUNT ;
    IF @trancount = 0 BEGIN ;
      BEGIN TRAN ;
    END ;
    --(snip)
    IF @trancount = 0 BEGIN ;
      COMMIT ;
    END ;
     

    the performance stays the same: I was not able to notice any difference. Here are my benchmarks.

    Prerequisites

    All we need is the following two tables:

    CREATE TABLE dbo.Toggle1
        (
          id INT NOT NULL PRIMARY KEY ,
          i INT NOT NULL
        ) ;
    GO
    INSERT  INTO dbo.Toggle1
            ( id, i )
    VALUES  ( 1, 0 ) ;
    GO
    CREATE TABLE dbo.Toggle2
        (
          id INT NOT NULL PRIMARY KEY ,
          i INT NOT NULL
        ) ;
    GO
    INSERT  INTO dbo.Toggle2
            ( id, i )
    VALUES  ( 1, 0 ) ;
     

    Simple benchmarking

    First, let us run a very simple case, without nested stored procedure calls:

    CREATE PROCEDURE dbo.ToggleWithNestedTrans
      @NumIterations INT
    AS
        BEGIN ;
            SET NOCOUNT ON ;
            SET XACT_ABORT ON ;
            DECLARE @i INT ;
            SET @i = 0 ;
            WHILE @i < @NumIterations
                BEGIN ;
                    BEGIN TRAN ;
                    UPDATE  dbo.Toggle1
                    SET     i = 1 - i
                    WHERE   id = 1 ;
                    BEGIN TRAN ;
                    UPDATE  dbo.Toggle2
                    SET     i = 1 - i
                    WHERE   id = 1 ;
                    COMMIT ;
                    COMMIT ;
                    SET @i = @i + 1 ;
                END ;
        END ;
    GO

    CREATE PROCEDURE dbo.ToggleAvoidingNestedTrans
      @NumIterations INT
    AS
        BEGIN ;
            SET NOCOUNT ON ;
            SET XACT_ABORT ON ;
            DECLARE @i INT ;
            SET @i = 0 ;
            WHILE @i < @NumIterations
                BEGIN ;
                    BEGIN TRAN ;
                    UPDATE  dbo.Toggle1
                    SET     i = 1 - i
                    WHERE   id = 1 ;
                    DECLARE @trancount INT ;
                    SET @trancount = @@TRANCOUNT ;
                    IF @trancount = 0 BEGIN ;
                        BEGIN TRAN ;
                    END ;
                    UPDATE  dbo.Toggle2
                    SET     i = 1 - i
                    WHERE   id = 1 ;
                    IF @trancount = 0 BEGIN ;
                        COMMIT ;
                    END ;
                    COMMIT ;
                    SET @i = @i + 1 ;
                END ;
        END ;
    GO

    CREATE PROCEDURE dbo.ToggleNoNestedTrans
      @NumIterations INT
    AS
        BEGIN ;
            SET NOCOUNT ON ;
            SET XACT_ABORT ON ;
            DECLARE @i INT ;
            SET @i = 0 ;
            WHILE @i < @NumIterations
                BEGIN ;
                    BEGIN TRAN ;
                    UPDATE  dbo.Toggle1
                    SET     i = 1 - i
                    WHERE   id = 1 ;
                    UPDATE  dbo.Toggle2
                    SET     i = 1 - i
                    WHERE   id = 1 ;
                    COMMIT ;
                    SET @i = @i + 1 ;
                END ;
        END ;
    GO

    EXEC dbo.ToggleWithNestedTrans @NumIterations = 100000;
    GO
    EXEC dbo.ToggleAvoidingNestedTrans @NumIterations = 100000;
    GO

    EXEC dbo.ToggleWithNestedTrans @NumIterations = 100000;
    GO
    EXEC dbo.ToggleAvoidingNestedTrans @NumIterations = 100000;
    GO

    EXEC dbo.ToggleWithNestedTrans @NumIterations = 100000;
    GO
    EXEC dbo.ToggleAvoidingNestedTrans @NumIterations = 100000;
    GO

    EXEC dbo.ToggleNoNestedTrans @NumIterations = 100000;
    GO
    EXEC dbo.ToggleNoNestedTrans @NumIterations = 100000;
    GO
    EXEC dbo.ToggleNoNestedTrans @NumIterations = 100000;
    GO
     
     

    In fact, the real execution costs are the same for both approaches, and no different from the run without nested transactions at all.

    Benchmarking with nested stored procedure calls

    Let us benchmark another, possibly more realistic scenario, where the logic that avoids nested transactions is inside another, nested stored procedure. Here is the benchmarking script:

    CREATE PROCEDURE dbo.Toggle2WithNestedTran
    AS
        BEGIN
    ;
            
    SET NOCOUNT ON ;
            
    SET XACT_ABORT ON ;
            
    BEGIN TRAN ;
            
    UPDATE  dbo.Toggle2
            
    SET     i = 1 - i
            
    WHERE   id = 1 ;
            
    COMMIT ;
        
    END ;
    GO

    ALTER PROCEDURE dbo.ToggleWithNestedTrans
      
    @NumIterations INT
    AS
        BEGIN
    ;
            
    SET NOCOUNT ON ;
            
    SET XACT_ABORT ON ;
            
    DECLARE @i INT ;
            
    SET @i = 0 ;
            
    WHILE @i < @NumIterations
                
    BEGIN ;
                    
    BEGIN TRAN ;
                    
    UPDATE  dbo.Toggle1
                    
    SET     i = 1 - i
                    
    WHERE   id = 1 ;
                    
    EXEC dbo.Toggle2WithNestedTran ;
                    
    COMMIT ;
                    
    SET @i = @i + 1 ;
                
    END ;
        
    END ;
    GO

    CREATE PROCEDURE dbo.Toggle2AvoidingNestedTrans
    AS
    BEGIN ;
        SET NOCOUNT ON ;
        SET XACT_ABORT ON ;

        DECLARE @trancount INT ;
        SET @trancount = @@TRANCOUNT ;

        IF @trancount = 0
        BEGIN ;
            BEGIN TRAN ;
        END ;

        UPDATE  dbo.Toggle2
        SET     i = 1 - i
        WHERE   id = 1 ;

        IF @trancount = 0
        BEGIN ;
            COMMIT ;
        END ;
    END ;
    GO

    ALTER PROCEDURE dbo.ToggleAvoidingNestedTrans
        @NumIterations INT
    AS
    BEGIN ;
        SET NOCOUNT ON ;
        SET XACT_ABORT ON ;

        DECLARE @i INT ;
        SET @i = 0 ;

        WHILE @i < @NumIterations
        BEGIN ;
            BEGIN TRAN ;

            UPDATE  dbo.Toggle1
            SET     i = 1 - i
            WHERE   id = 1 ;

            EXEC dbo.Toggle2AvoidingNestedTrans ;

            COMMIT ;
            SET @i = @i + 1 ;
        END ;
    END ;
    GO

    Again, I was not able to notice any difference.

    Conclusion

    Apparently the following logic does not affect performance at all:

    DECLARE @trancount INT ;
    SET @trancount = @@TRANCOUNT ;

    IF @trancount = 0
    BEGIN ;
        BEGIN TRAN ;
    END ;

    --(snip)

    IF @trancount = 0
    BEGIN ;
        COMMIT ;
    END ;

    As such, this pattern seems to be just an unnecessary complication.
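
    To see why the conditional BEGIN TRAN buys nothing, recall how SQL Server treats nested transactions: an inner BEGIN TRAN merely increments @@TRANCOUNT, and its matching COMMIT merely decrements it; only the outermost COMMIT actually commits the work. The following illustration (my addition, not part of the benchmark) shows the counter values:

    BEGIN TRAN ;                               -- outer transaction
    SELECT  @@TRANCOUNT AS AfterOuterBegin ;   -- 1
    BEGIN TRAN ;                               -- "nested" transaction: only increments the counter
    SELECT  @@TRANCOUNT AS AfterInnerBegin ;   -- 2
    COMMIT ;                                   -- only decrements the counter; nothing is committed yet
    SELECT  @@TRANCOUNT AS AfterInnerCommit ;  -- 1
    COMMIT ;                                   -- the outermost COMMIT is what actually commits
    SELECT  @@TRANCOUNT AS AfterOuterCommit ;  -- 0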

    What do you think - are you getting the same results when you benchmark on your system?

    Are you using this pattern for reasons other than performance? If so, what are they?

  • Using XACT_ABORT ON may be faster than using TRY...CATCH

    To ensure atomicity of transactions, we can use XACT_ABORT ON or wrap the transaction in a TRY block and roll back in a CATCH block. In some cases, the XACT_ABORT ON approach uses noticeably less CPU. I am posting repro scripts. Please run them, tweak them, and post your findings.
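
    As a quick reminder of what SET XACT_ABORT ON buys us: when a run-time error occurs, the whole transaction is rolled back and the batch is aborted, so no explicit CATCH/ROLLBACK is needed for atomicity. Here is a minimal sketch (my illustration; the divide by zero is just a convenient way to force an error):

    SET XACT_ABORT ON ;
    BEGIN TRAN ;

    UPDATE  dbo.Toggle1
    SET     i = 1 - i
    WHERE   id = 1 ;

    -- This statement fails at run time; with XACT_ABORT ON the batch is aborted
    -- and the open transaction is rolled back, so the UPDATE above does not persist.
    UPDATE  dbo.Toggle2
    SET     i = 1 / 0
    WHERE   id = 1 ;

    COMMIT ;   -- never reached when the error fires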

    Environment

    I've run my scripts on SQL Server 2008 R2 Developer Edition. Snapshot isolation is enabled; READ_COMMITTED_SNAPSHOT is not enabled.
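
    If you want to reproduce these settings, something like the following should do it (MyTestDb is a placeholder for whatever test database you are using):

    -- MyTestDb is a placeholder; substitute your own test database name.
    ALTER DATABASE MyTestDb SET ALLOW_SNAPSHOT_ISOLATION ON ;
    ALTER DATABASE MyTestDb SET READ_COMMITTED_SNAPSHOT OFF ;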

    Test data

     We are using the same test data as in my previous post:

     CREATE TABLE dbo.Toggle1
        (
          id INT NOT NULL
             PRIMARY KEY ,
          i INT NOT NULL
        ) ;
    GO
    INSERT  INTO dbo.Toggle1
            ( id, i )
    VALUES  ( 1, 0 ) ;
    GO
    CREATE TABLE dbo.Toggle2
        (
          id INT NOT NULL
             PRIMARY KEY ,
          i INT NOT NULL
        ) ;
    GO
    INSERT  INTO dbo.Toggle2
            ( id, i )
    VALUES  ( 1, 0 ) ;

    Benchmarking

    The following two stored procedures run the same modifications 10K times. The first one uses XACT_ABORT ON to ensure atomicity of the transaction:

    CREATE PROCEDURE dbo.Toggle10kTimesWithXactAbortOn
    AS
    BEGIN ;
        SET NOCOUNT ON ;
        SET XACT_ABORT ON ;

        DECLARE @i INT ;
        SET @i = 0 ;

        WHILE @i < 10000
        BEGIN ;
            BEGIN TRAN ;

            UPDATE  dbo.Toggle1
            SET     i = 1 - i
            WHERE   id = 1 ;

            UPDATE  dbo.Toggle2
            SET     i = 1 - i
            WHERE   id = 1 ;

            COMMIT ;
            SET @i = @i + 1 ;
        END ;
    END ;
    The second procedure uses a ROLLBACK in a CATCH block:

     CREATE PROCEDURE dbo.Toggle10kTimesWithTryCatch
    AS
    BEGIN ;
        SET NOCOUNT ON ;

        DECLARE @i INT ;
        SET @i = 0 ;

        WHILE @i < 10000
        BEGIN ;
            BEGIN TRY ;
                BEGIN TRAN ;

                UPDATE  dbo.Toggle1
                SET     i = 1 - i
                WHERE   id = 1 ;

                UPDATE  dbo.Toggle2
                SET     i = 1 - i
                WHERE   id = 1 ;

                COMMIT ;
            END TRY
            BEGIN CATCH ;
                ROLLBACK ;
            END CATCH ;

            SET @i = @i + 1 ;
        END ;
    END ;

    Benchmarking

    Let us run these two procedures several times:

    EXEC dbo.Toggle10kTimesWithXactAbortOn ;
    GO
    EXEC dbo.Toggle10kTimesWithTryCatch ;
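
    If you would rather capture comparable CPU numbers in T-SQL instead of a trace, one possible approach (my addition, not necessarily how the figures below were obtained) is to diff sys.dm_exec_requests.cpu_time, which reports the CPU in milliseconds consumed so far by the current batch, around each call:

    DECLARE @cpu_before INT ;

    SELECT  @cpu_before = cpu_time
    FROM    sys.dm_exec_requests
    WHERE   session_id = @@SPID ;

    EXEC dbo.Toggle10kTimesWithXactAbortOn ;

    -- CPU consumed by the procedure call, in milliseconds
    SELECT  cpu_time - @cpu_before AS cpu_ms
    FROM    sys.dm_exec_requests
    WHERE   session_id = @@SPID ;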

    Typical results are as follows:

    dbo.Toggle10kTimesWithXactAbortOn

    CPU: 265-280

    dbo.Toggle10kTimesWithTryCatch

    CPU: 406-468

     

    Conclusion

    As we have seen, using XACT_ABORT ON can sometimes consume less CPU than using TRY...CATCH.

    I am not making any blanket statements here - you are encouraged to run your own benchmarks and see whether you observe this effect in your situation. If you do, please do us all a favor and post your findings.


  • Wrapping related changes in a transaction may use less CPU.

    Wrapping related changes in a transaction is a good way to ensure data integrity. Besides, in some cases it just runs noticeably faster, using less CPU. As usual, I am posting repro scripts, which you can run, tweak, and see for yourself.

    Environment

    I've run my scripts on SQL Server 2008 R2 Developer Edition. Snapshot isolation is enabled; READ_COMMITTED_SNAPSHOT is not enabled.

    Test data

     After applying Occam's razor, all we need is two tables with one row in each:

     CREATE TABLE dbo.Toggle1
        (
          id INT NOT NULL
             PRIMARY KEY ,
          i INT NOT NULL
        ) ;
    GO
    INSERT  INTO dbo.Toggle1
            ( id, i )
    VALUES  ( 1, 0 ) ;
    GO
    CREATE TABLE dbo.Toggle2
        (
          id INT NOT NULL
             PRIMARY KEY ,
          i INT NOT NULL
        ) ;
    GO
    INSERT  INTO dbo.Toggle2
            ( id, i )
    VALUES  ( 1, 0 ) ;
     

    Benchmarking

    The following two stored procedures run the same modifications 50K times. The first one does not use an explicit transaction:

     CREATE PROCEDURE dbo.Toggle50kTimesX3WithoutTransaction
    AS
    BEGIN ;
        SET NOCOUNT ON ;
        SET XACT_ABORT OFF ;

        DECLARE @i INT ;
        SET @i = 0 ;

        WHILE @i < 50000
        BEGIN ;
            UPDATE  dbo.Toggle1
            SET     i = 1 - i
            WHERE   id = 1 ;

            UPDATE  dbo.Toggle2
            SET     i = 1 - i
            WHERE   id = 1 ;

            SET @i = @i + 1 ;
        END ;
    END ;

    The second procedure does use an explicit transaction:

    CREATE PROCEDURE dbo.Toggle50kTimesX3WithXactAbortOff
    AS
    BEGIN ;
        SET NOCOUNT ON ;
        SET XACT_ABORT OFF ;

        DECLARE @i INT ;
        SET @i = 0 ;

        WHILE @i < 50000
        BEGIN ;
            BEGIN TRAN ;

            UPDATE  dbo.Toggle1
            SET     i = 1 - i
            WHERE   id = 1 ;

            UPDATE  dbo.Toggle2
            SET     i = 1 - i
            WHERE   id = 1 ;

            COMMIT ;
            SET @i = @i + 1 ;
        END ;
    END ;

    Let's run the following script a few times:

    EXEC dbo.Toggle50kTimesX3WithoutTransaction ;
    GO
    EXEC dbo.Toggle50kTimesX3WithXactAbortOff ;
     

    Typical CPU and duration are as follows:

    dbo.Toggle50kTimesX3WithoutTransaction

    CPU: 1900-2340, Duration: 5900

    dbo.Toggle50kTimesX3WithXactAbortOff

    CPU: 1500-1700, Duration: 3200-3300
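
    A plausible explanation - my assumption, not something these scripts measure - is that without an explicit transaction each UPDATE commits as its own autocommit transaction, so the first procedure pays for twice as many commits and log flushes per iteration. One rough way to check is to snapshot the cumulative 'Transactions/sec' and 'Log Flushes/sec' counters for the database before and after each run and compare the deltas:

    -- Despite the "/sec" in the names, the raw cntr_value is a cumulative count;
    -- take one snapshot, run a procedure, take another snapshot, and compare.
    SELECT  counter_name ,
            cntr_value
    FROM    sys.dm_os_performance_counters
    WHERE   object_name LIKE '%:Databases%'
            AND instance_name = DB_NAME()
            AND counter_name IN ( 'Transactions/sec', 'Log Flushes/sec' ) ;
    GO
    EXEC dbo.Toggle50kTimesX3WithoutTransaction ;
    GO
    SELECT  counter_name ,
            cntr_value
    FROM    sys.dm_os_performance_counters
    WHERE   object_name LIKE '%:Databases%'
            AND instance_name = DB_NAME()
            AND counter_name IN ( 'Transactions/sec', 'Log Flushes/sec' ) ;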

    Conclusion

    As we have seen, wrapping related changes in a transaction may improve performance. By related, I mean changes that should all succeed or all fail when we are ensuring atomicity.

    However, we typically need to keep transactions short, so we should not bundle unrelated changes in transactions - that may hurt concurrency.

    Also, I am being very careful and not making any blanket statements here. I did not research this effect with due diligence. I noticed the effect in one case and came up with a repro that works in one particular case on one server; that is all.

    I would really appreciate it if you ran the repro, tweaked it as you see fit, and posted your results here. TIA!

    In general, we should not assume that larger transactions always mean better performance - the opposite is true in many cases. We should always benchmark the specific case at hand and decide on a case-by-case basis.

    Edit: I have simplified the scripts a little bit, and that changed the numbers a little bit.
