THE SQL Server Blog Spot on the Web


Piotr Rodak

if datepart(dw, getdate()) in (6, 7)
    use pubs;



  • Hadoop growing pains

    This post is not going to be about SQL Server. I have been reading more and more recently about “Big Data”, a very catchy term that describes the untamed growth of the data that mankind produces each day and the struggle to capture the meaning of these data. Ten years ago, and perhaps even three years ago, this need was not so widely recognized. The increasing number of smartphones and the discernible trend of mainstream Internet traffic moving to smartphone-generated content mean that there is a bigger and bigger stream of information that has to be stored, transformed, analysed and perhaps monetized. The nature of this traffic makes it very difficult to wrap it into the boundaries of relational database engines. The amount of data makes it nearly impossible to process it in relational databases within a reasonable time. This is where ‘cloud’ technologies come into play.

    I just read a good article about the growing pains of Hadoop, which has become one of the leading players in the distributed processing arena within the last year or two. Toby Baer concludes in it that the lack of enterprise-ready toolsets hinders Hadoop’s adoption in the enterprise world. While this is true, something else drew my attention. According to the article, there are already about half a dozen commercially supported distributions of Hadoop. For me, someone who has not been involved in the intricacies of the open-source world, this is quite an interesting observation. On one hand, it is good that there is competition, as it ultimately benefits the customer. On the other hand, the customer is faced with the difficulty of choosing the right distribution. In the future, when Hadoop distributions fork even more, this choice will be even harder. The distributions will have overlapping sets of features, yet will be quite incompatible with each other. I suppose it will take a few years until leaders emerge and the market begins to resemble what we see in the Linux world. There are myriads of distributions, but only a few are acknowledged by the industry as enterprise standard. Others are honed by bearded individuals with too much time to spend.

    Either way, the third thing I can’t help but notice about the proliferation of Hadoop distributions is that IT professionals will have jobs.

     

  • Running SSIS packages from C#

    Most developers and DBAs know about two ways of deploying packages: you can deploy them to the database server and run them using a SQL Server Agent job, or you can deploy the packages to the file system and run them using the dtexec.exe utility. Both approaches have their pros and cons. However, I would like to show you that there is a third way (sort of) that is often overlooked, and it can give you capabilities the ‘traditional’ approaches can’t.

    I have been working for a few years with applications that run packages from host applications implemented in .NET. As you know, SSIS provides a programming model that you can use to implement more flexible solutions. SSIS applications are usually thought to be batch oriented, with a fairly rigid architecture and processing model, and fixed timeframes when the packages are executed to process data. It doesn’t have to be the case; you don’t have to limit yourself to a batch oriented architecture. I have very good experiences with service oriented architectures processing large amounts of data. These applications are more complex than what I would like to show here, but the principle stays the same: you can execute packages as a service, on an ad-hoc basis. You can also use various signals, such as HTTP calls, file drops, time schedules, Tibco messages and others, to trigger the packages. You can implement an event handler that will trigger execution of SSIS when a certain event occurs in a StreamInsight stream.

    This post is just a small example of how you can use the API and other features to create a service that can run SSIS packages on demand.

    I thought it might be a good idea to implement a RESTful service that would listen to requests and execute appropriate actions. As it turns out, it is trivial in C#. The application is implemented as a console application for ease of debugging and running. In reality, you might want to implement the application as a Windows service. To begin, you have to add a reference to System.ServiceModel.Web and then add a few lines of code:

    1. Uri baseAddress = new Uri("http://localhost:8011/");
    2.  
    3.             WebServiceHost svcHost = new WebServiceHost(typeof(PackRunner), baseAddress);
    4.              
    5.             try
    6.             {
    7.                 svcHost.Open();
    8.  
    9.                 Console.WriteLine("Service is running");
    10.                 Console.WriteLine("Press enter to stop the service.");
    11.                 Console.ReadLine();
    12.  
    13.                 svcHost.Close();
    14.             }
    15.             catch (CommunicationException cex)
    16.             {
    17.                 Console.WriteLine("An exception occurred: {0}", cex.Message);
    18.                 svcHost.Abort();
    19.             }

    The interesting lines are 3, 7 and 13. In line 3 you create a WebServiceHost object. In line 7 you start listening on the defined URL and then in line 13 you shut down the service.

    As you may have noticed, the WebServiceHost constructor accepts the type of an object (here: PackRunner) that will be instantiated as a singleton and subsequently used to process the requests. This is the class where you put your logic, but to tell WebServiceHost how to use it, the class must implement an interface which declares the methods to be used by the host. The interface itself must be decorated with the ServiceContract attribute.

    1. [ServiceContract]
    2.     public interface IPackRunner
    3.     {
    4.         [OperationContract]
    5.         [WebGet(UriTemplate = "runpack?package={name}")]
    6.         string RunPackage1(string name);
    7.  
    8.         [OperationContract]
    9.         [WebGet(UriTemplate = "runpackwithparams?package={name}&rows={rows}")]
    10.         string RunPackage2(string name, int rows);
    11.     }

    Each method that is going to be used by WebServiceHost has to have the OperationContract attribute, as well as a WebGet or WebInvoke attribute. A detailed discussion of the available options is outside the scope of this post. I also recommend using more descriptive names for your methods :).

    Then, you have to provide the implementation of the interface:

    1. public class PackRunner : IPackRunner
    2.     {
    3.         ...

    There are two methods defined in this class. Since the full code is attached to the post, I will show only the more interesting one, RunPackage2.

     

    1. /// <summary>
    2. /// Runs package and sets some of its variables.
    3. /// </summary>
    4. /// <param name="name">Name of the package</param>
    5. /// <param name="rows">Number of rows to export</param>
    6. /// <returns></returns>
    7. public string RunPackage2(string name, int rows)
    8. {
    9.     try
    10.     {
    11.         string pkgLocation = ConfigurationManager.AppSettings["PackagePath"];
    12.  
    13.         pkgLocation = Path.Combine(pkgLocation, name.Replace("\"", ""));
    14.  
    15.         Console.WriteLine();
    16.         Console.WriteLine("Calling package {0} with parameter {1}.", name, rows);
    17.         
    18.         Application app = new Application();
    19.         Package pkg = app.LoadPackage(pkgLocation, null);
    20.  
    21.         pkg.Variables["User::ExportRows"].Value = rows;
    22.         DTSExecResult pkgResults = pkg.Execute();
    23.         Console.WriteLine();
    24.         Console.WriteLine(pkgResults.ToString());
    25.         if (pkgResults == DTSExecResult.Failure)
    26.         {
    27.             Console.WriteLine();
    28.             Console.WriteLine("Errors occured during execution of the package:");
    29.             foreach (DtsError er in pkg.Errors)
    30.                 Console.WriteLine("{0}: {1}", er.ErrorCode, er.Description);
    31.             Console.WriteLine();
    32.             return "Errors occured during execution. Contact your support.";
    33.         }
    34.         
    35.         Console.WriteLine();
    36.         Console.WriteLine();
    37.         return "OK";
    38.     }
    39.     catch (Exception ex)
    40.     {
    41.         Console.WriteLine(ex);
    42.         return ex.ToString();
    43.     }
    44. }

     

    The method accepts the package name and the number of rows to export. The packages are deployed to the file system. The path to the packages is configured in the application configuration file. This way, you can implement multiple services on the same machine, provided you also configure the URL for each instance appropriately.

    To run a package, you have to reference the Microsoft.SqlServer.Dts.Runtime namespace. This namespace is implemented in Microsoft.SQLServer.ManagedDTS.dll, which in my case was installed in the folder “C:\Program Files (x86)\Microsoft SQL Server\100\SDK\Assemblies”. Once you have done that, you can create an instance of Microsoft.SqlServer.Dts.Runtime.Application, as in line 18 of the above snippet. It may be a good idea to create the Application object in the constructor of the PackRunner class, to avoid the necessity of recreating it each time the service is invoked. Then, in line 19 you see that an instance of Microsoft.SqlServer.Dts.Runtime.Package is created. The LoadPackage method in its simplest form just takes the package file name as the first parameter.

    Before you run the package, you can set its variables to certain values. This is a great way of configuring your packages without all the hassle of dtsConfig files. In the above code sample, the variable “User::ExportRows” is set to the value of the method’s “rows” parameter.

    Eventually, you execute the package. The method doesn’t throw exceptions; you have to test the result of the execution yourself. If the execution wasn’t successful, you can examine the collection of errors exposed by the package. These are the familiar errors you often see during development and debugging of the package. If you run the package from code, you have the opportunity to persist them or log them using your favourite logging framework.

    The package itself is very simple; it connects to my AdventureWorks database and saves the number of rows specified in the variable “User::ExportRows” to a file.

    You should know that before you run the package, you can also change its connection strings, logging, events and much more.

    I attach a solution with the test service, as well as a project with two test packages.

    To test the service, you have to run it and wait for the message saying that the host is started.

    Then, just type (or copy and paste) the below command to your browser.

    http://localhost:8011/runpackwithparams?package=%22ExportEmployees.dtsx%22&rows=12

    When everything works fine, and you have modified the package to point to your AdventureWorks database, you should see "OK" wrapped in XML:

    I stopped the database service to simulate an invalid connection string situation. The output of the request is different now:

    And the service console window shows more information:

    As you see, implementing a service oriented ETL framework is not a very difficult task. You have the ability to configure the packages before you run them, and you can implement logging that is consistent with the rest of your system. In the applications I have worked with, we also have resource monitoring and execution control. We don’t allow more than a certain number of packages to run simultaneously. This ensures we don’t strain the server and we use memory and CPUs efficiently.

    The attached zip file contains two projects. One is the package runner. It has to be executed with administrative privileges, as it registers an HTTP namespace. The other project contains two simple packages.

    This is really a cool thing, you should check it out!

  • Server level permissions for developers–why you should read books

    It has been quite difficult for me recently to find some time to write a new post, so I don’t seem to be leading the rankings of the most frequent bloggers. Recently I also rarely have the opportunity to lay my hands on code, so natural sources of ‘inspiration’ come less often. Hopefully this will change in 2012 and I will have more opportunities to write something useful.

    In a big enterprise the roles of server users are usually more clearly separated than in smaller companies. You have DBAs, developers and business users. Each of these roles generally has different goals and different rights when it comes to connecting to the database servers. Large organizations tend to implement a service oriented approach to processes, which means that various teams are responsible for executing various tasks ordered by other teams. In the case of developers and DBAs, the latter are responsible for all maintenance and management tasks on servers which are ‘owned’ by developers. Such an organisation of work often results in sealing developers off from maintenance related activities, such as creating databases or snapshots, running traces, viewing server state and troubleshooting blocks and deadlocks. Unfortunately, while this approach imposes more order on server utilization and prevents accidental failures, it tends to slow down the development process, sometimes preventing a significant number of developers from doing any work for anything from a few minutes to a few hours. This is a waste of time and money of course. In my case, the development team wanted to have the right to kill user sessions on development servers. You know, sometimes there is a process that takes a lot of time and locks, or there is an open transaction keeping schema locks all over the place and the owner of it is gone for the day without committing it.

    The problem is that to be able to kill a session, you have to be a sysadmin, a processadmin, or have the ALTER ANY CONNECTION right granted to you at the server level. Obviously, in certain environments these are not the permissions DBAs want to hand out to developers. If you GRANT ALTER ANY CONNECTION to a login, this login can kill some important sessions, not only the runaway ones. It may break auditing, scheduled maintenance, backups, other teams’ ETL processes; just imagine your mailbox, it ain’t gonna look pretty. It would be much better to be able to grant a login the right to kill only spids belonging to certain users or working on certain databases. How can you have your cake and eat it too?

    I remembered that I had read about this idea in one of the books, namely Expert SQL Server 2005 Development by Adam Machanic, Lara Rubbelke and Hugo Kornelis. This is an excellent book; you should really read it if you haven’t done so yet. There is a chapter in the book discussing the security features introduced in SQL Server 2005. These features allow architects to design more versatile and complex applications without jeopardizing security requirements. As you know, the security model in SQL Server 2005 was completely revamped; for example, the notion of the schema was separated from ownership and the database user. There are more features, and one of the most important is probably the ability to sign stored procedures with certificates and to create logins and users from certificates. The discussion of security architecture and certificates is huge and goes beyond the scope of this post. One of the starting points you can check out is the MSDN article "Securing SQL Server”.

    In short, if you were to grant server wide elevated rights to certain groups of users, how would you do it? Here’s a little shopping list:

    1. Create a certificate
    2. Create a login from the certificate
    3. Grant ALTER ANY CONNECTION to the login
    4. If you want to have the procedure in a database other than master, you must copy the certificate to this database. You do it with the backup-restore technique (a sketch follows this list).
    5. Create a user for the login
    6. Create stored procedure that will issue KILL command
    7. Sign the procedure with the certificate
    8. Grant EXECUTE on the procedure to developers.
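
    For step 4, here is a minimal sketch of the backup-restore technique. The file paths, the password protecting the .pvk file and the target database name (YourDatabase) are hypothetical; the certificate password is the one used in the creation script below.

    ---in master: back up the certificate and its private key to files
    use master
    go
    backup certificate kill_session_certificate
        to file = 'C:\temp\kill_session_certificate.cer'
        with private key
        (
            file = 'C:\temp\kill_session_certificate.pvk',
            encryption by password = 'AnotherStrongPassword2011!',
            decryption by password = '1410SomeReallyStrongPassword2011!'
        )
    go
    ---in the target database: recreate the certificate from the files
    use YourDatabase
    go
    create certificate kill_session_certificate
        from file = 'C:\temp\kill_session_certificate.cer'
        with private key
        (
            file = 'C:\temp\kill_session_certificate.pvk',
            decryption by password = 'AnotherStrongPassword2011!',
            encryption by password = '1410SomeReallyStrongPassword2011!'
        )
    go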

    Here are some highlights of the solution. I attach the testing code at the end of the post as usual.

    Sample code for creating certificate and the login is as follows.

    1. ---create certificate that the procedure will be signed with
    2. create certificate kill_session_certificate
    3.     encryption by password = '1410SomeReallyStrongPassword2011!'
    4.     with subject = 'Enable KILL through procedure'
    5. go
    6. ---create login that will be granted right to kill sessions
    7. create login kill_login from certificate kill_session_certificate
    8. go
    9.  
    10. --licence to kill
    11. grant ALTER ANY CONNECTION to kill_login

    This is the stored procedure developers can call to kill sessions.

    1. create procedure killsession(@sessionid int)
    2. as
    3. begin
    4.     /* Piotr Rodak: procedure calls KILL command. Must be signed by certificate to work.*/
    5.     print 'killsession: executing as ' + suser_name()
    6.     declare @command varchar(300)
    7.     set @command = 'kill ' + convert(varchar, @sessionid)
    8.     print @command
    9.     ---you can filter only sessions running on certain databases here..
    10.     
    11.     exec(@command)
    12.     
    13.     ---you can add logging here..
    14.     print 'killed session ' + convert(varchar, @sessionid)
    15. end

    You can create the procedure in any database, but if you choose a database different than master you have to copy the certificate from master to the target database.

    And here’s code to sign the procedure:

    1. ---sign the procedure with the certificate
    2. add signature to killsession by certificate kill_session_certificate
    3. with password ='1410SomeReallyStrongPassword2011!'

    Since you are executing the command wrapped in a stored procedure, you have the potential to implement additional auditing, like grabbing the execution plan or some other performance indicators of the server at the moment the command is issued. This can be extremely helpful for troubleshooting. I think this is pretty cool.
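
    The killsession procedure above has a placeholder comment where you can restrict which sessions are allowed to be killed. Below is a minimal sketch of one possible filter based on the login name; the login pattern is hypothetical, and reading sys.dm_exec_sessions for other sessions requires that kill_login is also granted VIEW SERVER STATE.

    ---inside killsession, before exec(@command):
    if not exists
    (
        select 1
        from sys.dm_exec_sessions s
        where s.session_id = @sessionid
          and s.is_user_process = 1
          and s.login_name like N'DEVDOMAIN\%'   ---hypothetical developer login pattern
    )
    begin
        raiserror('Session %d does not belong to a developer login.', 16, 1, @sessionid)
        return
    end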

    Here’s the testing script.

    Enjoy.

  • OUTPUT clause and windowing functions

    A few days ago I was asked how to remove duplicates from a table. As usual in such cases, the duplicate values were related to only several columns, not all of them. So the classical approach of copying distinct data into a temp table, truncating the table with duplicates and then copying the data back wouldn’t work. In such cases I find one of the windowing functions, row_number(), to be the easiest to use. You just define the criteria for numbering the rows and then you remove all rows that are not first in their partition. Let’s have a look at how it works.

    First, let’s create and populate sample table:

    1. if object_id('tempdb..#t') is not null
    2.         drop table #t
    3.  
    4. create table #t(Id char(3), randomData uniqueidentifier default(newid()))
    5.  
    6. insert #t(Id)
    7. values
    8. ('123'), ('124'),
    9. ('133'), ('133'), ('133'),
    10. ('141'), ('141'),
    11. ('121'), ('121'),
    12. ('145'), ('145'),
    13. ('152')
    14.  
    15. select * from #t

    When you run the query, you should see result similar to the following illustration:

    Deleting duplicates is pretty straightforward. You define partitioned row numbering and then you delete all rows that have row number bigger than 1:

    1. ;with dupes as
    2. (
    3.     select *, row_number() over(partition by Id order by Id) as rn
    4.     from #t
    5. )
    6. delete dupes
    7. where rn > 1
    8.  
    9. select * from #t

    As you see, the duplicates were removed.

    The execution plan for the above statement is pretty straightforward:

    The most interesting operator in this plan is the Sequence Project operator. It adds the calculated row number to the output, based on the criteria that you pass to the row_number() OVER clause. Here we partitioned the input by the Id column, which causes the row number to be restarted from 1 whenever a new Id is encountered in the input. The row number is returned as the result of the internal function row_number, which you can see in the following picture:

    To the left of the Sequence Project operator you can see the Filter operator. This operator filters the input based on a certain predicate. You can see in the picture below that the Filter operator lets through only rows that have Expr1004 > 1.

    You can check that in this particular case only five rows were let through and subsequently deleted. Two with Id 133, one 141, one 121 and one 145.

    What if you want to look at the rows that are deleted? Let’s repopulate the table and run the delete statement, this time with the OUTPUT clause:

    1. truncate table #t
    2.  
    3. insert #t(Id)
    4. values
    5. ('123'), ('124'),
    6. ('133'), ('133'), ('133'),
    7. ('141'), ('141'),
    8. ('121'), ('121'),
    9. ('145'), ('145'),
    10. ('152')
    11.  
    12.  
    13. ;with dupes as
    14. (
    15.     select *, row_number() over(partition by Id order by Id) as rn
    16.     from #t
    17. )
    18. delete dupes
    19. output deleted.*
    20. where rn > 1

    The OUTPUT clause allows you to return rows from the pseudo-tables inserted and deleted, the same ones that you have access to in triggers. You can return these rows to the client, or to a temporary table or table variable for further processing or audit. Let’s have a look at the execution plan.

    What’s this?! The Sort, Segment and Sequence Project operators were added a second time to the plan! When you look at their details you will see that they have exactly the same predicates and conditions as the set on the right-hand side of the Filter operator. The difference is that they are applied to the data stream returned by the OUTPUT clause. You can verify this by looking at the results returned by the statement:

    Have you noticed something? The rn column contains values 1 and 2, while it should contain only values 2 and 3, because that is what the Filter operator allowed to flow through. This is the evidence that the new operators work on the output data stream. You can see that they work on the 5 rows that were deleted:

    It is obvious that if you use the OUTPUT clause on queries that otherwise perform very well, you may run into problems. When you have many rows, the Sort operator can become a bottleneck when it is executed twice. The example also shows that the row number identifier of a row is not reliable within the boundaries of a single query. This may become an issue for you if you want to identify rows based on the output from an update or delete statement.
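
    If you do want to capture the deleted rows, a minimal sketch is to redirect the OUTPUT into a table variable; this also makes it easy to inspect the rn values that actually come back (it assumes #t is populated as above):

    declare @removed table (Id char(3), randomData uniqueidentifier, rn bigint);

    ;with dupes as
    (
        select *, row_number() over(partition by Id order by Id) as rn
        from #t
    )
    delete dupes
    output deleted.Id, deleted.randomData, deleted.rn into @removed
    where rn > 1;

    ---note that the captured rn values come from the second numbering, not from the Filter
    select * from @removed;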

    From a purely performance point of view, what can you do to avoid these additional operators? The answer is simple: don’t return the row_number() column in the OUTPUT clause.

    1. truncate table #t
    2.  
    3. insert #t(Id)
    4. values
    5. ('123'), ('124'),
    6. ('133'), ('133'), ('133'),
    7. ('141'), ('141'),
    8. ('121'), ('121'),
    9. ('145'), ('145'),
    10. ('152')
    11.  
    12.  
    13. ;with dupes as
    14. (
    15.     select *, row_number() over(partition by Id order by Id) as rn
    16.     from #t
    17. )
    18. delete dupes
    19. output deleted.Id, deleted.randomData
    20. where rn > 1

    As you see in the snippet above, I explicitly return columns from the deleted table and I don’t return the rn column. The optimizer realized that the column is not needed and removed the additional operators from the plan, so it looks exactly the same as the first plan we saw in this post.

    Well, if someone asked me what I think about this behavior, I would say it is a bug in the SQL Server engine. The windowing functions shouldn’t be applied to the results of the OUTPUT clause because they affect performance and potentially affect the logic of the application.

    This is the link to the test script, so you can run your own investigations.

  • Security – how to assign rights to database objects

    A few days ago my team mate asked me to check out why one of the logins didn’t have the right to execute a few stored procedures. Allegedly it had the right before. After a few minutes of digging in source control it turned out that these procedures were originally scripted with explicit EXECUTE rights granted on them. However, some later updates to the code did not contain the rights. Developers rarely pay much attention to the security design of the code they write. It is not common for code to be implemented with the least possible access level in mind. A classic example of this is TRUNCATE TABLE. While this is a very handy command and can be very useful in certain scenarios, developers don’t realize that the user account executing it has to have some elevated rights.
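
    TRUNCATE TABLE is a good illustration: it requires at least ALTER permission on the table. If you don’t want to grant ALTER to application accounts, one possible approach (a sketch only, with hypothetical object and role names) is to wrap the command in a procedure that executes as its owner and grant EXECUTE on that procedure instead:

    create procedure dbo.pTruncateStagingTable
    with execute as owner
    as
    begin
        ---the body runs in the security context of the procedure owner (dbo here),
        ---so callers don't need ALTER on the table itself
        truncate table dbo.StagingTable;
    end
    go
    grant execute on dbo.pTruncateStagingTable to [AppRole]   ---hypothetical role
    go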

    I have seen many databases which implement explicit rights to objects. This is to some degree a legacy issue, as in SQL Server 2000 and before the security model was completely different and more difficult to manage. The common practice at the time was to script rights to objects, and to make things simple, everyone who had the right to connect to the database could do everything in it. Even recently I saw a table with explicit rights like this:

    1. create table dbo.sometable
    2. (
    3.     column1 int,
    4.     columnt2 char(1),
    5.     column3 varchar(255)
    6. )
    7. go
    8.  
    9. grant select on dbo.sometable to [public]
    10. grant delete on dbo.sometable to [public]
    11. grant insert on dbo.sometable to [public]
    12. grant update on dbo.sometable to [public]
    13.  
    14. go

    This is definitely a bad practice. 

     

    Let’s examine a slightly different scenario, the one that I mentioned at the beginning of this post. Let’s analyse what options you have when you need to give a user the right to execute a stored procedure but don’t want to grant rights to the public role:

    1. Assign explicit rights for the user to the procedure.
    2. Assign explicit rights to a role and assign user to that role
    3. Create the procedure within a separate schema, assign user to specific role and assign rights for the schema to the role.

    The above three options have their pros and cons. The first option is in most cases an aftermath of legacy design, and while it is the most granular, it is also the hardest to manage and monitor.

    The second option is also very granular; the only difference is that the rights are assigned to a database role. The main advantage of this solution is that you can assign more than one user to the database role and they will automatically be granted the required access to the objects.

    The third option requires some consideration regarding the design of the database. You may need to change some code calling the procedures. You may still want to create a database role and assign the rights for the schema to the role. This gives you the best flexibility.

    Let’s examine some of the effects each of these options brings when implemented. I created a script file that examines six scenarios you may come across in your database. Bear in mind, this is definitely not a complete list.

    First, let’s create a test database and a login that we will use to test the solutions. I will also create a sample table and stored procedure in the database.

    1. use master
    2. go
    3. create login RemoteLogin with password='Str()ngPwd', check_policy=off
    4. go
    5. create database testExecRights
    6. go
    7. use testExecRights
    8. go
    9. create user [RemoteUser] from login [RemoteLogin]
    10. go
    11. create table dbo.tTest(a int, b int)
    12. go
    13. create procedure dbo.pTestInsert(@a int, @b int)
    14. as
    15. begin
    16.     insert dbo.tTest(a, b) values(@a, @b)
    17. end
    18. go

    The first scenario shows what happens when you create a user in the database, but you don’t assign any specific rights to it.

     

    1. ---Scenario 1: user has public right to the database
    2. execute as login='RemoteLogin'
    3. select suser_name()
    4. go
    5.  
    6.     begin try
    7.         ---should fail
    8.         exec dbo.pTestInsert 1, 1
    9.         return;
    10.     end try
    11.     begin catch
    12.         declare @errormsg nvarchar(2000)
    13.         set @errormsg = N'Scenario 1: Exception occured in ' + error_procedure() +
    14.             N': ' + error_message() + N' (' + convert(nvarchar, error_number()) + N')'
    15.         raiserror(@errormsg, 16, 1)
    16.     end catch

    This code throws an exception, as expected:

    Msg 50000, Level 16, State 1, Line 10
    Scenario 1: Exception occured in pTestInsert: The EXECUTE permission was denied on the object 'pTestInsert', database 'testExecRights', schema 'dbo'. (229)

    Let’s have a look what we’ll get when we assign the right to execute the stored procedure explicitly:

    1. ---Scenario 2: user has explicit right to execute the procedure
    2. ---this is executed at sa/dbo level
    3. grant execute on dbo.pTestInsert to [RemoteUser]
    4. go
    5. execute as login='RemoteLogin'
    6. select suser_name()
    7. go
    8.     begin try
    9.         ---should succeed
    10.         exec dbo.pTestInsert 1, 1
    11.         return;
    12.     end try
    13.     begin catch
    14.         declare @errormsg nvarchar(2000)
    15.         set @errormsg = N'Scenario 2: Exception occured in ' + error_procedure() + N': ' + error_message() + N' (' + convert(nvarchar, error_number()) + N')'
    16.         raiserror(@errormsg, 16, 1)
    17.     end catch
    18. go
    19. revert
    20. select suser_name()
    21. --revoke granted right
    22. revoke execute on dbo.pTestInsert to [RemoteUser]
    23. go

    In line 3 of the above snippet you see the command to grant the execute right on the stored procedure to RemoteUser. Please note that the user still has no other rights: he can’t insert rows into the tTest table directly, he can’t even select from the table. Since the above snippet is part of a bigger script, at the end of it, in line 22, you can see the REVOKE command, which revokes the right granted in line 3. So, how does the above code work? It works fine; it doesn’t throw an exception because the permissions are right.

    As I mentioned before, there were procedures which the remote login mysteriously lost the ability to call. In my case, the issue was caused by a release script which did not re-establish the appropriate rights. There are two ways you can modify a procedure during a release. One is obviously ALTER PROCEDURE, which does not change the security configuration of the procedure; the other is a sequence of DROP and CREATE statements. Scenarios 3 and 4 are shown in the scripts below.

    1. ---Scenario 3: user has explicit right to execute the procedure but the stored procedure is ALTERED
    2. --- in subsequent release
    3. go
    4. ---this is executed at sa/dbo level
    5. grant execute on dbo.pTestInsert to [RemoteUser]
    6. go
    7. go
    8. ---now ALTER the procedure
    9. alter procedure dbo.pTestInsert (@a int, @b int)
    10. as
    11. begin
    12.     insert dbo.tTest(a, b) values (@a, @a * @b)
    13. end
    14. go
    15. execute as login='RemoteLogin'
    16. select suser_name()
    17. go
    18.     begin try
    19.         ---should succeed
    20.         exec dbo.pTestInsert 1, 1
    21.         return;
    22.     end try
    23.     begin catch
    24.         declare @errormsg nvarchar(2000)
    25.         set @errormsg = N'Scenario 3: Exception occured in ' + error_procedure() + N': ' + error_message() + N' (' + convert(nvarchar, error_number()) + N')'
    26.         raiserror(@errormsg, 16, 1)
    27.     end catch
    28. go
    29. revert
    30. select suser_name()
    31.  
    32. go
    33. --revoke granted right
    34. revoke execute on dbo.pTestInsert to [RemoteUser]
    35. go

     

    1. ---Scenario 4: user has explicit right to execute the procedure but the stored procedure is
    2. ---DROPPED an CREATED in subsequent release
    3. go
    4. ---this is executed at sa/dbo level
    5. grant execute on dbo.pTestInsert to [RemoteUser]
    6. go
    7. go
    8. ---now DROP and CREATE the procedure
    9. drop procedure dbo.pTestInsert
    10. go
    11. create procedure dbo.pTestInsert (@a int, @b int)
    12. as
    13. begin
    14.     insert dbo.tTest(a, b) values (@a, @a * @b)
    15. end
    16. go
    17. execute as login='RemoteLogin'
    18. select suser_name()
    19. go
    20.     begin try
    21.         ---should fail
    22.         exec dbo.pTestInsert 1, 1
    23.         return;
    24.     end try
    25.     begin catch
    26.         declare @errormsg nvarchar(2000)
    27.         set @errormsg = N'Scenario 4: Exception occured in ' + error_procedure() + N': ' + error_message() + N' (' + convert(nvarchar, error_number()) + N')'
    28.         raiserror(@errormsg, 16, 1)
    29.     end catch
    30. go
    31. revert
    32. select suser_name()

    Scenario 4 throws an exception as expected:

    Msg 50000, Level 16, State 1, Line 9
    Scenario 4: Exception occured in pTestInsert: The EXECUTE permission was denied on the object 'pTestInsert', database 'testExecRights', schema 'dbo'. (229)

    Obviously there is an issue here. The release process must take care of the proper assignment of rights. The problem is that if the script consists of smaller parts prepared by developers, it may be virtually impossible to verify whether the rights are scripted correctly in the release script. One can script all explicit permissions to objects before the release and then reconcile them with the permissions existing after the release. This can also be tricky, as sometimes the rights have to change, sometimes new objects don’t have properly scripted permissions, and so on. What if you have many users that need to perform certain operations on tables and views? In general, it looks like the idea of assigning rights one by one for all users and objects is bad in most scenarios. Imagine the number of failure points if you have 3000+ objects in the database and you change, say, a few hundred of them within a release. What can you do about it?

    The first thing you should do is break the explicit relationship between users and objects. You can do this by creating a database role which will serve as a single point of security assignments. By having several roles in the database, you can tune the security fairly easily without having to rescript permissions for all users each time you need to make a change. Scenario 5 shows an example of such a database role:

    1. ---Scenario 5: belongs to DATABASE ROLE with explicit right to execute the procedure
    2. ---and the stored procedure is DROPPED an CREATED in subsequent release
    3. go
    4. ---this is executed at sa/dbo level
    5. create role [pTestExecRightRole]
    6. go
    7. grant execute on dbo.pTestInsert to [pTestExecRightRole]
    8. go
    9. exec sp_addrolemember 'pTestExecRightRole', 'RemoteUser'
    10. go
    11. ---now DROP and CREATE the procedure
    12. drop procedure dbo.pTestInsert
    13. go
    14. create procedure dbo.pTestInsert (@a int, @b int)
    15. as
    16. begin
    17.     insert dbo.tTest(a, b) values (@a, @a * @b)
    18. end
    19. go
    20. execute as login='RemoteLogin'
    21. select suser_name()
    22. go
    23.     begin try
    24.         ---should succeed
    25.         exec dbo.pTestInsert 1, 1
    26.         return;
    27.     end try
    28.     begin catch
    29.         declare @errormsg nvarchar(2000)
    30.         set @errormsg = N'Scenario 5: Exception occured in ' + error_procedure() + N': ' + error_message() + N' (' + convert(nvarchar, error_number()) + N')'
    31.         raiserror(@errormsg, 16, 1)
    32.     end catch
    33. go
    34. revert
    35. select suser_name()
    36. go
    37. exec sp_droprolemember 'pTestExecRightRole', 'RemoteUser'
    38. go
    39. drop role [pTestExecRightRole]
    40. go

    The issue with the above is that still, when an object is dropped and recreated, the rights assigned to it are lost. The script above throws the same exception again:

    Msg 50000, Level 16, State 1, Line 9
    Scenario 5: Exception occured in pTestInsert: The EXECUTE permission was denied on the object 'pTestInsert', database 'testExecRights', schema 'dbo'. (229)

    To solve this you have to group objects in a similar way to what you do with users. While a database role is used as a container for users, a schema serves the same purpose for database objects. Scenario 6 shows this approach.

    1. ---Scenario 6: user belongs to DATABASE ROLE with explicit right to execute on the SCHEMA
    2. ---and the stored procedure is DROPPED an CREATED in subsequent release
    3. go
    4. ---this is executed at sa/dbo level
    5. create role [pTestExecRightRole]
    6. go
    7. create schema RemoteExec authorization dbo
    8. go
    9. grant execute on schema::RemoteExec to [pTestExecRightRole]
    10. go
    11. exec sp_addrolemember 'pTestExecRightRole', 'RemoteUser'
    12. go
    13. ---now DROP and CREATE the procedure
    14. drop procedure dbo.pTestInsert
    15. go
    16. create procedure RemoteExec.pTestInsert (@a int, @b int)
    17. as
    18. begin
    19.     insert dbo.tTest(a, b) values (@a, @a * @b)
    20. end
    21. go
    22. execute as login='RemoteLogin'
    23. select suser_name()
    24. go
    25.     begin try
    26.         ---should succeed
    27.         exec RemoteExec.pTestInsert 1, 1
    28.         return;
    29.     end try
    30.     begin catch
    31.         declare @errormsg nvarchar(2000)
    32.         set @errormsg = N'Scenario 6: Exception occured in ' + error_procedure() + N': ' + error_message() + N' (' + convert(nvarchar, error_number()) + N')'
    33.         raiserror(@errormsg, 16, 1)
    34.     end catch
    35. go
    36. revert
    37. select suser_name()
    38.  
    39. go

    As you see, I created the schema RemoteExec in line 7. I granted the explicit right to execute on all objects belonging to the schema to the database role pTestExecRightRole. Subsequently I simulate the release process by dropping the existing procedure dbo.pTestInsert and creating it in the schema RemoteExec. As you see, I don’t apply any explicit rights to the stored procedure. The fact that it belongs to the RemoteExec schema is sufficient for it to be callable by a user belonging to pTestExecRightRole. You can easily add and remove procedures from the RemoteExec schema and they will automatically inherit any rights you have chosen to assign to the schema, without any manual coding. More importantly, the rights are not tied to the objects, so you will not lose required functionality after making changes to the objects.

    As I said before, this approach has its consequences. You may need to modify some of the code that calls the stored procedures to explicitly reference the schema, and in some cases you need to modify the code of the stored procedures. If you work with older code, the schema it most likely references is an implicit dbo: object names are used without any schema part. This may be OK and resolve correctly at runtime, but in some cases it won’t, especially if you move the procedure to a different schema. It is a good practice to explicitly reference schemas in your code to avoid any confusion and problems.

    You can download the test script with all scenarios and cleanup code from this location. Enjoy.

  • Uninstalling Reporting Server 2008 on Windows Server 2008

    Ha. I had the quite disputable pleasure of installing and reinstalling and reinstalling and reinstalling (I think about 5 times before it worked) Reporting Server 2008 on Windows Server with the same year number in its name.

    During my struggle I came across an error which seems not to be quite unfamiliar to some of the more unfortunate developers and admins who happen to uninstall SSRS 2008 from a server. I had SSRS 2008 installed as a named instance, SQL2008. I wanted to uninstall the server and install it to the default instance. And this is when it bit me, not for the first time and not for the last that day :). The setup complained that it couldn’t access a DLL:

    Error message:

    TITLE: Microsoft SQL Server 2008 Setup
    ------------------------------
    The following error has occurred:
    Access to the path 'C:\Windows\SysWOW64\perf-ReportServer$SQL2008-rsctr.dll' is denied.
    For help, click: http://go.microsoft.com/fwlink?LinkID=20476&ProdName=Microsoft+SQL+Server&EvtSrc=setup.rll&EvtID=50000&ProdVer=10.0.1600.22&EvtType=0x60797DC7%25400x84E8D3C0
    ------------------------------
    BUTTONS: OK

    This is a screenshot that shows the above error:

    This issue seems to have a bit of literature dedicated to it and even seemingly a KB article http://support.microsoft.com/kb/956173 and a similar Connect item: http://connect.microsoft.com/SQLServer/feedback/details/363653/error-messages-when-upgrading-from-sql-2008-rc0-to-rtm

    The article describes the issue as follows:

    When you try to uninstall Microsoft SQL Server 2008 Reporting Services from the server, you may receive the following error message:

    An error has occurred:
    Access to the path 'Drive_Letter:\WINDOWS\system32\perf-ReportServer-rsctr.dll' is denied.

    Note Drive_Letter refers to the disc drive into which the SQL Server installation media is inserted.

    In my case, the note was not true; the error pointed to a dll located in the Windows folder on the C: drive, not where the installation media was.

    Despite this difference, I tried to identify any processes that might be keeping a lock on the dll. I downloaded Sysinternals Process Explorer and ran it to find any processes I could stop. Unfortunately, there was no such process.

    I tried to rerun the installation, but it failed at the same step.

    Eventually I decided to remove the dll before the setup was executed. I changed the name of the dll so I would be able to restore it in case of issues. Interestingly, Windows let me do it, which means that, indeed, it was not locked by any process.

    I ran the setup and this time it uninstalled the instance without any problems:

     

    To summarize my experience I should say: be very careful, and don’t leave any leftovers after uninstallation; remove or rename any folders that are left after setup has finished. For some reason, setup doesn’t remove certain folders and files. Installation on Windows Server 2008 requires more attention than on Windows 2003 because of the changed security model; some actions can be executed only by an administrator in elevated execution mode. In general, you have to get used to UAC and a somewhat different experience than with Windows Server 2003.

  • BNF – how to read syntax?

    A few days ago I read a post by Jen McCown (blog) about her idea of blogging about random articles from Books Online. I think this is a great idea, even if Jen says it’s not exciting or sexy. I have noticed that many of the questions that appear on forums and other media arise from the pure fact that the people asking them didn’t bother to read and understand the manual, Books Online. Jen came up with a brilliant, concise acronym that describes very well the category of posts about Books Online: RTFM365. I take the liberty of tagging this post with the same acronym.

    I often come across questions of the type: ‘Hey, I am trying to create a table, but I am getting an error’. The error often says that the syntax is invalid.

    CREATE TABLE dbo.Employees
    (guid uniqueidentifier CONSTRAINT DEFAULT Guid_Default NEWSEQUENTIALID() ROWGUIDCOL,
    Employee_Name varchar(60)
    CONSTRAINT Guid_PK PRIMARY KEY (guid) );

    The answer is usually(1): ‘OK, let me check it out... Ah yes, you have to put the name of the DEFAULT constraint before the constraint type:

    CREATE TABLE dbo.Employees
    (guid uniqueidentifier CONSTRAINT Guid_Default DEFAULT NEWSEQUENTIALID() ROWGUIDCOL,
    Employee_Name varchar(60)
    CONSTRAINT Guid_PK PRIMARY KEY (guid) );

    Why do so many people stumble on syntax errors? Is the syntax poorly documented? No. The issue is that the correct syntax of the CREATE TABLE statement is documented very well in Books Online and is... intimidating. Many people can be taken aback by the rather complex block of code that describes all the intricacies of the statement.

    However, I don’t know of a better way of defining the syntax of a statement or command.

    The notation that is used to describe syntax in Books Online is a form of Backus-Naur notation, sometimes called BNF for short. This notation was invented around 50 years ago, and some say even earlier, around 400 BC; would you believe it? Originally it was used to define the syntax of the, rather ancient now, ALGOL programming language (in the 1950s, not in ancient India).

    If you look closer at the definition of the BNF, it turns out that the principles of this syntax are pretty simple. Here are a few bullet points:

    1. italic_text is a placeholder for your identifier
    2. <italic_text_in_angle_brackets> is a definition which is described further.
    3. [everything in square brackets] is optional
    4. {everything in curly brackets} is obligatory
    5. everything | separated | by | operator is an alternative
    6. ::= “assigns” definition to an identifier

    Yes, it looks like these six simple points give you the key to understanding even the most complicated syntax definitions in Books Online. Books Online contains an article about syntax conventions; have you ever read it? :)

    Let’s have a look at fragment of the CREATE TABLE statement:

    1  CREATE TABLE
    2      [ database_name . [ schema_name ] . | schema_name . ] table_name
    3          ( { <column_definition> | <computed_column_definition>
    4              | <column_set_definition> }
    5          [ <table_constraint> ] [ ,...n ] )
    6      [ ON { partition_scheme_name ( partition_column_name ) | filegroup
    7          | "default" } ]
    8      [ { TEXTIMAGE_ON { filegroup | "default" } ]
    9      [ FILESTREAM_ON { partition_scheme_name | filegroup
    10         | "default" } ]
    11     [ WITH ( <table_option> [ ,...n ] ) ]
    12     [ ; ]

    Let’s look at line 2 of the above snippet. This line uses rules 3 and 5 from the list. So you know that when you create a table you can specify one of the following:

    • just the table name – the table will be created in the user’s default schema
    • schema name and table name – the table will be created in the specified schema
    • database name, schema name and table name – the table will be created in the specified database, in the specified schema
    • database name, .., table name – the table will be created in the specified database, in the default schema of the user.

    Note that this single line of the notation describes each of the naming schemes in a deterministic way. The ‘optionality’ of the schema_name element is nested within the database_name section. You can use either database_name with an optional schema name, or just the schema name; this is specified by the pipe character ‘|’.
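
    To make it concrete, here are the four naming forms as separate (hypothetical) statements; MyDb stands for an existing database:

    create table t1 (col1 int)            ---just the table name: default schema of the user
    create table dbo.t2 (col1 int)        ---schema name and table name
    create table MyDb.dbo.t3 (col1 int)   ---database name, schema name and table name
    create table MyDb..t4 (col1 int)      ---database name, .., table name: default schema of the user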

    The error that the user gets when executing the first script fragment in this post is as follows:

    Msg 156, Level 15, State 1, Line 2
    
    Incorrect syntax near the keyword 'DEFAULT'.

    OK, let’s have a look at how to find out the correct syntax. Line 3 of the BNF fragment above contains a reference to <column_definition>. Since column_definition is in angle brackets, we know that this is a reference to a notion described further on. And indeed, the very next fragment of BNF contains the syntax of the column definition.

    1  <column_definition> ::=
    2  column_name <data_type>
    3      [ FILESTREAM ]
    4      [ COLLATE collation_name ]
    5      [ NULL | NOT NULL ]
    6      [
    7          [ CONSTRAINT constraint_name ] DEFAULT constant_expression ]
    8          | [ IDENTITY [ ( seed ,increment ) ] [ NOT FOR REPLICATION ]
    9      ]
    10     [ ROWGUIDCOL ] [ <column_constraint> [ ...n ] ]
    11     [ SPARSE ]

    Look at line 7 in the above fragment. It says that the column can have a DEFAULT constraint which, if you want to name it, has to be prepended with the [ CONSTRAINT constraint_name ] sequence. The name of the constraint is optional, but I strongly recommend you make the effort of coming up with a meaningful name yourself. So the correct syntax of the CREATE TABLE statement from the beginning of the article is as follows:

    CREATE TABLE dbo.Employees
    (guid uniqueidentifier CONSTRAINT Guid_Default DEFAULT NEWSEQUENTIALID() ROWGUIDCOL,
    Employee_Name varchar(60)
    CONSTRAINT Guid_PK PRIMARY KEY (guid) );

    That is practically everything you should know about BNF. I encourage you to study the syntax definitions of various statements and commands in Books Online; you can find really interesting things hidden there.


     

    (1) No, my answer usually is a question: ‘What error message? What does it say?’. You’d be surprised how many people think I can travel through time and space and look at their screen at the moment they received the error.

  • Setting reporting database–what happens behind the scenes

    When you install a reporting server, you have in general two options available: either you install the reporting server and configure its database at the same time, or you install just the reporting service and point it to an existing database later.

    We recently had the opportunity to configure a server to point to an existing reporting database. We encountered a security related issue, because the account we were using to connect to the SQL Server database engine was not ‘good enough’ to configure reporting service connectivity. Namely, the installation fails if you are not a sysadmin, a security admin, or dbo in the master and msdb databases.
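
    A rough sketch of checking up front whether the account you are connecting with clears that bar, run against the target instance:

    select
        is_srvrolemember('sysadmin')      as is_sysadmin,
        is_srvrolemember('securityadmin') as is_securityadmin
    go
    use master
    select is_member('db_owner') as is_dbo_in_master
    go
    use msdb
    select is_member('db_owner') as is_dbo_in_msdb
    go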

    So what happens when you configure the reporting server to point to an existing database? This is how it seems to work in SQL Server 2008 R2:

    First, a new role called RSExecRole is added to the master database if it doesn’t exist:

    use master
    go
    if not exists (select * from sysusers where issqlrole = 1 and name = 'RSExecRole')
    BEGIN
        EXEC sp_addrole 'RSExecRole'
    END
    go

    Then the same is done in the msdb database:

    use msdb
    go
    if not exists (select * from sysusers where issqlrole = 1 and name = 'RSExecRole')
    BEGIN
        EXEC sp_addrole 'RSExecRole'
    END
    go

    And in the reporting database and the temp database for reporting:

    USE [ReportServer$SQL1]
    go
    if not exists (select * from sysusers where issqlrole = 1 and name = 'RSExecRole')
    BEGIN
        EXEC sp_addrole 'RSExecRole'
    END
    go

    USE [ReportServer$SQL1TempDB]
    go
    if not exists (select * from sysusers where issqlrole = 1 and name = 'RSExecRole')
    BEGIN
        EXEC sp_addrole 'RSExecRole'
    END
    go

    Next, the Reporting Server Configuration Manager maps a user to the login we want to use in each of the databases, interestingly in the order shown below:

    USE [ReportServer$SQL1]
    go
    if not exists (select name from master.dbo.syslogins where name = N'RSUser' and sysadmin = 1)
    BEGIN
        if not exists (select name from sysusers where name = N'RSUser' and issqluser = 1)
        BEGIN
            EXEC sp_grantdbaccess N'RSUser'
        END
    END
    go

    USE [ReportServer$SQL1TempDB]
    go
    if not exists (select name from master.dbo.syslogins where name = N'RSUser' and sysadmin = 1)
    BEGIN
        if not exists (select name from sysusers where name = N'RSUser' and issqluser = 1)
        BEGIN
            EXEC sp_grantdbaccess N'RSUser'
        END
    END
    go

    USE [msdb]
    go
    if not exists (select name from master.dbo.syslogins where name = N'RSUser' and sysadmin = 1)
    BEGIN
        if not exists (select name from sysusers where name = N'RSUser' and issqluser = 1)
        BEGIN
            EXEC sp_grantdbaccess N'RSUser'
        END
    END
    go

    USE [master]
    go
    if not exists (select name from master.dbo.syslogins where name = N'RSUser' and sysadmin = 1)
    BEGIN
        if not exists (select name from sysusers where name = N'RSUser' and issqluser = 1)
        BEGIN
            EXEC sp_grantdbaccess N'RSUser'
        END
    END
    GO

    Eventually the user is added to the RSExecRole:

    USE [ReportServer$SQL1]
    go
    if not exists (select name from master.dbo.syslogins where name = N'RSUser' and sysadmin = 1)
    BEGIN
        EXEC sp_addrolemember 'RSExecRole', N'RSUser'
    END
    go

    USE [msdb]
    go
    if not exists (select name from master.dbo.syslogins where name = N'RSUser' and sysadmin = 1)
    BEGIN
        EXEC sp_addrolemember 'RSExecRole', N'RSUser'
    END
    go

    USE [master]
    go
    if not exists (select name from master.dbo.syslogins where name = N'RSUser' and sysadmin = 1)
    BEGIN
        EXEC sp_addrolemember 'RSExecRole', N'RSUser'
    END
    go

    USE [ReportServer$SQL1TempDB]
    go
    if not exists (select name from master.dbo.syslogins where name = N'RSUser' and sysadmin = 1)
    BEGIN
        EXEC sp_addrolemember 'RSExecRole', N'RSUser'
    END

    Ha. I was about to go on a rant about the implementation of the sp_addrolemember system procedure, which in SQL 2005 uses sysusers and other obsolete objects. From what I see, the version in SQL Server 2008 R2 has been rewritten and uses sys.database_principals in its logic. Well done, Microsoft :).

     

     

     

     

  • database in use–a little trick to disconnect users

    While I am still waiting for my SQL Server 2008 R2 Dev Edition to arrive, here’s a little trick you can use if you want to disconnect users from your database and you don’t have sysadmin rights to kill sessions.

    You can set the database to single-user mode with this command:

    alter database your_database set single_user with rollback immediate

    This command disconnects all connected users and rolls back any running transactions. It doesn’t allow you to disconnect particular users, but it may be handy if you are the db_owner of the database but don’t have the sa rights to run the KILL command.

    I use this command before restoring a database from a snapshot; the restore command requires that there are no active connections to the database while it is executed.
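
    For completeness, a minimal sketch of that scenario; your_database and your_snapshot are hypothetical names and the snapshot must already exist:

    use master
    go
    ---disconnect everybody and roll back their transactions
    alter database your_database set single_user with rollback immediate
    go
    ---revert the database to the state captured by the snapshot
    restore database your_database from database_snapshot = 'your_snapshot'
    go
    ---allow connections again, in case the database comes back in single-user mode
    alter database your_database set multi_user
    go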

     

    That’s about it for today; the next article will be longer, I promise :).

     

  • DEFAULT constraints and sp_bindefault

    When you deal with old databases, with origins somewhere around the second half of the nineteen-nineties, you can expect all kinds of oddities, like funny naming conventions, (lack of) referential integrity, absence of clustered indexes, and things like rules and defaults created separately from the tables’ DDL.

    I just happen to work with such a database. I see many tables, migrated some time ago from a Sybase system, that have their DEFAULT constraints defined with the sp_bindefault stored procedure. You don’t have to memorize the exact syntax of the invocation of this procedure, because it is marked as obsolete and should not be used anymore:

    This feature will be removed in a future version of Microsoft SQL Server. Do not use this feature in new development work, and modify applications that currently use this feature as soon as possible. We recommend that you create default definitions by using the DEFAULT keyword of the ALTER TABLE or CREATE TABLE statements instead. For more information, see Creating and Modifying DEFAULT Definitions.

    Now, when I read the documentation, it turns out sp_bindefault has an interesting capability. If you create a default and a user-defined type (alias), and you bind this default to the type with sp_bindefault, you can inadvertently affect all columns of that type in all tables in the database if they don’t have any default or are using the existing default of the alias type:

        create table dbo.t1
        (
            col1 int not null,
            col2 int not null constraint DF_t1col2 default(0)
        )
        go
        ---create default for the column
        create default dbo.DFValue_1 as 1
        go
        ---bind default to the column
        exec sp_bindefault 'dbo.DFValue_1', 'dbo.t1.col1'
        go
        ---create udt
        create type ssn
        from varchar(15) not null
        go
        --create default - empty string
        create default dbo.DF_UDTEmpty as ''
        go
        --bind default to the udt
        exec sp_bindefault 'dbo.DF_UDTEmpty', 'dbo.ssn'
        go
        --add new column to the table
        alter table dbo.t1 add secnumber1 ssn
        go
        ---insert into the table and select results
        insert dbo.t1(col1, col2, secnumber1) default values

        ---secnumber1 is empty string
        select * from dbo.t1
        go
        ---create second table
        create table dbo.t2
        (
            col1 int constraint DF_t2_col1 default(10),
            secnumber ssn
        )
        go
        create default dbo.DF_SSNNotEmpty as '<unknown ssn>'
        go
        ---note here - no @futureonly param provided
        exec sp_bindefault 'dbo.DF_SSNNotEmpty', 'dbo.ssn'
        go
        --insert into t1 and t2
        insert dbo.t1(col1, col2, secnumber1) default values
        insert dbo.t2(col1, secnumber) default values
        go
        ---check contents of the tables
        select * from dbo.t1 --what?
        select * from dbo.t2
        go

    If you run the above script, you will see that the default value of the column secnumber1 in table dbo.t1 has changed to ‘<unknown ssn>’! You can really hurt yourself if you make a mistake and omit the @futureonly parameter in the call to sp_bindefault. If you do pass @futureonly, the above behavior will not occur, as shown below. It's pretty easy to make changes you don't know about until a user calls you at 3 am on Sunday.
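    For completeness, here is a sketch of the safer call, using the documented 'futureonly' flag; existing bindings of the alias type are left untouched and only columns created afterwards pick up the new default:

        --bind the default to the alias type, but leave existing columns of that type alone
        exec sp_bindefault 'dbo.DF_SSNNotEmpty', 'dbo.ssn', 'futureonly'
        go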

    It is a good idea to change tables to use standard column DEFAULT constraints. This should of course be done with care, probably only when you are changing the tables anyway. But if you are curious how many tables in your database use DEFAULT objects, you can run this script:

        ---all default objects with names of tables they are assigned to.
        select b.default_object_id, a.name default_object_name,
            b.object_id table_id, object_schema_name(b.object_id) table_schema,
            object_name(b.object_id) table_name, b.name column_name, b.column_id
        from sys.objects a inner join sys.all_columns b
            on a.object_id = b.default_object_id
        where 1=1
            and a.object_id not in (select object_id from sys.default_constraints)

    Full testing script is attached to the post as usual, enjoy :).
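    If you do decide to convert a bound default into a regular DEFAULT constraint, the general pattern is to unbind the default first and then add the constraint. A minimal sketch, reusing the example objects from the script above:

        --unbind the default from the column (or from the alias type)
        exec sp_unbindefault 'dbo.t1.col1'
        go
        --replace it with a standard DEFAULT constraint
        alter table dbo.t1 add constraint DF_t1_col1 default(1) for col1
        go
        --once nothing is bound to it, the default object itself can be dropped
        drop default dbo.DFValue_1
        go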

  • Partitioned views

    I must say I am a big fan of views. They give you the flexibility to change the data retrieval logic without affecting the DAL layers in your application. I posted a series of articles about updateable views and triggers on views, and their application in SSIS for example.

    Recently we finished development of a system that loads data from several external sources, makes some calculations on the data and sends the results over FTP to downstream systems. I designed the whole data interface to use views and stored procedures. This way, even though we used an ORM to perform some of the activities, it was not accessing bare tables. This is good, because you can implement a simpler and more flexible security model. The other advantage is that, with some simple techniques, you can load data into a table and then expose the new data through a view without exposing the logic to the upper layers of your application.

    It is quite a common requirement to maintain multiple versions of data within a table, and also that a data import should not affect other processes which read the current version of the data. With views this is really simple to achieve. There are many ways of doing it, but let's have a look at the partitioned views technique.

    A partitioned view is a view that selects data from more than one table. These tables have to meet specific requirements. Among other things, there has to be a so-called partitioning column, which has a trusted CHECK constraint defined on it, and the tables have to be combined with the UNION ALL operator. If you want the view to be updateable, the partitioning column has to be part of the primary key of every table belonging to the view, and the view has to select all columns from each table.

    OK, but what is the purpose of creating partitioned views? The short answer is performance. Tables tend to grow bigger and bigger, and it becomes more and more expensive to run index maintenance, backups, and queries against them. If you are able to partition your data, and you do it in the right way, you can alleviate some of the issues related to big tables. A classic example is partitioning by date: you can keep a month's worth of data in each table and span the view over 12 tables to cover the whole year. A nice thing about partitioned views is that you can index the underlying tables differently, so for example the history tables can have more indexes allowing for faster data retrieval, while the current month's table can be optimized for faster inserts and updates.

    Let's have a look at this scenario. I will use the table Sales.CurrencyRate from the AdventureWorks database as an example. This table contains 13532 rows in my database, but let's imagine we have 135 million rows.

    When I run this query, I get 37 rows spanning years 2001 - 2004:

        select count(*) NumberOfRates, Year([CurrencyRateDate]) RateYear, Month([CurrencyRateDate]) RateMonth
        from Sales.CurrencyRate
        group by Year([CurrencyRateDate]), Month([CurrencyRateDate])
        order by RateYear, RateMonth

    We will create four tables, one for each year represented in the CurrencyRateDate column.

    A slightly abbreviated DDL for the Sales.CurrencyRate table is as follows:

        CREATE TABLE [Sales].[CurrencyRate](
            [CurrencyRateID] [int] IDENTITY(1,1) NOT NULL,
            [CurrencyRateDate] [datetime] NOT NULL,
            [FromCurrencyCode] [nchar](3) COLLATE Latin1_General_CI_AS NOT NULL,
            [ToCurrencyCode] [nchar](3) COLLATE Latin1_General_CI_AS NOT NULL,
            [AverageRate] [money] NOT NULL,
            [EndOfDayRate] [money] NOT NULL,
            [ModifiedDate] [datetime] NOT NULL CONSTRAINT [DF_CurrencyRate_ModifiedDate] DEFAULT (getdate()),
         CONSTRAINT [PK_CurrencyRate_CurrencyRateID] PRIMARY KEY CLUSTERED
        (
            [CurrencyRateID] ASC
        ) WITH (IGNORE_DUP_KEY = OFF) ON [PRIMARY]
        ) ON [PRIMARY]

    If we want an updateable partitioned view, we need to modify the primary key to include the date column, which will allow the engine to identify the partition that has to be modified during a DML operation. The easiest way is to add the CurrencyRateDate column to the primary key.

    Let's create the four tables then, one for each year of the data. I will not include the whole DDL for all of the tables here; you can find it in the attached file. This is, however, the first partition table:

        CREATE TABLE [Sales].[CurrencyRate_2001](
            [CurrencyRateID] [int] NOT NULL,
            [CurrencyRateDate] [datetime] NOT NULL,
            [FromCurrencyCode] [nchar](3) COLLATE Latin1_General_CI_AS NOT NULL,
            [ToCurrencyCode] [nchar](3) COLLATE Latin1_General_CI_AS NOT NULL,
            [AverageRate] [money] NOT NULL,
            [EndOfDayRate] [money] NOT NULL,
            [ModifiedDate] [datetime] NOT NULL CONSTRAINT [DF_CurrencyRate_2001_ModifiedDate] DEFAULT (getdate()),
            CONSTRAINT CHK_CurrencyRateDate_2001 CHECK (CurrencyRateDate >= '20010101' and CurrencyRateDate < '20020101'),
         CONSTRAINT [PK_CurrencyRate_2001_CurrencyRate_2001ID] PRIMARY KEY CLUSTERED
        (
            [CurrencyRateID] ASC,
            [CurrencyRateDate] ASC
        )
        ) ON [PRIMARY]

    As you can see above, the CurrencyRateDate column has been included in the primary key of the table, and I added the constraint CHK_CurrencyRateDate_2001 that takes care of the range of dates that can exist in the table. You can use an open-ended inequality constraint if you'd like that table to store rows for dates from 2001 or older, as sketched below. The tables for 2002, 2003 and 2004 contain constraints disallowing data outside of their ranges, similar to the constraint shown above.
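    For example, a minimal sketch of such an open-ended constraint, shown as an ALTER TABLE you would use instead of the range constraint above (the constraint name is just for illustration):

        --allow any date up to the end of 2001 in the oldest partition table
        alter table Sales.CurrencyRate_2001
            add constraint CHK_CurrencyRateDate_2001AndOlder check (CurrencyRateDate < '20020101')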

    I also removed the IDENTITY property from the CurrencyRateID column in the partition tables to make the view updateable; this is one of the requirements.

    The view is defined as follows:

        create view Sales.vCurrencyRate
        as
        select * from Sales.CurrencyRate_2001
        union all
        select * from Sales.CurrencyRate_2002
        union all
        select * from Sales.CurrencyRate_2003
        union all
        select * from Sales.CurrencyRate_2004


    Let's insert some data into the view, that is, into the underlying partition tables:

        insert Sales.vCurrencyRate
        (
            [CurrencyRateID],
            [CurrencyRateDate],
            [FromCurrencyCode],
            [ToCurrencyCode],
            [AverageRate],
            [EndOfDayRate],
            [ModifiedDate]
        )
        select [CurrencyRateID],
            [CurrencyRateDate],
            [FromCurrencyCode],
            [ToCurrencyCode],
            [AverageRate],
            [EndOfDayRate],
            [ModifiedDate]
        from Sales.CurrencyRate

    This is the execution plan created for the insert:

    [Execution plan screenshot: PVInsert1]

    As you can see, the database engine split the data amongst all the partition tables according to the partitioning column constraints. This way, you can easily move data from one table into several smaller partitions. Of course, you wouldn't want to run a simple insert statement like the one above for 135 million rows, but you get the idea.
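    For a really big table, a more realistic approach (just a sketch, reusing the objects from this example) would be to load each partition table directly, one date range at a time, so that each statement touches a single target table and can be split into smaller batches if needed:

        --load the 2001 partition directly; repeat per year (or in smaller batches)
        insert Sales.CurrencyRate_2001
            ([CurrencyRateID], [CurrencyRateDate], [FromCurrencyCode], [ToCurrencyCode],
             [AverageRate], [EndOfDayRate], [ModifiedDate])
        select [CurrencyRateID], [CurrencyRateDate], [FromCurrencyCode], [ToCurrencyCode],
            [AverageRate], [EndOfDayRate], [ModifiedDate]
        from Sales.CurrencyRate
        where CurrencyRateDate >= '20010101' and CurrencyRateDate < '20020101'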

    Partitioned views can be very useful, but apart from the conditions I listed earlier in this article, there are some catches that might bite you, and you have to ponder your design carefully if you want to make them updateable.

    Here are some of those gotchas:

    Tables referenced in updateable partitioned views cannot have an IDENTITY column. If they do, you will get the following error message:

    Msg 4433, Level 16, State 4, Line 3
    Cannot INSERT into partitioned view 'AdventureWorks.Sales.vCurrencyRate' because table '[AdventureWorks].[Sales].[CurrencyRate_2001]' has an IDENTITY constraint.

    Tables cannot have computed columns if you want an updateable partitioned view. If you try to insert rows into a view over tables that have such columns, the error message is the following:

    Msg 4406, Level 16, State 1, Line 1
    Update or insert of view or function 'Sales.vSalesOrderHeader' failed because it contains a derived or constant field.

    The CHECK constraint on the partitioning column has to be defined using equality or inequality operators. For example, you cannot use datepart(year, …) to define the date range for the column; you have to use either the BETWEEN .. AND .. or the >= .. < form. Otherwise the error message will be as follows:

    Msg 4436, Level 16, State 12, Line 3
    UNION ALL view 'AdventureWorks.Sales.vCurrencyRate' is not updatable because a partitioning column was not found.


    Partitioned views do not have to be updateable. You can design tables with computed columns or IDENTITY columns and still use partitioned views to offload certain queries. One of the scenarios is, for example, a daily upload of data. You can create a staging table which accepts the data and, after the ETL is finished, alter the view definition to include the new table and possibly exclude the one with the oldest data, as sketched below.
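    A minimal sketch of that switch, assuming a hypothetical Sales.CurrencyRate_2005 table has just been created and loaded, and the 2001 table is being retired from the view:

        --Sales.CurrencyRate_2005 is hypothetical here; it would have been created and loaded beforehand
        alter view Sales.vCurrencyRate
        as
        select * from Sales.CurrencyRate_2002
        union all
        select * from Sales.CurrencyRate_2003
        union all
        select * from Sales.CurrencyRate_2004
        union all
        select * from Sales.CurrencyRate_2005
        go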

    I attach the script for your convenience. You can run it against the AdventureWorks database or use it as a template for your own code.

  • Hello

    Hello.

    This is my first post on SQLBlog. I have been blogging for a few years now. Initially I used the SQLBlogCasts platform, but in June I moved my blog to Blogspot. I underestimated, though, the drop in my blog's visibility. I fought hard, but the visibility hasn't returned to appropriate levels, and it is hard to stay determined if your work is not being used. Thankfully, Adam Machanic offered me a helping hand and a blog on SQLBlog.com, a proposition that is very hard to refuse. I feel honored to be listed on the same page with the biggest names of the SQL Server world.

    I am going to write mainly about topics related to database design, and about issues I encounter in my work and their solutions. I may sometimes write about SSIS and SSRS as well, because these products are also within the scope of my attention.

    It took a while to set everything up, plus I was on vacation, but I am finally ready to go.
