
Merrill Aldrich

  • SQL File Layout Viewer at SQL Saturday Chicago 2014

    I'm very excited to be speaking at SQL Saturday Chicago again tomorrow, 4/26. I'll be doing a deep dive session on SQL Server file and storage internals with this freeware tool, SQL File Layout Viewer:

    http://sqlblog.com/blogs/merrill_aldrich/archive/2013/10/09/sql-file-layout-viewer-1-2.aspx

    Also looking forward to seeing all my Chicago SQL Server friends. Come on out!

    http://sqlsaturday.com/291/schedule.aspx

  • Snap All the Things!

    Database snapshots can be super handy for things like trying changes in a development environment, or for reverse engineering processes. For example, you can say, “Let’s run this test and then roll the database right back,” or, “Let’s run this process and see what it modifies by comparing the database to a prior snapshot.” That said, I always found it painful to compose the actual statements to create the snapshots, especially, say, for all the user databases on a whole instance. The pain is in the picky details of locating all the logical files and all the physical paths in all the databases. I thought, “Wow, it would be nice to just code-gen the statements to snap all the databases on an instance.”

    Doing this code-gen in T-SQL is also tough, because you have to run queries against catalog views like sys.database_files, in the context of each database, and that leads to dynamic SQL, and dynamic SQL leads inevitably to suffering. To the Shell!

    The statements we need to compose look like:

    CREATE DATABASE [myDB_Snap] ON
        ( name = LogicalName1, filename = 'PhysicalPath1' ),
        ( name = LogicalName2, filename = 'PhysicalPath2' ),
        <etc.>
        ( name = LogicalNameX, filename = 'PhysicalPathX' )
    AS SNAPSHOT OF [myDB] ;

    Conceptually, then, we need to loop over all the databases on the instance, and for each database loop over all the data files. For each file, we need to construct a new physical file name, and then compose the resulting list of files into the middle section of the CREATE DATABASE statement.

    In PowerShell, we can easily loop over all the databases and gather up all their data files with a combination of Get-ChildItem and foreach{}. Assuming you have the SQL Server 2012 sqlps module loaded, that’s something like this, replacing ‘localhost’ and ‘sql2012’ as appropriate for your machine:

    Set-Location SQLSERVER:\sql\localhost\sql2012\databases
    Get-ChildItem | 
    foreach {
        # Looping Through DBs
        $_.name
    }

    In order to drill into the data files for each database, we need to go through the <database>\FileGroups node. We can do that with a pair of nested loops, like the following. This example has some extra statements that just output the PowerShell paths, so we can see how we’re navigating the Database\FileGroups\Files tree:

    Set-Location SQLSERVER:\sql\localhost\sql2012\databases
    Get-ChildItem | 
    foreach {
       "Database: $($_.name)"
       $fgroot = ( join-path $_.name 'FileGroups' )
       "Location of FileGroups: $fgroot"
       Get-ChildItem $fgroot |
       foreach {
          $fgpath = ( join-path $fgroot $_.name )
          "Location of Files: $fgpath"
          Get-ChildItem ( join-path $fgpath 'Files' ) |  
          foreach {
            "File: $($_.name) at $($_.filename)"
          }      
       }
    }

    At this point, we have the bones of that CREATE DATABASE statement – we just need to make one CREATE DATABASE statement for each database, and in the body of each of those statements compose an array of comma-separated ( name = x, filename = ‘y’ ) clauses. Here’s the resulting script, with one or two other features added:

    # Compose a T-SQL Script to snap all the user databases on an instance
    
    # Databases to skip:
    $excludedDBs =  @( 'master','model','msdb','tempdb' )
    
    # Suffixes to add to the snapshot files and the db snapshot name:
    $fnsuffix = 'ss'
    $DBsuffix = '_SNAP'
    
    # Arrays/variables used to build the SQL statements:
    $sqlstmtarray = @()
    $onclausearray = @()
    $sqlstmt = ''
    
    Set-Location SQLSERVER:\sql\localhost\sql2012\databases
    
    Get-ChildItem | 
    where { ! ( $excludedDBs -contains $_.name ) } |
    foreach {
       # Loop over databases on the instance
    
       $fgroot = ( join-path $_.name 'FileGroups' )
       Get-ChildItem $fgroot |
       foreach {
          # Loop over the filegroups in the current DB
    
          $fgpath = ( join-path $fgroot $_.name )
          Get-ChildItem ( join-path $fgpath 'Files' ) |  
          foreach {
            # Loop over the data files in the current filegroup
    
            # Compose a line like '( name = logicalname, filename = physicalname + suffix )' for each file
            # and add the line to an array for use later
            $onclausearray += "`( name = [$($_.name)], filename = `'$($_.FileName)$($fnsuffix)`' `)"
    
          }      
       }
       
       # Use the results to compose a Create Database statement 
    
       $sqlstmt = "CREATE DATABASE [$($_.name)$($DBsuffix)] ON`n"
       $sqlstmt += ( $onclausearray -join ", `n" ) # -join converts the array to a comma-sep list
       $sqlstmt += "`n"
       $sqlstmt += "AS SNAPSHOT OF $($_.name) ;"
       $sqlstmtarray += $sqlstmt
       
       # Clear variables for the next iteration
       $sqlstmt = ''
       $onclausearray = @()
    }
    
    # Output all the Create Database statements
    
    $sqlstmtarray

    That’s that! This works for most simple cases, by putting the snap files in the same directories as the original files, and just adding ‘ss’ to the end of the source databases’ physical file names.
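
    Once a snapshot exists, the “roll the database right back” scenario mentioned above is a single RESTORE statement. A minimal sketch using the placeholder names from the template at the top of this post (note that any other snapshots of the database have to be dropped before you can revert):

    -- Revert myDB to the state captured in the snapshot, then drop the snapshot.
    -- Placeholder names from the template above; do this only in dev/test.
    RESTORE DATABASE [myDB] FROM DATABASE_SNAPSHOT = 'myDB_Snap' ;

    DROP DATABASE [myDB_Snap] ;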

  • Speaking in Minneapolis Tuesday Jan 21

    I am happy to offer a session on Storage Internals using File Layout Viewer at the Minneapolis SQL Server User Group January 21, 2014.

    The meeting details are here:

    http://minnesota.sqlpass.org/ 

    SQL Server File Layout Viewer:

    http://sqlblog.com/blogs/merrill_aldrich/archive/2013/10/09/sql-file-layout-viewer-1-2.aspx 

    Come on down and geek out with us!

  • Automation? Think Causation, not Correlation

    I do a fair amount of operations work in my DBA role, so obviously automation is one of my favorite topics. In fact, I get a little twitchy when I see people plowing through repetitive, rote, manual processes because, to me, those people could instead be doing more creative, satisfying and useful work. Still, that doesn’t mean we ought to stumble ahead attempting to automate everything.

    This post is for T-SQL Tuesday #050: “Automation, how much of it is the same?” hosted by SqlChow ( http://sqlchow.wordpress.com/ | @SqlChow )

    From his blog:

    “T-SQL Tuesday is a monthly blog party hosted by a different blogger each month. This blog party was started by Adam Machanic ( blog | @adammachanic ). You can take part by posting your own participating post that fits the topic of the month and follows the requirements below. Additionally, if you are interested in hosting a future T-SQL Tuesday, contact Adam Machanic on his blog.”

    While I like automation, I love well thought-out, clear, reliable automation. I always ask myself a few questions when trying to create an automated solution - but I’ve never formalized them into a list that could be shared. That got me thinking: What is it that makes for really good automated solutions? Is there some checklist that could be applied?  Here’s what goes through my head.

    First ask, “Should we do this?” It seems obvious, but there are good and bad candidates for automation. Next, assuming the process at hand is conducive, spend a little quality time on “How should we do this?” There are myriad ways to automate different systems, and I’ve seen some pretty spectacular hammer-the-screw solutions, as I’m sure you have. In real teams there is often a variety of skills and preferences about tools, and it’s important to be objective about picking the right tool over the simply expedient or familiar one. Pick a technology that fits both the task and the organization. Do something expected, so you don’t set a booby trap for others.

    Next, once the solution is coded up, it’s worth looking at with a critical eye. Is this any good? Does it really fit the original problem? Will it run reliably?

    Lastly, ask whether your teammates or even those who come after you will appreciate that you created a helpful solution, or suffer under it as it breaks down or causes problems.

    One theme underlying all these small decisions is to create causation – make things happen for a clear reason and make one action follow the next in a clear, predictable, cause-and-effect way. This goes for the “meta” decisions about whether and how to automate, and also for the details of how your solution is coded.

    So, here’s a short worksheet with some talking points that you can use individually or with your team to improve the effectiveness of automation.

    Should We?

    The first question to ask is whether it’s worthwhile to automate a process. I find that by the time you are even asking this question the answer is probably “yes,” but it’s worth pausing to consider. Is it in the business’s interest to automate? Among the thousands of things you could work on, does this one help the organization? A simple example: If I have to restore one database in an ad-hoc fashion, I don’t automate it. I might even (gasp) use the GUI. Why? Because there’s no benefit from the time invested. On the other hand, if my team has to regularly restore from backups for testing, then absolutely, we automate that.

    Other questions:

    Is the manual process conducive to automation?

    Is the process repetitive?

    How much brain power is involved in performing the existing task? Is there risk of errors from the inconsistency of people performing the work, or, conversely, are there so many choices in the process that automation would not provide the same quality of outcome?

    Will it pay off to invest the extra time to write something production-ready? Anyone can dash off a script in a few minutes, but that’s not really production-ready code that is good enough to drop into an automated solution.

    Will having a machine do this work free up a person to do more useful work?

    How?

    First, an anecdote: I once worked with a guy who was a *nix admin, and we had some automation we needed for basic OS tasks, like move a file from here to there, etc. He didn’t know PowerShell, so he went to the Windows server involved, installed Cygwin and wrote a shell script. This is a smart guy – he would not have been able to do that otherwise – but that was the wrong answer. Why? Because we were a 99% Windows shop, and he created something he personally would have to support forever. It would be just as wrong to go into a Linux shop and use PowerShell to script stuff. It’s not because the technology is better or worse, it has to do with the context of available people, skills, systems and the task.

    Other questions:

    Have you picked a tool that fits the task? I find that the more complicated the code has to be, and the more you struggle to write it, the more likely it is that the tool is the wrong one. It could also be that you need to learn the tool better, but assuming basic skills, in today’s world most things we face day to day are common problems and so the right tool will feel like an easy fit.

    Are you reinventing the wheel? If you are working really hard, pause and think, “There has to be a code library for this.” Use the available frameworks to plug together existing functionality, because almost everything has been done already by someone else.

    Is this method supportable in my organization? Is it something the other staff will expect as a reasonable solution? Or will they say, “I don’t understand that,” “I’ve never seen this before,” “This tool is totally unfamiliar,” or, worst of all, “You did what?” Consider whether the design follows naturally from the problem, or is forced and bent to solve it.

    What about Qualtiy?*

    Once we’ve coded something up that we think fits, it’s still worthwhile to look at it with a critical eye. This can be tough, psychologically, for some folks because, naturally, we all have some pride in the things we produce. When I was in Architecture school, one of the most painful but also useful activities was to constantly have to hang the work up on the wall and have people critique it, sometimes ruthlessly. That process was humbling, but it definitely created a mindset where you can be objective about your work.

    Quality questions:

    Is it finished? The first draft of one of these solutions probably isn’t the real answer. Think carefully before pronouncing the design “done.”

    Is it brittle?

    Is it a landmine? A landmine is something that will sit there working away for some time, but then explode, out of nowhere, all over staff that doesn’t know it’s there or how it works.

    Is it a Rube Goldberg Machine? Are there too many moving parts? I like to keep a Rube Goldberg Machine illustration at my desk, just as a reminder.

    Pretend you are a person who has never seen this before. When designing it, did you use a tool people would expect, in the way that tool was designed to be used?

    List the ways this automation will fail, and what happens for each type of failure. What on the list is preventable?

    Are there race conditions? Invalid assumptions? One of my favorites: “At the start of this script we assume that file ‘x’ is available in this location at 10:00.” That might be true sometimes – but where does said file come from? What happens on the day it’s not there? Late? Is it possible, or better, to create a cause and effect relationship where after a source system successfully delivers a file, the next step in the process is kicked off?

    Did you test it? “Happy path” testing, or real testing?

    Will the People After Us Hate Our Memory and Curse Us Forever?

    Lastly, I find that it’s incredibly useful to consider what people will think of this solution later. While you can’t predict the future, and not every choice can be perfect, if there’s an automated system that is constantly breaking down, or that nobody understands, it’s worth paying attention. Why is it so awful? Beyond the specific details of what is broken, are there patterns and practices that were not followed that would have prevented the situation?

    Can your automation recover from an error? How big does the error have to be to require manual intervention?

    Does it tell anyone when there’s a failure? I have tried to work with a commercial file transfer system that didn’t notify anyone or retry when a file failed to copy from one location to another. You can imagine how that goes, in the middle of the night, on call, etc.

    Lastly, does the system plug in neatly to a modular, Enterprise pattern of scheduling and/or monitoring and notification? No one is happy in a large organization with tens or hundreds of little scripts, one that sends an email, one in VBScript hiding in Windows Scheduled Tasks, one that logs to a file, one that logs to the event log, one in SQL Agent, one in Orchestrator, one in SSIS, one in Oracle Enterprise Manager. Over time, standardize how automation is driven with a rational scheduling system, and how notification works in case of failures. In the real world perfection probably is unrealistic here, but having standard patterns helps tremendously.

    True story: we had a SQL Agent job that called a VBScript that purged files from a share to make up for a badly designed web service that needlessly cached thousands of files to disk. This turned out to be a classic landmine: It worked, but nobody expected it to be there, doing what it was doing, because it just didn’t make sense. After a server move, the files filled that cache and brought the associated system down. Poor automation.

    One More Thing

    I’ve been careful not to point out any specific language or system or tech in this post, because I think these questions apply across all Ops-type technologies. Whatever environment you work in can enable high quality solutions if you pay attention to these issues.

    That said, Hemanth asked in the T-SQL Tuesday invitation what tools we like for automation, and, I have to say PowerShell is the best thing that has happened to Windows in the Enterprise in a long, long time. It makes practically everything on this list better, at least in a Windows world. PowerShell FTW!

    * Yes, misspelled.

  • Organize Those Holiday Photos with PowerShell

    My parents are here visiting with us for a little holiday vacation – and my mother is getting to be more and more computer savvy. She has an iPhone and a digital camera, and loves to take pictures. In fact, today she has almost 6,000 photos between the two devices! She takes so many photos that the media on the devices fills up, and then she goes back through the old photos, deleting some just to make space. But moving the files to a computer or a DVD is still beyond her reach, so when she comes to visit there is usually a little family time spent burning all the photos she has taken since the last time we were together to disks.

    There are so many pictures and movies this time that I created a PowerShell snippet to organize them into folders by year and month. In case anyone else needs to organize a giant folder full of files by date, I thought I would share:

    Set-Location '\\Machine\MerrillAldrich\Mom''s Pictures'
    
    get-childitem -Filter *.jpg | foreach {
        # Build year and month strings from each file's creation time
        $y = get-date $_.creationTime -Format 'yyyy' 
        $m = get-date $_.creationTime -Format 'MM'
        # Target folder is <year>\<month>; create it if it doesn't exist yet
        $folder = join-path $y $m
        if ( ! ( test-path $folder ) ) {
            mkdir $folder 
        }
        # Move the photo into its year/month folder
        move-item $_ $folder
    }

    In this case we put the photos onto my MacBook Pro, but PowerShell is so handy it was worth sharing the folder out on the network to run this little script from my ThinkPad.

    Happy New Year!

  • CIOs: Stop Mandating Training

    I love to learn about new technology, and I especially love a long deep-dive technical session with a real expert or a well-crafted, inches thick technical book. Even if either one is expensive. Learning is probably my favorite thing to do.

    Yet I stand before you with an appeal: Stop “sending people to training.”

    Why would I say such a thing? Because failure is baked right into that very phrase: “sending people to training.”

    Death by Training

    Most of us in the IT world have probably experienced this scenario:

    The company has “vouchers” for “training” to be used at some training facility. A group is “sent” there to learn about some technology we ostensibly lack that will help the organization. In the group we have, let’s call him “Joe,” who is relieved to be out of the office with a legitimate excuse not to do any work, because it saves him from the work of not doing any work in the office. There is also “Phil” who has real work to do, but can’t, because he’s been sent to training. His phone is likely to ring a lot in the session, and he may or may not pay attention. Next to Phil is “Pat” who goes to every possible training session with enthusiasm, but as far as anyone can tell has not retained much of any use from those many hours. Undaunted, she is eager to soak up this WhateverTech 2.0 because it really is the future, etc. etc. etc.

    At the front of the room is a barely-interested instructor who just read up on this subject matter the night before, and will lead us in a death march of alternating PowerPoint slides, interactive exercises and coffee/bio breaks.

    This is deadly. It’s a waste of time for the individuals involved and a waste of money for the company. If there is anyone capable in this episode, it’s even debilitating for them. This will fail, because the people participating aren’t going to provide the outcome you seek. It’s because they are not engaged, or not capable, or both. It’s because the situation is framed by expecting them to sit through training, not expecting them to succeed at their work. It almost guarantees the vouchers and the hours are wasted.

    Expect Learning, not Training

    What can you do instead? The technology landscape does change rapidly and the people on your team really do need to stay engaged in what is new, to keep your organization efficient and competitive. What you need is a culture of learning, not a culture of training. Consider whether you can incorporate these ideas into your team:

    1. Expect learning instead of expecting training. Create a culture in your team where new skills are valued because there’s a genuine sense that they matter, and reward the people who bring those new skills.
    2. Measure the team’s performance and make visible where new skills and tech have made a difference. Someone automated a horrible manual process? Celebrate that. Fixed a reliability issue that plagued operations? Celebrate. Created an HA solution where you needed one? Super. But make sure these gains are real, so the tech staff who know what’s what on the floor don’t become jaded about false praise.
    3. Support your team in finding quality learning opportunities that they are excited about. In my field that’s something like the PASS Summit, or SQL Skills Immersion Events. Fund those, and let people go (notice I didn’t say “send them”). If they aren’t excited about real learning opportunities, then start to wonder about how they fit into this learning culture you need to create.
    4. Make learning a legitimate part of your team’s time – “I don’t have time to learn PowerShell” is not something you want to hear, just for example. Then make it an expectation. Then find and reward the results that people produced who stepped up with real commitment.
    5. Find the real skills gaps on your team, and proactively locate people to fill them. Look ahead six months or a year and see if you have people to successfully meet the changes coming down the road. Ask your best technical staff where the gaps are.

    This might look different in different organizations, and people have different styles. I can read a 1,200 page technical book, if it’s a good one. It takes hours – but the organization benefits from that time as I get better at my job. I realize not everyone wants to read like that. Some people use the web, some like conference attendance or conventional training (good quality training, not PowerPoint death marches), and still others would like mentoring.

    No matter the style, it’s up to you to make the structure and create a set of expectations where the people on your team can, and want to, learn and grow. If they are personally committed then the team becomes unstoppable.

  • Why Hekaton In-Memory OLTP Truly is Revolutionary

    I just returned from the PASS Summit in Charlotte, NC – which was excellent, among the best I have attended – and I have had Dr. David DeWitt’s talk rolling around in my head since he gave it on Thursday. (Dr. DeWitt’s portion starts at 27:00 in the keynote recording.) I probably cannot do it justice, but I wanted to recap why Hekaton really is revolutionary, and not just a marketing buzzword.

    I am normally skeptical of product announcements, and I find too often that real technical innovation can be overwhelmed by the desire to sell product, and then buried under so many layers of hype that it can be hard to decipher the real value. Ask me about *cough cough* Exadata sometime, or Big Data, if you want proof. In-Memory Databases can also tend this direction.

    This is not that. Dr. DeWitt was genuinely excited by the technology itself, and so am I. So what is it, and why is it important? First let me talk about what it isn’t. I have to use the line of Dr. DeWitt’s argument a lot here, so I hope he will forgive me for borrowing it.

    Hekaton is not Just Your Tables in RAM

    Simply having enough memory to hold the active portions of a database in memory is not enough to make real progress on the performance of database systems on today’s hardware. If you have only worked on systems that are disk-bound, it’s easy to imagine that solving the disk problem would immediately resolve everything. I have worked with a few people who argued along that line: “If only we could fit the database in RAM, queries would be instantaneous.” However, as soon as you work on a system with a larger memory resource and see how long it still takes to process data, just between the RAM and CPU, you quickly realize that huge RAM is not a silver bullet. Even with all the data in a conventional buffer cache, with little disk IO or very fast disk IO, there’s certainly a ceiling.

    What makes Hekaton significant is that the engineers realized that to make orders of magnitude performance increases in in-memory databases, some deep architectural changes would be required. I think, over and above Columnstore, that this is the most significant change in database tech that I have seen. This is a change that keeps the basic principles of the relational model intact, while totally revising the engine inside the server. It’s a bit like making a car, which all of us can still drive, and which feels familiar, with a brand new kind of engine under the hood. Not just a faster engine -- a different kind of engine.

    This is still ACID, still driven by transactions and SQL, and we still interact with it using the same principles, but because there’s a new technique for working on rows under the hood, it’s dramatically faster than just tables in memory behind the existing query processor.
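
    To make that “familiar surface, new engine” point concrete, here is a minimal, hypothetical sketch of what this looks like in the SQL Server 2014 CTP. The table and procedure names are made up, the database is assumed to already have a MEMORY_OPTIMIZED_DATA filegroup, and the exact syntax could change before release:

    -- Hypothetical example, SQL Server 2014 CTP syntax. Assumes the database
    -- already has a MEMORY_OPTIMIZED_DATA filegroup and file.
    CREATE TABLE dbo.OrdersInMemory (
        OrderID    int          NOT NULL
            PRIMARY KEY NONCLUSTERED HASH WITH ( BUCKET_COUNT = 1048576 ),
        CustomerID int          NOT NULL,
        OrderDate  datetime2(0) NOT NULL
    ) WITH ( MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA );
    GO

    -- A natively compiled procedure: compiled to machine code, and run against
    -- the in-memory table using the optimistic, versioned concurrency model.
    CREATE PROCEDURE dbo.InsertOrder
        @OrderID int, @CustomerID int, @OrderDate datetime2(0)
    WITH NATIVE_COMPILATION, SCHEMABINDING, EXECUTE AS OWNER
    AS
    BEGIN ATOMIC WITH ( TRANSACTION ISOLATION LEVEL = SNAPSHOT, LANGUAGE = N'us_english' )
        INSERT dbo.OrdersInMemory ( OrderID, CustomerID, OrderDate )
        VALUES ( @OrderID, @CustomerID, @OrderDate );
    END
    GO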

    It’s not Just Removing “SQL-type” Relational Locks

    One of the sound-bite friendly bits of this release is that it’s “lockless” or “latch-less,” and that’s a concept I can imagine being hyped and creating a lot of misconceptions. This is not just SQL Server running without locks. If that were the case, it would not work.

    First, some background. There are two kinds of locks: locks down at the programming level of the server software itself that are used for code synchronization and locks that we use at the relational level to enforce transactional consistency. I find it helps to think of it this way: the developers that write SQL Server have to create parallel processing code for the server that works without crashing or memory corruption. It has to be stable, while allowing multiple threads to run at the same time, accessing shared items in memory but not corrupting them. In Comp. Sci. or non-SQL circles this synchronization is an important ingredient that makes parallel processing possible, and it’s conventionally implemented with “locks.” Note this, though: synchronization is mandatory. Locks are just the developer’s traditional method to deliver that synchronization. But this is not related to database transactions. It’s not the locks we talk about in SQL Server. It’s about whether the server code will actually run. I’ll return to this in a moment.

    To prevent confusion, when we DBAs have to fiddle with the server at this level to troubleshoot issues, the naming convention in SQL Server land for this code-sync lock is a latch. So, when we mean source-code level parallel thread safety we refer to the thing that implements that as a latch, while developers working on other programs than SQL Server might call it a lock.

    Atop this, the database system itself should deliver to its users an extra layer of transactional consistency that makes the data in the database logically correct. This is also mandatory. Database Locks are the conventional method of implementing transactional consistency at the database level for users.
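
    Incidentally, both layers show up separately in the wait statistics DBAs already watch: LCK_M_* waits for the relational locks, and latch-type waits for the code-level synchronization. A quick, illustrative query (any recent version of SQL Server):

    -- Illustrative only: compare relational lock waits with latch waits
    SELECT wait_type, waiting_tasks_count, wait_time_ms
    FROM sys.dm_os_wait_stats
    WHERE wait_type LIKE 'LCK_M%'        -- relational (transaction) lock waits
       OR wait_type LIKE 'PAGELATCH%'    -- buffer page latch waits
       OR wait_type LIKE 'LATCH%'        -- non-buffer latch waits
    ORDER BY wait_time_ms DESC ;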

    So:

    Code-level synchronization is required so the server doesn’t crash or behave erratically.

    Transactional consistency is required so that we get the results we expect from the database.

    Locks and latches are a conventional and proven way to achieve the dual requirements of synchronization and consistency. You can’t just take them away without providing a substitute that also works. This is where the genius starts to show in Hekaton.

    To remove the delays created by database-level transactional locking, they didn’t just rip the locks out. They engineered an optimistic, row-versioning-driven way to process changes that is still transactionally consistent. Without that innovation, not having locks would be a problem, not an optimization. Hekaton is remarkable for providing that alternate means that still provides consistency, without the use of locks as its implementation. At a high level it looks a bit like snapshot isolation does for reads, except that it works for writes, and it’s much more optimal for working with in-memory data structures.

    It’s not Just Removing Latches

    The new query processing method, using optimistic, row-versioned concurrency even for writes inside the engine, would, on its own, be a huge accomplishment. But wait! There’s more. To return to the question of code synchronization, we still need that down at the binary level to make the server run. That requirement doesn’t go away.

    So what happened here? Well, as Dr. DeWitt describes, a gentleman called Maurice Herlihy did work in the area of providing code synchronization with a different, far faster, method than latches. This is a problem similar to, but not exactly the same as, the locking problem. They share the symptom of blocking threads and creating idle CPU cycles, by keeping a sync lock on a shared resource that multiple threads want to work on. This issue, though, moves beyond user transactions and has to work at the source-code level to guarantee synchronization. Without this, the program itself would tip over, values would be corrupted in memory, bad things would happen.

    Dr. DeWitt could not explain the ‘how’ here in depth; it’s probably too complex for his talk. The important idea that I took away is that there now exist techniques that still guarantee code synchronization for parallel work but do not use latches as the implementation of that code synchronization. Whatever the alternate implementation is, it’s dramatically more efficient and solves this problem in a new way. I think the proof of this did show in the performance numbers in the presentation. I would love to know more, but I take him at his word that this works.

    It’s not Just Faster Machinery

    The third thing I really appreciated in the talk is the constant mention of CPU instructions per user transaction, instead of time per user transaction. This orientation to saving instructions per transaction is the key to faster systems in the future, as the number of cores in processors increases but their speed doesn’t. I think the emphasis on making the processing of in-memory data more CPU efficient is fantastic.

    Someone finally had the insight to see that burning CPU cycles was no big deal on existing machines, when we were typically disk bound, but now, when there are no cycles to spare because we are not waiting for the disk, it has become strategically important to make fewer cycles process more data. And Hekaton does that. With the combination of the two locking and latching architecture changes above, plus native machine-code compilation to eliminate the overhead of interpreted code, the same CPUs can do more work in fewer instructions.

    The reason I am excited about this is that it will bring big performance gains to little guys like me running a 2-socket machine. This is not something that just pulls together a lot of fancy, expensive machinery (infiniband, PCI flash, scale-out, offloading, etc.) – this is a new way of implementing the query processor that will give huge performance advantages, perhaps even running on the exact same hardware.

  • October Update to Rules-Driven Maintenance

    Happy Fall! It’s a beautiful October here in Minneapolis / Saint Paul. In preparation for my home town SQL Saturday this weekend, as well as the PASS Summit, I offer an update to the Rules-Driven Maintenance code I originally published back in August 2012. It’s hard to believe this thing is now more than two years old – it’s been an incredible help as the number of databases and instances my team manages has grown.

    One enhancement with this update is the ability to set overrides for both Index and Statistics maintenance on a per-object basis, for example to use full scan on specific tables, or set custom thresholds or exclusions for index maintenance.

    Statistics maintenance has gotten a lot smarter, kicking sp_updatestats to the curb in favor of a more intelligent, adaptive statistics update based on the findings in the “Stats on Stats” post below.

    The basic concepts for this solution remain the same: preferences, which act as rules, are stored in an administrative database locally on each SQL Server, and SQL Agent jobs look at those preferences to decide how to manage all the databases on the instance. This structure allows hundreds or even thousands of SQL Server databases to be maintained and backed up using a single code base, with just five jobs.
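
    Purely as an illustration of that idea – this is not the actual schema that ships with the download – the rules amount to a small table of per-database preferences that the jobs consult at run time, something of this general shape:

    -- Hypothetical illustration only; the real table layout is in the solution itself
    CREATE TABLE dbo.MaintenancePreference (
        DatabaseName  sysname  NOT NULL PRIMARY KEY,
        DoBackup      bit      NOT NULL,
        DoIndexMaint  bit      NOT NULL,
        DoStatsMaint  bit      NOT NULL
    );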

    A detailed description and instructions are available back in the original post.

    As with the other version updates, the code here is cumulative, so it can be installed from scratch or over the top of a previous version.

    This software is free under the GNU public license.

  • SQL File Layout Viewer 1.2

    Just ahead of presenting it at SQL Saturday in my home town of Minneapolis / Saint Paul, I’m happy to release an updated version of the SQL Server File Layout Viewer. This is a utility I released back in March for inspecting the arrangement of data pages in SQL Server files.

    If you will be in Minneapolis this Saturday (space permitting), please come out and see this tool in action!

    New Features

    Based on feedback from others in the SQL Server community, I made these enhancements:

    1. Page types now provide descriptive labels.
    2. There’s a new ‘magnifier.’ Click on any page in the analysis image and the surrounding area is enlarged to make details easier to see.
    3. Two new display options will highlight fragmentation using a red band, and show the amount of free space on pages both graphically and with a readout.
    4. Right-click in the analysis image to run DBCC PAGE from a context menu (an example of that command appears below the screenshot).

    [Screenshot: SQL File Layout Viewer 1.2]
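
    For reference, the command behind that context-menu item is the undocumented but widely used DBCC PAGE. Run by hand against a test database it looks roughly like this; the database name here is just a placeholder, and the arguments are database, file id, page id, and print option (0-3):

    -- Send DBCC PAGE output to the client session rather than the error log
    DBCC TRACEON ( 3604 );

    -- Dump file 1, page 1 of a test database ( database, file id, page id, print option )
    DBCC PAGE ( 'StatsDemo', 1, 1, 3 );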

    The utility has been tested with simple demo databases on SQL Server 2005 - 2012, from Windows 7 and 8 clients. It is not compatible with SQL Server 2000, or databases in 2000 compatibility mode. It requires sysadmin rights on the target SQL Server. As always, a reminder: please do not run this tool on a production database, as it will scan the entire thing and may cause performance problems for your server. Scanning a huge database probably will yield unpredictable results.

    Details

    In the download package you will find an executable file and a Visual Studio 2010 Express Edition solution with all the source code. If you are only interested in running the utility, just run the .exe at the top level of the package. If you are interested in the source code, the complete project is there for your use.

    This is free software. You may download and use it as you see fit, but without warranty of any kind. You are welcome to read or modify the source code if you like, but the code is bound to the included GPL (GNU General Public License) and you may not patent or sell it.

  • Stats on Stats

    I have been working on new features for the Rules-Driven Maintenance Solution, including per-index maintenance preferences and more selective processing for statistics. SQL Server stats is a topic I knew about at a high level, but lately I have had to take a deeper dive into the DMVs and system views and wrap my head around things like sys.dm_db_stats_properties() and sysindexes.rowmodctr.

    Many people have published great stuff on this, and I am indebted to Erin Stellato before and after she joined SQL Skills, also Kendra Little and Grant Fritchey, Ted Krueger, Microsoft, and others. So this post is no earth-shaking new discovery. These are just some notes from my studies.

    First, if you want to move beyond SQL Server’s auto update statistics and a nightly run of sp_updatestats, it’s important to understand where things stand with the statistics objects in the database. I wanted statistics metadata queries that:

    1. Show the age of the statistics values
    2. Show how heavily the underlying table or index has been modified since a statistics update
    3. Show the size of the indexes
    4. Allow me to create a formula relating modifications to size
    5. Work with table partitioning
    6. Work with multiple schemas

    Most of these are straightforward except for the “how heavily has the table been modified” bit. As these other authors point out, there are two places to locate that information: in older versions, the best source is the rowmodctr column in the legacy sysindexes view, and in newer versions it’s the management function sys.dm_db_stats_properties(). The latter is only available post-RTM, in service packs, for SQL Server 2008 R2 and 2012.
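
    A quick aside: to tell which source a given instance supports, a simple existence check works; a minimal sketch:

    -- Minimal sketch: check whether the newer DMF exists on this instance
    IF EXISTS ( SELECT 1 FROM sys.all_objects WHERE name = 'dm_db_stats_properties' )
        PRINT 'Use the sys.dm_db_stats_properties() query below' ;
    ELSE
        PRINT 'Use the legacy sysindexes.rowmodctr query below' ;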

    So, there are really two queries – no way around it:

    Update 21 Sept 2013: I found some improvements for these queries and updated them in place.

    /* Legacy sysindexes example */
    
    SELECT  statsSummary.schemaName,
            statsSummary.tableName,
            statsSummary.statsName,
            statsSummary.object_id,
            statsSummary.stats_id,
            statsSummary.statsUpdate,
            statsSummary.statsType,
            statsSummary.indexName,
            statsSummary.index_id,
            statsSummary.modCounter,
            rowcounts.row_count 
    FROM (
            SELECT  sc.name schemaName,
                    o.name tableName,
                    s.name statsName,
                    s.object_id,
                    s.stats_id,
                    STATS_DATE(o.object_id, s.stats_id) statsUpdate,
                    CASE WHEN i.name IS NULL THEN 'COLUMN' ELSE 'INDEX' END AS statsType,
                    ISNULL( i.name, ui.name ) AS indexName,
                    ISNULL( i.index_id, ui.index_id ) AS index_id, 
                    si.rowmodctr AS modCounter
            FROM sys.stats s
                INNER JOIN sys.objects o ON s.object_id = o.object_id
                INNER JOIN sys.schemas sc ON o.schema_id = sc.schema_id
                LEFT JOIN sys.indexes i ON s.object_id = i.object_id AND s.stats_id = i.index_id
    
                -- Getting rowmodctr from sysindexes for legacy versions
                LEFT JOIN sys.sysindexes si ON s.object_id = si.id AND s.stats_id = si.indid
    
                -- If the statistics object is not on an index, get the underlying table
                LEFT JOIN sys.indexes ui ON s.object_id = ui.object_id AND ui.index_id IN ( 0, 1 )
        ) AS statsSummary
        INNER JOIN (
            SELECT object_id, index_id, SUM(row_count) row_count
            FROM sys.dm_db_partition_stats
            GROUP BY object_id, index_id
            HAVING SUM( row_count ) > 0
        ) AS rowCounts ON statsSummary.object_id = rowCounts.object_id 
            AND statsSummary.index_id = rowCounts.index_id
    
    /* 2008 R2 SP2 and higher example */
    
    SELECT  statsSummary.schemaName,
            statsSummary.tableName,
            statsSummary.statsName,
            statsSummary.object_id,
            statsSummary.stats_id,
            statsSummary.statsUpdate,
            statsSummary.statsType,
            statsSummary.indexName,
            statsSummary.index_id,
            statsSummary.modCounter,
            rowCounts.row_count 
    FROM (
            SELECT  sc.name schemaName,
                    o.name tableName,
                    s.name statsName,
                    s.object_id,
                    s.stats_id,
                    STATS_DATE(o.object_id, s.stats_id) statsUpdate,
                    CASE WHEN i.name IS NULL THEN 'COLUMN' ELSE 'INDEX' END AS statsType,
                    ISNULL( i.name, ui.name ) AS indexName,
                    ISNULL( i.index_id, ui.index_id ) AS index_id, 
                    sp.modification_counter AS modCounter
            FROM sys.stats s
                INNER JOIN sys.objects o ON s.object_id = o.object_id
                INNER JOIN sys.schemas sc ON o.schema_id = sc.schema_id
                LEFT JOIN sys.indexes i ON s.object_id = i.object_id AND s.stats_id = i.index_id
    
                -- If the statistics object is not on an index, get the underlying table
                LEFT JOIN sys.indexes ui ON s.object_id = ui.object_id AND ui.index_id IN ( 0, 1 )
    
                CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) AS sp
        ) AS statsSummary
        INNER JOIN (
            SELECT object_id, index_id, SUM(row_count) row_count
            FROM sys.dm_db_partition_stats
            GROUP BY object_id, index_id
            HAVING SUM( row_count ) > 0
        ) AS rowCounts ON statsSummary.object_id = rowCounts.object_id 
            AND statsSummary.index_id = rowCounts.index_id

    Using one or the other of these, it’s possible to make a list of statistics objects with a comparison of modifications to the underlying table, row count and so on, and then build update statements to act accordingly. This can be useful for large tables where reaching the auto update threshold would take too long, or to prevent updating statistics on tables where there have been only a few modifications but sp_updatestats would update the statistics anyway, possibly with a lower sample value than one would like. The classic example here is rebuilding an index, which gives you stats with perfect sampling, and then replacing those perfect stats with worse ones as a result of updating statistics with default sampling.
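
    As a sketch of that idea – using the dbo.stats_metadata view defined in the demo later in this post, and a purely illustrative 10 percent threshold – the code-gen can be as simple as:

    -- Illustrative sketch: build UPDATE STATISTICS commands only where modifications
    -- exceed 10 percent of the rows; the threshold and FULLSCAN choice are examples only
    SELECT 'UPDATE STATISTICS '
           + QUOTENAME( schemaName ) + '.' + QUOTENAME( tableName )
           + ' ' + QUOTENAME( statsName ) + ' WITH FULLSCAN;'
    FROM dbo.stats_metadata
    WHERE modCounter > 0.10 * row_count ;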

    The Problem with Rowmodctr

    Ted Krueger has a really nice demo showing some cases when auto updates will happen here. Ted does a great job explaining the 500-plus-20-percent rule that SQL Server seems to use for auto update (note, though, the xevents code in the post is only for SQL Server 2012 or later). But there is one gotcha with the way the server counts modifications to a table.

    Ted doesn’t do this, but in other contexts it’s easy to throw around the 500-plus-20-percent “of the table is modified” rule without strictly observing what that means. For example, is that rule the server ensuring that 20 percent of the distinct rows in the table – that is, a number of rows that comprise 20 percent of the total number of rows in the table – get modified? Probably not. How do repeated updates to the same rows count? What about data in the table/index that isn’t a key, but just an included column?

    Secondly, it’s important to note for auto-update that the actual action of updating the statistics is not triggered by the changes to the data. I used to think that it was triggered by a select statement against the table where the statistics are invalidated by modification, but that, as it turns out, is only partly true. Really the statistics are only updated when the optimizer needs to use them to make a new query plan. There are cases where a query is already cached, and the threshold for auto update is exceeded, but the statistics are not updated because the optimizer has not requested an update in order to optimize a new query.

    The developers for SQL Server obviously have more access to internal data and counters than we do. Reading between the lines on this type of statistics update vs. table modification DMV issue, I get the sense they are a little bit circumspect about making all this data fully exposed and documented. So, we have what we have, which is STATS_DATE(), sys.dm_db_stats_properties().modification_counter and sys.sysindexes.rowmodctr.

    Here’s a demo showing some of the nuances of this, and how the rowmodctr or modification_counter act in perhaps counterintuitive ways.

    First, set up a test database with views that contain slightly tweaked versions of the stats metadata queries, so we can call them easily, and a test table and indexes:

    CREATE DATABASE StatsDemo;
    GO
    
    USE StatsDemo
    GO
    
    CREATE VIEW dbo.stats_metadata AS
    
    SELECT  statsSummary.schemaName,
            statsSummary.tableName,
            statsSummary.statsName,
            statsSummary.object_id,
            statsSummary.stats_id,
            statsSummary.statsUpdate,
            statsSummary.statsType,
            statsSummary.indexName,
            statsSummary.index_id,
            statsSummary.modCounter,
            rowCounts.row_count 
    FROM (
            SELECT  sc.name schemaName,
                    o.name tableName,
                    s.name statsName,
                    s.object_id,
                    s.stats_id,
                    STATS_DATE(o.object_id, s.stats_id) statsupdate,
                    CASE WHEN i.index_id IS NULL THEN 'COLUMN' ELSE 'INDEX' END AS statstype,
                    ISNULL( i.name, ui.name ) AS indexName,
                    ISNULL( i.index_id, ui.index_id ) AS index_id,
                    sp.modification_counter AS modCounter
            FROM sys.stats s
                INNER JOIN sys.objects o ON s.object_id = o.object_id
                INNER JOIN sys.schemas sc ON o.schema_id = sc.schema_id
    
                -- If a statistics object is on an index, get that index:
                LEFT JOIN sys.indexes i ON s.object_id = i.object_id AND s.stats_id = i.index_id
    
                -- If the statistics objects is not on an index, get the underlying table:
                LEFT JOIN sys.indexes ui ON s.object_id = ui.object_id AND ui.index_id IN ( 0, 1 )
                OUTER APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) AS sp
        ) AS statsSummary
        INNER JOIN (
            SELECT object_id, index_id, SUM(row_count) row_count
            FROM sys.dm_db_partition_stats
            GROUP BY object_id, index_id
            HAVING SUM( row_count ) > 0
        ) AS rowCounts ON statsSummary.object_id = rowCounts.object_id 
            AND statsSummary.index_id = rowCounts.index_id
    where schemaName = 'dbo'
    
    GO
    
    CREATE VIEW dbo.stats_metadata_legacy AS
    
    SELECT  statsSummary.schemaName,
            statsSummary.tableName,
            statsSummary.statsName,
            statsSummary.object_id,
            statsSummary.stats_id,
            statsSummary.statsUpdate,
            statsSummary.statsType,
            statsSummary.indexName,
            statsSummary.index_id,
            statsSummary.modCounter,
            rowcounts.row_count 
    FROM (
            SELECT  sc.name schemaName,
                    o.name tableName,
                    s.name statsName,
                    s.object_id,
                    s.stats_id,
                    STATS_DATE(o.object_id, s.stats_id) statsUpdate,
                    CASE WHEN i.index_id IS NULL THEN 'COLUMN' ELSE 'INDEX' END AS statsType,
                    ISNULL( i.name, ui.name ) AS indexName,
                    ISNULL( i.index_id, ui.index_id ) AS index_id,
                    ISNULL( si.rowmodctr, usi.rowmodctr ) AS modCounter
            FROM sys.stats s
                INNER JOIN sys.objects o ON s.object_id = o.object_id
                INNER JOIN sys.schemas sc ON o.schema_id = sc.schema_id
    
                -- If a statistics object is on an index, get that index:
                LEFT JOIN sys.indexes i ON s.object_id = i.object_id AND s.stats_id = i.index_id
                LEFT JOIN sys.sysindexes si ON i.object_id = si.id AND i.index_id = si.indid
    
                -- If the statistics objects is not on an index, get the underlying table:
                LEFT JOIN sys.indexes ui ON s.object_id = ui.object_id AND ui.index_id IN ( 0, 1 )
                LEFT JOIN sys.sysindexes usi ON ui.object_id = usi.id AND ui.index_id = usi.indid
        ) AS statsSummary
        INNER JOIN (
            SELECT object_id, index_id, SUM(row_count) row_count
            FROM sys.dm_db_partition_stats
            GROUP BY object_id, index_id
            HAVING SUM( row_count ) > 0
        ) AS rowCounts ON statsSummary.object_id = rowCounts.object_id 
            AND statsSummary.index_id = rowCounts.index_id
    WHERE schemaName = 'dbo'
    GO
    
    CREATE TABLE dbo.SomeData(
        id int IDENTITY(1,1) NOT NULL,
        val1 nvarchar(50) NULL,
        val2 nvarchar(50) NULL,
        val3 nvarchar(50) NULL,
     CONSTRAINT PK_SomeData PRIMARY KEY CLUSTERED  (
            id ASC
        )
    );
    
    CREATE NONCLUSTERED INDEX IX_Val1 ON dbo.SomeData (
        val1 ASC
    );
    GO

    Next, check the statistics views. They should be empty, because our test table starts with no rows, and those views filter out zero-row tables:

    SELECT * FROM dbo.stats_metadata
    SELECT * FROM dbo.stats_metadata_legacy

    Now we will put a row in the test table. This will invalidate the statistics, and then when the optimizer needs to optimize a query that uses the key columns in the indexes – not when the row is inserted – the stats will auto-update. Look closely at the results at each stage, and the timestamps on the stats update column. Also note that the placement of GO is important, because we need the text of the select statements/batches to be identical later in the demo:

    -- From zero to more than zero rows
    
    INSERT INTO dbo.SomeData (
        val1,
        val2,
        val3
    ) VALUES (
        'A', 
        'B', 
        'C' 
    )
    GO
    
    SELECT * FROM dbo.stats_metadata
    SELECT * FROM dbo.stats_metadata_legacy
    GO
    
    SELECT val1 FROM dbo.SomeData WHERE val1 = 'A'
    GO
    
    SELECT * FROM dbo.stats_metadata
    SELECT * FROM dbo.stats_metadata_legacy
    GO
    
    SELECT val1 FROM dbo.SomeData WHERE id = 1
    GO
    
    SELECT * FROM dbo.stats_metadata
    SELECT * FROM dbo.stats_metadata_legacy
    GO

    I get this, which is expected:

    [Screenshot: statistics metadata query results]

    The mod counter and row_count columns make sense, and the statsUpdate dates support the idea that a stat for an index is updated when the optimizer needs to use it for a query and the data in the table has been changed.

    Now things get less intuitive, though.

    -- From 1 to 20 rows
    
    INSERT INTO dbo.SomeData (
        val1,
        val2,
        val3
    ) VALUES (
        'A', 
        'B', 
        'C' 
    )
    GO 19
    
    SELECT val1 FROM dbo.SomeData WHERE val1 = 'A'
    GO
    SELECT val1 FROM dbo.SomeData WHERE id = 1
    GO
    
    SELECT * FROM dbo.stats_metadata
    SELECT * FROM dbo.stats_metadata_legacy
    GO
    

    So far, so good. 20 rows does not trigger an update. At this point the mod counters show ‘19’ changes, which makes sense.

    Now, what happens if we do a lot of updates, but just to a single row:

    UPDATE dbo.SomeData SET val1 = 'Z' WHERE id = 1
    GO 1000
    
    SELECT val1 FROM dbo.SomeData WHERE val1 = 'A'
    GO
    SELECT val1 FROM dbo.SomeData WHERE id = 1
    GO
    
    SELECT * FROM dbo.stats_metadata
    SELECT * FROM dbo.stats_metadata_legacy
    GO
    [Screenshot: statistics metadata query results after the single-row updates]

    OK, now wait a second. We updated the table 1000 times (more than 500 + 20 percent). We updated a column that is in both the nonclustered index and also in the clustered index. Why is the mod counter 19 for the clustered index and 1019 for the non? And how is the mod counter so much higher than the actual number of rows in the table? And why no statistics update? The date stamp did not change. Lots of questions.

    Consider: the updated column is a key in the nonclustered index but just a plain old column in the clustered index. So it appears, and this makes sense, that modifications are tracked to the keys related to statistics but not other columns. So the mod counters update when we change a key value of a specific index, and just for that index.

    Next, consider whether the select query is being optimized at this point, or is in cache from before. Hmm. More on this later.

    Lastly, I want to point out that the mod counter has been bumped 1000 times, even though we only executed updates against one row, and didn’t change the actual value for 999 of those updates.

    So: the modification counter is not sensitive to which rows in a table are changed. It’s not “20 percent of the rows” in that rule of thumb. What it really seems to mean is that each time an update/insert/delete is performed, the counter is simply incremented by the number of rows affected by that operation. It doesn’t mean they were different rows or even different values. This makes sense, because you would have quite a lot of overhead to actually track how many distinct rows are changed, with before and after values. I believe what happens is a bit like, for every query, “take the result of @@ROWCOUNT where the query changed a key column in the index, and add that to the modification counter.”

    If you need more proof, here’s a query that does change a value on each iteration, but for the same row:

    UPDATE dbo.SomeData 
    SET val1 = CASE val1 WHEN 'A' THEN 'Z' ELSE 'A' END 
    WHERE id = 1
    GO 1000
    
    SELECT val1 FROM dbo.SomeData WHERE val1 = 'A'
    GO
    SELECT val1 FROM dbo.SomeData WHERE id = 1
    GO
    
    SELECT * FROM dbo.stats_metadata
    SELECT * FROM dbo.stats_metadata_legacy
    GO
    Still no stats update. Same pattern in the stats metadata.

    How about, then, changing more of the rows?

    UPDATE dbo.SomeData 
    SET val1 = CASE val1 WHEN 'A' THEN 'Z' ELSE 'A' END 
    --where id = 1
    GO 1000
    
    SELECT val1 FROM dbo.SomeData WHERE val1 = 'A'
    GO
    SELECT val1 FROM dbo.SomeData WHERE id = 1
    GO
    
    SELECT * FROM dbo.stats_metadata
    SELECT * FROM dbo.stats_metadata_legacy
    GO

    Now this is really odd. My statistics metadata shows no update to the stats and 22,019 modifications to the 20 row nonclustered index, but still 19 changes to the clustered index. Huh?

    When you think about it, though, it does make sense. The mod counter is just incremented with the number of affected rows for keys in the index. If a key is not affected, then the counter doesn’t increment. Therefore the statistics for the clustered index are still valid – no keys have changed. There have been no changes to those keys since we populated the table, even though we have changed other data. Statistics about non-key columns are not relevant, so changes to them are not relevant.

    Now let’s bulk up the table, past those milestones of 500 rows and so on:

    -- From 20 to 499 rows
    
    INSERT INTO dbo.SomeData (
        val1,
        val2,
        val3
    ) VALUES (
        'A', 
        'B', 
        'C' 
    )
    GO 479
    
    
    SELECT val1 FROM dbo.SomeData WHERE val1 = 'A'
    GO
    SELECT val1 FROM dbo.SomeData WHERE id = 1
    GO
    
    SELECT * FROM dbo.stats_metadata
    SELECT * FROM dbo.stats_metadata_legacy
    GO
    
    -- From 499 to 500 rows
    
    INSERT INTO dbo.SomeData (
        val1,
        val2,
        val3
    ) VALUES (
        'A', 
        'B', 
        'C' 
    )
    GO 1
    
    
    SELECT val1 FROM dbo.SomeData WHERE val1 = 'A'
    GO
    SELECT val1 FROM dbo.SomeData WHERE id = 1
    GO
    
    SELECT * FROM dbo.stats_metadata
    SELECT * FROM dbo.stats_metadata_legacy
    GO
    
    -- From 500 to 5000 rows
    
    INSERT INTO dbo.SomeData (
        val1,
        val2,
        val3
    ) VALUES (
        'A', 
        'B', 
        'C' 
    )
    GO 4500
    
    SELECT val1 FROM dbo.SomeData WHERE val1 = 'A'
    GO
    SELECT val1 FROM dbo.SomeData WHERE id = 1
    GO
    
    SELECT * FROM dbo.stats_metadata
    SELECT * FROM dbo.stats_metadata_legacy
    GO
    
    

    [Screenshot: statistics metadata query results at 5,000 rows]

    Would it surprise you to note that there still has been no update to statistics? Or that the mod counter now shows 26,999 changes to this 5,000 row table?

    The answer? Well, I took a look in the plan cache:

    SELECT UseCounts, Cacheobjtype, Objtype, text, query_plan
    FROM sys.dm_exec_cached_plans 
        CROSS APPLY sys.dm_exec_sql_text(plan_handle)
        CROSS APPLY sys.dm_exec_query_plan(plan_handle)
    WHERE text like '%SELECT%FROM%SomeData%'

    In my run of this demo, I have 9 (!) versions of the two test select statements above that are in cache, some are adhoc, some are auto parameterized. But they are optimized already.

    If I do this (only in test, please!):

    DBCC FREEPROCCACHE

    And then I do this:

    SELECT val1 FROM dbo.SomeData WHERE val1 = 'A'
    GO
    SELECT val1 FROM dbo.SomeData WHERE id = 1
    GO
    
    SELECT * FROM dbo.stats_metadata
    SELECT * FROM dbo.stats_metadata_legacy
    GO

    I do finally get a statistics update, and the mod counters reset to 0:

    [Screenshot: statistics metadata after the statistics update]

    (This does not mean you should free the proc cache on your server – this is just a demo.)

    So, what are the takeaways here?

    1. Rowmodctr and modification_counter do not track the “percentage of the table that has been modified,” which is a common misconception. I think they really track the cumulative number of rows affected by DML operations against key values in the index.
    2. The counters know nothing about whether the modifications were to one row repeatedly or to many different rows. They are not perfect – and I don’t think they can tell you “how much” of the table has changed.
    3. Auto update stats is triggered by query optimization – that is, the optimizer takes a “new” query whose statistics have been invalidated and, assuming auto update stats is on, requests fresh statistics, which are collected just before that query is optimized. On a real, running system this probably happens very quickly, but as this demo shows, there are cases where a plan sits in cache and no optimization ever happens to trigger the stats update.

    This was very educational for me, and it is shaping how I plan to update the rules-based maintenance from “any row changed” for stats to something, hopefully, more accurate. The mod counter values are so coarse, though, that I’m finding it can’t be an exact science.
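
    For what it’s worth, the direction I’m leaning for that maintenance is something like “update stats when the modification counter passes some fraction of the row count.” Here’s a rough sketch only – the 20 percent threshold is an arbitrary assumption, not a recommendation:

    /* Sketch: list user-table statistics whose modification counter exceeds
       20 percent of the row count (the threshold is an arbitrary placeholder) */
    SELECT  OBJECT_NAME(s.object_id) AS table_name,
            s.name                   AS stats_name,
            sp.rows,
            sp.modification_counter
    FROM    sys.stats AS s
    CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) AS sp
    WHERE   OBJECTPROPERTY(s.object_id, 'IsUserTable') = 1
      AND   sp.modification_counter > 0.20 * sp.rows;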

  • Small-Footprint Laptop Lab

    Champagne Dev VMs on a Beer Budget

    Others have posted on this very topic, but I rebuilt my home/dev VM lab this past weekend, and it went so well I thought I’d share the steps I went through, in case it helps someone else get started. I got my start doing this a while back with a very helpful post by Jonathan Kehayias on Virtualbox, but my home lab has evolved since.

    As I see it, today you have two choices for this sort of work:

    1. A cloud service like Azure or Amazon, etc.
    2. A small, local VM farm on your laptop with your own storage and hypervisor.

    There are advantages and disadvantages to each, obviously. Using the cloud you can have zero footprint on your own hardware, and not worry too much about disks and so on. On the other hand, you are tied to an internet connection, which can cramp your style when doing presentations.

    The cloud frees you from a lot of setup tasks. Setup can be, depending on your perspective, dull work you need to plow through or a learning experience. I personally still like going through the process because I learn so much each time. Building servers and performing installation is still part of my job, at least for the foreseeable future. With the cloud you typically click through a selection from a catalog and your VM comes pre-built.

    Just for the experience, I did both in the past few days.

    Azure VM

    Azure is quite straightforward. Here’s a screenshot showing an RDP session to my remote Azure VM, which came with SQL Server 2014 CTP preinstalled:

    [Screenshot: RDP session to the Azure VM]

    This is a VM with SQL Server installed – not the Azure PaaS SQL Server offering – so I have the whole OS to fool with here. Creating this was super simple. I have the advantage of an MSDN subscription, which provides credits on Azure for dev/test work. I activated that, and then just followed the instructions on the web site to provision a VM from a catalog image of a Windows 2012 / SQL 2014 CTP virtual machine. Literally click New > Virtual Machine > From Gallery and choose from the catalog.

    You end up with the machine, RDP access, and also a nice portal dashboard with stats about the VM:

    [Screenshot: Azure portal dashboard for the VM]

    A few things commend this approach:

    1. Super easy
    2. Nothing on your machine
    3. Not permanent – the VM can be deleted easily when you are done
    4. This is what the future looks like, truly. It’s useful to learn about it and mitigate what Grant Fritchey has aptly called Buggy Whip Weaving.

    Old-School VMs

    [Photo: external USB 3 SSD enclosure]

    It’s still feasible, of course, to install VMs locally, and these days it doesn’t cost much. I do this with 16 GB RAM in my laptop and a modest USB 3 external disk enclosure with a Crucial SSD inside. This setup works well for demo and learning VMs. I did have to purchase a USB 3 express card, my laptop lacking built-in USB 3, and the only downside is that the express card itself has proven dodgy*. It does not like fancy stunts like “Sleep” or “Restart” or “Eject” and so on. But I blame the card, not the solution architecture.

    Hosting the VMs requires a hypervisor. I have tried several, but settled on Windows 8’s built-in Hyper-V: it’s free with the OS, all the basics work, and it has good PowerShell support, which I appreciate.

    The main challenges with Hyper-V I experienced:

    1. No easy clipboard support to copy and paste into and out of VMs with the console
    2. Sharing the laptop’s wifi connection with the VMs just flat-out does not work. In order to connect the VMs to the Internet, which I only do intermittently for stuff like patching, I have to plug in to a physical network. I believe this is by design, but I don’t exactly understand the reasoning behind it. I spent a fair amount of time trying to fight this one but gave up.

    Steps

    For this generation of my home lab I followed this basic outline:

    1. Download Windows Server 2012 Datacenter Evaluation from Microsoft as a VHD. This is a pre-built virtual hard drive with the OS “installed” on it, ready to complete setup and use.
    2. Download SQL Server 2012 Enterprise Evaluation Edition as an ISO. I went this way because I do want to go through the setup steps.
    3. Extract the delivered VHD download to my external SSD disk and rename it.

    For the setup I wanted, I need a domain controller and three or four SQL Servers. SSDs are fast, but capacity is expensive, so I found I could save a lot of space by using Hyper-V’s differencing disks. This feature lets you make a “parent” hard drive image file and then derive multiple virtual hard drives from it, storing only the differences. Other hypervisors have the same feature. Using this technique I am able to store four servers’ hard disks in about 28 GB on disk. The trick is choosing an install order that maximizes the shared disk content.

    Four VMs’ hard drives with Differencing:

    [Screenshot: the four VMs’ differencing disks in Hyper-V]

    I got good results with these steps:

    Make a parent virtual hard drive for the OS

    1. In Hyper-V manager, make a virtual machine, and choose the downloaded Windows 2012 VHD file from Microsoft as the hard drive.
    2. I use two virtual switches in my Hyper-V setup, one for internal communication and one to connect out to the Internet. I set the VM up with a virtual network adapter for each.
    3. Start the VM and go through the setup process. This process creates the “parent” hard drive for the differencing disks, storing the bulk of the bits for the OS, but I also want to embed the things that all the servers will share. So:
    4. Patch the machine by configuring Windows Update while connected to the Internet.
    5. Install .NET Framework 3.5 in Server Manager > Features while connected to the internet. This is important because SQL Server requires this, but it’s not on the VHD from Microsoft, and feature setup has to download it using Windows Update.
    6. This provides a working server hard drive that will allow us to derive the others without needing as much space as a full copy of the VHD.
    7. Sysprep this server by running sysprep.exe with the “generalize” option, so we can make clones.

    Make a domain controller

    1. The first real machine is the DC. Shut down the “parent” vm above.
    2. In Hyper-V Manager, edit the properties of the virtual machine (or make a new one and specify that you’ll “attach a virtual hard disk later.”)
    3. Set the hard drive to be a new VHD file, using the differencing option. This file will use snapshot technology to retain only the blocks on the disk that are different from the shared parent disk, in a sense de-duplicating most of the hard drive content.
    4. Choose the virtual hard drive from above as the parent for the differencing disk.
    5. Boot the VM and go through the configuration tasks.
    6. The DC will want to have a static IP address. Visit Settings > Control Panel > Network and Internet > Network and Sharing Center > Change adapter settings. Right-click the network adapter and choose properties. Locate Internet Protocol Version 4. Click Properties and set the adapter to a static IP address. To keep things simple I usually use 10.10.10.1, then 10.10.10.2, then 3, 4, as I add virtual machines to the lab.
    7. From Server Manager, add roles and features, add the AD domain controller role, and follow the wizard there to set this up as the DC for the lab. I just use a domain called “arch.lab.”
    8. That gets us a little domain controller.

    Make a parent virtual hard drive for SQL Servers

    1. Create a new VM in Hyper-V Manager, using the same technique: choose the option to “assign a virtual hard drive later,” then add a new virtual hard drive using the Differencing option, and the OS parent hard drive from the first section above.
    2. Boot this new VM, and run through the OS setup.
    3. Install SQL Server on this vm, by mounting the SQL Server ISO file as a DVD in the virtual “dvd drive” provided by Hyper-V.
    4. Sysprep this VM, so it, too, can be cloned.
    5. Shut this VM down. This virtual hard drive will become the parent for all the SQL servers, basically de-duplicating the SQL Server product bits, and again saving space.

    Make the actual SQL Servers

    1. Create three or four more virtual machines in Hyper-V Manager, and for each make a new virtual hard drive of the differencing type, using the SQL Server VHD from the preceding section as the parent. Each of these machines will act as if SQL Server were “preinstalled,” because of the previous steps, and the redundant bits are stored only once in the parent VHD file.
    2. Start these and run through the configuration steps to set them up.
    3. Set a unique static IP address for these machines like the DC.
    4. Use Server Manager > Local Server > Computer Name to add each machine to the lab domain.

    Conclusion

    This was fun and educational for me, and now I have a fresh lab for talks I am giving this fall. You can, I hope, see the appeal of Azure, though. Even though I made only one Azure VM, it took a few clicks and most of the installation pain is just not there. Making the same four VMs as on my local lab, I can tell, would be fairly simple.

    Update 20 Sept 2013: I found the Express card problem (Huzzah!). It was the defaults for PCI power saving in my laptop BIOS.

  • Education, Money, Happiness

    It’s back-to-school time. The MCM program has been canceled. A good time to reflect.

    I have two little boys, you might know, Will and Robbie. Will is a third grader this year, and Robbie starts Kindergarten tomorrow. I have been through “school selection,” which, especially if you have kids, really means, “Where must we live so our kids have a decent education?” With the littlest one starting school in earnest, and the cancelation of the MCM - and the hand-wringing and worry and blogging about that throughout the SQL community – I have been mulling over this problem of education a lot.

    School and I go Way Back

    I am probably an over-educated nerd, to be honest. I have, taken together, eight years of college education. I don’t have exactly the degree you would expect from all those years, but they were all valuable – the papers I do have are for two Architecture degrees. I studied Architecture for six years, computer science for two. Before that I was in a classic New England prep school, where going to college was just an expected norm. I did an exchange program in Sweden during that time, and lived and went to a college prep program in Linköping (pronounced roughly leen sherping). Both my parents are retired teachers from U.S. public schools, my father a French teacher, my mother, a music teacher.

    I even taught, albeit as a very young visiting lecturer rather than a professor, at a prestigious university for a time. I worked alongside career academics and got a little of the flavor of that life.

    Architecture is difficult, and heavily regulated, and so has a very long and onerous program of university, internship, testing and accreditation, followed by more continuing ed and more accreditation. When I went to school it took at least 8-10 years from high school graduation to become an architect. I never did, though I worked in the field for ten years. Most architects never make much money. Some stars do. The movie image of an architect is just that. Imaginary.

    What I mean to say is, I have spent my whole life in relationship to schools, to teaching, to certifications, to the measure of a framed piece of paper on the wall, and what that means in light of career, money and happiness. It was an expectation from my parents, who, as teachers, are whole-hog invested in education as a basic moral imperative. Elaborate certification was a specter during my whole time designing buildings, one that I resisted. “Certs,” as we call them in IT, are a constant topic in the SQL Server community. Now, as a parent, I have to make up my mind about how this continues with my boys.

    Consider This

    This is a very good series of short videos from T.V. personality Mike Rowe, that I found online via Andy Leonard:

    http://profoundlydisconnected.com/alternative-education/ 

    Also consider this related older thread from Andy, which is thought-provoking:

    http://sqlblog.com/blogs/andy_leonard/archive/2011/06/27/my-name-is-andy-and-i-have-an-associates-of-applied-science-degree.aspx

    Common Myths

    I don’t know much, but after long years dealing with and thinking about these issues, I know for a fact that our society carries some mythology, perpetuated by media and some institutions, that doesn’t help families make good choices about education.

    Myth: College is About Getting a Job. College is Job Training.

    There are some college programs that look very much like job training – engineering, for example, or law or medicine. It might be that a large percentage of graduates from those programs go on to work in their “chosen field.” It might even be that new entrants into some professions have to have a degree in the field. Still, and call me crazy, I am weary of hearing that education is only important to ensure young people have job training, so they can be employed, because that idea is so misguided. We sell our whole culture short by constantly reducing this issue to employment (pun intended). We sell our young people short every time someone makes a wisecrack like, “oh, Art degree – would you like fries with that? Har, har. Yuk, yuk.” Yet this is what we hear, over and over: employment as the only important rationale for education. People sometimes say, “Education can enrich your life,” and that is more true – but they are rarely taken seriously.

    Here’s the truth: if you live in fear of not having essentials like food, heat, a roof over your head, you’ll be miserable. If you live in constant worry that you will lose your job, you’ll be miserable. If you live with a horrible job that you hate, but you have to keep it to survive, you’ll be miserable.

    If you look at those statements, you might note that college is “orthogonal” – an overeducated word for “unrelated” – to any of those things. I have a good job and make a decent living in a field totally unrelated to the professional program I attended in school. It doesn’t mean the education was lost, or didn’t have value, it means the real value of that education isn’t job training.

    Myth: College Will Help You Make a lot of Money.

    Truth: C students and dropouts are often richer than we are, the nerdy A students. Anybody out in the workplace with open eyes should know that. There might be exceptions for some Ivy-league CxOs or surgeons or lawyers, but in my experience education and serious wealth are not directly related. Why is that? This could be a whole other post, but to summarize my belief on this topic:

    • Nobody makes serious, exotic-car-collection, multiple-house money by working for a wage or salary. You make money by controlling wealth-generating assets. This is because other people will not pay you great piles of money for your individual labor, they will only pay you great piles of money because you control assets like companies, labor, material, technology. There is little related to education in that equation. What it takes is the insight and skill to gain control of assets – if that’s your cup of tea.
    • You never make serious money unless the main thing you are paying attention to is making money. If your priorities are elsewhere, other people probably get the money. That’s not necessarily a bad thing, but there it is. There are a few stories of people who, for example, invented some technology and stumbled into riches, but in fact the majority of rich people are rich by focusing single-mindedly on getting rich. University education will provide you no end of distractions from getting rich.
    • Getting and keeping serious money requires being blind to, or stomaching, certain ethical dilemmas. University might well deepen those dilemmas.

    So, if you want great piles of money – and perhaps question that impulse – then don’t focus on higher learning. You need to be literate, and you need to be able to do math and such, but get control of a business instead of worrying about some Master’s degree. It’s more effective to learn business in business than in a university.

    But, I am not rich, and I work for a salary, so at the end of the day, I probably cannot speak with authority on this.

    Myth: College Will Guarantee You Make a lot of Money.

    We do our kids a disservice equally by claiming that going through a college degree program somehow guarantees you wealth. There’s the economy, there’s the job market, there’s supply and demand. There are statistics that show college graduates may be more employable as young people, but for an individual kid there’s no relationship here. If you are smart and resourceful I think good career paths are out there either way. Likewise, a college graduate can be unemployable, or fixated on a specific career for which there are no jobs, or where all the jobs have low pay. This college=rich thing is nonsense.

    Myth: Money Makes People Happy. Therefore, College is Required for Happiness.

    Here’s the double-edged myth that rolls the previous three into a nice surprise package with a bow on top. There’s a message out there, repeated endlessly: College = Money = Happiness. The problem for kids is that, with one shot and a one-way ticket through life, if you discover this is a myth when you are 35 and miserable, a lot of time has been wasted. We need to remind kids that happiness is happiness. Once you get past the basic economic necessities so that you are comfortable, once you have activities and vocation that you find rewarding, it’s a statistically proven fact that happiness is not related to wealth. I think happiness might be related to education in some ways, which I’ll talk about in a minute, but that relationship does not detour through job training and money.

    Myth: College is / is not Worth the Money.

    With all this ranting, you might have the impression I am against sending kids to college – but nothing could be further from the truth. I am only ranting against all the undue emphasis on money. Here’s the thing: college really can enrich your life, but it’s not with dollars. You get one life, and all you have, really, on the last day, is your experience. The point is to live well, and for many people, going to college helps. It helps you grow up, it helps you think about things in new ways, it expands your view of the world. It helps you in your job, but the important ways that it helps are not as job training. It helps you write, for example, or meet deadlines, or communicate. It helps you with strategies for problem solving. Even then, it’s not necessary for everyone.

    After the first one or two jobs that a young person has, it is my fervent hope that employers don’t blindly make college degrees a requirement for job candidates, because all these soft job skills can also be acquired in other ways. It’s not that you must go to college to get them. College just helps some people get those skills faster. After a few years, in many professions, the college degree just should not carry that much weight. Look at the real qualifications of the candidate.

    College is expensive. Is it worth the money? If you went, and you are miserable, then it wasn’t. If you went, and you have a better life because of the experience then it was. The constant analysis of “this degree for this job has ROI where this one doesn’t” entirely misses the point. When I think about my kids’ future, the college question really comes down to this: in their one trip through life, will college help them to be happy, productive, satisfied? That’s all. That’s whether it’s worth the money.

    English degree makes you literate, and being literate makes you happy? You have a decent job in some completely unrelated field, and that job is rewarding? That’s ROI.

    Myth: Everyone Needs a Degree.

    So, I don’t think a college degree makes everyone happy, or employable, or rich. So, while higher ed was right for me, and I am so happy to have it, I’m with Mike Rowe on this. Let’s be sensible about it, instead of forcing it on every kid, for imaginary economic reasons. We should instead, as he argues, make lots of different opportunities for lots of different kids in many fields.

    I’ve made a comfortable living, but not much money. We are not rich. I could have been smarter about what I have earned, in some ways. But my education is part of who I am, and I like knowing the things I learned. I’m happier for it.

    Certs

    That brings me around to the MCM program. I have thought a lot about going after MS certs and the MCM. I’d have a lot to learn, but I do feel like I could earn that. I personally have not gone after the lower certs because, over eight or so years of back and forth about whether they carry real weight, I always seem to land on the side where my experience is enough without the paper. I already have the motivation to hone my skills and learn this DBA craft on my own, without the deadlines or the tests. And with the web, and a modest book budget, the learning is mostly free.

    I have huge respect for those who have earned the MCM title, or who worked to create it, and I can imagine how disappointing it was when Microsoft made their blunder of an announcement. Counting some of the MCMs among my friends, and knowing how expert they are, I know this was a genuine, thorough, high bar for DBAs.

    That said, and not discounting their accomplishments, I think Brent Ozar had a really insightful take on this issue. It’s hazardous aligning something like lifetime educational goals, akin to the way I think about a college experience, with the very simple, hard realities of a technology corporation like Microsoft or Cisco. Those entities are going to do what they need to do economically, and teaching or certification, even while they may be well intentioned, are going to be bound to that corporate interest. I feel like it’s important to keep that in perspective. Insofar as the low-level technical detail of training and certs like this has such a short life – three years? – and all of it is bound, at the end of the day, to selling technology, then it really is the learning that matters.

    The bright side? Microsoft can remove the program, but they can’t take the learning or experience away from the people who participated, and, at least in my view, that’s the important part of the experience anyway.

  • IF NOT EXISTS ( SELECT ‘thisPost’ ) CREATE POST thisPost AS

    T-SQL deployment scripts are tedious to write and they get no love. As a result, they are very often the most horrible, error-prone stepchildren of scripts*. One basic principle that can help is to develop the habit of using the “IF NOT EXISTS” or “CREATE OR ALTER” patterns in these scripts: check the existing database first, before creating new objects. This is far better, in my opinion, than the common “DROP and CREATE” pattern, which can lose data and/or permissions if misapplied. Whenever possible, this type of deployment code should use the SQL Server catalog views, not the deprecated compatibility views – and INFORMATION_SCHEMA has been demonstrated by others to be flaky.

    Make It So

    I like to adopt an approach where deployment scripts

    • Are re-runnable without harm. I’m calling this the “F5 F5 F5 F5 F5” principle. DROP AND CREATE is your enemy.
    • Should complete without errors when successful. (Errors should be real; none of this “ok, expect errors from the script and ignore them.” That’s just dangerous.)
    • Take whatever is there on the server, within reason, and change it to what the script says should be there. This is a declarative approach a bit like Picard’s orders to Number 1. Make It So. If an object is missing, then create it. If it’s there already, then move on. If it needs modification, then change it.

    The hard part about IF NOT EXISTS is that you need all these weird snippets of code to check system views for objects in the database. These are repetitive, but hard to figure out from scratch, and who has time for that? Well, I’m making time. I have had this post idea in the back of my head for a long time. This page will be a clearing house of code snippets to check for the existence of objects, and I’ll come back and add or adjust it in the future. You should be able to just snag what you need for that ugly deployment script and make a few edits.
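
    Before the individual snippets, here’s the general shape every one of them takes – shown against a hypothetical schema named mySchema, which is not one of the snippets below: query a catalog view, and only create the object when nothing comes back.

    /* Illustration of the general pattern, using a hypothetical schema name.
       CREATE SCHEMA must be the only statement in its batch, hence sp_executesql. */
    IF NOT EXISTS ( SELECT * FROM sys.schemas WHERE name = 'mySchema' )
        EXEC sp_executesql @statement = N'CREATE SCHEMA mySchema AUTHORIZATION dbo ;' ;
    GO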

    Use SQLCMD to Stop on Error and for Script Variables

    /* Optional but Recommended: use SQLCMD to set database name and stop on errors */
    
    :on error exit
    :setvar dbname myDatabase
    
    

    Notes: SQLCMD is a great addition to deployment scripts; it lets you avoid hard-coding things like database names, and it can stop script execution if an error is encountered. You have to enable SQLCMD mode in SSMS to take advantage of this (Query > SQLCMD Mode).

    Every Script Needs Database Context

    Somewhere you must direct the deployment script to use the correct database. If you are doing that from a calling function like a PowerShell script, it could happen there. If not, your script needs:

    USE [$(dbname)]

    This should happen one time, near the top, after SQLCMD variable assignments and comments.

    Notes: without this, chances are people will accidentally fill master with junk. Truly.

    Create a Database if it Doesn’t Exist

    /* Create the database only if it's not already present */
    USE master
    
    IF NOT EXISTS ( SELECT name FROM sys.databases WHERE name = '$(dbname)' )
    BEGIN
        CREATE DATABASE [$(dbname)] ;
        ALTER DATABASE  [$(dbname)] MODIFY FILE
        ( NAME = N'$(dbname)', SIZE = 200MB, MAXSIZE = UNLIMITED, FILEGROWTH = 200MB ) ;
        ALTER DATABASE  [$(dbname)] MODIFY FILE
        ( NAME = N'$(dbname)_log', SIZE = 50MB, MAXSIZE = 2048GB, FILEGROWTH = 50MB ) ;
    END
    GO
    
    /* Set any important database-level options */
    ALTER DATABASE [$(dbname)] SET RECOVERY { FULL or SIMPLE } ;
    ALTER DATABASE [$(dbname)] SET AUTO_SHRINK OFF ;
    ALTER DATABASE [$(dbname)] SET AUTO_UPDATE_STATISTICS ON ;
    ALTER DATABASE [$(dbname)] SET READ_COMMITTED_SNAPSHOT { ON or OFF } ;
    ALTER DATABASE [$(dbname)] SET ALLOW_SNAPSHOT_ISOLATION { ON or OFF } ;
    /* ... etc ... */
    

    Notes:

    Edit here 7 Sept 2013 based on Aaron’s comments:

    This snippet works for small, simple databases. Edit the file size and growth settings to appropriate values, but please don’t use the truly awful default values from SQL Server. Avoid relying only on auto grow to size your databases; instead, when possible, allocate a plausible size and growth increment for the files. If you can’t predict the size, let the system run for a while and monitor it until the file sizes reach a steady state. If you control the model database (if you’re creating in-house databases, for example) then you can adjust the defaults there; if you are deploying to unknown servers, you can’t rely on reasonable defaults being present. That said, auto grow at a suitable increment is a reasonable fail-safe, to avoid running out of room as you would with a hard size limit.

    This method has the advantage of using the default locations for files at the server level instead of hard-coding file paths – especially the cringe-worthy
    N'C:\Program Files\Microsoft SQL Server\…

    If your database requires more files and file groups, you might need to provide a file path, but ideally use SQLCMD variables so that it can be easily repointed.

    Create a Table if it Doesn’t Exist

    Preferred:

    IF NOT EXISTS ( 
        SELECT * FROM sys.tables t
        INNER JOIN sys.schemas s on t.schema_id = s.schema_id
        WHERE s.name = 'dbo' and t.name = 'myTable' 
    )
        CREATE TABLE dbo.myTable (
            col1 int not null,
            col2 nvarchar(50) not null
        );

    Also works:

    IF NOT EXISTS (
        SELECT * FROM sys.objects 
        WHERE object_id = OBJECT_ID(N'[dbo].[myTable]') AND type in (N'U')
    )
        CREATE TABLE dbo.myTable (
            col1 int not null,
            col2 nvarchar(50) not null
        );
    

    Notes:

    Don’t forget the schema. Use two-part names everywhere unless there is a real and compelling requirement not to.

    Management Studio can help generate these IF NOT EXISTS clauses, but the quality varies and some of the code behind that scripting is very old. Still, it can give you a working starting point. In SSMS, go to Tools > Options > SQL Server Object Explorer > Scripting and look for the setting “Check for Object Existence” or “Include IF NOT EXISTS clause,” then script out some objects with the GUI and examine what you get back. Except for procedures, these are sort of OK.

    Create a View if it Doesn’t Exist

    Preferred:

    IF NOT EXISTS (
        SELECT * FROM sys.views v
        INNER JOIN sys.schemas s on v.schema_id = s.schema_id
        WHERE s.name = 'dbo' and v.name = 'myView'
    )
        EXEC sp_executesql @statement = N'CREATE VIEW dbo.myView AS SELECT ''Placeholder'' AS Placeholder' ;
    GO
    ALTER VIEW dbo.myView AS
        SELECT col1 
        FROM dbo.myTable ;
    GO

    Also works:

    IF NOT EXISTS (
        SELECT * FROM sys.views 
        WHERE object_id = OBJECT_ID(N'[dbo].[myView]')
    )
        EXEC sp_executesql @statement = N'CREATE VIEW dbo.myView AS SELECT ''Placeholder'' AS Placeholder' ;
    GO
    ALTER VIEW dbo.myView AS
        SELECT col1 
        FROM dbo.myTable ;
    GO

    Notes:

    Don’t forget the schema.

    This looks odd at first, but is wonderful to use. It mimics the “CREATE OR ALTER” feature in some other dialects of SQL.

    Create a Procedure if it Doesn’t Exist

    Preferred:

    IF NOT EXISTS (
        SELECT * FROM sys.procedures p
        INNER JOIN sys.schemas s on p.schema_id = s.schema_id
        WHERE s.name = 'dbo' and p.name = 'myProc'
    )
        EXEC sp_executesql @statement = N'CREATE PROCEDURE dbo.myProc AS SELECT ''Placeholder'' AS Placeholder' ;
    GO
    ALTER PROCEDURE dbo.myProc AS
        SELECT col1 
        FROM dbo.myTable ;
    GO

    Also works:

    IF NOT EXISTS (
        SELECT * FROM sys.objects 
        WHERE object_id = OBJECT_ID(N'[dbo].[myProc]') AND type in (N'P', N'PC')
    )
        EXEC sp_executesql @statement = N'CREATE PROCEDURE dbo.myProc AS SELECT ''Placeholder'' AS Placeholder' ;
    GO
    ALTER PROCEDURE dbo.myProc AS
        SELECT col1 
        FROM dbo.myTable ;
    GO

    Notes:

    Don’t forget the schema.

    Like views, this rather strange one imitates “CREATE OR ALTER.” That allows you to keep one source file, in source control, for example, and update it without changing from CREATE to ALTER or even worrying about it.

    Create a Function if it Doesn’t Exist

    IF NOT EXISTS (
        SELECT * FROM sys.objects 
        WHERE object_id = OBJECT_ID(N'[dbo].[getFunky]') AND type in (N'FN', N'IF', N'TF', N'FS', N'FT')
    )
        EXEC sp_executesql 
        @statement = N'CREATE FUNCTION dbo.getFunky () RETURNS TABLE AS RETURN SELECT ''Placeholder'' AS Placeholder' ;
    GO
    ALTER FUNCTION dbo.getFunky ( @someParameter int )
    RETURNS TABLE
    AS
        RETURN
            SELECT col1 
            FROM dbo.myTable
            WHERE col1 = @someParameter ;
    GO
    

    If a Primary Key is Missing, Add it

    IF NOT EXISTS (
        SELECT * FROM sys.tables t
        INNER JOIN sys.schemas s ON t.schema_id = s.schema_id
        INNER JOIN sys.indexes i on i.object_id = t.object_id
        WHERE i.is_primary_key = 1 
        AND s.name = 'dbo' AND t.name = 'myTable'
    ) 
        ALTER TABLE dbo.myTable
        ADD CONSTRAINT PK_myTable PRIMARY KEY /* { maybe CLUSTERED } */ (
            col1
        ) WITH ( ONLINE = ON /* or OFF */ )
        /* perhaps ON [myFileGroup] */ ;
    

    Notes:

    This snippet looks for any primary key at all. That’s because you often cannot trust the name of the primary key, if it was auto-generated by the server at some point in the past. It’s good practice to name constraints, but not everyone does. I actually have some code that says “if the primary key is present but not named {x} then sp_rename it to {x}.” But that’s a little DBA-nuts.
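
    In case that flavor of DBA-nuts appeals to you, a minimal sketch of that rename – assuming the desired name is PK_myTable, and not part of the snippet above – might look like this:

    /* Sketch only: if dbo.myTable has a primary key under some other (perhaps
       auto-generated) name, rename it to PK_myTable */
    DECLARE @pkname sysname, @oldname nvarchar(300) ;
    
    SELECT @pkname = kc.name
    FROM sys.key_constraints AS kc
    WHERE kc.parent_object_id = OBJECT_ID(N'dbo.myTable')
      AND kc.type = 'PK' ;
    
    IF @pkname IS NOT NULL AND @pkname <> N'PK_myTable'
    BEGIN
        SET @oldname = N'dbo.' + @pkname ;  /* sp_rename wants the old constraint name schema-qualified */
        EXEC sp_rename @objname = @oldname, @newname = N'PK_myTable', @objtype = N'OBJECT' ;
    END
    GO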

    If a NonClustered Index is Missing (by Name), Add it

    Preferred:

    IF NOT EXISTS (
        SELECT * FROM sys.tables t
        INNER JOIN sys.schemas s ON t.schema_id = s.schema_id
        INNER JOIN sys.indexes i on i.object_id = t.object_id
        WHERE s.name = 'dbo' AND t.name = 'myTable' and i.name = 'myNCI'
    ) 
        CREATE NONCLUSTERED INDEX myNCI 
        on dbo.myTable ( 
            col2 
        ) WITH ( ONLINE = ON /* or OFF */ ) ;

    Also works:

    IF NOT EXISTS (
        SELECT * FROM sys.indexes 
        WHERE object_id = OBJECT_ID(N'[dbo].[myTable]') AND name = N'myNCI'
    )
        CREATE NONCLUSTERED INDEX myNCI 
        on dbo.myTable ( 
            col2 
        ) WITH ( ONLINE = ON /* or OFF */ ) ;
    GO
    

    Notes: this assumes that you trust the name of the target index, and that if an index with that name is present, its structure is correct. It doesn’t check the columns in the index.
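
    If you’d rather not trust the name alone, a quick inspection query along these lines – a sketch, not one of the deployment snippets – lists the key columns of the index so you can compare them to what the script expects:

    /* Sketch: list the key columns of myNCI on dbo.myTable for comparison
       against the expected definition */
    SELECT  i.name AS index_name,
            c.name AS key_column,
            ic.key_ordinal
    FROM    sys.indexes AS i
    INNER JOIN sys.index_columns AS ic
            ON ic.object_id = i.object_id AND ic.index_id = i.index_id
    INNER JOIN sys.columns AS c
            ON c.object_id = ic.object_id AND c.column_id = ic.column_id
    WHERE   i.object_id = OBJECT_ID(N'[dbo].[myTable]')
      AND   i.name = N'myNCI'
      AND   ic.is_included_column = 0
    ORDER BY ic.key_ordinal ;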

    Conclusion

    I do wish SSMS had better and more sophisticated support for scripting these clauses, but, lacking that, I resort to a small library of these and look them up. I’ll add to this page as I locate more of these – but contributions are certainly welcome!

    * Some people and organizations use various tools to help automate this, but my real-world experience tells me that old-fashioned scripts are still a huge proportion of database installation and patch procedures.

  • SQL Server File Layout Viewer 1.1 Release

    Edit 9 October 2013: A new version is out! Please download 1.2 from here

    Back in the Spring I released a 1.0 version of a utility that will show the structure of SQL Server files graphically. Today I have a new release that includes some fixes and updates based on user feedback, bug reports, and suggestions.

    In this release:

    1. The database scan is significantly faster because it now uses parallelism.
    2. Larger databases are supported, and files over 4GB are now read correctly – but don’t go too crazy. Really large data sets are not likely to work.
    3. SQL Server Auth is now available.
    4. Various bug fixes around pages that contained data at one time but are not currently allocated, and pages “captured” from other databases via instant file initialization.

    Fundamentals about the app are the same:

    1. The app will read a database and then present a color-coded visualization of all the data by object and by page type.
    2. Mouse over the visualization to get details about the pages in a panel on the right edge of the app window.
    3. Use this in test, not production. It’ll scan the whole database, which would probably impact performance.
    4. This is designed to study and learn about storage internals using modest data sets. Running it on a large warehouse, for example, is probably not going to work.

     

    [Screenshot: SQL File Layout Viewer 1.1]

     

    I want to thank those in the SQL community who helped with this project via Twitter, testing for me, bug reports and critiques. You know who you are, and y’all rock :-).

  • STOP! Wrong Server!

    Some time ago I wrote a post about old-time T-SQL flow control. Part of the idea was to illustrate how to make scripts abort on errors, which is surprisingly convoluted in T-SQL. Today I have a more specific example: how can I check that we are really about to execute in the right database on the right server, and halt if not? Perhaps you’ve connected to the wrong server, or the database you intend to modify doesn’t have the name your script expected. “USE database” is wonderful, but what if it didn’t work?

    As in the old article, there are two problems: 1. How do I stop execution inside the batch that is currently running (i.e. between “GO” batch terminators)? 2. How do I stop the client from merrily sending the rest of the batches in the script to the SQL Server?

    There are a number of ways to do this, but there is one method I have found to be clear:

    1. Run in SQLCMD mode, in order to take advantage of :ON ERROR EXIT. This stops the client from transmitting more SQL batches after a batch has thrown an error.
    2. Wrap a check for the server name and database name in a TRY/CATCH at the top of the script. For most practical purposes this can be the first batch only, since we are planning to halt if the test fails. For versions 2005 through 2008 R2, this has to use RAISERROR; from 2012 onward we can finally take advantage of THROW.
    :ON ERROR EXIT
    
    BEGIN TRY
        IF ( SELECT @@SERVERNAME ) != 'TheRightServer' RAISERROR( 'WHOA! WRONG SERVER!', 16, 1 );
        USE [master]
        IF ( SELECT DB_NAME() ) != 'TheRightDB' RAISERROR( 'WHOA! WRONG DATABASE!', 16, 1 );
        PRINT 'Rest of First Batch Ran';
    END TRY
    BEGIN CATCH
        IF ERROR_MESSAGE() = 'WHOA! WRONG SERVER!' RAISERROR( 'WHOA! Caught: WRONG SERVER!', 16, 1 );
        ELSE IF ERROR_MESSAGE() = 'WHOA! WRONG DATABASE!' RAISERROR( 'WHOA! Caught: WRONG DATABASE!', 16, 1 );
        ELSE RAISERROR( 'An exception was thrown in the first batch', 16, 1 );
    END CATCH
    GO
    
    PRINT 'Second Batch Ran';
    GO
    

    Breaking this down – if I am in the wrong server or the wrong database, what I want to have happen is that the statement

    PRINT ‘Rest of First Batch Ran’ ;

    should not execute. The way to accomplish that is to raise an error if we are in the wrong server or database, and use the TRY/CATCH to transfer execution down to the CATCH block. A sysadmin could raise a more severe error that would stop batch execution, but that’s less friendly, and we can’t assume we are all sysadmins. To me this method needs to work with error severity 16, for a typical user.
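
    For completeness, the heavy-handed variant mentioned above looks something like the sketch below – a severity 20 error is fatal to the connection, must be raised WITH LOG, and needs sysadmin-level rights, which is exactly why I prefer the severity 16 approach:

    /* Sketch of the sysadmin-only alternative: a severity 20 error kills the
       connection, so nothing after it runs (use with care, if at all) */
    RAISERROR( 'WHOA! WRONG SERVER!', 20, 1 ) WITH LOG ;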

    Further, it’s important that remaining batches in the whole script are not executed. (Remember, batches are separated by “GO” and are each like a mini program sent from the client to the server.) That means that

    PRINT ‘Second Batch Ran’;

    should not execute if the first batch threw an error.

    The only practical way to do that is with the

    :ON ERROR EXIT

    directive, but that only works in SQLCMD mode.

    For SQL Server 2012 – and, going forward, “and higher” – this example can be simplified using THROW:

    :ON ERROR EXIT
    
    BEGIN TRY
        IF ( SELECT @@SERVERNAME ) != 'TheRightServer' THROW 50000, 'WHOA! WRONG SERVER', 1 ;
        USE [master]
        IF ( SELECT DB_NAME() ) != 'TheRightDB' THROW 50000, 'WHOA! WRONG DATABASE!', 1 ;
        PRINT 'Rest of First Batch Ran';
    END TRY
    BEGIN CATCH
        THROW;
    END CATCH
    GO
    
    PRINT 'Second Batch Ran';
    GO
    

    Safe computing!
