
Louis Davidson

  • Temporal Tables – Part 2 – Changing history

After my post yesterday, I had planned to take a break for the Independence Day holiday, but a commenter asked a question that piqued my interest: how might you edit history? The answer is simple to state, but the process is not quite as simple to carry out. Basically, you have to turn off system versioning, update the history, and reapply system versioning. Not difficult, but not something you can do just by editing a row in the history table, as the temporal structures protect themselves.

    select @@version
    go
    Microsoft SQL Server 2016 (CTP2.0) - 13.0.200.172 (X64)   May 21 2015 10:53:07   Copyright (c) Microsoft Corporation  Enterprise Evaluation Edition (64-bit) on Windows NT 6.3 <X64> (Build 9600: ) (Hypervisor)

First warning: this is pre-release code and is subject to change. I will try to make sure these examples still work once RTM ships, probably as I am doing my edits for the book.

To see the metadata, let me first create a table on which I will not turn on system versioning:

    create table notTemporal
    (
        notTemporalId int primary key
    )

Now, sys.tables has a few new columns for temporal tables, including temporal_type_desc and history_table_id, which I will use to list the tables and their temporal usage (filtering out the history tables themselves).

    select CONCAT(schemas.name,'.',tables.name) as table_name,
           historyTableSchema.name + '.' + historyTable.name as history_table_name,
           tables.temporal_type_desc
    from   sys.tables
             join sys.schemas
                on tables.schema_id = schemas.schema_id
             left outer join sys.tables as historyTable
                join sys.schemas as historyTableSchema
                    on historyTable.schema_id = historyTableSchema.schema_id
                on historyTable.object_id = tables.history_table_id
    where tables.temporal_type_desc <> 'HISTORY_TABLE'

    go

    table_name           history_table_name                          temporal_type_desc
    -------------------- ------------------------------------------- -------------------------------------------
    dbo.company          dbo.MSSQL_TemporalHistoryFor_565577053      SYSTEM_VERSIONED_TEMPORAL_TABLE
    dbo.notTemporal      NULL                                        NON_TEMPORAL_TABLE

The dbo.company table is the one from part 1, and you can see its history table. We will see that the history table contains all of the historical row versions, but not the current row, and there is still a gap between the last history row and the current row because I deleted and re-added the row in part 1.

    select *
    from   dbo.MSSQL_TemporalHistoryFor_565577053

    This returns:

    companyId   name                           companyNumber SysStartTime                SysEndTime
    ----------- ------------------------------ ------------- --------------------------- ---------------------------
    1           Company1                       00001         2015-06-30 00:15:01.6419789 2015-06-30 00:16:19.1614451
    1           Company Name 1                 00001         2015-06-30 00:16:19.1614451 2015-06-30 00:16:48.2444199
    1           Company Name 2                 00001         2015-06-30 00:16:48.2444199 2015-06-30 00:17:23.9396846
    1           Company Name 2                 00001         2015-06-30 00:17:23.9396846 2015-06-30 00:33:39.0978045

    And the current row is in the primary table:

    select *
    from   dbo.company

    companyId   name                           companyNumber SysStartTime                SysEndTime
    ----------- ------------------------------ ------------- --------------------------- ---------------------------
    1           Company1                       00001         2015-06-30 00:37:07.1375063 9999-12-31 23:59:59.9999999

OK, next I am going to see what happens if you monkey with the data in this table as is, here in CTP 2.0. Use cases might be removing bad rows, correcting erroneous data, and so on. (Mischief is also kind of fun when you come upon a new technology.)

    update dbo.company
    set   SysEndTime = '2015-06-30 00:37:08'

Cool, it protects this data:

    Msg 13537, Level 16, State 1, Line 55
    Cannot update SYSTEM_TIME PERIOD columns in table 'testTemporal.dbo.company'.

    What about the SysStartTime?

    update dbo.company
    set   SysStartTime = '2015-06-30 00:37:08'

Same deal:

    Msg 13537, Level 16, State 1, Line 62
    Cannot update SYSTEM_TIME PERIOD columns in table 'testTemporal.dbo.company'.

Not allowed either. Modifying the history table directly is also not allowed. Here I want to make the row appear to have existed since the 15th of June, rather than the 30th:

    update dbo.MSSQL_TemporalHistoryFor_565577053
    set SysStartTime = '2015-06-15'
    where  companyId = 1
    and   sysEndTime = '2015-06-30 00:16:19.1614451'

    Wah, wah, wah….

    Msg 13561, Level 16, State 1, Line 70
    Cannot update rows in a temporal history table 'testTemporal.dbo.MSSQL_TemporalHistoryFor_565577053'.

However, if you want to make a change to the history data, or load in old history data, you can do so by turning off versioning, adjusting the history table as needed, and then turning versioning back on, making sure to specify the history table to use.

    alter table dbo.company
        set (SYSTEM_VERSIONING = OFF);

    Now you can run your update query to set the start time back to June 15, so this row existed longer in history than it really did.

    update dbo.MSSQL_TemporalHistoryFor_565577053
    set SysStartTime = '2015-06-15'
    where  companyId = 1
    and   sysEndTime = '2015-06-30 00:16:19.1614451'

Succeeds. Now we turn system versioning back on, being sure to specify the history table, or you will get another new, auto-generated history table (I am repeating myself, but it is important, or you will end up with a lot of these tables lying around). If the table is large, you may not want to do the consistency check, but I generally would, because structures like this can be tricky to get right, so it is safer to let it check the structure.

    ALTER TABLE dbo.Company
        SET (SYSTEM_VERSIONING = ON (HISTORY_TABLE=dbo.MSSQL_TemporalHistoryFor_565577053, DATA_CONSISTENCY_CHECK = ON));
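
One practical note: while versioning is off, changes to dbo.company are not being captured, so you probably want that window to be as short as possible. Here is a minimal sketch of the whole sequence in a single batch; this is just a pattern, not anything the feature requires, and it assumes the ALTER TABLE statements behave inside an explicit transaction in this build, which I have not verified:

begin transaction;

    alter table dbo.company
        set (SYSTEM_VERSIONING = OFF);

    --adjust history as needed; this is the same update as above
    update dbo.MSSQL_TemporalHistoryFor_565577053
    set    SysStartTime = '2015-06-15'
    where  companyId = 1
      and  SysEndTime = '2015-06-30 00:16:19.1614451';

    alter table dbo.company
        set (SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.MSSQL_TemporalHistoryFor_565577053,
                                     DATA_CONSISTENCY_CHECK = ON));

commit transaction;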

    Now looking at the entire history, you see the row started at 6-15, not 6-30.

    select *
    from   company  FOR SYSTEM_TIME CONTAINED IN ('1900-01-01','9999-12-31 23:59:59.9999999')
    order  by SysStartTime desc

    Which returns:

    companyId   name                           companyNumber SysStartTime                SysEndTime
    ----------- ------------------------------ ------------- --------------------------- ---------------------------
    1           Company1                       00001         2015-06-30 00:37:07.1375063 9999-12-31 23:59:59.9999999
    1           Company Name 2                 00001         2015-06-30 00:17:23.9396846 2015-06-30 00:33:39.0978045
    1           Company Name 2                 00001         2015-06-30 00:16:48.2444199 2015-06-30 00:17:23.9396846
    1           Company Name 1                 00001         2015-06-30 00:16:19.1614451 2015-06-30 00:16:48.2444199
    1           Company1                       00001         2015-06-15 00:00:00.0000000 2015-06-30 00:16:19.1614451

    Now you want to see the row as of 6-15:

    select priorCompany.*
    from   company FOR SYSTEM_TIME AS OF '2015-06-15' as priorCompany

    It is there!

    companyId   name                           companyNumber SysStartTime                SysEndTime
    ----------- ------------------------------ ------------- --------------------------- ---------------------------
    1           Company1                       00001         2015-06-15 00:00:00.0000000 2015-06-30 00:16:19.1614451

    Still not on 6-14:

    select priorCompany.*
    from   company FOR SYSTEM_TIME AS OF '2015-06-14' as priorCompany

Very cool stuff… More to come… Standard caveat: I won’t be doing any guessing in these blogs, because if I guessed right about NDA information I had missed, I could accidentally get in trouble. But if there are big changes in syntax, I will try to note them, or redo the blog’s syntax so I don’t end up with bad information out there.

  • Temporal Tables – Part 1 - Simple Single Table Example

    In my first entry in my series of posts on DB Design Book Topics, I want to start with a pretty interesting topic, Temporal Tables. It is a very cool new feature that takes something that used to be very painful (capturing history of changes to a table, allowing the user to see the data at a point in time), and makes it very easy.

So if you have a row in a table, and it is created, updated, and then deleted, knowing how the row looked at a given point in time can be very useful. I wanted to start with a very basic example to show how things work; later entries in this series will expand to multiple rows and tables.

    select @@version --Features are apt to change. Test if you are using a later CTP

    ---------------------------------------------------------------------------------------------------------------------------
    Microsoft SQL Server 2016 (CTP2.0) - 13.0.200.172 (X64)   May 21 2015 10:53:07   Copyright (c) Microsoft Corporation  Enterprise Evaluation Edition (64-bit) on Windows NT 6.3 <X64> (Build 9600: ) (Hypervisor)

    First off, we need to create a workspace. I will just call the database testTemporal:

    create database testTemporal
    go
    use testTemporal
    go

Nothing needs to be done to enable temporal; just create a database on the 2016 instance. The table does need a few new things, shown in the next example:

    create table dbo.company
    (
        companyId    int identity(1,1) primary key,
        name        varchar(30) unique,
        companyNumber char(5) unique,
        SysStartTime datetime2 GENERATED ALWAYS AS ROW START NOT NULL, --the time when this row becomes in effect
        SysEndTime datetime2 GENERATED ALWAYS AS ROW END NOT NULL,   --the time when this row becomes no longer in effect
        PERIOD FOR SYSTEM_TIME (SysStartTime,SysEndTime)  
    )
    WITH (SYSTEM_VERSIONING = ON); --Note that you can use a table of your own. More on that in a later blog entry…
    go

Simple enough. If you want to see more about the CREATE TABLE syntax, check Books Online here (https://msdn.microsoft.com/en-us/library/ms174979.aspx), as this is a CTP edition and subject to change.
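
Since the comment in the code mentions that you can use a history table of your own, here is a minimal sketch of that variation. The dbo.companyAlt and dbo.companyAltHistory names are hypothetical, and the named history table is created for you if it does not already exist:

create table dbo.companyAlt
(
    companyAltId int identity(1,1) primary key,
    name varchar(30) unique,
    SysStartTime datetime2 GENERATED ALWAYS AS ROW START NOT NULL,
    SysEndTime datetime2 GENERATED ALWAYS AS ROW END NOT NULL,
    PERIOD FOR SYSTEM_TIME (SysStartTime, SysEndTime)
)
WITH (SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.companyAltHistory)); --name the history table yourself
go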

    So let’s create a row in the table:

    insert into dbo.company (name, companyNumber)
    values ('Company1','00001')

    select SCOPE_IDENTITY(); --If you don't mess up, this will be 1. We will use this in our examples
    go

    Now we change something in the table a few times to let us have a few changes to see in the example:

    update company
    set    name = 'Company Name 1'
    where  companyId = 1

    And update it again:

    update company
    set    name = 'Company Name 2'
    where  companyId = 1

    This time update with no changes:

    update company
    set    name = 'Company Name 2'
    where  companyId = 1

    To see the row exactly as it currently exists, just use a normal select statement:

    select *
    from   company
    where  companyId = 1

    You will see that looks exactly as you expect:

    companyId   name                           companyNumber SysStartTime                SysEndTime
    ----------- ------------------------------ ------------- --------------------------- ---------------------------
    1           Company Name 2                 00001         2015-06-05 02:15:32.7938411 9999-12-31 23:59:59.9999999

To see all versions of the rows, use the FOR SYSTEM_TIME clause with CONTAINED IN (read more about temporal tables and FOR SYSTEM_TIME here: https://msdn.microsoft.com/en-us/library/dn935015(v=sql.130).aspx):

    select *
    from   company  FOR SYSTEM_TIME CONTAINED IN ('1900-01-01','9999-12-31 23:59:59.9999999')
    order by SysEndTime Desc

    This returns all of the row versions that have been created:

    companyId   name                           companyNumber SysStartTime                SysEndTime
    ----------- ------------------------------ ------------- --------------------------- ---------------------------
    1           Company Name 2                 00001         2015-06-30 00:17:23.9396846 9999-12-31 23:59:59.9999999
    1           Company Name 2                 00001         2015-06-30 00:16:48.2444199 2015-06-30 00:17:23.9396846
    1           Company Name 1                 00001         2015-06-30 00:16:19.1614451 2015-06-30 00:16:48.2444199
    1           Company1                       00001         2015-06-30 00:15:01.6419789 2015-06-30 00:16:19.1614451

There are a few things of note. The first SysStartTime value is the time the row was inserted. The current row’s SysEndTime is 9999-12-31 23:59:59.9999999. Note too that when we updated the row with no actual data changes, we still got a new version.

Also, when working with the times and the FOR SYSTEM_TIME clause, be careful to include the time all the way out to the fractional seconds, or you may not get what you expect. When using CONTAINED IN, if you don’t spell the nines out to all seven decimal places, you won’t get the current row, because the end of your range falls just short of the maximum SysEndTime value:

    select *
from   company  FOR SYSTEM_TIME CONTAINED IN ('1900-01-01','9999-12-31 23:59:59.999999') --Only six decimal places
    order by SysEndTime Desc

    companyId   name                           companyNumber SysStartTime                SysEndTime
    ----------- ------------------------------ ------------- --------------------------- ---------------------------
    1           Company Name 2                 00001         2015-06-30 00:16:48.2444199 2015-06-30 00:17:23.9396846
    1           Company Name 1                 00001         2015-06-30 00:16:19.1614451 2015-06-30 00:16:48.2444199
    1           Company1                       00001         2015-06-30 00:15:01.6419789 2015-06-30 00:16:19.1614451
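
If the precision of the upper boundary is a worry, the FOR SYSTEM_TIME ALL subclause returns current and historical rows without specifying a range at all. This assumes your build supports it; I have not confirmed it is available in CTP 2.0:

select *
from   company FOR SYSTEM_TIME ALL --no boundary values to get wrong
order by SysEndTime desc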


The more interesting use is to work with a row (or rows) at a certain point in time. To get the second version, from 2015-06-30 00:16:19.1614451, we will use FOR SYSTEM_TIME AS OF, which takes a datetime2 value and returns the row where SysStartTime <= PassedValue < SysEndTime. (The PassedValue can also be a variable.)

    select priorCompany.*
    from   company FOR SYSTEM_TIME AS OF '2015-06-30 00:16:19.1614451' as priorCompany

    Note that the time was from the SysStartTime, which also was the SysEndTime for a different row. This returns:

    companyId   name                           companyNumber SysStartTime                SysEndTime
    ----------- ------------------------------ ------------- --------------------------- ---------------------------
    1           Company Name 1                 00001         2015-06-30 00:16:19.1614451 2015-06-30 00:16:48.2444199
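
And since, as mentioned, the passed value can be a variable, the same query with a datetime2 variable looks like this (just a sketch of the variation; nothing else changes):

declare @asOfTime datetime2(7) = '2015-06-30 00:16:19.1614451';

select priorCompany.*
from   company FOR SYSTEM_TIME AS OF @asOfTime as priorCompany;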

    You can also use FOR SYSTEM_TIME in a JOIN criteria and see multiple versions of the row in your query:

    select company.Name, priorCompany.Name as PriorName
    from   company
             join company FOR SYSTEM_TIME AS OF '2015-06-30 00:16:19.1614451' as priorCompany
                on company.companyId = priorCompany.companyId

    This will return:

    Name                           PriorName
    ------------------------------ ------------------------------
    Company Name 2                 Company Name 1

I expect that may be a pattern that gets used in a later blog to calculate changes since a point in time! Finally, let’s take a quick look at what happens on a delete:
           
    delete from company
    where  companyId = 1

    Now, checking the data, we see that there is no 9999 row version:

    select *
    from   company  FOR SYSTEM_TIME CONTAINED IN ('1900-01-01','9999-12-31 23:59:59.9999999')

--There are the same four rows, but now all rows have an end time in the same millennium as we live in:

    companyId   name                           companyNumber SysStartTime                SysEndTime
    ----------- ------------------------------ ------------- --------------------------- ---------------------------
    1           Company1                       00001         2015-06-30 00:15:01.6419789 2015-06-30 00:16:19.1614451
    1           Company Name 1                 00001         2015-06-30 00:16:19.1614451 2015-06-30 00:16:48.2444199
    1           Company Name 2                 00001         2015-06-30 00:16:48.2444199 2015-06-30 00:17:23.9396846
    1           Company Name 2                 00001         2015-06-30 00:17:23.9396846 2015-06-30 00:33:39.0978045

So, looking for the row as of a past time, the row did still exist:

    select priorCompany.*
    from   company FOR SYSTEM_TIME AS OF '2015-06-30 00:16:19.1614451' as priorCompany

With the only difference being that the SysEndTime isn’t what it once was:

    companyId   name                           companyNumber SysStartTime                SysEndTime
    ----------- ------------------------------ ------------- --------------------------- ---------------------------
    1           Company Name 1                 00001         2015-06-30 00:16:19.1614451 2015-06-30 00:16:48.2444199

    But looking at the table currently, no row:

    select *
    from   company

    companyId   name                           companyNumber SysStartTime                SysEndTime
    ----------- ------------------------------ ------------- --------------------------- ---------------------------

    So finally, what happens when we replace the row using the same surrogate key value? (Not discussing here if this is a good idea, or bad idea…And this has led me to wonder if we can adjust history if the delete was accidental… Ah, fodder for later)

    set identity_insert dbo.company on
    go
    insert into dbo.company (companyId, name, companyNumber)
    values (1, 'Company1','00001')
    go
    set identity_insert dbo.company off
    go

Then look at all of the row versions that exist now:

    select *
    from   company  FOR SYSTEM_TIME CONTAINED IN ('1900-01-01','9999-12-31 23:59:59.9999999')
    order  by SysStartTime desc

    --You can see that the row now exists, but there is now a gap between the top two rows:

    companyId   name                           companyNumber SysStartTime                SysEndTime
    ----------- ------------------------------ ------------- --------------------------- ---------------------------
    1           Company1                       00001         2015-06-30 00:37:07.1375063 9999-12-31 23:59:59.9999999
    1           Company Name 2                 00001         2015-06-30 00:17:23.9396846 2015-06-30 00:33:39.0978045
    1           Company Name 2                 00001         2015-06-30 00:16:48.2444199 2015-06-30 00:17:23.9396846
    1           Company Name 1                 00001         2015-06-30 00:16:19.1614451 2015-06-30 00:16:48.2444199
    1           Company1                       00001         2015-06-30 00:15:01.6419789 2015-06-30 00:16:19.1614451

    Looking at the data at the current row’s SysStartTime:

    select priorCompany.*
    from   company FOR SYSTEM_TIME AS OF '2015-06-30 00:37:07.1375063' as priorCompany

    companyId   name                           companyNumber SysStartTime                SysEndTime
    ----------- ------------------------------ ------------- --------------------------- ---------------------------
    1           Company1                       00001         2015-06-30 00:37:07.1375063 9999-12-31 23:59:59.9999999

    But a very very small amount of time before, not there:

    select priorCompany.*
    from   company FOR SYSTEM_TIME AS OF '2015-06-30 00:37:07.1375062' as priorCompany

    companyId   name                           companyNumber SysStartTime                SysEndTime
    ----------- ------------------------------ ------------- --------------------------- ---------------------------

    So there is a quick look at some of the basic functionality that we can expect with temporal data in 2016.

Note: I won’t be doing any guessing in these blogs, because if I guessed right about NDA information I had missed, I could accidentally get in trouble. But if there are big changes in syntax, I will try to note them, or redo the blog’s syntax so I don’t end up with bad information out there.

    Continue to Part 2 – Changing History

  • Speaking this week at #sqlsatChattanooga and for the PASS Performance Virtual Chapter

    The topic for both sessions will be: How In-Memory Database Objects Affect Database Design and the following is the abstract:

    With SQL Server 2014, Microsoft has added a major new feature to help optimize OLTP database implementations by persisting your data primarily in RAM. Of course it isn't that simple, internally everything that uses this new feature is completely new. While the internals of this feature may be foreign to you, accessing the data that uses the structures very much resembles T-SQL as you already know it. As such, the first important question for the average developer will be how to adapt an existing application to make use of the technology to achieve enhanced performance. In this session, I will start with a normalized database, and adapt the logical and physical database model/implementation in several manners, performance testing the tables and code changes along the way.

    The PASS Performance Virtual Chapter session will be 1:00 PM Central Time on Thursday the 25th. You can see more details and register here: http://performance.sqlpass.org/

    The SQL Saturday Chattanooga session will be at 3:00 PM Eastern Time on Saturday the 27th. You can see details about the entire conference here: http://www.sqlsaturday.com/410/EventHome.aspx 

I am pretty excited about talking about SQL Server In-Memory technologies, and it is also a bit interesting because of how much SQL Server 2016 is going to change things. I will do my best to make notes on how it will change the usage and design aspects of In-Memory as I go. You can see a list of how things are changing here in Aaron Bertrand’s blog entry. 2016 won’t change In-Memory OLTP into a general-purpose feature, but it certainly will change how effectively it can be used in certain scenarios.

I do wish I had more time to play with 2016 before my presentations, but I don’t believe it would change any of my current opinions on usage (other than having quite a bit more data integrity built in), and I certainly will be testing out the new stuff in the coming months as I work on my set of pre-book blogs.

  • Time to Start Plotting My 2016 Database Design Book

Note: A new edition of the book has been talked about for years, but it is not a guarantee. This post is my way of introducing the new features of SQL Server, and follow-on posts will be code-filled examples of how the features work. I am making no promises at this time, but when I can, I will.

With the recent release of a CTP version of what is being called SQL Server 2016, it is time for me to start thinking about what to add to the next edition of my SQL Server design book. In 2014, we decided that there really wasn’t enough change to merit a new edition, with just a few key features added that changed how one might design a database application. One major feature did crop up, In-Memory OLTP, and while it is a game changer, it was very new and we decided to wait. I went to Kalen Delaney’s precon, as well as producing a presentation on the feature, doing my best to get a feel for how it changed database design.

SQL Server 2016, even in the CTP2 version I am basing this blog post on, contains quite a few new features that will change how we may implement databases, including one that is going to be a tremendous change for the better. In this first post, I want to list the features I think are important to address in my book, and ask you to use the comments to tell me if I am wrong and what other features I missed. As I progress, I will come back and try to include links to the actual posts (to make it easy for me to use this as a reference later!)

    New Features To Possibly Cover

This should be the easiest list to create. These are the particularly interesting features that are coming in 2016 (and a few that arrived in 2014), with a brief introduction to what they are and how they fit into my vision. Each topic will become the header of a linked list of blogs as I write new content on each feature.

    • In-Memory OLTP – Allows you to house your database in RAM, not just caching the existing structures in the way on-disk tables are, but using completely new plumbing that is lock and latch free. It is a MAJOR difference architecturally, and may turn out to be the most work making all of the terminology fit (what used to be a table will quite often need to be an “on-disk” table.) Existed in 2014, and enhanced in 2016.
• Temporal Data – This is probably the most exciting feature I have seen. It takes the scenario of keeping up with changes in your tables over time and automates it. Queries using the tables can be executed in the context of a time period (a VERY simplified explanation). New to 2016, and very exciting.
    • Columnstore Index Improvements – There is a chapter on reporting structures written by Jessica Moss that touched on Columnstore indexes, at the time a new feature that had just begun to be understood. At this point, with four more years of experience, I know I have a great admiration for what they can do. 2014 gave us Clustered Columnstores, and 2016 enhances how they work to make managing them easier.
• Row Level Security – Allows you to limit access to rows based on the details of the query executor. It is not DDL-esque, like GRANT EXECUTE TO Rows Like This, but more like a security policy, where you build a function that is bound to the table and executed on access to it (a rough sketch follows this list). It works very similarly to the more manual (always go through a stored procedure!) methods that have existed in the past. New for 2016.
    • JSON Support –  While I have purposefully avoided JSON in the same manner I avoided XML, I will at least need to cover the existence of this feature.
    • Stretch Database – Allows you to put some of your data on the Microsoft Azure Cloud storage (or something along these lines). Allows you to put infrequently used data in the cloud, with very active data locally. Probably not going to be covered heavily, but will bear a mention in the physical design sections of the book.
• Security DDL Enhancements – There were a few server-level permissions added in 2014 that were the bomb: CONNECT ANY DATABASE, SELECT ALL USER SECURABLES, and IMPERSONATE ANY LOGIN. The first two allow you to say: Login X can access all databases and see all data, something a support person may be allowed to do. In the past, the login would have been given sysadmin rights because, you know, anything less is just too much work. In 2016, a few new security bits are added to support the new features.
    • Always Encrypted – Allows for SQL Server columns to be encrypted based on a key that does not reside on the SQL Server.
    • Dynamic Data Masking – A feature that makes column level security palatable by defining a masked output that occurs when a user without certain permissions accesses the column (even though they do have select rights).
• Durability Changes – SQL Server 2014, along with In-Memory OLTP, allowed for non-durable tables. In addition to the In-Memory durability options, it included a database-level feature called delayed durability that makes log writes asynchronous.
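
To make the Row Level Security bullet a little more concrete, here is a rough, hypothetical sketch of the pattern: an inline table-valued function used as a filter predicate, bound to a table by a security policy. The dbo.SalesRow table, its SalesPersonName column, and the function and policy names are all made up for illustration:

--hypothetical table to protect
create table dbo.SalesRow
(
    SalesRowId int primary key,
    SalesPersonName sysname not null
);
go
--inline table-valued function used as the filter predicate
create function dbo.fn_SalesRowFilter (@SalesPersonName as sysname)
returns table
with schemabinding
as
    return select 1 as fn_result
           where @SalesPersonName = USER_NAME(); --only rows "owned" by the current user pass the filter
go
--security policy that binds the predicate to the table
create security policy dbo.SalesRowFilterPolicy
    add filter predicate dbo.fn_SalesRowFilter(SalesPersonName) on dbo.SalesRow
    with (state = on);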

    New Material To Add To the Entire Book

    This section is about concepts that I need to add that are not based on new features, but rather just concepts that I feel like I missed or didn’t cover enough last time.

• Design Testing – Basically, making sure that the architect of a system, as they progress through the requirements, starts thinking about testing before code is created, and applies those tests to the conceptual, logical, and physical models to make sure they meet the requirements. Too often you see designs that just don’t make sense because the entire process wasn’t walked through before the customer had a UI in their web browser.
    • Hierarchies – Enhance the sections about hierarchies a bit to include performance testing characteristics of hierarchy code patterns. This is what my presentation on hierarchies is based on, as I had the idea as the last edition went to print.

    Stuff to Not Cover

    Everyone needs limits, and unless I truly get more space in the book, the following topics will probably stay only the briefest of mentions in this edition.

    • Hadoop integration with SQL Server - From what I currently understand, this will allow you to access Hadoop data from SQL Server. Probably will end up in the list of features that may not fit the "Relational Database Design Book" paradigm.

    Stuff to Remove

For now, this section will be blank. I personally hope to just add new material and leave most of the existing material intact (unless it only pertains to 2005 or earlier, as those versions are out of support now, even by me!). But last edition we were getting extremely close to the 700-page limit I have had for most editions of the book. If you are a reader, or have ideas even from the list above about what needn’t be covered, please hit the comments; I am very interested to know. I have ideas about how to expand the book, but it will not be easy to add very much, and it will be very hard to shrink what is there without a LOT of editing/rewriting, which is also not a beloved idea.

One idea I have is to remove the internals material about how indexes and data are stored, as other people cover it better, and with In-Memory to contend with, it would really need to expand. It is, however, something that I find people feel is a selling point, in that it provides a deep enough dive to give the reader a picture of the internals, even if not nearly as deep as quite a few other books go.

    I look forward to comments from you as much as I hope you look forward to the months of upcoming blog posts with queries that show how each of these features work as I figure them out myself.

    Note: fixed a few issues with formatting.. I hate HTML :).
    Update: Added Hadoop Integration as a do not cover topic. 
     
    Update: Added Delayed Durability. 
    Update: Added link to Temporal Topic 

  • ANY and ALL, Two Keywords I Had Never Taken the Time to Understand

    As I am starting to prepare for an upcoming blog series on database designs and associated query patterns (stay tuned!) I was thinking about what query keywords I don't really know well enough. ANY and ALL were two keywords that immediately popped into my mind that I had never really used (and I probably should have.)

    ANY and ALL are used as options on scalar comparison operators like =, !=, >, etc to allow them to be used with multi-row sub-queries.

You know that if you do something like the following:

    select *
    from Sales.SalesOrderHeader
    where SalesOrderId = ( select SalesOrderId
                           from Sales.SalesOrderDetail
                           where SalesOrderId in (43659, 43660, 43661))

    You will get the following error:

    Msg 512, Level 16, State 1, Line 11
    Subquery returned more than 1 value. This is not permitted when the subquery follows =, !=, <, <= , >, >= or when the subquery is used as an expression.

    However, you can change this to be:

    select *
    from Sales.SalesOrderHeader
    where SalesOrderId = ANY (  select SalesOrderId
                                from Sales.SalesOrderDetail
                                where SalesOrderId in (43659, 43660, 43661))

    And now each value in the left input will be checked against the set on the right.

    Admittedly, this isn't that useful, as = ANY is basically equivalent to IN, as in the following statement. Heck, I did use IN in the example to get the 3 rows for the test for a reason:

    select *
    from Sales.SalesOrderHeader
    where SalesOrderId in (43659, 43660, 43661)

One big difference between IN and = ANY is that ANY only works with subqueries, so this will not work:

    select *
    from Sales.SalesOrderHeader
    where SalesOrderId = ANY (43659, 43660, 43661)

It will result in:

    Msg 102, Level 15, State 1, Line 3
    Incorrect syntax near '43659'.

The next thing you might think (I did) is that if = ANY is the same as IN, then != ANY must be the same as NOT IN, right? No, because != ANY is its own operator, and you have to think about what “does not equal ANY” means. Actually, the following query will return every row in the table:

    select *
    from Sales.SalesOrderHeader
    where SalesOrderId != ANY (select SalesOrderId
                               from Sales.SalesOrderDetail
                               where SalesOrderId in (43659, 43660, 43661))

Why? What the != ANY operator says is to return a row if its value differs from at least one of the subquery’s values, so if even one subquery value is different, the row qualifies. Take 43659: it does match 43659, but it doesn’t match the other two values. Or, as this page on TechNet puts it, NOT IN says != value1 and != value2 and != value3, whereas != ANY says != value1 or != value2 or != value3.
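
To see that distinction side by side, here is a quick sketch running both forms against the same subquery (using counts just to make the difference obvious):

--NOT IN: SalesOrderId != 43659 and != 43660 and != 43661, so the three orders are excluded
select count(*) as NotInCount
from   Sales.SalesOrderHeader
where  SalesOrderId not in (select SalesOrderId
                            from Sales.SalesOrderDetail
                            where SalesOrderId in (43659, 43660, 43661));

--!= ANY: SalesOrderId != 43659 or != 43660 or != 43661, which every row satisfies
select count(*) as NotEqualAnyCount
from   Sales.SalesOrderHeader
where  SalesOrderId != ANY (select SalesOrderId
                            from Sales.SalesOrderDetail
                            where SalesOrderId in (43659, 43660, 43661));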

In this case, you could use the ALL keyword, which says to check the value against every value returned by the subquery, and it must match all of them (not tremendously interesting with an equality operator, but essential to understanding these operators); if even one subquery row doesn’t match, the comparison fails. When no rows are returned by the subquery, it returns true. So the following (with the subquery negated by the 1=2) will return all rows in the table:

    select *
    from Sales.SalesOrderHeader
    where SalesOrderId = ALL (select SalesOrderId                
                              from Sales.SalesOrderDetail
                              where SalesOrderId in (43659, 43660, 43661)
                                and 1=2)

But, if this is the case, then != ALL against the same empty set should return no rows, right? Wrong.

    select *
    from Sales.SalesOrderHeader
    where SalesOrderId != ALL (select SalesOrderId
                               from Sales.SalesOrderDetail
                               where SalesOrderId in (43659, 43660, 43661)
                                 and /* again */ 1=2)

Both return the exact same set of rows. The problem (and why I couldn’t wrap my head around these operators at first) is that the operator is truly = ALL or != ALL. So = ALL says the value must match ALL of the values, and != ALL says the value must be different from every individual value. So the following will return all rows in the table except those matching the three values from the subquery:

    select *
    from Sales.SalesOrderHeader
    where SalesOrderId != ALL (select SalesOrderId
                               from Sales.SalesOrderDetail
                               where SalesOrderId in (43659, 43660, 43661))

What I particularly like about the ANY and ALL keywords is that they are named in a very straightforward manner once you get the idea of what they do; but, as noted, it wasn’t necessarily intuitive for me to start with. On the practical side, what if we want to compare one value to all of the values in a related set? In my example (using ye olde AdventureWorks database), I want to see which orders do not have any sales order items where the UnitPrice is more than 500. A classic way of making this comparison would be to use an aggregate on the SalesOrderDetail rows to get the max UnitPrice for each order and use it as a filter:

    select *
    from Sales.SalesOrderHeader
    where SalesOrderId in (    select SalesOrderId
                               from Sales.SalesOrderDetail
                               group by SalesOrderId
                               having max(UnitPrice) <= 500)

It is a technique I have used many times over. But really, it would be easier to check each item against the scalar value, rather than doing an aggregate. Using the >= ALL operator, we can do this in a more direct manner; each value from the subquery is compared to the value on the left side of the operator.

    select *
    from Sales.SalesOrderHeader
    where 500 >= ALL (    select UnitPrice
                          from Sales.SalesOrderDetail
                          where SalesOrderHeader.SalesOrderId = SalesOrderDetail.SalesOrderId)

    Using AdventureWorks2014 as it is, freshly downloaded, the performance for both queries is very much evenly matched, based on the plan:

[Execution plan comparison for the two queries]

And STATISTICS IO and TIME output are very similar as well. In my tries, the IN version took a few milliseconds more than the >= ALL version. But note that the >= ALL query suggested an index. This is that index (with a slightly better name!):

    CREATE NONCLUSTERED INDEX unitPrice_include_salesOrderId
                          ON [Sales].[SalesOrderDetail] ([UnitPrice]) INCLUDE ([SalesOrderID])

From there, you will see a fairly large (though not necessarily amazing) improvement:

[Execution plan after adding the suggested index]

    The CPU and reads were quite a bit better, with the >= ALL version needing:

Table 'Workfile'. Scan count 0, logical reads 0, physical reads 0, read-ahead ...
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead ...
Table 'SalesOrderHeader'. Scan count 1, logical reads 689, physical reads 0, ...
Table 'SalesOrderDetail'. Scan count 1, logical reads 96, physical reads 0, ...

    SQL Server Execution Times:

    CPU time = 78 ms, elapsed time = 318 ms.

    While the IN version needed:

Table 'SalesOrderHeader'. Scan count 1, logical reads 689, physical reads 0, ...
Table 'SalesOrderDetail'. Scan count 1, logical reads 1246, physical reads 0, ...

    SQL Server Execution Times:

    CPU time = 125 ms, elapsed time = 361 ms.

The elapsed times were similar, but the CPU time and reads improved from what had been a pretty much equal comparison.
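
If you want to reproduce the measurements, it is just the standard SET options wrapped around the two versions of the query; nothing exotic:

set statistics io on;
set statistics time on;

--aggregate/IN version
select *
from   Sales.SalesOrderHeader
where  SalesOrderId in (select SalesOrderId
                        from Sales.SalesOrderDetail
                        group by SalesOrderId
                        having max(UnitPrice) <= 500);

-- >= ALL version
select *
from   Sales.SalesOrderHeader
where  500 >= ALL (select UnitPrice
                   from Sales.SalesOrderDetail
                   where SalesOrderHeader.SalesOrderId = SalesOrderDetail.SalesOrderId);

set statistics io off;
set statistics time off;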

To make sure we have the correct answer, we can use the following query to look at the values that are returned. It takes the basic query and looks at the UnitPrice data for each of the SalesOrderDetail rows of the qualifying orders:

select SalesOrderId, max(UnitPrice) as MaxUnitPrice, min(UnitPrice) as MinUnitPrice
from   Sales.SalesOrderDetail
where  SalesOrderId = ANY (select SalesOrderId
                           from Sales.SalesOrderHeader
                           where 500 >= ALL (select UnitPrice
                                             from Sales.SalesOrderDetail
                                             where SalesOrderHeader.SalesOrderId =
                                                   SalesOrderDetail.SalesOrderId))
group by SalesOrderId
order by MaxUnitPrice desc, MinUnitPrice desc

    Which will return a result set like:

    SalesOrderId MaxUnitPrice          MinUnitPrice
    ------------ --------------------- ---------------------
    46641        469.794               469.794
    46934        469.794               469.794
    46963        469.794               469.794
    47022        469.794               469.794
    47048        469.794               469.794

    73273          2.29                  2.29
    73040          2.29                  2.29
    51782         1.374                  1.374
    53564         1.374                  1.374
    65214         1.374                  1.374

As you can see, all of the max values are <= 500 (so 500 is greater than or equal to every UnitPrice for those orders).

I haven't done a tremendous amount of performance testing (as you can tell), but it is clear to me that ANY and ALL can be of value in queries on occasion (and not just to get a blog entry posted so I don't vanish from the SQLBlog roll again this month).

  • Classic Music, Attractions, and Presentations: Presenting on Database Design at SQL Saturday Atlanta 2015

These next two weeks are going to be awesome. May 11, I get to see The Who for probably the last time ever. Saturday I am speaking on database design at SQL Saturday in Atlanta, then heading to Disney World. It rarely gets even close to this good over a two-week span, and SQL Saturday is a big part of that. I love going to SQL Saturdays, seeing all of the people who I usually only see as a handle in a twitter feed or blog post.

     

    As I considered this week, I couldn't help but try to find a somewhat interesting way to blog about it. The presentation I am doing is my oldest, and truly most favorite presentation. I have done it many times over, and it only changes a little bit here and there every time I give it.

     

    Baba O'Reilly, Carousel of Progress, and Database Design?

     

The Who is celebrating 50 years, and a few songs they sing are close to that age… and they are still relevant today and connect with audiences young and old. A (begrudging) favorite attraction of mine at Disney World is the Carousel of Progress, largely because it is charmingly old but the message is still relevant (if dated), particularly to a computer scientist who wants to affect technology for the future. Relational database design is still a fundamental tool for producing software, even 30+ years after it was introduced. Much like you won't fully appreciate music and theme parks without knowing the classics, understanding relational database concepts will help you design any solution using a database.

     

As a speaker and writer, you want to produce new and interesting material, but it may be the oldies that people want to hear. I love to do this presentation largely because of the topic, and there are plenty of people who still need to learn the fundamentals of database design. Of course, I won't entertain you like The Who, and the classroom will almost certainly not rotate after each section of the presentation, but whether you are hearing about design for the first or fiftieth time, I feel certain you will get something from it.

     

    If Database Design interests you, I hope to see you in my session this Saturday! If you are still on the fence, here is the abstract:

     

    Database Design Fundamentals

    Data should be easy to work with in SQL Server if the database has been organized as close as possible to the standards of normalization that have been proven for many years, but are often thought of as old-fashioned. Many common T-SQL programming "difficulties" are the result of struggling against these standards and can be avoided by understanding the requirements, applying normalization, as well as a healthy dose of simple common sense. In this session I will give an overview of how to design a relational database, allowing you to work with the data structures instead of against them. This will let you use SQL naturally, enabling the query engine internals to optimize your output needs without you needing to spend a lot of time thinking about it. This will mean less time trying to figure out why SUBSTRING(column,3,1) = 'A' is killing your performance, and more time for solving the next customer problem.

  • Off to Richmond for SQL Saturday

    I got the email prodding speakers to blog about our upcoming sessions, so I got myself up and started to write this blog. It has been such a long time since I have done much side SQL work (other than doing quite a bit of tech editing, along with doing some work for PASS leading up to speaker submissions), that my blog didn't even show up in the blogs list on sqlblog.com. My last blog was right after PASS when I had attended the Summit from my hospital bed.

Since then, it has been quite a bumpy road. For a person who usually travels as much as I do for fun (SQL and Disney) and work, not having left the Nashville area since a vacation in September has been weird. But all sorts of stuff has gotten in the way, mostly that I just haven't felt like blogging (heck, I haven't had an entry on my Simple-Talk blog since then either, though a few editorials I wrote pre-surgery were posted on sqlservercentral.com by my editor).

    But now, finally, it is time to wake the heck up. I am leaving Nashville this Sunday, heading for a week of work in Virginia Beach, not coincidentally the same week as SQL Saturday Richmond where I will be talking about In-Memory OLTP tables and how they affect your database design.  It is also pretty cool that Jessica Moss will be presenting down in Virginia Beach while I am in town, so a stop at the Hampton Roads SQL Server User Group is definitely in order for most of the people I work with.

    Here is the abstract for my presentation (if you want Jessica’s go here):

     

    How In-Memory Database Objects Affect Database Design

    With SQL Server 2014, Microsoft has added a major new feature to help optimize OLTP database implementations by persisting your data primarily in RAM. Of course it isn't that simple, internally everything that uses this new feature is completely new. While the internals of this feature may be foreign to you, accessing the data that uses the structures very much resembles T-SQL as you already know it. As such, the first important question for the average developer will be how to adapt an existing application to make use of the technology to achieve enhanced performance. In this session, I will start with a normalized database, and adapt the logical and physical database model/implementation in several manners, performance testing the tables and code changes along the way.

     

Does this mean I am fully back and over my funk? Good grief, I don't know. But I have submitted to four other SQL Saturdays over the rest of this year, and I have projects that are just waiting for me to get started. Some days I want to just lie down and not get up until it is time to go back to bed. Others I want to write a new book and travel to the ends of the earth to talk about SQL Server. The fact is, I am taking this one task at a time, and I look forward to talking about SQL Server for you at 9:45 on the 21st of March. And when that is over, I am going to Dollywood's opening weekend to let the Tennessee Tornado spin some sense into my head. Hope I see you there (SQL Saturday or Dollywood, either way we can have some fun!)

  • MVP/SQLPASS Summit Wrapup From a Non-Attendee

     

    (Or “How I Sort of Attended PASS This Year After All and It Wasn’t as Horrible as it Might Have Been”)

    Wow, my experience at this year’s MVP Summit and PASS Summit was not at all what I had expected it to be for me just weeks ago when I was planning my trip. Life intervened, and as my blog post last week here explained, the short of it is that I was unable to attend either summit for less than fun reasons. As a shut in who has always stayed in connection with the SQL community, particularly at the summits here in the US, it was going to be pretty hard to completely stay on the sidelines. So I had to choose: turn off my computer and live on pain meds for the week, or tune in to twitter and see what was going on (still aided by pain meds!). My eventual decision was based on the fact that when I attend the summit, I tweet about how it is going for someone out there so what the heck. I decided to tune in, and it turned out that the people I was tweeting to were people like myself this year, who can’t make it for some reason.

    I am glad I did, as it turned out to be pretty cool (once the MVP Summit had passed at least.)

    This year the MVP Summit for us SQL–ers was first up, attached to PASS. I have missed an MVP Summit before, and just like then, it sucked to be home. The information we get there is considered so NDA that we can really only discuss stuff from presentations we attend with other people that we see attending the same session. There is a great twitter account for the MVP program @MVPAward where lots of fun information is shared, but nothing of particular technical interest. Twitter feeds of other MVPs attending tend to be like this one from one of my heroes, Arnie Rowland (who is not to be confused with Rob Farley, mind you…):

    #mvpsummit Great discussion about {NDA} led by {NDA}.Some questions about {NDA} expessed {NDA}. However #HugAChuck is NOT covered by NDA.

    It is fun (and something I usually do yearly too), but when you don’t know what the {NDA} is replaced with and REALLY want/need to… well, it stinks. When you know what #HugAChuck is, it is even more fun, but we won’t go into that either.

    On Wednesday, the PASS Summit started. Since at least 50% of the people I follow and would follow me would be at or interested in PASS, the twitter feed exploded with glorious noise on the #summit14 hashtag. Keeping up was kind of fun, but still felt pretty empty. At that point I was really close to just sulking and turning off twitter. But then I happened upon a twitter conversation between @DaniSQL and @SQLServer:

    @DaniSQL: Sad and jealous I won't be at #sqlpass this year.

    @SQLServer: @DaniSQL Not to worry, we've got you covered! You can watch the #summit14 live stream here with your account info: http://www.sqlpass.org/summit/2014/PASStv.aspx 

I had heard of PASS TV before, but never really gave it much thought, since I had always been at the Summit live and in person. I thought it was probably just advertisements and interviews, but it turned out to be something more. In addition to that sort of content, which is there but won’t hold your attention for hours, they also streamed quite a few full sessions, and most of them were live, as-they-were-happening sessions by some of the big names.

I saw whole sessions (or partial sessions, when doctors/nurses felt the need to interrupt!) from David Klee, Adam Machanic, Kalen Delaney, Kimberly Tripp, Steve Jones, Gail Shaw and Grant Fritchey… It was awesome, because it was just like being there for a small slice of time… I didn’t get to see the keynotes (though I may try to one day soon, since they are still available via PASS TV, and everything is now on YouTube) or some of the other sessions that were streamed this year, but in reality, I attended as many sessions at PASS this year as I could have if I had attended in person.

What I did miss was seeing everyone in person and getting to do most of my volunteer work. Having been an MVP for 11 years, and going to the PASS Summit for even longer, I have a lot of folks that I have seen there yearly who never make it to the events in the region I frequent. I had scheduled time in the community zone, at a Birds of a Feather luncheon, and a couple of abstract-writing sessions. Those I just completely missed. One bright note was that there was one meeting of a sub-committee of the program committee, scheduled to be held face-to-face, that I was able to attend via phone. Even with my complex scheduling (I had a visitor to cut short and a therapy session that I had to reschedule), I was able to participate in that meeting and feel somewhat productive. I was very appreciative to that team for letting me stay on task and not miss out.

Taking in sessions was one half of the picture, giving me a sense of connection; add to this the great community that gave me a lot of moral support for my ordeal through the twitter channels (as well as a lovely gift sent to me from my friends at the Friends of Red Gate program), and I felt as connected as I believe I could have without stepping on an airplane (something that I pretty much only do when I get the opportunity to go out to the Microsoft area of the country for these two conferences or something else SQL Server related… well, okay, sometimes I will fly to Disney World, but that is off topic for sure).

    So here’s to the 2014 MVP and PASS Summits that were as good as I could have ever expected. And to their 2015 siblings that by golly I will be in attendance for, this time on Hip 3.0 (which is going to be hippest hip yet.)

  • Why I am not going to PASS this year

    This is going to be one of my toughest non-technical posts ever. And the reason it will be difficult will have an ancillary relationship to the Lortabs I have been taking for the last two days. It has everything to do with me being at home while the 2014 PASS Summit is going on. Previously, the only Summit I had missed was the first one in Chicago. I had just changed jobs, so they wouldn’t pay my way yet. They did send me to the Summit in London that directly followed, which wasn’t a bad trade since that is the one and only time I have travelled outside the country. 

     

    So what is the reason I won’t be there?

    This is the first year in at least 6 or 7 that I did not put in a session to speak at the PASS Summit. I wanted to enjoy the experience and not have the pressure of a session or two hanging over me from July to October. I had volunteered for the program committee, so I figured I would have plenty to do. Practicing the session over and over, getting it just right (or at least really, really close to right). My status as a speaker has nothing to do with why I am not attending.

    For the first time in many years, we will not be having a Quiz Bowl during the Welcome Reception. While I am sad to see it go, as it was a lot of fun to work all of these years with Board Member Tim Ford (@sqlagentman), this really has nothing to do with it. Not having to worry about writing questions and setting up the game is certainly not keeping me away from the Summit.

    This year, the dates for the MVP Summit overlap with the PASS Summit. I didn’t choose MVP over PASS, as PASS and Microsoft worked out a wonderful situation where we were going to be able to attend both, over 7 glorious days. The date of the MVP Summit is not why I am missing PASS this year. I am missing it too!

    In fact, no other minor details would even somewhat hold me back (not long flights, too soon after Halloween, too much salmon, not close enough to Thanksgiving, too few politics, too rainy, too long, too short, too many politics,  not enough salmon, etc). The SQL PASS Summit is one of my favorite weeks of the year come rain or shine. I made my hotel reservations the minute I heard the announcement of the dates. I blogged about how much I am looking forward to attending along with some advice for attendees here on sqlblog.com, and had several volunteer jobs set up including working at the community zone, two sessions for folks wanting to learn how to write a better abstract, and a table during the Birds of a Feather Lunch. I clearly haven’t fallen out of love with PASS.

    It certainly isn’t the people at PASS. I love to see all of my online SQL Server friends in person, and meet new ones to boot. I have met a lot of amazing people over the years, and I expected to meet a lot more this year. And while online friends you never meet in person are great, getting the chance to shake a hand, break some bread, etc, makes them seem more real when you see them online.

    So if you came to this blog looking for me to bash on PASS and the Summit, you are in the wrong place. No, as I attempted to foreshadow, I won’t be attending the PASS Summit for a medical reason. I had a total hip replacement done back in May of 2009. 2009’s PASS Summit was a big milestone in my recovery. It was where I finally stretched my legs and was walking (for me) considerable distances comfortably. At the same time, it was where I learned my limits as about midway through, I started experiencing pain and ended up needing a scooter to get around Seattle for my post-conference session and dinner.

Fast forward to Thursday, October 21. Something gave way in my leg, and I was unable to walk. It turned out the hip replacement hardware had broken. So, instead of PASS, I am having surgery Monday morning to replace that hip replacement device with a new one. My surgeon quickly nixed travelling the week after surgery (in good humor… and he kind of wanted my symphony tickets for Tuesday :) ), so PASS is out. Luckily, while the twitterverse is exploding with PASS excitement, I will be at home and won’t really notice, thanks to my little friends I introduced in the first paragraph… Lortabs.

    Note: If you are keen to get info about how things go for me, I may tweet some at @drsql, and my wife @valdavidson will tweet occasionally as well.  Thanks!

  • PASS Summit Attendance “Advice”

    Wow, it is fall again, and in a large percentage of the minds of SQL Community members, that means that the PASS Summit is here again. I have been to all of the Summits save the first one, and honestly it has become one of the highlights of my nerd year. I have learned a tremendous amount about SQL, made a lot of real friends, and gained a taste for travelling over the years, something I never did before I started making the yearly journey to the summit. This year, I decided not to put in to speak, so instead of working on a presentation, I wrote this blog mixing my two favorite subjects.

    On the other side of my life, I have, pretty much over the same time frame, become somewhat enamored (okay, addicted) to Disney World. I have a very serious planning method for a Disney vacation, because it is a very different world where I spend thousands of dollars to be educated and entertained. It is very important to the enjoyment of a trip to the Happiest Place on Earth that you know what you are going to do and be sure you are prepared ahead of time. As I was considering the challenge of blogging about the summit this year, I realized the Summit is also a place where I spend thousands of dollars to be educated and entertained. While the similarities are fairly superficial, much of the high level advice is the same.

    So I figured I would put together a brief (at least for me, that is) set of tips about how to plan for PASS whether this is your first or fourteenth trip to PASS.

    1. Start planning early

    A hallmark of a great Disney Vacation is that planning is at least 20% of the fun. For PASS, this is no different. Picking a set of sessions you want to attend will start to give you an idea of how the conference will be for you. It will also help you discover a bunch of stuff to look into before the conference even starts. Discovering additional sessions like pre-conferences may extend your conference time as well. Finally, planning your sessions will remind you of what videos to watch after you get home (see next tip.)

    As you do your planning, take some time to do a search for the speakers' names. You may discover even more training and social interaction from many of the speakers that will help you get into the "in" crowd before you even arrive. Hint: Twitter is a great tool to have at the Summit. Once you follow a few people who are seasoned attendees, you will discover a cornucopia of information about the whole experience.

    Which leads me to the next planning point. Once you have the planning done that your boss will appreciate (and a hotel room, as the good ones fill up fast), next up is your social calendar. If you are a social sort of person, you could keep yourself busy from sun up to sun down. For starters, there will be at least three PASS social events: the opening reception, the vendor event, and a party. Then there are a host of after-hours parties some of which PASS has a list of, but this is just the start. Finding out what is available will really help you make the most out of the week.

    2. Be prepared to chuck the plan

    While planning the trip is (for nerds like me) a large percentage of the fun, few of your plans will be binding. One of my first years, I had chosen a set of sessions, but wandered by this big room where someone named Kimberly something (Tripp, I think it was :), was talking about indexes. I learned a bucketful that day that I may never have known. Little did I know at the time she was a SQL celebrity :)

    If you don't just keep to yourself at lunch, you may find yourself invited to a private party you didn't even know about, or perhaps hear about a speaker that you just have to go hear. Of course, if you are more like me, you may just be tired and want to get under the covers early so you can make it to the keynote early in the AM. It is your conference, make sure you get what you want out of it.

    3. Pack like you will be bringing stuff home with you

    If you are one of the many unlucky attendees who will travel by air, you know that bringing home stuff can be a pain. On a recent Disney vacation, we didn't think we would buy anything substantial...but then we found "it" and bought "it". (and several other its, and we bought them too!) When it came time to go home, our stuff just wouldn't fit in our suitcase even though we thought we had planned to have enough space for purchases. So we bought another suitcase, making our purchases doubly expensive (though we did get an interesting piece of luggage!)

    For PASS, even if you don't purchase a thing (and there is lots of shopping in the area surrounding the conference center; for example, I just learned today that Timbuk2 has a store in the same block as the convention center, and Pike Place Market, another interesting tourist location, is just down the street), there is swag to be had everywhere. Usually there is something from PASS at registration. Add to that the toys, water bottles, shirts, and books that are regularly given out by vendors. I don't think there has been a year when I didn't come home with at least one new book.

    Sometimes it can be something a lot more interesting than just swag. Two years ago, on the last day of the show, the Microsoft Store was deep discounting computers and if I had had luggage space, I could have had a great tabletop computer for 1/2 price. Still reeling from that one a little bit.

    My advice here is to think about packing a soft bag that you can fill with clothes to make space in your carryon/checked bag to bring back extra stuff. Of course, it depends on the airline what you can do, so check ahead. I pretty much always fly Southwest so adding a second bag for the return flight doesn’t cost extra. Your mileage may vary.

    4. Get your dieting done before you get there

    Immediately starting at the Welcome Reception, the food at PASS is generally quite acceptable. The regular meals at the conference are generally served buffet style: usually pretty good steak, chicken, vegetables, dessert, salad, bread, etc. for lunch. Then there are generally a lot of heavy appetizers for grazing at all of the receptions.

    In addition to the PASS sponsored and after hours parties, there are some great restaurants in the Seattle area, even just in the immediate walking area. I have had amazing steaks, seafood, and other great meals during the conference. If you are hungry and don't want a long hike, the Daily Grill that is in the Sheraton next door to the conference center serves a very nice meal if you find yourself with nothing scheduled for the night.

    5. Leave your turtle shell at home

    My first trip to PASS I was basically a turtle. The only time I spoke to anyone else was when I was giving my presentation. Otherwise, I kept to myself, not talking to anyone. The same was true of my first Disney World trip. In both I have learned over the years to interact with people. At Disney, the cast members (staff) who work there are extraordinarily friendly. They will help you out, give you directions or information, or just talk to you about their job. Even interacting with other guests when stuck in a line somewhere has netted me interesting information, and helped to pass the time.

    In the same way the many volunteers that work with PASS are even more friendly than anyone at Disney. Why? Because that is what they do. We all volunteer our time with PASS not because we have been compelled to, but because we love SQL Server, and we have a community of people who largely work together.

    I don't know how many people I have talked to who believed that the speakers and authors (and MVPs) who were at the conference would be unapproachable. Nothing could be further from the truth. I can't vouch for every speaker, author, and MVP who will be in attendance, but I really don't know any of them that are unapproachable. In fact, I know so many current MVPs who just started out by just talking to people at the conference, getting ideas about what to do.

    6. Dress appropriately

    Almost everyone will wear a tie most days… Wait, what? No, nothing like that. One of the main ways that a Disney Vacation is like a PASS Conference is the tons of walking. The Conference Center is not at all small and there are a bunch of sessions going on simultaneously. At the receptions and parties, there will be very little seating available, so standing for hours is also possible. So be prepared to walk and stand more than normal (especially if you are like me and telecommute.)

    The hardest part of packing clothes is that the weather in Seattle can be somewhat unpredictable. It can be chilly (50 degrees) and rainy, or it can be warmer and sunny. Carrying around an umbrella gets old, and wearing a coat gets cumbersome, but wearing wet clothes is somewhat more annoying. There is usually a coat check you can use to store your coat for the day before you head back to your hotel.

    7. Consider if you want to get involved

    Ok, so this one is really quite a bit different than a vacation. While there are communities that surround the Disney experience, the fact is, the number of SQL Server devotees is far smaller than the number of Disney fans (sad but true.) The fact is, there are hundreds of ways you can get involved with the SQL Server community and make an impact on people's careers (and in almost every case, one of the prime careers that will be impacted will be your own… positively if you don't go too crazy and start spending your entire work day on non-work activities.)

    The range of ways to get involved is truly amazing. There are simple ways to get involved, such as being a member/leader/speaker for a local user group, virtual user group, SQL Saturday, or (and don't tell PASS I said this) one of many other non-PASS affiliated groups/events out there. Even if this volunteering business isn't your bag, you could answer questions in forums (MSDN's forums, just to name one of many) or on the #sqlhelp hashtag on Twitter, or start a small blog and share whatever you know. Don't be shy; no matter your level, there are people smarter and less smart than you, and if you have wrong ideas, you will likely be corrected but rarely treated as if you were made of charcoal. Some of my best learning came from being corrected in the forums, back when we called them NNTP newsgroups.

    Just bear in mind that while your career can be impacted in very positive manners (your community involvement becomes a powerful addendum to your formal resume), it will always behoove you to be professional. For example, if you see that one of my blogs is in error, the proper wording is: "I think you have an error here: Technically, the blah blah isn't blah until..". On the other hand, "Louis: You are an idiot. A three year old knows that blah blah is not blah until…!" See the difference? Exactly, it is important to use contractions like isn't instead of is not to save space on the Internet. Employers love that attitude.

     

    Ok, so that is my rather simple advice to you if you are heading out to the PASS Summit along with me. Disagree? Other ideas? Leave a comment or find me at the Summit and we can talk about it.

  • SQL Saturday Birmingham #328 Database Design Precon In One Week

    On *August 22, I will be doing my "How to Design a Relational Database" pre-conference session in Birmingham, Alabama. You can see the abstract here if you are interested, and you can sign up there too, naturally. At just $100, which includes a free ebook copy of my database design book, it is a great bargain and I totally promise it will be a little over 7 hours of talking about and designing databases, which will certainly be better than what you do on a normal work day, even a Friday.

    You can read the abstract, but what should you really expect?  First off, expect to not sit in one spot while I drone on and on for 7 hours of lecture (not that I couldn't go 7 hours straight just lecturing while only inhaling every 10 minutes, which would actually be a little bit easier to prepare, I assure you).

    Rather, there is quite a bit of interaction as a class and in small groups, giving you not only some of the foundational information, but a bit of practical experience too. (Plus, the reason I insisted in buying ebooks for everyone was to let the attendee have the “full” picture right on their laptop or tablet device after we are finished with class.)

    The day is broken up into 3 or 4 modules, with several participation exercises along the way. The sections are:

    Section 1: Introduction with a bit of history

    We start with just enough introduction to the materials, the stuff you need to do *before* you design, and introduction to the history of the craft to make sure we are all on the same page. A big part of this section is just getting it straight why we design like we do.

    Section 2: Modeling and structures

    This section will cover the fundamental building blocks of relational databases, like tables, columns, keys, etc; and how to create a data model of the constructs. This is by far the largest part of the lecture, and by the end we should all be on the same page as to what goes into the database, if not exactly "how" the final product should look.

    We will stop at this point, and I will get out my modeling camera (which sounds a LOT more glamorous than it will turn out to be) and we will do some modeling on paper, eliciting attendees to provide the parts of a simple database, and we will all decide what should go into the model.

    The document that I base this model on is VERY simple, but the complexities of translating the document to a base database design are always quite interesting to experience as a class, and I get an idea of who the outgoing class members are going to be at this point too.

    Section 3: Model Standardization (Normalization)

    In this section, we will look at the kinds of things you need to do to the model to prepare the model to be implementable by truly analyzing the structures to see if they make "sense" within the confines of the relational model. It is always interesting to me that most models are normalized to some degree for simplicity, but people think that normalizing makes things slower. And the misconceptions about the higher normal forms make even less sense…

    Once we are done with the slide material in this section, we will start a couple of exercises. The first exercise is planned as a full class exercise, where I will man the data model (first on paper, then in a modeling tool) and elicit input from the class, in a manner that makes sure everyone gets a say.

    Then we will break up into small teams and build a final model on paper, which I will bring up to the projector and we will discuss the different solutions.

    Section 4: Physical Modeling Overview

    Assuming we still have time/energy, we will take the last part of the class and cover turning the model into a "real" database. Data types, domain implementations, constraints, testing, etc. will be covered.

    Due to the limitations of the ~7 hour format, and a *strong* preference of previous classes towards actually doing some design, there are topics we won't cover. But honestly, if you can get the basic design correct and make the model look like what the final model needs to look like, the rest is kind of gravy (and well documented in a lot more websites and books than mine!). I spend a lot more time figuring out the shape of the database objects than I do on the implementation aspects, because the problems with database designs are almost always an inability to match the user’s requirements, more than a missing index here and there (and that is why God created DBAs & Data Programmers).

    What I really love about doing all of the designs is that we really get the flavor of a real design meeting. A few differing opinions, a few ideas I hadn't planned for, and a few argumentative types who really want their own way. But none of the arguments have gotten out of hand so far, and they have all been very much like the typical data modeling meeting.

    I hope to see you in class!

    * Note, this originally said September 22. If you want SQL training with me on that date, you will need to join me in Orlando's version of Asia, on Expedition Everest.

  • How In-Memory Database Objects Affect Database Design: Hybrid Code

    In my first attempts at building my code, I strictly went with either native or on-disk code. I specifically wrote the on-disk code to only use features that worked in-memory. This led to one majorly silly bit of code, used to create system assigned key values. How would I create a customer number that was unique? We can’t use the MAX(value) + 1 approach, because it behaves hideously under the MVCC isolation levels: 100 connections might see the same maximum value, and since you can’t see other connections’ uncommitted rows, you would duplicate data quickly.  I was also limited to not using sequence objects, because they too are not allowed in native code.
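    For context, the MAX(value) + 1 pattern I mean is something like the following sketch (this is not code from my solution; it assumes customer numbers are 'CU' plus eight digits, as in my examples):

    --Anti-pattern sketch: under multi-version concurrency, many sessions can read the
    --same MAX value before anyone commits, so they all compute the same "next" number.
    DECLARE @CustomerNumber CHAR(10)

    SELECT @CustomerNumber = 'CU' +
           RIGHT('00000000' + CAST(COALESCE(MAX(CAST(SUBSTRING(CustomerNumber,3,8) AS INT)),0) + 1
                                   AS VARCHAR(8)), 8)
    FROM   Customers.Customer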

    So, I used a random number generator, like this:

    DECLARE @CustomerNumber CHAR(10)
    WHILE 1=1
    BEGIN
        --build 'CU' + a zero-padded, 8 digit random number
        SET @customerNumber = 'CU' + RIGHT('00000000' + CAST(CAST(100000000 * RAND() AS INT) AS VARCHAR(8)),8)

        --keep trying until we generate a value that is not already in use
        IF NOT EXISTS (SELECT * FROM Customers.Customer WHERE CustomerNumber = @CustomerNumber)
            BREAK

    END

    This actually worked quite well for the tens of thousands of rows I generated. The likelihood of two connections getting the same random value, and both getting past the NOT EXISTS block, was very low. Now, obviously the probability of clashes will rise greatly as you approach the limits of the 8 digit number, so you would have to monitor usage and change the 'CU' prefix to something else as the range fills up. This is demo code, and as this blog is about, there are easier ways. But it was fun to write and test.

    Of course, the worst part of this code isn’t the random number generator, or even the looping (oh, the dreaded looping); no, this code is not very optimal because of the NOT EXISTS subquery.  Best case, we have to do one probe into the table to see that the value doesn’t exist.  While it was suboptimal in interpreted code, in native code it got even sillier looking, because you can’t break out of a WHILE loop, and you can’t use subqueries (nor the RIGHT function). So the code changed to the following awkward (yet operational) bit of code:

    DECLARE @customerNumber CHAR(10), @rowcount INT, @keepGoing BIT = 1, @baseNumber VARCHAR(30)
    WHILE @keepGoing = 1
    BEGIN
        --build 'CU' + the last 8 characters of a zero-padded random number, without using RIGHT
        SET @baseNumber = '00000000' + CAST(CAST(100000000 * RAND() AS INT) AS VARCHAR(8))
        SET @customerNumber = 'CU' + SUBSTRING(@baseNumber,LEN(@baseNumber) - 7,8)

        --no subqueries (and no BREAK) in native code, so count matches and flag when to stop
        SELECT @rowcount = COUNT(*) FROM Customers.Customer WHERE CustomerNumber = @CustomerNumber
        IF @rowcount = 0
            SET @keepGoing = 0

    END

    The inefficiency of this code is the same as the interpreted code: that query on CustomerNumber.  Of course, the more typical solution to the problem of a system generated key would be to use a SEQUENCE object (possibly as a default) and format the number somehow. But you can’t use sequences in native code, so instead of going fully native, I am using a hybrid approach.
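    As an aside, for a plain on-disk table the “possibly as a default” part might look something like this rough sketch (no check digit; the constraint name is made up, and the sequence is the one created just below):

    --Sketch: let a DEFAULT constraint pull and format the next sequence value
    --(a disk-based-table option; not something you can lean on from natively compiled code)
    ALTER TABLE Customers.Customer
      ADD CONSTRAINT DFLTCustomer_CustomerNumber
          DEFAULT ('CU' + RIGHT(REPLICATE('0',8) +
                   CAST(NEXT VALUE FOR Customers.Customer$CustomerNumber$Sequence AS VARCHAR(8)), 8))
          FOR CustomerNumber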

    First, I built a procedure that generates CustomerNumbers. It appends a simple check digit to a zero-padded number preceded by 'CU' (and sloughs off any customer numbers with 666 in the value, just to show what can be done).

    CREATE SEQUENCE Customers.Customer$CustomerNumber$Sequence
    AS INT
    START WITH 1
    go

    --use a sequence and format the output a bit to avoid lots of duplication
    CREATE  PROCEDURE Customers.Customer$getNextCustomerNumber
        @customerNumber char(10) OUTPUT
    AS

        --doing it in a loop is the simplest method when there are complex requirements
        WHILE (1=1)
         BEGIN
            --Get the base account number, which is just the next value from the sequence
            SET @customerNumber = 'CU' + right(replicate ('0',8) +
                        CAST(NEXT VALUE FOR Customers.Customer$CustomerNumber$Sequence as varchar(7)), 6)

            --add a check digit to the account number (take some digits add together, take the first number)
            SELECT @customerNumber = CAST(@customerNumber AS varchar(8)) +
                RIGHT(CAST(
                   CAST(SUBSTRING(@customerNumber, 3,1) AS TINYINT) +
                   POWER(CAST(SUBSTRING(@customerNumber, 5,1) AS TINYINT),2) +
                   CAST(SUBSTRING(@customerNumber, 8,1) AS TINYINT) * 3 +
                   CAST(SUBSTRING(@customerNumber, 9,1) AS TINYINT) * 2 +
                   CAST(SUBSTRING(@customerNumber, 10,1) AS TINYINT) +
                   CAST(SUBSTRING(@customerNumber, 11,1) AS TINYINT) * 3  AS VARCHAR(10)),1)

            --if the number doesn't have these character string in it (including check digit)
            if            @customerNumber NOT LIKE '%00000%'
                    AND @customerNumber NOT LIKE '%666%'
                BREAK -- we are done
         END
    GO

    Pretty simple, just loops until a good number is found. Not looping through rows, but looping through sequence values, which is very fast and can be done by many simultaneous callers with almost no contention.  If contention is an issue, you can choose a larger cache size for the sequence object, which reduces how often SQL Server has to persist the fact that you have burned through the previously cached block of values.
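    If you want to play with that, the cache size is just part of the sequence definition (the number here is arbitrary):

    --cache 1000 values so the sequence metadata is persisted far less often;
    --an unexpected shutdown can burn the unused portion of the cache, leaving gaps
    ALTER SEQUENCE Customers.Customer$CustomerNumber$Sequence
        CACHE 1000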

    So now, the code simply says:

    DECLARE @CustomerNumber CHAR(10)
    EXEC Customers.Customer$getNextCustomerNumber @CustomerNumber OUTPUT

    That call replaces the loop entirely. But I can’t use it in a natively compiled procedure, so we create an interpreted procedure that calls this procedure to get the number, then calls the native procedure:

    CREATE PROCEDURE Customers.Customer$CreateAndReturn
    @FirstName NVARCHAR(30),
    @LastName NVARCHAR(30),
    @MiddleName NVARCHAR(30),
    @EmailAddress NVARCHAR(200)
    AS

    BEGIN
        SET NOCOUNT ON

       --see if the customer exists… We don’t do updates here
        DECLARE @customerId INT = (SELECT CustomerId
                                    FROM  Customers.Customer
                                    WHERE EmailAddress = @EmailAddress) --we are assuming validation is done elsewhere

        IF @customerId IS NULL
            BEGIN
                DECLARE @customerNumber CHAR(10)
                EXEC  Customers.Customer$getNextCustomerNumber @customerNumber OUTPUT

                       
                EXEC @CustomerId = Customers.Customer$SimpleInMemCreate
                @CustomerNumber = @CustomerNumber,
                @FirstName = @FirstName,
                @LastName = @LastName,
                @MiddleName = @MiddleName,
                @EmailAddress = @EmailAddress

            END

          RETURN COALESCE(@customerId, -100)
    END
    GO

    I haven’t added error handling just yet, but this is nearly the final version. The procedure to do the actual insert is just a simple insert using native compilation:

    CREATE PROCEDURE Customers.Customer$SimpleInMemCreate
    @CustomerNumber CHAR(10),
    @FirstName NVARCHAR(30),
    @LastName NVARCHAR(30),
    @MiddleName NVARCHAR(30),
    @EmailAddress NVARCHAR(200)
    WITH NATIVE_COMPILATION, SCHEMABINDING, EXECUTE AS OWNER
    AS BEGIN ATOMIC WITH
    (
    TRANSACTION ISOLATION LEVEL =  SNAPSHOT, LANGUAGE =  N'us_english'
    )
            DECLARE @customerId int
                INSERT INTO Customers.Customer
                        ( FirstName , MiddleName ,LastName ,
                            CustomerNumber ,  EmailAddress,
                            RowCreateTime, RowLastModifiedTime
                        )
                VALUES  ( @FirstName , @MiddleName ,@LastName ,
                            @CustomerNumber ,  @EmailAddress,
                           SYSDATETIME(), SYSDATETIME()
                            )
                SELECT @customerId = SCOPE_IDENTITY()


          RETURN isnull(@customerId, -100)
        END
    GO

    So we get the benefits of the compiled procedure (if there are any in this case; my demo code is fairly simplistic) coupled with anything in the interpreted code that could not be done in native mode.
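    For completeness, calling the hybrid version looks just like calling any other interpreted procedure (the parameter values here are made up):

    --the interpreted wrapper generates the customer number, then calls the native insert
    DECLARE @CustomerId INT

    EXEC @CustomerId = Customers.Customer$CreateAndReturn
            @FirstName = 'Test',
            @LastName = 'Customer',
            @MiddleName = 'Q',
            @EmailAddress = 'test.customer@email.com'

    SELECT @CustomerId AS CustomerId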

  • Speaking on 7/25 for the Nashville SQL Server User Group: How In-Memory Database Objects Affect Database Design

    So I have blogged about it, and I have prepared for it, and next Friday at lunch time I will be unveiling my new presentation. The location/other details can be found here: http://nashville.sqlpass.org/Home.aspx, but the abstract is:

    How In-Memory Database Objects Affect Database Design
       

    With SQL Server 2014, Microsoft has added a major new feature to help optimize OLTP database implementations by persisting your data primarily in RAM. Of course it isn't that simple; internally, everything that uses this new feature is completely new. While the internals of this feature may be foreign to you, accessing the data that uses the structures very much resembles the T-SQL you already know. As such, the first important question for the average developer will be how to adapt an existing application to make use of the technology to achieve enhanced performance. In this session, I will introduce the concepts of In-Memory Database objects, discussing how the design is affected by the new technology. As a basis for the presentation, I will start with a normalized database, and adapt the logical and physical database model/implementation in several manners, performance testing the tables and code changes along the way.

    It is not exactly what I have envisioned for the presentation for the slightly distant future, but I am pretty pleased with where it is right now. I decided that since this was such a new feature, it is very likely that people would not be well enough acquainted with the subject for me to ignore the introductory aspects. So while I originally planned to dive right in, I have added a lot of introductory material to explain the features well enough up front to make sure that the design aspects I cover make sense no matter your level with the in-memory features.  I plan to use the same format, with some flexibility, if I do this for a SQL Saturday later this year, and certainly when I do the presentation at Devlink. Luckily, at Devlink I have another 15 minutes to work with, so 15 more minutes of code comparison will hopefully fit the needs of the more programming-oriented attendees there.

    Of course, I am not done blogging about a few additional aspects I have come up with, but with a week and a half to go before I present it, more writing on the presentation has been the goal.

     

  • How In-Memory Database Objects Affect Database Design: Uniqueness

    This post is part of an ongoing series of blog entries I am writing while preparing to give a presentation based on the prefix of this blog's title. It is a work in progress as I explore the effects of the new in-memory model. I would love comments that tell me that I am wrong (especially if it turns out that I am!)

    The first issue that arose in my tests was with the new concurrency model. I started out with the stored procedure I will show later in the entry. It sees if a customer address exists and, if not, creates it. The table is the CustomerAddress table, with the following structure:

    [Image: the CustomerAddress table structure from the data model]

    With on-disk structures, I have a uniqueness constraint on the Address columns, along with the CustomerId. The CustomerId is the first column in the constraint declaration, so the performance was great. I don’t use a transaction or error handling (yet) because I want to simulate what a normal application might do. I put the code in stored procedure form because, well, it will just be easier for me to manage. (Plus we can change that code to be proper a lot more easily, and when I build the natively compiled procedures, things will be easier to simulate.)

    Basically, the gist is: look up the address using all of the address columns that are in the uniqueness constraint. If you get a CustomerAddressId back, return it; if not, create the row and then return its id, using the alternate key values:

    CREATE PROCEDURE Customers.CustomerAddress$CreateAndReturn
    (
        @customerId INT,
        @Line1  NVARCHAR(30),
        @Line2  NVARCHAR(30),
        @City   NVARCHAR(30),
        @ZipCode VARCHAR(20),
        @USStateCode CHAR(2),
        @AddressTypeCode NVARCHAR(20)
    )
    AS
    BEGIN
        SET NOCOUNT ON
        DECLARE @customerAddressId INT = (
                                         SELECT CustomerAddressId
                                          FROM  Customers.CustomerAddress
                                          WHERE  CustomerId = @CustomerId
                                            AND  (Line1 = @Line1 OR (Line1 IS NULL AND @Line1 IS NULL))
                                            AND  (Line2 = @Line2 OR (Line2 IS NULL AND @Line2 IS NULL))
                                            AND  (City = @City OR (City IS NULL AND @City IS NULL))
                                            AND  (ZipCode = @ZipCode OR (ZipCode IS NULL AND @ZipCode IS NULL))
                                            AND  USStateCode = @USStateCode
                                            AND  AddressTypeCode = @AddressTypeCode ) --we are assuming validation is done elsewhere

        IF @customerAddressId IS NULL
            BEGIN
                INSERT  INTO [Customers].[CustomerAddress] ( [CustomerId], [Line1], [Line2], [City], [ZipCode], [USStateCode], [AddressTypeCode] )
                VALUES  ( @CustomerId, @Line1, @Line2, @City, @ZipCode, @USStateCode, @AddressTypeCode )

            SET @customerAddressId = (
                                              SELECT CustomerAddressId
                                              FROM  Customers.CustomerAddress
                                              WHERE  CustomerId = @CustomerId
                                                AND  (Line1 = @Line1 OR (Line1 IS NULL AND @Line1 IS NULL))
                                                AND  (Line2 = @Line2 OR (Line2 IS NULL AND @Line2 IS NULL))
                                                AND  (City = @City OR (City IS NULL AND @City IS NULL))
                                                AND  (ZipCode = @ZipCode OR (ZipCode IS NULL AND @ZipCode IS NULL))
                                                AND  USStateCode = @USStateCode
                                                AND  AddressTypeCode = @AddressTypeCode )

            END

          RETURN COALESCE(@customerAddressId, -100)
    END
    GO

    My test data resides in a side database, with a lot of duplicated customers and addresses. The goal is to do a good number of reads, but not make the code so complex that we start to deal with slight variances in addresses. These are the kinds of things one might do in an ETL process, or certainly as a background process in their OLTP system.  For this example, I want to avoid duplication, but only if it is 100% duplicated.

    The code is called in batches of four interconnected sets of procedure calls. Each procedure returns a surrogate key via a RETURN statement (it isn’t 100% finished, as I haven’t really handled things like –100 being returned, even though that really couldn’t occur with my current data):

    DECLARE @CustomerId INT
    EXEC @CustomerId = Customers.Customer$CreateAndReturn @firstName = 'Tyler', @MiddleName = 'R', @LastName = 'David', @EmailAddress = 'Tyler@email.com'

    DECLARE @CustomerAddressId INT
    EXEC @CustomerAddressId = Customers.CustomerAddress$CreateAndReturn @customerId = @CustomerId,@Line1='9 Gordon Highway',@line2='Apt 296',@city='Buffalo',@ZipCode='81254',@USStateCode='MO',@AddressTypeCode='Office'

    DECLARE @SalesOrderId int
    EXEC @SalesOrderId = Sales.SalesOrder$Create @CustomerId=@CustomerId,@CustomerAddressId=@CustomerAddressId,@SalesOrderStatusCode='New'

    EXEC Sales.SalesOrderItem$Create @SalesOrderId=@SalesOrderId,@SalesOrderItemNumber=48904,@Quantity=3.6386,@UnitPrice=14.57,@ProductPriceId=3779
    EXEC Sales.SalesOrderItem$Create @SalesOrderId=@SalesOrderId,@SalesOrderItemNumber=98015,@Quantity=3.0596,@UnitPrice=5.31,@ProductPriceId=1043

    I have 4 files of these stored procedure batches, each file having 10,000 batches in it. I start them all at the same time using my WaitFor Sync FrameWork that I blogged about a few years back. When I was using the on-disk tables, not once did this code fail to work, and not once did I get an error, even though I certainly could have if a duplicated row had been inserted:

    DECLARE @customerAddressId INT = (
                                         SELECT CustomerAddressId
                                          FROM  Customers.CustomerAddress
                                          WHERE  CustomerId = @CustomerId
                                            AND  (Line1 = @Line1 OR (Line1 IS NULL AND @Line1 IS NULL))
                                            AND  (Line2 = @Line2 OR (Line2 IS NULL AND @Line2 IS NULL))
                                            AND  (City = @City OR (City IS NULL AND @City IS NULL))
                                            AND  (ZipCode = @ZipCode OR (ZipCode IS NULL AND @ZipCode IS NULL))
                                            AND  USStateCode = @USStateCode
                                            AND  AddressTypeCode = @AddressTypeCode ) --we are assuming validation is done elsewhere

        IF @customerAddressId IS NULL
            BEGIN
                INSERT  INTO [Customers].[CustomerAddress] ( [CustomerId], [Line1], [Line2], [City], [ZipCode], [USStateCode], [AddressTypeCode] )
                VALUES  ( @CustomerId, @Line1, @Line2, @City, @ZipCode, @USStateCode, @AddressTypeCode )

    Time is the enemy. Time passes between the fetch of the CustomerAddressId and the insert. If another connection inserts the row in those moments between these statements, you could get a duplicate key error. Which I accepted as a possibility. If I finished this code for on-disk, I would handle that error by refetching the key. No problem.
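    For the on-disk version, that fix is basically a TRY…CATCH that treats a duplicate key error (2601 or 2627) as “someone beat me to it” and refetches. A rough sketch of how the body of the procedure could change (not the final code):

    BEGIN TRY
        INSERT INTO [Customers].[CustomerAddress] ( [CustomerId], [Line1], [Line2], [City], [ZipCode], [USStateCode], [AddressTypeCode] )
        VALUES ( @CustomerId, @Line1, @Line2, @City, @ZipCode, @USStateCode, @AddressTypeCode )
    END TRY
    BEGIN CATCH
        --anything other than a duplicate key error gets rethrown to the caller
        IF ERROR_NUMBER() NOT IN (2601, 2627)
            THROW;
    END CATCH

    --whether we inserted the row or lost the race, fetch the surrogate key the same way
    SET @customerAddressId = (SELECT CustomerAddressId
                              FROM  Customers.CustomerAddress
                              WHERE  CustomerId = @CustomerId
                                AND  (Line1 = @Line1 OR (Line1 IS NULL AND @Line1 IS NULL))
                                AND  (Line2 = @Line2 OR (Line2 IS NULL AND @Line2 IS NULL))
                                AND  (City = @City OR (City IS NULL AND @City IS NULL))
                                AND  (ZipCode = @ZipCode OR (ZipCode IS NULL AND @ZipCode IS NULL))
                                AND  USStateCode = @USStateCode
                                AND  AddressTypeCode = @AddressTypeCode )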

    When I changed my table to an in-mem table (which I will talk more about the actual structure in a blog or two), I immediately got duplicated data. Why? Three reasons.

    1. The in-memory code ran 300% faster with little optimization.

    2. No UNIQUE key constraint. I put a hash index on the CustomerId so the WHERE clause would use a simple seek operation, but with only a PRIMARY KEY constraint, there was nothing to stop the duplicates.

    3. The lock-free, optimistic concurrency control. (I won’t go into too much detail, but read this if you haven’t yet: http://sqlblog.com/blogs/kalen_delaney/archive/2013/10/20/sql-server-2014-in-memory-oltp-hekaton-whitepaper-for-ctp2.aspx)

    Each of these factors figured into the issue. Since the code is faster, it is more likely that we will have collisions. This was certainly a factor, as in my first tests, I forgot to index the customerId, and the code ran twice as slow as the on-disk version, and there were no collisions.

    Without a uniqueness constraint, the rows will get created with no issue, even if you accidentally create duplicates. The lack of constraints is one of my least favorite parts of the whole in-memory structures.
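    I will show the actual table definition in a later entry, but roughly, the kind of declaration I am describing (a PRIMARY KEY plus a hash index on CustomerId, with no way to declare a UNIQUE constraint on the address columns) is shaped something like this sketch; the bucket counts are placeholder guesses, and the database needs a memory-optimized filegroup:

    --rough sketch of a SQL Server 2014 memory-optimized version of the table (not my actual DDL);
    --note what is missing: no UNIQUE constraint on the address columns is possible here
    CREATE TABLE Customers.CustomerAddress
    (
        CustomerAddressId int IDENTITY(1,1) NOT NULL
            PRIMARY KEY NONCLUSTERED HASH WITH (BUCKET_COUNT = 100000),
        CustomerId        int          NOT NULL,
        Line1             nvarchar(30) NULL,
        Line2             nvarchar(30) NULL,
        City              nvarchar(30) NULL,
        ZipCode           varchar(20)  NULL,
        USStateCode       char(2)      NOT NULL,
        AddressTypeCode   nvarchar(20) NOT NULL,

        --hash index so the WHERE CustomerId = @CustomerId search is a simple seek
        INDEX ix_CustomerAddress_CustomerId
            NONCLUSTERED HASH (CustomerId) WITH (BUCKET_COUNT = 100000)
    )
    WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA)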

    Using multi-version concurrency control (MVCC), even while the row is being inserted (or really, as long as the transaction has not been committed), other connections can execute the search for the address and get back no results. In the default isolation level, access is at the basic snapshot isolation level: the second connection sees the table as it was when its transaction started.  Even if I escalated to REPEATABLE READ or SERIALIZABLE, it wouldn’t eliminate duplication, since the new row’s uniqueness is defined by a surrogate key and there would technically be no collision. Is using a surrogate key the best solution? It isn’t feeling like it for these tables, but you know what, this is the norm for most people. I could easily just put the primary key on the natural key here, and let the identity be the possible (and far less likely) duplication point. But that seems like cheating, and what if a table has 2 natural key choices, or 3, or more? You would be stuck again with this issue (and it will not be easy to solve).

    Fortunately, this code here is going to fail IF multiple rows are committed by the time it executes:

    SET @customerAddressId = (SELECT CustomerAddressId
                                      FROM  Customers.CustomerAddress
                                      WHERE  CustomerId = @CustomerId
                                        AND  (Line1 = @Line1 OR (Line1 IS NULL AND @Line1 IS NULL))
                                        AND  (Line2 = @Line2 OR (Line2 IS NULL AND @Line2 IS NULL))
                                        AND  (City = @City OR (City IS NULL AND @City IS NULL))
                                        AND  (ZipCode = @ZipCode OR (ZipCode IS NULL AND @ZipCode IS NULL))
                                        AND  USStateCode = @USStateCode
                                        AND  AddressTypeCode = @AddressTypeCode )

    That is because a scalar subquery raises an error if it returns more than one row. But with MVCC, this too could be an issue if we want to put it in a transaction and roll back on an error.  Sadly, however, it will not be so easy to fix, because it would only fail if the transaction containing the insert has already been committed.

    As I start to think about how to solve this problem, the solution is going to have to be very different than with disk-based tables. There, if I threw a transaction around the search and the insert (and possibly a few locking hints to block multiple readers), I could stop the duplication easily. However, in all of the MVCC isolation levels, readers are never blocked. I could use an application lock to single-thread the inserts, but that would definitely not be the most efficient operation, and it would not work at all in natively compiled code.
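    Just for illustration (this is not the direction I am going), the application lock idea would look something like this in the interpreted layer; the lock resource name is made up:

    --sketch: single-thread the create path with an application lock (interpreted code only)
    DECLARE @lockResult int

    EXEC @lockResult = sp_getapplock
            @Resource    = 'Customers.CustomerAddress$Create',  --any agreed-upon name
            @LockMode    = 'Exclusive',
            @LockOwner   = 'Session',
            @LockTimeout = 5000                                 --wait up to 5 seconds

    IF @lockResult >= 0
     BEGIN
        --search for the address and insert it if missing,
        --exactly as in Customers.CustomerAddress$CreateAndReturn above

        EXEC sp_releaseapplock
                @Resource  = 'Customers.CustomerAddress$Create',
                @LockOwner = 'Session'
     END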

    In reality, there is another more common solution (as I expect a lot of people wouldn’t even have the alternate key on the address to start with). Just give in and accept the duplicates as part of high performance business. We can minimize the damage by changing the two subqueries to:

    SET @customerAddressId = (SELECT MIN(CustomerAddressId)

    So we always try to use the first one. With a low cardinality table like a customer’s address, you will only get back a few rows at most, so the aggregate isn’t going to be terribly costly. And you build cleanup routines to reallocate duplicated addresses. This is something that you will need to do with the data in any case, since it is altogether possible that the user types '9 Gordon Hwy' instead of '9 Gordon Highway', and you get duplicates you need to clean up anyhow. My current time to enter the 40,000 rows across 4 connections using the command line SQLCMD interface (the files crashed SSMS!) is around 12 seconds.

    When I get back to refining the code tomorrow, this is going to be my new tack for this (and perhaps the customer create) procedure. Minimize duplication, but go ahead and allow it. That problem is for the ETL Architect to deal with. (Now if much of my daily job wasn’t writing ETL, that would actually be comforting.)

  • How In-Memory Database Objects Affect Database Design: Or does it?

    This post is part of an ongoing series of blog entries I am writing while preparing to give a presentation based on the prefix of this blog's title. It is a work in progress as I explore the effects of the new in-memory model. I would love comments that tell me that I am wrong (especially if it turns out that I am!)

    Before I start describing what I have discovered so far along the way, let's make a few educated guesses. I have my data model from the previous blog entry (here), and I have a plan of attack for entering data. Basically, I plan to take the "normal" way that people deal with data like this, and enter data into each table separately, outside of an overarching transaction.

    So I will create or retrieve the Customer first, then the CustomerAddress, and then I will create the SalesOrder and however many SalesOrderItems the customer (or my data generation using RedGate's Data Generator!) wants. I may dress it up over time, but for my initial experiments, this is the plan.

    For a database with UNIQUE constraints on alternate keys, and FOREIGN KEY constraints on relationships, this is all pretty easy and safe. I know I can't duplicate a customer, or violate foreign key constraints. How this will play out in the code is still just a bunch of guesses, with my only goal basically being to not give the ETL architect reason to decide whether running me over with a car is punishment enough for me introducing more bad data to deal with.

    But the title of the session ends in "...Database Design". The code isn't database design. Rather, as an architect, it is important to realize that the database design affects the code. So will we need different data structures, or will the normal normalized model suffice? My initial feeling is that the logical model doesn't change. The actual implementation details (such as the lack of foreign key, unique, and check constraints) will differ, and the hardware/implementation layer changes immensely, and this affects your code in ways you had best understand.

    I "think" that what I determine will be that the basic data architects output remains rather constant. The rules of normalization (briefly described here), don't change at all. You still design the tables and columns based on the same factors of cardinality you would before. The physical model will be the same, and if your data modeling tool supports the in-memory structures, it is just a matter of choosing which tables should be in-mem and which don't really merit it, followed by a lot of testing (a lot).

    For my first tests, I only made the four "hot" tables in-mem:

     

    This design will work fine when dealing with interop code, even if I want to implement data integrity checks against the domain tables. If I want to use native code, then all of the tables used will need to be in-memory. Are data integrity checks that important? Well, they are to me, but not so much to a lot of folks who trust their other layers to get it right. My scenario, where all of the domain tables are "static", enables this to work just fine. As long as ProductType never changes, and the other code layers have only the right values, you can easily say "this works" (as long as it does... and your code has been tested for what happens if anything crashes on any given statement to the database... which is not an easy task).

    In my next post, I will share at least one of the effects the change to in-memory code has had on the code I am using, and what I am considering doing about it.
