
Enjoy Another Sandwich -- Kent Tegels

Yummy slices of SQL Server between slices of .NET and XML

  • Recommended readings: 2008 update

    In class this week, I got asked a few times about the technical books I'd recommend for folks interested in the data space. While I've had a list up on Amazon for a while, I thought I'd post an updated list here.

    I'd love to hear your suggestions, too!

    Business Intelligence

    • Data Analysis Using SQL and Excel: Gordon S. Linoff
    • Data Mining & Statistical Analysis Using SQL: John N. Lovett, Robert P. Trueblood
    • Data Mining and Predictive Analysis: Intelligence Gathering and Crime Analysis: Colleen McCue
    • Decision Support and Business Intelligence Systems: Efraim Turban, Jay E Aronson, Ting-Peng Liang, Ramesh Sharda
    • Foundations of SQL Server 2005 Business Intelligence: Lynn Langit
    • Hitchhiker's Guide to SQL Server 2000 Reporting Services: Peter Blackburn, William R. Vaughn
    • Microsoft SQL Server 2005 Integration Services: Kirk Haselden
    • Microsoft SQL Server 2005 Reporting Services: Brian Larson
    • Mining the Web: Discovering Knowledge from Hypertext Data: Soumen Chakrabarti
    • Report Builder & Report Models in Microsoft SQL Server 2005: Gerald Schinagl
    • Text Mining: Predictive Methods for Analyzing Unstructured Information: Sholom M. Weiss, Nitin Indurkhya, Tong Zhang, Fred Damerau
    • The Rational Guide to Scripting SQL Server 2005 Integration Services: Donald Farmer

    General data and programming topics

    • Beautiful Code: Leading Programmers Explain How They Think: Andy Oram, Greg Wilson
    • Code Complete: A Practical Handbook of Software Construction: Steve McConnell
    • Database in Depth: Relational Theory for Practitioners: C.J. Date
    • Joe Celko's Trees and Hierarchies in SQL for Smarties: Joe Celko
    • Learning WCF: A Hands-on Guide: Michele Bustamante
    • LINQ Pocket Reference: Joseph Albahari, Ben Albahari
    • Programming Collective Intelligence: Building Smart Web 2.0 Applications: Toby Segaran
    • Programming Microsoft ADO.NET 2.0 Applications: Advanced Topics: Glenn Johnson
    • Querying XML: XQuery, XPath, and SQL/XML in Context: Jim Melton
    • The Pragmatic Programmer: From Journeyman to Master: Andrew Hunt, David Thomas
    • XQuery: Priscilla Walmsley

    GIS and spatial topics

    • A to Z GIS: An Illustrated Dictionary of Geographic Information Systems: Shelly Sommer, Tasha Wade
    • Designing Geodatabases: Case Studies in GIS Data Modeling: David Arctur, Michael Zeiler
    • GIS for Web Developers: Adding 'Where' to Your Web Applications: Scott Davis
    • Measuring Up: The Business Case for GIS: Christopher Thomas, Milton Ospina
    • Modeling Our World: The ESRI Guide to Geodatabase Design: Michael Zeiler
    • The ESRI Guide to GIS Analysis: Volume 2: Spatial Measurements and Statistics: Andy Mitchell
    • The ESRI/University of Redlands Colloquium CD Set: Karen K Kemp
    • The Geospatial Web: How Geobrowsers, Social Software and the Web 2.0 are Shaping the Network Society: Arno Scharl, Klaus Tochtermann
    • Unlocking the Census with GIS: Alan Peters, Heather MacDonald

    SQL Server

    • A Developer's Guide to SQL Server 2005: Bob Beauchemin, Dan Sullivan
    • Accelerated SQL Server 2008: Rob Walters
    • Applied Microsoft Analysis Services 2005: And Microsoft Business Intelligence Platform: Teo Lachev
    • Dissecting SQL Server Execution Plans: Grant Fritchey
    • Expert SQL Server 2005 Integration Services: Brian Knight, Erik Veerman
    • Inside Microsoft SQL Server 2005: Query Tuning and Optimization: Kalen Delaney, Sunil Agarwal, Craig Freedman, Ron Talmage, Adam Machanic
    • MCITP Developer: Microsoft SQL Server 2005 Database Solutions Design: Victor Isakov
    • Pro SQL Server 2005 Service Broker: Klaus Aschenbrenner
    • Professional SQL Server 2005 XML: Scott Klein
    • SQL Server 2005 Practical Troubleshooting: The Database Engine: Ken Henderson
    • The Rational Guide to SQL Server 2005 Service Broker: Roger Wolter
  • Come learn more about SQL Server 2008 Integration Services

    On 6 August 2008, I will be giving a presentation to the Omaha SQL/BI User Group on the new features in SQL Server 2008 Integration Services. We will talk about improvements in scripting, how the redesigned pipeline improves performance and how to leverage the new Cache Transform to improve package performance. Quinn Jones from Farm Credit Services America will also be giving us a great SQL tip. The talk will start around 1800 local time. We will be meeting at the Creighton University West Campus, located at 11111 Mill Valley Road (roughly 41.267786° north, 96.086289° west).

     The "3P requirements" for a successful meeting -- Pizza, Pop and Prizes --- will be available as usual.

    Please leave a comment if you would like more information.

  • Spatial Support Table

    Curious about how SQL Server 2008's Spatial bits compare with those in MySQL and PostGIS? Browse over to the Boston GIS web site and review this neat comparison table.

     http://www.bostongis.com/PrinterFriendly.aspx?content_name=sqlserver2008_postgis_mysql_compare

  • Hey SSIS team, help us out!

    Time to share a lesson learned. While working with the new GEOMETRY and GEOGRAPHY data types in SQL Server 2008, I ran into a nested set of issues.

    1. Unless you have spatial data to work with, the use-cases for these data types are limited.
    2. Loading data from typical data sources, such as shape files, is probably out of the reach of the typical developer or database administrator unless you have a specialty tool for it.
    3. As helpful as SQL Server Integration Services typically is, it becomes a choke point in this scenario.

    What exactly is the issue? Let us say that you have been tasked with developing a database to support decisions about where a company should build new storefront locations. A previous analysis of sales has shown that storefronts perform best when built in areas where a minimum population, and a minimum percentage of the area's population, are within a 25-mile radius. How would you solve this problem?

    Clearly you need geographically-bound data and demographic data. In the United States, our Census Bureau provides such data in a relatively easy-to-consume format – if you are using traditional Geographic Information System (GIS) tools. For example, you can go to the Bureau's Web site and download one set of files that define the geographic boundaries of census areas and download another dataset that has the desired demographics. A straightforward process exists for extracting the demographic data into comma-separated values, which makes preparing that data for consumption by SSIS easy.
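
    Just to make the goal concrete, here is a minimal sketch of the kind of query such a database should eventually answer once the boundary and demographic data are loaded. The table and column names are hypothetical, and 25 miles is treated as roughly 40,234 meters since geography's STDistance works in meters.

    declare @site geography;
    set @site = geography::Point(41.267786, -96.086289, 4326);
    -- Total population within 25 miles of a candidate storefront site (hypothetical tables).
    select sum(d.Population) as PopulationWithin25Miles
    from dbo.CensusAreas as a
    join dbo.Demographics as d on d.AreaId = a.AreaId
    where a.Centroid.STDistance(@site) <= 25 * 1609.344;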

    The Shapefile data is a different story. In SQL Server 2008 there is no out-of-the-box SSIS Data Source for reading them, and I doubt we will see one before RTM. So you are left to find a third-party solution or write your own. While it is possible to write your own, this is not an option that should be considered lightly, especially if the need to consume is one-off or few-off. Currently there is a relative dearth of open source/free software solutions available. One that I have used with good success is Morten Nielsen's stack of tools (see http://www.iter.dk/page/SQL-Server-2008-Spatial-Tools.aspx). Note, however, that this tool is not yet usable with RC0. When it comes to doing serious spatial ETL, there is really one choice as far as I am concerned: Safe Software's FME (http://www.safe.com/aboutus/news/2007/106/). I particularly like the fact that they have gone to the work of writing SSIS data sources, transformations and destinations.

    Yet even that solution has a significant issue: its price tag. Serious GIS shops might not bat an eye at the acquisition cost for licensing FME, but I can imagine the looks of sticker shock when other types of business owners see the price. This is not Safe's fault – they have a good product and they have obviously made a substantial investment in it.

    So what is the best solution? I hope the SSIS team realizes that the uptake of the spatial features in SQL Server is largely a function of how easy it is to acquire and use data. This is really an unparalleled opportunity for the SSIS team. I cannot think of another pairing of SQL Server features where one team could so dramatically improve the usability of another -- and thus empower us more as customers -- than this. To do this, I hope the team does not take a "chicken and egg" mentality to the problem. That is, they need to do more than say "well, those features aren't being used, so it doesn't make any sense for us to invest much in them." What that thinking fails to realize is that the features may not be used much because of the acquisition problem discussed above. I am not saying "if they build it, the users will come." But I am saying that "if they do not build it, the SQL Team should not expect users to use these features."

    If you happen to agree with me on this topic, I'd appreciate your vote on Connect issue 357045 (https://connect.microsoft.com/SQLServer/feedback/ViewFeedback.aspx?FeedbackID=357045).

     

  • Call For Action: Spatial "geeks," please speak up!

    I've had a lot of "fun" working with the new spatial types in SQL Server 2008. Fun like your first root canal sometimes, fun like a great first date other times.

    One of the "root canal moments" for me has been around Geographic Markup Language (GML) support. I had spent a good chunk of time generating GML for use in class to subsequently learn that SQL Server's support for GML is "limited." Isaac Kunen was kind enough to point me to http://schemas.microsoft.com/sqlserver/profiles/gml/.

    Folks, please learn from my mistake -- understand that schema before you go about generating or consuming GML for the construction of geometry or geography instances.
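
    For what it is worth, here is a minimal sketch of round-tripping GML through GeomFromGml(). The coordinates are made up, and the GML is kept deliberately simple so that it stays inside the subset that the profile above describes.

    declare @gml xml;
    set @gml = '<LineString xmlns="http://www.opengis.net/gml">
     <posList>41.267786 -96.086289 41.270000 -96.080000</posList>
    </LineString>';
    -- Build a geography instance from the GML (SRID 4326), then echo it back as WKT and GML.
    declare @route geography;
    set @route = geography::GeomFromGml(@gml, 4326);
    select @route.STAsText() as WellKnownText, @route.AsGml() as BackToGml;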

    Speaking of best practices, another frequent participant on the MSDN Forums/SQL Server Katmai/SQL Server Katmai Spatial forum has started a thread (http://forums.microsoft.com/MSDN/ShowPost.aspx?PostID=3586982&SiteID=1) addressing the differences between the geometry and geography types.

    I think he's off to a great start, but I'd like to have "the rest of us" who are interested in the spatial bits chime in on the thread. I'll even come out and say it: please.

    July is going to be a busy month for me. The week of the 20th, I'll be teaching our Essential SQL Server class in Boston (see http://www.develop.com/us/training/course.aspx?id=180) and then doing a private engagement in the Sacramento area the next week. If you know of any user group/PASS group meetings in or around those areas, please let me know.

  • Upcoming talks on SQL Spatial and SSIS

    Yesterday I drove down from Sioux Falls to Omaha so that I could catch a talk being given by Sudhir Gajre to the Omaha SQL Server/BI Interest Group. I first got to know Sudhir when he, Luke Schollmeyer and I were restarting a SQL Server Users Group in Omaha. Sudhir is a stud at SQL Server performance tuning and helped write one of the best papers on it (see Microsoft SQL Server 2005 Tuning Tips for PeopleSoft8.x).

    I am the next scheduled speaker for that group, so I took a couple of minutes last night to ask them what they would like me to talk about. Response was a little slow, so I suggested that I could give my Spatial Computing with SQL Server talk. Almost no reaction.

    Somebody in the group suggested talking about the new MERGE statement. I liked that and said, "What if I wrapped that into a talk about what's new in Integration Services 2008?" The room came back to life.

    So here's an initial outline of what that talk will probably cover:

    • The changes in the SSIS pipeline architecture for back-pressure and thread scheduling
    • The scripting environment changes
    • Working with the Cache transformation
    • Using T-SQL MERGE with SSIS
    • Using Change Data Capture with SSIS
    • (time permitting) Using the Data Profiler task

    We clustered about for a while following Sudhir's talk, discussing just how much BI is taking off for Microsoft and how useful a talk like this would be. I have to chuckle every time I get into a conversation like this: in my opinion, SSIS isn't a BI tool that developers just happen to be interested in, it's a Developer's tool that just happens to be very helpful to the BI specialist. Really understanding and applying many of the changes for SSIS 2008 is made easier, I think, if you have a Developer mindset about it.

    I am going to submit the "What's new in SSIS 2008" talk for the Heartland Developer's Conferences to be held later in the year in Omaha and Minneapolis too.

    If you are a user group leader and would be interested in having me come visit and give a talk, please feel free to contact me through this site.

  • The thing I hate about missing TechEd is getting the eye candy late

    But this eye candy is worth waiting for. What am I talking about?

    I have spent the last few weeks nose down in the Spatial bits for SQL Server 2008. I really believe this is the dark horse super feature of the product. But a big problem with it has been visualizing the results. Unless you are good at mentally visualizing spatial relationships, it can be pretty hard to make sense of some of the spatially-based queries you are likely to want to use. Since CTP5, I have been using Morten Nielsen's SQL Server 2008 Spatial Tools quite happily.

    In Eye Candy of the Highest Order, IsaacK introduces the Spatial Results panel for SSMS. In short, instead of squinting at the raw textual results of a spatial query, you can have SSMS render the data visually.

    Yum! The downside is that this won't be in RC0 but, per Isaac, should be in RTM.

    Isaac has also started talking about some new features that will make constructing geo* instances easier with the new Builder APIs. Check out Our Upcoming Builder API for the details.

  • Missed it by *that* much: UDAs, Deployment, VS90SP1B1 and SQL Server 2008 CTP6 (Feb'08)

    Happy Spring everybody!

    As of late I've been heads down working on a new course on SQL Server 2008 and it's been a lot of fun. So the other day, when the Visual Studio team dropped SP1 Beta1 for Visual Studio 2008 (see here), I was all over it. And it took me longer than I expected to run into... well, it's not so much a bug, I think, as it is a misimplementation of a new feature.

    One of the new features in SQL Server 2008 I am using quite a bit is multi-parameter SQL/CLR User-Defined Aggregates. These are helpful in the statistical and spatial areas. For example, I just crufted up a UDA that takes a paired vector of latitudes and longitudes and constructs a LineString Geometry from them (see the code here). It is a good use case where the ability to have a Large Value Type (e.g., a user-defined type or aggregate with a persistence size of more than 8,000 bytes) and the ability to take in multiple parameters help solve an otherwise cumbersome problem.
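
    If you have not registered a multi-parameter UDA before, the T-SQL side looks roughly like the sketch below. The assembly, schema and table names are hypothetical; only the class name comes from my project.

    -- Register the CLR aggregate; two parameters are legal in SQL Server 2008.
    create aggregate dbo.LineStringBuilder(@latitude float, @longitude float)
    returns geometry
    external name SpatialExamples.[DM.Examples.LineStringBuilder];
    go
    -- Build one LineString per route from its point rows (hypothetical table).
    select routeId, dbo.LineStringBuilder(latitude, longitude) as routeShape
    from dbo.routePoints
    group by routeId;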

    But the "deploy project" feature chokes on those features. Specifically, it doesn't seem yet to know yet that we use a MaxByteSize of -1 to indicate that the are using an LVT ('MaxByteSize' property specified was not found), nor does it seem to know how to deploy an multiple parameter UDA (The Accumulate method in user defined aggregate "DM.Examples.LineStringBuilder" must have exactly one parameter.)

    I have posted these to Connect (See Item ID 344093) and if you agree with me that they should be fixed, please go vote here.

  • SQL Server 2005 Integration Services using Visual Studio 2005: A Beginner's Guide

    In short: This is an example of a good idea in the wrong format -- a good book if you are starting at absolute zero and need specific step-by-step procedures. Being printed hurts more than it helps.

     

    Let me begin by saying that Kshipra Singh, from PACKT Publishing, sent me an e-mail via my blog with a simple request. In return for a complimentary copy of this book, would I post a review of it on my blog? I am hardly one to turn down a free book, and I have been looking for one that I could recommend to my students and others who are just getting started with SQL Server Integration Services (SSIS). This, then, is my keeping of that bargain.

     

    Jayaram Krishnaswamy's book is one example of a growing class of books that I believe should not have been printed. It is not a bad book; rather, its content would probably be a lot better suited to its targeted reader had it been an e-book or a DVD screen capture with video and voice-over by the author. Like many technical books that I have seen over the years -- and like ones that, I will admit, I have written -- the text is step-by-step procedural and makes up volume with screenshots. Knowledge presented like this, I find, is best delivered live (where interaction encourages questions), next best by recorded demonstration (to maintain step-sequential context), and lastly in a scrollable, random-access format like an e-book. Print books really do not allow those types of interaction.

     

    Something else is missing too. Let me make an analogy. Let us say that you know very little about how to prepare a cake. You might already be familiar with the hardware required: a stove, a pan and a mixing bowl. You may also be familiar with the software needed: a cake mix, some eggs, some oil and some water. You can follow the directions in a cookbook (or the back of the mix box) and produce a cake. Just like the directions from the cookbook or box-back, this book is a fair treatment of how to use the package designer to accomplish a given set of data integration tasks, if that is all you need to do.

     

    However, what did you really learn about "making a cake" from the process?

     

    Based on my personal experiences and those related to me by my students, the hardest part of "getting" SSIS is not how to accomplish a certain task, it's understanding why the parts of it do what they do. Going back to our cake analogy, the cookbook does very little to explain why you should mix the batter in a separate bowl before pouring it into the baking pan. It does not bother to explain the delicate chemical interactions between the egg proteins, the salts, sugar, fats, water and heat that take place during the mixing and baking process. In just the same sense, this book comes up short when explaining essentials such as the dataflow pipeline, buffers and what many of the tasks and components actually do or can be used for.

     

    Nor is there much here that would help you go from "baking a cake" to "baking bread." These are similar processes, but with some different tools and some different steps. This book will help you do the tasks it talks about, but it offers little more than that, especially when it comes to analyzing a problem and synthesizing a different solution. In my opinion, those are the two most important things to master about SSIS and that is, ideally, what a beginner's book would cover.

     

    Of course, that might seem like I am saying that you should have to have a course in organic chemistry and thermodynamics before you can make a cake. Well, obviously, that is not the case. After you have baked a few cakes, learned from your mistakes and maybe even read a book like Shirley Corriher's Cookwise: The Secrets of Cooking Revealed, you get enough of the science down to be effective at baking cakes, breads and other pastries. If there is a parallel to Cookwise for SSIS, it is Kirk Haselden's Microsoft SQL Server 2005 Integration Services.

     

    So would I recommend this book? In certain cases, yes. If you need to do something which this book particularly addresses, quickly and with a minimum of cost, it is a good fit. However, if you need to invest your time and money in really learning SSIS because it is a key part of your project or job, then no, I would not recommend it. There are more appropriate books to be had.

  • Shredding XML into tables

    Hey folks! Yes, I am alive, but as you can imagine, I am spending my time chewing on SQL Server 2008 and have limited time to post much. But today, I received an email that I wanted to post on since this is a common question.

    "How do I shred an XML document into a table."

    The big issue here is that there really is not one best way to do it. If you have a relatively small document and you want a simple solution, what I'm going to show here is reasonably well suited. However, if you have a large document, or a large number of small documents, this approach is relatively inefficient and you probably should look at using an SSIS, BizTalk or custom-developed solution. If you have schematically complex XML, BizTalk or a custom-developed solution is probably going to be needed at some point.

    Then there is always the question "should you?" My answer is typically yes if your primary concern is query performance under load. The more transactional processing you need to do on the data encapsulated in the document, the more sense shredding makes. Conversely, if there is only limited querying of the XML and reassembling the data into an XML format would be expensive, then you are probably better off leaving it "as is."

    Following are two examples of bringing a small amount of XML into a table using SQL Server 2005 and SQL Server 2008. In the case of SQL Server 2005, the example simply inserts data into the target table without any handling of, or attempts to update, existing matching rows.

    use scratch
    go
    -- Rebuild the target table.
    if OBJECT_ID('dbo.books') is not null
     drop table dbo.books;
    go
    create table dbo.books(
     id tinyint not null primary key,
     sku nvarchar(15) not null,
     skuType nvarchar(10) not null,
     title nvarchar(100) not null,
     listPrice money not null);
    go
    set nocount on
    -- Read the source file into an XML variable via OPENROWSET(BULK ...).
    declare @booklist xml;
    select @booklist = bulkcolumn from openRowSet(bulk 'c:\booklist.xml',single_blob) as b;
    select @booklist;
    -- Yukon-style insert: shred each book node into a row; you could do update similarly.
    insert into dbo.books
    select t.c.value('./id[1]','tinyint')
      , t.c.value('./sku[1]','nvarchar(15)')
      , t.c.value('./skuType[1]','nvarchar(10)')
      , t.c.value('./title[1]','nvarchar(100)')
      , t.c.value('./listPrice[1]','money')
    from @booklist.nodes('//book') as t(c);
    go
    select * from dbo.books;
    go

    Transact-SQL for SQL Server 2008 adds a very useful new statement called MERGE. This statement allows us to synchronize two data sources. While that may not seem to have much use in a case like this, we can actually use it like an "UPSERT," where existing data is updated and new data is inserted into a destination table.

    declare @booklist xml;
    select @booklist = bulkcolumn from openRowSet(bulk 'c:\booklist.xml',single_blob) as b;
    -- Shred the XML into a derived rowset, then MERGE it into the destination table.
    with b2(id,sku,skuType,title,listPrice) as (
    select t.c.value('./id[1]','tinyint')
      , t.c.value('./sku[1]','nvarchar(15)')
      , t.c.value('./skuType[1]','nvarchar(10)')
      , t.c.value('./title[1]','nvarchar(100)')
      , t.c.value('./listPrice[1]','money')
    from @booklist.nodes('//book') as t(c))
    merge into dbo.books as b1
    using b2
    on(b1.id = b2.id)
    when matched then
     update set
      b1.sku = b2.sku
      , b1.skuType = b2.skuType
      , b1.title = b2.title
      , b1.listPrice = b2.listPrice
    when not matched by target then
     insert (id,sku,skuType,title,listPrice)
     values (b2.id,b2.sku ,b2.skuType,b2.title,b2.listPrice);
    go
    select * from dbo.books;
    go

    The syntax of the MERGE statement takes a bit of getting used to, but once you have the hang of it, it's very useful. 

    Please feel free to send me a message using the link below if you are interested in downloading the example code and data. 

  • What the SSIS Masters Know, Part 3

    Shunryu Suzuki, a Zen master who came to America in 1959, once wrote that "in the beginner's mind there are many possibilities, but in the expert's mind there are few." This definitely seems to be true of SQL Server 2005 Integration Services (SSIS). But not always in a good way. If you are coming to Integration Services from SQL Server 2000's Data Transformation Services, there are a number of such "moments of Zen" waiting for you. One of the most significant ones is that the two products are designed around two different patterns. SSIS is designed to support the Extract, Transform and Load (ETL) pattern, whereas DTS is geared towards Extract, Load and Transform (ELT).

    The difference between these two is much more than simple word order. In simple terms, the ELT pattern relies on the power of a database store to do the work of transforming extracted data from the source schema to the destination schema. The ETL pattern, conversely, normally uses a specialized engine to stream data from the source schema, transform it and then load it to a destination database. Both patterns have different strengths and weaknesses. For example, ELT maximizes the data store but normally lacks the ability to do difficult transformations easily. ETL engines have more sophisticated transformation capabilities but tend to heavily tax system resources while running.
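
    To make the ELT half of that concrete, here is a minimal sketch of the pattern using hypothetical staging and destination tables: land the raw extract first, then let the database engine do the transformation work.

    -- ELT: the "transform" happens inside the engine, after the load.
    insert into dbo.Customers (CustomerKey, FullName, Region)
    select s.CustId
      , ltrim(rtrim(s.FirstName)) + N' ' + ltrim(rtrim(s.LastName))
      , upper(s.RegionCode)
    from staging.RawCustomers as s
    where s.CustId is not null;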

    This difference in patterns tends to frustrate those who are making the transition.

    But as most any Zen practitioner will tell you, Zen functions in non-duality. So to really understand the Zen of SSIS, we must understand that it functions in non-duality as well. True, SSIS is designed around the ETL pattern, but that does not mean that you must -- or even should -- avoid using the ELT pattern with it sometimes. In this particular project, we have an example of that. One of the things that SSIS does not currently have is a multi-value collection (e.g., a sorted key-value dictionary) that can be passed between tasks. In cases where we want that, it makes sense to leverage a database table.
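
    As a rough sketch of what I mean (the names are hypothetical), a simple scratch table can stand in for the missing dictionary: one task writes rows into it and a later task reads them back, keyed by the package execution.

    create table dbo.PackageScratch (
     runId uniqueidentifier not null,
     keyName nvarchar(128) not null,
     keyValue sql_variant null,
     constraint PK_PackageScratch primary key (runId, keyName));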

    A hallmark of DTS is that it is a simple tool for getting simple tasks done quickly. However, its limitations make doing complex tasks fairly hard. SSIS, conversely, makes doing complex things easier, but because of the ramp-up it takes to learn the new pattern and new tools, it is fair to say that it makes doing easy things initially harder.

    I believe one of the key things that the SSIS masters understand is this Zen. They know the duality of when to use what and how, while understanding when not to use something and why. They seek to understand the strengths and weaknesses of the patterns and the tools. At the same time, they seek to integrate these understandings into a highly effective suchness.

  • What The SSIS Masters Know, Part 2

    One of the first tasks we must do well as SSIS developers is to really understand our source data. That seems rather obvious, but when we fail to do this task well, we frequently end up having to re-plan, re-work and re-do the packages we have designed. Generally speaking, there are five facets of data that we need to have a correct understanding of before building a package:

    • Schema and the sub-facets of cardinality and ordinality
    • Parse-ability
    • Volume

    Schema means what entities we have, what attributes those entities have and what positional relationships the attributes have. When we think about source data entities, we naturally gravitate towards understanding them in terms of their persistence medium. We have things like Comma Separated Value files, database tables and so on. I think that is somewhat dangerous -- data is more than its container. The important thing to understand here is what the persisted thing is trying to represent. Any given entity may have many attributes and there is no guarantee that any one persistence has all or even most of the attributes. In such cases, we must be especially careful about how we design our packages, since integrating streams of entities from multiple data sources can become a very expensive process when done incorrectly.

    Attributes normally represent scalar facets of an entity. Attributes themselves have many properties, like name, value and the data type chosen to express that value. A key but easily overlooked property of an attribute is whether any given instance of that attribute is allowed to be undefined (null in database terms). A difficult problem to solve occurs when the source data is allowed to have undefined values for a given attribute but its destination is not.
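
    As a small, hedged illustration with hypothetical tables: when a nullable source column feeds a NOT NULL destination column, decide on a substitution rule up front rather than letting the load fail mid-stream.

    insert into dbo.DimCustomer (CustomerKey, MiddleName)
    select s.CustomerKey
      , isnull(s.MiddleName, N'') -- destination forbids NULL, so substitute an empty string
    from staging.Customer as s;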

    For all but the most trivial of data integration projects, I feel it is absolutely essential to gather and build a correct understanding of both the logical and physical data models of the sources and destinations before attempting to construct a package. Once this process is done, it becomes easier to see how entities and their attributes map. Do this at the logical level first, then at the physical level. This keeps you from missing the forest for the trees.

    Two other aspects of the Schema deserve special attention: Cardinality and Ordinality.

    By cardinality I mean: what are the restrictions on distinct values occurring within an attribute or attribute set, and what are the relations between entities? For example, most database products allow for constraints requiring unique values within one or more columns, both within the context of the table itself and between tables. If a destination does not allow for repeated attribute values but a source does, we know we will need to address this somehow in our work.
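
    One hedged way to address it, sketched against a hypothetical staging table, is to collapse the source to one row per key before it ever reaches the destination:

    with ranked as (
     select PostalCode, City, StateCode
       , row_number() over (partition by PostalCode order by City) as rn
     from staging.PostalCodes)
    select PostalCode, City, StateCode
    from ranked
    where rn = 1;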

    A less commonly considered aspect of data, especially when persisted, is its ordinality. By this I mean what order the persisted values are in and, especially, whether they are in a given sort order. Consider a source -- a simple CSV file -- that represents key-value pairs. One of the things we definitely want to know about that source is whether the keys have been written in any order and, if so, whether it is ascending or descending. Of course, the persisted data may be written in such a way that preserves order by many attribute values. Certain SSIS transformations require sorted data. If a data source is already in order, then this is one less expensive task we need to do in the transformation process.

    The importance of being able to parse a persisted data set properly should be obvious, yet one non-obvious thing about SSIS is that it really struggles with dynamic data sources. In cases where the schema of the data source is unknown in advance or varies within the persistence, we have to plan to do additional work. SSIS optimizes the data flow pipeline assuming a series of statically defined buffers are passed (in a sense) between components. It is ultimately up to us to bridge the gap between a dynamic data source or destination and the static nature of the data flow engine.

    The sheer volume of data must be considered. The default settings for memory and CPU allocation are based on a source data load of one gigabyte and a small number of transformational steps. Specific transformations like sorting and aggregation can easily and rapidly consume memory to the point of exhaustion for larger data sources. Identifying such potential bottlenecks before coding the solution can make the package development process go faster.

    I have four data sources for my demonstration:

    • Two XML files containing information about known postal codes in the United States, one of the files suffers from truncated data for the Postal Code.
    • A Comma Separated Value formatted file representing a list of Postal Codes, City, State and County names.
    • A list of State names derived by query from the AdventureWorks database

    The destination is a SQL Server database and this Postal Code data will be loaded into three tables:

    • States, keyed by the State's abbreviation and containing the name of the given state
    • Cities, keyed by a CityID and having the name of the city and the abbreviation for the matching state
    • ZipCodeList, keyed by the Postal Code and having a CityID and the Latitude and Longitude

    The States table is populated by query during the construction of the database using the AdventureWorks Database. The other two tables will be populated from the CSV and XML files. However, there are two problems found in that data:

    • In one of the XML files, the Postal Codes for some locations were written without a leading zero (a fix for this is sketched after this list)
    • There are more Postal Codes known in the XML files than in the CSV file, so we will need to invent a way to discover the city and state that is most local to each such Postal Code. In the presentation, I will show a way to do this using Microsoft's MapPoint 2006 via its COM APIs.
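
    For the leading-zero problem, the repair itself is small once the truncated codes have landed in a staging table; this sketch assumes a hypothetical staging column holding the bare digits:

    -- Pad five-digit US Postal Codes that lost their leading zeros.
    select right('00000' + cast(PostalCode as varchar(5)), 5) as FixedPostalCode
    from staging.TruncatedPostalCodes;
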
  • What The SSIS Masters Know, Part 1

    Hello world and yes, I am alive. I have finally caught up enough to begin regularly blogging about SQL Server again. It is good to be back!

    As you may know, I will be giving a talk at this year's Heartland Developers' Conference (http://www.heartlanddc.com). My presentation is entitled "What the SSIS Masters Know." I picked this topic since I feel that many developers are not aware of the power that SQL Server Integration Services (SSIS) offers them. I have also seen many developers struggle with the learning curve. As with any powerful tool, there is bound to be a fair amount of complexity. In upgrading a number of SQL Server 2000 DTS packages to SSIS and reading the major books available on the topic, I came to realize just how amazing this tool really can be.

    A few months ago, I was having dinner with a friend. She asked, "So what do the SSIS masters know that the rest of us don't?" No answer came immediately to mind so I told her I would have to think about that. She smiled and asked me just to talk. About an hour later -- after our meals had come and gone -- she had written down these bullet points for me:

    • The Zen of SSIS: ELT vs. ETL
    • Understanding Control Flows and Data Flows
    • Static pipeline, dynamic flow
    • Understanding asynchronous vs. synchronous components
    • How to Optimize Data Flows
    • Duct-Tape Scripting
    • Using Configurations

    She asked a few more questions, not really wanting to know the answers for herself, but because she knew it would get me thinking about this list. What did I mean by "Zen?" Why does the difference in the types of flows really matter? What problems does the design of the Data Flow present? Where do performance problems come from? Is there a technique for analyzing a Data Flow for potential issues? Why did I get so excited about scripting? What do developers ignore but Database Administrators fret about most?

    Thankfully, she is very attentive. And incredibly patient when I go rambling off.

    The next day I sent this list in, with a little garnish describing the talk, as a topic submission. The talk got the green light and I started thinking about how to present these topics to that audience.

    The problem here is that these topics can be abstract. I loathe giving talks like that. Like good writing, a good presentation should show, not tell. It should demonstrate in concrete terms why these things matter to the SSIS developer. In the next post, we will talk about a data set and database design that will help us put the seven things that master SSIS developers know into context.

    Since many of you will not be able to join us at that conference, I thought I would write a series of blog posts covering the essential parts of that presentation. At the end of this series, I will post a URL where you can download the presentation and the demo files.

  • Enjoying Another Sandwich

    It is hard for me to believe that 1,366 days have passed since writing this post in my now abandoned SQLJunkies blog. Now as then, this explains the title of this blog and gives you a peek at my non-technical life. Enjoy it. I'm going to try to keep the signal-to-noise ratio much stronger here. That said...

    Why "Enjoy every sandwich?" -- posted 2003-09-10

    Because you never know when you won't get your next one.

    Last Sunday, Warren Zevon passed away after a nearly year-long battle with lung and liver cancer. He said that he made a tactical mistake by not seeing a physician for more than twenty years.

    I'm not going to go on about how wonderful a musician he was, how smart or edgy his work was. None of that mattered to me. What did matter is that his music spoke to me. I really identified with "The French Inhaler" and "Excitable Boy." These days, "Keep Me in Your Heart for a While" seldom leaves my mind. I also like to think that some of Warren's dark humor has been part of my own soul.

    When Zevon was asked if he had any insight about death that others might not, he simply and honestly replied, "Not unless it's... how much it is you're supposed to enjoy every sandwich." Having lost my mother to cancer a couple of years ago, I can say that once again, Warren's words mattered to me. Death isn't about the process of dying. Death is about the process of living. Mom's amazingly strong, positive spirit, non-stop love and sense of peace helped all of us live a better life. It is the simple things that matter most in life. It is not what you accomplish or accumulate, it is how much you enjoy doing those things that matter most in the end.

    So the next time you've got a sandwich in your hand, stop. Close your eyes and think about all the sandwiches you have had with your loved ones. Remember the good times of simple pleasures shared. Then enjoy that sandwich with a smile on your face.

    Thanks for reading and remember, Enjoy Every Sandwich.
