|
|
|
|
Rob Farley - Owner/Principal with LobsterPot Solutions (a MS Gold Partner consulting firm), Microsoft MVP (SQL Server) and leader of the SQL User Group in Adelaide, Australia. Rob is also a Director of PASS, and runs training courses around the world in SQL Server and BI topics.
-
Some people question the ethics of writing about my company on my blog. But I don’t have a problem with it. Is it ethical to ignore the concerns of others and to just do what I want? Anyway – it’s relevant for this month’s T-SQL Tuesday, which Chris Shaw is hosting on the topic of ethics, so I’m going to write a few things about integrity and ethics. When I hire people to work at LobsterPot Solutions, the main thing I look for is a high level of integrity. I can handle a certain level of technical unfamiliarity – that’s something that can be learned. But I can’t teach integrity in the same way. I can foster it, and I can ensure that it’s an established ‘value’ in the company. It’s the number one thing. I know a sysadmin company (it wasn’t LobsterPot) that recently lost a bunch of client data – they asked us to help them recover some of the lost data. It was a bad technical mistake they made, but that’s forgivable. They tried really hard to get the data back, and managed to salvage a lot of it, which demonstrated a better level of integrity. When I look at ethics and integrity, it’s not about mistakes, it’s more about the response to mistakes. Their integrity was brought into question over putting the client into a situation where they could’ve lost data – but this wasn’t a question of integrity, it was an honest mistake. The integrity question comes about their response. I like to look at the knee-jerk reaction. If you make a mistake, do you cover it up, or do you try to resolve it? Unfortunately, I think there’s often an element of both, but if the motivation is to protect the other people involved, then I would expect that the integrity level is okay. Ethics is about good and bad behaviour, and of course, once that definition has been made, choosing good. I’m not going to try to do the definition thing – I think people generally have a feel for what is good and bad (even with data it should be clear – you don’t get to steal client data, etc). The application of ethics comes down to integrity. At LobsterPot, we try to do the right thing all the time. This doesn’t mean that we don’t make mistakes – but our first reaction is always to do the right thing by the other parties involved, even if it costs us money. It has to. We don’t get a choice on this. And if I don’t think that integrity is your driving influence for every decision you make, then I’m sorry, but I’m just not going to hire you.
|
-
I don’t tend to find that anything about me gets described as “Fabulous”. It’s not a word I ever use myself, so I was slightly amused to see it here.  It’s going to be an interesting week (in May, Monday 14th to Saturday 19th), in which I have two stints in classrooms (the three day course in downtown Chicago and this pre-con), plus two presentations at the main SQLSaturday event. I’ll be in a room freshly vacated by Jes Borland (I suspect you'll probably be still able to smell the ‘squee’), and from the look of things, I’ll be delivering a solid 2.5 hours of material, with an intermission of 15 minutes. Mind you, with people in the other rooms like Argenis, Ted and Erin, I’m not sure I’ll have much of a crowd. It might be more like “An Intimate Afternoon with Rob Farley” with whoever’s left. The course will be the highlight of my week. I love teaching this course – it’s a great time to be able to get people in a room for a few days and go through ways to make queries better. More effective. It has “Advanced T-SQL” in the title, but I really try to focus on the “Effectiveness” aspect. Yes, we’ll look at a bunch of advanced features, and your T-SQL arsenal will grow, but the idea is to arm you with the information you need to be able to have more effective T-SQL. Advanced is only better when it’s more effective. It’s going to be a really fun few days, as I stretch your thinking and make you look at T-SQL in a new way.  The pre-con is going to be very different. We’ll be going through the syllabus of the new 70-461 exam, teaching you about all the ins and outs of the various features, leaving you in a position to be able to confidently take the exam. I’ve sat this exam in beta, but of course I can’t use any inside knowledge I gained from that to teach this. There’s a lot of stuff to get through. Each of the four sections has four or five bullet points underneath, and even more sub-points under that. We’ll be pushing through a lot of things, and a lot of the more basic stuff will certainly be skimmed through – but we’ll be looking deeper into a lot of the new things, and making sure that you get all the concepts on the exam. I can’t offer a proper guarantee that you’ll pass – some people just take exams badly. But as we’ll also be looking at a bunch of exam technique aspects, I think you'll be fine. The two sessions that I’m doing at the SQLSaturday #119 are two of my favourite talks. One of them is on SARGability. I remember doing this talk with a bunch of MVPs in the room (and a certain cloudy Microsoft employee), and even they said “Ooh – I didn’t know that” afterwards. SARGability – the ability to use indexes effectively – is such a significant aspect of querying, and a pet topic of mine (as regular readers will realise). SQL Server even provides a bunch of methods you can use to improve the SARGability, even if you can’t tweak the queries themselves. Very cool stuff. And did I mention I won’t have slides? The other is on Analytic Functions (a talk which I’m currently doing around Australia and New Zealand at SQLSaturday 135, 136, 138, 139 and 140). There are eight new Analytic Functions in SQL 2012, plus some new enhancements to the OVER clause. I’ll be running through these, and I’m sure you’ll leave the room with new ideas to try to enhance your reporting and data analysis. See you there?
|
-
This coming Tuesday sees a midweek SQL Saturday hit Adelaide. LobsterPot’s a sponsor, as are a bunch of other companies. An all day event, with two tracks featuring some of the best material you’ve ever seen presented. I’m presenting too. The thing I really want to draw your attention to is that we have two sessions from Paul White. If you haven’t heard of Paul, click the link to have a look at his blog. When you’ve picked your jaw up and some of the mind-blowing information he likes to write about, imagine yourself sitting in sessions by him. I’ve just got back from Wellington where I heard him for an hour, and knew that giving him two sessions was completely the right choice. Everyone left the room wishing that he could’ve gone on longer, and I suspect Paul will be somewhat caught up for questions for the rest of the day, as people try to pick his brain about some of the Query Optimizer things he can teach. He’s been scheduled for the morning so that attendees can have plenty of opportunity to see him around for the rest of the day. It’s his first time ever to Australia, so it’s completely brilliant to have him come to Adelaide for this event. I shouldn’t suggest that the other sessions won’t be excellent though. Peter Ward’s session about the $10000 question has been very well received at SQLSaturday events so far, as have the sessions by Roger Noble and Paul te Braak. Combining these with Microsoft’s Raja N presenting about the Database Consolidation Appliance, some excellent local speakers, a terrific sponsor-session from Fusion-io (and a couple of spots from me), and I’m sure you’ll agree that this event is definitely worth getting to. Numbers are limited and being a free event it may well sell out. So get yourself registered (but I’d recommend joining PASS first, which gives you a bunch of extra benefits and there’s no extra effort involved), and I’ll see you there. It’s in less than a week!
|
-
SQL 2012 brings us a bunch of new analytic functions, together with enhancements to the OVER clause. People who have known me over the years will remember that I’m a big fan of the OVER clause and the types of things that it brings us when applied to aggregate functions, as well as the ranking functions that it enables. The OVER clause was introduced in SQL Server 2005, and remained frustratingly unchanged until SQL Server 2012. This post is going to look at a particular aspect of the analytic functions though (not the enhancements to the OVER clause). When I give presentations about the analytic functions around Australia as part of the tour of SQL Saturdays (starting in Brisbane this Thursday), and in Chicago next month, I’ll make sure it’s sufficiently well described. But for this post – I’m going to skip that and assume you get it. The analytic functions introduced in SQL 2012 seem to come in pairs – FIRST_VALUE and LAST_VALUE, LAG and LEAD, CUME_DIST and PERCENT_RANK, PERCENTILE_CONT and PERCENTILE_DISC. Perhaps frustratingly, they take slightly different forms as well. The ones I want to look at now are FIRST_VALUE and LAST_VALUE, and PERCENTILE_CONT and PERCENTILE_DISC. The reason I’m pulling this ones out is that they always produce the same result within their partitions (if you’re applying them to the whole partition). Consider the following query: SELECT YEAR(OrderDate), FIRST_VALUE(TotalDue) OVER (PARTITION BY YEAR(OrderDate) ORDER BY OrderDate, SalesOrderID RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING), LAST_VALUE(TotalDue) OVER (PARTITION BY YEAR(OrderDate) ORDER BY OrderDate, SalesOrderID RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING), PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY TotalDue) OVER (PARTITION BY YEAR(OrderDate)), PERCENTILE_DISC(0.95) WITHIN GROUP (ORDER BY TotalDue) OVER (PARTITION BY YEAR(OrderDate)) FROM Sales.SalesOrderHeader ; This is designed to get the TotalDue for the first order of the year, the last order of the year, and also the 95% percentile, using both the continuous and discrete methods (‘discrete’ means it picks the closest one from the values available – ‘continuous’ means it will happily use something between, similar to what you would do for a traditional median of four values). I’m sure you can imagine the results – a different value for each field, but within each year, all the rows the same. Notice that I’m not grouping by the year. Nor am I filtering. This query gives us a result for every row in the SalesOrderHeader table – 31465 in this case (using the original AdventureWorks that dates back to the SQL 2005 days). The RANGE BETWEEN bit in FIRST_VALUE and LAST_VALUE is needed to make sure that we’re considering all the rows available. If we don’t specify that, it assumes we only mean “RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW”, which means that LAST_VALUE ends up being the row we’re looking at. At this point you might think about other environments such as Access or Reporting Services, and remember aggregate functions like FIRST. We really should be able to do something like: SELECT YEAR(OrderDate), FIRST_VALUE(TotalDue) OVER (PARTITION BY YEAR(OrderDate) ORDER BY OrderDate, SalesOrderID RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) FROM Sales.SalesOrderHeader GROUP BY YEAR(OrderDate) ; But you can’t. You get that age-old error: Msg 8120, Level 16, State 1, Line 5 Column 'Sales.SalesOrderHeader.OrderDate' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause. Msg 8120, Level 16, State 1, Line 5 Column 'Sales.SalesOrderHeader.SalesOrderID' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
Hmm. You see, FIRST_VALUE isn’t an aggregate function. None of these analytic functions are. There are too many things involved for SQL to realise that the values produced might be identical within the group. Furthermore, you can’t even surround it in a MAX. Then you get a different error, telling you that you can’t use windowed functions in the context of an aggregate. And so we end up grouping by doing a DISTINCT. SELECT DISTINCT YEAR(OrderDate), FIRST_VALUE(TotalDue) OVER (PARTITION BY YEAR(OrderDate) ORDER BY OrderDate, SalesOrderID RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING), LAST_VALUE(TotalDue) OVER (PARTITION BY YEAR(OrderDate) ORDER BY OrderDate, SalesOrderID RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING), PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY TotalDue) OVER (PARTITION BY YEAR(OrderDate)), PERCENTILE_DISC(0.95) WITHIN GROUP (ORDER BY TotalDue) OVER (PARTITION BY YEAR(OrderDate)) FROM Sales.SalesOrderHeader ;  I’m sorry. It’s just the way it goes. Hopefully it’ll change the future, but for now, it’s what you’ll have to do. If we look in the execution plan, we see that it’s incredibly ugly, and actually works out the results of these analytic functions for all 31465 rows, finally performing the distinct operation to convert it into the four rows we get in the results.  You might be able to achieve a better plan using things like TOP, or the kind of calculation that I used in http://sqlblog.com/blogs/rob_farley/archive/2011/08/23/t-sql-thoughts-about-the-95th-percentile.aspx (which is how PERCENTILE_CONT works), but it’s definitely convenient to use these functions, and in time, I’m sure we’ll see good improvements in the way that they are implemented. Oh, and this post should be good for fellow SQL Server MVP Nigel Sammy’s T-SQL Tuesday this month.
|
-
April is here, and this weekend, SQL v11.0 (previous known as Denali, now known as SQL Server 2012) reaches general availability. And so I thought I’d share some news about what’s coming next. I didn’t hear this at the MVP Summit earlier this year (where there was lots of NDA information given, but I didn’t go), so I think I’m free to share it. I’ve written before about CTEs being query-scoped views. Well, the actual story goes a bit further, and will continue to develop in future versions. A CTE is a like a “temporary temporary view”, scoped to a single query. Due to globally-scoped temporary objects using a two-hashes naming style, and session-scoped (or ‘local’) temporary objects a one-hash naming style, this query-scoped temporary object uses a cunning zero-hash naming style. We see this implied in Books Online in the CREATE TABLE page, but as we know, temporary views are not yet supported in the SQL Server.   However, in a breakaway from ANSI-SQL, Microsoft is moving towards consistency with their naming. We know that a CTE is a “common table expression” – this is proving to be a more strategic than you may have appreciated. Within the Microsoft product group, the term “Table Expression” is far more widely used than just CTEs. Anything that can be used in a FROM clause is referred to as a Table Expression, so long as it doesn’t actually store data (which would make it a Table, rather than a Table Expression). You can see this is not just restricted to the product group by doing an internet search for how the term is used without ‘common’. In the past, Books Online has referred to a view as a “virtual table” (but notice that there is no SQL 2012 version of this page). However, it was generally decided that “virtual table” was a poor name because it wasn’t completely accurate, and it’s typically accepted that virtualisation and SQL is frowned upon. That page I linked to says “or stored query”, which is slightly better, but when the SQL 2012 version of that page is actually published, the line will be changed to read: “A view is a stored table expression (STE)”. This change will be the first of many. During the SQL 2012 R2 release, the keyword VIEW will become deprecated (this will be SQL v11 SP1.5). Three versions later, in SQL 14.5, you will need to be in compatibility mode 140 to allow “CREATE VIEW” to work. Also consistent with Microsoft’s deprecation policy, the execution of any query that refers to an object created as a view (rather than the new “CREATE STE”), will cause a Deprecation Event to fire. This will all be in preparation for the introduction of Single-Column Table Expressions (to be introduced in SQL 17.3 SP6) which will finally shut up those people waiting for a decent implementation of Inline Scalar Functions.  And of course, CTEs are “Common” because the Table Expression definition needs to be repeated over and over throughout a stored procedure. ...or so I think I heard at some point. Oh, and congratulations to all the new MVPs on this April 1st. @rob_farley
|
-
In May, I’ll be in the US. I have board meetings for PASS at the SQLRally event in Dallas, and then I’m going to be spending a bit of time in Chicago. The big news is that while I’m in Chicago (May 14-16), I’m going to teach my “Advanced T-SQL Querying and Reporting: Building Effectiveness” course. This is a course that I’ve been teaching since the 2005 days, and have modified over time for 2008 and 2012. It’s very much my most popular course, and I love teaching it. Let me tell you why. For years, I wrote queries and thought I was good at it. I was a developer. I’d written a lot of C (and other, more fun languages like Prolog and Lisp) at university, and then got into the ‘real world’ and coded in VB, PL/SQL, and so on through to C#, and saw SQL (whichever database system it was) as just a way of getting the data back. I could write a query to return just about whatever data I wanted, and that was good. I was better at it than the people around me, and that helped. (It didn’t help my progression into management, then it just became a frustration, but for the most part, it was good to know that I was good at this particular thing.) But then I discovered the other side of querying – the execution plan. I started to learn about the translation from what I’d written into the plan, and this impacted my query-writing significantly. I look back at the queries I wrote before I understood this, and shudder. I wrote queries that were correct, but often a long way from effective. I’d done query tuning, but had largely done it without considering the plan, just inferring what indexes would help. This is not a performance-tuning course. It’s focused on the T-SQL that you read and write. But performance is a significant and recurring theme. Effective T-SQL has to be about performance – it’s the biggest way that a query becomes effective. There are other aspects too though – such as using constructs better. For example – I can write code that modifies data nicely, but if I haven’t learned about the MERGE statement and the way that it can impact things, I’m missing a few tricks. If you’re going to do this course, a good place to be is the situation I was in a few years before I wrote this course. You’re probably comfortable with writing T-SQL queries. You know how to make a SELECT statement do what you need it to, but feel there has to be a better way. You can write JOINs easily, and understand how to use LEFT JOIN to make sure you don’t filter out rows from the first table, but you’re coding blind. The first module I cover is on Query Execution. Take a look at the Course Outline at Data Education’s website. The first part of the first module is on the components of a SELECT statement (where I make you think harder about GROUP BY than you probably have before), but then we jump straight into Execution Plans. Some stuff on indexes is in there too, as is simplification and SARGability. Some of this is stuff that you may have heard me present on at conferences, but here you have me for three days straight. I’m sure you can imagine that we revisit these topics throughout the rest of the course as well, and you’d be right. In the second and third modules we look at a bunch of other aspects, including some of the T-SQL constructs that lots of people don’t know, and various other things that can help your T-SQL be, well, more effective. I’ve had quite a lot of people do this course and be itching to get back to work even on the first day. That’s not a comment about the jokes I tell, but because people want to look at the queries they run. LobsterPot Solutions is thrilled to be partnering with Data Education to bring this training to Chicago. Visit their website to register for the course. @rob_farley
|
-
A few days after the end of 24HOP, I find myself reflecting on it. I’m still waiting on most of the information. I want to be able to discover things like where the countries represented on each of the sessions, and things like that. So far, I have the feedback scores and the numbers of attendees. The data was provided in a PDF, so while I wait for it to appear in a more flexible format, I’ve pushed the 24 attendee numbers into Excel.  This chart shows the numbers by time. Remember that we started at midnight GMT, which was 10:30am in my part of the world and 8pm in New York. It’s probably no surprise that numbers drooped a bit at the start, stayed comparatively low, and then grew as the larger populations of the English-speaking world woke up. I remember last time 24HOP ran for 24 hours straight, there were quite a few sessions with less than 100 attendees. None this time though. We got close, but even when it was 4am in New York, 8am in London and 7pm in Sydney (which would have to be the worst slot for attracting people), we still had over 100 people tuning in. As expected numbers grew as the UK woke up, and even more so as the US did, with numbers peaking at 755 for the “3pm in New York” session on SQL Server Data Tools. Kendra Little almost reached those numbers too, and certainly contributed the biggest ‘spike’ on the chart with her session five hours earlier. Of all the sessions, Kendra had the highest proportion of ‘Excellent’s for the “Overall Evaluation of the session” question, and those of you who saw her probably won’t be surprised by that. Kendra had one of the best ranked sessions from the 24HOP event this time last year (narrowly missing out on being top 3), and she has produced a lot of good video content since then. The reports indicate that there were nearly 8.5 thousand attendees across the 24 sessions, averaging over 350 at each one. I’m looking forward to seeing how many different people that was, although I do know that Wil Sisney managed to attend every single one (if you did too, please let me know). Wil even moderated one of the sessions, which made his feat even greater. Thanks Wil. I also want to send massive thanks to Dave Dustin. Dave probably would have attended all of the sessions, if it weren’t for a power outage that forced him to take a break. He was also a moderator, and it was during this session that he earned special praise. Part way into the session he was moderating, the speaker lost connectivity and couldn’t get back for about fifteen minutes. That’s an incredibly long time when you’re in a live presentation. There were over 200 people tuned in at the time, and I’m sure Dave was as stressed as I was to have a speaker disappear. I started chasing down a phone number for the speaker, while Dave spoke to the audience. And he did brilliantly. He started answering questions, and kept doing that until the speaker came back. Bear in mind that Dave hadn’t expected to give a presentation on that topic (or any other), and was simply drawing on his SQL expertise to get him through. Also consider that this was between midnight at 1am in Dave’s part of the world (Auckland, NZ). I would’ve been expecting just to welcome people, monitor questions, probably read some out, and in general, help make things run smoothly. He went far beyond the call of duty, and if I had a medal to give him, he’d definitely be getting one. On the whole, I think this 24HOP was a success. We tried a different platform, and I think for the most part it was a popular move. We didn’t ask the question “Was this better than LiveMeeting?”, but we did get a number of people telling us that they thought the platform was very good. Some people have told me I get a chance to put my feet up now that this is over. As I’m also co-ordinating a tour of SQLSaturday events across the Australia/New Zealand region, I don’t quite get to take that much of a break (plus, there’s the little thing of squeezing in seven SQL 2012 exams over the next 2.5 weeks). But I am pleased to be reflecting on this event rather than anticipating it. There were a number of factors that could have gone badly, but on the whole I’m pleased about how it went. A massive thanks to everyone involved. If you’re reading this and thinking you wish you could’ve tuned in more, don’t worry – they were all recorded and you’ll be able to watch them on demand very soon. But as well as that, PASS has a stream of content produced by the Virtual Chapters, so you can keep learning from the comfort of your desk all year round. More info on them at sqlpass.org, of course.
|
-
Session 11 is on as I write this – Ami Levin presenting about Primary Keys. It’s a good session. But actually, they’ve all been excellent so far, not just Ami’s. I’ve heard only good things about the content. So if you’re reading this and 24HOP is still on, then tune in and take part. If it’s finished, get yourself over to http://sqlpass.org/24hours and see if the sessions have been made available on-demand. Yes – you should be able to watch the sessions when you want to for a year. Watching live is best, because you can ask questions and have them answered during the session, but if there are ones you just couldn’t make, then watching them on-demand is a good option. Numbers have been “not bad”. At the moment it’s still the middle of the night for most Americans – about 6:30am in New York, and yet we’ve had well over a hundred at all the sessions so far, getting up to well over 300 for some sessions. And when I look through the list of names, I see a bunch of names that suggest we’re reaching people from all around the world. I’m seriously looking forward to seeing the stats about which countries have been represented in the audiences. There have been a few comments about the platform. Everyone seems to consider IBTalk an improvement on LiveMeeting, but the closed captioning has met a mixed reception. Some people are loving it, whereas other people are finding the translations leave quite a bit of space for improvement. If you have feedback on this, please feel free to drop me an email (my name with an underscore at hotmail.com, or with a dot at sqlpass.org should reach me just fine, or Twitter, etc). I don’t know how many of the sessions I’ll get to watch overnight – but I’m looking forward to seeing how things go as the day progresses. Big thanks to everyone who’s involved – the sponsors, PASS HQ team and the IBTalk folk who have stayed up overnight to facilitate, plus the moderators, the people doing the live captioning, and of course the speakers and attendees. I love how the SQL Community gets behind things like this. Earlier, the Adelaide SQL Server User Group gathered and watched Denny Lee’s session on BigData, and everyone in the group agreed that it worked really well. I took a picture of our cinema room, although you could only see a small section of the audience. @rob_farley 
|
-
There’s a bunch of stuff going on at the moment in the SQL world, so if you’ve missed this particular piece of news, let me tell you a bit about it. Twice a year, the SQL community puts on its biggest virtual event – 24 Hours of PASS. And the next one is tomorrow – March 21st, 2012. Twenty-four sessions, back-to-back, featuring a selection of some of the best presenters in the SQL world, speakers from all over the world, coming together in an online collaboration that so far has well over thirty thousand registrations across the presentations. Some people are signed up for all 24 sessions, some only one. Traditionally, LiveMeeting has been used as the platform for this event, but this year we’re going with a new platform – IBTalk. It promises big, and we’re hoping it won’t let us down. LiveMeeting has been great, and we thank Microsoft for providing it as a platform for the past few years. However, as the event has grown, we’ve found that a new idea is necessary. Last year a search was done for a new platform, and IBTalk ticked the right boxes. The feedback from the presenters and moderators so far has been overwhelmingly positive, and we’re hoping that this is going to really enhance the user experience. One of my favourite features of the platform is the language side. It provides a pretty good translation service. Users who join a session will see a flag on the left of the screen. If they click it, they can change the language to one of 15 on offer. Picking this changes all the labels on everything. It even translates the text in the Q&A window.  What this means is that someone from Brazil can ask their question in Portuguese, and the presenter will see it in English. Then if the answer is typed in English, the questioner will be able to see the answer, also in Portuguese. Or they can switch to English to see it as the answerer typed it. I know there’s always the risk of bad translations going on, but I’ve heard good things about this translation service. But there’s more – IBTalk are providing staff to type up closed captioning live during the event. So if English isn’t your first language, don’t worry! Picking your language will also let you see subtitles in your chosen language. I’m hoping that this event is the start of PASS being able to reach people from all corners of the world. Wouldn’t it be great to find that this event is successful, and that the next 24HOP (later in the year, our Summit Preview event) has just as many non-English speakers tuning in as English speakers? If you haven’t been planning which sessions you’re going to attend, you really should get over to sqlpass.org/24hours and have a look through what’s on offer. There’s some amazing material from some of the industry’s brightest, covering a wide range of topics, from classic SQL areas to the brand new SQL 2012 features. There really should be something for every SQL professional. Check the time zones though – if you’re in the US you might be on Summer time, and an hour closer to GMT than normal. Massive thanks must go to Microsoft, SQL Sentry and Idera for sponsoring this event. Without sponsors we wouldn’t be able to put any of this on. These companies are helping 24HOP continue to grow into an event for the whole world. See you tomorrow! @rob_farley | #24hop | #sqlpass
|
-
It’s a phrase I use often, especially when teaching, and I wish I had realised the concept years earlier. (And of course, fits with this month’s T-SQL Tuesday topic, hosted by Argenis Fernandez) When I’m sick enough to go to the doctor, I see a GP. I used to typically see the same guy, but he’s moved on now. However, when he has been able to roughly identify the area of the problem, I get referred to a specialist, sometimes a surgeon. Being a surgeon requires a refined set of skills. It’s why they often don’t like to be called “Doctor”, and prefer the traditional “Mister” (the history is that the doctor used to make the diagnosis, and then hand the patient over to the person who didn’t have a doctorate, but rather was an expert cutter, typically from a background in butchering). But if you ask the surgeon about the pain you have in your leg sometimes, you’ll get told to ask your GP. It’s not that your surgeon isn’t interested – they just don’t know the answer. IT is the same now. That wasn’t something that I really understood when I got out of university. I knew there was a lot to know about IT – I’d just done an honours degree in it. But I also knew that I’d done well in just about all my subjects, and felt like I had a handle on everything. I got into developing, and still felt that having a good level of understanding about every aspect of IT was a good thing. This got me through for the first six or seven years of my career. But then I started to realise that I couldn’t compete. I’d moved into management, and was spending my days running projects, rather than writing code. The kids were getting older. I’d had a bad back injury (ask anyone with chronic pain how it affects your ability to concentrate, retain information, etc). But most of all, IT was getting larger. I knew kids without lives who knew more than I did. And I felt like I could easily identify people who were better than me in whatever area I could think of. Except writing queries (this was before I discovered technical communities, and people like Paul White and Dave Ballantyne). And so I figured I’d specialise. I wish I’d done it years earlier. Now, I can tell you plenty of people who are better than me at any area you can pick. But there are also more people who might consider listing me in some of their lists too. If I’d stayed the GP, I’d be stuck in management, and finding that there were better managers than me too. If you’re reading this, SQL could well be your thing. But it might not be either. Your thing might not even be in IT. Find out, and then see if you can be a world-beater at it. But it gets even better, because you can find other people to complement the things that you’re not so good at. My company, LobsterPot Solutions, has six people in it at the moment. I’ve hand-picked those six people, along with the one who quit. The great thing about it is that I’ve been able to pick people who don’t necessarily specialise in the same way as me. I don’t write their T-SQL for them – generally they’re good enough at that themselves. But I’m on-hand if needed. Consider Roger Noble, for example. He’s doing stuff in HTML5 and jQuery that I could never dream of doing to create an amazing HTML5 version of PivotViewer. Or Ashley Sewell, a guy who does project management far better than I do. I could go on. My team is brilliant, and I love them to bits. We’re all surgeons, and when we work together, I like to think we’re pretty good! @rob_farley
|
-
Hooray – we’re ready to announce the details of the 24 Hours of PASS event that’s coming to an internet connection near you on March 21st! Read on for the day’s schedule, some of my thoughts on the 24 Hours of PASS concept, information about the platform, and an announcement which I think is really quite a big deal and worth making a fuss over. 24 Hours Straight So – March 21st. No overlap with any other days, just 24 Hours of PASS, squeezed back into a single day. The last few events have been split over two days, running from noon to midnight (GMT), which conveniently fits in daylight hours of the Eastern United States. To help with the current push for internationalisation (and there’s more on that later in this post), we’re starting at midnight GMT (which is a perfectly reasonable 7pm in New York), and running through to the following midnight GMT (which is the next 7pm in New York, strangely enough). I’d like to be able to give prizes to people who attend all twenty-four, but I’m not quite that keen. Tell you what though – I’ll cheer on Twitter and my blog for anyone who tells me they’ve made it through all of them. I know people have done it before, and hopefully plenty will again. VC involvement The Virtual Chapter portfolio isn’t mine. It’s Denise McInerney’s. She’s doing a brilliant job, and we both see a strong correlation between 24HOP and the VCs. There are likely to be lots of people who tune in to 24HOP who haven’t really noticed that there are virtual chapters that meet, providing excellent online meetings, several times every month! We’ve roped in many of the VC leaders to help choose the sessions, host the sessions, and more. In fact, I suspect that many of the sessions that were submitted but not chosen could well be getting selected for the Virtual Chapters in the weeks and months to come. The Virtual Chapters are one of the best things about PASS, and it’s great that 24HOP can be showcasing them this time around. Most of the VCs are represented, including Performance, Data Architecture, PowerShell, Professional Development, and the larger ones like BI and AppDev. Even the newly forming BigData VC. The Oracle VC isn’t. Sorry, Scott. Platform LiveMeeting is such a great product. If you haven’t tried it out, you really should. It’s been our platform of choice here at 24HOP for ages now. Unfortunately, 24HOP seems to have outgrown it. People who have tuned in to recent events will have noticed that the video hasn’t always streamed as nicely as we’d like, and although we really do love LiveMeeting, we’re going to try a different platform – called IBTalk. This thing looks really impressive, and has some really cool features too. Most importantly, it should scale well. And record nicely. And allow eval forms nicely. And make coffee* (*feature list may not be completely accurate). Internationalisation I know you’ve already read about how the event is going to be 24 hours straight, ignoring the fact that numbers may well be lower while the US sleeps. But that’s not the only way in which 24HOP is becoming more international... During the event, the IBTalk people are going to provide Live Closed Captioning. So you can get subtitles during the event! But there’s more... This captioning will be available in fifteen different languages! English, French, Spanish, Portuguese, Italian, Dutch, German, Russian, Turkish, Arabic, Traditional Chinese, Simplified Chinese, Japanese, Korean, and Thai. You want subtitles in Thai? No problem. Watching with a Turkish friend? Sure, no worries. Even if you’re just more comfortable in Dutch than English... PASS is serious about trying to reach people around the world. With these languages, I think most people are covered. There are a few gaps, such as not having Swedish or Bangla – but apparently just about everyone in IT in those areas speaks English anyway (so I’m told). The platform also has an amazing feature that translates questions, so if you're more comfortable asking a question in Russian or Portguese, you can do that. The presenter will see the question in their own language, and the attendee will read the answer in theirs. The Schedule! As well as what you can see below, the schedule is live over at http://www.sqlpass.org/24hours/spring2012/SessionsbySchedule.aspx and http://www.sqlpass.org/24hours/spring2012/SessionsbyTrack.aspx. Please head over to the PASS site to register for the sessions you’re interested in. But in the meantime, check out this list! I’m sure you’ll recognise plenty of the names, but probably not all. You’ll also notice there are plenty of speakers from different parts of the world, including the UK (like Mark and Neil), Australia (like Julie), and Israel (like Ami). It’s going to be a massive event. Get registering soon! [Edit: Most of this post also appears at: http://www.sqlpass.org/Community/PASSBlog/entryid/410/24-Hours-of-PASS.aspx]
|
-
This is not the big announcement that I’ve promised about 24 Hours of PASS (which starts in five weeks from now!) – that’s coming later this week (and for those of you who haven’t picked up on it – 24HOP is the portfolio I’ve been given since becoming an elected director of PASS). The announcement that is coming later in the week will contain the full schedule (which is being finalised as we speak), and will also include another piece of news that I’m very excited to be able to announce. (Although by now I’m curious about whether this will end up being an anti-climax for those of you expecting me to announce that we’re going to be broadcasting from the moon or something) But in keeping with this month’s T-SQL Tuesday (which feels like one of the only things that gets me taking time out to write – hard to think that it’s two years since I hosted), I thought I’d mention what’s happening on the Big Data side with 24 Hours of PASS. If you haven’t been paying attention, Big Data is the Big Topic du jour. Microsoft is running with the Apache Hadoop platform – which is hugely significant when you consider the power of open source here. I’m not going to suggest that you’re missing out if you’re not already jumping in with both feet – but I think this area is set to grow significantly. Anyway – I’m going to leak the details for two of the 24 Hours of PASS sessions. You’ll be able to register for these in a few days’ time. At 02:00 - 03:00 GMT (what is this in your time zone?), we have: Denny Lee: Tier-1 BI in the Age of Bees and Elephants In this age of Big Data, data volumes become exceedingly larger while the technical problems and business scenarios become more complex. This session dives provides concrete examples of how these can be solved. Highlighted will be the use of Big Data technologies including Hadoop (elephants) and Hive (bees) with Analysis Services. Customer examples including Klout and Yahoo! (with their 24TB cube) will highlight both the complexities and solutions to these problems. And at 15:00 - 16:00 GMT (what is this in your time zone?), we have: Cindy Gross: Fitting Microsoft Hadoop into your Enterprise BI Strategy What is this Big Data thing and why should you care? Learn about Microsoft's Hadoop connectors, our Hive add-in for Excel, what Pig and Hive are, and more. Big Data is an exciting foray into the world of data previously too big to load and query in an affordable manner. With the new Big Data tools you can expand your BI reach be the hero who helps your company make better business decisions. Both these sessions are going to be great. If you’ve heard either Denny or Cindy present before, you know that they’re both technically excellent and fantastic presenters too. What’s more, with these sessions being roughly 12 hours apart, hopefully you can get to at least one, no matter where in the world you are. (In fact, Denny’s session is in the timeslot that the Adelaide SQL Server User Group uses – so we’ll be showing this on the big screen at our regular event).
|
-
In my last post, I showed a technique for dealing with working columns when writing T-SQL. The idea was around using APPLY to be able to push values from the existing set through calculations (but preferably not scalar functions, of course), producing new columns which can be used further down the query, even in the WHERE and GROUP BY clauses. Useful stuff indeed. But there is a limitation which I didn’t cover, and I feel that this is worth writing about for this month’s T-SQL Tuesday. The theme allows people to write about previous Tuesday topics – so I’m going to revisit topics 17 and 25 as I go a little further into APPLY. These working columns can only be applied if they are calculations on a single row of data. The resulting set might be bigger or smaller based on how many rows are affected, but the input (at least logically) should be considered on a row-by-row basis. Consider the following query: SELECT p.Name, r.RevName FROM Production.ProductSubcategory AS s CROSS APPLY (SELECT REVERSE(s.Name) AS RevName) AS r CROSS APPLY ( SELECT TOP (1) * FROM Production.Product AS p WHERE p.ProductSubcategoryID = s.ProductSubcategoryID ORDER BY p.ListPrice DESC ) AS p ORDER BY s.ProductSubcategoryID; You see two CROSS APPLYs here – one creating a simple working column in REVERSE(s.Name), the other even involving another table to do a lookup. This second CROSS APPLY doesn’t necessarily produce any rows – if there is no matching Product, the resultset won’t contain any rows for that ProductSubcategory. I’ve shown some of its results below:  But what about the following query, which gives the same 37 rows on the AdventureWorks database: SELECT ( SELECT TOP (1) p.Name FROM Production.Product AS p WHERE p.ProductSubcategoryID = s.ProductSubcategoryID ORDER BY p.ListPrice DESC ), REVERSE(s.Name) AS RevName FROM Production.ProductSubcategory AS s ORDER BY s.ProductSubcategoryID; This is very similar, but you’ll notice that instead of developing the result set in the FROM clause, I’ve used the calculations directly in the SELECT clause to produce the same result. Except that it’s not the same. Here, my TOP sub-query can only produce a single value. You’ll notice I change my query to fetch only a single column now – any more would give an error. And I’m lucky I’ve selected TOP(1), not TOP(2), or TOP (1) WITH TIES – which would both be legal in my APPLY system. You’ll see that APPLY provides additional flexibility here. So now consider the following query: SELECT p.Name, p.ProductSubcategoryID, COUNT(*) OVER (PARTITION BY p.ProductSubcategoryID) AS SubcatCnt FROM Production.Product AS p ORDER BY p.ProductSubcategoryID, p.Name; You will see that both look at the list of Products, and produce a count of the number of products which are in the same Subcategory. I’m sure you’re all very familiar with the OVER clause and the use of the windowing technique provided by the PARTITION BY clause. If we run this first query and look at some of the results, you’ll see the marvellous windowing technique, which I’ve highlighted using ZoomIt. You’ll see that the SubcatCnt column has the value 3 for the Subcategory with 3 items, and 6 for the one with 6, and so on.  Excellent stuff. But you’ll probably also be aware that you can’t use windowing functions like this in the WHERE clause – they’re applied only in the SELECT clause. Having seen my working columns trick though, you might think otherwise. Check this out – it’s perfectly legal! SELECT p.Name, p.ProductSubcategoryID, sc.SubcatCnt FROM Production.Product AS p CROSS APPLY ( SELECT COUNT(*) OVER (PARTITION BY p.ProductSubcategoryID) AS SubcatCnt ) AS sc ORDER BY p.ProductSubcategoryID, p.Name; Amazing stuff. I can now use SubcatCnt in the WHERE clause. Job done! But I’m having you on. It’s a trick. It’s rubbish. You see, APPLY logically works on a single row at a time, as I said earlier. Look at the same block of results for this second query.  Loads of 1s. Not exactly helpful. It’s accurate though – there is a count of one row in each partition of the set that is logically passed in each time. APPLY works out the result for each row individually. It doesn’t do it for the whole set. So if you’re wanting working columns that need to be applied to multiple rows at once, then you need to think of another solution. APPLY is good, but not that good. Know its limitations and you’ll be better off.
|
-
T-SQL Tuesday again and this month is on T-SQL Tips (thanks Allen!). In some ways it’s a tough topic, because there are things I don’t really consider tips that other people do, and vice-versa. This one’s legitimate though: Using CROSS APPLY for working columns in calculations. Let me give you an example.  Back in 2009, I wrote a blog post on Julian (YYDDD) dates. http://msmvps.com/blogs/robfarley/archive/2009/03/25/converting-to-and-from-julian-format-in-t-sql.aspx. Someone asked me recently about using this for time periods, for example, someone’s age when they play a football match. For example, Cesc Fàbregas holds the record as the youngest Arsenal player, aged 16 years and 177 days. Given someone’s birthdate and date of interest, this seems like it should be quite easy to work out, but can get frustratingly tricky. The idea is quite straight forward – you just count the number of days from their most recent birthday. So let’s have a think. Counting the number of years between the startdate and the enddate is a nice place to start. I could count the number of days, but that might give me rounding error based on leap years. But counting the years simply compares the year component, such as 1987 and 2003, and the player might not have had his birthday this year yet. In that case, we’d want to go with the year before. Anyway – once we’ve figured out how old the player is, we can figure out when their last birthday was and count the days since then. Easy. But so easy to make a mistake somewhere. And this is where CROSS APPLY can come into its own, by allowing us to use working columns. Let’s start with a few players. Some young ones, and Thierry Henry, just because. I’m using the VALUES method, but you could have a Players table just as easily. select * from ( values ('Cesc Fàbregas', cast('4 May 1987' as date), cast('28 October 2003' as date)) ,('Theo Walcott', cast('16 March 1989' as date), cast('19 August 2006' as date)) ,('Jack Wilshere', cast('1 January 1992' as date), cast('13 September 2008' as date)) ,('Thierry Henry', cast('17 August 1977' as date), cast('7 August 1999' as date)) ) p (Name, DoB, Debut)  First, I want to make it clear which column is my startdate and which is my enddate. It’s too easy to be thinking “startdate” and pick up “Debut” here, because this sounds very much like the same thing. There’s the startdate of their playing career, and the startdate of the calculation. To avoid confusion, I’m going to do some simple column-renaming. This gives me more reusable code, and APPLY even means I never have to worry about whether I already have these column names referring to something else, because they’re going to get their own table alias too. select * from ( values ('Cesc Fàbregas', cast('4 May 1987' as date), cast('28 October 2003' as date)) ,('Theo Walcott', cast('16 March 1989' as date), cast('19 August 2006' as date)) ,('Jack Wilshere', cast('1 January 1992' as date), cast('13 September 2008' as date)) ,('Thierry Henry', cast('17 August 1977' as date), cast('7 August 1999' as date)) ) p (Name, DoB, Debut) CROSS APPLY (select p.DoB as startdate, p.Debut as enddate) as working  Might seem like a bit of a waste to you, but it means so much to me. Really. Now I want to count how many years there are between my startdate and enddate, and work out when the startdate is this year. I could do this in one step, sure, but I want to be able to check my working really thoroughly. select * from ( values ('Cesc Fàbregas', cast('4 May 1987' as date), cast('28 October 2003' as date)) ,('Theo Walcott', cast('16 March 1989' as date), cast('19 August 2006' as date)) ,('Jack Wilshere', cast('1 January 1992' as date), cast('13 September 2008' as date)) ,('Thierry Henry', cast('17 August 1977' as date), cast('7 August 1999' as date)) ) p (Name, DoB, Debut) CROSS APPLY (select p.DoB as startdate, p.Debut as enddate) as working CROSS APPLY (select datediff(year,working.startdate,working.enddate) as YearsDifferent) yd CROSS APPLY (select dateadd(year,yd.YearsDifferent,working.startdate) as StartDateThisYear) sdty
 Now I can easily test to see if I need to subtract a year or not. select * from ( values ('Cesc Fàbregas', cast('4 May 1987' as date), cast('28 October 2003' as date)) ,('Theo Walcott', cast('16 March 1989' as date), cast('19 August 2006' as date)) ,('Jack Wilshere', cast('1 January 1992' as date), cast('13 September 2008' as date)) ,('Thierry Henry', cast('17 August 1977' as date), cast('7 August 1999' as date)) ) p (Name, DoB, Debut) CROSS APPLY (select p.DoB as startdate, p.Debut as enddate) as working CROSS APPLY (select datediff(year,working.startdate,working.enddate) as YearsDifferent) yd CROSS APPLY (select dateadd(year,yd.YearsDifferent,working.startdate) as StartDateThisYear) sdty CROSS APPLY (select case when sdty.StartDateThisYear > working.enddate then -1 else 0 end as YearOffset) yo CROSS APPLY (select dateadd(year,yd.YearsDifferent + yo.YearOffset,working.startdate) as AdjustedYear) ay
 I’m sure you can see where this is going. I now have their latest birthday before the date I’m looking for, and I can easily turn this into a Julian Date format. select p.*, jd.JulianDiff from ( values ('Cesc Fàbregas', cast('4 May 1987' as date), cast('28 October 2003' as date)) ,('Theo Walcott', cast('16 March 1989' as date), cast('19 August 2006' as date)) ,('Jack Wilshere', cast('1 January 1992' as date), cast('13 September 2008' as date)) ,('Thierry Henry', cast('17 August 1977' as date), cast('7 August 1999' as date)) ) p (Name, DoB, Debut) CROSS APPLY (select p.DoB as startdate, p.Debut as enddate) as working CROSS APPLY (select datediff(year,working.startdate,working.enddate) as YearsDifferent) yd CROSS APPLY (select dateadd(year,yd.YearsDifferent,working.startdate) as StartDateThisYear) sdty CROSS APPLY (select case when sdty.StartDateThisYear > working.enddate then -1 else 0 end as YearOffset) yo CROSS APPLY (select dateadd(year,yd.YearsDifferent + yo.YearOffset,working.startdate) as AdjustedYear) ay CROSS APPLY (select (yd.YearsDifferent + yo.YearOffset) * 1000 as YearComponent) yc CROSS APPLY (select datediff(day,ay.AdjustedYear,working.enddate) as DayComponent) dc CROSS APPLY (select yc.YearComponent + dc.DayComponent as JulianDiff) as jd ;  The great thing about this is that I can be very confident of my working, being able to check each step of the calculation along the way. I can even turn it into a simple TVF, whilst keeping the calculations just as verifiable: create function dbo.YYDDD(@startdate date, @enddate date) returns table as return ( select jd.JulianDiff FROM (values (@startdate, @enddate)) working (startdate, enddate) CROSS APPLY (select datediff(year,working.startdate,working.enddate) as YearsDifferent) yd CROSS APPLY (select dateadd(year,yd.YearsDifferent,working.startdate) as StartDateThisYear) sdty CROSS APPLY (select case when sdty.StartDateThisYear > working.enddate then -1 else 0 end as YearOffset) yo CROSS APPLY (select dateadd(year,yd.YearsDifferent + yo.YearOffset,working.startdate) as AdjustedYear) ay CROSS APPLY (select (yd.YearsDifferent + yo.YearOffset) * 1000 as YearComponent) yc CROSS APPLY (select datediff(day,ay.AdjustedYear,working.enddate) as DayComponent) dc CROSS APPLY (select yc.YearComponent + dc.DayComponent as JulianDiff) as jd ) ; select p.*, jd.JulianDiff from ( values ('Cesc Fàbregas', cast('4 May 1987' as date), cast('28 October 2003' as date)) ,('Theo Walcott', cast('16 March 1989' as date), cast('19 August 2006' as date)) ,('Jack Wilshere', cast('1 January 1992' as date), cast('13 September 2008' as date)) ,('Thierry Henry', cast('17 August 1977' as date), cast('7 August 1999' as date)) ) p (Name, DoB, Debut) CROSS APPLY dbo.YYDDD(DoB, Debut) jd;
I know it’s really simple, but it’s an amazingly useful tip for making sure that you do things right. Oh, and as for the plan – SQL just rolls it all up into a single Compute Scalar as if we’d written it out the long way, happily showing us that doing it step-by-step ourselves is no worse at all.  @rob_farley
|
-
Little Bobby Tables’ mother says you should always sanitise your data input. Except that I think she’s wrong. The SQL Injection aspect is for another post, where I’ll show you why I think SQL Injection is the same kind of attack as many other attacks, such as the old buffer overflow, but here I want to have a bit of a whinge about the way that some people sanitise data input, and even have a whinge about people who insist on using stored procedures for SSRS reports. Let me say that again, in case you missed it the first time: I want to have a whinge about people who insist on using stored procedures for SSRS reports. Let’s look at the data input sanitisation aspect – except that I’m going to call it ‘parameter validation’. I’m talking about code that looks like this: create procedure dbo.GetMonthSummaryPerSalesPerson(@eomdate datetime) as begin /* First check that @eomdate is a valid date */ if isdate(@eomdate) != 1 begin select 'Please enter a valid date' as ErrorMessage; return; end /* Then check that time has passed since @eomdate */ if datediff(day,@eomdate,sysdatetime()) < 5 begin select 'Sorry - EOM is not complete yet' as ErrorMessage; return; end /* If those checks have succeeded, return the data */ select SalesPersonID, count(*) as NumSales, sum(TotalDue) as TotalSales from Sales.SalesOrderHeader where OrderDate >= dateadd(month,-1,@eomdate) and OrderDate < @eomdate group by SalesPersonID order by SalesPersonID; end Notice that the code checks that a date has been entered. Seriously??!! This must only be to check for NULL values being passed in, because anything else would have to be a valid datetime to avoid an error. The other check is maybe fair enough, but I still don’t like it. The two problems I have with this stored procedure are the result sets and the small fact that the stored procedure even exists in the first place. But let’s consider the first one of these problems for starters. I’ll get to the second one in a moment. If you read Jes Borland (@grrl_geek)’s recent post about returning multiple result sets in Reporting Services, you’ll be aware that Reporting Services doesn’t support multiple results sets from a single query. And when it says ‘single query’, it includes ‘stored procedure call’. It’ll only handle the first result set that comes back. But that’s okay – we have RETURN statements, so our stored procedure will only ever return a single result set. Sometimes that result set might contain a single field called ErrorMessage, but it’s still only one result set. Except that it’s not okay, because Reporting Services needs to know what fields to expect. Your report needs to hook into your fields, so SSRS needs to have a way to get that information. For stored procs, it uses an option called FMTONLY. When Reporting Services tries to figure out what fields are going to be returned by a query (or stored procedure call), it doesn’t want to have to run the whole thing. That could take ages. (Maybe it’s seen some of the stored procedures I’ve had to deal with over the years!) So it turns on FMTONLY before it makes the call (and turns it off again afterwards). FMTONLY is designed to be able to figure out the shape of the output, without actually running the contents. It’s very useful, you might think. set fmtonly on exec dbo.GetMonthSummaryPerSalesPerson '20030401'; set fmtonly off
Without the FMTONLY lines, this stored procedure returns a result set that has three columns and fourteen rows. But with FMTONLY turned on, those rows don’t come back. But what I do get back hurts Reporting Services.  It doesn’t run the stored procedure at all. It just looks for anything that could be returned and pushes out a result set in that shape. Despite the fact that I’ve made sure that the logic will only ever return a single result set, the FMTONLY option kills me by returning three of them. It would have been much better to push these checks down into the query itself. alter procedure dbo.GetMonthSummaryPerSalesPerson(@eomdate datetime) as begin select SalesPersonID, count(*) as NumSales, sum(TotalDue) as TotalSales from Sales.SalesOrderHeader where /* Make sure that @eomdate is valid */ isdate(@eomdate) = 1 /* And that it's sufficiently past */ and datediff(day,@eomdate,sysdatetime()) >= 5 /* And now use it in the filter as appropriate */ and OrderDate >= dateadd(month,-1,@eomdate) and OrderDate < @eomdate group by SalesPersonID order by SalesPersonID; end Now if we run it with FMTONLY turned on, we get the single result set back. But let’s consider the execution plan when we pass in an invalid date. First let’s look at one that returns data. I’ve got a semi-useful index in place on OrderDate, which includes the SalesPersonID and TotalDue fields. It does the job, despite a hefty Sort operation.  …compared to one that uses a future date:  You might notice that the estimated costs are similar – the Index Seek is still 28%, the Sort is still 71%. But the size of that arrow coming out of the Index Seek is a whole bunch smaller. The coolest thing here is what’s going on with that Index Seek. Let’s look at some of the properties of it.  Glance down it with me… Estimated CPU cost of 0.0005728, 387 estimated rows, estimated subtree cost of 0.0044385, ForceSeek false, Number of Executions 0. That’s right – it doesn’t run. So much for reading plans right-to-left... The key is the Filter on the left of it. It has a Startup Expression Predicate in it, which means that it doesn’t call anything further down the plan (to the right) if the predicate evaluates to false.  Using this method, we can make sure that our stored procedure contains a single query, and therefore avoid any problems with multiple result sets. If we wanted, we could always use UNION ALL to make sure that we can return an appropriate error message. alter procedure dbo.GetMonthSummaryPerSalesPerson(@eomdate datetime) as begin select SalesPersonID, count(*) as NumSales, sum(TotalDue) as TotalSales, /*Placeholder: */ '' as ErrorMessage from Sales.SalesOrderHeader where /* Make sure that @eomdate is valid */ isdate(@eomdate) = 1 /* And that it's sufficiently past */ and datediff(day,@eomdate,sysdatetime()) >= 5 /* And now use it in the filter as appropriate */ and OrderDate >= dateadd(month,-1,@eomdate) and OrderDate < @eomdate group by SalesPersonID /* Now include the error messages */ union all select 0, 0, 0, 'Please enter a valid date' as ErrorMessage where isdate(@eomdate) != 1 union all select 0, 0, 0, 'Sorry - EOM is not complete yet' as ErrorMessage where datediff(day,@eomdate,sysdatetime()) < 5 order by SalesPersonID; end
But still I don’t like it, because it’s now a stored procedure with a single query. And I don’t like stored procedures that should be functions. That’s right – I think this should be a function, and SSRS should call the function. And I apologise to those of you who are now planning a bonfire for me. Guy Fawkes’ night has already passed this year, so I think you miss out. (And I’m not going to remind you about when the PASS Summit is in 2012.) create function dbo.GetMonthSummaryPerSalesPerson(@eomdate datetime) returns table as return ( select SalesPersonID, count(*) as NumSales, sum(TotalDue) as TotalSales, '' as ErrorMessage from Sales.SalesOrderHeader where /* Make sure that @eomdate is valid */ isdate(@eomdate) = 1 /* And that it's sufficiently past */ and datediff(day,@eomdate,sysdatetime()) >= 5 /* And now use it in the filter as appropriate */ and OrderDate >= dateadd(month,-1,@eomdate) and OrderDate < @eomdate group by SalesPersonID union all select 0, 0, 0, 'Please enter a valid date' as ErrorMessage where isdate(@eomdate) != 1 union all select 0, 0, 0, 'Sorry - EOM is not complete yet' as ErrorMessage where datediff(day,@eomdate,sysdatetime()) < 5 );
We’ve had to lose the ORDER BY – but that’s fine, as that’s a client thing anyway. We can have our reports leverage this stored query still, but we’re recognising that it’s a query, not a procedure. A procedure is designed to DO stuff, not just return data. We even get entries in sys.columns that confirm what the shape of the result set actually is, which makes sense, because a table-valued function is the right mechanism to return data. And we get so much more flexibility with this. If you haven’t seen the simplification stuff that I’ve preached on before, jump over to http://bit.ly/SimpleRob and watch the video of when I broke a microphone and nearly fell off the stage in Wales. You’ll see the impact of being able to have a simplifiable query. You can also read the procedural functions post I wrote recently, if you didn’t follow the link from a few paragraphs ago. So if we want the list of SalesPeople that made any kind of sales in a given month, we can do something like: select SalesPersonID from dbo.GetMonthSummaryPerSalesPerson(@eomonth) order by SalesPersonID;
This doesn’t need to look up the TotalDue field, which makes a simpler plan. select * from dbo.GetMonthSummaryPerSalesPerson(@eomonth) where SalesPersonID is not null order by SalesPersonID; This one can avoid having to do the work on the rows that don’t have a SalesPersonID value, pushing the predicate into the Index Seek rather than filtering the results that come back to the report. If we had joins involved, we might see some of those being simplified out. We also get the ability to include query hints in individual reports. We shift from having a single-use stored procedure to having a reusable stored query – and isn’t that one of the main points of modularisation? Stored procedures in Reporting Services are just a bit limited for my liking. They’re useful in plenty of ways, but if you insist on using stored procedures all the time rather that queries that use functions – that’s rubbish. @rob_farley
|
|
|
|
|
|