THE SQL Server Blog Spot on the Web


Jamie Thomson

This is the blog of Jamie Thomson, a freelance data mangler in London

Is the SSIS Catalog going to be enough?

Over the past year or so I have written at length about the SSIS Catalog that will be provided in the upcoming version of SQL Server Integration Services (SSIS 2012) and the capabilities it will bring. One of the biggest benefits, in my opinion, is that logging is now something that is taken care of on your behalf.

Many SSIS developers have built their own logging solutions (often referred to as "frameworks") over the past few years and it occurred to me that many of those developers may have a need to carry on running their own logging solutions alongside that which the SSIS Catalog provides. To that end I have raised two Connect submissions requesting enhancements to SSIS that will make it easier for developers to do just that:

One of those (the one about the ExecutionID variable) has already been implemented and will be available in the final release of SQL Server 2012; however, this blog post isn't about drumming up support for Connect submissions. I am more interested in knowing whether you think you will need to log information over and above what the SSIS Catalog provides and, if so, what exactly. I can envisage a number of scenarios in which additional logging may be necessary:

Do you plan to use your own logging solutions alongside the SSIS Catalog? If so, why? Let me know in the comments, I'm really interested to discover how folks plan to use this.

@jamiet

 

Published Wednesday, February 22, 2012 10:00 AM by jamiet


Comments

 

Koen Verbeeck said:

At first sight, the built-in reports delivered within SSMS already give quite some information. I guess I'll have to dive really deep into an SSIS 2012 project at a client to find out what's missing.

The first thing that pops to mind is rowcounts affected by Execute SQL Tasks.

February 22, 2012 4:53 AM
 

Mike Davis said:

The problem with the new built-in logging is the fact that it is not centralized. If you have 10 servers, you will have to visit each server to look at your SSIS logs. So custom-built logging or tools like BIxPress will continue to be needed.

February 22, 2012 8:04 AM
 

jamiet said:

Hi Mike,

Interesting point. Why would you have 10 SSIS servers?

Regards

Jamie

February 22, 2012 8:14 AM
 

Peter Schott said:

Good point - totally depends on the tasks, but I know that our current implementation captures row counts from stored proc calls or ad-hoc SQL calls where appropriate. We've been keeping things somewhat simple for the time being, so we wouldn't have an immediate need to extend other logging functionality. I can definitely see it for larger and more complex packages, though.

We've mostly switched to BIxPress for our auditing and generally find that pretty useful. That may or may not change when we start using 2012. Our main reasons for using the software were for the real time monitoring and much easier reporting than our current custom logging. If SSIS duplicates all/most of that in 2012, we'd need to see some significant benefits to keep using that tool.

-Peter

February 22, 2012 8:44 AM
 

jbooker said:

Jamie,

I was thinking the same thing: what if I need to log a variable value OnPostExecute and report that value with the duration and result of a task?  We use custom logging based on your popular event handler example, which logs Insert, Update and Delete counts OnPostExecute using variables.  As you mention, it would be nice to be able to capture row counts from SQL tasks.

Is it possible to log the read/write variables of a task in the [catalog].[event_message_context] table?  That may do the trick.  Like if I set @[user::UpdateCount] using a sproc output parameter in a SQL task, will it log that value in [catalog].[event_message_context]?  It would not be ideal to query those values from a related table compared to having columns in [catalog].[executable_statistics].  For that reason it may be best to add our own tables.

It would be cool if there were a way to extend the new logging by adding custom stored procs that can fire before & after the built-in ones to populate custom tables, without the need to add custom event handlers to packages.  I suppose this would be easy enough by simply adding a SQL task to the event which could select from the catalog to get the duration, but that brings another question... What about the order of events?  Can we be sure [catalog].[executable_statistics] is written before the OnPostExecute event handler is fired for the same task?

I agree it would be nice to specify the server and db name of SSISDB so two servers could share one catalog.

February 22, 2012 9:45 AM
 

jamiet said:

Hi Josh,

"if I set @[user::UpdateCount] using sproc output parameter in a SQL task, will it log that value in [catalog].[event_message_context]?"

Good question but no, I'm pretty sure it doesn't do that. I stand to be corrected however - I haven't done too much investigation into that view.

"It would be cool if there were a way to extend the new logging by adding custom stored procs that can fire before & after the built-in ones to populate custom tables without the need to add custom event handlers to packages."

Yeah I like that idea a lot. In other words, it would be nice if there were some hooks that we just plug our own sprocs into so that they get executed when a particular event occurs, right?

I can't see that ever happening mind you, for starters SQL Server doesn't have a good plug-in model like (say) .Net does with MEF (http://mef.codeplex.com/).

"I agree it would be nice to specify the server and db name of SSISDB so two servers could share one catalog."

Yeah, I've been thinking about this idea a lot myself. I'm envisaging the concept of an SSIS Catalog "agent" running on a remote server. The agent could execute packages just like the proper SSIS Catalog can, but the agent would log back to the SSIS Catalog that it was attached to. Very interesting idea, that.

regards

Jamie

February 22, 2012 10:04 AM
 

jamiet said:

Josh,

In fact, I like the idea of a SSIS Catalog Agent so much that I have suggested it: https://connect.microsoft.com/SQLServer/feedback/details/726102/ssis-catalog-agent

Regards

Jamie

February 22, 2012 10:20 AM
 

Eric Wisdahl said:

I'm fairly certain I'll keep my framework around, although I will have to play with the new features more before deciding they don't provide everything I need.  Currently, we log the package start / end times, what versions they are, the descriptions associated, what tables are affected by the package and how many records were extracted / inserted / updated / deleted.  Whether the package was successful, if it was restarted, what the parent process was that called it, which "processing group" the package belongs to, etc.  I also do not like that you can no longer control exactly what events are being logged.  You only have the four or five presets.  So, if you want to find the PipelineComponentTime or PipelineExecutionTrees you get every other event under the sun (verbose)???  gah.  Oh well, I guess for the majority of people this will be a good thing.

February 22, 2012 11:01 AM
 

jamiet said:

Eric,

Yeah, not being able to configure the 4 presets (well, 3 really cos the 4th is "None") is a problem. Some people will want to configure, some won't care. One of those "you can't please all the people all the time" situations.

Of the things you listed *I think* all of the following can be captured in the SSIS Catalog:

-Package start/end times

-Versions (i.e. Major/Minor/Version)

-How many records extracted/updated/deleted in a dataflow (although perhaps not recorded in the manner that you would like)

-Whether the package was successful

-Parent process that called the package

JT

February 22, 2012 11:11 AM
 

jbooker said:

RE:"Yeah I like that idea a lot. In other words, it would be nice if there were some hooks that we just plug our own sprocs into so that they get executed when a particular event occurs, right?

I can't see that ever happening mind you, for starters SQL Server doesn't have a good plug-in model"

I'm thinking of a simple hook like this at the end of the built-in sp which logs to executable_statistics.  We create a sproc called 'sp_after_log_executable_statistics' and, if it exists, it's fired.  Of course, how it would get parameters from SSIS variables is beyond me.

IF EXISTS (SELECT * FROM sys.objects WHERE [object_id] = OBJECT_ID(N'[catalog].[sp_after_log_executable_statistics]') AND [type] IN (N'P', N'PC'))

    EXEC [catalog].[sp_after_log_executable_statistics];

GO

RE:"I also do not like that you can no longer control exactly what events are being logged."

You mean you can't disable event handling on certain containers and tasks, and you can't specify not to log OnInformation events for the Basic level, for example?  That is a bummer.

February 22, 2012 2:27 PM
 

Carla Sabotta said:

The Catalog.packages view shows the major & minor version of a package. The catalog.executions view shows the package start & end times, and the package status (created, running, canceled, failed, pending, successful etc.).

The catalog.execution_data_statistics view shows the number of rows sent from the source component. You can use the catalog.add_data_tap & catalog.add_data_tap_by_guid stored procedures to capture a copy of the data for a specific data flow path.
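For example, a quick look at the logged rowcounts for one execution might be queried like this (assuming @execution_id holds the id of the execution you are interested in; column names as documented for SQL Server 2012):

```sql
-- Rows sent down each data flow path for a given execution
SELECT eds.package_name,
       eds.source_component_name,
       eds.destination_component_name,
       SUM(eds.rows_sent) AS rows_sent
FROM   [catalog].[execution_data_statistics] AS eds
WHERE  eds.execution_id = @execution_id
GROUP BY eds.package_name,
         eds.source_component_name,
         eds.destination_component_name;
```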

February 22, 2012 2:54 PM
 

jamiet said:

Hi Carla,

Thanks for contributing here - good to see someone from the product team adding their two-penneth worth.

Regarding your comment "The Catalog.packages view shows the major & minor version of a package", I think I am right in saying that Catalog.packages is a list of all the packages currently stored in the catalog. That is useful but it doesn't provide what I think Eric wanted which is the major/minor/version of a package that gets executed.

Regards

Jamie

February 22, 2012 4:58 PM
 

Mark Wojciechowicz said:

We presently log the product version.  This helps to identify issues that can arise when machines are running different versions of SSIS.

February 23, 2012 3:44 PM
 

jbooker said:

Hey Jamie,

Looks like the release documentation for the view [catalog].[event_message_context] shows a context_type field having the possible value:

70 | Variable | Indicates that the value of a variable is captured.

http://msdn.microsoft.com/en-us/library/hh479590(v=sql.110).aspx

Hopefully this means variable values can be captured in the catalog.  Is it true that this is only captured for OnError events?

If so, it's another example of it being a shame that we can't control when events are logged.  We should be able to capture context OnPostExecute for selected tasks.  Bottom line: built-in logging should obey the logging properties set on each container/task if desired.

Josh

February 24, 2012 11:05 AM
 

Bill said:

Definitely would love this:

>> Row counts captured using the Execute SQL Task.

Otherwise I'll have to develop my own framework.  Yuck.

June 19, 2012 10:05 AM
 

Onkar Belgamwar said:

Hi,

When we deploy the reports on the SQL Server (SSIS Catalog) and run them, some reports are created automatically. I need row counts (Extracted, Inserted and Updated) in those reports. Is there any property so that I can see the row counts on the reports themselves?

I don't want to go for custom logging; rather, I want to go with those reports. But for logging, I need the row count to be shown for each package.

Please reply.

Thanks,

Onkar

August 8, 2012 3:39 AM
 

jamiet said:

Hello Onkar,

I'm a little confused about what you mean when you say:

 "When we deploy the reports on the SQL Server (SSIS Catalog)"

What reports are you deploying?

Rowcounts are logged in [catalog].[execution_data_statistics] although only when LOGGING_LEVEL=Verbose*. Read more at:

SSIS Logging in Denali (http://sqlblog.com/blogs/jamie_thomson/archive/2011/07/16/ssis-logging-in-denali.aspx)
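For example, you can request the Verbose level when firing up an execution via T-SQL (the folder/project/package names here are placeholders):

```sql
-- Start an execution with verbose logging so rowcounts are captured
DECLARE @execution_id BIGINT;

EXEC [catalog].[create_execution]
     @folder_name  = N'MyFolder',      -- placeholder names
     @project_name = N'MyProject',
     @package_name = N'MyPackage.dtsx',
     @execution_id = @execution_id OUTPUT;

EXEC [catalog].[set_execution_parameter_value]
     @execution_id,
     @object_type     = 50,            -- execution-level parameter
     @parameter_name  = N'LOGGING_LEVEL',
     @parameter_value = 3;             -- 3 = Verbose

EXEC [catalog].[start_execution] @execution_id;
```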

Unfortunately I don't know of any built-in reports that surface this information. I provide a suite of reports that DO surface this information - take a look at http://sqlblog.com/blogs/jamie_thomson/archive/tags/SSIS+Reporting+Pack/default.aspx for more info.

hope that helps

JT

*This information is correct as at 08/08/2012. It may not be correct in the future.

August 8, 2012 3:48 AM
 

Onkar Belgamwar said:

Hey Jamie,

Sorry, it's my mistake. I meant to say packages, not reports. :) And thanks for the quick reply.

I have read Carla's post above, where she mentions the use of the [catalog].[execution_data_statistics] table. I looked into it, but it just gave us the Sent_Row_Count.

But for the other counts, such as Insert and Update Count, do we need to go just for custom logging? I'm also looking at custom reports, in case they help me.

Thanks,

Onkar Belgamwar

August 8, 2012 6:33 AM
 

jamiet said:

Hi Onkar,

"But, for the other counts, such as Insert and Update Count, do we need to go just for the Custom Logging?"

Let me clarify one thing, how are you doing these insertions/updates? Using the Dataflow or an Execute SQL Task?

You are correct that [catalog].[execution_data_statistics] gives you rows sent; however, to get the number of "inserted rows" (e.g. rows inserted by an OLE DB Destination) you simply have to count the number of rows that get sent to that component. All that information is within [catalog].[execution_data_statistics].
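As a sketch (the destination component name here is illustrative, and @execution_id is assumed to hold the relevant execution id):

```sql
-- Sum the rows that arrived at a particular destination component
SELECT SUM(rows_sent) AS rows_inserted
FROM   [catalog].[execution_data_statistics]
WHERE  execution_id = @execution_id
  AND  destination_component_name = N'OLE DB Destination';
```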

JT

August 8, 2012 6:42 AM
 

Onkar Belgamwar said:

Thanks Jamie,

Actually, I have some packages which just extract the source data and load the staging tables; in such cases I'm using the DFT. But when I load the dimensions, I'm either using the SCD in a DFT or a Merge statement in an Execute SQL Task (in this case, I replace the DFT with the Merge statement) for the incremental load. So I need to capture those counts in both cases.

I have used the Row Count Transformation to capture the rows and store the value in a variable, and in the case of Merge, I have used the $action output of the Merge statement.
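For the Merge case, something like this captures the counts (table and column names are just examples):

```sql
DECLARE @merge_actions TABLE (merge_action NVARCHAR(10));

MERGE dbo.DimCustomer AS tgt                  -- illustrative target
USING staging.Customer AS src                 -- illustrative source
   ON tgt.CustomerKey = src.CustomerKey
WHEN MATCHED THEN
    UPDATE SET tgt.CustomerName = src.CustomerName
WHEN NOT MATCHED BY TARGET THEN
    INSERT (CustomerKey, CustomerName)
    VALUES (src.CustomerKey, src.CustomerName)
OUTPUT $action INTO @merge_actions (merge_action);

-- Counts to hand back to SSIS variables via Execute SQL Task output parameters
SELECT SUM(CASE WHEN merge_action = 'INSERT' THEN 1 ELSE 0 END) AS InsertCount,
       SUM(CASE WHEN merge_action = 'UPDATE' THEN 1 ELSE 0 END) AS UpdateCount
FROM   @merge_actions;
```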

But when I go to the SSIS Catalog dashboards, I don't get such counts. So I'm looking for any workaround to get those counts.

-Onkar

August 8, 2012 7:11 AM
 

jamiet said:

Hi Onkar,

OK, I understand now.

Rowcounts for Execute SQL Tasks do not get captured (which makes sense - how would SSIS know the rowcount affected by the SQL statement in an Execute SQL Task, given that you can put whatever you want in there?).

If you are using the SCD component within a dataflow, however, there should be no problem capturing rowcounts there. It's not the SCD component that does the insertion/update, remember; it's the OLE DB Destination or OLE DB Command that are downstream of that component. Hence, simply look in [catalog].[execution_data_statistics] to discover how many rows were sent to those components - and there are your rowcounts.

regards

Jamie

August 8, 2012 7:29 AM
 

Onkar Belgamwar said:

Hi Jamie,

Thanks, now I get the point and understand the whole functionality of the [catalog].[execution_data_statistics] table.

So, for the Execute SQL Task, I need to write the code in the SQL itself.

Regards,

Onkar Belgamwar

August 8, 2012 7:53 AM
 

Johan Dahlin said:

Hi Jamie,

First of all, thanks for a great blog!

I'm currently trying to decide if I should stop using my old SSIS framework and rely entirely on the 2012 built-in logging, but I have one major issue. Hopefully it is there and I just don't see it. Here it goes:

Today I'm using the PackageExecutionGUID to store information in all my tables about which package execution inserted or modified the record. I can't seem to find this in the new SSIS catalog. Using the ExecutionId is not really enough for me, as I want to be able to distinguish between two executions of the same package within the same batch (ExecutionId).

Any ideas? Is there a way to map the PackageExecutionGUID (or maybe another variable available in SSIS) to a record in the [internal].[executable_statistics] table?

Thanks

September 18, 2012 4:49 AM
 

jamiet said:

Hi Johan,

Check out fields:

[catalog].[event_messages].[package_path]

[catalog].[event_messages].[execution_path]

[catalog].[event_messages].[package_name]

[execution_path] is a good one because it uniquely identifies which executable is raising the event. It even distinguishes different executions of the same executable within a loop.
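For example (assuming @execution_id holds the id of the execution in question):

```sql
-- Events for one execution, keyed by the unique execution path of each executable
SELECT event_name,
       package_name,
       execution_path,
       message_time
FROM   [catalog].[event_messages]
WHERE  operation_id = @execution_id
ORDER  BY message_time;
```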

Do they provide what you need?

September 18, 2012 5:08 AM
 

Johan Dahlin said:

Sorry, I meant the System::ExecutionInstanceGUID variable in the post above.

And I guess I would need this to be able to create my own logging of affected records by T-SQL statements as well. How do I connect my custom logging to the [internal].[executable_statistics] table?

Regards

September 18, 2012 5:09 AM
 

jamiet said:

"Sorry, I meant the System::ExecutionInstanceGUID variable in the post above."

So you want a unique identifier for each instance of a package executing, correct? Yeah, the only place where that is identified is [internal].[executable_statistics].[statistics_id], and that value is not accessible by the package.

I wish there were a way that we could refer to the information therein. I thought I had requested this on Connect before but I just had a search and evidently I have not - hence I have raised this:

[SSIS] Enable 3rd party logging mechanisms to refer to ssisdb.internal.executable_statistics

https://connect.microsoft.com/SQLServer/feedback/details/763236/ssis-enable-3rd-party-logging-mechanisms-to-refer-to-ssisdb-internal-executable-statistics

I would appreciate you voting and leaving a comment

Is that the sort of thing that you were after?

regards

Jamie

September 18, 2012 5:36 AM
 

Johan Dahlin said:

Thanks a lot for your quick answer. You are correct, that is exactly what I want. I will vote.

I wrote a new post before I saw your answer:

The information in the internal.event_messages table is great but my problem is really that I don’t seem to have enough information available in the package system variables in order to connect any custom logging to the catalog logging.

Let's say, for example, that I would like to create my own logging of records affected by T-SQL statements. I have a package that makes an insert and an update. The insert uses a regular OLE DB connection and the row count is nicely logged in the [internal].[execution_data_statistics] table. The update is performed using an Execute SQL Task.

The package is executed twice within a batch (ExecutionId). The first run it inserts 10 records and updates 5. The second time it inserts 20 records and updates 7.

What do I store in my custom logging table in order to follow this information? If I only use the package name and the ExecutionId, I would only know that 12 records were updated by the package during the batch, not by which execution (row in [internal].[execution_data_statistics]).

September 18, 2012 6:20 AM
