Microsoft Business Intelligence consultant from the Netherlands
Quite often one or more sources for a data warehouse consist of flat files. Most of the times these files are delivered as a zip file with a date in the file name, for example FinanceDataExport_20100528.zip
Currently I work at a project that does a full load into the data warehouse every night. A zip file with some flat files in it is dropped in a directory on a daily basis. Sometimes there are multiple zip files in the directory, this can happen because the ETL failed or somebody puts a new zip file in the directory manually. Because the ETL isn’t incremental only the most recent file needs to be loaded. To implement this I used the simple code below; it checks which file is the most recent and deletes all other files.
Usage is quite simple, just copy/paste the code in your script task and create two SSIS variables:
- SourceFolder (type String): The folder that contains the (zip) files
- DateInFilename (type Boolean): A flag, set it to True if your filename ends with the date YYYYMMDD, set it to false if creation date of the files should be used
Note: In a previous blog post I wrote about unzipping zip files within SSIS, you might also find this useful: SSIS – Unpack a ZIP file with the Script Task
Public Sub Main()
'Use this piece of code to loop through a set of files in a directory
'and delete all files except for the most recent one based on a date in the filename.
'File name example:
Dim rootDirectory As New DirectoryInfo(Dts.Variables("SourceFolder").Value.ToString) 'Set the directory in SSIS variable SourceFolder. For example: D:\Export\
Dim mostRecentFile As String = ""
Dim currentFileDate As Integer
Dim mostRecentFileDate As Integer
Dim currentFileCreationDate As Date
Dim mostRecentFileCreationDate As Date
Dim dateInFilename As Boolean = Dts.Variables("DateInFilename").Value 'If your filename ends with the date YYYYMMDD set SSIS variable DateInFilename to True. If not set to False.
If dateInFilename Then
'Check which file is the most recent
For Each fi As FileInfo In rootDirectory.GetFiles("*.zip")
currentFileDate = CInt(Left(Right(fi.Name, 12), 8)) 'Get date from current filename (based on a file that ends with: YYYYMMDD.zip)
If currentFileDate > mostRecentFileDate Then
mostRecentFileDate = currentFileDate
mostRecentFile = fi.Name
Else 'Date is not in filename, use creation date
'Check which file is the most recent
For Each fi As FileInfo In rootDirectory.GetFiles("*.zip")
currentFileCreationDate = fi.CreationTime 'Get creation date of current file
If currentFileCreationDate > mostRecentFileCreationDate Then
mostRecentFileCreationDate = currentFileCreationDate
mostRecentFile = fi.Name
'Delete all files except the most recent one
For Each fi As FileInfo In rootDirectory.GetFiles("*.zip")
If fi.Name <> mostRecentFile Then
File.Delete(rootDirectory.ToString + "\" + fi.Name)
Dts.TaskResult = ScriptResults.Success
Since my last blog post about a SSIS package design pattern I’ve received quite some positive reactions and feedback. Microsoft also added a link to the post on the SSIS portal which made it clear to me that there is quite some attention for this subject.
The feedback I received was mainly about two things:
1. Can you visualize the process or make it clearer without the whole technical story so it's easier to understand.
2. How should the Extract phase of the ETL process be implemented when source tables are used by multiple dimensions and/or fact tables.
In this post I will try to answer these questions. By doing so I hope to offer a complete design pattern that is usable for most data warehouse ETL solutions developed using SSIS.
SSIS package design pattern for loading a data warehouse
Using one SSIS package per dimension / fact table gives developers and administrators of ETL systems quite some benefits and is advised by Kimball since SSIS has been released. I have mentioned these benefits in my previous post and will not repeat them here.
When using a single modular package approach, developers sometimes face problems concerning flexibility or a difficult debugging experience. Therefore, they sometimes choose to spread the logic of a single dimension or fact table in multiple packages. I have thought about a design pattern with the benefits of a single modular package approach and still having all the flexibility and debugging functionalities developers need.
If you have a little bit of programming knowledge you must have heard about classes and functions. Now think about your SSIS package as a class or object that exists within code. These classes contain functions that you can call separately from other classes (packages). That would be some nice functionality to have, but unfortunately this is not possible within SSIS by default.
To realize this functionality in SSIS I thought about SSIS Sequence Containers as functions and SSIS packages as classes.
I personally always use four Sequence Containers in my SSIS packages:
- SEQ Extract (extract the necessary source tables to a staging database)
- SEQ Transform (transform these source tables to a dimension or fact table)
- SEQ Load (load this table into the data warehouse)
- SEQ Process (process the data warehouse table to the cube)
The technical trick that I performed - you can read about the inner working in my previous post - makes it possible to execute only a single Sequence Container within a package, just like with functions in classes when programming code.
The execution of a single dimension or fact table can now be performed from a master SSIS package like this:
1 - [Execute Package Task] DimCustomer.Extract
2 - [Execute Package Task] DimCustomer.Transform
3 - [Execute Package Task] DimCustomer.Load
4 - [Execute Package Task] DimCustomer.Process
The package is executed 4 times with an Execute Package Task, but each time only the desired function (Sequence Container) will run.
If we look at this in a UML sequence diagram we see the following:
I think this sequence diagram gives you a good overview of how this design pattern is organized. For the technical solution and the download of a template package you should check my previous post.
How should the Extract phase of the ETL process be implemented when a single source table is used by multiple dimensions and/or fact tables?
One of the questions that came up with using this design pattern is how to handle the extraction of source tables that are used in multiple dimensions and/or fact tables. The problem here is that a single table would be extracted multiple times which is, of course, undesirable.
On coincidence I was reading the book “SQL Server 2008 Integration Services: Problem – Design - Solution” (which is a great book!) and one of the data extraction best practices (Chapter 5) is to use one package for the extraction of each source table. Each of these packages would have a very simple dataflow from the source table to the destination table within the staging area.
Of course this approach will be more time consuming than using one big extract package with all table extracts in it but fortunately it also gives you some benefits:
- Debugging, sometimes a source has changed, i.e. a column’s name could have been changed or completely deleted. The error that SSIS will log when this occurs will point the administrators straight to the right package and source table. Another benefit here is that only one package will fail and needs to be edited, while the others can still execute and remain unharmed.
- Flexibility, you can execute a single table extract from anywhere (master package or dim/fact package).
I recently created some solutions using this extract approach and really liked it. I used 2 SSIS projects:
- one with the dimension and fact table packages
- one with only the extract packages
I have used the following naming conventions on the extract packages: Source_Table.dtsx and deployed them to a separate SSIS folder. This way the packages won’t bother the overview during development.
A tip here is to use BIDS Helper; it has a great functionality to deploy one or more packages from BIDS.
Merging this approach in the design pattern will give the following result:
- The dimension and fact table extract Sequence Containers will no longer have data flow tasks in it but execute package tasks which point to the extract packages.
- The Extract Sequence Container of the master package will execute all the necessary extract packages at once.
This way a single source table will always get extracted only one time when executing your ETL from the master package and you still have the possibility to unit test your entire dimension or fact table packages.
Drawing this approach again in a sequence diagram gives us the following example with a run from the master package (only the green Sequence Containers are executed):
And like this with a run of a single Dimension package:
Overall, the design pattern will now always look like this when executed from a master package:
I think this design pattern is now good enough to be used as a standard approach for the most data warehouse ETL projects using SSIS. Thanks for all the feedback! New feedback is of course more than welcome!
I recently had a chat with some BI developers about the design patterns they’re using in SSIS when building an ETL system. We all agreed in creating multiple packages for the dimensions and fact tables and one master package for the execution of all these packages.
These developers even created multiple packages per single dimension/fact table:
- One extract package where the extract(E) logic of all dim/fact tables is stored
- One dim/fact package with the transform(T) logic of a single dim/fact table
- One dim/fact package with the load(L) logic of a single dim/fact table
I like the idea of building the Extract, Transform and Load logic separately, but I do not like the way the logic was spread over multiple packages.
I asked them why they chose for this solution and there were multiple reasons:
To me these are good reasons, running the E/T/L phases separately is a thing a developer often wants during the development and testing of an ETL system.
Keeping the loading window on the source system as short as possible is something that’s critical in some projects.
Despite the good arguments to design their ETL system like this, I still prefer the idea of having one package per dimension / fact table, with complete E/T/L logic, for the following reasons:
All the logic is in one place
Perform unit testing
If there is an issue with a dimension or fact table, you only have to make changes in one place, which is safer and ore efficient
You can see your packages as separate ETL “puzzle pieces” that are reusable
It’s good from a project manager point of view; let your customer accept dimensions and fact tables one by one and freeze the appropriate package afterwards
The overview in BIDS, having an enormous amount of packages does not make it clearer ;-)
Simplifies deployment after changes have been made
Changes are easier to track in source control systems
Team development will be easier; multiple developers can work on different dim/fact tables without bothering each other.
So basically my goal was clear: to build a solution that has all the possibilities the aforesaid developers asked for, but in one package per dimension / fact table; the best of both worlds.
The solution I’ve created is based on a parent-child package structure. One parent (master) package will execute multiple child (dim/fact) packages. This solution is based on a single (child) package for each dimension and fact table. Each of these packages contains the following Sequence Containers in the Control Flow:
Normally it would not be possible to execute only the Extract, Transform, Load or (cube) Process Sequence Containers of the child (dim/fact) packages simultaneously.
To make this possible I have created four Parent package variable configurations, one for each ETL phase Sequence Container in the child package:
Each of these configurations is set on the Disable property of one of the Sequence Containers:
Using this technique makes it possible to run separate Sequence Containers of the child package from the master package, simply by dis- or enabling the appropriate sequence containers with parent package variables.
Because the default value of the Disable property of the Sequence Containers is False, you can still run an entire standalone child package, without the need to change anything.
Ok, so far, so good. But, how do I execute only one phase of all the dimension and fact packages simultaneously? Well quite simple:
First add 4 Sequence Containers to the Master package. One for each phase of the ETL, just like in the child packages
Add Execute Package Tasks for all your packages in every Sequence Container
If you would execute this master package now, every child package would run 4 times as there are 4 Execute Package Tasks that run the same package in every sequence container.
To get the required functionality I have created 4 variables inside each Sequence Container (Scope). These will be used as parent variable to set the Disable properties in the child packages. So basically I’ve created 4 variables x 4 Sequence Containers = 16 variables for the entire master package.
Variables for the EXTRACT Sequence Container (vDisableExtract False):
Variables for the TRANSFORM Sequence Container (vDisableTransform False):
The LOAD and PROCESS Sequence Containers contain variables are based on the same technique.
Run all phases of a standalone package: Just execute the package:
Run a single phase of the ETL system (Extract/Transform/Load/Process): Execute the desired sequence container in the main package:
Run a single phase of a single package from the master package:
Run multiple phases of the ETL system, for example only the T and L: Disable the Sequence Containers of the phases that need to be excluded in the master package:
Run all the child packages in the right order from the master package:
When you add a breakpoint on, for example, the LOAD Sequence Container you see that all the child packages are at the same ETL phase as their parent:
When pressing Continue the package completes:
This parent/child package design pattern for loading a Data Warehouse gives you all the flexibility and functionality you need. It’s ready and easy to use during development and production without the need to change anything.
With only a single SSIS package for each dimension and fact table you now have the functionality that separate packages would offer. You will be able to, for example, run all the Extracts for all dimensions and fact tables simultaneously like the developers asked for and still have the benefits that come with the one package per dimension/fact table approach. Of course having a single package per dimension or fact table will not be the right choice in all cases but I think it is a good standard approach.
Same applies to the ETL phases (Sequence Containers). I use E/T/L/P, but if you have different phases, which will be fine, you can still use the same technique.
Download the solution with template packages from the URL’s below. Only thing you need to do is change the connection managers to the child packages (to your location on disk) and run the master package!
Download for SSIS 2008
Download for SSIS 2005
If you have any suggestions, please leave them as a comment. I would like to know what your design pattern is as well!
ATTENTION: See Part-2 on this subject for more background information!
How to: Use the Values of Parent Variables in a Child Package: http://technet.microsoft.com/en-us/library/ms345179.aspx
Recently I had to create a fact table with a lower grain than the source database. My source database contained order lines with a start- and end date and monthly revenue amounts.
To create reports that showed overall monthly revenue per year, lowering the grain was necessary. Because the lines contained revenue per month I decided to blow out the grain of my fact table to monthly records for all the order lines of the source database. For example, an order line with a start date of 1 January 2009 and an end date of 31 December 2009 should result in 12 order lines in the fact table, one line for each month.
To achieve this result I exploded the source records against my DimDate. I used a standard DimDate:
The query below did the job; use it in a SSIS source component and it will explode the order lines to a monthly grain:
- SELECT OL.LineId
- FROM OrderLine OL
- INNER JOIN DimDate DD
- ON DD.Month
- WHERE DD.DayOfMonth = 1
Some explanation about this query below:
· I always want to connect a record to the first day of the month in DimDate, that’s why this WHERE clause is used:
· Because I want to do a join on the month (format: YYYMM) of DimDate I need to format the start and end date on the same way (YYYYMM):
The source, order lines with a start and end date:
The Result, monthly order lines:
The excel connection manager scans every first 8 rows to determine the data type for a column in your SSIS source component. So if an Excel sheet column has integers on the first 8 rows and a string value on the 9th row, your data flow task will crash when executed because SSIS expects integers.
Fortunately you can change the number of rows that Excel will scan with the TypeGuessRows registry property.
1. Start Registry Editor by typing "regedit" in the run bar of the Start menu.
2. Search the register (CTRL-F) on "TypeGuessRows".
3. Double click "TypeGuessRows" and edit the value.
Todd McDermid (MVP) commented the following useful addition:
"Unfortunately, that reg key only allows values from 1 to 16 - yes, you can only increase the number of rows Excel will "sample" to 16."
Robbert Visscher commented:
"The reg key also allows the value 0. When this value is set, the excel connection manager scans every row to determine the data type for a column in your SSIS source component."
Thanks Robbert, I think setting it to 0 can be very powerful in some scenario's!
So the conclusion of the comments of Todd and Robbert is that a value from 0 to 16 is possible:
- TypeGuessRows 0: All rows will be scanned. This might hurt performance, so only use it when necessary.
- TypeGuessRows 1-16: A value between 1 and 16 is the default range for this reg key, use this in normal scenario's.
A while ago I needed to unpack a couple of zip files from SSIS. There is no Microsoft SSIS task that contains this functionality so I searched the Internet. It seems that there are quite some third party tools that offer this functionally. It's also possible to download custom SSIS tasks. I personally always try to avoid third party tools and custom tasks so I searched on.
It seemed there is a way to unzip files from SSIS with the Script Task. With some Visual Basic code using the Visual J# Library you can do the job. In this blog post I will use a Foreach Loop Container to loop through a folder that contains multiple zip files and unzip them one-by-one.
Make sure you have the Microsoft Visual J# Redistributable Package installed because a reference to vjslib.dll (Visual J# Library) is needed in the Script Task. Download it here for free.
Drag and drop a Foreach Loop Container on the Control Flow and create three variables with scope on the Foreach Loop Container:
Now configure the Foreach Loop Container:
- Enumerator: Foreach File Enumerator
- Files: *.zip
- Retrieve file name: Name and extension
Next click on the + next to Expressions add the following expression to connect the SourceFolder variable to the Directory property of the Foreach Loop Container:
Now go to the Variable Mappings and select the FileName variable on Index 0. Doing this we will be able to access the current file name when the Foreach Loop Container enumerates the zip files.
Now drag and drop a Script Task on the Control Flow, inside the Foreach Loop Container:
Open the Script Task Editor and do the following:
- Set the ScripLanguage on: Microsoft Visual Basic 2008
- Select our three ReadOnlyVariables using the new SSIS2008 Select Variables window:
Now click Edit Script and copy/paste the following script:
Public Sub Main()
Dim strSourceFile As String
Dim strDestinationDirectory As String
'MsgBox("Current File: " & Dts.Variables("FileName").Value.ToString)
strDestinationDirectory = Dts.Variables("DestinationFolder").Value.ToString
strSourceFile = Dts.Variables("SourceFolder").Value.ToString & Dts.Variables("FileName").Value.ToString
Dim oFileInputStream As New java.io.FileInputStream(strSourceFile)
Dim oZipInputStream As New java.util.zip.ZipInputStream(oFileInputStream)
Dim bTrue As Boolean = True
Dim sbBuf(1024) As SByte
While 1 = 1
Dim oZipEntry As ZipEntry = oZipInputStream.getNextEntry()
If oZipEntry Is Nothing Then Exit While
If oZipEntry.isDirectory Then
If Not My.Computer.FileSystem.DirectoryExists(strDestinationDirectory & oZipEntry.getName) Then
My.Computer.FileSystem.CreateDirectory(strDestinationDirectory & oZipEntry.getName)
Dim oFileOutputStream As New java.io.FileOutputStream(strDestinationDirectory.Replace("\", "/") & oZipEntry.getName())
While 1 = 1
Dim iLen As Integer = oZipInputStream.read(sbBuf)
If iLen < 0 Then Exit While
oFileOutputStream.write(sbBuf, 0, iLen)
Catch ex As Exception
Throw New Exception(ex.Message)
Now only one thing needs to be done, add a reference to vjslib.dll (Visual J# Library):
Your unzip solution is ready now! For testing purposes you can uncomment the following line in the script to see the file name of each processed zip file in a message box at runtime:
'MsgBox("Current File: " & Dts.Variables("FileName").Value.ToString)
You can use this solution in many ways, for example, I used it in the solution below where I download multiple zip files from an FTP. These zip files contain CSV's that are used as source for the loading of a data warehouse.
A while ago I figured out that the lookup transformation is case sensitive.
I used a lookup to find dimension table members in for my fact table records. This was done on a String business key like ‘AA12BB’. I attached a table for the error output and after running the package I found one record in this table.This record had a business key like ‘Aa12BB’. I searched the dimension table for this missing record and it surprised me, it DID exist but with the following business key: ‘AA12BB’. It seemed the lookup transformation is case sensitive. Next thing I tried was a T-SQL query in the management studio of SQL Server 2005. In the WHERE clause I referred to the business key: ‘Aa12BB’. The query returned the record with business key ‘AA12BB’. Conclusion: SQL Server is not case sensitive but the SSIS lookup component IS case sensitive… Interesting.
After some research I found a few solutions for this interesting feature of the lookup transformation. Before I explain these solutions you must know something about the inner working of the lookup component.
A lookup transformation uses full caching by default. This means that the first thing it does on execution, is loading all the lookup data in its cache. When this is done it works as expected, but with case sensitivity.
The solution is to set the CacheType property of the lookup transformation to Partial or None, the lookup comparisons will now be done by SQL Server and not by the SSIS lookup component.
Another solution is to format the data before you do the lookup. You can do this using the T-SQL LOWER() or UPPER() functions. These functions can be used in a query or for example in a derived column SSIS component.
Often I see developers on different forums asking how they can clear the SSAS cache. You can achieve this by restarting SSAS, which is done quite often.
Restarting is not necessary though, it's possible to clear the cache of an SSAS database with an XMLA script. Use the script below and replace "YourSsasDatabaseId" with the ID of your SSAS database and "YourSsasCubeId" with the ID of your SSAS cube.
Please note that you must use the ID's of the database/cube and not the name! You can find the ID's of these objects at the properties in BIDS or in SSMS. As you can see on the screenshot below, the cube/database name is not always the same as the ID. This particular cube has ‘Finance' as its name and ‘DW' as its ID!
You can run this script manually from SSMS or automatically using SSIS. To run it from SSMS, right click your SSAS database and choose New Query > MDX. Although this is an XMLA script and not MDX it can be executed as MDX script.
Running this script from SSMS is useful but I think most developers would want to clear the cache automatically. Fortunately SSIS has a ‘Analysis Services Execute DDL Task' that can execute this script!
Just configure the task for your SSAS instance and copy/paste the script in the SourceDirect box as shown below.
With the possibilities SSIS offers you can now clear the cache anytime you want. For example you could run it right after you have processed the cube. A lot of developers restart the server after processing but with this script that is no longer necessary!
Thanks to Jamie Thomson for sharing this script on his blog!
Recently I faced some problems with the performance of SSAS dimensions. The cube users were using a large dimension with more than 100.000 members that didn't perform well.
They tried to add a leaf dimension member on an Excel 2007 pivot table. When dragging this dimension member onto the rows they had to wait very long before the members returned from SSAS and showed on the screen.
After some mailing with Chris Webb he thought this could have something to do with the default cube measure. It seems that when you query dimension members without picking a measure, SSAS takes the first measure from the first measure group as its default measure. So even when you only query a dimension, SSAS is still using a measure!
You can find out which measure SSAS will take with the following query:
SELECT [Measures].DefaultMember ON 0 FROM [YourCube]
In this case the default measure that SSAS picked was from a measure group that was not connected to the dimension that was giving the performance problems. This, plus the fact that returning 100.000 times NULL is faster then returning some big float number, explained the performance issue.
Chris advised me to use a NULL calculation as default measure, as he explains on his blog. The only problem here is that you can't select a calculation as default measure in the cube properties in BIDS (only normal measures are allowed):
Fortunately pasting this MDX statement in the calculations script (use the script view on the calculations tab) did the trick. Just paste it right under the CALCULATE command (or somewhere else):
CREATE MEMBER CURRENTCUBE.MEASURES.UseAsDefaultMeasure
VISIBLE = 1;
ALTER CUBE CURRENTCUBE UPDATE DIMENSION Measures,
DEFAULT_MEMBER = [Measures].UseAsDefaultMeasure;
When you return to the form view it should look like this:
The next time you deploy the cube, SSAS will use the [UseAsDefaultMeasure] calculation as its default measure.
I now had a nice opportunity to test the cube performance with and without the NULL calculation as default cube measure. I cleared the cache before each query to get a good comparison.
In SSMS I used the following query:
SELECT [Measures].DefaultMember ON 0,
[Dimension].[AttributeHierarchy].Members ON 1
The results are shown in the bar-chart below:
- The query with the NULL default measure took 13 seconds.
- The query without the NULL default measure took 4 minutes and 35 seconds.
Conclusion: Using a NULL default measure can be much faster when querying dimension members without selecting a measure. The result was shown more than 20 times faster in this particular case.
Thanks to Chris Webb for helping me out on this issue!
A lot of developers have SQL Server and its different services running on their PC or notebook to develop or test BI solutions. Unfortunately this slows down your system quite a lot. To speed things up when not using SQL Server, I used to stop and start each service manual quite often.
Recently I found out that it's possible to start and stop all services at once with a simple batch file. It now only takes a couple of seconds instead of a few minutes and some annoying steps.
Copy/paste the following in a .txt file and rename it to .bat to make it a batch file, execute it by double clicking the file.
NET START "SQL Server Agent (MsSqlServer)"
NET START "MsSqlServer"
NET START "MsSqlServerOlapService"
NET START "ReportServer"
NET START "SQL Server Integration Services"
NET STOP "SQL Server Agent (MsSqlServer)"
NET STOP "MsSqlServer"
NET STOP "MsSqlServerOlapService"
NET STOP "ReportServer"
NET STOP "SQL Server Integration Services"
Replace the multiple "Lookup Error Output" Derived Columns and Union All's with a single Derived Column and get a performance boost...
Almost every BI developer needs to perform lookups while loading the Data Warehouse. When loading the fact tables for example, lookups are used to receive the (surrogate) primary keys of the dimension tables that are connected to the fact table.
I, and I think most developers, perform the DWH fact-dimension lookups in the following way:
1. Each fact record contains business keys to the different dimensions.
2. Each dimension record contains a business key and a surrogate key (integer).
3. Each dimension contains one unknown member, with a surrogate key of value 0.
4. Match the dimension business key in the fact record to the business key in the dimension to receive the dimension surrogate key, using a Lookup Component. If the lookup gained no match, point to the unknown member in the dimension (surrogate key 0).
5. Store the fact record in the DWH fact table with the gained surrogate keys of the dimensions.
Loading a fact table in the way described above would typically look like this:
I have tested this solution with 313.341 records. This took 1 minute and 23 seconds.
This way of loading the fact table contains a lot of semi-blocking components: the Union All's. Read more about semi-blocking components in this blog, posted by me a little while ago. These components cause an unnecessary negative impact on the performance, as you can read in my other blog.
There is a much more efficient way to load a fact table in a datawarehouse:
1. Set the all the Lookup Error Outputs to "Ignore Failure".
2. Delete all the Union All and Derived Column components.
3. Add a Derived Column component to the end of the flow and add the unknown surrogate keys like this:
The data flow now looks like this:
The loading of exactly the same amount of records (313.341) now took just 45 seconds! This is a performance boost of almost 40%. Loading your fact table like this does not only decrease your loading time, it also takes less development time. So a win-win situation!
A minor downside might be that you can't easily see how much records have failed a particular lookup component, but it's not so hard to make a simple (SSRS) report which gives you this overview. It might be a good opportunity to include the information of all your fact tables in one single report, it will give you a much better overview of lookup-failures during the load of your fact tables then all the separate SSIS dataflow tasks.
My last blog is from a few months ago because I have been quite busy lately. Fortunately I have quite some topics that I want to blog about, starting with this short blog about the latest Microsoft certification I've gained.
In March of this year I passed the 70-556 “Microsoft Office PerformancePoint Server 2007” exam. This makes me one of the 486(September, 2007) Microsoft Certified Technology Specialists worldwide! (Nr. of MCP's worldwide)
It wasn't a tough exam compared to the previous exams I took. Time was no problem, and there were only 44 questions that were all of the multiple choice type where only 1 answer is correct. The exam contained almost no focus on PerformancePoint Planning, so make sure you focus on the Monitoring and Analyzing parts of PPS while studying for this exam.
I used the Rational Guides to PPS books to study for the exam. Next to the books I followed a course and together this was enough to complete the exam successfully.
Find more information here:
Microsoft 70-556 Exam - MCTS Performance Point
On the MCTS Exam for PerformancePoint Server 2007
Preparation Guide for Exam 70-556
My Microsoft Transcript is now:
Microsoft Certification Status
Certification / Version
Microsoft Certified IT Professional
Business Intelligence Developer
Mar 28, 2008
Microsoft Certified Technology Specialist
Office PerformancePoint Server 2007, Applications
Jun 19, 2008
Microsoft Certified Technology Specialist
Microsoft® SQL ServerTM 2005 Business Intelligence Development
Oct 19, 2007
Microsoft Certification Exams Completed Successfully
TS: Microsoft Office PerformancePoint Server 2007, Application Development
Jun 19, 2008
PRO: Designing a Business Intelligence Infrastructure by Using Microsoft® SQL ServerTM 2005
Mar 28, 2008
TS: Microsoft SQL Server 2005 Business Intelligence-Implementation and Maintenance
Oct 19, 2007
What if you want a dynamic list of values in a matrix but with a maximum of 5 rows. How do you create a matrix like this? I thought this should be an easy job but I found out it was not really simple…
I tried to create a matrix like this for a dynamic list of countries. In this blog I will explain how you can achieve this with a few simple steps.
1. You need to create an MDX(I used a SSAS datasource) query that returns the list of countries with a numbering:
2. Next thing you need to do is create a matrix:
3. Next and last thing you need to do is the following:
- Use the following expression for the row group: =(Fields!Country_Number.Value - 1) Mod 5
- Use the following expression for the column group: =Floor((Fields!Country_Number.Value - 1) / 5)
How do you create a static column header centered above your dynamic columns? One way to try achieving this is to place a textbox above your dynamic columns. One thing is for sure, the textbox will never be on the perfect centered location and what if the number of dynamic columns grow or shrink?
Thing you need to do is to create a static column group. You do this by adding a new column group to the matrix and give it a static expression, for example: =”static”
Now make it the top group by clicking Up for the static column group on the Groups tab of the matrix’s properties. You can also achieve this by just dragging the column group up in the layout view.
The result, a centered and perfect aligned column header with the text “YTD” above some dynamic columns containing years:
The script below is really nice if you need to convert String dates that come in different (unexpected) formats.
Just copy and paste the code in your Script Task (paste the function outside Main()) and it works right away.
Supported date formats:
Of course it's possible to add code for more date formats yourself. If you want to, copy and paste your code in a comment. I will then add the code to this blog.
Public Shared Function GetDateFromString(ByVal stringDate As String) As DateTime
Dim datetimeResult As DateTime
Dim centuryToAdd As Integer = 1900
If (Convert.ToInt32(stringDate.Substring(0, 2)) < 80) Then
centuryToAdd = 2000
If (stringDate.Length = 6) Then
'Format is: YYMMDD
datetimeResult = New DateTime((centuryToAdd + Convert.ToInt32(stringDate.Substring(0, 2))), Convert.ToInt32(stringDate.Substring(2, 2)), Convert.ToInt32(stringDate.Substring(4, 2)), 0, 0, 0)
If (stringDate.Length = 8) Then
If (stringDate.IndexOf("-") > 0) Then
'Format is: YY-MM-DD
datetimeResult = New DateTime((centuryToAdd + Convert.ToInt32(stringDate.Substring(0, 2))), Convert.ToInt32(stringDate.Substring(3, 2)), Convert.ToInt32(stringDate.Substring(6, 2)), 0, 0, 0)
'Format is: YYYYMMDD
datetimeResult = New DateTime(Convert.ToInt32(stringDate.Substring(0, 4)), Convert.ToInt32(stringDate.Substring(4, 2)), Convert.ToInt32(stringDate.Substring(6, 2)), 0, 0, 0)
If (stringDate.Length = 10) Then
'Format is: YYYY-MM-DD
datetimeResult = New DateTime(Convert.ToInt32(stringDate.Substring(0, 4)), Convert.ToInt32(stringDate.Substring(5, 2)), Convert.ToInt32(stringDate.Substring(8, 2)), 0, 0, 0)
Catch e As Exception
'No date format found: Return unknown(1/1/1900)
datetimeResult = New DateTime(1900, 1, 1, 0, 0, 0)
If you want to convert a String SSIS variable and load it into a DateTime SSIS variable, use the following code in your Script Task:
Dts.Variables("someDateTimeVariable").Value = GetDateFromString(Dts.Variables("someStringVariable").Value.ToString)