cdc small business finance
![]()
Not that the tool is a control flow, not a data flow task, but nonetheless requires a data connections. In this demo we are going to look at customer data and output our findings to an xml file. This task has several panes:
![]()
The general tab allows you to specify where the profiling data ends up, in this case NewCustomerProfile.xml. BTW the quick profile button here allows you to run a profiling job on the fly with default settings.
The Profile Requests pane is where you setup what you want to analyse..
![]()
Here I have three different kinds of analysis and I have focused on the last one to look at Value distribution. Note the connection and table name can be individually specified.
The task can then be run to generate the file, and then you can take a look at the answers. I am not to good at quickly making sense of xml and so it’s good to know there is a special data profile viewer go to ..
All Programs -> Microsoft SQL Server 2008 -> Integration Services -> Data Profile Viewer.
Looking at the distribution of values for gender …
![]()
you can see that we have three M F and Male, something we might want to clean up as part of the ETL process. The other interesting thing to do is look at NULL distribution as there is little po9int in loading in a little used column no matter how important it is.
Armed with this tool you can go back to the business and show them how clean there data is and then collaborate on how to resolve each issue.
To use this you need to tell the inevitable wizard what connection, table and columns you want to cache…
![]()
Note the 1 in the index position for the SalesTerritoryAlternateKey row to indicate this is indexed (at least one column must be indexed).
So that’s how you populate the cache, now it can be used in the standard lookup transformation by selecting cache connection manager on the general tab for the lookup
![]()
and then putting in the cache location on the connection tab (You will be warned to do this). You can then use the columns form the cache in the same way as for a normal lookup. As ever I’ve done this in AdventureWorks so you should be able to follow this.
This is going to be useful in applying the same (conformed) dimension to multiple fact tables, particularly if the dimension doesn’t change that often. I would also point out that there has been a lot of work behind the scenes to improve the parallelism in integration services, which will kick in when you upgrade.
Day 13 of my virtual advent calendar, about stuff I like in SQL Server 2008..
The SQL Group by clause has been around since I can remember, but in SQL Server 2008 we know have grouping sets. Imagine you wanted to sum sales amount from the adventure works reseller sales table by year, by Product Category and by Year AND Product Category perhaps as the first stage in writing a report. You also need to have all the results in one query so you end up doing something like this in SQL Server 2005:
SELECT
NULL AS [year],
dpc.EnglishProductCategoryName AS Category,
SUM(frs.salesamount)
FROM FactResellerSales frs
INNER JOIN DimProduct dp ON dp.ProductKey = frs.ProductKey
INNER JOIN DimP