SQL Server 2016 CTP 2 Preview Now Available

Back in the beginning of May, Microsoft announced SQL Server 2016 and the massive number of new features that are going to be released. Combine this with all of the Power BI and Azure enhancements being rolled out, and it is a great time to be a Microsoft BI professional. This afternoon the SQL Server team released the first publicly available preview of SQL Server 2016, as referenced in this blog post! The download is available here.

If you’ve forgotten all of the amazing new features that are scheduled for release with this version, be sure to check out the data sheet.

What are you waiting for? Download your copy today!

SQL Server 2016 is coming

Yesterday at the keynote for the Microsoft Ignite conference, Satya Nadella announced the next version of SQL Server, which will be available next year and in public preview later this summer. Rather than rehashing all the marketing mumbo jumbo that's been repeated elsewhere, I'd like to focus on the three biggest enhancements for BI and why you should already start thinking about upgrading to the next version.

R Integration directly into SQL Server

Much like we've had the ability to write DMX queries against SSAS Multidimensional models, the ability to run R code directly inside SQL Server is going to open SQL Server up to a much broader user base and empower more people to analyze their data in different ways.
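To give a flavor of what this looks like, the integration shipped as SQL Server R Services, where R runs through a system stored procedure. A minimal sketch, with the table and column names purely illustrative:

-- Run an R script inside SQL Server and return the result as a rowset.
-- Requires R Services to be installed and 'external scripts enabled' turned on.
EXEC sp_execute_external_script
    @language = N'R',
    @script = N'OutputDataSet <- data.frame(avg_sales = mean(InputDataSet$SalesAmount))',
    @input_data_1 = N'SELECT SalesAmount FROM dbo.FactInternetSales'
WITH RESULT SETS ((avg_sales FLOAT));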

Enhanced SSIS

A feature I myself overlooked the first time I read the data sheet yesterday: the ability to target the specific version of SSIS you want to build for, plus support for Power Query, is going to be some serious awesome-sauce. Chris Webb (b|t) has been showing us some awesome stuff that M can do, and giving SSIS the ability to capitalize on it will make for some very efficient ETL, especially when it's combined with the ability to pull data from cloud sources via Azure Data Factory. Being able to select which version of SSIS you want to build for is a feature that's long overdue, and as a consultant I couldn't be happier.

Polybase into SQL Server

This feature alone justifies the upgrade. As a consultant, I hear a lot of companies talking about Hadoop and what to do with their unstructured data. Should we invest in Hadoop? What are the big barriers to entry? One of the biggest has been cost, especially since, to date, PolyBase has only been available in APS (the Analytics Platform System). By putting PolyBase directly into SQL Server, Microsoft is really going to be able to step ahead of the entire market. Imagine being able to run a query like this:

SELECT SUM(f.SalesAmount), h.CustomerInfo
FROM MyFactTable f
JOIN HDFS.dbo.Customers h               -- customer data sitting in Hadoop
    ON f.CustomerKey = h.CustomerKey    -- join column is illustrative
GROUP BY h.CustomerInfo

No more heavy lifting of writing MapReduce jobs and complicated logic to bring unstructured and structured data together for analysis. Now, it just works. This, combined with the Azure SQL Data Warehouse announced last week at Build, has me extremely excited about the rest of the year and 2016!
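For context, in the PolyBase that eventually shipped, the Hadoop side is declared once as external objects and is then queried like any other table. A rough sketch, where every object name and the cluster location are assumptions on my part:

-- Point SQL Server at a Hadoop cluster (the location is a placeholder).
CREATE EXTERNAL DATA SOURCE MyHadoop
WITH (TYPE = HADOOP, LOCATION = 'hdfs://mynamenode:8020');

-- Describe the file layout of the data sitting in HDFS.
CREATE EXTERNAL FILE FORMAT CsvFormat
WITH (FORMAT_TYPE = DELIMITEDTEXT,
      FORMAT_OPTIONS (FIELD_TERMINATOR = ','));

-- Expose the HDFS folder as a queryable table.
CREATE EXTERNAL TABLE dbo.Customers
(
    CustomerKey  INT,
    CustomerInfo NVARCHAR(100)
)
WITH (LOCATION = '/data/customers/',
      DATA_SOURCE = MyHadoop,
      FILE_FORMAT = CsvFormat);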

The complete data sheet is available for download here.

SSIS: Password Policy Rules Not Met When Creating Catalog

Last week I was at a client site showing them how to set up the Integration Services Catalog. We were walking through the steps to create it and had input the password on the Create Catalog configuration screen. It was a strong password that adhered to all domain requirements. However, when we clicked OK, we received the following error:

TITLE: Microsoft SQL Server Management Studio

——————————

An exception occurred while executing a Transact-SQL statement or batch. (Microsoft.SqlServer.Management.IntegrationServices)

——————————

ADDITIONAL INFORMATION:

The transaction ended in the trigger. The batch has been aborted.

========================

   PASSWORD POLICY RULES NOT MET

========================

Login: "##MS_SSISServerCleanupJobLogin##" either has "Enforce password policy" check set to OFF

OR

has "Enforce password expiration" check set to ON.

[Correct Values: "Enforce password policy" = Checked; "Enforce password expiration" = Unchecked ]

Rolling back…

===============================================

(Microsoft SQL Server, Error: 3609)

For help, click: http://go.microsoft.com/fwlink?ProdName=Microsoft%20SQL%20Server&ProdVer=11.00.3000&EvtSrc=MSSQLServer&EvtID=3609&LinkId=20476

——————————

BUTTONS:

OK

——————————

OK, let's check our password again. Type a stronger password, try it again. Still no luck. Searching Bingoogle didn't provide much assistance, apart from some old SQL Server 2000 issues with this error from the database engine when creating a login. So that's not it.

Let's go check the local security policy and the domain policy. Yep, the password meets requirements, and I don't see anything that's terribly out of whack. My next step was to check the event viewer and application logs for an error, but there was nothing there. This server was created by another team off-site, so let's check the OS locale settings and the collation level for SQL Server to see if some setting is off somewhere. Still nothing! At this point I'm starting to sweat a little. In this environment we'd created two servers (one for SSIS and the database engine, and one for SSAS), so let's switch over to the other server and see if it happens there too. Yep, same problem. My next step was to create an inbound rule in the firewall settings, thinking that maybe there was some port blocking happening somewhere. A Twitter outreach on #sqlhelp didn't yield any results either, although I do want to throw a quick thanks out to Robert Davis and Denny Cherry for the assistance!

When that still failed, I figured I'd just try to create the Catalog with PowerShell. Fortunately, the code to do this is readily available on MSDN:

# Load the IntegrationServices Assembly
[Reflection.Assembly]::LoadWithPartialName("Microsoft.SqlServer.Management.IntegrationServices")

# Store the IntegrationServices Assembly namespace to avoid typing it every time
$ISNamespace = "Microsoft.SqlServer.Management.IntegrationServices"

Write-Host "Connecting to server ..."

# Create a connection to the server
$sqlConnectionString = "Data Source=localhost;Initial Catalog=master;Integrated Security=SSPI;"
$sqlConnection = New-Object System.Data.SqlClient.SqlConnection $sqlConnectionString

# Create the Integration Services object
$integrationServices = New-Object $ISNamespace".IntegrationServices" $sqlConnection

# Provision a new SSIS Catalog
$catalog = New-Object $ISNamespace".Catalog" ($integrationServices, "SSISDB", "P@assword1")
$catalog.Create()

I ran it locally on the server, and it failed with the same issue. There had to be something in the internal scripts causing this, so my next step was to go back to basics. We know the issue is caused by this specific login somewhere in the process: "##MS_SSISServerCleanupJobLogin##". Well, CleanupJobLogin sounds like the account that's used to run the SSIS Server Maintenance Job. If we can get everything to create except for the job, that gives us the Catalog and the main SSISDB. All that login does is serve as the job owner for the "SSIS Server Maintenance Job", which runs two procedures in SSISDB: [internal].[cleanup_server_retention_window] and [internal].[cleanup_server_project_version].
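As an aside, if you end up with a catalog but without the maintenance job (which is exactly where this troubleshooting lands), the same cleanup the job performs can be run by hand against the catalog:

-- Manually run the cleanup that the SSIS Server Maintenance Job would normally do.
EXEC [SSISDB].[internal].[cleanup_server_retention_window];
EXEC [SSISDB].[internal].[cleanup_server_project_version];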

Fire up Profiler and run a trace. Go back into "Create Catalog", type the password, and click OK. I know it's going to fail, but that's not what I'm after here. Go into the trace and capture all of the SQL scripts that the Create Catalog dialog is running.

select SERVERPROPERTY(N'servername')
DECLARE @edition sysname;
SET @edition = cast(SERVERPROPERTY(N'EDITION') as sysname);
select case when @edition = N'SQL Azure' then 2 else 1 end as 'DatabaseEngineType'

exec sp_executesql N'SELECT
''IntegrationServices[@Name='' + quotename(CAST(SERVERPROPERTY(N''Servername'') AS sysname),'''''''') + '']'' AS [Urn],
CAST(SERVERPROPERTY(N''Servername'') AS sysname) AS [Name]
WHERE
(CAST(SERVERPROPERTY(N''Servername'') AS sysname)=@_msparam_0)',N'@_msparam_0 nvarchar(4000)',@_msparam_0=N'PWIRTPTPSP3W12'

SELECT ISNULL(HAS_PERMS_BY_NAME(null, null, 'ALTER SETTINGS'), 0)
SELECT ISNULL(IS_SRVROLEMEMBER ('sysadmin'), 0)
GO
————————————————————–

USE [master]
select SERVERPROPERTY(N'servername')
DECLARE @edition sysname;
SET @edition = cast(SERVERPROPERTY(N'EDITION') as sysname); select case when @edition = N'SQL Azure' then 2 else 1 end as 'DatabaseEngineType'

exec sp_executesql N'SELECT
''IntegrationServices[@Name='' + quotename(CAST(SERVERPROPERTY(N''Servername'') AS sysname),'''''''') + '']'' AS [Urn],
CAST(SERVERPROPERTY(N''Servername'') AS sysname) AS [Name]
WHERE
(CAST(SERVERPROPERTY(N''Servername'') AS sysname)=@_msparam_0)',N'@_msparam_0 nvarchar(4000)',@_msparam_0=N'PWIRTPTPSP3W12'
GO

———————————————————————————–

USE [master]
--exec sp_reset_connection
DECLARE @edition sysname;
SET @edition = cast(SERVERPROPERTY(N'EDITION') as sysname); select case when @edition = N'SQL Azure' then 2 else 1 end as 'DatabaseEngineType'
SELECT
cfg.name AS [Name],
cfg.configuration_id AS [Number],
cfg.minimum AS [Minimum],
cfg.maximum AS [Maximum],
cfg.is_dynamic AS [Dynamic],
cfg.is_advanced AS [Advanced],
cfg.value AS [ConfigValue],
cfg.value_in_use AS [RunValue],
cfg.description AS [Description]
FROM
sys.configurations AS cfg

exec sp_executesql N'
        --Preparing to access the Catalog object
        DECLARE @t_catalogs TABLE (
        Name sysname COLLATE SQL_Latin1_General_CP1_CI_AS,
        EncryptionAlgorithm nvarchar(256) COLLATE SQL_Latin1_General_CP1_CI_AS,
        SchemaVersion int,
        SchemaBuild nvarchar(256) COLLATE SQL_Latin1_General_CP1_CI_AS,
        OperationLogRetentionTime int,
        MaxProjectVersions int,
        OperationCleanupEnabled bit,
        VersionCleanupEnabled bit,
        ServerLoggingLevel int,
        OperationLogNumberOfRecords int,
        VersionLogNumberOfRecords int)

        IF DB_ID(''SSISDB'') IS NOT NULL
        BEGIN
        INSERT INTO @t_catalogs VALUES(
        ''SSISDB'',
        (SELECT [property_value] FROM [SSISDB].[catalog].[catalog_properties] WHERE [property_name]  = N''ENCRYPTION_ALGORITHM''),
        (SELECT CAST([property_value] AS INT) FROM [SSISDB].[catalog].[catalog_properties] WHERE [property_name]  = N''SCHEMA_VERSION''),
        (SELECT [property_value] FROM [SSISDB].[catalog].[catalog_properties] WHERE [property_name]  = N''SCHEMA_BUILD''),
        (SELECT CAST([property_value] AS INT) FROM [SSISDB].[catalog].[catalog_properties] WHERE [property_name]  = N''RETENTION_WINDOW''),
        (SELECT CAST([property_value] AS INT) FROM [SSISDB].[catalog].[catalog_properties] WHERE [property_name]  = N''MAX_PROJECT_VERSIONS''),
        (SELECT CAST([property_value] AS BIT) FROM [SSISDB].[catalog].[catalog_properties] WHERE [property_name]  = N''OPERATION_CLEANUP_ENABLED''),
        (SELECT CAST([property_value] AS BIT) FROM [SSISDB].[catalog].[catalog_properties] WHERE [property_name]  = N''VERSION_CLEANUP_ENABLED''),
        (SELECT CAST([property_value] AS INT) FROM [SSISDB].[catalog].[catalog_properties] WHERE [property_name]  = N''SERVER_LOGGING_LEVEL''),
        (SELECT COUNT(operation_id) FROM [SSISDB].[catalog].[operations]),
        (SELECT COUNT(object_id) FROM [SSISDB].[catalog].[object_versions])
        )
        END

SELECT
''IntegrationServices[@Name='' + quotename(CAST(SERVERPROPERTY(N''Servername'') AS sysname),'''''''') + '']'' + ''/Catalog[@Name='' + '''''''' + REPLACE((SELECT Name from @t_catalogs), '''''''', '''''''''''') + '''''''' + '']'' AS [Urn],
(SELECT Name from @t_catalogs) AS [Name],
(SELECT EncryptionAlgorithm from @t_catalogs) AS [EncryptionAlgorithm],
(SELECT SchemaVersion from @t_catalogs) AS [SchemaVersion],
(SELECT SchemaBuild from @t_catalogs) AS [SchemaBuild],
(SELECT OperationLogRetentionTime from @t_catalogs) AS [OperationLogRetentionTime],
(SELECT MaxProjectVersions from @t_catalogs) AS [MaxProjectVersions],
(SELECT OperationCleanupEnabled from @t_catalogs) AS [OperationCleanupEnabled],
(SELECT VersionCleanupEnabled from @t_catalogs) AS [VersionCleanupEnabled],
(SELECT ServerLoggingLevel from @t_catalogs) AS [ServerLoggingLevel],
(SELECT OperationLogNumberOfRecords from @t_catalogs) AS [OperationLogNumberOfRecords],
(SELECT VersionLogNumberOfRecords from @t_catalogs) AS [VersionLogNumberOfRecords]
WHERE
(CAST(SERVERPROPERTY(N''Servername'') AS sysname)=@_msparam_0)',N'@_msparam_0 nvarchar(4000)',@_msparam_0=N'PWIRTPTPSP3W12'

SELECT ISNULL(IS_SRVROLEMEMBER ('sysadmin'), 0)

exec sp_executesql N'SELECT name FROM msdb.sys.sysdatabases WHERE name = @dbname',N'@dbname nvarchar(6)',@dbname=N'SSISDB'

declare @key_value nvarchar(1024); exec master.dbo.xp_regread 'HKEY_LOCAL_MACHINE','SOFTWARE\Microsoft\Microsoft SQL Server\110\SSIS\Setup\DTSPath', N'', @key_value output;select @key_value;

DECLARE @CatalogFileExists bit
if object_id('tempdb..#t') is not null drop table #t
BEGIN
       DECLARE @CatalogFile nvarchar(1024)
       SELECT @CatalogFile = N'C:\Program Files\Microsoft SQL Server\110\DTS\Binn\SSISDBBackup.bak'
       CREATE TABLE #t (file_exists int, is_directory int, parent_directory_exists int)
       INSERT #t EXEC xp_fileexist @CatalogFile
       SELECT TOP 1 @CatalogFileExists=file_exists from #t
END
SELECT @CatalogFileExists
-- check whether the database is already exist and with correct version
SET NOCOUNT ON

IF DB_ID('SSISDB') IS NOT NULL
    RAISERROR(27135, 16, 1, 'SSISDB')

IF SUBSTRING (CAST(SERVERPROPERTY('ProductVersion') AS VARCHAR(20)),1,2) <> '11'
    RAISERROR (27193,16,1, 'Denali') WITH NOWAIT

SELECT [value_in_use] FROM sys.configurations WHERE [name] = 'clr enabled'
DECLARE @path nvarchar(1024) = Convert(nvarchar(1024),ServerProperty('MasterFile')); SELECT @path = SUBSTRING(@path, 1, CHARINDEX(N'master.mdf', LOWER(@path)) - 1); SELECT @path;
--DECLARE @CatalogFileExists bit
if object_id('tempdb..#t') is not null drop table #t
BEGIN
       --DECLARE @CatalogFile nvarchar(1024)
       SELECT @CatalogFile = N'D:\Microsoft SQL Server\MSSQL11.MSSQLSERVER\MSSQL\DATA\SSISDB.mdf'
       CREATE TABLE #t (file_exists int, is_directory int, parent_directory_exists int)
       INSERT #t EXEC xp_fileexist @CatalogFile
       SELECT TOP 1 @CatalogFileExists=file_exists from #t
END
SELECT @CatalogFileExists
--DECLARE @CatalogFileExists bit
if object_id('tempdb..#t') is not null drop table #t
BEGIN
       --DECLARE @CatalogFile nvarchar(1024)
       SELECT @CatalogFile = N'D:\Microsoft SQL Server\MSSQL11.MSSQLSERVER\MSSQL\DATA\SSISDB.ldf'
       CREATE TABLE #t (file_exists int, is_directory int, parent_directory_exists int)
       INSERT #t EXEC xp_fileexist @CatalogFile
       SELECT TOP 1 @CatalogFileExists=file_exists from #t
       DROP TABLE #t
END
SELECT @CatalogFileExists
exec sp_executesql N'RESTORE FILELISTONLY FROM DISK = @backupfile',N'@backupfile nvarchar(67)',@backupfile=N'C:\Program Files\Microsoft SQL Server\110\DTS\Binn\SSISDBBackup.bak'
exec sp_executesql N'RESTORE DATABASE @databaseName FROM DISK = @backupFile  WITH REPLACE, MOVE @dataName TO @dataFilePath, MOVE @logName TO @logFilePath',N'@databaseName nvarchar(6),@dataName nvarchar(4),@dataFilePath nvarchar(65),@logName nvarchar(3),@logFilePath nvarchar(65),@backupFile nvarchar(67)',@databaseName=N'SSISDB',@dataName=N'data',@dataFilePath=N'D:\Microsoft SQL Server\MSSQL11.MSSQLSERVER\MSSQL\DATA\SSISDB.mdf',@logName=N'log',@logFilePath=N'D:\Microsoft SQL Server\MSSQL11.MSSQLSERVER\MSSQL\DATA\SSISDB.ldf',@backupFile=N'C:\Program Files\Microsoft SQL Server\110\DTS\Binn\SSISDBBackup.bak'
GO
————————————————————————-

USE master; 

IF EXISTS (SELECT [name] FROM sys.databases WHERE [name]='SSISDB' AND [is_read_only] = 1)  ALTER DATABASE [SSISDB] SET READ_WRITE WITH ROLLBACK IMMEDIATE

USE [SSISDB];

IF EXISTS (SELECT [name] FROM sys.symmetric_keys WHERE [name] = '##MS_DatabaseMasterKey##')
    DROP MASTER KEY

exec sp_executesql N'USE [SSISDB];DECLARE @pwd nvarchar(4000) = REPLACE(@password, N'''''''', N'''''''''''');EXEC(''CREATE MASTER KEY ENCRYPTION BY PASSWORD = '''''' + @pwd + '''''''');',N'@password nvarchar(10)',@password=N'Password@1'
GO
——————————————————————————-

USE master

   IF NOT EXISTS(SELECT * FROM sys.asymmetric_keys WHERE name = 'MS_SQLEnableSystemAssemblyLoadingKey')
        CREATE ASYMMETRIC KEY MS_SQLEnableSystemAssemblyLoadingKey FROM EXECUTABLE FILE = 'C:\Program Files\Microsoft SQL Server\110\DTS\Binn\Microsoft.SqlServer.IntegrationServices.Server.dll'

   IF EXISTS(SELECT [name] FROM sys.server_principals where name = '##MS_SQLEnableSystemAssemblyLoadingUser##')
        DROP LOGIN ##MS_SQLEnableSystemAssemblyLoadingUser##

  CREATE LOGIN ##MS_SQLEnableSystemAssemblyLoadingUser## FROM ASYMMETRIC KEY MS_SQLEnableSystemAssemblyLoadingKey
    GRANT UNSAFE ASSEMBLY TO ##MS_SQLEnableSystemAssemblyLoadingUser##

  IF EXISTS(SELECT name FROM sys.procedures WHERE name=N'sp_ssis_startup')
    BEGIN
       EXEC sp_procoption N'sp_ssis_startup','startup','off'
       DROP PROCEDURE [sp_ssis_startup]
    END
GO

   CREATE PROCEDURE [dbo].[sp_ssis_startup]
    AS
    SET NOCOUNT ON
        /* Currently, the IS Store name is 'SSISDB' */
        IF DB_ID('SSISDB') IS NULL
            RETURN
        IF NOT EXISTS(SELECT name FROM [SSISDB].sys.procedures WHERE name=N'startup')
            RETURN
        /*Invoke the procedure in SSISDB  */
        EXEC [SSISDB].[catalog].[startup]

   IF (0=1)
    BEGIN
        /* Run sp_ssis_startup when Sql Server restarts */
        EXEC sp_procoption N'sp_ssis_startup','startup','on'
    END
GO
——————————————————————————–

--USE msdb

--IF EXISTS (SELECT name FROM sysjobs WHERE name = N'SSIS Server Maintenance Job')
--EXEC sp_delete_job
--    @job_name = N'SSIS Server Maintenance Job' ;

--IF EXISTS(SELECT * FROM sys.server_principals where name = '##MS_SSISServerCleanupJobLogin##')
--    DROP LOGIN ##MS_SSISServerCleanupJobLogin##

--DECLARE @loginPassword nvarchar(256)
--set @loginPassword = 'Password@1'

--SELECT @loginPassword = REPLACE (CONVERT( nvarchar(256), CRYPT_GEN_RANDOM( 64 )), N'''', N'''''')
--EXEC ('CREATE LOGIN ##MS_SSISServerCleanupJobLogin## WITH PASSWORD =''' +@loginPassword + ''', CHECK_POLICY = ON')

 

There, at the end of the script, is the cause of the consternation. Everything runs successfully except for that account. I commented out those lines of code and reran the script. Success!!! AWESOME-SAUCE! One catalog, ready to do some cool stuff. Note that you could also use this if you just want to create the catalog with T-SQL. The last step is to create the job manually to call those procedures and prevent SSISDB from growing too large.

SIDE NOTE: The main password for the Catalog is on line 161. Make sure you change it if you are using this script to create your catalog.

USE [msdb]
GO
BEGIN TRANSACTION
DECLARE @ReturnCode INT
SELECT @ReturnCode = 0

IF NOT EXISTS (SELECT name FROM msdb.dbo.syscategories WHERE name=N'[Uncategorized (Local)]' AND category_class=1)
BEGIN
EXEC @ReturnCode = msdb.dbo.sp_add_category @class=N'JOB', @type=N'LOCAL', @name=N'[Uncategorized (Local)]'
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback

END

DECLARE @jobId BINARY(16)
EXEC @ReturnCode =  msdb.dbo.sp_add_job @job_name=N'SSIS Server Maintenance Job',
        @enabled=1,
        @notify_level_eventlog=2,
        @notify_level_email=0,
        @notify_level_netsend=0,
        @notify_level_page=0,
        @delete_level=0,
        @description=N'Runs every day. The job removes operation records from the database that are outside the retention window and maintains a maximum number of versions per project.',
        @category_name=N'[Uncategorized (Local)]',
        @owner_login_name=N'DOMAIN\USERNAME', @job_id = @jobId OUTPUT
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
/****** Object:  Step [SSIS Server Operation Records Maintenance]    Script Date: 2/27/2015 8:17:49 AM ******/
EXEC @ReturnCode = msdb.dbo.sp_add_jobstep @job_id=@jobId, @step_name=N'SSIS Server Operation Records Maintenance',
        @step_id=1,
        @cmdexec_success_code=0,
        @on_success_action=3,
        @on_success_step_id=0,
        @on_fail_action=2,
        @on_fail_step_id=0,
        @retry_attempts=3,
        @retry_interval=3,
        @os_run_priority=0, @subsystem=N'TSQL',
        @command=N'EXEC [internal].[cleanup_server_retention_window]',
        @database_name=N'SSISDB',
        @flags=0
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback

EXEC @ReturnCode = msdb.dbo.sp_add_jobstep @job_id=@jobId, @step_name=N'SSIS Server Max Version Per Project Maintenance',
        @step_id=2,
        @cmdexec_success_code=0,
        @on_success_action=1,
        @on_success_step_id=0,
        @on_fail_action=2,
        @on_fail_step_id=0,
        @retry_attempts=3,
        @retry_interval=3,
        @os_run_priority=0, @subsystem=N'TSQL',
        @command=N'EXEC [internal].[cleanup_server_project_version]',
        @database_name=N'SSISDB',
        @flags=0
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
EXEC @ReturnCode = msdb.dbo.sp_update_job @job_id = @jobId, @start_step_id = 1
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
EXEC @ReturnCode = msdb.dbo.sp_add_jobschedule @job_id=@jobId, @name=N'SSISDB Scheduler',
        @enabled=1,
        @freq_type=4,
        @freq_interval=1,
        @freq_subday_type=1,
        @freq_subday_interval=0,
        @freq_relative_interval=0,
        @freq_recurrence_factor=0,
        @active_start_date=20001231,
        @active_end_date=99991231,
        @active_start_time=0,
        @active_end_time=120000,
        @schedule_uid=N'82cb4d40-977f-480b-9c59-bf7509e16740'
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
EXEC @ReturnCode = msdb.dbo.sp_add_jobserver @job_id = @jobId, @server_name = N'(local)'
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
COMMIT TRANSACTION
GOTO EndSave
QuitWithRollback:
    IF (@@TRANCOUNT > 0) ROLLBACK TRANSACTION
EndSave:

GO

And there you have it: creating the SSIS Catalog via T-SQL.

 

~Chris

Dynamically writing those pesky Update Statements

Whenever you are loading a data warehouse, one of the most time-consuming tasks is often writing the update statements you need in order to properly handle changing records coming from your source system. Further adding to development time, columns are then split into type 1 or type 2 attributes, and in my current shop there are processes for deletes out of the source system as well. After you write a few of these, that loving feeling you had when you started begins to dissipate, and you get bored writing the same statements. So with that in mind, I wrote the procedure below, which takes a table name and an update_type variable of U, V, or D and generates the appropriate update, type 2 versioning, or delete statement. The procedure makes two assumptions: first, that you have a separate etl_working database to handle set-based updates, and second, that your tables' primary keys contain a column named with "version_key" as part of the naming convention. If this isn't your naming convention, feel free to change it.

UPDATE 01/21/2015: Found a bug in the version code; it's fixed now. Sorry about that!

/*
Author: Christopher Schmidt
Date: 2015-01-19
Description: taking the table name and an update type of U, V, or D as inputs, generates an update statement against the target table. This proc makes an assumption that
there is a separate database titled "etl_working", that consists of a mirror copy of the target table to handle updates in a set based manner. This can be adjusted in the
code below on lines 55,58,65, and 67.

It also assumes that your primary_key of the target table contains the words
"version_key". if you do not use this naming convention, you can update them in the where clauses of the column selects on lines 34,40,46, and 52.

As always, with any code you find on the internet, test and use in development.

*/

CREATE PROCEDURE [dbo].[update_generator](
@tbl varchar(200)
,@update_type char(1)
)
AS

DECLARE @sqlu varchar(5000)
DECLARE @sqld varchar(5000)
DECLARE @sqlv varchar(5000)
DECLARE @cols varchar(4000)
DECLARE @colsi varchar(4000)
DECLARE @colsv varchar(4000)
DECLARE @keycol varchar(200)
DECLARE @keycold varchar(200)

set @cols = stuff(( select ','+quotename(C.name) + '=Src.' + quotename(c.name)
from sys.columns as C
where (C.object_id = object_id(@tbl)
AND c.name NOT LIKE '%version_key')
for xml path('')), 1, 1, '')

SET @colsi = stuff(( select ','+quotename(C.name)
from sys.columns as C
where (C.object_id = object_id(@tbl)
AND c.name NOT LIKE '%version_key')
for xml path('')), 1, 1, '')

SET @keycol = stuff(( select ','+quotename(C.name) + '=Src.' + quotename(c.name)
from sys.columns as C
where (C.object_id = object_id(@tbl)
AND c.name LIKE '%version_key')
for xml path('')), 1, 1, '')

SET @keycold = stuff(( select ',src.'+quotename(C.name) + '= dw.' + quotename(c.name)
from sys.columns as C
where (C.object_id = object_id(@tbl)
AND c.name LIKE '%version_key')
for xml path('')), 1, 1, '')

SET @sqlu = N'UPDATE [' + @tbl + '] SET ' + @cols + ' FROM change.[' + @tbl + '] [src] JOIN dbo.[' + @tbl + '] [dst] ON dst.' + @keycol + ' WHERE Update_Type = ''U'''

SET @sqld = N'UPDATE [' + @tbl + '] SET dmproc_batch_key=src.dmproc_batch_key, source_revision_action = ''D'', version_end_dtmx = GETDATE(), version_current_ind = 0
FROM change.[' + @tbl + '] [src] JOIN dbo.[' + @tbl + '] [dst] ON dst.' + @keycol + ' WHERE Update_Type = ''D'''

SET @sqlv = N'UPDATE ' + @tbl + ' SET version_current_ind = 0
,version_latest_ind = 0
,dmproc_batch_key=src.dmproc_batch_key
,version_end_dtmx=GETDATE()
,source_revision_end_dtmx = GETDATE()
,source_revision_action = ''V'' FROM change.[' + @tbl + '] [src] JOIN dbo.[' + @tbl + '] [dst] ON dst.' + @keycol + ' WHERE Update_Type = ''V''
GO
INSERT INTO dbo.[' + @tbl + '] ( ' + @colsi + ') SELECT ' + @colsi + ' FROM change.[' + @tbl + '] WHERE Update_Type = ''V'''

SELECT
CASE
WHEN @update_type = 'U' THEN @sqlu
WHEN @update_type = 'D' THEN @sqld
WHEN @update_type = 'V' THEN @sqlv
END [sql]
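Calling it looks like this (the table name is just an example); the procedure returns the generated statement in a [sql] column, which you can review and then execute:

-- Generate the type 1 update statement for a hypothetical dim_customer table.
EXEC [dbo].[update_generator] @tbl = 'dim_customer', @update_type = 'U';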

 

 

Hope it saves you some time!

SSIS: Dynamically setting the Default Buffer Max Rows property

I've blogged in the past about the DefaultBufferMaxRows property in SSIS and how powerful it can be. As a recap, there are two properties in SSIS that control the amount of data your ETL package will hold in any given buffer: DefaultBufferMaxRows and DefaultBufferSize. When performance tuning SSIS packages, we can modify either of these to adjust how much data SSIS carries through a buffer at once. By default, this is 10,000 rows or 10 MB. One thing I didn't like about my "CalculateDefaultBufferSize" spreadsheet is that it assumed you were going to change your DefaultBufferSize to 100 MB. That's 100 MB per component running in your data flow, which can become extremely costly. If you leave it at the default, however, you're still using 10 MB per component no matter what. If we're importing a really skinny data set (such as a fact table or a hub), it's very possible we're leaving a lot of empty space in the data flow if we stop every 10,000 rows. Conversely, if we have a really wide data set where we can only fit 6,000 or 7,000 rows in a buffer, we're making extra hops to move the same data. It's always a balancing act. I recently decided to see if I could automate the setting of this property to maximize the efficiency of my SSIS packages without needing to calculate it manually every time.

To start, we need to find our destination table and identify the maximum row length in that table. Fortunately, this can be obtained with a simple query (using DimCurrency in AdventureWorksDW2012):

SELECT
    SUM (max_length)  [row_length]
FROM  sys.tables t
JOIN sys.columns c
    ON t.object_id=c.object_id
JOIN sys.schemas s
    ON t.schema_id=s.schema_id
WHERE t.name = 'DimCurrency'
    AND s.name = 'dbo'

You could easily turn this into a procedure that takes the table name and schema as parameters and returns the value; for this post, I'm going to leave it as a query (there's a sketch of the procedure version a bit further down). Once we have this, it becomes a simple exercise in math. The query above returns a max row length of 118 bytes, so we take our default buffer size (which, fortunately, SSIS also stores in bytes: 10485760) and divide it by that value. So our calculation becomes:

10485760/118

giving us a result of 88,862 rows. We'll round down to the closest 100, and that will be our DefaultBufferMaxRows value. Now, how do we make this dynamic? To start, we open up our SSIS package and add an Execute SQL task to the Control Flow. A good way to improve on this would be to put the query in a pre-execute SQL task and write the value to an information table; then you can log it and do some reporting around what your buffers are. As an added bonus of this dynamic structure: as your data scales, your ETL package will too!
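For reference, the procedure version mentioned above might look something like this (the procedure name is my own, so adjust it to your conventions):

-- Returns the maximum row length (in bytes) for a given table,
-- for use in the DefaultBufferMaxRows calculation described above.
CREATE PROCEDURE dbo.get_max_row_length
    @table_name  sysname,
    @schema_name sysname = N'dbo'
AS
BEGIN
    SET NOCOUNT ON;

    SELECT SUM(c.max_length) AS [row_length]
    FROM sys.tables t
    JOIN sys.columns c ON t.object_id = c.object_id
    JOIN sys.schemas s ON t.schema_id = s.schema_id
    WHERE t.name = @table_name
      AND s.name = @schema_name;
END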

image

Next we want to create three variables: str_table_name, str_schema_name, and int_row_size. str_table_name and str_schema_name should be Strings, and int_row_size should be an Int32. For the int_row_size variable, set the value to 1; this avoids an error in our expression later. This way we can parameterize the inputs for our destination tables and create a placeholder for the incoming result. If your package names are the same as your destination tables, then instead of having to assign this value manually you could configure it from an expression using the @[System::PackageName] global variable:

image

Let's open up our Execute SQL Task and configure it. The connection will point to our destination database, and the SQL query will be as outlined above, substituting parameters for the table and schema name:

image

Do the parameter mapping on the second tab to map the table and schema to the variables you created above, and set the result set to Single row. The single-row result set should map to your int_row_size variable. Now for the fun part: create a data flow task, right-click on it, and go to Properties.

image

Under Expressions, select DefaultBufferMaxRows and click the expression ellipsis. For the expression, use the following logic:

@[User::int_row_size] == 1 ? 10000 : 10485760 / @[User::int_row_size]

The initial test of the value ensures that if the SQL fails to return a proper row size, your buffers remain at the default. And that's it! Now you can dynamically set DefaultBufferMaxRows for SSIS!

Now that’s some awesome-sauce.

Chris

BIML: Columbus SQL Server User Group

I am pleased to announce that I’ll be presenting on BIML for the Columbus SQL Server User Group on December 11th at the Microsoft Polaris office in Westerville, OH. If you’re located in or around the central OH area, please consider joining us! Here is a link to the chapter website.

Rethink your ETL with BIML

It's estimated that 70% of data warehousing development effort goes into the ETL (extract, transform, and load) between your different source systems, your data warehouse, and/or your data marts. Recently a new tool has entered the market that promises to revolutionize the way we think about ETL processes. Gone are the days where we need to manually code every component and every event handler, and painstakingly ensure that every package adheres to corporate standards. In this session, Chris and Harsh will introduce you to the world of BIML and how it can change your world. We'll show you how to quickly create a template in BIML that will make sure all of your packages adhere to corporate standards, and then use metadata to quickly generate dozens of packages to populate your dimensions and facts. Stop spending weeks developing SSIS packages that can be built in hours! You won't want to miss this session that will get your boss to say "I don't think we're paying you enough!"

~Chris

SSIS: Experimenting with the target cache lookup

As a part of the project I'm currently working on, the development team is very enamored with the cache transform in SSIS. I consider myself pretty experienced with SSIS, and I've always stayed away from components that come anywhere close to touching disk, as I believe they slow down package performance. The cache transform takes a query you specify (whatever you're looking up) and saves the results into a .caw file for SSIS to use in other locations; it's also an alternative to the in-memory recordset destination for the results of transforms. As a use case, let's say you have a customer dimension that every single fact table you load wants to reference. In the Kimball dimensional world, these would be your conformed dimensions. Do you really want to do the lookup against that customer table in every package, or would you rather save the customer records once and have SSIS retrieve them directly from the .caw file? After much debate in the office, being the ever curious type that I am, I decided to test the performance of the cache transform against doing the lookup in every package. Is there a point of diminishing returns?

Looking at the lookup component, we have two choices on the first page when we open the command:

a) OLE DB connection manager

b) cache connection manager

Most of the time, we just leave it on the OLE DB connection manager, go to the next page, and input our command. However, there is also the cache connection manager we can use to connect. This tells SSIS to go find a cache file (what SSIS internally calls a .caw file) and use that instead of re-caching the records in every package that needs them. For my example, I've created three tables: a customer dimension table with 1 million customers, a sales transaction table with 10 million sales in it, and an activity table tracking customer login activity to my website with another 10 million records.
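If you want to stand up a similar test yourself, here's a rough sketch of generating a comparable customer dimension; the table and column names are placeholders rather than the exact schema I used:

-- Generate roughly 1 million test customers, using a cross join as a number source.
;WITH numbers AS (
    SELECT TOP (1000000)
           ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS n
    FROM sys.all_objects a
    CROSS JOIN sys.all_objects b
)
SELECT n                      AS customer_key,
       n                      AS CustomerNumber,
       CONCAT('Customer ', n) AS CustomerName
INTO dbo.DimCustomer
FROM numbers;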

The customer dimension is loaded first, and the fact tables are then generated. Upon final completion, I’ve created a master package to create parallelism:

image

Both of my fact table packages are loaded with an OLEDB Connection manager, my control flow is just a straight forward data flow task, and the data flow looks like this:

image

 

This is what we'd consider the "classic" approach to fact table loading. We're pulling in the data from the source, doing the lookup against the customer dimension, and then replacing any null values farther downstream. Looking at the performance measures, the package runs these 10 million rows in 1 minute 32 seconds, which is pretty good. The lookup component in particular interests me. It cached 1,048,540 rows and processed them in 0.391 seconds, using 37,747,440 bytes of memory (35.99 MB). That's not bad. One thing to note here is that we are only pulling in the lookup columns we need, nothing extra. This is very important for space, especially when we consider the memory SSIS is trying to use here.

Ok, so that’s our classic approach. Now let’s try using this cache transform.

On the control flow for the customer package, I'm going to add another data flow, and in it I'm going to add a source component filled with the query the fact table loads were previously using for their lookup. In this case it's just SELECT a, b FROM Customer. Directly underneath it I'm going to add a cache transform and configure a new cache connection by going to New… and checking the "Use File cache" option. We'll give the file a name and click on Columns to do the mapping. We also want to add an index on the data, so we'll set the index position for customer_id to 1, as that's what we're going to be joining on later. Once this is complete my control flow will look like the below:

image

And my data flow for the Build cache is just a simple source and cache destination:

image

Alright, so now we go back to our data flows for sales and activity, and change the "LKP Customers" lookup component to a cache connection type instead of the OLE DB connection type. If you miss it, it's on the first page under General -> Connection Type. On the Connection page, select the location where you stored the customer cache. Click on Columns, and make sure we're mapping from CustomerNumber to customer_id. This should return our customer key.

So our new-look package doesn't really look any different at the data flow level. But internally, we've told SSIS: "don't go cache those customers every time we need to do this lookup; I've put them in a cache file for you, on disk at this location." Each package that runs the lookup still caches the data in memory; it just loads it from the cache file you've built instead of querying the database.

Running each package individually, and then a final run of the master package results in the following table of results:

All times in seconds    Customer   Sales   Activity   Master Package
With cache              6.7        94.3    87.5       137.2
Without cache           3.4        92.8    85.3       141.4

It's an interesting observation that overall, the cache transform doesn't appear to give us any performance improvement. When running packages individually, it seems to add a few seconds to each run in order to retrieve the results from disk. When running packages in parallel, it appears that you gain a few seconds of run time in the actual lookup packages, but you may lose that time again building the cache. This leads me to the conclusion that it's better to stick with the typical design pattern and point the lookup at your actual table. The reason I believe people are tempted to use this in the first place is to prevent the occasional blocking while lookup queries are running. This is the third approach I've seen to solving that issue (the first being the NOLOCK query hint and the second being the SNAPSHOT isolation level). Use NOLOCK at your own risk. I think my personal preference after this is going to be continuing to use SNAPSHOT isolation for my lookup queries. It just doesn't seem that you're really gaining anything from the additional development you need to do. An additional risk with building the cache files is that you now have potentially sensitive data stored in a cache file on disk, which opens up an entirely new area that you have to secure, protect, etc.

 

So my results after everything are essentially inconclusive. I'm interested in hearing additional real-world scenarios where you, the readers, can think of use cases for this component.

Microsoft Azure and Optimized VM images

One of the features I love about Microsoft Azure and using it for a POC is that many of the initial upfront "battles" you have to fight in an on-prem solution are already done for you. I'm going to be honest here: I really don't like having to explain for the 200th time why databases need to be on separate disks, with log files broken out and tempdb on its own. It's much easier to just go to Azure, create your machine, and start loading it up with data. It removes much of the complexity and time delay of trying to stand it up yourself. These images have existed in Azure for a while, but they recently got a much-needed revamp and are truly configured to best practices. Some of the great benefits of this machine, at a high level, are:

Instant File Initialization Enabled

15 total disks – 12 data disks in one data pool, 3 log disks in another separate log pool

Data warehouse image only: recovery model set to SIMPLE for the model database via ALTER DATABASE (shown below)
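For reference, that last item amounts to something like this one-liner, which you could also run yourself on a machine that wasn't built from the optimized image:

-- New databases inherit their recovery model from model,
-- so this makes SIMPLE the default for databases created afterwards.
ALTER DATABASE [model] SET RECOVERY SIMPLE;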

 

So, how do you take advantage of this awesome sauce, you ask? Don't worry, I hear you. The rest of today's post is all about the initial configuration and stand-up of one of these VMs.

Let's start by logging in to the Azure portal, selecting Virtual Machines from the category selection on the left-hand side, and selecting "Create a virtual machine".

image

On the New dialog page, select "From Gallery".

image

Then scroll down to the image you want to use. Here we are going to use the SQL Server 2014 enterprise optimized for data warehouse workloads image.

image

Then on the next page, we’ll configure the box by giving it a name, selecting the instance size, and creating a new user.

image

This next page is very important, as it's where we're going to configure external connectivity to the machine. Select "create a new cloud service" and give it a name. Select the region and your storage account. For the availability set, select whether you'd like this VM to be part of one. If you're going to have more than one machine (which in a data warehouse we typically ALWAYS do), this is a very important property. Essentially, availability sets guarantee redundancy in the event of a failure (think of that 99.95% uptime you keep hearing about in Azure marketing); the only way to get that guarantee is to use an availability set. Finally, down at the bottom are your endpoints. These are the connections that can be made into the machine; by default, RDP and PowerShell. I'm going to want to use my local SQL Server Management Studio to connect, so I'm going to add MSSQL to the endpoints list by selecting it from the dropdown.

image

On the last page, select whether you’d like to install any additional configurations on the machine. I’m going to leave mine on the default of just the VM agent, and select Finish.

image

Once you're done, the VM will begin its provisioning process:

image

Depending on the size and features you selected, this may take some time. Once it's done, we need to finish our setup so we can connect to our new awesome data warehouse from other SSMS instances. Remote into the machine by clicking Connect and logging in with the username/password combo you provided earlier. Once it's open, we need to open the firewall to allow connections between our machine and the Azure environment. Under Start, open Windows Firewall (wf.msc) and create a new inbound rule. It will be a Port rule, and the port is the one from the endpoint you specified during setup (1433 in this case).

image

Select Next, and then allow the connection. Click Next again and select whether it should apply to only domain, private, or public networks. Ideally, yours should be in a domain in your environment. For this post, I'm just going to select Public and click Next. Public opens it up to the internet, though, so you want to be really careful with this setting. Click Next, give it a name and description, and click Finish. Now, while still remoted into the box, open up SQL Server Management Studio and connect to the instance. Go to Properties:

image

Select the Security tab, change the server authentication mode to "SQL Server and Windows Authentication mode" instead of just Windows authentication, and click OK. Right-click on the server again and select Restart to restart the SQL Server service on the machine. Expand Security and create a new SQL login for a user. Now, back on your local machine, open up Management Studio. In the connection dialog, pass in the DNS name of your new Azure VM (myserver.cloudapp.net) and select SQL Server Authentication. The login will be the username/password combo you just created.
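If you'd rather script that last step, the login creation looks something like the sketch below; the login name and database are placeholders, so substitute your own names and use a strong password:

USE [master];
-- Create the SQL-authenticated login you'll connect with from your local SSMS.
CREATE LOGIN [dw_user] WITH PASSWORD = N'<StrongPasswordHere>', CHECK_POLICY = ON;

-- Grant it access to the database(s) it needs.
USE [AdventureWorksDW2012];
CREATE USER [dw_user] FOR LOGIN [dw_user];
ALTER ROLE [db_datareader] ADD MEMBER [dw_user];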

 

Awesome sauce!

SQL Server 2008 R2 SP3 and SQL Server 2008 SP4

Today Microsoft released SP3 for SQL Server 2008 R2 and SP4 for SQL Server 2008. If you're running either of these platforms, you'll want to go grab the installers to keep your environment current. As these products are both now in extended support, there will be no more CUs released for either of them. However, both of these service packs contain some important security updates. Details for SP4 for 2008 are available here, and details for SP3 for 2008 R2 are here.

Once you've read through them, go install the SPs! SP4 for 2008 can be downloaded here and SP3 for 2008 R2 can be downloaded here.

~Chris