Data Dictionary – Intelligent SQL

So if you’ve read my previous post (and hopefully used it :)), you’ll have a nice data dictionary you can use to query to provide to your business and other IT members to use when they have questions. Querying system DMV’s is great, but unfortunately it doesn’t have the capability to track some additional features that aren’t contained by default. (Source table, source column, description, etc.) Our answer for this lies in extended properties. For anyone who hasn’t used them before, extended properties are incredibly powerful self-created metadata. If you just want to add one extended property to a column, it’s pretty easy. Just call the sys.sp_addextendedproperty proc with the following parameters:

EXEC sys.sp_addextendedproperty
@name= ‘Name’,
@value = ‘text of what Im adding’,
@level0type=N’SCHEMA’,
@level0name = ‘SchemaName’,
@level1type=N’TABLE’,
@level1name=’TableName’,
@level2type=N’COLUMN’,
@level2name=’ColumnName’
This works great if you want to add just 1 property. However, how many of us actually get the luxury of only needing to track 1 extended property? I really wish Microsoft would make it easier to manager this across multiple extended properties (like a sp_addmultipleextendedproperties proc or something). But they don’t, so here we are. 🙂 In my shop, our requirements are to track 6 different pieces of data for a column. They are: source table, source column, column description, null value, sample values, and any associated business logic. The plan is to eventually expose the business logic as a master data management layer that they can use to custom set the rules as SSIS is pulling it in. So SSIS would pull that column from the view for the logic. With that many values, executing an addextendedproperties proc for each column 6 different times is, I feel, unrealistic, for a developer. It would be nice if we just had a row input like we do in Excel or a SQL table that we could use. 🙂 Which is where the table “DataDictionaryInserts” comes into play. Create a table (this is using our 6 different values) with the below script:

CREATE TABLE [dbo].[DataDictionaryInserts](
[RecordNumber] [int] IDENTITY(1,1) NOT NULL,
[TableName] [sysname] NOT NULL,
[ColumnName] [sysname] NULL,
[SourceTable] [sql_variant] NULL,
[SourceColumn] [sql_variant] NULL,
[ColumnDescription] [sql_variant] NULL,
[NullValue] [sql_variant] NULL,
[SampleValues] [sql_variant] NULL,
[BusinessLogic] [sql_variant] NULL
)

You’ll see that each of the 6 different values we want to track is normalized out as it’s own separate column. The only values we need to pass for addextenedproperties to be able to read it is table and column name (schema is looked up in a DMV and passed into the proc as it runs). This makes as little as is needed to actually be typed inserted, which makes us devs happy. 🙂
In our proc, we’re going to pivot these different values out to a denormalized approach and then use a cursor to loop through the result set and keep calling the proc. Put this on a nightly job, and you can see the advantages. It automatically runs at night and all of your inserts go. Normally I would advise against a cursor, but in this case our row sets are almost always really small so the performance impact is almost nonexistent. The longest I’ve ever seen it take to run is 2 seconds. We’re going to do 2 steps in our proc: the first will pivot out our inserts table, and the second will actually do the insert. The syntax for the pivot would be:

SELECT SchemaName,
TableName,
ColumnName,
Type,
Value
INTO #DataDictionaryValues
FROM
(SELECT s.Name AS ‘SchemaName’, d.TableName, d.ColumnName, d.SourceTable, d.SourceColumn, d.ColumnDescription, d.NullValue, d.SampleValues, d.BusinessLogic
FROM admin.DataDictionaryInserts d
JOIN sys.tables t ON
d.TableName=t.name
JOIN sys.schemas s ON
t.schema_id=s.schema_id
) p
UNPIVOT
(Value FOR Type IN
(SourceTable, SourceColumn, ColumnDescription, NullValue, SampleValues, BusinessLogic)
)AS unpvt;

And the syntax for our inserts would be:

DECLARE @schema VARCHAR(50);
DECLARE @table VARCHAR(50);
DECLARE @Column VARCHAR(50);
DECLARE @name VARCHAR(100);
DECLARE @value SQL_VARIANT

DECLARE c_ins CURSOR LOCAL FOR
SELECT DISTINCT SchemaName, TableName, ColumnName, Type, Value FROM #DataDictionaryValues

OPEN c_ins
FETCH NEXT FROM c_ins INTO @schema, @table, @Column, @name, @value

WHILE @@FETCH_STATUS = 0

BEGIN
IF EXISTS(SELECT * FROM sys.extended_properties ep
JOIN sys.columns c ON
c.object_id=ep.major_id
AND c.column_id=ep.minor_id
JOIN sys.tables t ON
t.object_id=c.object_id
WHERE c.name = @Column
AND t.name = @table
AND ep.name=@name)
BEGIN
EXEC sys.sp_dropextendedproperty @name = @name,
@level0type = N’SCHEMA’, — varchar(128)
@level0name = @schema, — sysname
@level1type = N’TABLE’, — varchar(128)
@level1name = @table, — sysname
@level2type = N’COLUMN’, — varchar(128)
@level2name = @Column — sysname
END

EXEC sys.sp_addextendedproperty
@name= @name,
@value = @value,
@level0type=N’SCHEMA’,
@level0name = @schema,
@level1type=N’TABLE’,
@level1name=@table,
@level2type=N’COLUMN’,
@level2name=@Column

DELETE FROM admin.DataDictionaryInserts
WHERE TableName = @table
AND ColumnName = @column

FETCH NEXT FROM c_ins INTO @schema, @table, @Column, @name, @value

END

CLOSE c_ins
DEALLOCATE c_ins

Now, we’ll also want to wrap this in a transaction, and declare a statement to drop our temp table in case it exists. So our complete proc/insert would look like:
CREATE PROCEDURE [dbo].[insert_ColumnProperty]
AS

SET XACT_ABORT ON

IF OBJECT_ID(‘tempdb..#DataDictionaryValues’) IS NOT NULL
BEGIN
DROP TABLE #DataDictionaryValues
END

BEGIN TRY
DECLARE @schema VARCHAR(50);
DECLARE @table VARCHAR(50);
DECLARE @Column VARCHAR(50);
DECLARE @name VARCHAR(100);
DECLARE @value SQL_VARIANT

DECLARE c_ins CURSOR LOCAL FOR
SELECT DISTINCT SchemaName, TableName, ColumnName, Type, Value FROM #DataDictionaryValues

OPEN c_ins
FETCH NEXT FROM c_ins INTO @schema, @table, @Column, @name, @value

WHILE @@FETCH_STATUS = 0

DELETE FROM admin.DataDictionaryInserts
WHERE TableName = @table
AND ColumnName = @column

FETCH NEXT FROM c_ins INTO @schema, @table, @Column, @name, @value

END

CLOSE c_ins
DEALLOCATE c_ins

END TRY

BEGIN CATCH
DECLARE @ErrMsg NVARCHAR(4000);
DECLARE @ErrSeverity INT;
DECLARE @ErrorState INT;

SELECT
@ErrMsg = ERROR_MESSAGE(),
@ErrSeverity = ERROR_SEVERITY(),
@ErrorState = ERROR_STATE();

RAISERROR(@ErrMsg, @ErrSeverity, @ErrorState);
END CATCH

Now we just need to add it into our view. Quick alter statement to query extended properties and join it to our other system DMV data points, and whoa-la! The semi-annoying part is we insert our extended properties in a normalized fashion, denormalize them to insert them, and then renormalize them on the other side. However a quick CTE with a pivot statements pivots our properties back out into columns.

ALTER VIEW [admin].[DataDictionary] AS

WITH CTEPivot
AS
(
SELECT
major_id,
minor_id,
[SourceTable] AS ‘SourceTable’,
[SourceColumn] AS ‘SourceColumn’,
[ColumnDescription] AS ‘ColumnDescription’,
[NullValue] AS ‘NullValue’,
[SampleValues] AS ‘SampleValues’,
[BusinessLogic] AS ‘BusinessLogic’

FROM
(SELECT
ep.major_id, ep.minor_id, ep.name,ep.value
FROM sys.extended_properties ep
WHERE minor_id <> 0) AS Query
PIVOT
(
MAX(Query.Value)
FOR
[name]
IN
([SourceTable],
[SourceColumn],
[ColumnDescription],
[NullValue],
[SampleValues],
[BusinessLogic])
)
AS pvt
)

SELECT
s.name AS ‘SchemaName’,
t.name AS ‘TableName’,
c.name AS ‘ColumnName’,
CASE
WHEN c.is_nullable = 0 THEN ‘No’
WHEN c.is_nullable = 1 THEN ‘Yes’
END AS ‘Nullable’,
st.name AS ‘DataType’,
c.max_length AS ‘Length’,
c.precision AS ‘Precision’,
c.scale AS ‘Scale’,
CASE
WHEN c.is_identity = 0 THEN ‘N’
WHEN c.Is_identity = 1 THEN ‘Y’
END AS ‘IdentityColumn’,
cp.SourceTable,
cp.SourceColumn,
cp.ColumnDescription,
cp.NullValue,
cp.SampleValues,
cp.BusinessLogic
FROM [sys].[tables] t WITH(NOLOCK)
LEFT JOIN sys.columns c WITH(NOLOCK) ON
t.object_id=c.object_id
LEFT JOIN sys.schemas s WITH(NOLOCK) ON
t.schema_id=s.schema_id
LEFT JOIN CTEPivot cp WITH(NOLOCK) ON
c.object_id=cp.major_id
AND c.column_id=cp.minor_id
LEFT JOIN [sys].[systypes] st WITH(NOLOCK) ON
c.system_type_id=st.xtype
WHERE s.NAME NOT LIKE ‘admin’

Sorry for the long post as well.

Hope it helps you in your travels. 🙂

Category: Data Dictionary

Enhancing the Data Dictionary