SQL Server 2016 CTP 2 Preview Now Available

Back at the beginning of May, Microsoft announced SQL Server 2016 and the massive number of new features coming with it. Combine that with all of the Power BI and Azure enhancements being released, and it is a great time to be a Microsoft BI professional. This afternoon the SQL Server team released the first publicly available preview of SQL Server 2016, as referenced in this blog post! The download is available here.

If you’ve forgotten all of the amazing new features that are scheduled for release with this version, be sure to check out the data sheet.

What are you waiting for? Download your copy today!


SQL Server 2016 is coming

Yesterday at the keynote for the Microsoft Ignite conference, Satya Nadella announced the next version of SQL Server, which will be available next year and in public preview later this summer. Rather than rehashing all the marketing mumbo jumbo that's been repeated elsewhere, I'd like to focus on the three biggest enhancements for BI and why you should already start thinking about upgrading to the next version.

R Integration directly into SQL Server

Much like we've long had the ability to write DMX queries against SSAS Multidimensional models, the ability to run R code directly inside SQL Server is going to open SQL Server up to a much broader user base and empower more people to analyze their data in different ways.
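As a rough sketch of where this is headed (based on how the R integration eventually surfaced in SQL Server 2016, and using a hypothetical dbo.FactSales table), an R script gets wrapped in a T-SQL stored procedure call:

EXEC sp_execute_external_script
@language = N'R',
@script = N'OutputDataSet <- data.frame(AvgSales = mean(InputDataSet$SalesAmount))',
@input_data_1 = N'SELECT SalesAmount FROM dbo.FactSales' -- dbo.FactSales is a placeholder
WITH RESULT SETS ((AvgSales FLOAT));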

Enhanced SSIS

A feature I overlooked the first time I read the data sheet yesterday: the ability to target a specific version of SSIS when you build, plus support for Power Query, is going to be some serious awesome-sauce. Chris Webb (b|t) has been showing us some awesome stuff that M can do, and giving SSIS the ability to capitalize on it will make for some very efficient ETL, especially when combined with the ability to pull data from cloud sources via Azure Data Factory. Being able to select which version of SSIS you want to build for is a feature that's long overdue, and as a consultant I couldn't be happier.

Polybase into SQL Server

This feature alone justifies the upgrade. As a consultant, I hear a lot of companies talking about Hadoop and what to do with their unstructured data. Should we invest in Hadoop? What are the big barriers to entry? One of the biggest has been cost, especially since to date PolyBase has only been available in APS (the Analytic Platform System). By putting PolyBase directly into SQL Server, Microsoft is really going to be able to step ahead of the entire market. Imagine being able to run this query:

SELECT SUM(f.SalesAmount) AS TotalSales, h.CustomerInfo
FROM MyFactTable f
JOIN HDFS.dbo.Customers h
ON f.CustomerKey = h.CustomerKey -- join key assumed for illustration
GROUP BY h.CustomerInfo

No more heavy lifting of writing MapReduce jobs and complicated logic to bring unstructured and structured data together for analysis. Now, it just works. This, combined with the Azure SQL Data Warehouse announced last week at Build, has me extremely excited about the rest of the year and 2016!
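For context, that query assumes the Hadoop data has already been exposed to the engine as an external table. A minimal sketch of the PolyBase setup in SQL Server 2016, with the data source name, file format, and HDFS location all being placeholders, looks like this:

CREATE EXTERNAL DATA SOURCE MyHadoopCluster
WITH (TYPE = HADOOP, LOCATION = 'hdfs://10.0.0.1:8020');

CREATE EXTERNAL FILE FORMAT CsvFormat
WITH (FORMAT_TYPE = DELIMITEDTEXT, FORMAT_OPTIONS (FIELD_TERMINATOR = ','));

-- run inside a database named HDFS, to match the three-part name used in the query above
CREATE EXTERNAL TABLE dbo.Customers
(
CustomerKey INT,
CustomerInfo NVARCHAR(200)
)
WITH (LOCATION = '/data/customers/', DATA_SOURCE = MyHadoopCluster, FILE_FORMAT = CsvFormat);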

The complete data sheet is available for download here.

Installing SQL 2012

Recently I needed to upgrade my CTP edition of SQL 2012 to a full install of the Developer edition. Since I had to uninstall the CTP to install Developer anyway, I thought now would be a good time to blog about the steps necessary to install it. Below are the steps that I took:

 

First, locate the installation media for SQL 2012 and run the SETUP.exe file. After it loads you will see a screen that looks like the one below:

 

[screenshot]

 

Click on the Installation tab on the left, and then select “New SQL Server Stand-Alone Installation or Add Features to an Existing Installation”

 

[screenshot]

 

After you click on it a new window will appear, where some rule checks will take place. Once they finish successfully click OK and you’ll proceed to the next screen:

[screenshot]

A few more minutes will pass and then you will get a new screen where you enter your product key. Enter your key, or specify a free edition to install if you don't have one. Click Next, accept the license terms, and choose whether you want to send usage data to Microsoft. Let the update wizard run and select to install the updates. The install setup screen will then run:

 

[screenshot]

Once it runs and extracts (this could take a few minutes depending on your updates), it will proceed to the Setup Support Rules wizard, where the rules will run and inform you of any issues. Assuming everything passes, click Next.

[screenshot]

 

Once you click Next, you'll be on the Setup Role screen. Select the option you want; in this instance, we want the SQL Server Feature Installation.

[screenshot]

The next screen is the fun one. Select the features you want to install. In my case, this install is for the regular development I do from my laptop and for quick demos when I need them. I've selected the features I want, and I'm ready to click Next. One really nice change in 2012 is that Master Data Services can now be included with the regular install; in 2008 R2 it was its own install file that you had to find in the setup files.

[screenshot]

Once you click Next, the Installation Rules wizard will run. Let it run and then click Next. The next page is the Instance Configuration page. Select whether you want the default instance or a named instance running on your machine. On my machine the default instance is already being used by 2008 R2, so I'm installing a named instance. Also note the Instance Root Directory: this is the default installation path for SQL Server. It is a Microsoft best practice to install it on a volume OTHER than your C: drive. And yes, I know I'm breaking that very rule here due to the options on my machine.

 

[screenshot]

 

Click Next, and the Disk Space Requirements wizard will run. Click Next. Configure your service accounts and click Next. It is considered a Microsoft best practice to use separate, least-privileged domain accounts to run each of the services needed; this prevents having one “super-account” that has permissions to do everything on your server. After you're done, click Next, and the Database Engine Configuration page will appear. Specify whether you want just Windows authentication or mixed mode, and then add a user to the administrator section. If you select mixed mode you'll need to specify a password for the SA account. On the Data Directories tab, create your installation paths (keeping in mind they should NOT be on the C drive).

 

Once you click Next, the Analysis Services Configuration screen will appear. Select whether you want to install Multidimensional or Tabular mode, and set the admin for the instance. On the Data Directories tab, change the path to something other than the C drive. If possible, this should also be on a drive separate from your database engine and tempdb; Analysis Services works best on its own drive. Click Next.

 

[screenshot]

 

On the Reporting Services Configuration page, select Install and Configure if you want it configured for you during installation, or choose to just do the install and configure it yourself later via the Reporting Services Configuration tool. Click Next, and check whether you'd like to send error reporting to Microsoft. Click Next again and allow the Installation Configuration Rules to run.

[screenshot]

Click Next, check the Summary to ensure everything you want installed is there, and click Install. Depending on the features you selected, it may take a while; with the features I selected above, it took about half an hour to get everything up and installed. If all goes well, your final screen should look like this:

[screenshot]
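Once setup completes, a quick query against the new instance confirms the version and edition that actually got installed:

SELECT
SERVERPROPERTY('ProductVersion') AS ProductVersion, -- 11.0.x for SQL Server 2012
SERVERPROPERTY('ProductLevel') AS ProductLevel, -- RTM, SPn, or CTP
SERVERPROPERTY('Edition') AS Edition, -- e.g. Developer Edition
SERVERPROPERTY('InstanceName') AS InstanceName; -- NULL for the default instance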

2012 In Review

So today is the last day of 2012. A day that some thought would never happen. Sorry, Mayans. 🙂 This day, for me, is always one of reflection and outlook. I always take this day and think about where I’ve come in the past year, and where I want to go in the year ahead. 2012 was a really busy year for me. I switched jobs (twice), but am incredibly happy with my decision. I originally went to a private company working with payment transactions and building a BI solution for them with millions of records a day. Then in late August Pragmatic Works called me and lured me into the world of consulting. Although it was a hard decision for me and my family, I have no regrets about switching. The last 3 months have gone by incredibly fast, and it’s a new challenge every day which keeps me at the top of my game in a wide variety of BI. It’s exciting, the travel is awesome, and I get to work with people all over the world, so I really get to feel like I’m making an impact for not just 1 company, but many. In the last 3 months, I’ve been to Tampa, Boston, and Oslo, Norway. Early in 2013 I’ll be headed to Eastern TN, then potentially to Manchester, England after that. It’s hard sometimes with small kids but I think as a family we’re finally starting to actually adjust.

 

In September, I spoke at my first SQL Saturday event here in Denver. I got some really good feedback on my SSIS 2008 Configurations presentation, and I've submitted to present it again at a couple of SQL Saturdays coming up in 2013.

 

I've also just finished writing my goals for 2013. Even with all of the technology available, the most effective way for me to track my annual goals is to write them on a piece of paper and put them on my desk. I get an enormous sense of satisfaction when I can physically use a pen and cross one of my goals off. But I'm also going to post them here, because 1) it's another outlet for me to look at them, and 2) I'm hoping some of you readers out there can give me some support when I need it. For those who follow the EPL, Liverpool's motto is “You'll never walk alone” and, although I'm a United fan, that sentiment applies to all walks of life, not just soccer *cough* football.

 

I really want to spend a lot of time in 2013 networking and getting to know the rest of the community better, too. Those who see me at the Denver UG know me, but outside of Denver I'm not really as well known, and I really want to get a lot more involved with the community. So here they are; it's a lofty list, but I think if I really buckle down I can do it:

 

In Q1, I'm going to really try and set the tone for the year. My goals are to take the 70-461, 70-462, and 70-463 exams to earn my MCSA certification, speak at 1 SQL Saturday (North Haven, CT in March), deliver 1 online webinar (SSIS 2012 Reporting in January), and speak at 1 UG (here in DEN or COS, hopefully :)). I'm also going to try and buckle down on some SharePoint 2013 BI work and understand it better.

 

In Q2, it's no rest for the weary. I want to take exams 70-466 and 70-467 to move up to the MCSE BI certification, and speak at 2 SQL Saturdays (Omaha and Phoenix, both in April). I also want to deliver another online webinar (maybe SSIS 2012 Reporting again, but I'm thinking I might do SSAS Tabular or Multidimensional next time), and speak at another user group, not DEN or COS, maybe a little farther out.

 

Q3 goals entail a focus change. I really feel that I can achieve something that is extremely difficult to achieve: SSAS Maestro. Now, I also don't feel like I could just show up and rattle it all off the top of my head, either. So my goals for Q3 are to speak at the SQL Saturday here in Denver in September, and to study for the SSAS Maestro program while I'm applying for acceptance. For anyone that doesn't know, to be an SSAS Maestro you have to be accepted to even take the test (http://www.microsoft.com/learning/en/us/sql-training.aspx).

 

In Q4 I hope to be accepted into the SSAS Maestro course and then attend. Apart from that, my only other goal is to attend the PASS Summit in October. I'm already registered, and this will be my first Summit. I'm super-excited for October already! 🙂

 

Overall, I think they're some difficult goals, especially to try and crunch them into a 1-year time span. But, hopefully, by this time next year I'll be able to add SSAS Maestro to my skill set. Wish me luck!

Types of BI

I recently had a client ask me to explain the different types of BI in SQL Server 2012, what each tool can be used for, and who the ideal target audience is. I thought this would make a great blog post, as I've heard this question passed around quite a bit in the community and there always seems to be some confusion. I've placed a quick description of each below to help. A really important one that I feel gets overlooked a lot is the concept of Team BI.

There are 3 distinct types of BI: Personal, Team, and Corporate.

Personal BI is designed for the individual. This is data that can be pulled together very quickly for consumption by an individual business user. It gives users flexible access to the data they know best and the ability to join multiple data sources together.

Team BI is the next phase in the evolution. Team BI is a platform designed for sharing amongst individual teams and departments, but not necessarily across the entire company; departmentalized reporting is another way to describe it. An individual building a personal BI report (in Excel or Power Pivot) and then sharing it with others by loading it into a SharePoint site (such as an Excel document library or Power Pivot gallery) is an example of Team BI.

Corporate BI drives the other 2. This is the gathering of the data, putting it in a data warehouse, building Analysis Services databases, and delivering the mechanism. It is the standard that drives all of the others. It is available to all users across the company, can handle huge workloads, and has a very large feature set. SSAS Multidimensional, SSAS Tabular, and PerformancePoint are all examples of Corporate BI. Notice how SSAS Multidimensional and SSAS Tabular are corporate tools, while the Excel front end is personal: the process comes full circle.

 

Next I’ll break down each of the tools and what they’re capable of.

Personal BI Solutions

 

Excel

Excel is the most widely used personal analytic tool in the world. As part of a BI implementation, Excel is an extremely powerful BI tool that can connect to Analysis Services for straight pivot table reporting on aspects of the business. Personal dashboards can be created in Excel across multiple tabs and scheduled to refresh when the workbook opens. In Excel 2013, there is also a Power View add-in that can be used to enhance the personal BI experience.

Target Audience: Supervisors/Managers

 

Power Pivot

Power Pivot is a tool Microsoft introduced in 2010 to handle very large datasets in memory. Power Pivot uses a compression technology called VertiPaq that can compress the data by up to 90% of the original source size, making it extremely useful for loading larger workbooks and for combining multiple datasets together. Power Pivot workbooks cannot be refreshed through Excel itself, only from within the Power Pivot window directly. This makes them personal in nature, as they can't be refreshed once shared.

Target Audience: Analysts/Finance/Accounting

 

Team BI Solutions

Power Pivot for SharePoint

Once a Power Pivot workbook has been built and published from a personal perspective, it can be shared amongst the team. Rather than emailing the workbook around, uploading it to SharePoint puts it in a controlled location where it can be shared amongst the team and scheduled to refresh on a nightly basis through an automatic process. As the workbook grows in use, IT can monitor it and then move it to an Analysis Services cube if necessary.

Target Audience: Departments

 

Power View

Power View is a self-service reporting tool that sits on top of a tabular database model and provides ad hoc reports for consumption. Power View reports can only be housed inside of a Power Pivot gallery within SharePoint.

Target Audience: Departments

 

Corporate BI Solutions

SSAS Tabular

Tabular SSAS databases are extremely fast, often faster than Multidimensional, can support many different data sources, take less time to build, and are more flexible in terms of database design. Additionally, in situations where a measure would otherwise require snapshots (e.g. quantity totals by day), Tabular is a better fit because it can avoid snapshots by making up-to-date calculations at query time.

Target Audience: Supervisors/managers (delivered via Excel or Power Pivot), analysts, finance, accounting

 

SSAS Multidimensional

SSAS Multidimensional uses MDX and pre-aggregates the data across all available dimensions ahead of browsing (at processing time). It also allows writeback, many-to-many relationships, data mining, rollups, translations, and custom drillthrough actions.

Target Audience: Supervisors/managers (delivered via Excel or Power Pivot), analysts, finance, accounting.

 

SSRS

SSRS is a corporate BI reporting tool that's great for list-driven reporting that just needs to be delivered. Reporting is two-dimensional and list-driven, with data presented in columns and rows. It is great for workflow reporting (giving someone a specific list of people to call, things to do, etc.).

Target Audience: data feeds, frontline workers (list consumers)

 

PPS

PerformancePoint Services is a dashboard/performance management interface that can be utilized within SharePoint to provide the insights needed from a company perspective without going into too much detail. It is great for performance management and can create rich visual displays for consumption.

Target Audience: High-level managers, C-Level executives

Sources, Credits, and just some good reference points, because I believe in promoting other awesome authors. 🙂

 

http://prologika.com/CS/blogs/blog/archive/2012/05/27/organizational-bi-vs-personal-bi.aspx

http://www.jamesserra.com/archive/2012/04/sql-server-2012-multidimensional-vs-tabular/

Business Intelligence Interview Questions

Hello all,

So in major news for me, I've recently accepted a senior consulting position at Pragmatic Works. I'm really excited and can't wait to get started on this new journey. As part of the transition, my current employer has requested that I come up with a list of interview questions for them to ask my successor. I'd like to post those questions here, as I believe they cover a wide breadth of knowledge across different levels of SSIS and SSAS. Additionally, as I believe in giving credit where credit is due, some of these questions I pulled from other blogs, and some answers I pulled straight out of BOL. The sources are all referenced at the bottom of this post.

For ease of reading, I’m going to break these up into sections:

SSIS

Beginner

1) In SSIS, what is the difference between the Control Flow and the Data Flow?

a. The control flow is used to orchestrate tasks in a set logical order, and the data flow is used to control how the data is extracted, transformed, and loaded.

2) What is a buffer?

a. A buffer is simply storage of data in memory. As SSIS runs, it stores the data in memory (buffers) and performs the requested transformations as data moves through the data flow pipeline. This is also what makes SSIS so fast at what it does: by alleviating the need to store the data physically, SSIS can take huge amounts of data and move it through very quickly, as it's all cached.

3) What’s a precedence constraint?

a. SSIS allows you to place as many tasks as you want in the control flow. You connect these tasks using connectors called precedence constraints. Precedence constraints allow you to define the logical sequence of tasks in the order they should be executed. You can also specify a condition to be evaluated before the next task in the flow is executed.

Intermediate

1) What are the 3 different cache modes for the Lookup transformation in SSIS?

a. Full

b. Partial

c. None

2) What are variables and what is variable scope?

a. A variable is used to store values. There are basically two types of variables: system variables (like ErrorCode, ErrorDescription, PackageName, etc.), whose values you can use but cannot change, and user variables, which you create, assign values to, and read as needed. A variable can hold a value of the data type you chose when you defined the variable.

b. Variables can have different scope depending on where they are defined. For example, you can have package-level variables which are accessible to all the tasks in the package, and there can also be container-level variables which are accessible only to those tasks that are within the container.

Advanced


1) What are the different SSIS log providers?

a. There are several log providers you can use to capture execution data generated by SSIS events (a sample query against the SQL Server provider's table follows this list):

i. SSIS log provider for Text files

ii. SSIS log provider for Windows Event Log

iii. SSIS log provider for XML files

iv. SSIS log provider for SQL Profiler

v. SSIS log provider for SQL Server, which writes the data to the msdb..sysdtslog90 or msdb..sysssislog table depending on the SQL Server version.
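For reference, the SQL Server log provider's output can be queried directly; a minimal sketch against the default msdb table (sysssislog on SQL Server 2008 and later):

SELECT TOP (50) event, source, starttime, endtime, message
FROM msdb.dbo.sysssislog
ORDER BY starttime DESC;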

2) How many different configuration types are there? What are they? What is the Microsoft recommended best practice for configuration in SSIS 2008?

a. There are 5 different ways to configure an SSIS package. They are:

i. XML Configuration File

ii. Environment Variable

iii. Registry Entry

iv. Parent Package

v. SQL Server

b. The Microsoft recommended best practice is to use an environment variable as an indirect (parent-level) configuration that points to a SQL Server configuration table holding the remainder of the connection strings. This is the most dynamic/flexible solution you can use in BIDS (SSIS) 2008.

SSAS

Beginner

1) In SSAS, what is the Data Source View?

a. A data source view (DSV) is an abstraction of a relational data source that becomes the basis of the cubes and dimensions you create in a multidimensional project

b. Analysis Services 2008 stores this data in a Unified Dimensional Model (UDM)

i. At a minimum, when the UDM is constructed merely as a thin layer over the data source, the advantages to the end user are a simpler, more readily understood model of the data, isolation from heterogeneous backend data sources, and improved performance for summary type queries. In some scenarios a simple UDM like this is constructed totally automatically. With greater investment in the construction of the UDM, additional benefits accrue from the richness of metadata that the model can provide.

2) What is a friendly name?

a. A friendly name is an alias for a column/table in the Analysis Services Data Source View (UDM)

3) In a cube, what are the different kinds of actions that are available?

a. Action, Reporting Action, Drillthrough Action

i. A reporting action can be used to link to an SSRS report

ii. A drillthrough action can be used to define the drillthrough that a user sees when viewing the data

iii. A regular action can be used in 5 different ways.

1. A dataset returns a Multidimensional Expressions (MDX) statement, representing a multidimensional dataset, to be run and displayed by the client application.

2. A proprietary action returns a proprietary string that can be interpreted by client applications associated with the Application setting for this action.

3. A Rowset action returns a Multidimensional Expressions (MDX) statement, representing a tabular rowset, to be run and displayed by the client application.

4. A statement action returns a command string to be run by the client application.

5. A URL action returns a Uniform Resource Location (URL) string to be opened by the client application, typically with an Internet browser.

Intermediate

1) What are attribute relationships?

a. Attributes within a dimension are always related either directly or indirectly to the key attribute. When you define a dimension based on a star schema, which is where all dimension attributes are derived from the same relational table, an attribute relationship is automatically defined between the key attribute and each non-key attribute of the dimension.

2) What is the difference between an aggregation and a partition?

a. A partition limits the cube space that has to be scanned, and an aggregation is a precalculated summary that improves query performance.

Advanced

1) In MDX, what is the ParallelPeriod function and what does it do?

a. The ParallelPeriod function returns a member from a prior period in the same relative position as a specified member (for example, June 2011 relative to June 2012).

2) What is the Microsoft recommended partition size?

a. 20 million members, or 250 MB (no, that's not a typo 🙂). There are some exceptions:

i. If you’re building a 1TB cube, the first rule takes precedence and you should exceed the 250 MB suggestion to keep the partition count down. In this case, sizing partitions between 500MB to 1 GB is reasonable.

ii. If records come in at the end of each day and history does not change you can get the quickest processing with daily partitions for recent data and monthly or annual partitions for historical data.

iii. If you have to reprocess a good portion of your cube on some regular basis, the size of partitions can be driven by how many can be processed in parallel keeping CPU and/or I/O usage as high as possible.

3) Is it possible to expose an analysis services cube through an application? How?

a. Part 1: Yes

b. Part 2: Use IIS and have Analysis Services expose the data through an HTTP pump, OR write custom .NET assemblies.

T-SQL

1) What is a Common Table Expression? How is it different from a subquery?

a. In general, a CTE can be used recursively and referenced multiple times in the same statement; a subquery cannot.
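A quick illustration of the recursive point (a minimal sketch, not part of the original question set):

-- Recursive CTE generating the numbers 1 through 10;
-- a plain subquery has no way to reference itself like this.
WITH Numbers AS
(
SELECT 1 AS n
UNION ALL
SELECT n + 1 FROM Numbers WHERE n < 10
)
SELECT n FROM Numbers;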

2) What is De-normalization?

a. De-normalization is the process of attempting to optimize the performance of a database by adding redundant data. It is sometimes necessary because current DBMSs implement the relational model poorly. A true relational DBMS would allow for a fully normalized database at the logical level, while providing physical storage of data that is tuned for high performance. De-normalization is a technique to move from higher to lower normal forms of database modeling in order to speed up database access.
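As a concrete (purely hypothetical) illustration, a denormalized reporting table might repeat customer attributes on every sales row so queries can skip the join:

CREATE TABLE dbo.SalesReporting
(
SalesOrderID INT NOT NULL,
OrderDate DATE NOT NULL,
CustomerID INT NOT NULL,
CustomerName NVARCHAR(100) NULL, -- redundant copy of a Customer table attribute
CustomerRegion NVARCHAR(50) NULL, -- redundant copy of a Customer table attribute
SalesAmount MONEY NOT NULL
);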

Sources:

http://www.mssqltips.com/sqlservertip/2485/ssis-interview-questions–part-1/

http://msdn.microsoft.com/en-us/library/ms345143%28v=SQL.90%29.aspx

http://blog.sqlauthority.com/2007/04/20/sql-server-interview-questions-part-6/

Enhancing the Data Dictionary

So if you've read my previous post (and hopefully used it :)), you'll have a nice data dictionary you can query and provide to the business and other IT members when they have questions. Querying system DMVs is great, but unfortunately they don't track some additional attributes that aren't captured by default (source table, source column, description, etc.). Our answer for this lies in extended properties. For anyone who hasn't used them before, extended properties are incredibly powerful, self-created metadata. If you just want to add one extended property to a column, it's pretty easy. Just call the sys.sp_addextendedproperty proc with the following parameters:

EXEC sys.sp_addextendedproperty
@name = 'Name',
@value = 'text of what Im adding',
@level0type = N'SCHEMA',
@level0name = 'SchemaName',
@level1type = N'TABLE',
@level1name = 'TableName',
@level2type = N'COLUMN',
@level2name = 'ColumnName'
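To confirm the property landed where you expected, you can read it back with the built-in sys.fn_listextendedproperty function (same placeholder names as above):

SELECT objtype, objname, name, value
FROM sys.fn_listextendedproperty('Name', 'SCHEMA', 'SchemaName', 'TABLE', 'TableName', 'COLUMN', 'ColumnName');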
This works great if you want to add just one property. However, how many of us actually get the luxury of only needing to track one extended property? I really wish Microsoft would make it easier to manage this across multiple extended properties (like an sp_addmultipleextendedproperties proc or something). But they don't, so here we are. 🙂 In my shop, our requirements are to track 6 different pieces of data for a column: source table, source column, column description, null value, sample values, and any associated business logic. The plan is to eventually expose the business logic as a master data management layer that can be used to custom-set the rules as SSIS pulls the data in, so SSIS would pull that column from the view for the logic. With that many values, executing an addextendedproperty proc for each column 6 different times is, I feel, unrealistic for a developer. It would be nice if we just had a row-based input like we do in Excel or a SQL table. 🙂 Which is where the table “DataDictionaryInserts” comes into play. Create the table (this is using our 6 different values) with the script below:

CREATE TABLE [admin].[DataDictionaryInserts](
[RecordNumber] [int] IDENTITY(1,1) NOT NULL,
[TableName] [sysname] NOT NULL,
[ColumnName] [sysname] NULL,
[SourceTable] [sql_variant] NULL,
[SourceColumn] [sql_variant] NULL,
[ColumnDescription] [sql_variant] NULL,
[NullValue] [sql_variant] NULL,
[SampleValues] [sql_variant] NULL,
[BusinessLogic] [sql_variant] NULL
)

You'll see that each of the 6 different values we want to track is normalized out as its own separate column. The only values we need to pass for sp_addextendedproperty to be able to read them are the table and column names (the schema is looked up in a DMV and passed into the proc as it runs). This keeps the amount of typing needed for each insert to a minimum, which makes us devs happy. 🙂
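A row in the inserts table might look like this (a purely hypothetical example; the table, column, and source names are placeholders):

INSERT INTO admin.DataDictionaryInserts
(TableName, ColumnName, SourceTable, SourceColumn, ColumnDescription, NullValue, SampleValues, BusinessLogic)
VALUES
('DimCustomer', 'FirstName', 'CRM.Contact', 'first_name',
'Customer first name', 'Unknown', 'John; Mary', 'Trimmed and proper-cased during the load');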
In our proc, we're going to unpivot these values into name/value pairs and then use a cursor to loop through the result set, calling the extended property proc for each one. Put this on a nightly job and you can see the advantages: it automatically runs at night and all of your inserts go in. Normally I would advise against a cursor, but in this case our row sets are almost always really small, so the performance impact is almost nonexistent; the longest I've ever seen it take to run is 2 seconds. We're going to do 2 steps in our proc: the first will unpivot our inserts table, and the second will actually do the insert. The syntax for the unpivot is:

SELECT SchemaName,
TableName,
ColumnName,
Type,
Value
INTO #DataDictionaryValues
FROM
(SELECT s.Name AS 'SchemaName', d.TableName, d.ColumnName, d.SourceTable, d.SourceColumn, d.ColumnDescription, d.NullValue, d.SampleValues, d.BusinessLogic
FROM admin.DataDictionaryInserts d
JOIN sys.tables t ON
d.TableName=t.name
JOIN sys.schemas s ON
t.schema_id=s.schema_id
) p
UNPIVOT
(Value FOR Type IN
(SourceTable, SourceColumn, ColumnDescription, NullValue, SampleValues, BusinessLogic)
)AS unpvt;

 

And the syntax for our inserts would be:

DECLARE @schema VARCHAR(50);
DECLARE @table VARCHAR(50);
DECLARE @Column VARCHAR(50);
DECLARE @name VARCHAR(100);
DECLARE @value SQL_VARIANT

DECLARE c_ins CURSOR LOCAL FOR
SELECT DISTINCT SchemaName, TableName, ColumnName, Type, Value FROM #DataDictionaryValues

OPEN c_ins
FETCH NEXT FROM c_ins INTO @schema, @table, @Column, @name, @value

WHILE @@FETCH_STATUS = 0

BEGIN
IF EXISTS(SELECT * FROM sys.extended_properties ep
JOIN sys.columns c ON
c.object_id=ep.major_id
AND c.column_id=ep.minor_id
JOIN sys.tables t ON
t.object_id=c.object_id
WHERE c.name = @Column
AND t.name = @table
AND ep.name=@name)
BEGIN
EXEC sys.sp_dropextendedproperty @name = @name,
@level0type = N'SCHEMA', -- varchar(128)
@level0name = @schema, -- sysname
@level1type = N'TABLE', -- varchar(128)
@level1name = @table, -- sysname
@level2type = N'COLUMN', -- varchar(128)
@level2name = @Column -- sysname
END

EXEC sys.sp_addextendedproperty
@name= @name,
@value = @value,
@level0type=N'SCHEMA',
@level0name = @schema,
@level1type=N'TABLE',
@level1name=@table,
@level2type=N'COLUMN',
@level2name=@Column

DELETE FROM admin.DataDictionaryInserts
WHERE TableName = @table
AND ColumnName = @column

FETCH NEXT FROM c_ins INTO @schema, @table, @Column, @name, @value

END

CLOSE c_ins
DEALLOCATE c_ins

Now we'll also want to wrap this in error handling (TRY/CATCH with XACT_ABORT ON), and add a statement to drop our temp table in case it already exists. So our complete proc would look like:
CREATE PROCEDURE [dbo].[insert_ColumnProperty]
AS

SET XACT_ABORT ON

IF OBJECT_ID('tempdb..#DataDictionaryValues') IS NOT NULL
BEGIN
DROP TABLE #DataDictionaryValues
END

SELECT SchemaName,
TableName,
ColumnName,
Type,
Value
INTO #DataDictionaryValues
FROM
(SELECT s.Name AS 'SchemaName', d.TableName, d.ColumnName, d.SourceTable, d.SourceColumn, d.ColumnDescription, d.NullValue, d.SampleValues, d.BusinessLogic
FROM admin.DataDictionaryInserts d
JOIN sys.tables t ON
d.TableName=t.name
JOIN sys.schemas s ON
t.schema_id=s.schema_id
) p
UNPIVOT
(Value FOR Type IN
(SourceTable, SourceColumn, ColumnDescription, NullValue, SampleValues, BusinessLogic)
)AS unpvt;

BEGIN TRY
DECLARE @schema VARCHAR(50);
DECLARE @table VARCHAR(50);
DECLARE @Column VARCHAR(50);
DECLARE @name VARCHAR(100);
DECLARE @value SQL_VARIANT

DECLARE c_ins CURSOR LOCAL FOR
SELECT DISTINCT SchemaName, TableName, ColumnName, Type, Value FROM #DataDictionaryValues

OPEN c_ins
FETCH NEXT FROM c_ins INTO @schema, @table, @Column, @name, @value

WHILE @@FETCH_STATUS = 0

BEGIN
IF EXISTS(SELECT * FROM sys.extended_properties ep
JOIN sys.columns c ON
c.object_id=ep.major_id
AND c.column_id=ep.minor_id
JOIN sys.tables t ON
t.object_id=c.object_id
WHERE c.name = @Column
AND t.name = @table
AND ep.name=@name)
BEGIN
EXEC sys.sp_dropextendedproperty @name = @name,
@level0type = N'SCHEMA', -- varchar(128)
@level0name = @schema, -- sysname
@level1type = N'TABLE', -- varchar(128)
@level1name = @table, -- sysname
@level2type = N'COLUMN', -- varchar(128)
@level2name = @Column -- sysname
END

EXEC sys.sp_addextendedproperty
@name= @name,
@value = @value,
@level0type=N'SCHEMA',
@level0name = @schema,
@level1type=N'TABLE',
@level1name=@table,
@level2type=N'COLUMN',
@level2name=@Column

DELETE FROM admin.DataDictionaryInserts
WHERE TableName = @table
AND ColumnName = @column

FETCH NEXT FROM c_ins INTO @schema, @table, @Column, @name, @value

END

CLOSE c_ins
DEALLOCATE c_ins

END TRY

BEGIN CATCH
DECLARE @ErrMsg NVARCHAR(4000);
DECLARE @ErrSeverity INT;
DECLARE @ErrorState INT;

SELECT
@ErrMsg = ERROR_MESSAGE(),
@ErrSeverity = ERROR_SEVERITY(),
@ErrorState = ERROR_STATE();

RAISERROR(@ErrMsg, @ErrSeverity, @ErrorState);
END CATCH
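With the proc in place, the nightly job step mentioned earlier is just a single call (using the name from the script above):

EXEC dbo.insert_ColumnProperty;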

 

 

Now we just need to add it into our view. A quick ALTER statement to query the extended properties and join them to our other system DMV data points, and voila! The semi-annoying part is that we enter our properties as columns, unpivot them into name/value pairs to do the inserts, and then pivot them back out into columns on the other side. A quick CTE with a PIVOT statement handles that last step.

ALTER VIEW [admin].[DataDictionary] AS

WITH CTEPivot
AS
(
SELECT
major_id,
minor_id,
[SourceTable] AS 'SourceTable',
[SourceColumn] AS 'SourceColumn',
[ColumnDescription] AS 'ColumnDescription',
[NullValue] AS 'NullValue',
[SampleValues] AS 'SampleValues',
[BusinessLogic] AS 'BusinessLogic'

FROM
(SELECT
ep.major_id, ep.minor_id, ep.name,ep.value
FROM sys.extended_properties ep
WHERE minor_id <> 0) AS Query
PIVOT
(
MAX(Query.Value)
FOR
[name]
IN
([SourceTable],
[SourceColumn],
[ColumnDescription],
[NullValue],
[SampleValues],
[BusinessLogic])
)
AS pvt
)

SELECT
s.name AS 'SchemaName',
t.name AS 'TableName',
c.name AS 'ColumnName',
CASE
WHEN c.is_nullable = 0 THEN 'No'
WHEN c.is_nullable = 1 THEN 'Yes'
END AS 'Nullable',
st.name AS 'DataType',
c.max_length AS 'Length',
c.precision AS 'Precision',
c.scale AS 'Scale',
CASE
WHEN c.is_identity = 0 THEN 'N'
WHEN c.is_identity = 1 THEN 'Y'
END AS 'IdentityColumn',
cp.SourceTable,
cp.SourceColumn,
cp.ColumnDescription,
cp.NullValue,
cp.SampleValues,
cp.BusinessLogic
FROM [sys].[tables] t WITH(NOLOCK)
LEFT JOIN sys.columns c WITH(NOLOCK) ON
t.object_id=c.object_id
LEFT JOIN sys.schemas s WITH(NOLOCK) ON
t.schema_id=s.schema_id
LEFT JOIN CTEPivot cp WITH(NOLOCK) ON
c.object_id=cp.major_id
AND c.column_id=cp.minor_id
LEFT JOIN [sys].[systypes] st WITH(NOLOCK) ON
c.system_type_id=st.xtype
WHERE s.name NOT LIKE 'admin'

GO
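And with that, anyone with read access can query the dictionary directly, for example (using a hypothetical table name):

SELECT SchemaName, TableName, ColumnName, DataType, ColumnDescription, SourceTable, SourceColumn
FROM admin.DataDictionary
WHERE TableName = 'DimCustomer' -- hypothetical table name
ORDER BY ColumnName;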

 

 

 

 

Sorry for the long post as well.

 

Hope it helps you in your travels. 🙂