So in major news for me I’ve recently accepted a senior consulting position at Pragmatic Works. I’m really excited, and can’t wait to get started as I start this new journey. As a part of this transition, my current employer has requested that I come up with a list of interview questions for them to ask my successor. I would like to take a moment to post these questions here, as I believe they cover a wide width and breadth of knowledge over different levels of SSIS and SSAS. Additionally, as I believe in giving credit where credit is due, some of these questions I pulled from other blogs, and some I pulled the answer straight out of BOL. The sources for these are all referenced at the bottom of this post.
For ease of reading, I’m going to break these up into sections:
1) In SSIS, what is the difference between the Control Flow and the Data Flow?
a. The control flow is used to control the flow of components in a set logical order and the data flow is used to control what the data is used
2) What is a buffer?
a. A buffer is simply storing the data in memory. As SSIS runs, it stores the data in memory (buffers) and performs the requested transformations as data moves through the data flow pipeline. This is also what makes SSIS so fast at what it does. By alleviating the need to store the data physically, SSIS can take huge amounts of data and move it through very quickly, as it’s all been cached
3) What’s a precedence constraint?
a. SSIS allows you to place as many as tasks you want to be placed in control flow. You can connect all these tasks using connectors called Precedence Constraints. Precedence Constraints allow you to define the logical sequence of tasks in the order they should be executed. You can also specify a condition to be evaluated before the next task in the flow is executed.
1) What are the 3 different cache modes in SSIS?
2) What are variables and what is variable scope?
a. A variable is used to store values. There are basically two types of variables, System Variable (like ErrorCode, ErrorDescription, PackageName etc) whose values you can use but cannot change and User Variable which you create, assign values and read as needed. A variable can hold a value of the data type you have chosen when you defined the variable.
b. Variables can have a different scope depending on where it was defined. For example you can have package level variables which are accessible to all the tasks in the package and there could also be container level variables which are accessible only to those tasks that are within the container.
1) What are the 3 different cache modes in SSIS?
2) What are the different SSIS log providers?
a. There are several places where you can log execution data generated by an SSIS event log:
i. SSIS log provider for Text files
ii. SSIS log provider for Windows Event Log
iii. SSIS log provider for XML files
iv. SSIS log provider for SQL Profiler
v. SSIS log provider for SQL Server, which writes the data to the msdb..sysdtslog90 or msdb..sysssislog table depending on the SQL Server version.
3) How many different configuration properties are there? What are they? What is the Microsoft recommended best practice for configuration in SSIS 2008?
a. There are 5 different ways to configure an SSIS package. They are:
i. XML Configuration File
ii. Environment Variable
iii. Registry Entry
iv. Parent Package
v. SQL Server
b. The Microsoft recommended best practice is to use a parent environment variable and point to a SQL server table for the remainder of the connection strings. This is the most dynamic/flexible solution that you can use in BIDS (SSIS) 2008.
1) In SSAS, what is the Data Source View?
a. A data source view (DSV) is an abstraction of a relational data source that becomes the basis of the cubes and dimensions you create in a multidimensional project
b. Analysis Services 2008 stores this data in a Unified Dimensional Model (UDM)
i. At a minimum, when the UDM is constructed merely as a thin layer over the data source, the advantages to the end user are a simpler, more readily understood model of the data, isolation from heterogeneous backend data
sources, and improved performance for summary type queries. In some scenarios a simple UDM like this is constructed totally automatically. With greater investment in the construction of the UDM, additional benefits accrue from the richness of metadata that the model can provide.
2) What is a friendly name?
a. A friendly name is an alias for a column/table in the Analysis Services Data Source View (UDM)
3) In a cube, what are the different kinds of actions that are available?
a. Action, Reporting Action, Drillthrough Action
i. A reporting action can be used to link to an SSRS report
ii. A drillthrough action can be used to define the drillthrough that a user sees when viewing the data
iii. A regular action can be used in 5 different ways.
1. A dataset returns a Multidimensional Expressions (MDX) statement, representing a multidimensional dataset, to be run and displayed by the client application.
2. A proprietary action returns a proprietary string that can be interpreted by client applications associated with the Application setting for this action.
3. A Rowset action returns a Multidimensional Expressions (MDX) statement, representing a tabular rowset, to be run and displayed by the client application.
4. A statement action returns a command string to be run by the client application.
5. A URL action returns a Uniform Resource Location (URL) string to be opened by the client application, typically with an Internet browser.
1) What are attribute relationships?
a. Attributes within a dimension are always related either directly or indirectly to the key attribute. When you define a dimension based on a star schema, which is where all dimension attributes are derived from the same relational table, an attribute relationship is automatically defined between the key attribute and each non-key attribute of the dimension.
2) What is the difference between an aggregation and a partition?
a. A partition limits the cube space and a an aggregation improves performance
1) In MDX, what is the ParallelPeriod function and what does it do?
a. The ParallelPeriod functions returns a member from a prior period in the same relative position as a specified member. (For example, June 2011 in the month of June 2012)
2) What is the Microsoft recommended partition size?
a. 20 Million members, or 250 MB (no, that’s not a typo J) There are some exceptions:
i. If you’re building a 1TB cube, the first rule takes precedence and you should exceed the 250 MB suggestion to keep the partition count down. In this case, sizing partitions between 500MB to 1 GB is reasonable.
ii. If records come in at the end of each day and history does not change you can get the quickest processing with daily partitions for recent data and monthly or annual partitions for historical data.
iii. If you have to reprocess a good portion of your cube on some regular basis, the size of partitions can be driven by how many can be processed in parallel keeping CPU and/or I/O usage as high as possible.
3) Is it possible to expose an analysis services cube through an application? How?
a. Part 1: Yes
b. Part 2: Use IIS and have Analysis Services use an HTTP pump to show the data out OR you can write custom .NET assemblies.
1) What is a Common Table Expression? How is it different from a subquery?
a. In general, a CTE can be used recursively, a sub-query cannot.
2) What is De-normalization?
a. De-normalization is the process of attempting to optimize the performance of a database by adding redundant data. It is sometimes necessary because current DBMSs implement the relational model poorly. A true relational DBMS would allow for a fully normalized database at the logical level, while providing physical storage of data that is tuned for high performance. De-normalization is a technique to move from higher to lower normal forms of database modeling in order to speed up database access.