Sponsorship of SQL Saturday #79, What a Great Event!

First, let me just say that I love SQL Server.  We’ve had an intimate and steamy relationship since 1999, version 7.0 that has gone on behind my wife’s back during the middle of many nights that lasted into the hours of many mornings.  I started out in the Data Warehouse Business as a developer using Oracle 7.2 and Informatica for ETL. Back in the mid 90s there simply wasn’t a comparable product from Microsoft, SQL Server was known to be for small databases and DTS was a far cry from a mature ETL tool.  Most clients we dealt with would say you can’t say SQL without SQL Sucks.  I could but anyway the point I am trying to make is no one would dare create an enterprise data warehouse with SQL Server, boy did that change quickly.   From 1999-2000 I have almost exclusively used the Microsoft SQL Server DB engine for reporting databases, operational data stores, and data warehouses.  When I was learning SQL Server, the community for support and questions was not nearly as large and easy to communicate with as today. Twitter, and too many great sites and blogs to mention here were non existent, especially in the middle of night when I learned everything from books by Kalen Delaney (BlogLinkedIn) or Rob Volk (BlogLinkedIn | Twitter) from SQLTeam.com.  I couldn’t think of a better event to sponsor to give back to the community and technology that I have grown to love over the last 12 years.

My first SQL Saturday amazed me to say the least.  Not only was it organized very well by Scott Klein (LinkedInTwitter), Herve Roggero and all the other great folks from the South Florida SQL Server Users Group (SFSSUG) (Twitter | Web) and many volunteers from the South Florida SQL community, but the speakers were phenomenal. I learned more in one day about Azure and Denali that I would have in three months reading on my own.  As a sponsor we had a table in a room and got to introduce the speakers, myself, and what TekPartners BI Solutions is all about and it was great.

Me, Luis, and Andre

I had a really good turnout for my educational lunchtime session on what Business Intelligence means, how organizations use it, and what tools in the Microsoft BI stack are used for each layer.  My good friend who is also a great BI Architect Luis Figueroa (Blog | LinkedIn | Twitter) told me I should start presenting at events in the future and am definitely thinking about it.

The speakers in our room had some very informative sessions to say the least. David Cobb (LinkedIn | Twitter) had a great session on Hyper V and SQL Server Clustering, Michael Antonovich (Blog | LinkedIn ) had a phenomenal session on Powerpivot, Sharepoint, and SSAS (All in 1 hour btw even with a power failure and restart of his laptop), Herve Roggero Azure MVP (LinkedInTwitter) had a really informative session on SQL Azure. It covered how to import data into the cloud and a tool he developed to backup and restore SQL Azure databases.  Geoff Hiten (Blog | LinkedIn | Twitter) had a very advanced session on SQL Denali Always On and was blown away by the information he covered and how far mirroring has come.  Seems like yesterday when Mirroring had just been released and we were saying I’m not trying it, you try it for DR.

All in all a great event and feel very proud that we sponsored it and gave a little back to the SQL community that has provided me a very good career and life for the past 12+ years. I am going to make it a point to get more involved in the SQL Server community going forward.

Posted in Events | Tagged , , , , , | Leave a comment

PM Tools Session – South Florida PMI-IIBA Event

I had a great time presenting “Business Intelligence with a Hint of Scrum” for the PM Tools session at the South Florida PMI – IIBA event last thursday August 11, 2011. Thank you for the great turnout and never thought it would be a standing room only session!  I will recap for everyone that wasn’t there what was covered and feel free to download the presentation Business Intelligence with a Hint of Scrum.

I began the session introducing myself and my background on Business Intelligence and Data Warehousing, a little about TekPartners (www.tekpartners.com) and how we incorporate Scrum into every Business Intelligence project we undertake.  I then gave an overview of Business Intelligence and the major technology areas that it encompasses: Dashboards, Scorecards, KPIs, Self Service BI, BI in the cloud, Operational Data Stores (ODS), Data Warehouses, and OLAP.   Then proceeded with an overview of Scrum and the most common terms that are part of every scrum project: Product Backlog, Release Backlog, Release, Sprint, Burndown Chart, Scrum Master, Features, User Stories, Product Owner, and a few others.  Introduced a tool that we have used on projects called Axosoft Ontime (www.axosoft.com) that has scrum functionality integrated into its core.  We switched gears a bit and jumped into the tool for a demo and showed the audience how powerful and easy it is to use.  Demonstrated the creation of a Project, Product Backlog, Release Backlog (Features), Sprints (Dates and Features), Burndown Charts showing Project Velocity and Estimated Time to Completion, Due Date, Features (updating them), its other email tracking features, and easy reporting on resources, workload, features completed etc. right out of the box.

I really enjoyed presenting this session and hope it was educational for everyone that attended.  Thank you for all the positive feedback I received so far I really appreciate it. Any questions about the content or on the tool, please don’t hesitate to email me at asammartino@tekpartners.com.

Posted in BI Project Management | Tagged , , , , , , , , , , | Leave a comment

I am a Project Management Professional

I recently passed the Project Management Institute’s Project Management Professional Exam and it was not easy.  When I decided to pursue this credential I did not think it would be as difficult as it turned out to be.  The Project Management Book of Knowledge (PMBOK), which is what the exam is based on for the most part is approximately 400 pages and my first pass at reading this put me to sleep literally after about 80 pages.  The PMBOK defines and goes into great detail how Projects should be done according to the Project Management Institute.  The last part of that sentence is very important because the way you have managed projects for however many years you have, will differ from PMI’s guidelines. Unfortunately they only test you on the way they view how projects should be done and you need to immediately separate yourself from your experience to a certain extent.  This can be very challenging to do and found myself arguing with myself about certain points on more than one occasion, which will do you no good to pass the exam.

With a toddler at home and running a BI practice full time, studying in my spare time was not working out well.  I decided to enroll in a training class from the Project Management Training Institute (http://www.4pmti.com), and recommend it as it was an eye opening experience to the breadth and depth of the material covered in the exam.  It was 4 days of 10 hours per day with an hour and a half of homework each night and covered the ~20 Math Formulas, 5 Process Groups, 9 Knowledge Areas, and all 42 Processes that have approximately 3-7 Inputs, and 3-7 Outputs each, which essentially make up most of the material on the exam. Not only do you have to know all of that, you have to understand it all very well. You’ll notice I said “most of the material”. This is because there are other topics covered on the exam that test your experience that are not covered in the PMBOK that make sure you are qualified.

All in all it makes me feel very proud to have this credential as I learned a tremendous amount of things that will help me manage BI projects better, and that was really the goal from the beginning.  If you embark on this journey to achieve the PMP® credential I encourage you to prepare well because the 200 question – 4 hour exam does a very good job of making sure you really know all the material.

Posted in BI Project Management | Tagged , , , , , , | Leave a comment

Expert…Really?

You ever talk to someone and they claim to be an expert in a certain technology and are clearly not, not even close?  Further they have the audacity to try to convince you that you don’t know what you are talking about when in fact you have done the particular thing they are struggling to accomplish several times because they thought they read otherwise somewhere they can’t remember.  Its highly annoying and frustrating, but it happens more than people care to share.  I’m sharing :-)

If you’ve published a book on a particular technology, you’re an expert.  If you have an MVP next to your name for a particular technology, you’re an expert.  If you have a number of years (5+ at least) of experience working with that technology and written blogs or white papers and known in the technology community for knowing your stuff, you are an expert.

BUT, if you have 1-1.5 years working with a technology that is new to you for a project, you are not an expert. It is quite amusing to claim you are even if you read MVP blogs and remember bits and pieces of things without understanding what they mean.

I would never tell anyone I am an expert with the Microsoft SQL Server RDBMS, Integrations Services (SSIS), or Analysis Services (SSAS) even though I have breathed these technologies for the last 10+ years.  I would say “I have a lot of experience with these technologies” to anyone that asked.  I can setup an active – active cluster attached to a high performing SAN that replicates to 10 subscribers using a separate distributor in my sleep.  I have written SSIS packages to perform ETL using ForEachLoops with containers and slowly changing dimension tasks and have created Analysis Services cubes with advanced calculations and dynamic partitions more than a few times but you won’t hear me say I am an expert.

Maybe its just how I am because I still learn new things with those technologies that make me say wow, didn’t know that existed and its very cool.

Please do everyone a favor and ease up on the expert claims if you are not, and come to terms with its okay to be experienced and still learning like most of us who work with these technologies.

Posted in General | Leave a comment

That Creepy Little Scope

I wanted to write this around Halloween as the title would have been more fitting, also thought about changing it to the “Grinch Who Stole the BI Project Delivery Date”, but didn’t really like it even though it is funny, or not so funny when it happens to your project.

If you have ever been on a Business Intelligence Project or any Information Technology project for that matter it is no secret that scope creep is a killer and can send any project into a tailspin.  Some of you might not agree with my next statement about who is responsible to ensure this does not happen, which is okay. I welcome any healthy debate on the subject.  The person responsible is the Project Manager.

One of the many duties of a Project Manager is to ensure that the delivery date of a BI project is met.  For this to happen a project plan must be crafted with milestones that collectively add towards meeting the delivery date with some buffer room because things happen. The following milestones are examples and should not be followed verbatim. One milestone (first) is to have a fully defined scope that contains business requirements from which technical specifications (second) can be created for the engineers to architect or design (third), build (fourth), then test and verify (fifth) a solution.  So when the VP of Marketing etc. who is paying for the project out of his budget comes to you when you’re near the third or fourth milestone and says “you know…the marketing spend per product measure we talked about two months ago, we need to add this that and the other because its not really complete how we defined it so please do that and add these additional figures” (via email).  This very situation has happened to me more than once and sometimes you are lucky to get an email and not just a phone call.  This is how it usually starts and then immediately you go to the team and ask if we can do this because the VP said it needs to be done…and so on and so forth.  The best way to handle this situation is to document the exact conversation of what is desired, get most of questions out of the way, then have a meeting with your project team and explain the situation. Have the team provide an honest assessment with some buffer room and then communicate back to the VP the detailed feedback from your team.  I am not saying he or she will be happy if you tell them the team needs an additional three weeks, but they will understand if you communicate the facts to them and you just did your job of managing the project.  Just saying yes has the potential to blowup your well oiled plan to deliver on time.  Also, be creative and explain that three weeks can be added to the plan but that will delay the delivery date, or we can circle back and address this in an additional phase after the initial delivery date.

There are also some things you can do to mitigate scope creep.  Keep the time between scope and rollout as short as possible with smaller more iterative subject area releases. The longer it takes for your team to deliver a solution, the more of a chance you are exposing your project to business changes that were valid six months ago but not today.  I have had conversations with business folks that say it all needs to be released at the same time as it is all very important.  When you discuss the possibility of the business changing if you wait six months, and that you have determined that their business measures really can be segregated into four areas that can be released separately, they will see what you mean. They want the solution earlier rather than later in most cases and in my experience it is not all (collectively) critical, rather certain pieces are more than others. Here is another area where you just managed the project successfully.

When I go into clients to assess a project that is in trouble it usually has some scope creep that happened once or multiple times.  You can throw more resources at a project to increase the time to delivery but there still needs to be some re-scope or it has the potential to be another source of scope creep. Your original plan was probably devised for your planned resources (if done correctly) and now tasks have to be assigned to the new resources to make it useful. Don’t forget it is going to take the new resources some time to get up to speed.  I hope this helps those out who manage BI Projects or IT Projects in general.

Happy Holidays as well.  I want to write another blog entry before the Holidays get here but who am I kidding this one was supposed to be done for Halloween :-)

Posted in BI Project Management | Tagged , , , , , , , , | Leave a comment

Is it a Mirror or an ODS?

You ever have one of those conversations that makes you say, huh thats a strange coincidence?  I figured I would blog this conversation I had about what the differences are between mirrored databases and an ODS.  This might seem like, come on whats this guy talking about its obvious, yep I thought the same thing until a client asked me to explain further and we got into it with a surprise.

So lets define each:

  • Mirror – Exact duplicate copy of your transactional (OLTP) source databases. Not to be confused with SQL Server Mirroring technology.
  • ODS (Operational Data Store) – Referencing Ralph Kimball here, an interim area between the transactional (OLTP) source systems and the data mart or warehouse that has one or more data sources and combines subject areas to avoid duplication. The ODS is transactional data and does not store history and is traditionally more up to date, meaning the data latency from the source systems is smaller and very close to the transactional (OLTP) source databases.

My client said we have five different databases that do not use the same data.  But, they can all be tied together as one database captures transactions that feed another database for a different portion of our business, then that feeds another, and so on and so forth.  If we don’t combine the sources is it really truly an ODS or is it just a mirror?  I had a good long pause on this one and then smiled and said yes.  My client said what do you mean yes, which one is it?  I said pick one, your database systems are inherently designed in a way that if mirrored you essentially have an ODS.  The databases don’t have any duplicate data elements in different structures, and are all contained in one location (server in this case). They can be queried with the data combined easily so you have an ODS but really all it took to get you one was to build a mirror of your five databases.   I also explained that they have a rare case as usually data is duplicated in different systems with a twist on it (more or different attributes, and data), and some transformation or cleansing is usually required.

So, my point is that because you mirror your transactional (OLTP) source systems doesn’t mean you have an ODS, but in rare instances like this one you may.  Jen Stirrup has a really good blog post about SQL Server Denali and the place of the ODS and gets into a lot of ODS detail.

Posted in Database | Tagged , , , , , , , , , , | Leave a comment

How Big Is Your Data Warehouse?

I ask this question at every client that hires me to enhance their Microsoft SQL Server Business Intelligence solution. It seems like a pretty easy and straight forward question right?  You would be surprised at how many companies or BI teams within companies struggle to answer this question easily or have up to date information.  The answer usually requires a DBA to find, copy, or manually create an excel sheet containing the data file information, then use formulas to sum it all up.  There is nothing wrong with this approach but it is time consuming.

There are usually several data marts or a warehouse that is physically split up over many different databases, and it can take a few days to figure this out if there are many data files.  Another question I ask is how full are your data marts or warehouse? You would be surprised at how many companies or BI teams struggle with this information or just don’t know that it is important.  This info can help with capacity planning and performance as well.  If your data files have to grow a few times a week to meet increasing data needs you are taking a performance hit each time the data file has to grow.

To simplify these questions, I wrote a script a few years ago that make the answers easy to attain.  Disclaimer: I gathered some of the guts of the #datainfo T-SQL from a SQL Server forum a few years ago. Credit goes to the author of it as it got me thinking to create this script and his / her thought process was very cool.

I hope this script helps those out that struggle with these questions. This script can obviously be used to answer any database space question or modified to suit your needs rather easily.

--This script will gather all the database sizes and log file sizes on a SQL server rolled up.

set nocount on

create table #datainfo (

[databasename] varchar(250),

[datafile] varchar(250),

[totaldbspace(mb)] numeric(10,4),

[useddbspace(mb)] numeric(10,4),

[freedbspace(mb)] numeric(10,4))

create table #loginfo (

[databasename] varchar(150),

[totallogspace(mb)] numeric(10,4),

[logspaceused(%)] numeric(10,4),

[status] int)

insert into #loginfo

exec ('dbcc sqlperf(logspace) with no_infomsgs')

select database_id, name

into #dbs

from sys.databases

declare @dbname varchar(250)

declare @dbid int

declare @sql varchar(5000)

set @dbid = 1

while @dbid <= (select max(database_id) from #dbs)

begin

select @dbname = name

from #dbs

where database_id = @dbid

set @sql = 'use '+@dbname+'

create table #showfilestats (

fileid int,

filegroup int,

totalextents float,

usedextents float,

name varchar(150),

filename varchar(150))

create table #helpfile (

name varchar(150),

fileid int,

filename varchar(500),

filegroup varchar(100),

size varchar(150),

maxsize varchar(150),

growth varchar(150),

usage varchar(100))

insert into #helpfile

exec (''sp_helpfile'')

insert into #showfilestats

exec (''dbcc showfilestats with no_infomsgs'')

insert into #datainfo

select '''+@dbname+''',

h.name,

(totalextents*64/1024),

(usedextents*64/1024),

((totalextents - usedextents)*64/1024)

from #showfilestats s

join #helpfile h on h.fileid = s.fileid

drop table #showfilestats

drop table #helpfile'

exec (@sql)

set @dbid = @dbid + 1

end

--This can get you all the raw data if you don't want to sum
--it up and see individual files, etc.

--select *

--from #datainfo d

--join #loginfo l on l.[DatabaseName]=d.[DatabaseName]

select d.DatabaseName as DatabaseName,

SUM(d.[TotalDBSpace(MB)]) as [TotalDBSpace(MB)],

SUM(d.[UsedDBSpace(MB)]) as [UsedDBSpace(MB)],

SUM(d.[FreeDBSpace(MB)]) as [FreeDBSpace(MB)],

SUM(l.[TotalLogSpace(MB)]) as [TotalLogSpace(MB)],

SUM(cast(l.[LogSpaceUsed(%)]*l.[TotalLogSpace(MB)] as numeric(10,4))) as [UsedLogSpace(MB)],

SUM(cast(l.[TotalLogSpace(MB)] - (l.[LogSpaceUsed(%)]*(l.[TotalLogSpace(MB)]/100)) as numeric(10,4))) as [FreeLogSpace(MB)],

SUM(d.[TotalDBSpace(MB)]+l.[TotalLogSpace(MB)]) as [TotalSpaceLogAndDB(MB)]

from #datainfo d

join #loginfo l on l.[DatabaseName]=d.[DatabaseName]

group by d.DatabaseName

drop table #loginfo

drop table #datainfo

drop table #dbs

GO

The results look like:



Posted in Data Warehouse | Tagged , , , , , , , | Leave a comment

SSIS Server Architecture Depends on Package Design

A common question that my clients ask when they purchase hardware for the ETL Layer of a Business Intelligence architecture using SQL Server Integration Services (SSIS) is usually, “does this server need to be attached to SAN disk or a dedicated external disk, or will the server’s local disk suffice?”   The answer is “it depends” and I will explain why.

Disclaimer: I usually recommend a dedicated SSIS server for exclusively running ETL packages for larger environments, or for smaller environments that have the potential to become large. For throughput, recovery, etc. SAN disk or a dedicated external disk is usually a better option than local server disk. With budgets being tight in this economy it is not always an option so I won’t make this a “SAN or not” article, because “it also depends”.

I find that there are two schools of thought regarding SSIS package design in the last few years.  The first perspective is usually from the Database Engineers – DBAs who think that anything related to data should use execute SQL Tasks for the most part, utilizing the database engine with optimized transact SQL (T-SQL).  The second perspective is from the SSIS Developer that believes packages should use lookups, data flow tasks, script tasks, raw data files, etc., and execute sql tasks should be used as little as possible.

Another Disclaimer: I can relate to both perspectives as I have filled both roles on BI projects and this is a debate that I have seen at more than one client and am sure others have this discussion as well, so why not blog about it and hopefully it can help.

Drum roll please – the answer is both the Database Engineers – DBAs and SSIS Developers are correct.  From the SSIS Developers point of view everything is done and contained within the SSIS packages and local disk will suffice since most operations are done in memory so there is no need for SAN or external dedicated disk.  From the Database Engineers – DBAs point of view, the SSIS server can be configured to have the database engine present with attached SAN or external dedicated disk and use staging database(s) that run optimized T-SQL within execute SQL tasks.

The point is that it is not a clear cut question and answer and it really depends on a lot of factors. If you are the person making the architectural decisions for the SSIS server, you should understand how the packages will be designed first to answer this question correctly.

Posted in ETL | Tagged , , , , , , , | Leave a comment

Divide and Conquer Reporting (Architecturally Speaking)

This is my first blog post and surely won’t be my last. I have talked about creating a SQL BI blog for a few years and over the last few weekends I finally gathered my thoughts and documents I wrote over the years to share, so here goes the RTM version.  If there are any errors, they will be fixed in SP1 :-)

A common situation I encounter at various clients is that they perform heavy reporting out of the same transactional databases that power revenue driving sites.  This usually hinders the performance and decreases revenue generation, not a good situation.  The cost of additional hardware or additional SQL Server licenses is sometimes hard to justify, but if its possible to show that slow transactional throughput hinders revenue generation, or if the lack of timely reporting is limiting visibility for additional revenue opportunities, it might be a good place to start.

So now that you have wowed the folks with budgets that can buy servers and SQL Server licenses with this amazing information, how do you take the data and have it duplicated to your new SQL Server in real time so reporting can be performed there? real time was the requirement for these particular clients The answer is simple, transactional replication.  I am using transactional replication in this example because the last two times I provided this architectural solution to my clients, one was still using SQL 2000 and the other only needed a few tables of a very large SQL 2005 database. You can use snapshot replication, database mirroring for SQL 2005 or later, or even log shipping. It all depends on the requirements but that won’t be talked about in this example and you should research each option for your situation.

To implement SQL Server Replication you need to setup the database server that you would like to replicate as a publisher, you need a distributor this manages the transactions in terms of reading, storing and sending, and you also need a subscriber this contains your duplicated database(s) you will use for future reporting. Clients that I have provided consulting services for struggled with the next step of the replication architecture on more than one occasion; that is where to configure the distributor and subscriber. This is where it gets interesting, you may or may not need a separate server for each. As a general rule of thumb for smaller publisher database implementations small in size, small number of concurrent users, small number of transactions, I usually place the distributor and subscriber on one server separate from the publisher. For larger database implementations large in size, large number of concurrent users, large number of transactions, I usually place the distributor on its own server and the subscriber on its own server both separate from the publisher.  I’ll say it again, always separate the distributor and subscriber from the publisher as that usually avoids additional overhead on the publisher. Your original goal was to take load off the publisher with this architectural solution and there is a reason you are dividing and conquering, not adding then dividing and conquering in my opinion. You should test your scenario thoroughly before deciding on the final replication architecture and see what works best for you.

Another transactional replication recommendation I have is that after your publisher, distributor, and subscriber are all setup and you begin to add the tables as articles you would like to replicate I would ensure that you DO NOT use separate distribution agents per publication if you set up your transactional replication as per table publications as I call them.  The reason is parent child relationships. For example, a parent table is heavily transactional and the child table is lightly transactional, the child records can be delivered to the subscriber database before the parent records are delivered and reports that need to join the two for a complete data set will not be able to make the join and lead to missing results.

I feel like I could write a lot more on this topic with code examples etc. but am trying to keep it architectural in nature.  Also, there are always exceptions to the above but generally speaking in my experience this is what worked for my clients.  I hope this helps you divide and conquer your transactional database performance and reporting.

Posted in Replication | Tagged , , , , , , , | Leave a comment