Agile, Data Modeling, Data Warehouse, Database, Software Development

Why #Dimensional Modeling matters

I’ve recently completed a data modeling initiative on a major project. After doing this I’ve come to two major conclusions:

  1. The coverage area in Insurance is probably the most devious and twisted area of data that I have ever modeled.
  2. Dimensional Modeling should be done on every model to ensure you can simply model the domain.

Why Dimensional Model?

For the model I worked on, there are over 261 tables in the Relational Model. In the Dimensional Model, there are only 25. The process to distill a Relational Model to a Dimensional one is not easy, but the process highlighted problems in my Relational Model that I was unaware of. These issues would have been exposed later during testing or post-production, but being able to identify them in development was extremely valuable.

So what is Dimensional Modeling? although most people have been exposed to the rules of Data Normalization, not as many people have been exposed to Dimensional Modeling. I will try to explain my opinion of Dimensional Modeling, but if you are interested I would highly recommend ‘The Star Schema Reference’ by Chris Adamson. Simply put, it is the most complete and concise book ever written on Dimensional Modeling. Dimensional Modeling is usually done to create a model for a Data Warehouse. This Data Warehouse can then be used for Operational and Analytic reporting. But the process of Dimensional Modeling does not need to be limited to a Data Warehouse. The process itself has value to validate the Relational Model.

The term that is commonly used to refer to a Dimensional Model is a ‘Star Schema’ model. This is due to the fact that most Dimensional Models look like a star. (With the Facts in the centre and the Dimensions around the Fact)

Dimensional Modeling

Dimensional Modeling is the process of taking a Relational Model in some normal form and being able to distill the many tables down to the business objects that the business works with. In most cases this would result in single tables that correspond to objects like:

  • Client
  • Product
  • Sales Rep
  • Store
  • Sales Transaction

These business objects fall into two categories:

Facts – These are Facts about the business. They typically correspond to events or transactions and have metrics or measures that the business is concerned about. In many cases these fact tables are what the business wants to report on.

Dimensions – These are Dimensions to the Facts. They typically correspond to business objects that interact with Facts. These Dimensions represent how the business describes these objects and how they like to slice, dice, filter, sort, group, and order the Facts. A standard Dimension that occurs in all Dimensional Models that does not correspond to a business object is Time. Time is a dimension for almost all business events and transactions.

Initially, the act of creating these Facts and Dimensions is a large task of de-normalization. Although this sounds like a simply task, it is anything but. A Relational Model with hundreds of tables must be distilled down to a handful of tables. Many to Many relationships must be transformed into simple relationships that are easy to understand and report against. A natural key must also be defined that uniquely identifies a row in the Fact and Dimension. Although this sounds easy, you may find that some Dimensions don’t actually have a natural key. This is a clue that the Dimensional Model needs more attention.

A Dimensional Model also places the modeling of history at the forefront. A lot of attention is spent ensuring that data can accurately represent the changes that occur over time. Most of the time, a Relational Model is primarily concerned with current state.

Note: This is a gross over-simplification, if this is interesting to you I would recommend the book by Chris Adamson on all of the theory and complexity behind Dimensional Modeling. Or you can just start Googling!

Conclusion

The act of creating a Dimensional Model and creating the 25 Facts and Dimensions highlighted inconsistencies and duplication in my model. It also challenged my understanding of the data domain. It raised questions that I realized I didn’t have the answer for. It is easy for inconsistency and errors to hide in data model with 200+ tables. A data model that has 10-30 tables has nowhere to hide. It has the brutal transparency so valuable in Agile.

About Terry Bunio

Terry Bunio has worked for Protegra for 14+ years because of the professionalism, people, and culture. Terry started as a software developer and found his technical calling in Data Architecture. Terry has helped to create Enterprise Operational Data Stores and Data Warehouses for the Financial and Insurance industries. Along the way Terry discovered that he enjoys helping to build teams, grow client trust and encourage individual career growth, completing project deliverables, and helping to guide solutions. It seems that some people like to call that Project Management. As a practical Data Modeller and Project Manager, Terry is known to challenge assumptions and strive to strike the balance between the theoretical and real world approaches for both Data Modelling and Agile. Terry considers himself a born again agilist as Agile implemented according to the Lean Principles has made him once again enjoy Software Development and believe in what can be accomplished. Terry is a fan of Agile implemented according to the Lean Principles, the Green Bay Packers, Winnipeg Jets, Operational Data Stores, 4th Normal Form, and asking why

Discussion

No comments yet.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: