Data Warehouse, Database, ETL, Software Development

#SAP, Breaking Data, and Re-enabling #SQLServer Database Referential Integrity Constraints #Microsoft #FTW

Many times as Data professionals we no longer have full control over the quality of data in the source systems. I am discussing SAP in my example, but I could have easily mentioned PeopleSoft, SalesForce, or a number of other purchased solutions. Usually those solutions are purchased and then we are tasked with maintaining those environments and also extracting data from those environments to be incorporated into a Business Intelligence corporate solution.

Our issue is one somewhat of our own choosing as well. We want to enforce integrity and constraints at a greater level than what was intended and specified in the purchased applications. This may be for a variety of reasons including that the business never specified it as a requirement. It may also be that the purchased application was never built to handle that level of integrity.

To be clear, this isn’t a complaint but more a reflection of reality. We as Data professionals are going to receive data that is not as consistent and complete as we as Data professionals want it to be. (I purposely did not state ‘require’ as there could be a discussion of what is truly required) So what are we to do?

The Problem

Typically we end up extracting data from these purchased applications and load them into a consolidated database. This database can be either a relational or dimensional database. We also typically need to cleanse the data we are loading so load the business can report on the data in a clear and consistent manner.

The challenge is what we do with data that we cannot load in a consistent manner. We really have two options; modify the data or reject it outright. Although there are many types of inconsistent data we may need to correct, I will limit my discussion to data that links tables together. Typically we define Referential Integrity or Foreign Keys constraints to ensure that the data to link tables are valid so that reports and queries return correct results.

Possible Solution

When we have more control over the quality of source systems, I usually see the solution embedded in the Extract, Transform, and Load (ETL) solution that extracts and loads the data into a corporate database.  This is because the data issues will be more known, of lesser frequency, and the data issues are things we can correct ourselves. In this type of solution, the Foreign Key constraints are always enabled and the  ETL solution validates all the data values before trying to insert the data in the database. Any errors that are encountered will result in the data being changed or rejected and an error written to a log file.

There are two majors issue with this approach:

1) Performance – The look-up to validate all Foreign Keys row by row can cause the process to run slower. It can eliminate a performant two step approach where some of the fields can be set in a subsequent SQL Update statement. (Depending on the column’s Nullability) It can also prevent the use of some bulk load methods in SQL Server Integration Services.

2) Availability – If major data issues are encountered, the data issues may prevent the data load from continuing and may affect the availability of the database.

Our Solution

Since we are loading data from multiple external providers, we designed a different solution.

Although we have Foreign Key constraints on the entire database, they will be disabled during the load. (and during the week) We will enable them every Sunday to validate the data loaded has not broken integrity rules. If we find we cannot re-enable any constraint, we will email the Data Team informing them of the offending constraint for investigation. If all Foreign Key constraints can be re-enabled, we will inform the Data Team of the success and disable them again.

We could also do this re-enabling nightly if we start to encountered more frequent data errors.

In this manner, we are in a better position to react to data outside of our control and load the data as quickly as possible.

Our SQL Server Solution

A couple of things to note about our SQL Server solution. Frequently I see the solution to re-enable all constraints use the sp_msforeachtable stored procedure. A sample of how to do this is listed below:

EXEC sp_msforeachtable “ALTER TABLE ? NOCHECK CONSTRAINT all”

This solution is virtually useless you can guarantee all your constraints can be re-enabled without failure. If one constraint fails, it will stop the process. Not good.

To accommodate the ability to re-enable all constraints even when errors are encountered we created our own processes to disable and re-enable our constraints using a cursor.

Here is the disable constraints SQL

DECLARE @disable_sql NVARCHAR(255)

SELECT ROW_NUMBER() OVER (ORDER BY o.[schema_id]) AS RowID,
QUOTENAME(o.name) AS CONSTRAINT_NAME,
QUOTENAME(SCHEMA_NAME(po.[schema_id])) AS FOREIGN_TABLE_SCHEMA,
QUOTENAME(po.name) AS FOREIGN_TABLE_NAME,
QUOTENAME(rccu.COLUMN_NAME) AS FOREIGN_COLUMN_NAME,
QUOTENAME(SCHEMA_NAME(ro.[schema_id])) AS PRIMARY_TABLE_SCHEMA,
QUOTENAME(ro.name) AS PRIMARY_TABLE_NAME,
QUOTENAME(rc.name) AS PRIMARY_COLUMN_NAME,
CASE fk.is_disabled
WHEN 0 THEN ‘CHECK’
ELSE ‘NOCHECK’
END AS [ENABLED]
INTO temp_disable_constraints
FROM sys.foreign_keys AS fk
INNER JOIN sys.objects AS o ON o.[object_id] = fk.[object_id]
INNER JOIN sys.objects AS po ON po.[object_id] = fk.parent_object_id
INNER JOIN sys.objects AS ro ON ro.[object_id] = fk.referenced_object_id
INNER JOIN INFORMATION_SCHEMA.CONSTRAINT_COLUMN_USAGE AS rccu ON rccu.CONSTRAINT_SCHEMA = SCHEMA_NAME(o.[schema_id])
AND rccu.CONSTRAINT_NAME = o.name
AND rccu.TABLE_SCHEMA = SCHEMA_NAME(po.[schema_id])
AND rccu.TABLE_NAME = po.name
INNER JOIN sys.index_columns AS ric ON ric.[object_id] = fk.referenced_object_id
AND ric.index_id = fk.key_index_id
AND ric.is_included_column = 0
INNER JOIN sys.columns AS rc ON rc.[object_id] = fk.referenced_object_id
AND rc.column_id = ric.column_id

DECLARE disable_cursor CURSOR for
SELECT ‘ALTER TABLE ‘ + FOREIGN_TABLE_SCHEMA + ‘.’ + FOREIGN_TABLE_NAME
+ ‘ ‘ + ‘ NOCHECK CONSTRAINT ‘ + CONSTRAINT_NAME
FROM temp_disable_constraints

OPEN disable_cursor
FETCH NEXT FROM disable_cursor INTO @disable_sql

WHILE @@FETCH_STATUS = 0
BEGIN

PRINT @disable_sql

EXEC sp_executesql @disable_sql
FETCH NEXT FROM disable_cursor INTO @disable_sql

END

CLOSE disable_cursor
DEALLOCATE disable_cursor
DROP TABLE temp_disable_constraints

And our re-enable constraint SQL:

DECLARE @enable_sql NVARCHAR(255)

SELECT ROW_NUMBER() OVER (ORDER BY o.[schema_id]) AS RowID,
QUOTENAME(o.name) AS CONSTRAINT_NAME,
QUOTENAME(SCHEMA_NAME(po.[schema_id])) AS FOREIGN_TABLE_SCHEMA,
QUOTENAME(po.name) AS FOREIGN_TABLE_NAME,
QUOTENAME(rccu.COLUMN_NAME) AS FOREIGN_COLUMN_NAME,
QUOTENAME(SCHEMA_NAME(ro.[schema_id])) AS PRIMARY_TABLE_SCHEMA,
QUOTENAME(ro.name) AS PRIMARY_TABLE_NAME,
QUOTENAME(rc.name) AS PRIMARY_COLUMN_NAME,
CASE fk.is_disabled
WHEN 0 THEN ‘CHECK’
ELSE ‘NOCHECK’
END AS [ENABLED]
INTO temp_enable_constraints
FROM sys.foreign_keys AS fk
INNER JOIN sys.objects AS o ON o.[object_id] = fk.[object_id]
INNER JOIN sys.objects AS po ON po.[object_id] = fk.parent_object_id
INNER JOIN sys.objects AS ro ON ro.[object_id] = fk.referenced_object_id
INNER JOIN INFORMATION_SCHEMA.CONSTRAINT_COLUMN_USAGE AS rccu ON rccu.CONSTRAINT_SCHEMA = SCHEMA_NAME(o.[schema_id])
AND rccu.CONSTRAINT_NAME = o.name
AND rccu.TABLE_SCHEMA = SCHEMA_NAME(po.[schema_id])
AND rccu.TABLE_NAME = po.name
INNER JOIN sys.index_columns AS ric ON ric.[object_id] = fk.referenced_object_id
AND ric.index_id = fk.key_index_id
AND ric.is_included_column = 0
INNER JOIN sys.columns AS rc ON rc.[object_id] = fk.referenced_object_id
AND rc.column_id = ric.column_id

DECLARE enable_cursor CURSOR for
SELECT ‘ALTER TABLE ‘ + FOREIGN_TABLE_SCHEMA + ‘.’ + FOREIGN_TABLE_NAME
+ ‘ ‘ + ‘ WITH CHECK CHECK CONSTRAINT ‘ + CONSTRAINT_NAME
FROM temp_enable_constraints

OPEN enable_cursor
FETCH NEXT FROM enable_cursor INTO @enable_sql

WHILE @@FETCH_STATUS = 0
BEGIN

BEGIN TRY
EXEC sp_executesql @enable_sql
END TRY

BEGIN CATCH
PRINT ‘ERROR–>’ + @enable_sql
FETCH NEXT FROM enable_cursor INTO @enable_sql
CONTINUE
END CATCH

FETCH NEXT FROM enable_cursor INTO @enable_sql

END

CLOSE enable_cursor
DEALLOCATE enable_cursor
DROP TABLE temp_enable_constraints

Conclusion

This solution has provided us the flexibility to load our data as efficiently as possible and validate our Foreign Key relationships on a recurring basis. It also minimizes the chance that our load process will stop mid-stream. Did I mentioned this is a key requirements as we are loading data into the Data Warehouse every 60 minutes? 🙂

I was initially concerned with how long it would take to re-enable the constraints, but it only takes 75 minutes to re-enable 616 Foreign Key constraints on a 1.1 Terabyte database. Thanks Microsoft!

Now that we have this process we also plan to use it on large software deployments just to ensure to major data issues were introduced with the deployment as well.

About Terry Bunio

Terry Bunio has worked for Protegra for 14+ years because of the professionalism, people, and culture. Terry started as a software developer and found his technical calling in Data Architecture. Terry has helped to create Enterprise Operational Data Stores and Data Warehouses for the Financial and Insurance industries. Along the way Terry discovered that he enjoys helping to build teams, grow client trust and encourage individual career growth, completing project deliverables, and helping to guide solutions. It seems that some people like to call that Project Management. As a practical Data Modeller and Project Manager, Terry is known to challenge assumptions and strive to strike the balance between the theoretical and real world approaches for both Data Modelling and Agile. Terry considers himself a born again agilist as Agile implemented according to the Lean Principles has made him once again enjoy Software Development and believe in what can be accomplished. Terry is a fan of Agile implemented according to the Lean Principles, the Green Bay Packers, Winnipeg Jets, Operational Data Stores, 4th Normal Form, and asking why

Discussion

No comments yet.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: