Background: SQL-Hub: Database Merging Tool

Introduction

Burr IST Ltd have been working with a wide range of customers since 2001 on all sorts of reporting, integration and analysis projects using SQL-Hub, JetViewer and other products.

Over the years we have come across a lot of the same challenges and looked at many ways of solving them

These articles explore some of the things that are often discussed and the pros and cons of a number of different solutions. These discussion are intended to give an understanding of some different types of scenarios and how they can be addressed.

Articles...

Managing changes in multiple databases...

It is always simplest to make sure there is no data duplication in a system, but this is not always the case.

There are many circumstances when having more than one copy of the data is desirable, convenient, efficient or even necessary. Pretty much any integration task between different systems is going to result in some duplication; in many cases it is simply much faster to keep more than one copy; and often the 'live' data is simply not reachable all the time. This doesn't need to be a problem as long as we know for sure what each version is for.

Native solutions

Where Federated Databases or near real time Replication can be used (where the database technology can take care of this for us) or where the problem can be dealt with at an application level, then this is probably the best solution.

1 A redundant copy

Database B contains a snapshot of the data on Database A (and potentially Database X,Y and Z too) used for reporting and analysis purposes only.

There is no problem here, Database A (and potentially X,Y and Z) hold the master records and any changes made on Database B are disposable, there is nothing to worry about.

This is the standard operation model for SQL-Hub

2 A data transfer

Records from Database A (and potentially Database X,Y and Z too) are transferred to Database B, once a specific record is transferred no further updates are sent to Server B about this record

This is quite a common methodology. While there is a fairly clear what the workflow is here, this is really only ideal if the data won't change on Database A after it is transferred to Database B (or the changes made on Database A after the transfer are irrelevant to Database B). It is quite possible to end up with different values for the same information on Databases A and B. But sometimes this is still a valuable excercise.

3 Ongoing data transfer

This is a case that comes up very often. Records are transfered from Database A to Database B, changes can be made - and kept - on Database B, and Database B is updated with changes made on Database A (and X, Y and Z) but there is no requirement (or it is not possible or deemed too risky) to send any changes back to Database A (or X, Y and Z).

To implement this a method is required to work out which records have been changed (see detecting changes).

Any records not changed on Database B can be updated with the data from Database A (this could be done incrementally by detecting changes on Database A and only updating the new and changed records - though sometimes detecting the changes is more work than just transferring the data). Most likely we will be left with some records that have been changed on Database B and on Database A so now we need some sort of conflict resolution rule, this might be...

3a) Keep the whole record from Database B
3b) Use the whole record from Database A
3c) Update field by field

3c i) With a defined precedence between Database A and Database B for each field
3 c ii) Using the value from the record (Database A or Database B) where this field has been changed, preferring values from Database A if the field has been updated in both records
3 c iii) Using the value from the record (Database A or Database B) where this field has been change, preferring values from Database B if the field has been updated in both records
3 c iv) Using the value from the record (Database A or Database B) where this field has been changed with a defined precedence between Database A and Database B for each field if the field has been changed in both records

There's no reason why different rules can't be applied to different sets of data/tables.

Note that as no changes are ever sent back to Database A many records are likely to remain permanently in conflict.

So in this case some useful updates from Database A are sent to Database B whilst changes are made on Database B as well, all with a level of control on the precedence of the data and without interfering with Database A. However, is is still very likely that there will differences in the data between the two databases.

While the precedence may be clear from a technical point of view it is critical that the implementation fits with the workflow of the organisation/users. In practice in most organisations there is no single fixed workflow, but still this might be the best that is achievable.

The managed version of SQL-Hub implements all of this functionality but it is not exposed on the standard product.

4 Two Way Integration

This is a solution which is often seen as very desirable, but as always we need to be very careful. Changes can be made and kept on Database B and Database B is also updated with changes made on Database A (and X, Y and Z) and changes on Database B are sent back to Database A (or X, Y and Z).

This works similarly to 3 above except that once all conflicts have been resolved updates are sent back to the source database for the records that have changed in Database B.

Note that as the changes have been sent back to Database A all the conflicts are resolved with each cycle of integration.

So in this case useful updates are sent from Database A to Database B and from B back to A, again with a level of control on the precedence of the data. There should no longer be any differences in the data between the two databases however the users might find the data changing in ways they don't expect. Again, while the precedence may be clear from a technical point of view it is critical that the implementation fits with the workflow of the organisation/users. In practice in most organisations there is no single fixed workflow, but still this might be the best that is achievable.

It's worth saying of course, that making any changes to the data in the database of a third party application is not something to be done lightly.

The managed version of SQL-Hub implements all of this functionality but it is not exposed on the standard product.

5 Partial Two Way Integration

In this case not all fields or tables are updated in the source databases (Database A,X,Y,Z) only those that are specified. This will reduce the impact on the source Databases (which might mitigate some risk/confusion). However there may well remain permanent conflicts between the databases.

The managed version of SQL-Hub implements all of this functionality but it is not exposed on the standard product.

Conclusions

While there are many other approaches to these sorts of problems those above are ones that we see most often. None of them is perfect and in all cases it is absolutely essential to understand the organisation's workflow as well as the technical aspects - and also that the users understand the workflow that has been implemented in any solution. These scenarios are not just a technical problem but an organisational one too.

next...

Background: SQL-Hub: Database Merging Tool

Introduction

Articles...

Managing changes in multiple databases...

It is always simplest to make sure there is no data duplication in a system, but this is not always the case.

Native solutions

Other Solutions

1 A redundant copy

2 A data transfer

3 Ongoing data transfer

4 Two Way Integration

5 Partial Two Way Integration

Conclusions