screen decoration screen decoration
Data at the heart of your organisation
Trial: £free
Standard: from £450

Background: SQL-Hub: Database Merging Tool

screendecoration

Introduction

Burr IST Ltd have been working with a wide range of customers since 2001 on all sorts of reporting, integration and analysis projects using SQL-Hub, JetViewer and other products.

Over the years we have come across a lot of the same challenges and looked at many ways of solving them

These articles explore some of the things that are often discussed and the pros and cons of a number of different solutions. These discussion are intended to give an understanding of some different types of scenarios and how they can be addressed.

screendecoration

Is Data Duplication Always Bad?

Everyone knows to be wary of Data Duplication - keeping more than one copy of the same information - removing duplication is one of the key elements of 'database normalisation'.


This isn't about saving disk space but about making sure that the correct data is always retrieved - if there's more than one version of the data it's harder to be sure that either

  • a) all the copies have been updated and are the same or
  • b) we are accessing the right version

This doesn't just apply to the internal structure of a database but also to any system that is storing information - if there is more than one version, there is work to do ensure we get the right data - so it is simplest if we can have just one version of the information.

So, 'no duplication' is the simplest answer, but it is not the only answer. There are many circumstances when having more than one copy of the data is desirable, convenient, efficient or even necessary. Pretty much any integration task between different systems is going to result in some duplication; in many cases it is simply much faster to keep more than one copy, and often the 'live' data is simply not reachable all the time. This doesn't need to be a problem as long as we know for sure what each version is for. Let's take a simple case:

Server A holds a normalised database containing information about a company's sales activities (orders, invoices etc). This database is constantly being update as the company goes about its business.

So everything is fine here, the database is normalised, there is only one version of the data.

But

There is a requirement to run intensive analysis of this data which will involve long running queries on the server.

There is a justifiable concern that this could interfere with the day to day running of Server A - either due to the processor time used by the analysis or due to locking of the database. So another server is added to do the analysis work.

Server B holds a daily snapshot of the normalised database from Server A.

So now we have duplication of the data, but is this a problem? Well no, not really - everything is pretty clear. As far as live operations go Server A can continue as normal without any consideration for Server B - it makes no difference to the operation of Server A that there is a copy on Server B. As for the data on Server B this is pretty clear to - we have a copy of the data as it was at some point in the last 24 hours, this data is disposable, we can do what we want with it - no other system needs it.

So in this case we have improved performance and robustness and mitigated risk by duplicating data, without clouding the waters as to which is the 'right' data - Server A is always the master.

In its standard configuration this is exactly what SQL-Hub does, the SQL-Hub central database (like Server B above) just has a disposable copy of the data from the source databases (like multiple Server A s above) and in no way complicates the operation of those source databases (the Server A s).

More advanced schemes are also possible as long as we stick to the rule that it must always be crystal clear which version of each piece of information is the 'right' version, be that for the whole database, record by record, or field by field. The problem is always in finding a workflow that sticks to this rule, while meeting the business requirements...

next...

All content © SQL-Hub / Burr IST 2023
Site by Burr IST, design by foliozine

Burr Information Systems Technology Ltd

28 Park Avenue, West Point, Manchester, M19 2EE

Tel: +44 (0) 161 327 2934