Background: SQL-Hub: Database Merging Tool

Introduction

Burr IST Ltd have been working with a wide range of customers since 2001 on all sorts of reporting, integration and analysis projects using SQL-Hub, JetViewer and other products.

Over the years we have come across a lot of the same challenges and looked at many ways of solving them

These articles explore some of the things that are often discussed and the pros and cons of a number of different solutions. These discussion are intended to give an understanding of some different types of scenarios and how they can be addressed.

Articles...

Approach X2

Query the multiple sources on demand, on the fly, for each enquiry, individually, and combine the result programmatically

This will work ok for servers that are not linked.

Scenario A

For scenario A this is similar to using a UNION query only the results are combined by the code, not the database. e.g. (an extremely simple example!)

In PHP for MySQL:

  function &getMultiDatabaseData($aaListOfDatabaseParameters,$sQuery)
  {
       $aResult=array();
      foreach($aaListOfDatabaseParameters as $aDatabaseParameters)
      {     
          mysql_connect($aDatabaseParameters['Host'],$aDatabaseParameters['User'],
                                                        $aDatabaseParameters['Password']);
          mysql_select_db($aDatabaseParameters['DatabaseName']);
          $resSingleDatabaseResult=mysql_query($sQuery);    
          while($aRow=mysql_fetch_array($resSingleDatabaseResult))
          {
              $aResult[]=$aRow;
          }
      }
      return $aResult;
  }
  
  $aaListOfDatabaseParameters=array(
     array("Host"=>"localhost","DatabaseName"=>"Database_1","User"=>"root","Password"=>""),
     array("Host"=>"localhost","DatabaseName"=>"Database_2","User"=>"root","Password"=>"")
    );
  $sQuery="SELECT * FROM Orders where ProductID=3";
  $aResultData=getMultiDatabaseData($aaListOfDatabaseParameters,$sQuery);

For aggregates the arrays of data can be combined programmatically and it is easy enough to add something to identify the source databases too.

Scenario B

For scenario B, again, different SQL statements will be needed for each different type of source database, however some standardisation can be performed programmatically rather than via SQL which may be more efficient (and less load on the source servers).

Scenario C

For scenario C you need to link the data from the 2 source databases programmatically, to do this ad hoc should be fairly straight forward, while to do it generically is more complicated (why not let a database do this for us?).

The advantages of this approach are:

No duplication - the data comes straight from the source
No latency - the data comes straight from the source

The disadvantages are:

It falls down or provides an incomplete result if any of the sources is not available
The implementation can become complex
The response may well be slow (or very slow) especially with poor connectivity to remote servers
It unpredictably loads the source servers
Core database functionalities (joins, aggregate functions etc) need to be implemented programatically and applied to the resulting data from the different sources which makes for extra work

next...