screen decoration screen decoration
Data at the heart of your organisation
Trial: £free
Standard: from £450

Background: SQL-Hub: Database Merging Tool

screendecoration

Introduction

Burr IST Ltd have been working with a wide range of customers since 2001 on all sorts of reporting, integration and analysis projects using SQL-Hub, JetViewer and other products.

Over the years we have come across a lot of the same challenges and looked at many ways of solving them

These articles explore some of the things that are often discussed and the pros and cons of a number of different solutions. These discussion are intended to give an understanding of some different types of scenarios and how they can be addressed.

screendecoration

Approach X1

Query the multiple sources on demand, on the fly for each enquiry, using a query across multiple databases

This will only work if all the databases are on the same server or linked servers.

Scenario A

To do this for scenario A above you might use a union query e.g.

MS-SQL:

  SELECT
      'Database_1' as DatabaseName, Orders.*
  FROM Database_1.dbo.Orders where ProductID=3 union SELECT
      'Database_2' as DatabaseName, Orders.*
  FROM Database_2.dbo.Orders where ProductID=3

For MySQL just remove the .dbo

Also, you could construct the SQLs dynamically something like this...

PHP (for MySQL):

  function getMultiDatabaseQuerySQL($aListOfDatabases,$sQueryWithPlaceHoldersForDBName)
  {
    $sSQL="";
    foreach($aListOfDatabases as $sDatabaseName)
    {
        if($sSQL)
            $sSQL.=" union ";
        $sSQL.=str_ireplace("",$sDatabaseName,$sQueryWithPlaceHoldersForDBName);
    }
  }
  function &getMultiDatabaseData($aListOfDatabases,$sQueryWithPlaceHoldersForDBName)
  {
      $resMultiDatabaseResult=mysql_query(getMultiDatabaseQuerySQL($aListOfDatabases),
                                                $sQueryWithPlaceHoldersForDBName);
      $aResult=array();
      while($aRow=mysql_fetch_array($resMultiDatabaseResult))
      {
          $aResult[]=$aRow;
      }
      return $aResult;
  }
  
  $aListOfDatabases=array(
    "Database_1",
    "Database_2"
    );
  $sQueryWithPlaceHoldersForDBName="SELECT * FROM .dbo.Orders where ProductID=3";
   $aResultData=getMultiDatabaseData($aListOfDatabases,$sQueryWithPlaceHoldersForDBName);

And for completeness if it's an aggregate the query need to look something like this...

MS-SQL:

 SELECT sum([Value]) from (SELECT
      *
  FROM Database_1.dbo.Orders where ProductID=3 union SELECT
      *
  FROM Database_2.dbo.Orders where ProductID=3) as allDBsOrders

Or

MS-SQL:

  SELECT sum([Value]),DatabaseName FROM (SELECT
      'Database_1' as DatabaseName, Orders.*
  FROM Database_1.dbo.Orders where ProductID=3 union SELECT
      'Database_1' as DatabaseName, Orders.*
  FROM Database_2.dbo.Orders where ProductID=3) as allDBsOrders group by DatabaseName

(you can modify the php code easily enough to do this if you like)


Scenario B

For scenario B though this is going to get pretty complicated.

Different SQL statement will be needed for each different type of source database, which produce a comparable result across the different types.


Scenario C

While for scenario C a simple join will work e.g. (if ProductID happens to be the same in both systems):

MS-SQL:

    SELECT * FROM SalesDatabaseName.dbo.Orders,WebsiteDatabaseName.dbo.Promotions 
    WHERE Orders.ProductID=Promotions.ProductID;

For MySQL just remove the .dbo

If there are different ProductIDs in each systerm it might be more like...

MS-SQL:

    SELECT * FROM
        SalesDatabaseName.dbo.Orders, SalesDatabaseName.dbo.Products as SalesProducts,
        WebsiteDatabaseName.dbo.Promotions, WebsiteDatabaseName.dbo.Products as WebsiteProducts 
    WHERE 
        Orders.ProductID=SalesProducts.ProductID and Promotions.ProductID=WebsiteProducts.ProductID 
        and WebsiteProducts.ProductName=SalesProducts.ProductName;

The advantages of this approach are:

  • No duplication - the data comes straight from the source
  • No latency - the data comes straight from the source

The disadvantages are:

  • It falls down if any of the sources is not available
  • The SQLs statement can become very complex
  • The response may be slow (or very slow) especially with poor connectivity to remote servers
  • It unpredictably loads the source servers
  • In scenario B it will be increasing complicated and possibly unworkable

next...

All content © SQL-Hub / Burr IST 2023
Site by Burr IST, design by foliozine

Burr Information Systems Technology Ltd

28 Park Avenue, West Point, Manchester, M19 2EE

Tel: +44 (0) 161 327 2934