Why Database Abstraction Is Useless 90% of The Time

I’m sure you’ve read a lot of posts, articles and forums where one of the “best practices” is to use a database abstraction layer. It really isn’t, and experience has taught me that the amount of times you change database server is far too minimal to matter. When you do change, you’re re-writing a lot of SQL again to either leverage your change, or remove vendor-specific commands.

Portability Is Impractacle

Many proponents of database abstraction layers state that if a good abstraction layer is used, when you change your database it’s as simple as changing $db to $newdb and all of your code will continue to work; it’s just that easy. Unfortunately it’s not and statements like those couldn’t be farther from the truth.

Unless the only thing your application does is insert/update/delete one row in one table ever, you will run into problems. If this is the case, why are you changing to a new database server to begin with? You should buy yourself a lottery ticket if the only thing you must worry about is changing mysql_query() to pg_query(). The vast majority of times you will also need to redesign the database schema, re-create views/triggers/functions and re-write your code and SQL to accomodate new or different data types, indexes or syntax.

In some code I’ve seen, you will see a database abstraction layer which wraps mysql_query() but then will make use of auto_increment values and mysql_insert_id() calls. The abstraction layer is useless in this instance because should you switch your DB to PostgreSQL you will find auto_increment as a flag does not exist, and getting the last inserted primary key requires a sequence to be created. This is the simplest of conversion issues you will run into, nevermind triggers and functions.

Any good programmer will alway want to use the best tool for the job, versatility just isn’t viable. For databases, even moreso as each RDBMS implements it’s own unique features, functions and methodologies that must be considered in any database application (like the last inserted primary key feature that MySQL provides). While limiting yourself to the non-vendor specific features will make your SQL more portable, it will do nothing to improve your application both from a performance point of view as well as architecturally. Your SQL will actually become more complex as you attempt to make it more generic.

Changing a live application’s database just doesn’t happen often enough as it is cost prohibitive. Your database schema and SQL are not portable across different types of servers and your database abstraction layer doesn’t actually solve this problem.

Database Abstraction Vs. Database API

The other major argument for a database abstraction layer is that your code is not littered with mysql_query() statements everywhere. Instead, they are $db->query() or some other similar construct which will allow you to change the $db variable from one class to another. I hope you can see the error in this way of thinking. As discussed earlier, your SQL code can’t be ported from server to server in any non-trivial application.

The solution to this problem is rather simple. Move your abstraction further up the stack. Create an API for your application to interface with. This produces a level of abstraction from your application to your data source while still utilizing your server’s specific features.

Your code will not have SQL littered everywhere, it will be in one self-contained location so that updating, changing or even depreciating various tasks can be done easily. The code below shows you how the API approach differs from an abstraction layer.

class DataAccess()
{
  public function __construct($host, $user, $pass)
  {
    $this->link = mysql_connect($host, $user, $pass) or die 'errr';
    mysql_select_db('my_db_name', $this->link);
  }
}

class Users extends DataAccess
{

  public function varDumpUsers()
  {
    $q = mysql_query('SELECT * FROM user', $this->link);
    while (( $row = mysql_fetch_array($q) ))
    {
      var_dump($row);
    }
  }
}
// -- separate file with the appropriate require statement
$_user = new Users();
$_user->getAllUsers();

And now look at our database abstraction layer :

abstract class baseDb
{
  abstract public function __construct($host, $user, $pass);
  abstract public function query($sql);
  abstract public function getRow();
}

class MySQLDb extends baseDb
{
  public function __construct($host, $user, $pass)
  {
    $this->link = mysql_connect($host, $user, $pass) or die 'errr';
    mysql_select_db('my_db_name', $this->link);
  }

  public function query($sql)
  {
    return mysql_query($sql, $this->link);
  }

  public function getRow()
  {
    return mysql_fetch_array();
  }
}
// -- separate file with the appropriate require statement
$_db = new MySQLDb($host, $user, $pass);
$result = $_db->query('SELECT * FROM user');
while (( $row = $db->getRow() ))
{
  var_dump($row);
}

What is the difference? The difference lies in where the abstraction occurs. It’s not at the database connection level. All SQL is contained in one location (my API classes) and this allows me to decrease code-duplication, and let my application become a little dumber (or decouple it). In the first example, my code doesn’t care what database I’m using. It calls varDumpUsers() and expects somehow, all users get dumped to the screen. It’s an API call with a defined input of nothing and a return value of void. In the abstraction layer example, the application knows it’s talking to a database, provides the SQL to use and then executes it.

One of the benefits of the API method is that if I am constantly var dump’ing the values from the user table, it’s one call to the API. Via the abstraction method, it’s SQL, a loop and a var dump duplicated anywhere I need it. One of the golden rules of coding is never duplicate code.

What About Combining The Two?

Combining the two patterns can be beneficial as long as the abstraction layer is not simply a wrapper to the various mysql_ functions. To expand on our API example, consider the following:

class DataAccess()
{
  private $result = null, $link = null;
  public function __construct($host='localhost', $user='user', $pass='pass')
  {
    $this->link = mysql_connect($host, $user, $pass) or die 'errr';
    mysql_select_db('my_db_name', $this->link);
  }

  public function query(sql)
  {
    $this->result = mysql_query($sql, $this->link);
    if (!$this->result)
    {
      error_log('MySQL error ' . mysql_errno() . ' - ' . mysql_error());
      throw new Exception('MySQL error ' . mysql_errno() . ' - ' . mysql_error());
    }
    return $this->result;
  }

  public function getRow($offset=0)
  {
    if ($offset != 0) { mysql_data_seek($this->result, $offset); }
    return mysql_fetch_array($this->result);
  }
}

class Users
{
  private $db;

  public function __construct($db)
  {
    $this->db = $db;
  }

  public function varDumpUsers()
  {
    $this->db->query('SELECT * FROM user');
    while (( $row = $this->db->getRow() ))
    {
      var_dump($row);
    }
  }
}
// -- separate file with the appropriate require statement
$_da = new DataAccess(); // put in var so it can be passed to other API classes
$_user = new Users($_da);
$_user->varDumpUsers();

We’ve added to our DataAccess class and provided more functionality. Our query() method will throw an exception if it encounters an error and our getRow() method allows for offsets. It’s more than just an abstraction now, it’s layer that provides actual functionality on top of the standard mysql_() functions.

With that said you will also notice that the DataAccess class doesn’t add much value to the API. The only reason it’s used is to ensure that each additional API class is able to re-use the same database connection in this example. It’s still a database abstraction layer at it’s heart.

An Alternative?

Combining the two works, however the abstraction layer’s only justifiable purpose is to separate the database connection out of the API so that the API is not making new connections all the time. There are ways around this of course; you could make DataAccess a singleton, but consider this instead:

class ApiLoader()
{
  private $loadedApi = array(), $link = null;
  private static $instance;
  private function __construct()
  {
    $this->link = mysql_connect($host, $user, $pass) or die 'errr';
    mysql_select_db('my_db_name', $this->link);
  }

  public static function getInstance($apiName)
  {
    if (!self::$instance)
    {
      self::$instance = new ApiLoader();
    }
    return self::$instance;
  }

  public function loadApi($apiName)
  {
    if (!isset($this->loadedApi[$apiName]))
    {
      $this->loadedApi[$apiName] = new $apiName($this->link);
    }
    return $this->loadedApi[$apiName];
  }
}

class Users
{
  protected $link;

  public function __construct($link)
  {
    $this->link = $link;
  }

  public function varDumpUsers()
  {
    $q = mysql_query('SELECT * FROM user', $this->link);
    while (( $row = mysql_fetch_array($q) ))
    {
      var_dump($row);
    }
  }
}
// -- separate file with the appropriate require statement
$_apiLoader = new ApiLoader::getInstance();
$_apiLoader->loadApi('Users')->getAllUsers();

The ApiLoader->loadApi() method will load any API. Ideally it would throw an exception if it can’t include the file, or the class does not exist and the API should throw an exception if the SQL failing is fatal, but in it’s most basic form, the above code should demonstrate another alternative to database abstraction. Yes, it could use a lot of work, but the goal of the class is to show an alternative to wrapping a bunch of mysql_ functions in a class.

The ApiLoader class has one job, that is to load the API I wish to use. It creates the database connection and then instantiates the API. While this may not be the best way to handle API calls, it does remove the the database abstraction layer and actually provide an API of it’s own.

What Is The Other 10%?

The only time I’ve ever seen a plain database abstraction layer benefit anyone is for open source projects for popular applications. If I wrote an application, let’s say Drupal, I’d want as many people to be able to use it as possible. By adding that layer to abstract the database, I can ensure my target audience is broader. Unless you’re writing an application where supporting multiple database servers would help, you’re better off focusing on one database and utilizing all tools / optimizations / features available to you.

Final Thoughts

There is a difference between database abstraction and a database API. One is useful, the other is not.

This entry was posted in Articles. Bookmark the permalink.

Leave a Reply

Your email address will not be published. Required fields are marked *

*

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>