Comparison of Database libraries
The purpose of this document is to lay out our options for database libraries going forward. The issue has been raised because it has become evident that our existing DB abstraction library is no longer satisfactory. We seek the best fit for our business and technical needs.
The problem
As you may well know, the database is the backbone of our software. Because of this, interaction with the database comprises a substantial portion of our code. Furthermore, additional code is required to prepare for said database interaction, and yet more code to process the output.
This all works well enough as is, one could say. It has certainly gotten us this far, but I think the time has come to step back and reconsider the way we do a few things. Because of its repetition throughout the enormous volume of code we already have, and the volume we have yet to write, even a small reduction in code volume and complexity would have a tremendous effect. Any simplification to the DB library's interface would be multiplied by hundreds or even thousands of occurrences throughout the code. It would increase readability, decrease overall complexity as well as bugs, and potentially speed up the on-boarding of new employees.
Consider a DB call that currently looks like this:
return $self->_error('_enum failed') unless
    my $Estatus = $self->_enum('alloc','status');

my $allocs = $dbh->select(
    -table  => 'allocation',
    -fields => 'alloc_id product_id date status quantity',
    -where  => {
        product_id => ['d in', @{$prodids}],
        status     => ['d in', $Estatus->_multi('active inactive')],
    },
);

foreach my $alloc (@{$allocs}){
    if($alloc->{status} == $Estatus->active){
        $dbh->update(
            -table  => 'allocation',
            -fields => {
                quantity => ['d', $alloc->{quantity} + 1],
            },
            -where  => {
                alloc_id => ['d', $alloc->{alloc_id}],
            },
        );
    }
}
Could potentially be made to look like this:
my $allocs = $dbh->allocation->where( product_id => $prodids, status => 'active inactive' );
while (my $alloc = $allocs->next){
    if($alloc->status eq 'active'){
        $alloc->quantity( $alloc->quantity + 1 );
    }
}

Obviously it's a whole lot shorter. And why are there no fields specified on the record retrieval? We'll get to that in a bit.
One must also consider some of the pre- and post-DB processing that is frequently involved:
- enum coding and decoding
- building lookup hashes
- date conversion (to and from unixtime)
- other database lookups
- adding customer / user info to the resultset
- storage and retrieval of attributes
- checking for errors
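To make a couple of the items above concrete, here is a rough sketch of the kind of hand-rolled processing this refers to. The variable and field names are illustrative only, not taken from our codebase:

```perl
# Illustrative only -- hypothetical field names, mirroring the patterns above.
use POSIX qw(strftime);

# Building a lookup hash keyed by id, so later code can resolve rows
# without querying the database again:
my %alloc_by_id = map { $_->{alloc_id} => $_ } @{$allocs};

# Converting a unixtime column for display:
my $shipped = strftime('%Y-%m-%d', localtime($alloc->{date}));
```

Every one of these snippets is boilerplate that a schema-aware library could absorb.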
A quick perusal of our code will show that the enum system is the worst offender of these. It's not uncommon to see something like this in a majority of our database-interacting subroutines:
return $self->_error('_enum failed') unless
my $Estatus = $self->_enum('corr','status');
return $self->_error('_enum failed') unless
my $Etype = $self->_enum('corr','type');
return $self->_error('_enum failed') unless
my $Eaction = $self->_enum('corr','action');
return $self->_error("type not found: $params{type}") unless $Etype->{$params{type}};
$fields{type_id} = ['d',$Etype->{$params{type}}];
... database code ...
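For contrast, under a schema-aware library that boilerplate could largely disappear. The snippet below is purely speculative: it is a guess at what the upgraded DBR might look like, consistent with the examples later in this document, not an existing API:

```perl
# Speculative sketch: enum translation and validation happen inside the library.
my $corr = $dbh->corr->new(
    type   => $params{type},   # translated to type_id internally;
                               # an invalid value is rejected with an error
    status => 'active',
);
```

The `$Etype` lookups, the error checks, and the manual `type_id` assignment would all live inside the library, driven by the schema definitions.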
The solution
The solution to these problems is fairly straightforward in theory... We must find or build a library that is aware of the data structure. It must know which tables have which fields. It must know which fields are enums, which are dates, which are integers, and so on.
'''Enter, the ORM'''
An [Object Relational Mapper] (ORM) is basically a DB interface layer that is entirely object oriented. It allows a relational database to behave in much the same way as a true object database. If executed well, an ORM affords the user all of the power of a relational DB combined with the ease of use of an object database, and none of the flaws (lack of scalability, etc.).
The ORM option that we select should, at a minimum, achieve the following:
- Simplify the syntax for common DB tasks, as compared to our existing library.
- Allow fields to be updated in an OO fashion, instead of building a full update statement.
- Automatically translate enums bidirectionally
- Automatically translate dates bidirectionally
- Preserve our ability to execute multiple statements inside of a transaction
- Simplify or eliminate the building of lookup hashes
- Enforce constraints on values allowed into fields (length, data type, regex, etc)
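Putting several of these requirements together, a usage sketch might look like the following. Again, nothing here exists yet; every method name is an assumption about the eventual interface, not a spec:

```perl
# Speculative: auto-rollback transactions plus automatic enum handling.
$dbh->transaction(sub {
    my $allocs = $dbh->allocation->where( status => 'active' );  # enum decoded for us
    while (my $alloc = $allocs->next) {
        $alloc->quantity( $alloc->quantity + 1 );  # OO update, no hand-built statement
    }
});  # a die() inside the sub would roll everything back automatically
```

Note how the transaction wraps multiple statements without any special handles being passed around, and no enum lookup appears anywhere.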
Expanding DBR
DBR as it stands now is a fairly simplistic library. It is not currently an ORM; instead it functions as an SQL abstraction layer with the added benefits of login/password management and connection caching. My first instinct is to expand DBR to meet these requirements. In fact, the work to do so has already begun... I have a slightly ambitious, but achievable, plan that should do just what we need. That said, reinventing the wheel is best reserved for cases where no previously invented wheel will do, so it's worth a sanity check to make sure we aren't passing up any better options.
The intention here is to find the best solution for our needs on an ongoing basis, and the fact that we've always used DBR should not necessarily hold much weight in this decision-making process. Instead, the decision should be based on the merits of each option.
Our choices
So without further ado, I present you with a few options that are currently under consideration:
| | | Class::DBI | DBIx::Class | Rose::DB::Object | ApolloUtils::DBR (upgrade, phase 1) |
| General scheme | | One perl module per table. | One perl module per table -OR- Scans DB every load | One perl module per table -OR- Scans DB every load | Connects to config DB, which describes the DB schema. No overloaded perl modules. Config DB is managed with a command line admin tool. NO DB scanning is done by the library itself. |
| DB Login info management | | No, you code it | No, you code it | No, you code it | Yes. Stores usernames, passwords, etc., and manages multiple classes of DB (master, query, etc.) |
| Connection mgmt / caching | | No | No | Yes | Yes |
| Object oriented usage | | Yes | Yes | Yes | Yes |
| Relationship Management | | Very basic, manually defined in code | Manual definition -OR- Scans DB for foreign keys. | Manual definition -OR- Scans DB for foreign keys. | Config DB provides all relationship definitions. |
| Initialization overhead | | Moderate. Must load many perl modules. | Moderate. Must load many perl modules, or execute many describe-table statements on load. | Moderate. Must load many perl modules, or execute many describe-table statements on load. | Light to moderate. Must select definitions from the config DB, but could potentially load them on the fly as well. |
| Custom functions | | Yes, overload per-table module | Yes, overload per-table module (not possible with DB scan) | Yes, overload per-table module (not possible with DB scan) | No custom code per table. Instead, classes of behaviors can be created and applied to various tables and/or fields. |
| Enum value translation | | None | None | None | Automatic. Maintained in Definitions DB |
| Lazy retrieval / Iterator function | | Yes | Yes | Yes | Yes |
| Basic Transactions | | Yes, but no auto rollback or nested transaction management | No | Yes, but verbose. You have to pass a special db object to all queries within the transaction. | Yes, with auto rollback and nested transaction management. |
| Multi-object transactions | | Identical to above | Yes, but really odd. Must create an anonymous sub and call it with $schema->txn_do($coderef); | Identical to above | Yes |
| Basic select | | $cds = Music::CD->search( year => 1990 ); | $schema->resultset('Album')->search({ artist => 'Santana' }); | $products = Product::Manager->get_products( query => [ price => 1.00 ] ); | $dbh->album->where( artist => 'Santana' ); |
| Greater than / Less than | | No support | $schema->resultset('Album')->search({ last_attempt => { '<' => $value } }); | $products = Product::Manager->get_products( query => [ price => { lt => 1.00 } ] ); | $dbh->album->where( price => LT $value ); |
| Subqueries | | No support | Raw Sql only | Raw Sql only | Yes |
| Update | | $record->somevalue('newvalue') | $record->somevalue('newvalue') | $record->somevalue('newvalue'); $record->save | $record->somevalue('newvalue') |
| Join | | Limited | Yes | Yes | Yes |
| Insert | | Music::CD->insert( { artistid => 1, name => 'Foo' } ); | $album = $schema->resultset('Album')->create({ artist => 'Pink Floyd' }); | $p1 = Product->new( name => 'Sprocket' ); $p1->save; | $album = $dbh->album->new( $somehashref ) |
| Delete | | $cd->delete; | $album->delete | $p1->delete | $album->delete |
| Create | | No | Yes, creates tables based on perl coded table definitions. | No | No |
| Data constraints | | Yes, you write it per module | Yes, you write it per module | Yes, you write it per module | Yes, Configurable |
| Perl triggers | | Yes, you write it per module | Yes, you write it per module | Yes, you write it per module | Possibly, but would be difficult |
| Lazy update | | Yes, optional | Yes, optional | Lazy update only. | Yes, optional; possibly automatic inside transactions. |
| Field prefetch | | Very basic, you must specify the fields or field groups to be prefetched. | Very basic, you must specify the fields to be prefetched. No field groups are supported. | Non-configurable. Fields are either “lazy” (fetched on demand when an accessor is called) or they are always prefetched. By default, ALL fields in a table are fetched. | Yes, fields will be automatically prefetched based on the scope the code is executing in and on stored information about the fields used in that scope in the past. In practice, all used fields will be prefetched after the first few uses of a new piece of code. |
| Query flexibility | | Not very flexible | More flexible, but more verbose too | Flexible, but the most verbose syntax of the bunch. | Flexible and Concise. |
| Pros | | Simpler than the other modules | Reasonably powerful, and configurable | Powerful, and configurable. | We can tailor it to our needs. |
| Cons | | Less powerful than the other modules. Prefetch problem makes required code for selects larger. | Much of that power is lost unless you write custom perl modules for every table in your database. Also suffers from the same prefetched-fields problem as Class::DBI. | Very complicated. The underlying code is ENORMOUS. Much of the power of this library is lost unless you write custom perl modules for every table in your database. | It's not written yet. We would have to write it. We would have to document it. |
| Notable Gotchas | | N/A | Each resultset object can be queried, yielding another resultset object. It is unclear what the memory and performance implications of this are, but it would seem that this feature could give you plenty of rope to hang yourself. | The prefetching problem is potentially worse with this module, as all fields are fetched by default. This uses a tremendous amount of memory. Additionally, there seems to be no option to override the prefetched fields. | N/A |
| General comments | | Class::DBI is very basic and is less than ideal for our needs. | Making modules for over 150 tables would be difficult and hard to maintain. Loading definitions by scanning the DB would be easier, but presents no ability to store or act on extended information, such as enum translations or special constraints. | Making modules for over 150 tables would be difficult and hard to maintain. Loading definitions by scanning the DB would be easier, but presents no ability to store or act on extended information, such as enum translations or special constraints. | It may seem very strange to store database schema information inside of yet another database, but doing so allows additional information to be stored in association with the described tables and fields, such as how to look up an enum, or what rules apply to a given table. This will give us much of the same power as the custom module-based solutions, without the huge volume of code. Additionally, this config database allows us to store optimisation data, so that we won't run into prefetch / post-fetch issues. |
John's Questions
- DBR
- can you provide an example of a sub-query syntax?
- can you provide examples of data constraints?
Data constraints would throw an error when an incorrect value is entered, i.e. if you attempt to set a field to something invalid.
Examples include:
- Integers
- Unsigned integers (positive only)
- Out of range errors (try to set a tinyint to > 255)
- Invalid enum values
- Bad email addresses in an email field
- Dates and times
Custom Constraints could potentially include:
- Duplicate checking
- Other stuff
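As a rough illustration of what one of the basic checks might do internally, here is plain Perl (not DBR code; the sub name is made up for this example):

```perl
# Hypothetical sketch of an unsigned-tinyint constraint check.
sub check_unsigned_tinyint {
    my ($value) = @_;
    die "not an unsigned integer: $value" unless $value =~ /^\d+$/;
    die "out of range (0-255): $value"    if $value > 255;
    return 1;
}
```

In the configurable scheme described above, a check like this would be selected per field from the config DB rather than written by hand in each module.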
- could/can interesting things be done with 'virtual' fields, such as
- 'items' for an order, or
- 'product' for an order item?
- DBIx::Class supports that, DBR could as well.
- YASL
- would existing pages require any update?
- No change. Any DBR changes, or a cutover to another DB library, would not affect existing code.
- would YASL require a parallel update?
Ollie's Input