Class::DBI performance
Class::DBI is a very nice database abstraction layer for perl. It allows you to define your tables and columns and it magically provides you with classes with accessors/mutators for those columns. With something like Class::DBI::Pg, you don't even need to tell it your columns; it asls the database on startup. It's all very cool mojo and massively decreases the development time on anything database related in perl.
Unfortunately, as far as I can tell, it has a massive performance problem in its design. One of the features of Class::DBI is lazy population of data. It won't fetch data from the database until you try to use one of the accessors. This isn't normally a problem, except with retrieve_all(). Basically this function returns a list of objects for every row in your table. Unfortunately, due to the lazy loading of data, retrieve_all() calls SELECT id FROM table; and then every time you use an object it calls SELECT * FROM table WHERE id = n;. For a small table, this isn't too bad, but for a large table, it's a killer.
I did a little benchmark today to see just how much slower it is over plain DBI. I wrote two functions which iterate over a table, assigning one value to a function (forcing Class::DBI to fetch the data). The table in question contains 635 rows. The code I used was:
use strict;
use warnings;
use Benchmark qw(:all) ;
use Foo;
use DBI;
sub class_dbi {
for my $foo (Foo->retrieve_all()) {
my $bar = $foo->bar;
}
}
sub dbi {
my $dbh = DBI->connect("dbi:Pg:dbname=$db;host=$host",$user, $passwd);
my $sth = $dbh->prepare("SELECT * FROM foos;");
$sth->execute();
while(my $row = $sth->fetchrow_hashref()) {
my $bar = $row->{bar};
}
}
cmpthese(100, {
'Class::DBI' => 'class_dbi();',
'DBI' => 'dbi();',
});The results:
brick david% perl benchmark.pl
s/iter Class::DBI DBI
Class::DBI 10.3 -- -97%
DBI 0.351 2845% --Class::DBI is more than 28 times slower than using DBI directly. I'm hoping that someone will now tell me "Oh you just do blah", otherwise I'm going to have to rewrite some of my code. One thing to learn from this is that reduction in development time can often cost you more in other areas, and it's often runtime performance.
Update: It appears that the bug is that Class::DBI::Pg does't set the Essential list of columns, so Class::DBI uses the primary column. you can fix this by adding the following to your database modules:
__PACKAGE__->columns(Essential => __PACKAGE__->columns);
Remember you'll need to do that for each of your modules; you won't be able to do it in your superclass, as you won't have discovered your columns yet. This has increased performance, but not massively. New timings (with the addition of using Class::DBI through an iterator):
s/iter Class::DBI it Class::DBI DBI Class::DBI it 6.35 -- -2% -94% Class::DBI 6.23 2% -- -94% DBI 0.350 1714% 1680% --
Update 2: It appears that further speedgains can be made by not using Class::DBI::Plugin::DateTime::Pg to convert the three timestamp columns in my table into DateTime objects.:
s/iter Class::DBI it Class::DBI DBI Class::DBI it 1.26 -- -11% -72% Class::DBI 1.12 12% -- -69% DBI 0.350 260% 220% --