I keep writing code to talk to databases in perl and I’m forever
forgetting the correct runes for talking to databases, so I thought I’d
stick it here for easy reference.

use DBI;

my $db_driver = "Pg" # Pg or mysql (or others)
my $db_name = "database";
my $db_host = "localhost";
my $db_user = "username";
my $db_pass = "password";


my $dbh = DBI->connect("dbi:$db_driver:dbname=$db_name;host=$db_host",
   $db_user, $db_pass);

It’s probably handy to give an example of a common database read
operation

my $sth = $dbh->prepare( "SELECT * FROM table WHERE id  = ?")
      or die $dbh->errstr;

$sth->execute($id) or die $dbh->errstr;

while (my $hashref = $sth->fetchrow_hashref) {
   print $hashref->{id};
}

Perl doesn’t enforce access to modules’ namespaces. This would
usually be considered a bad thing, but sometimes it allows us to work
around problems in modules without changing their code. Here’s a perfect
example:

I’ve been writing a script to talk to an XML-RPC endpoint, using
Frontier::Client but for
one of the requests, the script throws the following error:

wanted a data type, got `ex:i8'

Turning on debugging showed the response type was indeed ex:i8, which
isn’t one of the types that Frontier::Client supports.

<?xml version="1.0" encoding="UTF-8"?>
<methodResponse xmlns:ex="http://ws.apache.org/xmlrpc/namespaces/extensions">
  <params>
    <param>
      <value>
        <ex:i8>161</ex:i8>
      </value>
    </param>
  </params>
</methodResponse>

Searching through the code shows Frontier::Client is a wrapper around
Frontier::RPC2 and the error message happens at the following
section:

   } elsif ($scalars{$tag}) {
       $expat->{'rpc_text'} = "";
       push @{ $expat->{'rpc_state'} }, 'cdata';
   } else {
       Frontier::RPC2::die($expat, "wanted a data type, got `$tag'n");
   }

So we can see that it’s looking up the tag into a hash called
%scalars to see if the type is a scalar type, otherwise throws
the error we saw. Looking at the top, we can see this hash:

%scalars = (
    'base64' => 1,
    'boolean' => 1,
    'dateTime.iso8601' => 1,
    'double' => 1,
    'int' => 1,
    'i4' => 1,
    'string' => 1,
);

So, if we could add ex:i8 to this scalar, we could fix the
problem. We could fix the module, but that would require every user of
the script to patch their copy of the module. The alternative is to
inject something into that hash across module boundaries, which we can
do by just refering to the hash by it’s complete name including the
package name. We can use:

$Frontier::RPC2::scalars{'ex:i8'} = 1;

Now when we run the script, everything works. It’s not nice and it’s
dependent on Frontier::RPC2 not changing. but it allows us to get on
with our script.

Archive::Tar
is a useful library for working with tar archives from Perl.
Unfortunately, one thing it doesn’t allow is using data from memory as
the archive. From the TODO section:

Allow archives to be passed in as string

Currently, we only allow opened filehandles or filenames, but not
strings. The internals would need some reworking to facilitate
stringified archives.

Fortunately, it does allow you to use a filehandle. I’ve previously
mentioned
about how useful the IO::Handle subsystem in perl is. And we
should be able to use it in this case. The module we’ll want is
IO::String, which is a IO::Handle over a perl scalar. We can use it:

my $tar = new Archive::Tar(new IO::String($data));

Unfortunately when we run this now we get:

Cannot read compressed format in tar-mode at Foo.pm line 41
No data could be read from file at Foo.pm line 41

It turns out that this is because Archive::Tar uses IO::Zlib
internally if the file isn’t uncompressed, but this doesn’t provide the
ability to uncompress from a filehandle. The answer is to uncompress the
data before passing it to Archive::Tar and the easiest way to do this is
to use the IO::Uncompress::Gunzip module, so we can rewrite our code
to:

my $tar = new Archive::Tar(new IO::Uncompress::Gunzip(new IO::String($data)));

Now when you run the script, Archive::Tar has an uncompressed tar
stream. Yet another situation where IO::Handles comes to the rescue.

Because I always forget when I need to create a new class in
perl:

package Foo::Bar;

use strict;
use warnings;

sub new {
   my $this = shift;
   my $class = ref($this) || $this;
   my $self = {};
   bless $self, $class;
   $self->initialize(@_);
   return $self;
}

sub initialize {
   my $self = shift;
}

1;

If you have any useful additions I’d love to know.

In my
article
on Perl’s IO::Handle objects I talked briefly about IO::AtomicFile
and IO::Digest. I’ve just had reason to use these very useful modules to
create a script which edits a file in place. These modules allowed me to
do the rewrite atomically and optionally make a backup if the contents
have changed. The example assumes you have a function called
perform_rewrite that takes two file handles as the first two
parameters.

use File::Copy;
use IO::File;
use IO::AtomicFile;
use IO::Digest;

sub rewrite_file {
   my $file = shift;
   my $sub = shift;
   my $input = new IO::File($file,'r');
   my $input_md5 = new IO::Digest($input, 'MD5');
   my $output = new IO::AtomicFile($file,'w');
   my $output_md5 = new IO::Digest($output, 'MD5');

   $sub->($input, $output, @_);

   if ($input_md5->hexdigest ne $output_md5->hexdigest) {
           copy ("$file", "$file.bak");
           $output->close();
   } else {
           # we haven't changed so don't bother updating
           $output->delete();
   }
   $input->close();
}

rewrite_file("/foo/bar", &perform_rewrite, $baz, $quux);

Class::DBI is
a very nice database abstraction layer for perl. It allows you to define
your tables and columns and it magically provides you with classes with
accessors/mutators for those columns. With something like
Class::DBI::Pg, you don’t even need to tell it your columns; it
asls the database on startup. It’s all very cool mojo and massively
decreases the development time on anything database related in perl.

Unfortunately, as far as I can tell, it has a massive performance
problem in its design. One of the features of Class::DBI is lazy
population of data. It won’t fetch data from the database until you try
to use one of the accessors. This isn’t normally a problem, except with
retrieve_all(). Basically this function returns a list of
objects for every row in your table. Unfortunately, due to the lazy
loading of data, retrieve_all() calls SELECT id FROM
table;
and then every time you use an object it calls SELECT *
FROM table WHERE id = n;
. For a small table, this isn’t too bad, but
for a large table, it’s a killer.

I did a little benchmark today to see just how much slower it is over
plain DBI. I wrote two functions which iterate over a table, assigning
one value to a function (forcing Class::DBI to fetch the data). The
table in question contains 635 rows. The code I used was:

use strict;
use warnings;

use Benchmark qw(:all) ;

use Foo;

use DBI;

sub class_dbi {
   for my $foo (Foo->retrieve_all()) {
      my $bar = $foo->bar;
   }
}

sub dbi {
   my $dbh = DBI->connect("dbi:Pg:dbname=$db;host=$host",$user, $passwd);
   my $sth = $dbh->prepare("SELECT * FROM foos;");
   $sth->execute();
   while(my $row = $sth->fetchrow_hashref()) {
      my $bar = $row->{bar};
   }
}
cmpthese(100, {
      'Class::DBI' => 'class_dbi();',
      'DBI' => 'dbi();',
   });

The results:

brick david% perl benchmark.pl
           s/iter Class::DBI        DBI
Class::DBI   10.3         --       -97%
DBI         0.351      2845%         --

Class::DBI is more than 28 times slower than using DBI directly. I’m
hoping that someone will now tell me “Oh you just do blah”, otherwise
I’m going to have to rewrite some of my code. One thing to learn from
this is that reduction in development time can often cost you more in
other areas, and it’s often runtime performance.

Update: It appears that the bug is that
Class::DBI::Pg does’t set the Essential list of columns, so
Class::DBI uses the primary column. you can fix this by adding the
following to your database modules:

__PACKAGE__->columns(Essential => __PACKAGE__->columns);

Remember you’ll need to do that for each of your modules; you won’t
be able to do it in your superclass, as you won’t have discovered your
columns yet. This has increased performance, but not massively. New
timings (with the addition of using Class::DBI through an iterator):

              s/iter Class::DBI it    Class::DBI           DBI
Class::DBI it   6.35            --           -2%          -94%
Class::DBI      6.23            2%            --          -94%
DBI            0.350         1714%         1680%            --

Update 2: It appears that further speedgains can be
made by not using Class::DBI::Plugin::DateTime::Pg to convert the three
timestamp columns in my table into DateTime objects.:

              s/iter Class::DBI it    Class::DBI           DBI
Class::DBI it   1.26            --          -11%          -72%
Class::DBI      1.12           12%            --          -69%
DBI            0.350          260%          220%            --

Just when you thought Perl couldn’t get more unreadable, someone[0] comes up with something like this:

print join ", ", map ord, split //, $foo;

This mess of perl might be easier to understand if I put the brackets
in:

print join (", ", map( ord, split( //, $foo)));

What this does is split $foo into a list of characters. It then uses
map to run ord() on each item in the list to return a new list
containing the numeric character values. We then join these again with
“, ” to make the output easier to read.

david% perl -e 'print join ", ", map ord, split //, "word";'
119, 111, 114, 100

The map function is familiar to functional programmers and is very
powerful, but beware it can reduce the clarity of your code.

[0] Me

Another article for your viewing pleasure. This
article
describes how to use Perl’s IO based IO::Handle IO system and a
couple of modules that allow you do to some interesting things like seamlessly
handle compressed files and calculate MD5 sums as you read a file in.

use IO::Digest;

my $fh = new IO::file($filename, "r");
my $iod = new IO::Digest($fh, ’MD5’);

read_and_parse($fh);

print $iod->hexdigest