Thu, 24 Apr 2008

Violating Perl Module Namespaces

Perl doesn't enforce access to modules' namespaces. This would usually be considered a bad thing, but sometimes it allows us to work around problems in modules without changing their code. Here's a perfect example:

I've been writing a script to talk to an XML-RPC endpoint, using Frontier::Client but for one of the requests, the script throws the following error:

wanted a data type, got `ex:i8'

Turning on debugging showed the response type was indeed ex:i8, which isn't one of the types that Frontier::Client supports.

<?xml version="1.0" encoding="UTF-8"?>
<methodResponse xmlns:ex="http://ws.apache.org/xmlrpc/namespaces/extensions">
  <params>
    <param>
      <value>
        <ex:i8>161</ex:i8>
      </value>
    </param>
  </params>
</methodResponse>

Searching through the code shows Frontier::Client is a wrapper around Frontier::RPC2 and the error message happens at the following section:

   } elsif ($scalars{$tag}) {
       $expat->{'rpc_text'} = "";
       push @{ $expat->{'rpc_state'} }, 'cdata';
   } else {
       Frontier::RPC2::die($expat, "wanted a data type, got \`$tag'\n");
   }

So we can see that it's looking up the tag into a hash called %scalars to see if the type is a scalar type, otherwise throws the error we saw. Looking at the top, we can see this hash:

%scalars = (
    'base64' => 1,
    'boolean' => 1,
    'dateTime.iso8601' => 1,
    'double' => 1,
    'int' => 1,
    'i4' => 1,
    'string' => 1,
);

So, if we could add ex:i8 to this scalar, we could fix the problem. We could fix the module, but that would require every user of the script to patch their copy of the module. The alternative is to inject something into that hash across module boundaries, which we can do by just refering to the hash by it's complete name including the package name. We can use:

$Frontier::RPC2::scalars{'ex:i8'} = 1;

Now when we run the script, everything works. It's not nice and it's dependent on Frontier::RPC2 not changing. but it allows us to get on with our script.

[perl] | # Read Comments (1) |

Comments

Thu, 17 Apr 2008

Using In-memory tarballs with Archive::Tar

Archive::Tar is a useful library for working with tar archives from Perl. Unfortunately, one thing it doesn't allow is using data from memory as the archive. From the TODO section:

Allow archives to be passed in as string

Currently, we only allow opened filehandles or filenames, but not strings. The internals would need some reworking to facilitate stringified archives.

Fortunately, it does allow you to use a filehandle. I've previously mentioned about how useful the IO::Handle subsystem in perl is. And we should be able to use it in this case. The module we'll want is IO::String, which is a IO::Handle over a perl scalar. We can use it:

my $tar = new Archive::Tar(new IO::String($data));

Unfortunately when we run this now we get:

Cannot read compressed format in tar-mode at Foo.pm line 41
No data could be read from file at Foo.pm line 41

It turns out that this is because Archive::Tar uses IO::Zlib internally if the file isn't uncompressed, but this doesn't provide the ability to uncompress from a filehandle. The answer is to uncompress the data before passing it to Archive::Tar and the easiest way to do this is to use the IO::Uncompress::Gunzip module, so we can rewrite our code to:

my $tar = new Archive::Tar(new IO::Uncompress::Gunzip(new IO::String($data)));

Now when you run the script, Archive::Tar has an uncompressed tar stream. Yet another situation where IO::Handles comes to the rescue.

[] | # Read Comments (1) |

Comments

Boilerplate code for a perl class

Because I always forget when I need to create a new class in perl:

package Foo::Bar;

use strict;
use warnings;

sub new {
   my $this = shift;
   my $class = ref($this) || $this;
   my $self = {};
   bless $self, $class;
   $self->initialize(@_);
   return $self;
}

sub initialize {
   my $self = shift;
}

1;

If you have any useful additions I'd love to know.

[] | # Read Comments (5) |

Comments

Wed, 13 Jun 2007

Atomic in-place rewriting of files with backup in perl

In my article on Perl's IO::Handle objects I talked briefly about IO::AtomicFile and IO::Digest. I've just had reason to use these very useful modules to create a script which edits a file in place. These modules allowed me to do the rewrite atomically and optionally make a backup if the contents have changed. The example assumes you have a function called perform_rewrite that takes two file handles as the first two parameters.

use File::Copy;
use IO::File;
use IO::AtomicFile;
use IO::Digest;

sub rewrite_file {
   my $file = shift;
   my $sub = shift;
   my $input = new IO::File($file,'r');
   my $input_md5 = new IO::Digest($input, 'MD5');
   my $output = new IO::AtomicFile($file,'w');
   my $output_md5 = new IO::Digest($output, 'MD5');

   $sub->($input, $output, @_);

   if ($input_md5->hexdigest ne $output_md5->hexdigest) {
           copy ("$file", "$file.bak");
           $output->close();
   } else {
           # we haven't changed so don't bother updating
           $output->delete();
   }
   $input->close();
}

rewrite_file("/foo/bar", \&perform_rewrite, $baz, $quux);

[] | # Read Comments (0) |

Comments

Sun, 11 Jun 2006

Class::DBI performance

Class::DBI is a very nice database abstraction layer for perl. It allows you to define your tables and columns and it magically provides you with classes with accessors/mutators for those columns. With something like Class::DBI::Pg, you don't even need to tell it your columns; it asls the database on startup. It's all very cool mojo and massively decreases the development time on anything database related in perl.

Unfortunately, as far as I can tell, it has a massive performance problem in its design. One of the features of Class::DBI is lazy population of data. It won't fetch data from the database until you try to use one of the accessors. This isn't normally a problem, except with retrieve_all(). Basically this function returns a list of objects for every row in your table. Unfortunately, due to the lazy loading of data, retrieve_all() calls SELECT id FROM table; and then every time you use an object it calls SELECT * FROM table WHERE id = n;. For a small table, this isn't too bad, but for a large table, it's a killer.

I did a little benchmark today to see just how much slower it is over plain DBI. I wrote two functions which iterate over a table, assigning one value to a function (forcing Class::DBI to fetch the data). The table in question contains 635 rows. The code I used was:

use strict;
use warnings;

use Benchmark qw(:all) ;

use Foo;

use DBI;

sub class_dbi {
   for my $foo (Foo->retrieve_all()) {
      my $bar = $foo->bar;
   }
}

sub dbi {
   my $dbh = DBI->connect("dbi:Pg:dbname=$db;host=$host",$user, $passwd);
   my $sth = $dbh->prepare("SELECT * FROM foos;");
   $sth->execute();
   while(my $row = $sth->fetchrow_hashref()) {
      my $bar = $row->{bar};
   }
}
cmpthese(100, {
      'Class::DBI' => 'class_dbi();',
      'DBI' => 'dbi();',
   });

The results:

brick david% perl benchmark.pl
           s/iter Class::DBI        DBI
Class::DBI   10.3         --       -97%
DBI         0.351      2845%         --

Class::DBI is more than 28 times slower than using DBI directly. I'm hoping that someone will now tell me "Oh you just do blah", otherwise I'm going to have to rewrite some of my code. One thing to learn from this is that reduction in development time can often cost you more in other areas, and it's often runtime performance.

Update: It appears that the bug is that Class::DBI::Pg does't set the Essential list of columns, so Class::DBI uses the primary column. you can fix this by adding the following to your database modules:

__PACKAGE__->columns(Essential => __PACKAGE__->columns);

Remember you'll need to do that for each of your modules; you won't be able to do it in your superclass, as you won't have discovered your columns yet. This has increased performance, but not massively. New timings (with the addition of using Class::DBI through an iterator):

              s/iter Class::DBI it    Class::DBI           DBI
Class::DBI it   6.35            --           -2%          -94%
Class::DBI      6.23            2%            --          -94%
DBI            0.350         1714%         1680%            --

Update 2: It appears that further speedgains can be made by not using Class::DBI::Plugin::DateTime::Pg to convert the three timestamp columns in my table into DateTime objects.:

              s/iter Class::DBI it    Class::DBI           DBI
Class::DBI it   1.26            --          -11%          -72%
Class::DBI      1.12           12%            --          -69%
DBI            0.350          260%          220%            --
[] | # Read Comments (2) |

Comments

Thu, 01 Jun 2006

Join Map Ord Split

Just when you thought Perl couldn't get more unreadable, someone[0] comes up with something like this:

print join ", ", map ord, split //, $foo;

This mess of perl might be easier to understand if I put the brackets in:

print join (", ", map( ord, split( //, $foo)));

What this does is split $foo into a list of characters. It then uses map to run ord() on each item in the list to return a new list containing the numeric character values. We then join these again with ", " to make the output easier to read.

david% perl -e 'print join ", ", map ord, split //, "word";'
119, 111, 114, 100

The map function is familiar to functional programmers and is very powerful, but beware it can reduce the clarity of your code.

[0] Me

[] | # Read Comments (7) |

Comments

Fri, 19 May 2006

IO::Handle

Another article for your viewing pleasure. This article describes how to use Perl's IO based IO::Handle IO system and a couple of modules that allow you do to some interesting things like seamlessly handle compressed files and calculate MD5 sums as you read a file in.

use IO::Digest;

my $fh = new IO::file($filename, "r");
my $iod = new IO::Digest($fh, ’MD5’);

read_and_parse($fh);

print $iod->hexdigest
[] | # Read Comments (0) |

Comments