April 2008 - David Pashley.comDavid Pashley.com

Not like not like not like

April 30, 2008

Dear lazyweb,

I’m possibly being stupid, but can someone explain the differences
between these two queries?

mysql> select count(*) from Table
   where blobid is null or not blobid like '%-%';
+----------+
| count(*) |
+----------+
| 15262487 |
+----------+
1 row in set (25 min 4.18 sec)

mysql> select count(*) from Table
   where blobid is null or blobid not like '%-%';
+----------+
| count(*) |
+----------+
| 20044216 |
+----------+
1 row in set (24 min 54.06 sec)

For reference:

mysql> select count(*) from Table where blobid is null;
+----------+
| count(*) |
+----------+
| 15262127 |
+----------+
1 row in set (24 min 7.15 sec)

Update: It turns out that the former was doing (not blobid) like '%-%' which turns out to not do anything sensible:

mysql> select not blobid from Table limit 10;
+------------+
| not blobid |
+------------+
|          0 |
|       NULL |
|          1 |
|          0 |
|          0 |
|          0 |
|          1 |
|       NULL |
|          1 |
|          0 |
+------------+
10 rows in set (0.02 sec)

mysql> select blobid from Table limit 10;
+-------------------------------------------------------------------+
| blobid                                                            |
+-------------------------------------------------------------------+
| 5cd1237469cc4b52ca094e215156c582-9ef460ac4134c600a4d2382c4b0acee7 |
| NULL                                                              |
| d20cb4037f8f9ab1de5de264660f005c-2c34209dcfb39251cf7c16bb6754bbd2 |
| 845a8d06719d8bad521455a8dd47745c-095d9a0831433c92cd269e14e717b3a9 |
| 9580ed23f34dd68d35da82f7b2a293d6-bf39df7509d977a1de767340536ebe80 |
| 06c9521472cdac02a2d4b2a18f8bec0f-0a8a28d3b63df54860055f1d1de92969 |
| ed3cd0dd9b55f76db7544eeb64f3cfa0-80a6a3eb6d73c0a58f88b7c332866d5c |
| NULL                                                              |
| b339f6545651fbfa49fa500b7845c4ce-6defb5ffc188b8f72f1aa10bbd5c6bec |
| 642075963d6f69bb11c35a110dd07c2c-8db54ac2d2accae7fa4a22db1d6caae9 |
+-------------------------------------------------------------------+
10 rows in set (0.00 sec)

The documentation
says Logical NOT. Evaluates to 1 if the operand is 0, to 0 if the
operand is non-zero, and NOT NULL returns NULL. but doesn’t
describe the behaviour of NOT 'string'. It would appear that a
string starting with a number returns 0 and a string starting with a
letter returns 1. Either way, neither has a hyphen in.

User Administration under PostgreSQL 8.3

April 28, 2008

A while ago I published an article on PostgreSQL
user administration. Typically, things have changed since I wrote
that article. I thought I’d detail a couple of the differences since
I wrote that guide.

The major difference is that you now have roles rather than users and
you use the CREATE ROLE command to create them instead of
CREATE USER, although the latter command still works. The
command line options for the createuser command have changed as
a result too. Before superuser and the ability to create new users were
the same thing. Now you can give a role permissions to create new roles
without giving them superuser powers. The options are now -s for
superuser and -S for not superuser, -d to allow them to create
databases and -D to disallow database creation and -r to allow the new
role to create other roles and -R to prevent them. for a standard user
you probably want somethig like:

createuser -S -D -R -P user

The -P makes createuser ask you for a password for
the new role.

You can find out more information about the new role system in
PostgreSQL in the user
management and CREATE ROLE reference sections of the manual.

Upgrading to latest Pyblosxom

April 26, 2008

I’m currently upgrading my blog to PyBlosxom 1.4.3. I apologise for
any broken links or entry flooding.

Update: I’ve finished playing now. I’ve upgraded to
1.4.3 and I don’t think I’ve broken anything yet.

I’ve also taken the opportunity to add a couple of plugins to add
tagging to entries and added the obligatory tag cloud to the side bar
rather than the list of months. I’m going to make some changes to the
comment plugin later to add OpenID support. I’d be interested to know of
any other pyBlosxom plugins you find useful.

I did manage to make a mistake by using vim to edit entries to
add some tags rather than my wrapper script to keep timestamps the same.
This is where I’m glad I have a database table with the metadata from
all my entries to hand. A quick touch foo.txt -d 2006-06-07 19:02:57+01 later and everything was fixed. Hopefully not too many
people got bitten by the few entries that had new dates for a few
minutes. Please let me know if you notice anything broken.

Violating Perl Module Namespaces

April 24, 2008

Perl doesn’t enforce access to modules’ namespaces. This would
usually be considered a bad thing, but sometimes it allows us to work
around problems in modules without changing their code. Here’s a perfect
example:

I’ve been writing a script to talk to an XML-RPC endpoint, using
Frontier::Client but for
one of the requests, the script throws the following error:

wanted a data type, got `ex:i8'

Turning on debugging showed the response type was indeed ex:i8, which
isn’t one of the types that Frontier::Client supports.

<?xml version="1.0" encoding="UTF-8"?>
<methodResponse xmlns:ex="http://ws.apache.org/xmlrpc/namespaces/extensions">
  <params>
    <param>
      <value>
        <ex:i8>161</ex:i8>
      </value>
    </param>
  </params>
</methodResponse>

Searching through the code shows Frontier::Client is a wrapper around
Frontier::RPC2 and the error message happens at the following
section:

   } elsif ($scalars{$tag}) {
       $expat->{'rpc_text'} = "";
       push @{ $expat->{'rpc_state'} }, 'cdata';
   } else {
       Frontier::RPC2::die($expat, "wanted a data type, got `$tag'n");
   }

So we can see that it’s looking up the tag into a hash called
%scalars to see if the type is a scalar type, otherwise throws
the error we saw. Looking at the top, we can see this hash:

%scalars = (
    'base64' => 1,
    'boolean' => 1,
    'dateTime.iso8601' => 1,
    'double' => 1,
    'int' => 1,
    'i4' => 1,
    'string' => 1,
);

So, if we could add ex:i8 to this scalar, we could fix the
problem. We could fix the module, but that would require every user of
the script to patch their copy of the module. The alternative is to
inject something into that hash across module boundaries, which we can
do by just refering to the hash by it’s complete name including the
package name. We can use:

$Frontier::RPC2::scalars{'ex:i8'} = 1;

Now when we run the script, everything works. It’s not nice and it’s
dependent on Frontier::RPC2 not changing. but it allows us to get on
with our script.

Photography In Public Areas Early Day Motion

April 24, 2008

I just emailed my MP the following letter:

Dear David Lepper,

I would just like to thank you for signing Auston Mitchell’s Early Day
Motion 1155 Photography In Public Areas. I have been increasingly
concerned with reports of police action against innocent photographers,
including most recently a man assaulted by several security guards in
Stoke (http://www.flickr.com/photos/happyaslarry/2420960125/). I’m sure
you appreciate Brighton’s reputation as an artistic city and your
support for this motion shows your continued support for the
photography community in Brighton.

Yours sincerely,
David Pashley

If your MP hasn’t signed this EDM, I recommend you contact them to urge them
to sign it and if they have, contact them again to thank them.

Using In-memory tarballs with Archive::Tar

April 17, 2008

Archive::Tar
is a useful library for working with tar archives from Perl.
Unfortunately, one thing it doesn’t allow is using data from memory as
the archive. From the TODO section:

Allow archives to be passed in as string

Currently, we only allow opened filehandles or filenames, but not
strings. The internals would need some reworking to facilitate
stringified archives.

Fortunately, it does allow you to use a filehandle. I’ve previously
mentioned about how useful the IO::Handle subsystem in perl is. And we
should be able to use it in this case. The module we’ll want is
IO::String, which is a IO::Handle over a perl scalar. We can use it:

my $tar = new Archive::Tar(new IO::String($data));

Unfortunately when we run this now we get:

Cannot read compressed format in tar-mode at Foo.pm line 41
No data could be read from file at Foo.pm line 41

It turns out that this is because Archive::Tar uses IO::Zlib
internally if the file isn’t uncompressed, but this doesn’t provide the
ability to uncompress from a filehandle. The answer is to uncompress the
data before passing it to Archive::Tar and the easiest way to do this is
to use the IO::Uncompress::Gunzip module, so we can rewrite our code
to:

my $tar = new Archive::Tar(new IO::Uncompress::Gunzip(new IO::String($data)));

Now when you run the script, Archive::Tar has an uncompressed tar
stream. Yet another situation where IO::Handles comes to the rescue.

Boilerplate code for a perl class

April 17, 2008

Because I always forget when I need to create a new class in
perl:

package Foo::Bar;

use strict;
use warnings;

sub new {
   my $this = shift;
   my $class = ref($this) || $this;
   my $self = {};
   bless $self, $class;
   $self->initialize(@_);
   return $self;
}

sub initialize {
   my $self = shift;
}

1;

If you have any useful additions I’d love to know.

InnoDB being silently disabled

April 3, 2008

Regular viewers will know that I don’t think favourably of MySQL.
Here is yet another reason. Let’s create an InnoDB table:

mysql> CREATE TABLE `User_` (
mysql> ...
mysql> ) ENGINE=InnoDB DEFAULT CHARSET=latin1;

Query OK, 0 rows affected, 1 warning (0.04 sec)

One warning, but we’re running this as part of an import, so we’ll
fail to spot this and even if we did, we wouldn’t be able to get it back
out of mysql because SHOW WARNINGS only shows the last command.
So let’s look at the table we just created:

mysql> show create table User_G
*************************** 1. row ***************************
       Table: User_
Create Table: CREATE TABLE `User_` (
...
) ENGINE=MyISAM DEFAULT CHARSET=latin1
1 row in set (0.00 sec)

Eh? what’s going on? We asked for InnoDB, but have got a MyISAM
table. Lets look at the engines available.

mysql> show engines;
+------------+----------+----------------------------------------------------------------+
| Engine     | Support  | Comment                                                        |
+------------+----------+----------------------------------------------------------------+
| MyISAM     | DEFAULT  | Default engine as of MySQL 3.23 with great performance         |
| MEMORY     | YES      | Hash based, stored in memory, useful for temporary tables      |
| InnoDB     | DISABLED | Supports transactions, row-level locking, and foreign keys     |
| BerkeleyDB | NO       | Supports transactions and page-level locking                   |
| BLACKHOLE  | NO       | /dev/null storage engine (anything you write to it disappears) |
| EXAMPLE    | NO       | Example storage engine                                         |
| ARCHIVE    | YES      | Archive storage engine                                         |
| CSV        | YES      | CSV storage engine                                             |
| ndbcluster | DISABLED | Clustered, fault-tolerant, memory-based tables                 |
| FEDERATED  | YES      | Federated MySQL storage engine                                 |
| MRG_MYISAM | YES      | Collection of identical MyISAM tables                          |
| ISAM       | NO       | Obsolete storage engine                                        |
+------------+----------+----------------------------------------------------------------+
12 rows in set (0.00 sec)

Oh, so innodb has been disabled. We can fix that easily by removing
skip-innodb from my.cnf.

root@cmsdb01:/var/log# grep skip-innodb /etc/mysql/my.cnf
root@cmsdb01:/var/log#

But hang on a second, that’s not in the config file. What’s going on?
It turns out that the reason InnoDB is disabled is because of the
innodb_log_file_size setting not matching the files on disk.

root@cmsdb01:/var/log# grep innodb_log_file_size /etc/mysql/my.cnf
innodb_log_file_size            = 512M
root@cmsdb01:/var/log# ls -lh /var/lib/mysql/ib_logfile*
-rw-rw---- 1 mysql mysql 5.0M 2006-12-19 18:39 /var/lib/mysql/ib_logfile0
-rw-rw---- 1 mysql mysql 5.0M 2006-12-19 18:39 /var/lib/mysql/ib_logfile1

Rumour has it that you can just stop MySQL, delete these log files
and start MySQL again. I’m yet to try this as the server in question is
in production use. The alternative is to change the
innodb_log_file_size setting to match the file.

So in summary the problems with MySQL are:

Not logging warnings anywhere useful.
Converting engine types with a warning rather than throwing an
error. This can be fixed by setting sql_mode to include
NO_ENGINE_SUBSTITUTION.
Starting up and disabling InnoDB when there is a problem rather than
failing to start, giving a false impression that everything is
working.

MySQL has not impressed me this week.