I should have mentioned that my previous
blog posting
was using MySQL
4.0(4.0.23_Debian-3ubuntu2-log). It seems that in 5.0.2 they changed the
precedence of the NOT operator to be lower than LIKE.
From the manual:

The precedence shown for NOT is as of MySQL 5.0.2. For
earlier versions, or from 5.0.2 on if the HIGH_NOT_PRECEDENCE SQL mode
is enabled, the precedence of NOT is the same as that of the !
operator.

Using 5.0 (5.0.22-Debian_0ubuntu6.06.8-log), and a slightly smaller
dataset, I get:

mysql> select count(*) from Table where blobid is null or not blobid like '%-%';
+----------+
| count(*) |
+----------+
|   199057 |
+----------+
1 row in set (3.26 sec)

mysql> select count(*) from Table where blobid is null or blobid not like '%-%';
+----------+
| count(*) |
+----------+
|   199057 |
+----------+
1 row in set (0.96 sec)

Jim Kingdon
experimented with other databases and was unable to reproduce this
problem. My test with PostgreSQL 8.3:

quux=> create table foo (blobid varchar(255));
CREATE TABLE
quux=> insert into foo (blobid) values
   ('5cd1237469cc4b52ca094e215156c582-9ef460ac4134c600a4d2382c4b0acee7'),
   (NULL),
   ('d20cb4037f8f9ab1de5de264660f005c-2c34209dcfb39251cf7c16bb6754bbd2'),
   ('845a8d06719d8bad521455a8dd47745c-095d9a0831433c92cd269e14e717b3a9'),
   ('9580ed23f34dd68d35da82f7b2a293d6-bf39df7509d977a1de767340536ebe80'),
   ('06c9521472cdac02a2d4b2a18f8bec0f-0a8a28d3b63df54860055f1d1de92969'),
   ('ed3cd0dd9b55f76db7544eeb64f3cfa0-80a6a3eb6d73c0a58f88b7c332866d5c'),
   (NULL),
   ('b339f6545651fbfa49fa500b7845c4ce-6defb5ffc188b8f72f1aa10bbd5c6bec'),
   ('642075963d6f69bb11c35a110dd07c2c8db54ac2d2accae7fa4a22db1d6caae9');
INSERT 0 10
quux=> select count(*) from foo
   where blobid is null or blobid not like '%-%';
 count
-------
     3
(1 row)

quux=> select count(*) from foo
   where blobid is null or not blobid like '%-%';
 count
-------
     3
(1 row)

quux=> select not blobid from foo limit 10;
ERROR:  argument of NOT must be type boolean, not type character varying

This appears to have been the case since at least 7.4

Problems like this is going to make the transition from MySQL 4.0 to 5.x all the more fun when we get around to doing it.

Dear lazyweb,

I’m possibly being stupid, but can someone explain the differences
between these two queries?

mysql> select count(*) from Table
   where blobid is null or not blobid like '%-%';
+----------+
| count(*) |
+----------+
| 15262487 |
+----------+
1 row in set (25 min 4.18 sec)

mysql> select count(*) from Table
   where blobid is null or blobid not like '%-%';
+----------+
| count(*) |
+----------+
| 20044216 |
+----------+
1 row in set (24 min 54.06 sec)

For reference:

mysql> select count(*) from Table where blobid is null;
+----------+
| count(*) |
+----------+
| 15262127 |
+----------+
1 row in set (24 min 7.15 sec)

Update: It turns out that the former was doing (not blobid) like '%-%' which turns out to not do anything sensible:

mysql> select not blobid from Table limit 10;
+------------+
| not blobid |
+------------+
|          0 |
|       NULL |
|          1 |
|          0 |
|          0 |
|          0 |
|          1 |
|       NULL |
|          1 |
|          0 |
+------------+
10 rows in set (0.02 sec)

mysql> select blobid from Table limit 10;
+-------------------------------------------------------------------+
| blobid                                                            |
+-------------------------------------------------------------------+
| 5cd1237469cc4b52ca094e215156c582-9ef460ac4134c600a4d2382c4b0acee7 |
| NULL                                                              |
| d20cb4037f8f9ab1de5de264660f005c-2c34209dcfb39251cf7c16bb6754bbd2 |
| 845a8d06719d8bad521455a8dd47745c-095d9a0831433c92cd269e14e717b3a9 |
| 9580ed23f34dd68d35da82f7b2a293d6-bf39df7509d977a1de767340536ebe80 |
| 06c9521472cdac02a2d4b2a18f8bec0f-0a8a28d3b63df54860055f1d1de92969 |
| ed3cd0dd9b55f76db7544eeb64f3cfa0-80a6a3eb6d73c0a58f88b7c332866d5c |
| NULL                                                              |
| b339f6545651fbfa49fa500b7845c4ce-6defb5ffc188b8f72f1aa10bbd5c6bec |
| 642075963d6f69bb11c35a110dd07c2c-8db54ac2d2accae7fa4a22db1d6caae9 |
+-------------------------------------------------------------------+
10 rows in set (0.00 sec)

The documentation
says Logical NOT. Evaluates to 1 if the operand is 0, to 0 if the
operand is non-zero, and NOT NULL returns NULL. but doesn’t
describe the behaviour of NOT 'string'. It would appear that a
string starting with a number returns 0 and a string starting with a
letter returns 1. Either way, neither has a hyphen in.

A while ago I published an article on PostgreSQL
user administration
. Typically, things have changed since I wrote
that article. I thought I’d detail a couple of the differences since
I wrote that guide.

The major difference is that you now have roles rather than users and
you use the CREATE ROLE command to create them instead of
CREATE USER, although the latter command still works. The
command line options for the createuser command have changed as
a result too. Before superuser and the ability to create new users were
the same thing. Now you can give a role permissions to create new roles
without giving them superuser powers. The options are now -s for
superuser and -S for not superuser, -d to allow them to create
databases and -D to disallow database creation and -r to allow the new
role to create other roles and -R to prevent them. for a standard user
you probably want somethig like:

createuser -S -D -R -P user

The -P makes createuser ask you for a password for
the new role.

You can find out more information about the new role system in
PostgreSQL in the user
management
and CREATE
ROLE
reference
sections of the manual.

Regular viewers will know that I don’t think favourably of MySQL.
Here is yet another reason. Let’s create an InnoDB table:

mysql> CREATE TABLE `User_` (
mysql> ...
mysql> ) ENGINE=InnoDB DEFAULT CHARSET=latin1;

Query OK, 0 rows affected, 1 warning (0.04 sec) 

One warning, but we’re running this as part of an import, so we’ll
fail to spot this and even if we did, we wouldn’t be able to get it back
out of mysql because SHOW WARNINGS only shows the last command.
So let’s look at the table we just created:

mysql> show create table User_G
*************************** 1. row ***************************
       Table: User_
Create Table: CREATE TABLE `User_` (
...
) ENGINE=MyISAM DEFAULT CHARSET=latin1
1 row in set (0.00 sec)

Eh? what’s going on? We asked for InnoDB, but have got a MyISAM
table. Lets look at the engines available.

mysql> show engines;
+------------+----------+----------------------------------------------------------------+
| Engine     | Support  | Comment                                                        |
+------------+----------+----------------------------------------------------------------+
| MyISAM     | DEFAULT  | Default engine as of MySQL 3.23 with great performance         |
| MEMORY     | YES      | Hash based, stored in memory, useful for temporary tables      |
| InnoDB     | DISABLED | Supports transactions, row-level locking, and foreign keys     |
| BerkeleyDB | NO       | Supports transactions and page-level locking                   |
| BLACKHOLE  | NO       | /dev/null storage engine (anything you write to it disappears) |
| EXAMPLE    | NO       | Example storage engine                                         |
| ARCHIVE    | YES      | Archive storage engine                                         |
| CSV        | YES      | CSV storage engine                                             |
| ndbcluster | DISABLED | Clustered, fault-tolerant, memory-based tables                 |
| FEDERATED  | YES      | Federated MySQL storage engine                                 |
| MRG_MYISAM | YES      | Collection of identical MyISAM tables                          |
| ISAM       | NO       | Obsolete storage engine                                        |
+------------+----------+----------------------------------------------------------------+
12 rows in set (0.00 sec)

Oh, so innodb has been disabled. We can fix that easily by removing
skip-innodb from my.cnf.

root@cmsdb01:/var/log# grep skip-innodb /etc/mysql/my.cnf
root@cmsdb01:/var/log#

But hang on a second, that’s not in the config file. What’s going on?
It turns out that the reason InnoDB is disabled is because of the
innodb_log_file_size setting not matching the files on disk.

root@cmsdb01:/var/log# grep innodb_log_file_size /etc/mysql/my.cnf
innodb_log_file_size            = 512M
root@cmsdb01:/var/log# ls -lh /var/lib/mysql/ib_logfile*
-rw-rw---- 1 mysql mysql 5.0M 2006-12-19 18:39 /var/lib/mysql/ib_logfile0
-rw-rw---- 1 mysql mysql 5.0M 2006-12-19 18:39 /var/lib/mysql/ib_logfile1

Rumour has it that you can just stop MySQL, delete these log files
and start MySQL again. I’m yet to try this as the server in question is
in production use. The alternative is to change the
innodb_log_file_size setting to match the file.

So in summary the problems with MySQL are:

  • Not logging warnings anywhere useful.
  • Converting engine types with a warning rather than throwing an
    error. This can be fixed by setting sql_mode to include
    NO_ENGINE_SUBSTITUTION.
  • Starting up and disabling InnoDB when there is a problem rather than
    failing to start, giving a false impression that everything is
    working.

MySQL has not impressed me this week.

If you’re trying to import a dump file created using
mysqldump and you get an error like:

ERROR 1005 (HY000): Can't create table './Database/Table.frm' (errno: 150)

Then you’ve just been bitten by mysqldump being far too stupid. The
problem occurs because mysqldump includes foreign key constraints in the
initial CREATE TABLE command, so if a table refers to a table
that doesn’t currently exist, it throws an error. mysqldump does
correctly disable the contraints when inserting data into the tables. The correct way for
this would be for mysqldump to create all the tables without the
constraints, use ALTER TABLE to add the constraints to the
tables, and then importing the data into the tables.

The workaround for this problem is to use:

SET FOREIGN_KEY_CHECKS = 0;
source dump.sql
SET FOREIGN_KEY_CHECKS = 1;

Update: Someone has pointed out that it appears that
mysql 5 has fixed
this problem
by including the above statements in the dump.

MySQL cleverly maps

CREATE INDEX foo_bar ON Foo(Bar);

to

LOCK TABLE Foo WRITE;
CREATE TEMPORARY TABLE A-Foo ( .... INDEX foo_bar (Bar));
INSERT INTO A-Foo SELECT * FROM Foo;
ALTER TABLE Foo RENAME TO B-Foo;
ALTER TABLE A-Foo RENAME TO Foo;
DROP TABLE B-Foo;

If you have a very large table, expect this operation to take a) a
lot of disk space, b) a very very long time and c) block any writes to the table
in the process. I don’t recommend adding indexes or altering any very
large tables that are in production on MySQL, because you won’t be in
production for quite some time.

Update: Tom Haddon asked me if this applied to
recent versions of MySQL or to PostgreSQL. Looking at the docs, it
appears to still apply to 5.1:

  • http://dev.mysql.com/doc/refman/5.1/en/create-index.html
  • http://dev.mysql.com/doc/refman/5.1/en/alter-table.html

    In some cases, no temporary table is necessary:

    • If you use ALTER TABLE tbl_name RENAME TO new_tbl_name without any other
      options, MySQL simply renames any files that correspond to the table
      tbl_name. (You can also use the RENAME TABLE statement to rename tables. See
      Section 13.1.16, “RENAME TABLE Syntax”.)

    • ALTER TABLE … ADD PARTITION creates no temporary table except for MySQL
      Cluster. ADD or DROP operations for RANGE or LIST partitions are immediate
      operations or nearly so. ADD or COALESCE operations for HASH or KEY partitions
      copy data between changed partitions; unless LINEAR HASH/KEY was used, this is
      much the same as creating a new table (although the operation is done partition
      by partition). REORGANIZE operations copy only changed partitions and do not
      touch unchanged ones.

    If other cases, MySQL creates a temporary table, even if the data wouldn’t
    strictly need to be copied (such as when you change the name of a column).

  • http://dev.mysql.com/doc/refman/5.1/en/alter-table-problems.html

As far as PostgreSQL is concerned, it doesn’t mention anything about
doing the same thing, but does mention that it does a full sequential
scan of the table. During this time writes are blocked. You can use the
CONCURRENTLY keyword to allow writes to happen, but it does two scans
and will take longer, but you can still use your database.

http://www.postgresql.org/docs/8.2/interactive/sql-createindex.html

I decided it was probably time that I started backing up my
PostgreSQL databases and ended up writing a quick script to do just
that. I did have a look at pg_dumpall, but it appears
to only write to stdout, meaning I couldn’t have a file per database
like I wanted.

#!/bin/bash

set -u
set -e

dumpdir=/var/backups/postgresql

date=$(date +"%Y-%m-%d")
mkdir -p $dumpdir
find $dumpdir -name "*.sql" -mtime +10 -exec rm {} ;
for database in $(su - postgres -c "psql -l -t" | cut -d'|' -f 1 | grep -v template0);
do
        su - postgres -c "pg_dump $database" > $dumpdir/$database-$date.sql;
done

su - postgres -c "pg_dumpall -g" > $dumpdir/global-$date.sql

Just stick that in /etc/cron.daily/postgresql-backups and
you should end up with 10 days worth of backups. I’m not sure if
postgreSQL can have spaces in database names. I’m assuming it can’t.
The only other issue to deal with is not storing it on the same machine
as the database server, let alone on the same physical hardware or even
worse, the same partition.

Update: Corrected typo and added dumping global
objects to the script.

Sorry to anyone that got spammed by my change to pyblosxom
earlier today. I changed the base url, which changed the link of the
entries. I should sort out having proper guids.

Just a quick one. If you’ve ever created a table using the number of
seconds since 1970 and realised, after populating it with data, that you
really need it in a TIMESTAMP type? If so, you can quickly convert it
using this SQL:

ALTER TABLE entries
   ALTER COLUMN created TYPE TIMESTAMP WITH TIME ZONE
      USING TIMESTAMP WITH TIME ZONE 'epoch' + created *interval '1 second';

With thanks to the PostgreSQL
manual
for saving me hours working out
how to do this.

Over the weekend, I managed to revive my main desktop machine, which
as spent the last 15 months turned off under a desk because it was
showing some odd behaviour and didn’t have time to fix it. I’ve upgraded
it to the latest sid and given it to my girlfriend to use instead of the
Windows machine she had been using. She appears to have fallen in love
with tuxpaint. 🙂

In the process of setting it up I discovered printconf, which
automatically sets up parallel and USB printers under cups. Plugged in
my printer, went to print in firefox and there was the printer. These
things just get easier and easier. Gone are the days when you spent
hours writing a printcap entry for your printer. One thing I would like
is for DBus support in CUPS so I know when the print job has finished.

Just finished writing an article on PostgreSQL
user administration
. Go read it.