After my entry yesterday about MySQL truncating data, several people
have pointed out that MySQL 4.1 or later gives you a warning. Yes, this is true. You
can even see it in the example I gave:

Query OK, 1 row affected, 1 warning (0.00 sec)

I ignored mentioning this, but perhaps should have said something
about it. I reason I didn’t mention it was because I didn’t feel that a
warning really helped anyone. Developers have enough problems
remembering to check for errors, let along remembering to check in case
there was a warning as well. Plus, they’d then have to work out if the
warning was something serious or something they could ignore. There’s
also the question of how well the language bindings present this
information. Take for example, PHP. The mysqli extension gained support
for checking for warnings in PHP5 and gives the following
code
as an
example of getting warnings:

mysqli_query($link, $query);

if (mysqli_warning_count($link)) {
   if ($result = mysqli_query($link, "SHOW WARNINGS")) {
      $row = mysqli_fetch_row($result);
      printf("%s (%d): %sn", $row[0], $row[1], $row[2]);
      mysqli_free_result($result);
   }
}

Hardly concise code. As of 5.1.0, there is also mysqli_get_warnings(),
but is undocumented beyond noting its existence. The MySQL extension
does not support getting warning information. The PDO wrapper doesn’t
provide any way to get this information.

In perl, DBD::mysql has a mysql_warning_count()
function, but presumably would have to call "SHOW WARNINGS"
like in the PHP example. Seems Python’s MYSQLdb module will raise an
exception on warnings in certain cases. Mostly using the Cursor
object.

In java, you can set the jdbcCompliantTruncation connection
parameter to make the driver throw java.sql.DataTruncation
exceptions, as per the JDBC spec, which makes you wonder why this isn’t
set by default. Unfortunately this setting is usually outside the
programmer’s control. There is also the
java.sql.Statement.getWarnings(), but once again, you need to
check this after every statement. Not sure if ORM tools like Hibernate
check this or not.

So, yes MySQL does give you a warning, but in practice is useless.

MySQL in its standard configuration has this wonderful “feature” of
truncating your data if it can’t fit in the field.

mysql> create table foo (bar varchar(4));
Query OK, 0 rows affected (0.00 sec)

mysql> insert into foo (bar) values ("12345");
Query OK, 1 row affected, 1 warning (0.00 sec)

In comparison, PostgeSQL does:

psql=> create table foo (bar varchar(4));
CREATE TABLE
psql=> insert into foo (bar) values ('12345');
ERROR:  value too long for type character varying(4)

You can make MySQL do the right thing by setting the SQL
Mode
option to
include STRICT_TRANS_TABLES or STRICT_ALL_TABLES. The difference is that the
former will only enable it for transactional data storage engines. As much as
I’m loathed to say it, I don’t recommend using STRICT_ALL_TABLES, as an error
during updating a non-transational table will result in a partial
update, which is probably worse than a truncated field. Setting the mode
to TRADITIONAL includes both these and a couple of related issues
(NO_ZERO_IN_DATE, NO_ZERO_DATE,
ERROR_FOR_DIVISION_BY_ZERO) You can set the
mode using:

  • On the command line:

    --sql-mode="TRADITIONAL"
  • In /etc/mysql/my.cnf:

    sql-mode="TRADITIONAL"
  • At runtime:

    SET GLOBAL sql_mode="TRADITIONAL"
    SET SESSION sql_mode="TRADITIONAL"

Just say no to databases that happily throw away your data

I should have mentioned that my previous
blog posting
was using MySQL
4.0(4.0.23_Debian-3ubuntu2-log). It seems that in 5.0.2 they changed the
precedence of the NOT operator to be lower than LIKE.
From the manual:

The precedence shown for NOT is as of MySQL 5.0.2. For
earlier versions, or from 5.0.2 on if the HIGH_NOT_PRECEDENCE SQL mode
is enabled, the precedence of NOT is the same as that of the !
operator.

Using 5.0 (5.0.22-Debian_0ubuntu6.06.8-log), and a slightly smaller
dataset, I get:

mysql> select count(*) from Table where blobid is null or not blobid like '%-%';
+----------+
| count(*) |
+----------+
|   199057 |
+----------+
1 row in set (3.26 sec)

mysql> select count(*) from Table where blobid is null or blobid not like '%-%';
+----------+
| count(*) |
+----------+
|   199057 |
+----------+
1 row in set (0.96 sec)

Jim Kingdon
experimented with other databases and was unable to reproduce this
problem. My test with PostgreSQL 8.3:

quux=> create table foo (blobid varchar(255));
CREATE TABLE
quux=> insert into foo (blobid) values
   ('5cd1237469cc4b52ca094e215156c582-9ef460ac4134c600a4d2382c4b0acee7'),
   (NULL),
   ('d20cb4037f8f9ab1de5de264660f005c-2c34209dcfb39251cf7c16bb6754bbd2'),
   ('845a8d06719d8bad521455a8dd47745c-095d9a0831433c92cd269e14e717b3a9'),
   ('9580ed23f34dd68d35da82f7b2a293d6-bf39df7509d977a1de767340536ebe80'),
   ('06c9521472cdac02a2d4b2a18f8bec0f-0a8a28d3b63df54860055f1d1de92969'),
   ('ed3cd0dd9b55f76db7544eeb64f3cfa0-80a6a3eb6d73c0a58f88b7c332866d5c'),
   (NULL),
   ('b339f6545651fbfa49fa500b7845c4ce-6defb5ffc188b8f72f1aa10bbd5c6bec'),
   ('642075963d6f69bb11c35a110dd07c2c8db54ac2d2accae7fa4a22db1d6caae9');
INSERT 0 10
quux=> select count(*) from foo
   where blobid is null or blobid not like '%-%';
 count
-------
     3
(1 row)

quux=> select count(*) from foo
   where blobid is null or not blobid like '%-%';
 count
-------
     3
(1 row)

quux=> select not blobid from foo limit 10;
ERROR:  argument of NOT must be type boolean, not type character varying

This appears to have been the case since at least 7.4

Problems like this is going to make the transition from MySQL 4.0 to 5.x all the more fun when we get around to doing it.

Dear lazyweb,

I’m possibly being stupid, but can someone explain the differences
between these two queries?

mysql> select count(*) from Table
   where blobid is null or not blobid like '%-%';
+----------+
| count(*) |
+----------+
| 15262487 |
+----------+
1 row in set (25 min 4.18 sec)

mysql> select count(*) from Table
   where blobid is null or blobid not like '%-%';
+----------+
| count(*) |
+----------+
| 20044216 |
+----------+
1 row in set (24 min 54.06 sec)

For reference:

mysql> select count(*) from Table where blobid is null;
+----------+
| count(*) |
+----------+
| 15262127 |
+----------+
1 row in set (24 min 7.15 sec)

Update: It turns out that the former was doing (not blobid) like '%-%' which turns out to not do anything sensible:

mysql> select not blobid from Table limit 10;
+------------+
| not blobid |
+------------+
|          0 |
|       NULL |
|          1 |
|          0 |
|          0 |
|          0 |
|          1 |
|       NULL |
|          1 |
|          0 |
+------------+
10 rows in set (0.02 sec)

mysql> select blobid from Table limit 10;
+-------------------------------------------------------------------+
| blobid                                                            |
+-------------------------------------------------------------------+
| 5cd1237469cc4b52ca094e215156c582-9ef460ac4134c600a4d2382c4b0acee7 |
| NULL                                                              |
| d20cb4037f8f9ab1de5de264660f005c-2c34209dcfb39251cf7c16bb6754bbd2 |
| 845a8d06719d8bad521455a8dd47745c-095d9a0831433c92cd269e14e717b3a9 |
| 9580ed23f34dd68d35da82f7b2a293d6-bf39df7509d977a1de767340536ebe80 |
| 06c9521472cdac02a2d4b2a18f8bec0f-0a8a28d3b63df54860055f1d1de92969 |
| ed3cd0dd9b55f76db7544eeb64f3cfa0-80a6a3eb6d73c0a58f88b7c332866d5c |
| NULL                                                              |
| b339f6545651fbfa49fa500b7845c4ce-6defb5ffc188b8f72f1aa10bbd5c6bec |
| 642075963d6f69bb11c35a110dd07c2c-8db54ac2d2accae7fa4a22db1d6caae9 |
+-------------------------------------------------------------------+
10 rows in set (0.00 sec)

The documentation
says Logical NOT. Evaluates to 1 if the operand is 0, to 0 if the
operand is non-zero, and NOT NULL returns NULL. but doesn’t
describe the behaviour of NOT 'string'. It would appear that a
string starting with a number returns 0 and a string starting with a
letter returns 1. Either way, neither has a hyphen in.

Regular viewers will know that I don’t think favourably of MySQL.
Here is yet another reason. Let’s create an InnoDB table:

mysql> CREATE TABLE `User_` (
mysql> ...
mysql> ) ENGINE=InnoDB DEFAULT CHARSET=latin1;

Query OK, 0 rows affected, 1 warning (0.04 sec) 

One warning, but we’re running this as part of an import, so we’ll
fail to spot this and even if we did, we wouldn’t be able to get it back
out of mysql because SHOW WARNINGS only shows the last command.
So let’s look at the table we just created:

mysql> show create table User_G
*************************** 1. row ***************************
       Table: User_
Create Table: CREATE TABLE `User_` (
...
) ENGINE=MyISAM DEFAULT CHARSET=latin1
1 row in set (0.00 sec)

Eh? what’s going on? We asked for InnoDB, but have got a MyISAM
table. Lets look at the engines available.

mysql> show engines;
+------------+----------+----------------------------------------------------------------+
| Engine     | Support  | Comment                                                        |
+------------+----------+----------------------------------------------------------------+
| MyISAM     | DEFAULT  | Default engine as of MySQL 3.23 with great performance         |
| MEMORY     | YES      | Hash based, stored in memory, useful for temporary tables      |
| InnoDB     | DISABLED | Supports transactions, row-level locking, and foreign keys     |
| BerkeleyDB | NO       | Supports transactions and page-level locking                   |
| BLACKHOLE  | NO       | /dev/null storage engine (anything you write to it disappears) |
| EXAMPLE    | NO       | Example storage engine                                         |
| ARCHIVE    | YES      | Archive storage engine                                         |
| CSV        | YES      | CSV storage engine                                             |
| ndbcluster | DISABLED | Clustered, fault-tolerant, memory-based tables                 |
| FEDERATED  | YES      | Federated MySQL storage engine                                 |
| MRG_MYISAM | YES      | Collection of identical MyISAM tables                          |
| ISAM       | NO       | Obsolete storage engine                                        |
+------------+----------+----------------------------------------------------------------+
12 rows in set (0.00 sec)

Oh, so innodb has been disabled. We can fix that easily by removing
skip-innodb from my.cnf.

root@cmsdb01:/var/log# grep skip-innodb /etc/mysql/my.cnf
root@cmsdb01:/var/log#

But hang on a second, that’s not in the config file. What’s going on?
It turns out that the reason InnoDB is disabled is because of the
innodb_log_file_size setting not matching the files on disk.

root@cmsdb01:/var/log# grep innodb_log_file_size /etc/mysql/my.cnf
innodb_log_file_size            = 512M
root@cmsdb01:/var/log# ls -lh /var/lib/mysql/ib_logfile*
-rw-rw---- 1 mysql mysql 5.0M 2006-12-19 18:39 /var/lib/mysql/ib_logfile0
-rw-rw---- 1 mysql mysql 5.0M 2006-12-19 18:39 /var/lib/mysql/ib_logfile1

Rumour has it that you can just stop MySQL, delete these log files
and start MySQL again. I’m yet to try this as the server in question is
in production use. The alternative is to change the
innodb_log_file_size setting to match the file.

So in summary the problems with MySQL are:

  • Not logging warnings anywhere useful.
  • Converting engine types with a warning rather than throwing an
    error. This can be fixed by setting sql_mode to include
    NO_ENGINE_SUBSTITUTION.
  • Starting up and disabling InnoDB when there is a problem rather than
    failing to start, giving a false impression that everything is
    working.

MySQL has not impressed me this week.

If you’re trying to import a dump file created using
mysqldump and you get an error like:

ERROR 1005 (HY000): Can't create table './Database/Table.frm' (errno: 150)

Then you’ve just been bitten by mysqldump being far too stupid. The
problem occurs because mysqldump includes foreign key constraints in the
initial CREATE TABLE command, so if a table refers to a table
that doesn’t currently exist, it throws an error. mysqldump does
correctly disable the contraints when inserting data into the tables. The correct way for
this would be for mysqldump to create all the tables without the
constraints, use ALTER TABLE to add the constraints to the
tables, and then importing the data into the tables.

The workaround for this problem is to use:

SET FOREIGN_KEY_CHECKS = 0;
source dump.sql
SET FOREIGN_KEY_CHECKS = 1;

Update: Someone has pointed out that it appears that
mysql 5 has fixed
this problem
by including the above statements in the dump.

MySQL cleverly maps

CREATE INDEX foo_bar ON Foo(Bar);

to

LOCK TABLE Foo WRITE;
CREATE TEMPORARY TABLE A-Foo ( .... INDEX foo_bar (Bar));
INSERT INTO A-Foo SELECT * FROM Foo;
ALTER TABLE Foo RENAME TO B-Foo;
ALTER TABLE A-Foo RENAME TO Foo;
DROP TABLE B-Foo;

If you have a very large table, expect this operation to take a) a
lot of disk space, b) a very very long time and c) block any writes to the table
in the process. I don’t recommend adding indexes or altering any very
large tables that are in production on MySQL, because you won’t be in
production for quite some time.

Update: Tom Haddon asked me if this applied to
recent versions of MySQL or to PostgreSQL. Looking at the docs, it
appears to still apply to 5.1:

  • http://dev.mysql.com/doc/refman/5.1/en/create-index.html
  • http://dev.mysql.com/doc/refman/5.1/en/alter-table.html

    In some cases, no temporary table is necessary:

    • If you use ALTER TABLE tbl_name RENAME TO new_tbl_name without any other
      options, MySQL simply renames any files that correspond to the table
      tbl_name. (You can also use the RENAME TABLE statement to rename tables. See
      Section 13.1.16, “RENAME TABLE Syntax”.)

    • ALTER TABLE … ADD PARTITION creates no temporary table except for MySQL
      Cluster. ADD or DROP operations for RANGE or LIST partitions are immediate
      operations or nearly so. ADD or COALESCE operations for HASH or KEY partitions
      copy data between changed partitions; unless LINEAR HASH/KEY was used, this is
      much the same as creating a new table (although the operation is done partition
      by partition). REORGANIZE operations copy only changed partitions and do not
      touch unchanged ones.

    If other cases, MySQL creates a temporary table, even if the data wouldn’t
    strictly need to be copied (such as when you change the name of a column).

  • http://dev.mysql.com/doc/refman/5.1/en/alter-table-problems.html

As far as PostgreSQL is concerned, it doesn’t mention anything about
doing the same thing, but does mention that it does a full sequential
scan of the table. During this time writes are blocked. You can use the
CONCURRENTLY keyword to allow writes to happen, but it does two scans
and will take longer, but you can still use your database.

http://www.postgresql.org/docs/8.2/interactive/sql-createindex.html

Today, I discovered SQL::Translator,
which seems to have some very interesting use cases. Basically, it is a
perl module for translating a database schema from one of a number of
formats and turning it into another format. Parsers include:

  • Live querying of DB2, MySQL, DBI-PostgreSQL, SQLite and Sybase databases
  • Access
  • Excel
  • SQL for DB2, MySQL, Oracle, PostgreSQL, SQLite and Sybase
  • Storable
  • XML
  • YAML

Output formats include:

  • Class::DBI
  • SQL for MySQL, Oracle, PostgreSQL, SQLServer, SQLite and Sybase
  • Storable, XML and YAML
  • POD, Diagram, GraphViz and HTML

Several things spring to mind with this:

  1. Defining your Schema in XML and using SQL::Translator to convert it
    into SQL for several databases and a set of classes for Class::DBI,
    which would make your application immediately target any of the
    supported databases.
  2. Documenting an existing database for which you’ve lost existing
    documentation by pointing it at a running database instance and
    outputting HTML page and, thanks to the Diagram output module, visual
    representation of the structure.
  3. Convert one database from product to another. Point it at a MySQL
    database and generate SQL for postgresql. If you generated some
    Class::DBI stuff you could possibly quickly write a script to copy data
    too.
  4. Using the sqlt-diff script, compare you current SQL to what is
    running on the database and generate a SQL script to upgrade the
    database structure using ALTER TABLE etc. Presumably you’d need
    to convert any data yourself, but is still a time saver for large
    databases.

I’m sure other people could think of some interesting uses for this.
Having looked at the Class::DBI stuff, I think it could do with some
improvements. I can’t see a way to set the class names, although I
haven’t spent that much time looking and it insists on having all the
classes in one file. Also the XML and YAML formats
generated are rather verbose and I haven’t looked to see how much I
could cut them down to use as the source definition. I suspect that I
can make it a lot shorter and rely on sensible defaults.

My initial reason for wanting to use SQL::Translator is that
Class::DBI::Pg has a large start up time and isn’t really suitable for
CGI use if you have a complex database. This might be mitigated by using
mod_perl, but in the mean time I was hoping I could speed up startup by
telling Class::DBI my column names, rather than it querying the
database. SQL::Translator should allow me to save duplicating the
database structure, whilst allowing me to support multiple backend
databases. If I get this working, I’ll write up a short HOWTO.