MySQL cleverly maps

CREATE INDEX foo_bar ON Foo(Bar);

to

LOCK TABLE Foo WRITE;
CREATE TEMPORARY TABLE A-Foo ( .... INDEX foo_bar (Bar));
INSERT INTO A-Foo SELECT * FROM Foo;
ALTER TABLE Foo RENAME TO B-Foo;
ALTER TABLE A-Foo RENAME TO Foo;
DROP TABLE B-Foo;

If you have a very large table, expect this operation to (a) use a
lot of disk space, (b) take a very, very long time and (c) block all writes to
the table while it runs. I don't recommend adding indexes to, or altering, any
very large tables that are in production on MySQL, because you won't be in
production for quite some time.

Update: Tom Haddon asked me if this applied to
recent versions of MySQL or to PostgreSQL. Looking at the docs, it
appears to still apply to 5.1:

  • http://dev.mysql.com/doc/refman/5.1/en/create-index.html
  • http://dev.mysql.com/doc/refman/5.1/en/alter-table.html

    In some cases, no temporary table is necessary:

    • If you use ALTER TABLE tbl_name RENAME TO new_tbl_name without any other
      options, MySQL simply renames any files that correspond to the table
      tbl_name. (You can also use the RENAME TABLE statement to rename tables. See
      Section 13.1.16, “RENAME TABLE Syntax”.)

    • ALTER TABLE … ADD PARTITION creates no temporary table except for MySQL
      Cluster. ADD or DROP operations for RANGE or LIST partitions are immediate
      operations or nearly so. ADD or COALESCE operations for HASH or KEY partitions
      copy data between changed partitions; unless LINEAR HASH/KEY was used, this is
      much the same as creating a new table (although the operation is done partition
      by partition). REORGANIZE operations copy only changed partitions and do not
      touch unchanged ones.

    In other cases, MySQL creates a temporary table, even if the data wouldn’t
    strictly need to be copied (such as when you change the name of a column).

  • http://dev.mysql.com/doc/refman/5.1/en/alter-table-problems.html

As far as PostgreSQL is concerned, its docs don’t mention anything about
doing the same thing, but they do say that building an index requires a full
sequential scan of the table, during which writes are blocked. You can use the
CONCURRENTLY keyword to allow writes while the index is built; it performs two
scans and takes longer, but you can still use your database in the meantime,
as sketched below.

http://www.postgresql.org/docs/8.2/interactive/sql-createindex.html
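
As a rough sketch of the difference (the table, column and index names here
are made up for illustration):

-- Takes a lock that blocks writes to the table until the index is built:
CREATE INDEX entries_created ON entries (created);

-- PostgreSQL 8.2 and later: allows writes during the build, at the cost
-- of two table scans and a longer total build time:
CREATE INDEX CONCURRENTLY entries_created ON entries (created);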

I decided it was probably time that I started backing up my
PostgreSQL databases and ended up writing a quick script to do just
that. I did have a look at pg_dumpall, but it appears
to only write to stdout, meaning I couldn’t have a file per database
like I wanted.

#!/bin/bash

set -u
set -e

dumpdir=/var/backups/postgresql

date=$(date +"%Y-%m-%d")
mkdir -p "$dumpdir"

# Remove dumps older than 10 days.
find "$dumpdir" -name "*.sql" -mtime +10 -exec rm {} \;

# Dump each database to its own file, skipping template0
# (which doesn't accept connections).
for database in $(su - postgres -c "psql -l -t" | cut -d'|' -f 1 | grep -v template0);
do
        su - postgres -c "pg_dump $database" > "$dumpdir/$database-$date.sql";
done

# Dump global objects (roles, tablespaces) separately.
su - postgres -c "pg_dumpall -g" > "$dumpdir/global-$date.sql"

Just stick that in /etc/cron.daily/postgresql-backups and
you should end up with 10 days’ worth of backups. One caveat: PostgreSQL
does allow spaces in (quoted) database names, and the word splitting in the
loop above won’t handle them, so avoid them or adjust the parsing.
The only other issue to deal with is not storing the backups on the same
machine as the database server, let alone on the same physical hardware or,
even worse, the same partition.

Update: Corrected typo and added dumping global
objects to the script.

Sorry to anyone who got spammed by my change to pyblosxom
earlier today. I changed the base URL, which changed the links to the
entries. I should sort out having proper GUIDs.

Just a quick one. Have you ever created a table using the number of
seconds since 1970 and realised, after populating it with data, that you
really need it in a TIMESTAMP type? If so, you can quickly convert it
using this SQL:

ALTER TABLE entries
   ALTER COLUMN created TYPE TIMESTAMP WITH TIME ZONE
      USING TIMESTAMP WITH TIME ZONE 'epoch' + created * interval '1 second';

With thanks to the PostgreSQL manual for saving me hours working out
how to do this.
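
If you want to sanity-check the expression before running the ALTER on real
data, you can try it on a single value first (the epoch value here is just an
example):

SELECT TIMESTAMP WITH TIME ZONE 'epoch' + 1000000000 * interval '1 second';
-- 2001-09-09 01:46:40+00 (displayed in your server's time zone)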

Does your Oracle client hang when connecting? Are you using Oracle
10.2.0.1? Do you get the
following if you strace the process?

gettimeofday({1129717666, 622797}, NULL) = 0
access("/etc/sqlnet.ora", F_OK) = -1 ENOENT (No such file or directory)
access("./network/admin/sqlnet.ora", F_OK) = -1 ENOENT (No such file or directory)
access("/etc/sqlnet.ora", F_OK) = -1 ENOENT (No such file or directory)
access("./network/admin/sqlnet.ora", F_OK) = -1 ENOENT (No such file or directory)
fcntl64(155815832, F_SETFD, FD_CLOEXEC) = -1 EBADF (Bad file descriptor)
times(NULL) = -1808543702
times(NULL) = -1808543702
times(NULL) = -1808543702
times(NULL) = -1808543702
times(NULL) = -1808543702
times(NULL) = -1808543702
times(NULL) = -1808543702
.
.
.

Has your client been up for more than 180 days? Well done; you’ve
just come across the same bug that has bitten two of our customers in
the last week. Back in the days of Oracle 8, there was a fairly infamous
bug in the Oracle client where new connections would fail if the client had
been up for 248 days or more. 248 days is roughly when the kernel’s times()
tick counter overflows a signed 32-bit integer at 100 ticks per second
(2^31 / 100 seconds ≈ 248.5 days), which is presumably why the times() calls
in the strace above are returning negative values. That got fixed, and wasn’t
a problem with Oracle 9i at all. Now Oracle have managed to introduce a
similar bug in 10.2.0.1, although in my experience the number of days appears
to be shorter (180+).

Thankfully, this has been fixed in the 10.2.0.2 Instant Client. More
information can be found on forums.oracle.com and www.redhat.com.