dpkg has a very useful feature where if you delete a conffile (pretty
much everything under /etc and a few other files) it isn’t
replaced when you upgrade the package[0]. This behaviour was
confusing me for a while until I realised what was happening. I was
attempting to reinstall a package to get the default configuration
files back that had been accidentally deleted, but no matter what I
tried, the files didn’t exist after running dpkg. Once I
figured out that dpkg had this behaviour the solution was
simple: use the --force-confmiss command line argument.

root@quux:~# dpkg --force-confmiss -i /tmp/foo_2.0.0-build.14_all.deb
(Reading database ... 33418 files and directories currently installed.)
Preparing to replace foo 2.0.0-build.14 (using .../foo_2.0.0-build.14_all.deb) ...
Unpacking replacement foo ...
Setting up foo (2.0.0-build.14) ...

Configuration file `/etc/foo/foo.xml', does not exist on system.
Installing new config file as you request.
root@quux:~#
[0] If the file didn’t exist in
the previously installed version, it is installed, so you get new
configuration files.
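
Before reinstalling, it can help to see which conffiles have actually gone missing. The following is a minimal sketch, assuming dpkg-query is available and that its ${Conffiles} field prints one "path md5sum" pair per line (with an optional trailing "obsolete" marker). The function names are made up for illustration, and paths containing spaces are not handled.

```python
import os
import subprocess


def parse_conffiles(output):
    """Extract conffile paths from dpkg-query's ${Conffiles} output."""
    paths = []
    for line in output.splitlines():
        fields = line.split()
        # Expected shape: "/etc/foo/foo.xml <md5sum> [obsolete]"
        if not fields or not fields[0].startswith("/"):
            continue
        if fields[-1] == "obsolete":
            # No longer shipped by the package, so not worth restoring.
            continue
        paths.append(fields[0])
    return paths


def missing_conffiles(package, exists=os.path.exists):
    """Return the conffiles of `package` that are absent from disk."""
    result = subprocess.run(
        ["dpkg-query", "-W", "--showformat=${Conffiles}\\n", package],
        check=True, capture_output=True, text=True,
    )
    return [p for p in parse_conffiles(result.stdout) if not exists(p)]
```

If missing_conffiles("foo") returns a non-empty list, reinstalling with --force-confmiss as shown above should put those files back.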

If you’ve got various indenting and text wrapping options turned on in vim, pasting
text into the editor produces mangled results. You can get around
this by turning on paste mode using :set paste and off with
:set nopaste. To make things a little easier, you can use the
following snippet in your .vimrc to allow you to toggle paste
on and off using a single keypress:

nmap <F4> :set invpaste paste?<CR>
imap <F4> <C-O>:set invpaste<CR>
set pastetoggle=<F4>

(Warning: my vim settings have organically grown over the last
10 years, so they may not be the best or most modern way of achieving an
effect.)

If you’re trying to import a dump file created using
mysqldump and you get an error like:

ERROR 1005 (HY000): Can't create table './Database/Table.frm' (errno: 150)

Then you’ve just been bitten by mysqldump being far too stupid. The
problem occurs because mysqldump includes foreign key constraints in the
initial CREATE TABLE command, so if a table refers to a table
that doesn’t exist yet, MySQL throws an error. mysqldump does
correctly disable the constraints when inserting data into the tables. The correct
approach would be for mysqldump to create all the tables without the
constraints, use ALTER TABLE to add the constraints to the
tables, and then import the data into the tables.

The workaround for this problem is to use:

SET FOREIGN_KEY_CHECKS = 0;
source dump.sql
SET FOREIGN_KEY_CHECKS = 1;
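
If the dump has to be fed to mysql non-interactively, the same workaround can be baked into the file itself. This is only a sketch: wrap_dump is a hypothetical helper name, and it simply brackets the existing dump text with the two statements above.

```python
def wrap_dump(dump_sql):
    """Return the dump bracketed by statements that toggle foreign key checks."""
    return (
        "SET FOREIGN_KEY_CHECKS = 0;\n"
        + dump_sql.rstrip("\n")
        + "\nSET FOREIGN_KEY_CHECKS = 1;\n"
    )


def wrap_dump_file(src_path, dst_path):
    """Read a dump from src_path and write the wrapped version to dst_path."""
    with open(src_path) as src:
        wrapped = wrap_dump(src.read())
    with open(dst_path, "w") as dst:
        dst.write(wrapped)
```

The wrapped file can then be piped into the mysql client as usual.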

Update: Someone has pointed out that it appears that mysql 5 has fixed
this problem by including the above statements in the dump.

Firefox 2 is an improvement on previous versions, but one thing that
annoys me is the new tab style. I don’t like having a close button on
each tab and I don’t like it hiding tabs after you have a certain number
open. Fortunately you can fix this. Go to about:config in the URL bar and
then set browser.tabs.closeButtons to 3 and browser.tabs.tabMinWidth to
0, and you should now have a close button on the right and all tabs
displayed.

MySQL cleverly maps

CREATE INDEX foo_bar ON Foo(Bar);

to

LOCK TABLE Foo WRITE;
CREATE TEMPORARY TABLE A-Foo ( .... INDEX foo_bar (Bar));
INSERT INTO A-Foo SELECT * FROM Foo;
ALTER TABLE Foo RENAME TO B-Foo;
ALTER TABLE A-Foo RENAME TO Foo;
DROP TABLE B-Foo;

If you have a very large table, expect this operation to take a) a
lot of disk space, b) a very very long time and c) block any writes to the table
in the process. I don’t recommend adding indexes or altering any very
large tables that are in production on MySQL, because you won’t be in
production for quite some time.
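
To make the cost of that sequence concrete, here is a small runnable sketch of the same copy-and-swap pattern. It uses Python's built-in sqlite3 module purely as a stand-in, so it says nothing about MySQL internals; the function name and the _new/_old suffixes are assumptions for illustration. Every row is written a second time before the old table is dropped, which is where the disk space and time go.

```python
import sqlite3


def rebuild_with_index(conn, table, column, index_name):
    """Add an index by copying the whole table and swapping names."""
    new, old = f"{table}_new", f"{table}_old"
    cur = conn.cursor()
    # Empty copy with the same columns (WHERE 0 matches no rows).
    cur.execute(f'CREATE TABLE "{new}" AS SELECT * FROM "{table}" WHERE 0')
    cur.execute(f'CREATE INDEX "{index_name}" ON "{new}" ("{column}")')
    # The expensive step: every row is copied into the new table.
    cur.execute(f'INSERT INTO "{new}" SELECT * FROM "{table}"')
    # Swap the new table into place and discard the original.
    cur.execute(f'ALTER TABLE "{table}" RENAME TO "{old}"')
    cur.execute(f'ALTER TABLE "{new}" RENAME TO "{table}"')
    cur.execute(f'DROP TABLE "{old}"')
    conn.commit()
```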

Update: Tom Haddon asked me if this applied to
recent versions of MySQL or to PostgreSQL. Looking at the docs, it
appears to still apply to 5.1:

  • http://dev.mysql.com/doc/refman/5.1/en/create-index.html
  • http://dev.mysql.com/doc/refman/5.1/en/alter-table.html

    In some cases, no temporary table is necessary:

    • If you use ALTER TABLE tbl_name RENAME TO new_tbl_name without any other
      options, MySQL simply renames any files that correspond to the table
      tbl_name. (You can also use the RENAME TABLE statement to rename tables. See
      Section 13.1.16, “RENAME TABLE Syntax”.)

    • ALTER TABLE … ADD PARTITION creates no temporary table except for MySQL
      Cluster. ADD or DROP operations for RANGE or LIST partitions are immediate
      operations or nearly so. ADD or COALESCE operations for HASH or KEY partitions
      copy data between changed partitions; unless LINEAR HASH/KEY was used, this is
      much the same as creating a new table (although the operation is done partition
      by partition). REORGANIZE operations copy only changed partitions and do not
      touch unchanged ones.

    In other cases, MySQL creates a temporary table, even if the data wouldn’t
    strictly need to be copied (such as when you change the name of a column).

  • http://dev.mysql.com/doc/refman/5.1/en/alter-table-problems.html

As far as PostgreSQL is concerned, the documentation doesn’t mention
doing the same thing, but it does say that CREATE INDEX performs a full sequential
scan of the table. During this time writes are blocked. You can use the
CONCURRENTLY keyword to allow writes to happen; it does two scans
and takes longer, but you can still use your database in the meantime.

http://www.postgresql.org/docs/8.2/interactive/sql-createindex.html

Does your Oracle client hang when connecting? Are you using Oracle
10.2.0.1? Do you get the
following if you strace the process?

gettimeofday({1129717666, 622797}, NULL) = 0
access("/etc/sqlnet.ora", F_OK) = -1 ENOENT (No such file or directory)
access("./network/admin/sqlnet.ora", F_OK) = -1 ENOENT (No such file or directory)
access("/etc/sqlnet.ora", F_OK) = -1 ENOENT (No such file or directory)
access("./network/admin/sqlnet.ora", F_OK) = -1 ENOENT (No such file or directory)
fcntl64(155815832, F_SETFD, FD_CLOEXEC) = -1 EBADF (Bad file descriptor)
times(NULL) = -1808543702
times(NULL) = -1808543702
times(NULL) = -1808543702
times(NULL) = -1808543702
times(NULL) = -1808543702
times(NULL) = -1808543702
times(NULL) = -1808543702
.
.
.

Has your client been up for more than 180 days? Well done; you’ve
just come across the same bug that has bitten two of our customers in
the last week. Back in the days of Oracle 8, there was a fairly infamous
bug in the Oracle client where new connections would fail if the client had
been up for 248 days or more. This got fixed, and wasn’t a problem with
Oracle 9i at all. Now Oracle have managed to introduce a similar bug in
10.2.0.1, although in my experience the number of days appears to be
shorter (180+).
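
As an aside on where a figure like 248 days could come from: it matches the time a signed 32-bit counter of clock ticks takes to overflow at 100 ticks per second, and the negative return values from times() in the strace output are at least consistent with a tick counter that has wrapped. That link is an assumption on my part, as is the 100 Hz tick rate; the arithmetic itself is easy to check.

```python
SECONDS_PER_DAY = 86400


def days_until_wrap(bits=31, ticks_per_second=100):
    """Days until a tick counter with `bits` usable bits overflows."""
    return 2 ** bits / ticks_per_second / SECONDS_PER_DAY


# Signed 32-bit counter (31 usable bits) at an assumed 100 ticks per second.
print(round(days_until_wrap(), 2))
```

This does not explain the shorter 180-day threshold seen with 10.2.0.1.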

Thankfully, this has been fixed in the 10.2.0.2 Instant Client.
More information can be found on forums.oracle.com and www.redhat.com.

Had an interesting problem today with one of our servers at work.
The first thing I noticed, yesterday, was that an upgrade of Apache2 didn’t
complete properly because /etc/init.d/apache2 stop didn’t
return. Killing it and starting apache allowed the upgrade to
finish. I noticed there was a zombie process but didn’t think too much
of it.

Then this morning I got an email from the MD saying that various
internal service websites were down (webmail, wiki etc). My manager
noticed that it was due to logrotate hanging, again because restarting
apache had hung. Looking at the server I noticed a few more zombie
processes. One thing I’d noticed was that all these processes had
been reparented under init, and a quick web search confirmed
that init(1) should be reaping these processes. I thought maybe
restarting init would clear the zombies. I tried running
telinit q to reread the config file, but that returned an error
message about timing out on the /dev/initctl named pipe. I checked that the file
existed and everything looked fine. The next thing I checked was the
other end of the named pipe by running lsof -p 1. This
showed that init had /dev/console rather than
/dev/initctl as fd 0. I tried running kill -SIGHUP 1,
but that didn’t do anything. Then I tried kill -SIGUSR1 1,
but that didn’t do anything either. I checked the
console, but there wasn’t enough scrollback to see the system booting,
so I decided to schedule a reboot for this evening.
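
For spotting this sort of thing earlier, zombies and their parents can be listed straight from /proc. The sketch below assumes a Linux-style /proc/<pid>/stat layout (pid, command name in parentheses, state, parent pid); the function names are invented for illustration.

```python
import os


def parse_stat(line):
    """Return (pid, comm, state, ppid) from a /proc/<pid>/stat line."""
    # comm is wrapped in parentheses and may contain spaces or ')',
    # so split on the last ')' rather than on whitespace.
    head, _, tail = line.rpartition(")")
    pid, _, comm = head.partition("(")
    fields = tail.split()
    return int(pid), comm, fields[0], int(fields[1])


def find_zombies(proc="/proc"):
    """List (pid, ppid, comm) for every process in state Z."""
    zombies = []
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc, entry, "stat")) as handle:
                pid, comm, state, ppid = parse_stat(handle.read())
        except OSError:
            # The process exited between listdir() and open().
            continue
        if state == "Z":
            zombies.append((pid, ppid, comm))
    return zombies
```

Zombies whose parent pid is 1 and that never go away suggest init is not reaping its children, which is the situation described above.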

Rebooting the server presented me with an interesting challenge.
Normally the shutdown command signals init via /dev/initctl to change
to runlevel 0 or 6 to shut down or reboot. Of
course init wasn’t listening on this file, so that was out. Sending it
a SIGINT signal (the same signal init gets on ctrl-alt-delete) got no
response. Obviously telinit 0 wasn’t going to work
either. I decided to start shutting services down manually with the help
of Brett Parker. The idea
was to stop all non-essential services, unexport the nfs exports,
remount the disks read-only and then use either sysrq or a hardware
reset. Unfortunately someone accidentally ran /etc/init.d/halt stop,
hanging the server, but he is suffering from a bad cold today so I forgive
him. The server restarted without a hitch (thank god for ext3) and
running lsof -p 1 showed init having
/dev/initctl open. I don’t know what happened to init at the last
reboot on Monday, but a reboot seemed to fix it. Odd bug, but thankfully
it was a nice simple fix. I could have spent the evening debugging init.
🙂

Since upgrading to GNOME 2.14, I have been revisited by an annoying
problem with gnome-terminal. Gnome-terminal sets your character encoding
to being the same as your locale by default, which unfortunately was
being detected as ANSI_X3.4-1968, while I had my $LANG set to
en_GB.UTF-8 in my ~/.bash_profile. The reason it wasn’t being
detected was because nothing between logging in and starting
gnome-terminal looked at that file, so gnome-terminal thought the locale
was C.

The result was a corrupt
display when programs attempted to display Unicode characters. I could
fix it by changing the character encoding using the menu, but I’d have
to do this for every tab, which quickly becomes annoying. Time to find a
fix.

Turns out that you need to tell gdm to set the right locale, which
you can do by configuring ~/.dmrc. Mine now looks like:

[Desktop]
Session=gnome
Language=en_GB.UTF-8

Obviously, the important part is the Language line. You
need to set it to a locale that exists on your system, which you can
find using locale -a. Once
you’ve set that and logged in again, everything should be working
correctly.
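
A quick way to sanity-check the file is to read the Language value back and compare it against the output of locale -a. This is a rough sketch: the helper names are invented, and the normalisation step assumes that locale -a prints the codeset in a form like "utf8" where ~/.dmrc uses "UTF-8".

```python
import configparser


def dmrc_language(text):
    """Return the Language entry from the [Desktop] section, or None."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return parser.get("Desktop", "Language", fallback=None)


def normalise(name):
    """Fold codeset spelling so en_GB.UTF-8 and en_GB.utf8 compare equal."""
    lang, dot, codeset = name.partition(".")
    if not dot:
        return lang
    return lang + "." + codeset.lower().replace("-", "")


def language_is_available(dmrc_text, available_locales):
    """Check the configured Language against a list of locale names."""
    language = dmrc_language(dmrc_text)
    if language is None:
        return False
    return normalise(language) in {normalise(n) for n in available_locales}
```

Here available_locales would be the lines printed by locale -a.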

So you read the documentation for IO::File and see:

open( FILENAME [,MODE [,PERMS]] )
open( FILENAME, IOLAYERS )

so you write:

my $rules = new IO::File('debian/rules','w', 0755);

and wonder why it hasn’t changed the permissions from 0666. Stracing
confirms it is opened 0666:

open("./debian/rules", O_WRONLY|O_CREAT|O_TRUNC|O_LARGEFILE, 0666) = 4

After a bit of further reading of the documentation, you discover:

If IO::File::open receives a Perl mode string (“>”, “+<”, etc.) or an
ANSI C fopen() mode string (“w”, “r+”, etc.), it uses the basic Perl
open operator (but protects any special characters).

If IO::File::open is given a numeric mode, it passes that mode and the
optional permissions value to the Perl sysopen operator. The permissions
default to 0666.

If IO::File::open is given a mode that includes the : character, it
passes all the three arguments to the three-argument open operator.

For convenience, IO::File exports the O_XXX constants from the Fcntl
module, if this module is available.

and the correct way to write this is

my $rules = new IO::File('debian/rules',O_WRONLY|O_CREAT|O_TRUNC|O_LARGEFILE, 0755);

Thank you very much perl for ignoring the permission parameter when
you feel like it.
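
The same split between a convenient string-mode open and a low-level open(2)-style call exists in other languages, which may make the behaviour less surprising. As a comparison only (this is Python, not Perl, and create_with_mode is a made-up name), permissions can only be passed when using the flag-based call:

```python
import os
import stat
import tempfile


def create_with_mode(path, mode):
    """Create or truncate `path` for writing, requesting `mode` permissions."""
    # os.open maps onto open(2), so the third argument is honoured
    # (subject to the process umask), unlike the built-in open().
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, mode)
    return os.fdopen(fd, "w")


if __name__ == "__main__":
    target = os.path.join(tempfile.mkdtemp(), "rules")
    with create_with_mode(target, 0o755) as handle:
        handle.write("#!/usr/bin/make -f\n")
    print(oct(stat.S_IMODE(os.stat(target).st_mode)))
```

The resulting permissions are still masked by the umask, exactly as with the strace'd open() call above.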

Update: Steinar,
yes sorry, I did have 0755 rather than "0755"
originally, but changed it just to check that didn’t make a difference
and copied the wrong version. I’ve changed the post to have the right
thing.

% strace -eopen perl -MIO::File \
   -e 'my $rules = new IO::File("foo","w", 0755);' 2>&1 | grep foo
open("./foo", O_WRONLY|O_CREAT|O_TRUNC|O_LARGEFILE, 0666) = 3
% strace -eopen perl -MIO::File \
   -e 'my $rules = new IO::File("foo","w", "0755");' 2>&1 | grep foo
open("./foo", O_WRONLY|O_CREAT|O_TRUNC|O_LARGEFILE, 0666) = 3

Incidentally, 0755 and "0755" are different:

 perl -e 'printf("%d %d\n", 0755, "0755");'
493 755
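
The difference is just octal versus decimal: the literal 0755 is read as base 8 (7*64 + 5*8 + 5 = 493), while the string "0755" is converted as the decimal number 755. The same arithmetic, sketched in Python for comparison (parse_mode is a hypothetical helper):

```python
def parse_mode(text):
    """Interpret a permission string such as "0755" as octal."""
    return int(text, 8)


octal_value = parse_mode("0755")   # read as base 8
decimal_value = int("0755")        # read as base 10, leading zero ignored
print(octal_value, decimal_value)
```

So if a mode ever arrives as a string, it needs an explicit octal conversion (oct() in Perl) before being passed as a permission value.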

About a year ago I had a problem with udev crashing during startup on
my powerpc box. Somehow I managed to muddle on with this problem,
probably by not rebooting the box. 🙂 Last summer I had to reboot it
again so I did a bit more research and discovered that udev was trying
to look up the nvram group, not finding it in /etc/group and then
trying ldap, which, of course, failed because we have no networking yet.

Adding the group fixed the problem, and I filed a bug
against udev saying that udev should add any groups it uses. Carrying
out further debugging revealed that the crash was in nss_wins. The
general order of events was:

  1. udev looks up a user or group.
  2. Group doesn’t exist in compat.
  3. Lookup in ldap.
  4. Ldap attempts to resolve the name of the ldap server or client.
    (The server is 127.0.0.1, so I’m confused about this point.)
  5. Network and/or dns server isn’t up, so the dns lookup fails.
  6. Attempts to look up host in wins.
  7. udevstart crashes.

I didn’t have time to debug this any further and proceeded to forget
the problem, but last night my fileserver started having the same
problem. Removing ldap from passwd, group and shadow resolved the udev
problem, but then I didn’t have any users. Late last night I booted
without ldap and then changed nsswitch.conf to add ldap, and went to
bed.

This morning I had an epiphany in the shower. Not only did I remember
what the bug was, but also a sensible workaround. The problem wasn’t
with the passwd et al lines, but the hosts line. I did have

hosts: files dns mdns wins

The solution is to return immediately if dns isn’t available, so I changed the line
to:

hosts: files dns [UNAVAIL=return] mdns wins
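
To see why this keeps nss_wins out of the picture, here is a small simulation of how the hosts line is walked. It is a simplified sketch of the nsswitch.conf rules, not glibc's implementation: it assumes the default actions (return on SUCCESS, continue otherwise), ignores the negated "!STATUS" syntax, and the function names are invented.

```python
import re

DEFAULT_ACTIONS = {
    "SUCCESS": "return",
    "NOTFOUND": "continue",
    "UNAVAIL": "continue",
    "TRYAGAIN": "continue",
}


def parse_hosts_line(line):
    """Turn 'hosts: files dns [UNAVAIL=return] ...' into (service, actions) pairs."""
    _, _, spec = line.partition(":")
    services = []
    for token in re.findall(r"\[[^\]]*\]|\S+", spec):
        if token.startswith("["):
            # A bracketed group modifies the service just before it.
            if not services:
                continue
            for item in token[1:-1].split():
                status, _, action = item.partition("=")
                services[-1][1][status.upper()] = action.lower()
        else:
            services.append((token, dict(DEFAULT_ACTIONS)))
    return services


def consulted(line, statuses):
    """Return the services tried, given each service's lookup status."""
    tried = []
    for name, actions in parse_hosts_line(line):
        tried.append(name)
        status = statuses.get(name, "NOTFOUND")
        if actions[status] == "return":
            break
    return tried
```

With dns reporting UNAVAIL during early boot, the original line falls through to mdns and wins, while the new line stops at dns.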

Now all I need to do is to debug nss_wins and get to the bottom of
the crash. It might be worth filing a bug against nss_ldap for trying to
do a lookup against an IP address.