One really nice feature of Maven is the dependency resolution stuff
that it does. The dependency plugin also has an analyze goal that can
detect a number of problems with your dependencies. It can detect
libraries you use but haven’t declared in your POM, which only end up on
your classpath through transitive dependencies. This can cause build
problems later, when you remove the library that was dragging in the
undeclared dependency. It can also work out which dependencies you have
declared but are no longer using.

mojo-jojo david% mvn dependency:analyze
[INFO] Scanning for projects...
...
[INFO] [dependency:analyze]
[WARNING] Used undeclared dependencies found:
[WARNING]    commons-collections:commons-collections:jar:3.2:compile
[WARNING]    commons-validator:commons-validator:jar:1.3.1:compile
[WARNING]    org.apache.myfaces.core:myfaces-api:jar:1.2.6:compile
[WARNING] Unused declared dependencies found:
[WARNING]    javax.faces:jsf-api:jar:1.2_02:compile
...
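The fix for a used-but-undeclared dependency is simply to declare it
explicitly. As a rough sketch (taking the version from the warning
above), the commons-collections entry would look something like this in
the <dependencies> section of the POM:

<dependency>
   <groupId>commons-collections</groupId>
   <artifactId>commons-collections</artifactId>
   <version>3.2</version>
</dependency>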

How not to configure your DNS

david% dig -x 190.208.19.230

; <<>> DiG 9.4.2-P2 <<>> -x 190.208.19.230
;; global options:  printcmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 35398
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;230.19.208.190.in-addr.arpa.   IN      PTR

;; ANSWER SECTION:
230.19.208.190.in-addr.arpa. 3600 IN    PTR     190.208.19.230.

;; Query time: 253 msec
;; SERVER: 127.0.0.1#53(127.0.0.1)
;; WHEN: Fri Apr 10 10:00:21 2009
;; MSG SIZE  rcvd: 73

Whoops
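The PTR record just maps the address back to the IP written out as a
hostname, which tells you precisely nothing. For comparison, a correct
reverse entry should point at a real hostname, something like this (the
name here is purely made up):

230.19.208.190.in-addr.arpa. 3600 IN    PTR     host230.example.com.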

One of the biggest problems with developing servlets under a
container like Tomcat is the amount of time taken to build your code,
deploy it to the container and restart it to pick up any changes. Maven
and the Jetty plugin allow you to cut down on this cycle considerably.
The first step is to make it possible to start your application from
Maven by running:

mvn jetty:run

We do this by configuring the jetty plugin inside our
pom.xml:

<plugin>
   <groupId>org.mortbay.jetty</groupId>
   <artifactId>maven-jetty-plugin</artifactId>
   <version>6.1.10</version>
</plugin>

Now when you run mvn jetty:run your application will start
up. But we can improve on this. The Jetty plugin can be configured to
scan your project every so often and rebuild it and reload it if
anything changes. We do this by changing our pom.xml to read:

<plugin>
   <groupId>org.mortbay.jetty</groupId>
   <artifactId>maven-jetty-plugin</artifactId>
   <version>6.1.10</version>
   <configuration>
      <scanIntervalSeconds>10</scanIntervalSeconds>
   </configuration>
</plugin>

Now when you save a file in your IDE, by the time you’ve switched to
your web browser, Jetty is already running your updated code. Your
development cycle is almost up to the same speed as Perl or PHP.

You can find more information at the plugin page.

After my entry yesterday about MySQL truncating data, several people
have pointed out that MySQL 4.1 or later gives you a warning. Yes, this is true. You
can even see it in the example I gave:

Query OK, 1 row affected, 1 warning (0.00 sec)

I ignored mentioning this, but perhaps should have said something
about it. The reason I didn’t mention it was that I didn’t feel a
warning really helped anyone. Developers have enough problems
remembering to check for errors, let alone remembering to check in case
there was a warning as well. Plus, they’d then have to work out whether
the warning was something serious or something they could ignore.
There’s also the question of how well the language bindings present this
information. Take PHP, for example. The mysqli extension gained support
for checking for warnings in PHP 5 and gives the following code as an
example of getting warnings:

mysqli_query($link, $query);

if (mysqli_warning_count($link)) {
   if ($result = mysqli_query($link, "SHOW WARNINGS")) {
      $row = mysqli_fetch_row($result);
      printf("%s (%d): %sn", $row[0], $row[1], $row[2]);
      mysqli_free_result($result);
   }
}

Hardly concise code. As of 5.1.0, there is also mysqli_get_warnings(),
but it is undocumented beyond noting its existence. The older mysql
extension does not support getting warning information at all. The PDO
wrapper doesn’t provide any way to get this information either.

In Perl, DBD::mysql has a mysql_warning_count()
function, but presumably you would then have to run "SHOW WARNINGS"
yourself, as in the PHP example. It seems Python’s MySQLdb module will
raise an exception on warnings in certain cases, mostly when using the
Cursor object.
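Returning to the Perl case, a rough sketch with DBD::mysql might look
like this; I’m assuming the warning count is exposed as the
mysql_warning_count statement handle attribute, so check the
documentation for your version:

use DBI;

my $dbh = DBI->connect("dbi:mysql:dbname=test;host=localhost",
   "username", "password", { RaiseError => 1 });

my $sth = $dbh->prepare("INSERT INTO foo (bar) VALUES (?)");
$sth->execute("12345");

# DBD::mysql exposes the warning count on the statement handle,
# but we still have to fetch the warning text ourselves
if ($sth->{mysql_warning_count}) {
   my $warnings = $dbh->selectall_arrayref("SHOW WARNINGS");
   foreach my $w (@$warnings) {
      printf "%s (%d): %s\n", @$w;
   }
}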

In Java, you can set the jdbcCompliantTruncation connection
parameter to make the driver throw java.sql.DataTruncation
exceptions, as per the JDBC spec, which makes you wonder why this isn’t
set by default. Unfortunately this setting is usually outside the
programmer’s control. There is also
java.sql.Statement.getWarnings(), but once again, you need to
check this after every statement. I’m not sure whether ORM tools like
Hibernate check this or not.
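For completeness, here’s a rough, untested sketch of both approaches
with Connector/J (the connection details are made up):

import java.sql.*;

public class TruncationDemo {
   public static void main(String[] args) throws SQLException {
      // Ask the driver to throw DataTruncation exceptions instead of
      // silently recording a warning
      String url = "jdbc:mysql://localhost/test?jdbcCompliantTruncation=true";
      Connection conn = DriverManager.getConnection(url, "username", "password");

      Statement stmt = conn.createStatement();
      stmt.executeUpdate("INSERT INTO foo (bar) VALUES ('12345')");

      // The manual alternative: poll for warnings after every statement
      for (SQLWarning w = stmt.getWarnings(); w != null; w = w.getNextWarning()) {
         System.out.println(w.getSQLState() + ": " + w.getMessage());
      }
   }
}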

So, yes, MySQL does give you a warning, but in practice it is useless.

MySQL in its standard configuration has this wonderful “feature” of
truncating your data if it can’t fit in the field.

mysql> create table foo (bar varchar(4));
Query OK, 0 rows affected (0.00 sec)

mysql> insert into foo (bar) values ("12345");
Query OK, 1 row affected, 1 warning (0.00 sec)

In comparison, PostgreSQL does:

psql=> create table foo (bar varchar(4));
CREATE TABLE
psql=> insert into foo (bar) values ('12345');
ERROR:  value too long for type character varying(4)

You can make MySQL do the right thing by setting the SQL Mode option to
include STRICT_TRANS_TABLES or STRICT_ALL_TABLES. The difference is that the
former only enables strict behaviour for transactional storage engines. As much
as I’m loath to say it, I don’t recommend using STRICT_ALL_TABLES, as an error
partway through updating a non-transactional table will result in a partial
update, which is probably worse than a truncated field. Setting the mode
to TRADITIONAL includes both of these plus a couple of related modes
(NO_ZERO_IN_DATE, NO_ZERO_DATE,
ERROR_FOR_DIVISION_BY_ZERO). You can set the
mode using:

  • On the command line:

    --sql-mode="TRADITIONAL"
  • In /etc/mysql/my.cnf:

    sql-mode="TRADITIONAL"
  • At runtime:

    SET GLOBAL sql_mode="TRADITIONAL"
    SET SESSION sql_mode="TRADITIONAL"
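
With one of the strict modes enabled, the earlier insert is rejected
instead of being quietly truncated; the exact error text varies between
versions, but it looks something like this:

mysql> set session sql_mode="STRICT_ALL_TABLES";
Query OK, 0 rows affected (0.00 sec)

mysql> insert into foo (bar) values ("12345");
ERROR 1406 (22001): Data too long for column 'bar' at row 1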

Just say no to databases that happily throw away your data.

I was attempting to merge a branch in one of my projects, and upon
committing the merge I kept getting this error:

mojo-jojo david% svn commit -m "merge in the maven branch"
Sending        trunk
Sending        trunk/.classpath
Sending        trunk/.project
Adding         trunk/.settings
svn: Commit failed (details follow):
svn: Server sent unexpected return value (502 Bad Gateway) in response
to COPY request for '/svn/eddie/!svn/bc/314/branches/maven/.settings'

A quick search found several other people having the same problem.
It seems it only happens for https repositories using mod_dav_svn.
The solution is to make sure that your virtual host in Apache has
explicit SSL config options, even if you are using an SSL config from a
default virtual host. For example, I added the following to my
Subversion vhost, which was just copied from my default vhost:

SSLEngine on
SSLCertificateFile /etc/apache2/ssl/catnip.org.uk.crt
SSLCertificateKeyFile /etc/apache2/ssl/catnip.org.uk.key

I keep writing code to talk to databases in perl and I’m forever
forgetting the correct runes for talking to databases, so I thought I’d
stick it here for easy reference.

use DBI;

my $db_driver = "Pg"; # Pg or mysql (or others)
my $db_name = "database";
my $db_host = "localhost";
my $db_user = "username";
my $db_pass = "password";


my $dbh = DBI->connect("dbi:$db_driver:dbname=$db_name;host=$db_host",
   $db_user, $db_pass);

It’s probably handy to give an example of a common database read
operation:

my $sth = $dbh->prepare( "SELECT * FROM table WHERE id  = ?")
      or die $dbh->errstr;

$sth->execute($id) or die $dbh->errstr;

while (my $hashref = $sth->fetchrow_hashref) {
   print $hashref->{id};
}
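
For completeness, a quick sketch of the matching write operation (the
table and column names here are just placeholders):

my $sth = $dbh->prepare("INSERT INTO table (id, name) VALUES (?, ?)")
      or die $dbh->errstr;

$sth->execute($id, $name) or die $dbh->errstr;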

I’ve just set up syntax highlighting for Puppet manifest files,
and thought I’d share the simple steps. The first thing to do is
download the syntax file from http://www.reductivelabs.com/downloads/puppet/puppet.vim
and save this to ~/.vim/syntax/puppet.vim. Now when the
filetype is set to “puppet”, vim will use this syntax file.

That’s useful, but it would be even nicer if we could make vim know
that files ending in .pp are puppet files. It turns out this is
very easy to do. You need to create a file that detects the correct
filetype when you open a file: put the following line in
~/.vim/ftdetect/puppet.vim:

au BufRead,BufNewFile *.pp   setfiletype puppet

Now when you load a file ending in .pp, you should get nice syntax
highlighting. You can also make vim use special settings for the puppet
filetype by creating a vim script file in one of
~/.vim/ftplugin/puppet.vim, ~/.vim/ftplugin/puppet_*.vim and/or
~/.vim/ftplugin/puppet/*.vim. Vim has a lot of flexible hooks
to enable file type specific configuration; hopefully it should be
fairly easy to modify these examples for other file formats.
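
For example, a minimal ~/.vim/ftplugin/puppet.vim might just set the
indentation; the particular values here are only my guess at a sensible
default:

" these settings only apply to buffers with filetype=puppet
setlocal shiftwidth=2
setlocal softtabstop=2
setlocal expandtab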

I’ve recently had to set up a new machine, but didn’t have an install
CD-ROM available, so I decided to use the easiest method for installing
Ubuntu: PXE booting. Here’s how I did it. PXE involves setting up two
simple technologies, DHCP and TFTP. We start by setting up TFTP.

TFTP is the Trivial File Transfer Protocol, a cut-down version of FTP.
There are a number of TFTP servers in Debian and Ubuntu, but not all of
them support the extensions that the pxelinux bootloader used by
debian-installer needs. Experience has shown that tftpd-hpa works
correctly, so we’ll want to install that.

ace root% apt-get install tftpd-hpa

Note: If this installs an inetd at the same time, you may need to
restart the inetd so it enables the tftpd service.

The tftpd will serve files out of /var/lib/tftpboot, so we
need to add some files for it to serve. You can use this script to fetch
various netboot installers from Ubuntu’s servers.

#!/bin/bash

set -u
set -e

cd /var/lib/tftpboot

for dist in dapper feisty gutsy hardy intrepid; do
    mkdir -p $dist
    for arch in amd64 i386; do
        mkdir -p $dist/$arch/
        (cd $dist/$arch/ && ncftpget -RT \
           ftp://archive.ubuntu.com/ubuntu/dists/$dist/main/installer-$arch/current/images/netboot/)
    done
done

Download ubuntu-tftp-update.sh

Now we need to alter our dhcpd configuration. (You are using DHCP,
aren’t you?) All we need to add is a group declaration inside your
subnet declaration, with a next-server and a filename
parameter. You can then add a host declaration for any machine you want
to netboot into the installer.

group { # intrepid amd64
     next-server 10.0.0.1;
     filename "intrepid/amd64/pxelinux.0";
     host foobar { hardware ethernet 00:22:15:45:cc:fa; fixed-address foobar.example.com; }
}

You’ll need to restart the dhcp server so it picks up the new
settings. The next-server parameter is the name or IP address of your
tftp server, and filename is the path to the bootloader. Obviously,
you can use this to pick which version of the installer you want to
run. If you do a lot of installations, it might be worth configuring
every installer you’re likely to use and then moving hosts in and out of
the appropriate group as and when you need to install them.
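The exact restart command depends on which DHCP server package you have
installed; on the Debian and Ubuntu releases of this era it was
typically something like:

ace root% invoke-rc.d dhcp3-server restart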

All that’s left to do now is to boot the computer and set it to boot
from the network and enjoy medialess installation.

Dear Lazyweb,

We have a web application that has quite a large database and
reasonable usage. Back in the dim and distant past, we scaled the
application by the age-old method of using several read-only slave
databases to prevent reads on the master swamping writes. This worked
well for several years, and then we introduced memcached into the mix to
improve performance by reducing the number of reads from the database.
This improved our database capacity even further.

Now the question has
arisen of whether we should reduce or even remove the code that reads
from the slaves. I’m trying to come up with some compelling reasons to
keep the application reading from the slaves. The pros and cons I
currently have for removing the code are:

Pros
  • Reduces code complexity
  • Removes consistency problems due to latency in the replication. This is less of a
    problem than it used to be, since we solved a problem with our
    replication.
Cons
  • Reduces our existing capacity
  • Cache flushes would cause huge spikes on our master server until the
    cache filled up again
  • Caches wouldn’t help queries with unique criteria

I would appreciate any additional reasons, pro or con.
We already have an existing non-live slave for backups and slow
queries by developers. We would retain a slave for redundancy in the
case of master failure. I’m only looking for issues that would affect
the application.