Xerces is an XML library available for several languages, and it is very widely used in Java.

I recently came across a problem with code intermittently throwing a NullPointerException inside the library:

[sourcecode lang="text"]java.lang.NullPointerException
    at org.apache.xerces.dom.ParentNode.nodeListItem(Unknown Source)
    at org.apache.xerces.dom.ParentNode.item(Unknown Source)
    at com.example.xml.Element.getChildren(Element.java:377)
    at com.example.xml.Element.newChildElementHelper(Element.java:229)
    at com.example.xml.Element.newChildElement(Element.java:180)
[/sourcecode]

You may also see the NullPointerException thrown from ParentNode.nodeListGetLength() and other methods in ParentNode.

Debugging this was not helped by the fact that xercesImpl.jar is stripped of line numbers, so I couldn’t pinpoint the failing statement. After some searching, it appeared that the root cause is that the Xerces DOM is not thread-safe. ParentNode caches its position while iterating through the NodeList of children to improve performance, and stores that cache in the Node’s Document object. In multi-threaded applications, this can lead to race conditions and NullPointerExceptions. And because it’s a threading issue, the problem is intermittent and hard to track down.

The solution is to synchronise your code on the DOM, which means on the Document object, everywhere you access the nodes. I’m not certain exactly which methods need to be protected, but I believe it is at least any method that iterates a NodeList. I would start by protecting every access, then measure performance and remove some of the locking only if needed.

[sourcecode lang="java"]/**
 * Returns the concatenation of all the text in all child nodes
 * of the current element.
 */
public String getText() {
    StringBuilder result = new StringBuilder();

    synchronized (m_element.getOwnerDocument()) {
        NodeList nl = m_element.getChildNodes();
        for (int i = 0; i < nl.getLength(); i++) {
            Node n = nl.item(i);

            if (n != null && n.getNodeType() == org.w3c.dom.Node.TEXT_NODE) {
                result.append(((CharacterData) n).getData());
            }
        }
    }

    return result.toString();
}[/sourcecode]

Notice the “synchronized (m_element.getOwnerDocument()) {}” block around the section that deals with the DOM. The NPE would normally be thrown on the nl.getLength() or nl.item() calls.
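If the same locking ends up repeated in many methods, one option is to funnel DOM reads through a small helper. The following is only a sketch under some assumptions: DomLock, childElements and parse are names made up for this example (they are not part of Xerces), and it uses the JDK’s bundled parser purely to stay self-contained.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical helper (not part of Xerces): DOM reads share one lock per Document. */
final class DomLock {
    private DomLock() {}

    /** Copies the child elements into a plain list while holding the Document lock. */
    static List<Element> childElements(Element parent) {
        List<Element> result = new ArrayList<>();
        // Same lock object as in getText(): the owning Document.
        synchronized (parent.getOwnerDocument()) {
            NodeList nl = parent.getChildNodes();
            for (int i = 0; i < nl.getLength(); i++) {
                Node n = nl.item(i);
                if (n instanceof Element) {
                    result.add((Element) n);
                }
            }
        }
        return result;
    }

    /** Parses a string with the JDK's default parser; only here for the demo. */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }
}
```

The returned list is an ordinary ArrayList, so iterating it no longer touches the NodeList. The elements in it are still live DOM nodes, though, so any further reads on them would need the same synchronized block.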

Since putting in the synchronized blocks, we’ve gone from 78 NPEs between 2:30am and 3:00am to zero in the last 12 hours, so I think it’s safe to say this has drastically reduced the problem.

Because I couldn’t find the information anywhere else: if you want to
use Maven with a Grails 1.2 snapshot, use:

mvn org.apache.maven.plugins:maven-archetype-plugin:2.0-alpha-4:generate \
    -DarchetypeGroupId=org.grails \
    -DarchetypeArtifactId=grails-maven-archetype \
    -DarchetypeVersion=1.2-SNAPSHOT \
    -DgroupId=uk.org.catnip \
    -DartifactId=armstrong \
    -DarchetypeRepository=http://snapshots.maven.codehaus.org/maven2

One really nice feature of Maven is its dependency resolution. The
dependency plugin also has an analyze goal that can detect a number of
problems with your dependencies. It can detect libraries that you use
directly but haven’t declared in your POM, and which only work because
they are pulled in as transitive dependencies. This can cause build
problems when you remove the library that was dragging in the
undeclared dependency. It can also work out which dependencies you
still declare but no longer use.

mojo-jojo david% mvn dependency:analyze
[INFO] Scanning for projects...
...
[INFO] [dependency:analyze]
[WARNING] Used undeclared dependencies found:
[WARNING]    commons-collections:commons-collections:jar:3.2:compile
[WARNING]    commons-validator:commons-validator:jar:1.3.1:compile
[WARNING]    org.apache.myfaces.core:myfaces-api:jar:1.2.6:compile
[WARNING] Unused declared dependencies found:
[WARNING]    javax.faces:jsf-api:jar:1.2_02:compile
...

One of the biggest problems with developing servlets under a
container like Tomcat is the amount of time taken to build your code,
deploy it to the container and restart it to pick up any changes. Maven
and the Jetty plugin allow you to cut down on this cycle considerably.
The first step is to make it possible to start your application from
Maven by running:

mvn jetty:run

We do this by configuring the jetty plugin inside our
pom.xml:

<plugin>
   <groupId>org.mortbay.jetty</groupId>
   <artifactId>maven-jetty-plugin</artifactId>
   <version>6.1.10</version>
</plugin>

Now when you run mvn jetty:run your application will start
up. But we can improve on this. The Jetty plugin can be configured to
scan your project every so often and rebuild it and reload it if
anything changes. We do this by changing our pom.xml to read:

<plugin>
   <groupId>org.mortbay.jetty</groupId>
   <artifactId>maven-jetty-plugin</artifactId>
   <version>6.1.10</version>
   <configuration>
      <scanIntervalSeconds>10</scanIntervalSeconds>
   </configuration>
</plugin>

Now when you save a file in your IDE, by the time you’ve switched to
your web browser, Jetty is already running your updated code. Your
development cycle is almost up to the same speed as Perl or PHP.

You can find more information at the plugin page.

I keep writing code to talk to databases in Perl and I’m forever
forgetting the correct runes for connecting, so I thought I’d
stick them here for easy reference.

use DBI;

my $db_driver = "Pg"; # Pg or mysql (or others)
my $db_name = "database";
my $db_host = "localhost";
my $db_user = "username";
my $db_pass = "password";


my $dbh = DBI->connect("dbi:$db_driver:dbname=$db_name;host=$db_host",
   $db_user, $db_pass) or die $DBI::errstr;

It’s probably handy to give an example of a common database read
operation too:

my $sth = $dbh->prepare( "SELECT * FROM table WHERE id  = ?")
      or die $dbh->errstr;

$sth->execute($id) or die $dbh->errstr;

while (my $hashref = $sth->fetchrow_hashref) {
   print $hashref->{id};
}

Perl doesn’t enforce access to modules’ namespaces. This would
usually be considered a bad thing, but sometimes it allows us to work
around problems in modules without changing their code. Here’s a perfect
example:

I’ve been writing a script to talk to an XML-RPC endpoint using
Frontier::Client, but for one of the requests the script throws the
following error:

wanted a data type, got `ex:i8'

Turning on debugging showed the response type was indeed ex:i8, which
isn’t one of the types that Frontier::Client supports.

<?xml version="1.0" encoding="UTF-8"?>
<methodResponse xmlns:ex="http://ws.apache.org/xmlrpc/namespaces/extensions">
  <params>
    <param>
      <value>
        <ex:i8>161</ex:i8>
      </value>
    </param>
  </params>
</methodResponse>

Searching through the code shows Frontier::Client is a wrapper around
Frontier::RPC2 and the error message happens at the following
section:

   } elsif ($scalars{$tag}) {
       $expat->{'rpc_text'} = "";
       push @{ $expat->{'rpc_state'} }, 'cdata';
   } else {
       Frontier::RPC2::die($expat, "wanted a data type, got `$tag'\n");
   }

So we can see that it’s looking up the tag in a hash called
%scalars to see if the type is a scalar type, and otherwise throws
the error we saw. Looking at the top of the module, we can see this hash:

%scalars = (
    'base64' => 1,
    'boolean' => 1,
    'dateTime.iso8601' => 1,
    'double' => 1,
    'int' => 1,
    'i4' => 1,
    'string' => 1,
);

So, if we could add ex:i8 to this hash, we could fix the
problem. We could fix the module, but that would require every user of
the script to patch their copy of the module. The alternative is to
inject something into that hash across module boundaries, which we can
do by just referring to the hash by its fully qualified name, including
the package name. We can use:

$Frontier::RPC2::scalars{'ex:i8'} = 1;

Now when we run the script, everything works. It’s not nice and it’s
dependent on the internals of Frontier::RPC2 not changing, but it allows
us to get on with our script.

Archive::Tar is a useful library for working with tar archives from Perl.
Unfortunately, one thing it doesn’t allow is using data from memory as
the archive. From the TODO section:

Allow archives to be passed in as string

Currently, we only allow opened filehandles or filenames, but not
strings. The internals would need some reworking to facilitate
stringified archives.

Fortunately, it does allow you to use a filehandle. I’ve previously
mentioned how useful the IO::Handle subsystem in Perl is, and we
should be able to use it in this case. The module we’ll want is
IO::String, which is an IO::Handle over a Perl scalar. We can use it
like this:

my $tar = new Archive::Tar(new IO::String($data));

Unfortunately when we run this now we get:

Cannot read compressed format in tar-mode at Foo.pm line 41
No data could be read from file at Foo.pm line 41

It turns out that this is because Archive::Tar uses IO::Zlib
internally if the file is compressed, but this doesn’t provide the
ability to uncompress from a filehandle. The answer is to uncompress the
data before passing it to Archive::Tar, and the easiest way to do this is
to use the IO::Uncompress::Gunzip module, so we can rewrite our code
to:

my $tar = new Archive::Tar(new IO::Uncompress::Gunzip(new IO::String($data)));

Now when you run the script, Archive::Tar sees an uncompressed tar
stream. Yet another situation where IO::Handle comes to the rescue.

Because I always forget the boilerplate when I need to create a new
class in Perl:

package Foo::Bar;

use strict;
use warnings;

sub new {
   my $this = shift;
   my $class = ref($this) || $this;
   my $self = {};
   bless $self, $class;
   $self->initialize(@_);
   return $self;
}

sub initialize {
   my $self = shift;
}

1;

If you have any useful additions I’d love to know.

In my article on Perl’s IO::Handle objects I talked briefly about IO::AtomicFile
and IO::Digest. I’ve just had reason to use these very useful modules to
create a script which edits a file in place. These modules allowed me to
do the rewrite atomically and optionally make a backup if the contents
have changed. The example assumes you have a function called
perform_rewrite that takes two file handles as the first two
parameters.

use File::Copy;
use IO::File;
use IO::AtomicFile;
use IO::Digest;

sub rewrite_file {
   my $file = shift;
   my $sub = shift;
   my $input = new IO::File($file,'r');
   my $input_md5 = new IO::Digest($input, 'MD5');
   my $output = new IO::AtomicFile($file,'w');
   my $output_md5 = new IO::Digest($output, 'MD5');

   $sub->($input, $output, @_);

   if ($input_md5->hexdigest ne $output_md5->hexdigest) {
           copy ("$file", "$file.bak");
           $output->close();
   } else {
           # we haven't changed so don't bother updating
           $output->delete();
   }
   $input->close();
}

rewrite_file("/foo/bar", \&perform_rewrite, $baz, $quux);

Do you ever feel you should implement equals(),
hashCode() and toString(), but just can’t be bothered to
do it for every class? Well, if you aren’t bothered by speed, you can
use Jakarta Commons Lang to do it for you. Just add this to your class:

import org.apache.commons.lang.builder.ToStringBuilder;
import org.apache.commons.lang.builder.EqualsBuilder;
import org.apache.commons.lang.builder.HashCodeBuilder;

class Foo {
   public int hashCode() {
      return HashCodeBuilder.reflectionHashCode(this);
   }
   public boolean equals(Object other) {
      return EqualsBuilder.reflectionEquals(this,other);
   }
   public String toString() {
      return ToStringBuilder.reflectionToString(this);
   }
}

And that’s it. Your class will just do the right thing. As you can
probably guess from the method names, it uses reflection, so it may be
suboptimal. If you need performance, you can tell it to use
particular members, but I think I’ll leave that up to a future article.
I also recommend you don’t use this technique if you are using something
like Hibernate, which does
things behind the scenes on member access; you may find it does
undesirable things. 🙂
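To give a rough idea of what “particular members” looks like without reflection, here is a minimal sketch. It deliberately does not use Commons Lang: it swaps in java.util.Objects from the JDK (so it assumes Java 7 or later) to keep the example self-contained. The LabelledPoint class and its fields are made up for illustration.

```java
import java.util.Objects;

/** Hypothetical value class; the fields are invented for this example. */
final class LabelledPoint {
    private final int x;
    private final int y;
    private final String label;

    LabelledPoint(int x, int y, String label) {
        this.x = x;
        this.y = y;
        this.label = label;
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) {
            return true;
        }
        if (!(other instanceof LabelledPoint)) {
            return false;
        }
        LabelledPoint that = (LabelledPoint) other;
        // Compare only the members we list explicitly; no reflection involved.
        return x == that.x && y == that.y && Objects.equals(label, that.label);
    }

    @Override
    public int hashCode() {
        // Must use the same members as equals() to keep the contract.
        return Objects.hash(x, y, label);
    }

    @Override
    public String toString() {
        return "LabelledPoint[x=" + x + ",y=" + y + ",label=" + label + "]";
    }
}
```

The trade-off is the obvious one: every new field has to be added to all three methods by hand, which is exactly the chore the reflection versions save you from.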