Java has a nice IO subsystem. In particular, it has been designed
such that input streams can optionally support a feature where a
programmer can mark a position in the stream and at a later
stage return to that point to read the data again. Programmers can check
for this support by calling InputStream.markSupported().
Unfortunately I’ve had the need for this support, but I haven’t managed
to find a stream which supports it. Not even
ByteArrayInputStream seems to support it. Fortunately it’s
fairly trivial to wrap an InputStream in another class which
adds this support. Here is my quick adaptor, which seems to work
for most cases.

import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;

public class MarkableInputStream extends InputStream {
    private InputStream inputstream;
    private int maxpos = 0;
    private int curpos = 0;
    private int mark = 0;
    private ArrayList<Integer> buffer = new ArrayList<Integer>();
    public MarkableInputStream(InputStream is) {
        inputstream = is;
    }

    @Override
    public int read() throws IOException {
        int data;
        if (curpos == maxpos) {
            data = inputstream.read();
            if (data == -1) {
                return -1; // don't buffer the end-of-stream marker
            }
            buffer.add(data);
            maxpos++;
            curpos++;
        } else {
            data = buffer.get(curpos++);
        }
        return data;
    }

    @Override
    public synchronized void mark(int readlimit) {
        mark = curpos;
    }

    @Override
    public boolean markSupported() {
        return true;
    }

    @Override
    public synchronized void reset() throws IOException {
        curpos = mark;
    }
}

You can use it like this:

if (!istream.markSupported()) {
   istream = new MarkableInputStream(istream);
}
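As a quick sanity check of the wrapper, here is a self-contained demo of the mark/reset round trip. It inlines a compact copy of the class above (including the end-of-stream fix) so it compiles on its own:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;

public class MarkDemo {
    // Compact copy of the MarkableInputStream wrapper from above,
    // keeping the end-of-stream value out of the replay buffer.
    static class MarkableInputStream extends InputStream {
        private final InputStream in;
        private int maxpos = 0, curpos = 0, mark = 0;
        private final ArrayList<Integer> buffer = new ArrayList<Integer>();
        MarkableInputStream(InputStream is) { in = is; }
        @Override public int read() throws IOException {
            if (curpos == maxpos) {
                int data = in.read();
                if (data == -1) return -1;   // don't buffer end-of-stream
                buffer.add(data); maxpos++; curpos++;
                return data;
            }
            return buffer.get(curpos++);
        }
        @Override public synchronized void mark(int readlimit) { mark = curpos; }
        @Override public boolean markSupported() { return true; }
        @Override public synchronized void reset() { curpos = mark; }
    }

    public static void main(String[] args) throws IOException {
        InputStream is = new MarkableInputStream(
                new ByteArrayInputStream(new byte[] { 'h', 'e', 'l', 'l', 'o' }));
        is.read();               // consume 'h'
        is.mark(100);            // remember this position
        int first = is.read();   // 'e'
        is.reset();              // rewind to the mark
        int second = is.read();  // 'e' again, served from the buffer
        System.out.println(first == second);  // prints true
    }
}
```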

This could probably be improved upon, most notably by not using an
ArrayList; I’m not sure what performance penalty that adds. It
should be possible to use a plain array, since the readlimit
parameter to mark() says how many bytes the stream should
record before throwing old data away in favour of new input. The class
above records all data from the start of the stream, so it could result
in a significant amount of memory usage. Hope you find it useful.

For the last few weeks, we’ve had a couple of pigeons nesting in one
corner of the small triangular patch of grass I delude myself into
calling a garden. Yesterday the egg they’d been incubating hatched and
we now have a small yellow ball of fluff my girlfriend has named
Armstrong.

[Photos: Armstrong, and Mother/Father]

Imagine you’ve got some text you’ve been told is ASCII and you’ve
told java that it’s ASCII using:

Reader reader = new InputStreamReader(inputstream, "ASCII");

Imagine your surprise when it happily reads in non-ASCII bytes, say
from UTF-8 or ISO-8859-1 text, and silently converts them to the
replacement character.

import java.io.*;

public class Example1 {

   public static void main(String[] args) {
      try{
         FileInputStream is = new FileInputStream(args[0]);
         BufferedReader reader
            = new BufferedReader(new InputStreamReader(is, args[1]));
         String line;
         while ((line = reader.readLine()) != null) {
            System.out.println(line);
         }
      } catch (Exception e) {
         System.out.println(e);
      }
   }
}
beebo david% java Example1 utf8file.txt ascii
I��t��rn��ti��n��liz��ti��n
beebo david% java Example1 utf8file.txt utf8
Iñtërnâtiônàlizætiøn

So, I hear
you ask, how do you get Java to be strict about the conversion? Well,
the answer is to look up a Charset object, ask it for a CharsetDecoder
object and then set the onMalformedInput option to
CodingErrorAction.REPORT. The resulting code is:

import java.io.*;
import java.nio.charset.*;

public class Example2 {

   public static void main(String[] args) {
      try{
         FileInputStream is = new FileInputStream(args[0]);
         Charset charset = Charset.forName(args[1]);
         CharsetDecoder csd = charset.newDecoder();
         csd.onMalformedInput(CodingErrorAction.REPORT);
         BufferedReader reader
            = new BufferedReader(new InputStreamReader(is, csd));
         String line;
         while ((line = reader.readLine()) != null) {
            System.out.println(line);
         }
      } catch (Exception e) {
         System.out.println(e);
      }
   }
}

This time when we run it, we get:

beebo david% java Example2 utf8file.txt ascii
java.nio.charset.MalformedInputException: Input length = 1
beebo david% java Example2 utf8file.txt utf8
Iñtërnâtiônàlizætiøn
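The same difference can be seen without files by decoding a raw byte buffer directly. A minimal sketch (0xC3 0xA9 is “é” in UTF-8; neither byte is valid US-ASCII):

```java
import java.nio.ByteBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.MalformedInputException;

public class DecodeDemo {
    public static void main(String[] args) throws Exception {
        // Two bytes that form "é" in UTF-8 but are invalid US-ASCII.
        byte[] utf8 = { (byte) 0xC3, (byte) 0xA9 };

        // A lenient decoder substitutes U+FFFD for malformed input,
        // which is the behaviour InputStreamReader gives you by default.
        CharsetDecoder lenient = Charset.forName("ASCII").newDecoder()
                .onMalformedInput(CodingErrorAction.REPLACE)
                .onUnmappableCharacter(CodingErrorAction.REPLACE);
        System.out.println(lenient.decode(ByteBuffer.wrap(utf8)));

        // A strict decoder reports the bad input instead.
        CharsetDecoder strict = Charset.forName("ASCII").newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT);
        try {
            strict.decode(ByteBuffer.wrap(utf8));
        } catch (MalformedInputException e) {
            System.out.println("rejected: " + e);
        }
    }
}
```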

On a slightly related note, if anyone knows how to get Java to decode
UTF32, VISCII, TCVN-5712, KOI8-U or KOI8-T, I would love to know.

Update: (2007-01-26) Java 6 has support for UTF32
and KOI8-U.

Class::DBI is
a very nice database abstraction layer for Perl. It allows you to define
your tables and columns and it magically provides you with classes with
accessors/mutators for those columns. With something like
Class::DBI::Pg, you don’t even need to tell it your columns; it
asks the database on startup. It’s all very cool mojo and massively
decreases the development time on anything database related in Perl.

Unfortunately, as far as I can tell, it has a massive performance
problem in its design. One of the features of Class::DBI is lazy
population of data. It won’t fetch data from the database until you try
to use one of the accessors. This isn’t normally a problem, except with
retrieve_all(). Basically this function returns a list of
objects for every row in your table. Unfortunately, due to the lazy
loading of data, retrieve_all() calls SELECT id FROM table; and
then, every time you use an object, it calls SELECT * FROM table
WHERE id = n;. For a small table this isn’t too bad, but for a
large table it’s a killer.

I did a little benchmark today to see just how much slower it is over
plain DBI. I wrote two functions which iterate over a table, assigning
one value to a function (forcing Class::DBI to fetch the data). The
table in question contains 635 rows. The code I used was:

use strict;
use warnings;

use Benchmark qw(:all) ;

use Foo;

use DBI;

sub class_dbi {
   for my $foo (Foo->retrieve_all()) {
      my $bar = $foo->bar;
   }
}

sub dbi {
   my $dbh = DBI->connect("dbi:Pg:dbname=$db;host=$host",$user, $passwd);
   my $sth = $dbh->prepare("SELECT * FROM foos;");
   $sth->execute();
   while(my $row = $sth->fetchrow_hashref()) {
      my $bar = $row->{bar};
   }
}
cmpthese(100, {
      'Class::DBI' => 'class_dbi();',
      'DBI' => 'dbi();',
   });

The results:

brick david% perl benchmark.pl
           s/iter Class::DBI        DBI
Class::DBI   10.3         --       -97%
DBI         0.351      2845%         --

Class::DBI is more than 28 times slower than using DBI directly. I’m
hoping that someone will now tell me “Oh, you just do blah”; otherwise
I’m going to have to rewrite some of my code. One thing to learn from
this is that a reduction in development time can often cost you in
other areas, and here it’s runtime performance.

Update: It appears that the bug is that
Class::DBI::Pg doesn’t set the Essential list of columns, so
Class::DBI fetches only the primary column. You can fix this by adding
the following to your database modules:

__PACKAGE__->columns(Essential => __PACKAGE__->columns);

Remember you’ll need to do that for each of your modules; you won’t
be able to do it in your superclass, as you won’t have discovered your
columns yet. This has increased performance, but not massively. New
timings (with the addition of using Class::DBI through an iterator):

              s/iter Class::DBI it    Class::DBI           DBI
Class::DBI it   6.35            --           -2%          -94%
Class::DBI      6.23            2%            --          -94%
DBI            0.350         1714%         1680%            --

Update 2: It appears that further speed gains can be
made by not using Class::DBI::Plugin::DateTime::Pg to convert the three
timestamp columns in my table into DateTime objects:

              s/iter Class::DBI it    Class::DBI           DBI
Class::DBI it   1.26            --          -11%          -72%
Class::DBI      1.12           12%            --          -69%
DBI            0.350          260%          220%            --

Just a quick one. Have you ever created a table using the number of
seconds since 1970 and realised, after populating it with data, that
you really need it in a TIMESTAMP type? If so, you can quickly convert
it using this SQL:

ALTER TABLE entries
   ALTER COLUMN created TYPE TIMESTAMP WITH TIME ZONE
      USING TIMESTAMP WITH TIME ZONE 'epoch' + created * interval '1 second';

With thanks to the PostgreSQL manual for saving me hours working out
how to do this.

Does your Oracle client hang when connecting? Are you using Oracle
10.2.0.1? Do you get the
following if you strace the process?

gettimeofday({1129717666, 622797}, NULL) = 0
access("/etc/sqlnet.ora", F_OK) = -1 ENOENT (No such file or directory)
access("./network/admin/sqlnet.ora", F_OK) = -1 ENOENT (No such file or directory)
access("/etc/sqlnet.ora", F_OK) = -1 ENOENT (No such file or directory)
access("./network/admin/sqlnet.ora", F_OK) = -1 ENOENT (No such file or directory)
fcntl64(155815832, F_SETFD, FD_CLOEXEC) = -1 EBADF (Bad file descriptor)
times(NULL) = -1808543702
times(NULL) = -1808543702
times(NULL) = -1808543702
times(NULL) = -1808543702
times(NULL) = -1808543702
times(NULL) = -1808543702
times(NULL) = -1808543702
.
.
.

Has your client been up for more than 180 days? Well done; you’ve
just come across the same bug that has bitten two of our customers in
the last week. Back in the days of Oracle 8, there was a fairly infamous
bug in the Oracle client where new connections would fail if the client
had been up for 248 days or more. This got fixed, and wasn’t a problem
with Oracle 9i at all. Now Oracle have managed to introduce a similar
bug in 10.2.0.1, although in my experience the number of days appears
to be shorter (180+).

Thankfully, this has been fixed in the 10.2.0.2
Instant Client. More information can be found on forums.oracle.com
and www.redhat.com.

A few hours ago I got stressed about the lack of leg room under my
desk and ended up spending the next few hours tidying and moving all of
my computers to under the next desk. I also made the mistake of starting
to remove keys from my keyboard to clean off something sticky and found
myself surrounded by keys and a keyless keyboard. It’s now nice and
shiny, which is more than can be said for the rest of the flat, which is
now overrun with all the crap that was around my desk.

Another thing that could do with a tidy-up is Eddie, my
liberal feed-parsing library for Java. After the initial coding sprint,
I’ve had time to sit back, look at the design of the library and clean
up anything that sticks out. As mentioned in a previous entry, one of
the things that has bothered me is that whenever you need to call an
object method, you need to be certain that the object is not null. This
means you end up with code like:

if (string != null && string.equals("string")) {

This quickly becomes tiresome, and the test for null distracts from
the meaning of the code. Fortunately I was reminded of an improvement
for string objects. Ideally, we should all be writing comparison
conditionals as rvalue == lvalue (an rvalue, roughly, is an
expression you can’t assign to). The most common rvalue is a literal
value such as a string constant. The advantage of getting into the
habit of writing code like this is that you’ll discover at compile
time when you accidentally write = rather than ==:
because you can’t assign to an rvalue, the compiler will complain.
What makes this interesting from a Java string point of view is that
you can call methods on string literals. The advantage of comparing a
string literal to a variable, rather than calling .equals()
on the variable, is that the string literal is never going to be null,
so you can remove the test for null and simplify the code:

if("string".equals(string)) {
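A quick, self-contained demonstration of the difference between the two orderings:

```java
public class LiteralFirstDemo {
    public static void main(String[] args) {
        String s = null;
        // Literal-first comparison is null-safe: it simply returns false.
        System.out.println("string".equals(s));   // prints false
        // Variable-first comparison throws when the variable is null.
        try {
            System.out.println(s.equals("string"));
        } catch (NullPointerException e) {
            System.out.println("variable-first threw NullPointerException");
        }
    }
}
```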

I know it’s not everyone’s cup of tea, but I prefer it to testing for
null every time I look at a string. The other thing is that I’ve been
reading Hardcore Java by Robert Simmons at work. Considering I’m only
a few pages in so far, I’ve received a surprisingly large number of
ideas to improve my code.

The one that sticks in my head is using assert for
post- and pre-conditions in your functions. Using asserts has a number
of advantages over throwing exceptions, including the fact that they
are disabled at runtime unless you ask for them, so they cost nothing
in a production release. In Eddie, while parsing a
<feed> element I determine the version of Atom being
parsed. This had a number of nested if/else if/else blocks. At
the end of the function, I wanted to make sure I had set the version
string to something, so had the following code:

if (!this.feed.has("format")) {
   throw new SAXParseException("Failed to detect Atom format", this.locator);
}

However, using assertions I can write this as

assert(this.feed.has("format")) : "Failed to detect Atom format";
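One caveat worth noting: Java assertions are off at runtime by default and must be switched on with java -ea. A minimal sketch of the post-condition pattern, using a hypothetical detectFormat helper rather than Eddie’s real code:

```java
public class AssertDemo {
    // Hypothetical stand-in for Eddie's Atom version detection;
    // not the library's actual API.
    static String detectFormat(String feedElement) {
        if (feedElement.contains("0.3")) return "atom_0.3";
        if (feedElement.contains("1.0")) return "atom_1.0";
        return null;
    }

    public static void main(String[] args) {
        String format = detectFormat("<feed version=\"0.3\">");
        // Post-condition: only checked when assertions are enabled
        // (run with: java -ea AssertDemo).
        assert format != null : "Failed to detect Atom format";
        System.out.println(format);
    }
}
```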

I highly recommend Hardcore Java if you want to improve your Java
programming. It includes sections on the new features of Java 1.5 and
on using collections. I’ve made a couple of other clean-ups, including
going through member variable access specifiers to make sure they are
right, and taking several public methods and variables and making them
private. I also have a couple of ideas about refactoring some
of the code to clean it up. Redesigning and refactoring code is almost
more fun than writing it in the first place. You get to be in
competition with yourself, challenging yourself to write better code,
and you end up with cleaner code in the process.

A couple of things I want to do in the near future are to use a
profiler and code coverage tools. If anyone has recommendations for
either of these tools that integrate nicely with Eclipse, I’d love to
know.

Just when you thought Perl couldn’t get more unreadable, someone[0] comes up with something like this:

print join ", ", map ord, split //, $foo;

This mess of Perl might be easier to understand if I put the brackets
in:

print join (", ", map( ord, split( //, $foo)));

What this does is split $foo into a list of characters. It then uses
map to run ord() on each item in the list to return a new list
containing the numeric character values. We then join these again with
“, ” to make the output easier to read.

david% perl -e 'print join ", ", map ord, split //, "word";'
119, 111, 114, 100

The map function is familiar to functional programmers and is very
powerful, but beware it can reduce the clarity of your code.
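For comparison, the same split/map/join pipeline can be written in Java (a sketch using Java 8 streams; ords is my own helper name):

```java
import java.util.stream.Collectors;

public class OrdDemo {
    public static String ords(String s) {
        // chars() yields each character as its numeric value (like ord);
        // we then format each one and join with ", " (like join).
        return s.chars()
                .mapToObj(c -> Integer.toString(c))
                .collect(Collectors.joining(", "));
    }

    public static void main(String[] args) {
        System.out.println(ords("word"));  // prints 119, 111, 114, 100
    }
}
```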

[0] Me