Tue, 27 Jun 2006

How Not To Implement Spam Filtering (and web forms)

A "friend" of mine has recently been sending me forwarded jokes and other assorted crap we all grew out of sending about 5 minutes after we learnt how to send emails. This I can cope with, but recently, for every email she sends, I've been receiving an automated email from a server somewhere telling me that it's blocked an attachment, once for each attachment in the original email. That's crime number one. Looking at it further, it appears that it's not the sender's mail servers doing it, but one of the recipients' mail servers. When it gets an email with an attachment it has blocked, it emails everyone in the To: header to tell them, irrespective of whether they are local users or not. That's crime number two.

I thought I'd email postmaster@capita.co.uk to tell them of this problem, but unsurprisingly it bounced. That's crime number three. Looking on their website, I couldn't find any technical contacts, which wasn't really surprising. I did, however, find a "general enquiries" form, so I filled that in. Unfortunately, they used the following HTML for the message box:

<textarea name="Feedback1:fldEnquiry" rows="6" cols="1"
   id="Feedback1_fldEnquiry" class="enquiryTable"></textarea>

The result is that you get a text box 6 rows high and one column across, which is basically unusable. Interestingly they appear to add style="width: 350px;" in IE, which makes it work. I'll make that crimes 4 and 5, cos doing different things for different browsers is a crime in itself.

I await a phone call or email from them.

[rfc2821,wtf]

Intelligently Designed

Someone on a mailing list pointed out this cartoon and I thought I'd share it with people:

cartoon

Fri, 23 Jun 2006

Where Do You Want To Go?

On the way home, someone asked me where I wanted to go when I died. My initial reaction was Tahiti. After thinking about it, I'd quite like to see New Zealand.


Sun, 18 Jun 2006

Markable InputStreams

Java has a nice IO subsystem. In particular, it has been designed such that input streams can optionally support a feature where a programmer can mark a position in the stream and at a later stage return to that point to read the data again. Programmers can check for this support by calling InputStream.markSupported(). Unfortunately I've had the need for this support, but I haven't managed to find a stream which supports this. Not even ByteArrayInputStream seems to support it. Fortunately it's fairly trivial to wrap an InputStream in another class which will add this support. Here is my quick adaptor, which seems to work for most cases.

import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;

public class MarkableInputStream extends InputStream {
    private InputStream inputstream;
    private int maxpos = 0;
    private int curpos = 0;
    private int mark = 0;
    private ArrayList<Integer> buffer = new ArrayList<Integer>();
    public MarkableInputStream(InputStream is) {
        inputstream = is;
    }

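    @Override
    public int read() throws IOException {
        int data;
        if (curpos == maxpos) {
            // Nothing left to replay: read from the wrapped stream and remember it.
            data = inputstream.read();
            buffer.add(data);
            maxpos++;
            curpos++;
        } else {
            // Replay data that was buffered before reset() was called.
            data = buffer.get(curpos++);
        }
        return data;
    }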

    @Override
    public synchronized void mark(int readlimit) {
        mark = curpos;
    }

    @Override
    public boolean markSupported() {
        return true;
    }

    @Override
    public synchronized void reset() throws IOException {
        curpos = mark;
    }
}

You can use it like:

if (!istream.markSupported()) {
   istream = new MarkableInputStream(istream);
}

This could probably be improved on, most notably by not using an ArrayList. I'm not sure what performance penalty that adds. It should be possible to use a normal array, as the readlimit parameter to mark() says how many bytes the stream should record before throwing old data away in favour of new input. The class above will record all data from the start of the stream, so it could use a significant amount of memory. Hope you find it useful.
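A minimal sketch of the array-based idea, under stated assumptions: the class name BoundedMarkableInputStream and its fields are made up for illustration, and it assumes that a mark may simply be dropped once more than readlimit bytes have been read past it, which the InputStream contract allows. It keeps only the bytes read since the last mark() rather than everything from the start of the stream.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical variant: remembers only the bytes read since the last mark().
class BoundedMarkableInputStream extends InputStream {
    private final InputStream in;
    private byte[] buffer = new byte[0]; // bytes read since mark(), capacity = readlimit
    private int count = 0;               // number of valid bytes in buffer
    private int pos = 0;                 // next buffered byte to replay
    private boolean marked = false;

    BoundedMarkableInputStream(InputStream in) {
        this.in = in;
    }

    @Override
    public synchronized int read() throws IOException {
        if (pos < count) {
            return buffer[pos++] & 0xff; // replay after reset()
        }
        int data = in.read();
        if (data != -1 && marked) {
            if (count < buffer.length) {
                buffer[count++] = (byte) data; // still within the read limit
                pos = count;
            } else {
                // Read limit exceeded: forget the mark and the buffered data.
                marked = false;
                count = 0;
                pos = 0;
            }
        }
        return data;
    }

    @Override
    public synchronized void mark(int readlimit) {
        // Keep any bytes that are still waiting to be replayed; they follow the new mark.
        int remaining = count - pos;
        byte[] next = new byte[Math.max(readlimit, remaining)];
        System.arraycopy(buffer, pos, next, 0, remaining);
        buffer = next;
        count = remaining;
        pos = 0;
        marked = true;
    }

    @Override
    public synchronized void reset() throws IOException {
        if (!marked) {
            throw new IOException("mark not set or read limit exceeded");
        }
        pos = 0;
    }

    @Override
    public boolean markSupported() {
        return true;
    }

    public static void main(String[] args) throws IOException {
        InputStream s = new BoundedMarkableInputStream(
                new ByteArrayInputStream(new byte[] {1, 2, 3, 4}));
        s.read();      // consume the first byte
        s.mark(2);     // remember at most two bytes from here
        int a = s.read();
        int b = s.read();
        s.reset();     // go back to the mark
        System.out.println(a + " " + b + " " + s.read() + " " + s.read());
    }
}
```

Memory use is bounded by readlimit here, at the price of reset() failing with an IOException once the caller has read past the limit.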


New Delivery

For the last few weeks, we've had a couple of pigeons nesting in one corner of the small triangular piece of grass I delude myself into calling a garden. Yesterday the egg they've been incubating hatched and we now have a small yellow ball of fluff my girlfriend has named Armstrong.

Armstrong and Mother/Father

Sat, 17 Jun 2006

Invalid Characters in Encoding with JAVA

Imagine you've got some text you've been told is ASCII and you've told Java that it's ASCII using:

Reader reader = new InputStreamReader(inputstream, "ASCII");

Imagine your surprise when it happily reads in non-ASCII values, say UTF-8 or ISO8859-1, and silently converts each offending byte to a replacement character (shown as � in the output below).

import java.io.*;

public class Example1 {

   public static void main(String[] args) {
      try{
         FileInputStream is = new FileInputStream(args[0]);
         BufferedReader reader 
            = new BufferedReader(new InputStreamReader(is, args[1]));
         String line;
         while ((line = reader.readLine()) != null) {
            System.out.println(line);
         }
      } catch (Exception e) {
         System.out.println(e);
      }
   }
}

beebo david% java Example1 utf8file.txt ascii
I��t��rn��ti��n��liz��ti��n
beebo david% java Example1 utf8file.txt utf8
Iñtërnâtiônàlizætiøn

So, I hear you ask, how do you get Java to be strict about the conversion? Well, the answer is to look up a Charset object, ask it for a CharsetDecoder object and then set the onMalformedInput option to CodingErrorAction.REPORT. The resulting code is:

import java.io.*;
import java.nio.charset.*;

public class Example2 {

   public static void main(String[] args) {
      try{
         FileInputStream is = new FileInputStream(args[0]);
         Charset charset = Charset.forName(args[1]);
         CharsetDecoder csd = charset.newDecoder();
         csd.onMalformedInput(CodingErrorAction.REPORT);
         BufferedReader reader 
            = new BufferedReader(new InputStreamReader(is, csd));
         String line;
         while ((line = reader.readLine()) != null) {
            System.out.println(line);
         }
      } catch (Exception e) {
         System.out.println(e);
      }
   }
}

This time when we run it, we get:

beebo david% java Example2 utf8file.txt ascii
java.nio.charset.MalformedInputException: Input length = 1
beebo david% java Example2 utf8file.txt utf8
Iñtërnâtiônàlizætiøn
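The same strictness can be tried without a file by decoding a byte array directly. This is a minimal sketch, not part of the examples above: the class StrictDecode and the method decodeStrict are made-up names, it also sets onUnmappableCharacter (an extra assumption about what "strict" should mean), and it uses StandardCharsets, which needs Java 7 or later.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class StrictDecode {
    // Hypothetical helper: decode the bytes, failing instead of substituting characters.
    static String decodeStrict(byte[] bytes, Charset charset) throws CharacterCodingException {
        return charset.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(bytes))
                .toString();
    }

    public static void main(String[] args) {
        // "Iñt" written with a Unicode escape so the source file stays plain ASCII.
        byte[] utf8 = "I\u00f1t".getBytes(StandardCharsets.UTF_8);
        Charset[] candidates = {StandardCharsets.US_ASCII, StandardCharsets.UTF_8};
        for (Charset cs : candidates) {
            try {
                System.out.println(cs + ": " + decodeStrict(utf8, cs));
            } catch (CharacterCodingException e) {
                System.out.println(cs + ": " + e);
            }
        }
    }
}
```

Decoding the UTF-8 bytes as US-ASCII ends in the catch branch, while decoding them as UTF-8 returns the original string.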

On a slightly related note, if anyone knows how to get Java to decode UTF32, VISCII, TCVN-5712, KOI8-U or KOI8-T, I would love to know.

Update: (2007-01-26) Java 6 has support for UTF32 and KOI8-U.
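Which of these charsets a particular JVM can decode is easy to check at run time with Charset.isSupported. A small sketch, assuming nothing beyond the standard library; the class and method names are made up, and the result depends entirely on the Java version and installed charset providers, so no output is shown.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;

class CharsetCheck {
    // Hypothetical helper: true if this JVM has a charset registered under the name.
    static boolean isAvailable(String name) {
        try {
            return Charset.isSupported(name);
        } catch (IllegalCharsetNameException e) {
            return false; // the name contains characters that are not allowed
        }
    }

    public static void main(String[] args) {
        String[] names = {"UTF-32", "VISCII", "TCVN-5712", "KOI8-U", "KOI8-T"};
        for (String name : names) {
            System.out.println(name + ": " + (isAvailable(name) ? "supported" : "not supported"));
        }
    }
}
```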


Sun, 11 Jun 2006

Class::DBI performance

Class::DBI is a very nice database abstraction layer for Perl. It allows you to define your tables and columns and it magically provides you with classes with accessors/mutators for those columns. With something like Class::DBI::Pg, you don't even need to tell it your columns; it asks the database on startup. It's all very cool mojo and massively decreases the development time on anything database related in Perl.

Unfortunately, as far as I can tell, it has a massive performance problem in its design. One of the features of Class::DBI is lazy population of data: it won't fetch data from the database until you try to use one of the accessors. This isn't normally a problem, except with retrieve_all(). This function returns a list of objects, one for every row in your table. Unfortunately, due to the lazy loading of data, retrieve_all() calls SELECT id FROM table; and then, when you use each object, it calls SELECT * FROM table WHERE id = n;. For a small table, this isn't too bad, but for a large table, it's a killer.
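The query pattern described here is easier to see by counting queries. The following is a toy simulation only, not Class::DBI and not a real database: FakeTable and its methods are invented for illustration, and each method call stands in for one SQL statement.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class LazyLoadingDemo {
    // Hypothetical in-memory "table" (id -> bar) that counts every simulated query.
    static final class FakeTable {
        private final Map<Integer, String> rows = new LinkedHashMap<>();
        int queries = 0;

        FakeTable(int size) {
            for (int id = 1; id <= size; id++) {
                rows.put(id, "bar" + id);
            }
        }

        List<Integer> selectIds() {            // SELECT id FROM table;
            queries++;
            return new ArrayList<>(rows.keySet());
        }

        String selectRowById(int id) {         // SELECT * FROM table WHERE id = n;
            queries++;
            return rows.get(id);
        }

        Map<Integer, String> selectAll() {     // SELECT * FROM table;
            queries++;
            return new LinkedHashMap<>(rows);
        }
    }

    // Lazy style: fetch the ids first, then one query per row when it is used.
    static int lazyQueries(int size) {
        FakeTable table = new FakeTable(size);
        for (int id : table.selectIds()) {
            table.selectRowById(id);
        }
        return table.queries;
    }

    // Eager style: one query returns every column of every row.
    static int eagerQueries(int size) {
        FakeTable table = new FakeTable(size);
        table.selectAll();
        return table.queries;
    }

    public static void main(String[] args) {
        System.out.println("lazy:  " + lazyQueries(635) + " queries");
        System.out.println("eager: " + eagerQueries(635) + " queries");
    }
}
```

For a table of N rows the lazy style issues N + 1 queries against a single query for the eager style, so the gap grows with the size of the table.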

I did a little benchmark today to see just how much slower it is than plain DBI. I wrote two functions which iterate over a table, assigning one value to a variable (forcing Class::DBI to fetch the data). The table in question contains 635 rows. The code I used was:

use strict;
use warnings;

use Benchmark qw(:all) ;

use Foo;

use DBI;

sub class_dbi {
   for my $foo (Foo->retrieve_all()) {
      my $bar = $foo->bar;
   }
}

sub dbi {
   my $dbh = DBI->connect("dbi:Pg:dbname=$db;host=$host",$user, $passwd);
   my $sth = $dbh->prepare("SELECT * FROM foos;");
   $sth->execute();
   while(my $row = $sth->fetchrow_hashref()) {
      my $bar = $row->{bar};
   }
}

cmpthese(100, {
      'Class::DBI' => 'class_dbi();',
      'DBI' => 'dbi();',
   });

The results:

brick david% perl benchmark.pl
           s/iter Class::DBI        DBI
Class::DBI   10.3         --       -97%
DBI         0.351      2845%         --

Class::DBI is roughly 29 times slower than using DBI directly (10.3 s versus 0.351 s per iteration). I'm hoping that someone will now tell me "Oh you just do blah", otherwise I'm going to have to rewrite some of my code. One thing to learn from this is that a reduction in development time often costs you somewhere else, and often that somewhere is runtime performance.

Update: It appears that the bug is that Class::DBI::Pg doesn't set the Essential list of columns, so Class::DBI uses the primary column. You can fix this by adding the following to your database modules:

__PACKAGE__->columns(Essential => __PACKAGE__->columns);

Remember you'll need to do that for each of your modules; you won't be able to do it in your superclass, as you won't have discovered your columns yet. This has increased performance, but not massively. New timings (with the addition of using Class::DBI through an iterator):

              s/iter Class::DBI it    Class::DBI           DBI
Class::DBI it   6.35            --           -2%          -94%
Class::DBI      6.23            2%            --          -94%
DBI            0.350         1714%         1680%            --

Update 2: It appears that further speed gains can be made by not using Class::DBI::Plugin::DateTime::Pg to convert the three timestamp columns in my table into DateTime objects:

              s/iter Class::DBI it    Class::DBI           DBI
Class::DBI it   1.26            --          -11%          -72%
Class::DBI      1.12           12%            --          -69%
DBI            0.350          260%          220%            --

Wed, 07 Jun 2006

Converting Epoch Time values To Timestamps

Just a quick one. Have you ever created a table with a column holding the number of seconds since 1970 and realised, after populating it with data, that you really need it as a TIMESTAMP type? If so, you can quickly convert it using this SQL:

ALTER TABLE entries 
   ALTER COLUMN created TYPE TIMESTAMP WITH TIME ZONE
      USING TIMESTAMP WITH TIME ZONE 'epoch' + created * interval '1 second';

With thanks to the PostgreSQL manual for saving me hours working out how to do this.

[database,PostgreSQL]

Oracle 10.2.0.1 Instant Client hanging?

Does your Oracle client hang when connecting? Are you using Oracle 10.2.0.1? Do you get the following if you strace the process?

gettimeofday({1129717666, 622797}, NULL) = 0
access("/etc/sqlnet.ora", F_OK) = -1 ENOENT (No such file or directory)
access("./network/admin/sqlnet.ora", F_OK) = -1 ENOENT (No such file or directory)
access("/etc/sqlnet.ora", F_OK) = -1 ENOENT (No such file or directory)
access("./network/admin/sqlnet.ora", F_OK) = -1 ENOENT (No such file or directory)
fcntl64(155815832, F_SETFD, FD_CLOEXEC) = -1 EBADF (Bad file descriptor)
times(NULL) = -1808543702
times(NULL) = -1808543702
times(NULL) = -1808543702
times(NULL) = -1808543702
times(NULL) = -1808543702
times(NULL) = -1808543702
times(NULL) = -1808543702
.
.
.

Has your client been up for more than 180 days? Well done; you've just come across the same bug that has bitten two of our customers in the last week. Back in the days of Oracle 8, there was a fairly infamous bug in the Oracle client where new connections would fail if the client had been up for 248 days or more. This got fixed, and wasn't a problem with Oracle 9i at all. Now Oracle have managed to introduce a similar bug in 10.2.0.1, although in my experience the number of days appears to be shorter (180+).

Thankfully, this has been fixed in the 10.2.0.2 Instant Client. More information can be found on forums.oracle.com and www.redhat.com.

[Oracle,gotchas]

Thu, 01 Jun 2006

Tidying Up

A few hours ago I got stressed about the lack of leg room under my desk and ended up spending the next few hours tidying and moving all of my computers to under the next desk. I also made the mistake of starting to remove keys from my keyboard to clean something sticky and found myself surrounded by keys and a keyless keyboard. It's now nice and shiny, which is more than can be said for the rest of the flat, which is now overrun with all the crap that was around my desk.

Another thing that could do with a tidy up is Eddie, my Java liberal feed parsing library. After the initial coding sprint, I've had time to sit back and look at the design of the library and clean up anything that sticks out. As mentioned in a previous entry, one of the things that has bothered me is that whenever you need to call an object method, you need to be certain that the object is not null. This means you end up with code like:

if (string != null && string.equals("string")) {

This quickly becomes tiresome and the test for null distracts from the meaning of the code. Fortunately I was reminded of an improvement for string objects. Ideally, we should all be writing comparison conditionals like rvalue == lvalue (an rvalue is, mostly, an expression you can't assign to). The most common rvalue is a literal value like a string constant. The advantage of getting into the habit of writing code like this is that you'll discover at compile time when you accidentally write = rather than ==: because you can't assign to an rvalue, the compiler will complain. What makes this interesting from a Java string point of view is that you can call methods on string literals. The advantage of calling .equals() on a string literal, rather than on a variable, is that the string literal is never going to be null, so you can remove the test for null and simplify the code:

if("string".equals(string)) {
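A short, self-contained sketch of the literal-first comparison; the class and method names are made up for illustration. Passing null simply yields false rather than a NullPointerException, because equals() is called on the literal.

```java
class NullSafeEquals {
    // The literal is the receiver, so a null argument cannot cause a NullPointerException.
    static boolean isString(String value) {
        return "string".equals(value);
    }

    public static void main(String[] args) {
        System.out.println(isString("string")); // prints true
        System.out.println(isString("other"));  // prints false
        System.out.println(isString(null));     // prints false
    }
}
```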

I know it's not everyone's cup of tea, but I prefer it to testing for null every time I look at a string. The other thing is that I've been reading Hardcore Java by Robert Simmons at work. Considering I've only got a few pages in so far, I've received a surprisingly large number of ideas to improve my code.

The one that sticks in my head is using assert for pre- and postconditions on your functions. Using asserts has a number of advantages over throwing exceptions, including the fact that they are disabled at run time unless the JVM is started with -ea, so they cost next to nothing in a production release. In Eddie, when handling a <feed> element I determine the version of Atom that we are parsing. This had a number of nested if/else if/else blocks. At the end of the function, I wanted to make sure I had set the version string to something, so had the following code:

if (!this.feed.has("format")) {
   throw new SAXParseException("Failed to detect Atom format", this.locator);
}

However, using assertions I can write this as

assert(this.feed.has("format")) : "Failed to detect Atom format";
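A minimal sketch of an assert used as a postcondition. Everything here is hypothetical and is not Eddie's code: the class AssertDemo, the method detectFormat and the format names are invented, and only the two namespace URIs are the well-known Atom 1.0 and Atom 0.3 ones. The assertionsEnabled() helper uses the usual idiom of an assignment inside an assert to report whether the JVM was started with -ea.

```java
class AssertDemo {
    // Hypothetical stand-in for the version detection described above.
    static String detectFormat(String namespace) {
        String format = null;
        if ("http://www.w3.org/2005/Atom".equals(namespace)) {
            format = "atom10";
        } else if ("http://purl.org/atom/ns#".equals(namespace)) {
            format = "atom03";
        } else {
            format = "unknown";
        }
        // Postcondition: every branch above must have set the format.
        assert format != null : "Failed to detect Atom format";
        return format;
    }

    // True only when assertions are switched on, for example with "java -ea AssertDemo".
    static boolean assertionsEnabled() {
        boolean enabled = false;
        assert enabled = true; // intentional side effect, skipped when assertions are off
        return enabled;
    }

    public static void main(String[] args) {
        System.out.println("format: " + detectFormat("http://www.w3.org/2005/Atom"));
        System.out.println("assertions enabled: " + assertionsEnabled());
    }
}
```

Because the check disappears when assertions are off, it suits conditions that can only fail through a programming mistake; a condition that depends on the input being parsed would still need an exception if it has to be enforced in production.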

I highly recommend the Hardcore Java book if you want to improve your Java programming. It includes sections on the new features of Java 1.5 and using collections. I've made a couple of other cleanups, including going through member variable access specifiers to make sure they are right, and taking several public methods and variables and making them private. I also have a couple of ideas about refactoring some of the code to clean it up. Redesigning and refactoring code is almost more fun than writing it in the first place. You get to be in competition with yourself, challenging yourself to write better code and end up with cleaner code in the process.

A couple of things I want to do in the near future are to use a profiler and code coverage tools. If anyone has recommendations for either of these tools that integrate nicely with Eclipse, I'd love to know.


Join Map Ord Split

Just when you thought Perl couldn't get more unreadable, someone[0] comes up with something like this:

print join ", ", map ord, split //, $foo;

This mess of Perl might be easier to understand if I put the brackets in:

print join (", ", map( ord, split( //, $foo)));

What this does is split $foo into a list of characters. It then uses map to run ord() on each item in the list to return a new list containing the numeric character values. We then join these again with ", " to make the output easier to read.

david% perl -e 'print join ", ", map ord, split //, "word";'
119, 111, 114, 100

The map function is familiar to functional programmers and is very powerful, but beware: it can reduce the clarity of your code.
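For comparison, the same split, map and join pipeline can be sketched in Java with streams (Java 8 or later). The class and method names are made up, and codePoints() is assumed to be a close enough stand-in for Perl's split // followed by ord.

```java
import java.util.stream.Collectors;

class Ords {
    // Split the string into code points, map each to its decimal value, join with ", ".
    static String ords(String s) {
        return s.codePoints()
                .mapToObj(c -> Integer.toString(c))
                .collect(Collectors.joining(", "));
    }

    public static void main(String[] args) {
        System.out.println(ords("word")); // prints 119, 111, 114, 100
    }
}
```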

[0] Me
