Thu, 24 Apr 2008

Violating Perl Module Namespaces

Perl doesn't enforce access to modules' namespaces. This would usually be considered a bad thing, but sometimes it allows us to work around problems in modules without changing their code. Here's a perfect example:

I've been writing a script to talk to an XML-RPC endpoint using Frontier::Client, but for one of the requests the script throws the following error:

wanted a data type, got `ex:i8'

Turning on debugging showed the response type was indeed ex:i8, which isn't one of the types that Frontier::Client supports.

<?xml version="1.0" encoding="UTF-8"?>
<methodResponse xmlns:ex="http://ws.apache.org/xmlrpc/namespaces/extensions">
  <params>
    <param>
      <value>
        <ex:i8>161</ex:i8>
      </value>
    </param>
  </params>
</methodResponse>

Searching through the code shows that Frontier::Client is a wrapper around Frontier::RPC2, and the error is raised in the following section:

   } elsif ($scalars{$tag}) {
       $expat->{'rpc_text'} = "";
       push @{ $expat->{'rpc_state'} }, 'cdata';
   } else {
       Frontier::RPC2::die($expat, "wanted a data type, got \`$tag'\n");
   }

So we can see that it looks up the tag in a hash called %scalars to check whether the type is a scalar type, and otherwise throws the error we saw. Near the top of the module, we can see this hash:

%scalars = (
    'base64' => 1,
    'boolean' => 1,
    'dateTime.iso8601' => 1,
    'double' => 1,
    'int' => 1,
    'i4' => 1,
    'string' => 1,
);

So, if we could add ex:i8 to this hash, we could fix the problem. We could patch the module, but that would require every user of the script to patch their copy of the module. The alternative is to inject an entry into that hash across module boundaries, which we can do by referring to the hash by its fully qualified name, including the package name. We can use:

$Frontier::RPC2::scalars{'ex:i8'} = 1;

Now when we run the script, everything works. It's not nice and it depends on Frontier::RPC2 not changing, but it allows us to get on with our script.


Thu, 17 Apr 2008

Using In-memory tarballs with Archive::Tar

Archive::Tar is a useful library for working with tar archives from Perl. Unfortunately, one thing it doesn't allow is using data from memory as the archive. From the TODO section:

Allow archives to be passed in as string

Currently, we only allow opened filehandles or filenames, but not strings. The internals would need some reworking to facilitate stringified archives.

Fortunately, it does allow you to use a filehandle. I've previously mentioned how useful the IO::Handle subsystem in Perl is, and we should be able to use it in this case. The module we want is IO::String, which provides an IO::Handle-style interface over a Perl scalar. We can use it like this:

my $tar = new Archive::Tar(new IO::String($data));

Unfortunately when we run this now we get:

Cannot read compressed format in tar-mode at Foo.pm line 41
No data could be read from file at Foo.pm line 41

It turns out that this is because Archive::Tar uses IO::Zlib internally when the data is compressed, but IO::Zlib can't decompress from an existing filehandle. The answer is to decompress the data before passing it to Archive::Tar, and the easiest way to do that is with the IO::Uncompress::Gunzip module, so we can rewrite our code as:

my $tar = new Archive::Tar(new IO::Uncompress::Gunzip(new IO::String($data)));

Now when you run the script, Archive::Tar has an uncompressed tar stream. Yet another situation where IO::Handle comes to the rescue.


Boilerplate code for a perl class

Because I always forget when I need to create a new class in perl:

package Foo::Bar;

use strict;
use warnings;

sub new {
   my $this = shift;
   my $class = ref($this) || $this;
   my $self = {};
   bless $self, $class;
   $self->initialize(@_);
   return $self;
}

sub initialize {
   my $self = shift;
}

1;

If you have any useful additions I'd love to know.


Wed, 13 Jun 2007

Atomic in-place rewriting of files with backup in perl

In my article on Perl's IO::Handle objects I talked briefly about IO::AtomicFile and IO::Digest. I've just had reason to use these very useful modules to create a script which edits a file in place. These modules allowed me to do the rewrite atomically and optionally make a backup if the contents have changed. The example assumes you have a function called perform_rewrite that takes two file handles as the first two parameters.

use File::Copy;
use IO::File;
use IO::AtomicFile;
use IO::Digest;

sub rewrite_file {
   my $file = shift;
   my $sub = shift;
   my $input = new IO::File($file,'r');
   my $input_md5 = new IO::Digest($input, 'MD5');
   my $output = new IO::AtomicFile($file,'w');
   my $output_md5 = new IO::Digest($output, 'MD5');

   $sub->($input, $output, @_);

   if ($input_md5->hexdigest ne $output_md5->hexdigest) {
           copy ("$file", "$file.bak");
           $output->close();
   } else {
           # we haven't changed so don't bother updating
           $output->delete();
   }
   $input->close();
}

rewrite_file("/foo/bar", \&perform_rewrite, $baz, $quux);


Sun, 28 Jan 2007

Lazy Class Infrastructure

Do you ever feel you should implement equals(), hashCode() and toString(), but just can't be bothered to do it for every class? Well, if you aren't bothered by speed, you can use Jakarta Commons Lang to do it for you. Just add this to your class:

import org.apache.commons.lang.builder.ToStringBuilder;
import org.apache.commons.lang.builder.EqualsBuilder;
import org.apache.commons.lang.builder.HashCodeBuilder;

class Foo {
   public int hashCode() {
      return HashCodeBuilder.reflectionHashCode(this);
   }
   public boolean equals(Object other) {
      return EqualsBuilder.reflectionEquals(this,other);
   }
   public String toString() {
      return ToStringBuilder.reflectionToString(this);
   }
}

And that's it. Your class will just do the right thing. As you can probably guess from the function names, it uses reflection, so it may be suboptimal. If you need performance, you can tell it to use particular members, but I think I'll leave that for a future article. I also recommend you don't use this technique with something like Hibernate, which does things behind the scenes on member access; you may find it does undesirable things. :)
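
If the reflection cost matters, the same three methods can be written against named fields. The following is a minimal sketch using java.util.Objects from the standard library (Java 7 and later) rather than Commons Lang; the LabelledPoint class and its fields x and label are hypothetical, chosen only for illustration.

```java
import java.util.Objects;

// Hypothetical value class; the fields are assumptions for this sketch.
final class LabelledPoint {
    private final int x;
    private final String label;

    LabelledPoint(int x, String label) {
        this.x = x;
        this.label = label;
    }

    @Override
    public int hashCode() {
        // Only the listed fields take part, so no reflection is needed.
        return Objects.hash(x, label);
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof LabelledPoint)) return false;
        LabelledPoint p = (LabelledPoint) other;
        // Objects.equals copes with a null label on either side.
        return x == p.x && Objects.equals(label, p.label);
    }

    @Override
    public String toString() {
        return "LabelledPoint[x=" + x + ",label=" + label + "]";
    }
}
```

The trade-off is that every new field has to be added to all three methods by hand, which is exactly the chore the reflection builders avoid.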


Fri, 26 Jan 2007

Eddie 0.2 RSS and Atom Parser

I noticed today that Mark Pilgrim linked to Eddie, my liberal RSS and Atom parsing library for Java, so I figured I should make a new release. It's been a few months since I did any serious work on the parser, but in the last few days I've reduced the number of test case failures to fewer than 100 out of the 3502 test cases which come as part of Mark's Feedparser parser for Python. The majority of the failures are in the date parsing routines and due to bugs in the Jython library which cause literal dictionaries not to match with classes inherited from PyDictionary.

Improvements in this version include:

  • Massively improved support for different character encodings. With Java 6, it also has support for UTF32 feeds.
  • CDF Support.
  • Optional support of TagSoup for sanitizing of HTML in entries.
  • Improved support for different input sources including String, InputStream and byte[].
  • Numerous bug fixes, with 97% of test cases passing, up from 90%

If you use Eddie, drop me an email. I'd like to thank Mark Pilgrim again for providing the community with a fantastic and comprehensive suite of test cases, extensive documentation and a first class Python library.


Thu, 25 Jan 2007

Speedy Java 6

I was quietly minding my own business, fixing some encoding bugs in Eddie, my liberal RSS and Atom parser, when I noticed that Java 6 included support for UTF-32, which is one of the encoding tests that was failing. I downloaded and installed the Ubuntu packages, and decided to run a quick benchmark using my unit tests.

First up was the Sun Java 5 JVM. I'd been running the unit tests all night, but timed it this time, and got these results:

Ran 3502 tests
Passed 3322 tests
Failed 180 tests

real    1m10.293s
user    0m40.375s
sys     0m3.632s

Next I tried the Sun Java 6 JVM, using the same jar files, and got:

Ran 3502 tests
Passed 3326 tests
Failed 176 tests

real    0m56.059s
user    0m39.198s
sys     0m4.212s

One thing to note was that it spent a couple of seconds noticing new jars to read, so I decided to run it again and got:

Ran 3502 tests
Passed 3326 tests
Failed 176 tests

real    0m45.317s
user    0m34.770s
sys     0m3.516s

Wow, I'd gone from 70 seconds to 45 seconds using the new runtime and, interestingly enough, passed 4 more tests in the process. I'm assuming they are the UTF-32 tests, although I haven't checked yet. The other thing for me to try is recompiling the code to see if that has any additional benefits.

Update: Got around to checking what Java 6 fixed and it turned out it was the additional support for koi-u and cspc862latinhebrew encodings. After I fixed the UTF32 support in Eddie, it passed an additional 16 tests. Down to just 160 out of 3502. I just wish they would add support for some of the stranger encodings. Maybe this will happen when it's open source.


Sun, 18 Jun 2006

Markable InputStreams

Java has a nice IO subsystem. In particular, it has been designed such that input streams can optionally support a feature where a programmer can mark a position in the stream and at a later stage return to that point to read the data again. Programmers can check for this support by calling InputStream.markSupported(). Unfortunately I've had the need for this support, but I haven't managed to find a stream which supports this. Not even ByteArrayInputStream seems to support it. Fortunately it's fairly trivial to wrap an InputStream in another class which will add this support. Here is my quick adaptor, which seems to work for most cases.

import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;

public class MarkableInputStream extends InputStream {
    private InputStream inputstream;
    private int maxpos = 0;
    private int curpos = 0;
    private int mark = 0;
    private ArrayList<Integer> buffer = new ArrayList<Integer>();
    public MarkableInputStream(InputStream is) {
        inputstream = is;
    }

    @Override
    public int read() throws IOException {
        int data;
        if(curpos == maxpos) {
            data = inputstream.read();
            buffer.add(data); maxpos++;curpos++;
        } else {
            data = buffer.get(curpos++);
        }
        return data;
    }

    @Override
    public synchronized void mark(int readlimit) {
        mark = curpos;
    }

    @Override
    public boolean markSupported() {
        return true;
    }

    @Override
    public synchronized void reset() throws IOException {
        curpos = mark;
    }
}

You can use it like:

if (!istream.markSupported()) {
   istream = new MarkableInputStream(istream);
}

This could probably be improved on, most notably by not using an ArrayList. I'm not sure what performance penalty that adds. It should be possible to use a normal array as the readlimit parameter to mark() says how many bytes the stream should record before throwing old data away in favour of new input. The class above will record all data from the start of the stream, so could result in a significant amount of memory usage. Hope you find it useful.
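
The bounded buffering that readlimit describes is what the standard java.io.BufferedInputStream implements, so it can stand in for the adaptor when memory matters. Below is a minimal sketch of a mark/reset round trip with it. The MarkResetDemo class and its readTwice helper are hypothetical names for illustration, and the sketch assumes that reading the same n bytes twice is all that is needed.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

final class MarkResetDemo {
    // Reads up to n bytes, rewinds to the mark, reads them again,
    // and returns both reads joined with '|'.
    static String readTwice(InputStream in, int n) {
        // Wrap only when the stream cannot mark by itself.
        InputStream s = in.markSupported() ? in : new BufferedInputStream(in);
        try {
            s.mark(n);                  // keep at most n bytes for a later reset
            String first = readChars(s, n);
            s.reset();                  // rewind to the marked position
            String second = readChars(s, n);
            return first + "|" + second;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads up to n single-byte characters, stopping early at end of stream.
    private static String readChars(InputStream s, int n) throws IOException {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) {
            int b = s.read();
            if (b < 0) break;
            sb.append((char) b);
        }
        return sb.toString();
    }
}
```

Unlike the ArrayList version, the wrapper only has to retain the bytes read since the last mark, up to readlimit, rather than everything since the start of the stream.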


Sat, 17 Jun 2006

Invalid Characters in Encoding with JAVA

Imagine you've got some text you've been told is ASCII and you've told Java that it's ASCII using:

Reader reader = new InputStreamReader(inputstream, "ASCII");

Imagine your surprise when it happily reads in non-ASCII bytes, say from UTF-8 or ISO-8859-1 text, and silently converts each one to a replacement character.

import java.io.*;

public class Example1 {

   public static void main(String[] args) {
      try{
         FileInputStream is = new FileInputStream(args[0]);
         BufferedReader reader 
            = new BufferedReader(new InputStreamReader(is, args[1]));
         String line;
         while ((line = reader.readLine()) != null) {
            System.out.println(line);
         }
      } catch (Exception e) {
         System.out.println(e);
      }
   }
}

beebo david% java Example1 utf8file.txt ascii
I��t��rn��ti��n��liz��ti��n
beebo david% java Example1 utf8file.txt utf8
Iñtërnâtiônàlizætiøn

So, I hear you ask, how do you get Java to be strict about the conversion? Well, the answer is to look up a Charset object, ask it for a CharsetDecoder object and then set the onMalformedInput option to CodingErrorAction.REPORT. The resulting code is:

import java.io.*;
import java.nio.charset.*;

public class Example2 {

   public static void main(String[] args) {
      try{
         FileInputStream is = new FileInputStream(args[0]);
         Charset charset = Charset.forName(args[1]);
         CharsetDecoder csd = charset.newDecoder();
         csd.onMalformedInput(CodingErrorAction.REPORT);
         BufferedReader reader 
            = new BufferedReader(new InputStreamReader(is, csd));
         String line;
         while ((line = reader.readLine()) != null) {
            System.out.println(line);
         }
      } catch (Exception e) {
         System.out.println(e);
      }
   }
}

This time when we run it, we get:

beebo david% java Example2 utf8file.txt ascii
java.nio.charset.MalformedInputException: Input length = 1
beebo david% java Example2 utf8file.txt utf8
Iñtërnâtiônàlizætiøn
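
The same decoder settings also work without a Reader. As a sketch, a hypothetical helper StrictDecode.isValid checks whether a byte array decodes cleanly in a given charset. It also sets onUnmappableCharacter, which is an addition not used in Example2.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;

final class StrictDecode {
    // Returns true only if every byte sequence in data is valid for the charset.
    static boolean isValid(byte[] data, String charsetName) {
        CharsetDecoder decoder = Charset.forName(charsetName).newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            // decode() throws instead of substituting replacement characters.
            decoder.decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }
}
```

This is handy for checking a whole buffer up front, before deciding which charset to hand to an InputStreamReader.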

On a slightly related note, if anyone knows how to get Java to decode UTF32, VISCII, TCVN-5712, KOI8-U or KOI8-T, I would love to know.

Update: (2007-01-26) Java 6 has support for UTF32 and KOI8-U.
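
To see what a particular JVM offers, Charset.isSupported can be asked about each name. A small sketch, assuming a hypothetical CharsetCheck helper; the answers depend entirely on the runtime it is run on.

```java
import java.nio.charset.Charset;
import java.util.LinkedHashMap;
import java.util.Map;

final class CharsetCheck {
    // Maps each charset name to whether the running JVM can decode it.
    static Map<String, Boolean> supported(String... names) {
        Map<String, Boolean> result = new LinkedHashMap<String, Boolean>();
        for (String name : names) {
            result.put(name, Charset.isSupported(name));
        }
        return result;
    }
}
```

Calling CharsetCheck.supported("UTF-32", "VISCII", "TCVN-5712", "KOI8-U", "KOI8-T") returns one entry per name, in the order given.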


Sun, 11 Jun 2006

Class::DBI performance

Class::DBI is a very nice database abstraction layer for Perl. It allows you to define your tables and columns and it magically provides you with classes with accessors/mutators for those columns. With something like Class::DBI::Pg, you don't even need to tell it your columns; it asks the database on startup. It's all very cool mojo and massively decreases the development time on anything database related in Perl.

Unfortunately, as far as I can tell, it has a massive performance problem in its design. One of the features of Class::DBI is lazy population of data. It won't fetch data from the database until you try to use one of the accessors. This isn't normally a problem, except with retrieve_all(). Basically this function returns a list of objects for every row in your table. Unfortunately, due to the lazy loading of data, retrieve_all() calls SELECT id FROM table; and then every time you use an object it calls SELECT * FROM table WHERE id = n;. For a small table, this isn't too bad, but for a large table, it's a killer.

I did a little benchmark today to see just how much slower it is than plain DBI. I wrote two functions which iterate over a table, assigning one column value to a variable (forcing Class::DBI to fetch the data). The table in question contains 635 rows. The code I used was:

use strict;
use warnings;

use Benchmark qw(:all) ;

use Foo;

use DBI;

sub class_dbi {
   for my $foo (Foo->retrieve_all()) {
      my $bar = $foo->bar;
   }
}

sub dbi {
   my $dbh = DBI->connect("dbi:Pg:dbname=$db;host=$host",$user, $passwd);
   my $sth = $dbh->prepare("SELECT * FROM foos;");
   $sth->execute();
   while(my $row = $sth->fetchrow_hashref()) {
      my $bar = $row->{bar};
   }
}

cmpthese(100, {
      'Class::DBI' => 'class_dbi();',
      'DBI' => 'dbi();',
   });

The results:

brick david% perl benchmark.pl
           s/iter Class::DBI        DBI
Class::DBI   10.3         --       -97%
DBI         0.351      2845%         --

Class::DBI is roughly 29 times slower than using DBI directly. I'm hoping that someone will now tell me "Oh you just do blah", otherwise I'm going to have to rewrite some of my code. One thing to learn from this is that a reduction in development time can often cost you more in other areas, and it's often runtime performance.

Update: It appears that the bug is that Class::DBI::Pg doesn't set the Essential list of columns, so Class::DBI uses only the primary column. You can fix this by adding the following to your database modules:

__PACKAGE__->columns(Essential => __PACKAGE__->columns);

Remember you'll need to do that for each of your modules; you won't be able to do it in your superclass, as you won't have discovered your columns yet. This has increased performance, but not massively. New timings (with the addition of using Class::DBI through an iterator):

              s/iter Class::DBI it    Class::DBI           DBI
Class::DBI it   6.35            --           -2%          -94%
Class::DBI      6.23            2%            --          -94%
DBI            0.350         1714%         1680%            --

Update 2: It appears that further speed gains can be made by not using Class::DBI::Plugin::DateTime::Pg to convert the three timestamp columns in my table into DateTime objects:

              s/iter Class::DBI it    Class::DBI           DBI
Class::DBI it   1.26            --          -11%          -72%
Class::DBI      1.12           12%            --          -69%
DBI            0.350          260%          220%            --

Thu, 01 Jun 2006

Tidying Up

A few hours ago I got stressed about the lack of leg room under my desk and ended up spending the next few hours tidying and moving all of my computers to under the next desk. I also made the mistake of starting to remove keys from my keyboard to clean something sticky and found myself surrounded by keys and a keyless keyboard. It's now nice and shiny, which is more than can be said for the rest of the flat, which is now overrun with all the crap that was around my desk.

Another thing that could do with a tidy up is Eddie, my Java liberal feed parsing library. After the initial coding sprint, I've had time to sit back and look at the design of the library and clean up anything that sticks out. As mentioned in a previous entry, one of the things that has bothered me is that whenever you need to call an object method, you need to be certain that the object is not null. This means you end up with code like:

if (string != null && string.equals("string")) {

This quickly becomes tiresome and the test for null distracts from the meaning of the code. Fortunately I was reminded of an improvement for string objects. Ideally, we should all be writing comparison conditionals like rvalue == lvalue (an rvalue is, roughly, an expression you can't assign to). The most common rvalue is a literal value like a string constant. The advantage of getting into the habit of writing code like this is that you'll discover at compile time when you accidentally write = rather than ==: because you can't assign to an rvalue, the compiler will complain. What makes this interesting from a Java string point of view is that you can call methods on string literals. The advantage of calling .equals() on a string literal, rather than on a variable, is that the literal is never going to be null, so you can remove the test for null and simplify the code:

if("string".equals(string)) {
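
A quick sketch of the difference, using a hypothetical NullSafeEquals class: the literal-first form returns false for a null argument, while the variable-first form throws.

```java
final class NullSafeEquals {
    // Literal first: safe even when value is null.
    static boolean literalFirst(String value) {
        return "string".equals(value);
    }

    // Variable first: throws NullPointerException when value is null.
    static boolean variableFirst(String value) {
        return value.equals("string");
    }
}
```

Both methods agree for any non-null argument; they only differ in how they treat null.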

I know it's not everyone's cup of tea, but I prefer it to testing for null every time I look at a string. The other thing is that I've been reading Hardcore Java by Robert Simmons at work. Considering I've only got a few pages in so far, I've picked up a surprisingly large number of ideas to improve my code.

The one that sticks in my head is using assert for pre- and postconditions on your functions. Using asserts has a number of advantages over throwing exceptions, including the fact that they are skipped at runtime unless assertions are enabled with -ea, so they cost almost nothing in a production release. In Eddie, when handling a <feed> element, I determine the version of Atom that we are parsing. This had a number of nested if/else if/else blocks. At the end of the function, I wanted to make sure I had set the version string to something, so had the following code:

if (!this.feed.has("format")) {
   throw new SAXParseException("Failed to detect Atom format", this.locator);
}

However, using assertions I can write this as

assert(this.feed.has("format")) : "Failed to detect Atom format";
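
As a sketch of the pattern, here is a hypothetical detectFormat method standing in for the real Eddie code. The returned format names are made up for illustration; the two namespace URIs are the Atom 1.0 and Atom 0.3 ones.

```java
final class AtomVersionDetector {
    // Hypothetical stand-in for the version detection done on <feed>.
    static String detectFormat(String namespace) {
        String format;
        if ("http://www.w3.org/2005/Atom".equals(namespace)) {
            format = "atom10";
        } else if ("http://purl.org/atom/ns#".equals(namespace)) {
            format = "atom03";
        } else {
            format = "atom";
        }
        // Postcondition: every branch above must have chosen a format.
        assert format != null : "Failed to detect Atom format";
        return format;
    }
}
```

Run with java -ea to have the postcondition checked; without that flag the assert statement is skipped entirely.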

I highly recommend the Hardcore Java book if you want to improve your Java programming. It includes sections on the new features of Java 1.5 and using collections. I've made a couple of other cleanups, including going through member variable access specifiers to make sure they are right and making several public methods and variables private. I also have a couple of ideas about refactoring some of the code to clean it up. Redesigning and refactoring code is almost more fun than writing it in the first place. You get to be in competition with yourself, challenging yourself to write better code and end up with cleaner code in the process.

A couple of things I want to do in the near future are to use a profiler and code coverage tools. If anyone has recommendations for either of these that integrate nicely with Eclipse, I'd love to know.


Join Map Ord Split

Just when you thought Perl couldn't get more unreadable, someone[0] comes up with something like this:

print join ", ", map ord, split //, $foo;

This mess of perl might be easier to understand if I put the brackets in:

print join (", ", map( ord, split( //, $foo)));

What this does is split $foo into a list of characters. It then uses map to run ord() on each item in the list to return a new list containing the numeric character values. We then join these again with ", " to make the output easier to read.

david% perl -e 'print join ", ", map ord, split //, "word";'
119, 111, 114, 100

The map function is familiar to functional programmers and is very powerful, but beware it can reduce the clarity of your code.

[0] Me


Wed, 31 May 2006

Eddie RSS and Atom Feed Parser

I'd like to announce the initial release of Eddie, a feed parser library written in Java. It's taken me over 100 hours, but it now correctly parses 90% of the FeedParser unit tests, including all the rss and atom tests. It's GPLed, with an exception allowing you to use it in any open sourced program. Get it at my website. Need to add documentation and character set and encoding support. Also need to separate the testing infrastructure from the rest of the code.

This is the first time I've done any Java programming in anger, and I have to say I'm surprised to discover I quite like it. In many ways it seems a very quick language to program in. It seems almost like programming in a scripting language, but more strongly typed. This is probably due to not having to worry about memory management. Certainly I don't think I could have written this quite so quickly in C++.

Having said that, there are a couple things that I don't like about Java. Everything is a pointer. This is useful at times, but it means that every time you want to call a method on an object you have to test whether it is null or you run the risk of getting the dreaded NullPointerException. Java also doesn't have keywords for and, or and not. I know not everyone likes these, but I keep finding myself trying to use them.

I'm sure there are other things I hated, but I can't remember them now. I think I'll end up doing more java programming in the future.


Thu, 25 May 2006

Pathological Date Parser in Java

I've recently had cause to parse some date values in Java. As a result I've produced a class which can manage to parse an awful lot of date formats. I thought I'd better document it in case someone found it useful. Certainly there doesn't appear to be anything elsewhere which shows you how to parse lots of formats. I have found the order of date_formats to be very brittle, so I don't recommend you change it without an awful lot of test cases.

Anyway, without further ado, I present to you the Pathological Date Parser for Java:

// Copyright 2006 David Pashley <david@davidpashley.com>
// Licensed under the GPL version 2
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.TimeZone;

public class Date {
    private Calendar date;

    static String[] date_formats = {
            "yyyy-MM-dd'T'kk:mm:ss'Z'",        // ISO
            "yyyy-MM-dd'T'kk:mm:ssz",          // ISO
            "yyyy-MM-dd'T'kk:mm:ss",           // ISO
            "EEE, d MMM yy kk:mm:ss z",        // RFC822
            "EEE, d MMM yyyy kk:mm:ss z",      // RFC2822
            "EEE MMM  d kk:mm:ss zzz yyyy",    // ASC
            "EEE, dd MMMM yyyy kk:mm:ss",   //Disney Mon, 26 January 2004 16:31:00 ET
            "-yy-MM",
            "-yyMM",
            "yy-MM-dd",
            "yyyy-MM-dd",
            "yyyy-MM",
            "yyyy-D",
            "-yyMM",
            "yyyyMMdd",
            "yyMMdd",
            "yyyy", 
            "yyD"
    
    };
    public Date(String d) {
        SimpleDateFormat formatter = new SimpleDateFormat();
        d = d.replaceAll("([-+]\\d\\d:\\d\\d)", "GMT$1"); // Correct W3C times
        d = d.replaceAll(" ([ACEMP])T$", " $1ST"); // Correct Disney timezones
        for (int i = 0; i < date_formats.length; i++) {
           try {
              formatter.applyPattern(date_formats[i]);
              formatter.parse(d);
              date = formatter.getCalendar();
              break;
           } catch(Exception e) {
              // Oh well. We tried
           }
        }
        
    }
}

The only date formats I can't get it to parse are <4-digit year>-<day of year> and <2digit year><day of year> (e.g. 2003-335 and 03335 for 2003-12-01). If you can add support for those and other date formats I'll gladly take patches.
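
On Java 8 and later, the java.time API can handle both of these day-of-year forms. This is a sketch using a different API from the SimpleDateFormat class above, not a patch to it. The OrdinalDates class name is hypothetical, and the two-digit year is assumed to fall between 2000 and 2099.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

final class OrdinalDates {
    // <2-digit year><day of year>, e.g. 03335; "yy" is read as 2000-2099.
    private static final DateTimeFormatter SHORT_ORDINAL =
            DateTimeFormatter.ofPattern("yyDDD");

    // Parses either day-of-year form; throws DateTimeParseException otherwise.
    static LocalDate parse(String text) {
        if (text.indexOf('-') >= 0) {
            // <4-digit year>-<day of year>, e.g. 2003-335
            return LocalDate.parse(text, DateTimeFormatter.ISO_ORDINAL_DATE);
        }
        return LocalDate.parse(text, SHORT_ORDINAL);
    }
}
```

Both OrdinalDates.parse("2003-335") and OrdinalDates.parse("03335") yield 2003-12-01, matching the example dates given above.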


Sat, 20 May 2006

Choosing Member Functions at Runtime in Java

Have you ever wanted to call a member function in your class, but not known what it will be at compile time? I'm writing a SAX parser and would like a function for every element name. I could write a massive switch statement in the startElement function, but this will quite quickly become unmanageable for a large schema. The alternative is to look to see if a particular member function exists and call it.

To do this little bit of magic we need to use Java's reflection API. The first thing to do is to get a Class object for our class. We can do that by calling:

Class klass = this.getClass();

We can then look up the method we are looking for using Class.getMethod, but this function requires an array of types that the method we are looking for takes as parameters, so we get the right version of an overloaded method. We can do this with:

Class[] arguments = { int.class, String.class, URL.class };
Method method = klass.getMethod("foo", arguments);

Now we have our method, we can call it using Method.invoke(). This takes the target object as the first parameter, for which we can use this, and an array of Objects for the parameters.

Object[] values = {bar, baz, quux};
method.invoke(this, values);

But what happens if our class has no member method called foo()? Well, Class.getMethod() will throw a NoSuchMethodException, so we can just throw a try/catch block around the code to deal with unhandled functions. It's worth pointing out that Class.getMethod() also throws SecurityException and Method.invoke() throws IllegalAccessException, IllegalArgumentException and InvocationTargetException, so you'll want to catch Exception too.
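
A compact, self-contained sketch of the lookup-and-invoke sequence, assuming a hypothetical ElementDispatcher class with a single handler method. Note that getMethod only finds public methods, and that an exception thrown inside the handler arrives wrapped in InvocationTargetException.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;

final class ElementDispatcher {
    // Hypothetical handler, found by name at runtime; must be public.
    public String startElement_title(String text) {
        return "title:" + text;
    }

    // Calls startElement_<element>(text) if such a method exists.
    String dispatch(String element, String text) {
        try {
            Method m = getClass().getMethod("startElement_" + element, String.class);
            return (String) m.invoke(this, text);
        } catch (NoSuchMethodException e) {
            // No handler for this element name.
            return "unhandled:" + element;
        } catch (InvocationTargetException e) {
            // The handler itself threw; surface the original cause.
            throw new RuntimeException(e.getCause());
        } catch (IllegalAccessException e) {
            throw new RuntimeException(e);
        }
    }
}
```

Catching the specific exceptions separately keeps a missing handler, which is expected, apart from a handler that failed, which usually is not.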

We can chain some of these calls together and the result for my SAX parser is:

public void startElement(String uri, String localName, String qName, Attributes atts) 
            throws SAXException {
   try {
       Class[] argTypes = { String.class, String.class, String.class,
               Attributes.class };
       Object[] values = { uri, localName, qName, atts };
       this.getClass().getMethod("startElement_" + localName, argTypes)
               .invoke(this, values);
   } catch (NoSuchMethodException e) {
       log.debug("unhandled element " + localName);
   } catch (Exception e) {
       e.printStackTrace();
   }
}

With this arrangement, when I want to handle a new element in my code I can just make a function like:

public void startElement_foo(String uri, String localName, String qName, Attributes atts)
            throws SAXException {
   ... 
}

Fri, 19 May 2006

IO::Handle

Another article for your viewing pleasure. This article describes how to use Perl's object-based IO::Handle I/O system and a couple of modules that allow you to do some interesting things, like seamlessly handle compressed files and calculate MD5 sums as you read a file in.

use IO::File;
use IO::Digest;

my $fh = new IO::File($filename, "r");
my $iod = new IO::Digest($fh, 'MD5');

read_and_parse($fh);

print $iod->hexdigest;

Wed, 17 May 2006

The case against backticks

I hate backticks. They have no place in modern shell programming. Here are a couple of reasons why.

Not nestable

Backticks, by their nature, are not nestable. You cannot put one command substitution inside another with backticks without escaping the inner ones.

command `foo ` bar ` baz`

Should the shell run foo and baz as two separate substitutions, or run bar first and then foo <output of bar> baz? As it happens, it does the former. Using the modern command substitution syntax you can write either one unambiguously:

command $(foo $( bar ) baz)
command $(foo ) bar $( baz)

Backticks are invisible

The backtick symbol is too small and too easily confused with a single quote to be used for writing maintainable code. The alternative is significantly larger and therefore more obvious.

ls "'`!!`'"
ls "'$(!!)'"

Please consider using the bracketed version of command expansion rather than backticks and make the world a nicer place.
