Thu, 25 May 2006

Pathological Date Parser in Java

I recently had cause to parse some date values in Java, and as a result I've produced a class that can parse an awful lot of date formats. I thought I'd better document it in case someone finds it useful; there doesn't appear to be anything elsewhere that shows you how to handle lots of formats at once. I have found the order of date_formats to be very brittle, because each string is parsed with the first pattern that accepts it, so I don't recommend changing it without an awful lot of test cases.

Anyway, without further ado, I present to you the Pathological Date Parser for Java:

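// Copyright 2006 David Pashley <david@davidpashley.com>
// Licensed under the GPL version 2
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Calendar;

public class Date {
    private Calendar date;

    // Patterns are tried in order and the first one that parses wins,
    // so this order is deliberate (and brittle).
    static String[] date_formats = {
            "yyyy-MM-dd'T'kk:mm:ssz",          // ISO 8601 with zone (a trailing Z is rewritten below)
            "yyyy-MM-dd'T'kk:mm:ss",           // ISO 8601 without zone
            "EEE, d MMM yy kk:mm:ss z",        // RFC 822
            "EEE, d MMM yyyy kk:mm:ss z",      // RFC 2822
            "EEE MMM  d kk:mm:ss zzz yyyy",    // asctime()-like, with a zone (as printed by date(1))
            "EEE, dd MMMM yyyy kk:mm:ss",      // Disney: Mon, 26 January 2004 16:31:00 ET
            "-yy-MM",
            "-yyMM",
            "yy-MM-dd",
            "yyyy-MM-dd",
            "yyyy-MM",
            "yyyy-D",
            "yyyyMMdd",
            "yyMMdd",
            "yyyy",
            "yyD"
    };

    public Date(String d) {
        SimpleDateFormat formatter = new SimpleDateFormat();
        // Correct W3C times: "+01:00" -> "GMT+01:00", which the z pattern understands
        d = d.replaceAll("([-+]\\d\\d:\\d\\d)", "GMT$1");
        // A trailing Z (Zulu) means UTC; matching it as a quoted 'Z' literal
        // would silently parse the time in the local zone instead
        d = d.replaceAll("(\\d)Z$", "$1GMT+00:00");
        // Correct Disney timezones: "ET" -> "EST", "PT" -> "PST", etc.
        d = d.replaceAll(" ([ACEMP])T$", " $1ST");
        for (String format : date_formats) {
            try {
                formatter.applyPattern(format);
                java.util.Date parsed = formatter.parse(d);
                // Copy the result into a fresh Calendar rather than keeping a
                // reference to the formatter's internal, mutable one
                date = Calendar.getInstance();
                date.setTime(parsed);
                break;
            } catch (ParseException e) {
                // Oh well. We tried; move on to the next pattern
            }
        }
        // If no pattern matched, date is left as null
    }

    // Returns the parsed date, or null if no pattern matched
    public Calendar getCalendar() {
        return date;
    }
}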
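One plausible reason the order of date_formats is so brittle: DateFormat.parse(String) does not have to use the entire input, so a short pattern such as "yyyy" will happily accept "2003-12-01" by reading "2003" and ignoring the rest. Below is a minimal sketch of a stricter check using parse(String, ParsePosition), which only accepts a pattern when it consumes the whole string. StrictParseDemo and parseWhole are hypothetical names for illustration, not part of the parser above.

```java
import java.text.ParsePosition;
import java.text.SimpleDateFormat;
import java.util.Date;

public class StrictParseDemo {
    // Hypothetical helper: returns the parsed Date only if the pattern
    // consumed the whole input, otherwise null.
    static Date parseWhole(String pattern, String text) {
        SimpleDateFormat formatter = new SimpleDateFormat(pattern);
        ParsePosition pos = new ParsePosition(0);
        Date parsed = formatter.parse(text, pos);
        if (parsed == null || pos.getIndex() != text.length()) {
            return null; // no match, or trailing text was left unparsed
        }
        return parsed;
    }

    public static void main(String[] args) throws Exception {
        // parse(String) is satisfied by a prefix: "yyyy" reads "2003" and stops.
        SimpleDateFormat loose = new SimpleDateFormat("yyyy");
        System.out.println(loose.parse("2003-12-01"));
        // The strict version rejects the same input for "yyyy"...
        System.out.println(parseWhole("yyyy", "2003-12-01"));
        // ...but accepts it for a pattern that covers every character.
        System.out.println(parseWhole("yyyy-MM-dd", "2003-12-01"));
    }
}
```

Using a check like parseWhole inside the constructor's loop should make the order matter less, but it would also reject strings that currently parse only because trailing text is ignored (fractional seconds, for example), so it would need the same pile of test cases.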

The only date formats I can't get it to parse are <4-digit year>-<day of year> and <2-digit year><day of year> (e.g. 2003-335 and 03335 for 2003-12-01), even though the list contains "yyyy-D" and "yyD" patterns meant for them. If you can add support for those and other date formats, I'll gladly take patches.
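One likely culprit for those two formats: SimpleDateFormat is lenient by default and the list is tried in order, so an earlier pattern such as "yyyy-MM" can accept "2003-335" as month 335 of 2003 before "yyyy-D" is ever reached. Rather than reshuffling the list, here is a minimal sketch that recognises the two ordinal forms with regular expressions and builds the Calendar directly. OrdinalDate and parseOrdinal are hypothetical names, and mapping a two-digit year to 20yy is an assumption (SimpleDateFormat itself uses a window of 80 years before and 20 years after the current date).

```java
import java.util.Calendar;
import java.util.GregorianCalendar;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OrdinalDate {
    // <4-digit year>-<day of year>, e.g. 2003-335
    private static final Pattern LONG_FORM = Pattern.compile("(\\d{4})-(\\d{3})");
    // <2-digit year><day of year>, e.g. 03335
    private static final Pattern SHORT_FORM = Pattern.compile("(\\d{2})(\\d{3})");

    // Hypothetical helper: returns a Calendar set to local midnight on the
    // given day, or null if the input is not a valid ordinal date.
    static Calendar parseOrdinal(String text) {
        int year;
        int dayOfYear;
        Matcher m = LONG_FORM.matcher(text);
        if (m.matches()) {
            year = Integer.parseInt(m.group(1));
            dayOfYear = Integer.parseInt(m.group(2));
        } else {
            m = SHORT_FORM.matcher(text);
            if (!m.matches()) {
                return null;
            }
            // ASSUMPTION: two-digit years mean 20yy; adjust for your data
            year = 2000 + Integer.parseInt(m.group(1));
            dayOfYear = Integer.parseInt(m.group(2));
        }
        GregorianCalendar cal = new GregorianCalendar();
        cal.clear(); // zero the time fields so the result is midnight
        int daysInYear = cal.isLeapYear(year) ? 366 : 365;
        if (dayOfYear < 1 || dayOfYear > daysInYear) {
            return null; // e.g. 2003-366 does not exist
        }
        cal.set(Calendar.YEAR, year);
        // DAY_OF_YEAR is set last, so Calendar resolves the date from it
        cal.set(Calendar.DAY_OF_YEAR, dayOfYear);
        return cal;
    }

    public static void main(String[] args) {
        Calendar c = parseOrdinal("2003-335");
        System.out.println((c.get(Calendar.MONTH) + 1) + "/" + c.get(Calendar.DAY_OF_MONTH));
    }
}
```

Calling something like this before the pattern loop would leave the brittle list untouched.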


Comments

man Date::Parse
Posted by foo at Thu May 25 21:51:03 2006
In Germany, the most common date formats are "d.m.yy" and "dd.mm.yyyy" - consider having some localization support.
Posted by Erich at Fri May 26 00:46:42 2006
I agree with foo that you should look into Perl. It already has everything you are looking for and it's less wordy.
Posted by bar at Fri May 26 08:39:39 2006
Are there any easy ways of embedding Perl in Java that don't impose huge runtime costs? In the real world, you don't have the option of saying to your manager "lol, im leeet lets stop using java in our product and use perl rofl lol luser!11!!". In your parents' basement you can probably get away with that, though.
Posted by AJ at Fri May 26 11:58:29 2006
