I’ve recently had cause to parse some date values in Java. As a
result, I’ve written a class that can parse an awful lot of date
formats. I thought I’d better document it in case someone finds it
useful; there certainly doesn’t appear to be anything elsewhere that
shows you how to handle lots of formats at once. I have found the order
of date_formats to be very brittle, since the first pattern that parses
successfully wins, so I don’t recommend you change it without an awful
lot of test cases.
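Part of the reason the order is so brittle is that SimpleDateFormat.parse(String) reports success as soon as the pattern is satisfied, even if text is left over. A short pattern such as yyyy will therefore happily claim a full ISO date. The small demo below is only an illustration: the class PrefixDemo and its consumed helper are hypothetical, not part of the parser. It uses a ParsePosition to show how much of the input each pattern actually used.

```java
import java.text.ParseException;
import java.text.ParsePosition;
import java.text.SimpleDateFormat;

class PrefixDemo {
    // Hypothetical helper: returns how many characters of text the pattern
    // consumed, or -1 if the parse failed outright.
    static int consumed(String pattern, String text) {
        ParsePosition pos = new ParsePosition(0);
        java.util.Date d = new SimpleDateFormat(pattern).parse(text, pos);
        return d == null ? -1 : pos.getIndex();
    }

    public static void main(String[] args) throws ParseException {
        // parse(String) does not throw, even though "-12-01" is left over.
        java.util.Date d = new SimpleDateFormat("yyyy").parse("2003-12-01");
        System.out.println(d != null);                            // true
        System.out.println(consumed("yyyy", "2003-12-01"));       // 4
        System.out.println(consumed("yyyy-MM-dd", "2003-12-01")); // 10
        System.out.println(consumed("yyyy-MM-dd", "2003"));       // -1
    }
}
```

Comparing pos.getIndex() with the input length is one way to insist on a whole-string match, but adopting it would change which pattern wins for some inputs (anything with trailing fractional seconds, for instance), so it needs the same careful testing as reordering.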
Anyway, without further ado, I present the Pathological Date Parser
for Java:
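    // Copyright 2006 David Pashley <david@davidpashley.com>
    // Licensed under the GPL version 2
    import java.text.ParseException;
    import java.text.SimpleDateFormat;
    import java.util.Calendar;
    import java.util.TimeZone;

    public class Date {
        private Calendar date;

        // Patterns are tried in this order and the first one that
        // parses wins, so the order matters.
        static String[] date_formats = {
            "yyyy-MM-dd'T'kk:mm:ss'Z'",    // ISO 8601, UTC
            "yyyy-MM-dd'T'kk:mm:ssz",      // ISO 8601, with zone
            "yyyy-MM-dd'T'kk:mm:ss",       // ISO 8601, no zone
            "EEE, d MMM yy kk:mm:ss z",    // RFC 822
            "EEE, d MMM yyyy kk:mm:ss z",  // RFC 2822
            "EEE MMM d kk:mm:ss zzz yyyy", // asctime-style, plus a zone
            "EEE, dd MMMM yyyy kk:mm:ss",  // Disney: Mon, 26 January 2004 16:31:00 ET
            "-yy-MM",
            "-yyMM",
            "yy-MM-dd",
            "yyyy-MM-dd",
            "yyyy-MM",
            "yyyy-D",
            "yyyyMMdd",
            "yyMMdd",
            "yyyy",
            "yyD"
        };

        public Date(String d) {
            SimpleDateFormat formatter = new SimpleDateFormat();
            // Correct W3C times: "+01:00" becomes "GMT+01:00", which 'z' can parse
            d = d.replaceAll("([-+]\\d\\d:\\d\\d)", "GMT$1");
            // Correct Disney timezones: e.g. a trailing " ET" becomes " EST"
            d = d.replaceAll(" ([ACEMP])T$", " $1ST");
            for (String pattern : date_formats) {
                formatter.applyPattern(pattern);
                // A quoted 'Z' is plain text, so say explicitly that it means
                // UTC. Reset the zone on every pass, because parsing a 'z'
                // field can change the formatter's time zone.
                formatter.setTimeZone(pattern.endsWith("'Z'")
                        ? TimeZone.getTimeZone("UTC")
                        : TimeZone.getDefault());
                try {
                    formatter.parse(d);
                    date = formatter.getCalendar();
                    break;
                } catch (ParseException e) {
                    // Oh well. We tried; move on to the next pattern.
                }
            }
            // If no pattern matched, date is left null.
        }
    }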
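The W3C correction in the constructor is easiest to understand in isolation. Here is a minimal sketch, separate from the class above; IsoDemo, fixOffset and parseIso are hypothetical names, and it uses HH (the 0-23 hour field). It shows that rewriting an offset like +01:00 into GMT+01:00 lets the z pattern letter read it, and that a quoted 'Z' is matched only as text, so the formatter's time zone has to be set to UTC for the result to mean what the Z says.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.TimeZone;

class IsoDemo {
    // Hypothetical helper: turn a numeric offset such as "+01:00" into
    // "GMT+01:00", the general time zone form that 'z' accepts when parsing.
    static String fixOffset(String s) {
        return s.replaceAll("([-+]\\d\\d:\\d\\d)", "GMT$1");
    }

    // Returns milliseconds since the epoch for an ISO 8601 timestamp that
    // ends either in "Z" or in a numeric offset.
    static long parseIso(String s) throws ParseException {
        SimpleDateFormat f;
        if (s.endsWith("Z")) {
            // The quoted 'Z' carries no zone information of its own.
            f = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss'Z'");
            f.setTimeZone(TimeZone.getTimeZone("UTC"));
        } else {
            f = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ssz");
            s = fixOffset(s);
        }
        return f.parse(s).getTime();
    }

    public static void main(String[] args) throws ParseException {
        // Both strings denote the same instant, 2003-12-01 09:00 UTC.
        System.out.println(parseIso("2003-12-01T09:00:00Z"));
        System.out.println(parseIso("2003-12-01T10:00:00+01:00"));
    }
}
```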
The only date formats I can’t get it to parse are <4-digit
year>-<day of year> and <2-digit year><day of year>
(e.g. 2003-335 and 03335 for 2003-12-01). If you can add support
for those or other date formats, I’ll gladly take patches.
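As far as I can tell, the yyyy-D and yyD patterns are already in the list but never get a turn. Lenient parsing lets yyyy-MM accept 2003-335 by reading 335 as a month, and yyMMdd does something similar with 03335. One possible way around the brittle ordering is to recognise these shapes with a regular expression first, and only then hand them to SimpleDateFormat. This is a sketch, not a patch to the class. The names OrdinalDateSketch and parseOrdinal are hypothetical, and two-digit years simply follow SimpleDateFormat's default century window.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.regex.Pattern;

class OrdinalDateSketch {
    // Four digits, a dash, three digits, e.g. 2003-335
    private static final Pattern LONG_ORDINAL = Pattern.compile("\\d{4}-\\d{3}");
    // Exactly five digits, e.g. 03335
    private static final Pattern SHORT_ORDINAL = Pattern.compile("\\d{5}");

    // Hypothetical helper: returns a Calendar for an ordinal date, or null
    // if s does not look like one (or names an impossible day of the year).
    static Calendar parseOrdinal(String s) {
        String pattern;
        if (LONG_ORDINAL.matcher(s).matches()) {
            pattern = "yyyy-DDD";
        } else if (SHORT_ORDINAL.matcher(s).matches()) {
            // Two-digit years use SimpleDateFormat's default century window.
            pattern = "yyDDD";
        } else {
            return null;
        }
        SimpleDateFormat f = new SimpleDateFormat(pattern);
        f.setLenient(false); // reject day-of-year values such as 400
        try {
            Calendar c = Calendar.getInstance();
            c.setTime(f.parse(s));
            return c;
        } catch (ParseException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        Calendar a = parseOrdinal("2003-335");
        Calendar b = parseOrdinal("03335");
        System.out.printf("%tF %tF%n", a, b); // 2003-12-01 2003-12-01
    }
}
```

Calling something like parseOrdinal before the date_formats loop, and falling back to the loop when it returns null, would leave the existing pattern order untouched.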