Wed, 26 Jul 2006

Use Atom

Andreas, use Atom rather than RSS. It has an <updated/> element for the time the entry was last updated and a <published/> element for the date the entry was first published.
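
For illustration, here is a minimal Python sketch that builds an Atom entry carrying both dates, using only the standard library's ElementTree. The atom_entry helper, the id, title and timestamps are made-up example values. Atom dates are RFC 3339 timestamps, and a real entry also needs an <author/> unless the feed supplies one.

```python
import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"
ET.register_namespace("", ATOM)  # serialise Atom as the default namespace

def atom_entry(entry_id, title, published, updated):
    """Build an Atom <entry> with both a publication and a last-updated date."""
    entry = ET.Element(f"{{{ATOM}}}entry")
    for name, text in (("id", entry_id), ("title", title),
                       ("published", published), ("updated", updated)):
        ET.SubElement(entry, f"{{{ATOM}}}{name}").text = text
    return entry

# Made-up values: first published in the morning, edited in the afternoon.
entry = atom_entry("tag:example.org,2006:use-atom", "Use Atom",
                   "2006-07-26T09:00:00Z", "2006-07-26T17:30:00Z")
print(ET.tostring(entry, encoding="unicode"))
```

An aggregator can then show the entry under its <published/> date while still noticing, via <updated/>, that it has changed since.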

If you want to stick with RSS, you can use the <dcterms:issued/> element for the initial publication date and one of <pubDate/>, <dc:date/> or <dcterms:modified/> for the last-updated date. Don't forget to declare the XML namespaces for dc and dcterms.
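
A minimal sketch of the RSS route, again in Python with ElementTree. The two namespace URIs are the standard Dublin Core ones; the sample feed, the item_dates helper and the order in which it falls back between the three "modified" candidates are assumptions made for the example. Note that ElementTree refuses to parse a dcterms: element at all if its namespace isn't declared.

```python
import xml.etree.ElementTree as ET

NS = {"dc": "http://purl.org/dc/elements/1.1/",
      "dcterms": "http://purl.org/dc/terms/"}

# Hypothetical feed; the xmlns declarations on <rss> are what make it parse.
RSS = """<rss version="2.0"
     xmlns:dc="http://purl.org/dc/elements/1.1/"
     xmlns:dcterms="http://purl.org/dc/terms/">
  <channel>
    <title>Example</title>
    <link>http://example.org/</link>
    <description>Example feed</description>
    <item>
      <title>Use Atom</title>
      <dcterms:issued>2006-07-26T09:00:00Z</dcterms:issued>
      <dcterms:modified>2006-07-26T17:30:00Z</dcterms:modified>
    </item>
  </channel>
</rss>"""

def item_dates(item):
    """Return (issued, modified) for an RSS <item>; None where absent."""
    issued = item.findtext("dcterms:issued", namespaces=NS)
    # Assumed preference order for the last-updated date.
    modified = (item.findtext("dcterms:modified", namespaces=NS)
                or item.findtext("dc:date", namespaces=NS)
                or item.findtext("pubDate"))
    return issued, modified

root = ET.fromstring(RSS)
for item in root.iter("item"):
    print(item_dates(item))
```

Bear in mind that <pubDate/> uses RFC 822 dates while the Dublin Core elements conventionally use W3C-DTF (ISO 8601), so a consumer may get either format back from the fallback.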

Mon, 24 Jul 2006

Strict feed parsers are useless

Erich, I'm not entirely sure what you did to break Planet, but using a strict feed parser will just result in you missing a significant number of entries. Sadly, people don't produce valid feeds, and they will blame your software rather than their feeds. It doesn't help that a number of validators aren't entirely strict and that RSS doesn't have a very comprehensive spec. RSS is in a much worse state than Atom, in part thanks to the Atom validator and Atom's very well thought out spec. It's for this reason that I ended up writing Eddie rather than using ROME: ROME was a DOM parser and simply failed to get any information out of a non-well-formed feed. Eddie, on the other hand, is a SAX-based parser. In a recent comparison, an Eddie-based aggregator correctly parsed several more entries than a ROME-based aggregator on one particular day.
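
To show why the parsing model matters, here is a small Python sketch using the standard library's xml.sax and xml.dom.minidom (not Eddie or ROME, which are Java). Both parse the same hypothetical feed, whose third title contains an unescaped "&". The SAX handler still stops at the error, but it keeps the entries it had already seen; the DOM parser gives you nothing. The sample feed and helper names are made up for the example.

```python
import xml.sax
from xml.dom import minidom
from xml.parsers.expat import ExpatError

# Hypothetical sample: the third title contains an unescaped "&".
BROKEN = b"""<rss version="2.0"><channel>
<item><title>First</title></item>
<item><title>Second</title></item>
<item><title>Fish & Chips</title></item>
</channel></rss>"""

class TitleCollector(xml.sax.ContentHandler):
    """Record each <title> as soon as its closing tag is seen."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._buf = None  # collects text while inside a <title>

    def startElement(self, name, attrs):
        if name == "title":
            self._buf = []

    def characters(self, content):
        if self._buf is not None:
            self._buf.append(content)

    def endElement(self, name):
        if name == "title" and self._buf is not None:
            self.titles.append("".join(self._buf))
            self._buf = None

def sax_titles(data):
    """Streaming parse: keep every title seen before any error."""
    handler = TitleCollector()
    try:
        xml.sax.parseString(data, handler)
    except xml.sax.SAXParseException:
        pass  # stop here, but keep what was already collected
    return handler.titles

def dom_titles(data):
    """Tree parse: the whole document must be well-formed, or nothing."""
    try:
        doc = minidom.parseString(data)
    except ExpatError:
        return []
    return ["".join(n.data for n in t.childNodes if n.nodeType == n.TEXT_NODE)
            for t in doc.getElementsByTagName("title")]

print("SAX:", sax_titles(BROKEN))
print("DOM:", dom_titles(BROKEN))
```

A production parser would go further and try to recover after the error, but even this naive streaming approach salvages two entries where the tree-based one salvages none.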

You also have major aggregators being liberal. Sam Ruby discussed this recently, with Bloglines becoming the de facto validator: if Bloglines parses it, then it's valid. We had the same problem with HTML, with people making sure their pages worked in a browser rather than that they met the spec.

I suspect the problem you had with Planet is that you failed to close a tag, causing the rest of the page to be bold, or to be a link, and so on. This is fairly easily solvable, and in fact has been solved in FeedParser, the feed parsing library Planet uses. It has support for using HTML Tidy and similar libraries to fix unbalanced elements. Eddie uses TagSoup to do a similar thing. As a result, I've not noticed any entry leaking markup and breaking the page. Perhaps Planet Debian just needs to install one of the markup cleaning libraries.
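
A rough sketch of the balancing idea in Python, using the standard library's html.parser rather than HTML Tidy or TagSoup. The Balancer class and balance() helper are hypothetical: they close elements left open at the end of an entry, drop stray closing tags and discard comments, but they know nothing about HTML's content model, so they are far cruder than the real libraries.

```python
from html import escape
from html.parser import HTMLParser

# Elements that never take a closing tag.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "param", "source", "track", "wbr"}

class Balancer(HTMLParser):
    """Re-emit an HTML fragment so every opened element gets closed."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.stack = []  # currently open (non-void) elements

    def handle_starttag(self, tag, attrs):
        rendered = "".join(f' {k}="{escape(v or "")}"' for k, v in attrs)
        self.out.append(f"<{tag}{rendered}>")
        if tag not in VOID:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag not in self.stack:
            return  # stray closing tag: drop it
        # Close anything opened inside this element, then the element itself.
        while self.stack:
            open_tag = self.stack.pop()
            self.out.append(f"</{open_tag}>")
            if open_tag == tag:
                break

    def handle_data(self, data):
        self.out.append(escape(data, quote=False))

def balance(fragment):
    parser = Balancer()
    parser.feed(fragment)
    parser.close()
    # Close whatever the author left open, innermost first.
    parser.out.extend(f"</{tag}>" for tag in reversed(parser.stack))
    return "".join(parser.out)

print(balance("<b>Unclosed bold and <a href='/x'>a link"))
```

Running each entry's content through something like this before it is spliced into the page keeps an unclosed <b> or <a> from leaking into the entries that follow.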

I agree that people should use XML tools where possible. Unfortunately, most blogging tools use text-based templating systems, which make it far too easy to produce non-well-formed XML. To deal with this, I pass all my output through an XSLT filter, which means that everything is either well-formed or isn't output at all. Unfortunately, I don't think everyone would be capable of using XSLT, or willing to.
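
For anyone who won't use XSLT, a simpler check with the same "well-formed or nothing" effect is to parse the rendered output before publishing it. This is a swapped-in technique, not the XSLT filter described here, and emit_if_well_formed is a hypothetical helper; it is a minimal Python sketch using only the standard library.

```python
import io
import sys
import xml.etree.ElementTree as ET

def emit_if_well_formed(document, out=None):
    """Write document to out only if it parses as XML; otherwise write nothing."""
    out = sys.stdout if out is None else out
    try:
        ET.fromstring(document)
    except ET.ParseError as err:
        print(f"not publishing: {err}", file=sys.stderr)
        return False
    out.write(document)
    return True

# A text template with an unescaped "&" produces non-well-formed XML.
template = "<entry><title>{}</title></entry>"
buf = io.StringIO()
emit_if_well_formed(template.format("Fish & Chips"), buf)      # rejected
emit_if_well_formed(template.format("Fish &amp; Chips"), buf)  # published
print(buf.getvalue())
```

Unlike an XSLT pipeline, this only detects the problem rather than making it impossible, but it does stop a broken feed from ever reaching subscribers.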

Wed, 19 Jul 2006

I feel dirty

I've just installed a bunch of RPM packages, built on CentOS and targeting Red Hat-like Linux distributions, onto a Solaris server. Even scarier, it worked.

I feel dirty.

Killall sshd considered stupid

I was using our Fedora 3 server and decided to restart the sshd running in a chroot:

[root@winky /]# /etc/init.d/sshd restart
Stopping sshd:/etc/init.d/sshd: line 212: kill: (1483) - No such process
Connection to winky closed by remote host.
Connection to winky closed.
mimsy david% ssh winky -l root
ssh: connect to host winky port 22: Connection refused

Thank you very much, Fedora.
