POSTS FOR 2004

The Battle of Dunstan vs. Andrei vs. Mark


…well, their syndication feeds anyway. Here’s the problem:

While working on SimplePie initially, I used copies of Dunstan’s Atom and RSS feeds because I felt that they’d be representative of most people’s decently well-formed feeds. I know that some people have worse feeds, and that Mark Pilgrim’s feeds are a bit too “academically” correct.

Dunstan has a problem with his feed. He uses the numeric entity for a “smart apostrophe” (&#8217;) in his feed’s <title> tag. That entity refers to a Unicode character outside the ASCII range. For whatever reason, every PHP-based feed reader I’ve ever used displays that smart apostrophe as a question mark when parsing his feed. In wanting to build a “feed parser for the rest of us”, I decided to be smart and wrap a CDATA section around the contents of <title>, <link>, and <description> on the fly for feeds that don’t already have one. Dunstan’s question mark becomes the character that it’s supposed to be. No problem.
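As a rough illustration of that on-the-fly wrapping, here is a minimal sketch in Python (SimplePie itself is PHP, and this is not its actual code). The function name `wrap_in_cdata` and the rule for deciding that content is "already wrapped" are assumptions made for the example.

```python
import re

# Hypothetical helper for illustration only; not SimplePie's real logic.
# Matches <title>, <link>, or <description> elements written without attributes.
ELEMENT = re.compile(r"<(title|link|description)>(.*?)</\1>", re.DOTALL)


def wrap_in_cdata(xml: str) -> str:
    """Wrap the contents of title/link/description in a CDATA section
    unless the contents already look like a single CDATA section."""

    def replace(match: re.Match) -> str:
        tag, body = match.group(1), match.group(2)
        stripped = body.strip()
        # Assumed check: content counts as wrapped if it starts and ends
        # with CDATA delimiters.
        if stripped.startswith("<![CDATA[") and stripped.endswith("]]>"):
            return match.group(0)
        return f"<{tag}><![CDATA[{body}]]></{tag}>"

    return ELEMENT.sub(replace, xml)


feed = "<item><title>Dunstan&#8217;s post</title><link>http://example.com/</link></item>"
wrapped = wrap_in_cdata(feed)
```

Inside a CDATA section the parser no longer interprets `&#8217;` as a character reference; the raw text passes through untouched, which is the effect the wrapping relies on.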

On the other hand, Andrei also has a problem with his feed. Well, not really… it’s just that the fix I put in place for Dunstan’s feed broke Andrei’s feed. Andrei does a fake-out with his CDATA sections. He closes the CDATA section in <description>, then has one last bit of content before closing the tag. This is just enough to get past SimplePie’s logic. Wonderful.
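The exact check SimplePie used is not shown here, so the following Python sketch only demonstrates one plausible failure mode. It assumes a naive test that treats content as wrapped only when it is CDATA from end to end. The sample description text and the helper names `naive_wrap` and `is_well_formed` are invented for the example.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml: str) -> bool:
    """Return True if the standard-library XML parser accepts the string."""
    try:
        ET.fromstring(xml)
        return True
    except ET.ParseError:
        return False


def naive_wrap(body: str) -> str:
    # Assumed check (hypothetical): only skip wrapping when the content
    # both starts and ends with CDATA delimiters.
    if body.startswith("<![CDATA[") and body.endswith("]]>"):
        return body
    return "<![CDATA[" + body + "]]>"


# A CDATA section that closes early, followed by one last bit of content.
original = "<![CDATA[Some <b>markup</b>]]> and a trailing bit"
rewrapped = naive_wrap(original)

before = is_well_formed(f"<description>{original}</description>")
after = is_well_formed(f"<description>{rewrapped}</description>")
```

Because CDATA sections cannot nest, the outer section ends at the first `]]>`, leaving a stray `]]>` in ordinary character data, which XML does not allow. Under these assumptions the feed is well-formed before the "fix" and broken after it.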

Since Dunstan’s issue is only in the feed’s <title> tag, I went ahead and changed how SimplePie handles the feeds by removing the code for wrapping CDATA sections around <link> and <description>. Both Dunstan and Andrei have working feeds again.

Then, I go and test it on Mark’s Feed Parser project feed. SimplePie breaks down again. Well, crap. Instead of using <title> like normal people, Mark has to be all cool by using <title type="text/plain">. Argh.
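To make the failure concrete, here is a small Python sketch, assuming the matching was done with a pattern that expects a bare `<title>` tag. Both patterns and the helper `first_title` are hypothetical, and the sample title text is made up.

```python
import re

# Strict pattern: only matches a <title> tag with no attributes.
STRICT = re.compile(r"<title>(.*?)</title>", re.DOTALL)

# Tolerant pattern: allows optional attributes after the tag name.
# It assumes attribute values never contain a literal ">".
TOLERANT = re.compile(r"<title(?:\s[^>]*)?>(.*?)</title>", re.DOTALL)


def first_title(pattern: re.Pattern, xml: str):
    """Return the contents of the first matching title, or None."""
    match = pattern.search(xml)
    return match.group(1) if match else None


atom_title = '<title type="text/plain">Feed Parser</title>'
strict_result = first_title(STRICT, atom_title)      # no match
tolerant_result = first_title(TOLERANT, atom_title)  # matches the contents
```

The tolerant pattern still handles the plain `<title>` form, so a change like this would not disturb feeds that already work.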

So, I’m off to find some code that can resolve this little quarrel. I’m thinking about going through and looking for numeric entities for non-ASCII characters (4-digit, typically beginning with an 8, such as &#8217;) and wrapping CDATA sections around those entities alone, which will probably work. I don’t want to release this software as 1.0 until it performs satisfactorily with every single feed in my entire reading list.
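The entity-only idea could look something like the Python sketch below. It is an untested-against-real-feeds illustration, not the code that shipped. The 4-digit pattern follows the heuristic described above; skipping existing CDATA sections is an added assumption so that CDATA is never nested. The name `wrap_entities` is hypothetical.

```python
import re

# Existing CDATA sections; the capture group makes re.split keep them.
CDATA_SECTION = re.compile(r"(<!\[CDATA\[.*?\]\]>)", re.DOTALL)

# 4-digit decimal character references such as &#8217;.
ENTITY = re.compile(r"&#\d{4};")


def wrap_entities(xml: str) -> str:
    """Wrap each 4-digit numeric entity in its own CDATA section,
    leaving text that is already inside a CDATA section untouched."""
    parts = CDATA_SECTION.split(xml)
    # With one capture group, even indexes hold text outside CDATA and
    # odd indexes hold the CDATA sections themselves.
    for i in range(0, len(parts), 2):
        parts[i] = ENTITY.sub(lambda m: f"<![CDATA[{m.group(0)}]]>", parts[i])
    return "".join(parts)


dunstan = wrap_entities("<title>Dunstan&#8217;s blog</title>")
mark = wrap_entities('<title type="text/plain">It&#8217;s a feed</title>')
andrei = wrap_entities("<description><![CDATA[It&#8217;s]]> more</description>")
```

Since this approach never looks at tag names, it is indifferent to attributes on `<title>` and to content that trails a closed CDATA section, which are the two cases that tripped up the earlier fixes.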

Version 0.92 is on its way here, folks. Andrew, how’s that WordPress plug-in coming along?

Ryan Parman
