
Twitter RSS in XSLT FTW

I've decided to geek out on Wednesdays. Today, I want to include a Twitter RSS feed (which, inspired by Joshua Allen, I tentatively plan to fill not with news of my clever doings but rather with filth about a fictional evil family) on the front page here at Ftrain. I'll do the coding in XSLT2.

(What I build below is just a toy. A grown-up XSLT2.0 RSS reader with a hardcore RFC 822 munger can be found over here. You can also look at the XSL FAQ.)

The Output and Input

First, here's the HTML output I'm hoping to create:

Tuesday, 21 Aug

7:24 p.m. — Grandpa Dieter up all night screaming take me to Maple St. synagogue so I can apologize. I'm like, for what? He just shakes his head. Crazy.

Now let's look at the input. Here's a typical Twitter RSS item:


    <item>
      <title>Paul Ford: Grandpa Dieter up all night screaming take me to Maple St. synagogue so I can apologize. I'm like, for what? He just shakes his head. Crazy.</title>
      <description>Paul Ford: Grandpa Dieter up all night screaming take me to Maple St. synagogue so I can apologize. I'm like, for what? He just shakes his head. Crazy.</description>
      <pubDate>Tue, 21 Aug 2007 23:24:45 +0000</pubDate>
      <guid>http://twitter.com/paul_e_ford/statuses/218935552</guid>
      <link>http://twitter.com/paul_e_ford/statuses/218935552</link>
    </item>

You know how the Iraq War solved 9/11? That's how RSS 2.0 solved syndication. As you can see above, dates in RSS 2.0 are mistakenly formatted according to RFC 822, a standard published in August 1982, during the second year of the Reagan Administration. I'll make this mistake livable by writing some functions to turn RFC 822 dates into date strings that XSLT can understand.

Date Parser

First, since months in RFC 822 are indicated as abbreviated English month-names, i.e. “Jan” and “Feb,” I need a way to know the numeric value of each month. I create a global variable called $months.


  <xsl:variable name="months">
    <xsl:for-each select="tokenize('Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec', ' ')">
      <f:m pos="{format-number(position(),'00')}"><xsl:value-of select="."/></f:m>
    </xsl:for-each>
  </xsl:variable>

Don't worry about what that means just yet. What it does is make a temporary XML tree that I can use for my own purposes:


  <f:m pos="01">Jan</f:m>
  <f:m pos="02">Feb</f:m>
  <f:m pos="03">Mar</f:m>
  <f:m pos="04">Apr</f:m>
  <f:m pos="05">May</f:m>
  <f:m pos="06">Jun</f:m>
  <f:m pos="07">Jul</f:m>
  <f:m pos="08">Aug</f:m>
  <f:m pos="09">Sep</f:m>
  <f:m pos="10">Oct</f:m>
  <f:m pos="11">Nov</f:m>
  <f:m pos="12">Dec</f:m>

Now, if I need to know which month Jan represents, I just use this XPath statement:

$months//f:m[.='Jan']/@pos

The question that answers is “what is the value of the @pos attribute where the element <f:m> has a value equal to Jan?” The answer is always 01.
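
If you want to poke at the lookup on its own, here is a minimal sketch of a stand-alone stylesheet that does nothing else. The $month-names variable and the index-of() variant are my own illustration, not part of the code on this site; they just show a second way to get the same number from a plain sequence.

```xml
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet
  xmlns:f="http://ftrain.com/"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  version="2.0"
  exclude-result-prefixes="f">
  <xsl:output method="text"/>

  <!-- The same temporary tree of month names as in the article -->
  <xsl:variable name="months">
    <xsl:for-each select="tokenize('Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec', ' ')">
      <f:m pos="{format-number(position(),'00')}"><xsl:value-of select="."/></f:m>
    </xsl:for-each>
  </xsl:variable>

  <!-- Hypothetical: a plain sequence of strings, used only by the variant below -->
  <xsl:variable name="month-names"
    select="tokenize('Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec', ' ')"/>

  <xsl:template match="/">
    <!-- Lookup through the temporary tree -->
    <xsl:value-of select="$months//f:m[.='Aug']/@pos"/>
    <xsl:text>&#10;</xsl:text>
    <!-- Variant: the position of the name within the sequence, zero-padded -->
    <xsl:value-of select="format-number(index-of($month-names, 'Aug'), '00')"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```

Both lines should come out as 08. The tree version costs a little more typing but can carry extra attributes later; the sequence version is shorter if all you need is the number.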

So now that I can find out the numeric value of the month I'll write a function that takes a more-or-less normal RFC 822 date/time string and returns a string that can be easily converted to a date by XSLT. The idea here is to take

Tue, 21 Aug 2007 23:24:45 +0000

for example, and turn it into

2007-08-21T23:24:45+00:00.


  <xsl:function name="f:rfc822-to-xs">
    <xsl:param name="rfc822"/>
    <xsl:variable name="date-list" select="tokenize($rfc822 ,' ')"/>
    <xsl:variable name="year" select="$date-list[4]"/>
    <xsl:variable name="month" select="$months//f:m[.=$date-list[3]]/@pos"/>
    <xsl:variable name="day" select="format-number(xs:integer($date-list[2]),'00')"/>
    <xsl:variable name="time" select="$date-list[5]"/>
    <xsl:variable name="zone-hour" select="substring($date-list[6],1,3)"/>
    <xsl:variable name="zone-minute" select="substring($date-list[6],4,2)"/>

    <xsl:value-of select="concat($year,'-',$month,'-',$day,'T',$time,$zone-hour,':',$zone-minute)"/>

  </xsl:function>

The tokenize function splits the incoming string into a list, which is stored inside the variable $date-list. I chop up the elements inside $date-list a little more, formatting numbers, getting month values from the $months variable we created above, and cutting substrings out of strings to create the time zone; then I assign them to new variables.

Once I'm done slicing I put everything into the proper sequence with concat. Concat is the seaweed paper in my dateTime sushi. The resulting string can be parsed as a date in XSLT. (Of course this doesn't prepare us for all sorts of things that can go wrong: two-digit years and so forth. But it works for Twitter RSS, so far.)
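
To see which token lands at which position, a small sketch like the following prints each piece of the date with its index. It is only a diagnostic, assuming the same single-space separator that Twitter's feed uses.

```xml
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  version="2.0">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- Split on single spaces, then print "position: token" one per line -->
    <xsl:for-each select="tokenize('Tue, 21 Aug 2007 23:24:45 +0000', ' ')">
      <xsl:value-of select="concat(position(), ': ', ., '&#10;')"/>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

That prints six lines: 1 is "Tue," (comma included, and never used), 2 is the day, 3 the month name, 4 the year, 5 the time, and 6 the zone offset. RFC 822 makes the leading day name optional, so a feed that leaves it out would shift every position down by one; the function above assumes it is always there.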

That function is never called directly. Instead, I write two functions to produce actual xs:date or xs:dateTime values.


  <xsl:function name="f:rfc822-to-dateTime">
    <xsl:param name="rfc822"/>
    <xsl:value-of select="adjust-dateTime-to-timezone(xs:dateTime(f:rfc822-to-xs($rfc822)))"/>
  </xsl:function>

  <xsl:function name="f:rfc822-to-date">
    <xsl:param name="rfc822"/>
    <xsl:value-of select="xs:date(substring(xs:string(f:rfc822-to-dateTime($rfc822)),1,10))"/>
  </xsl:function>

The first function, f:rfc822-to-dateTime(), calls f:rfc822-to-xs() to turn the RFC 822 date into a more XSL-friendly format. Then it turns that into an xs:dateTime, and adjusts that xs:dateTime to the processor's implicit timezone, which is normally the local one. I'm on Eastern time (daylight time in August), so this turns the time we started with from 2007-08-21T23:24:45+00:00 to 2007-08-21T19:24:45-04:00.

The second function, f:rfc822-to-date(), repeats all that by calling the first function (we want to adjust the time zone properly first thing), then slices off the first ten characters in the xs:dateTime and turns that into an xs:date. So you give it 2007-08-21T23:24:45+00:00, it turns that into the xs:dateTime 2007-08-21T19:24:45-04:00, turns that into text, cuts that down to 2007-08-21, and turns that into an xs:date. I couldn't get a direct cast (conversion) from xs:dateTime to xs:date to work here. You might ask: “why not?” But the serious XSLT practitioner does not ask but waits to learn. (My best guess: XPath 2.0 does define that cast, but these functions return their results through xsl:value-of, so what comes back is text rather than a typed xs:dateTime.)
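
For what it's worth, XPath 2.0 does define a cast from xs:dateTime to xs:date, as long as the value is a typed xs:dateTime and not text. Here is a minimal sketch of that route. The function name f:to-local-date is hypothetical, and the sketch assumes the input is already in xs:dateTime form; the point is only that returning the value with xsl:sequence and declaring types with as keeps it typed, so no string slicing is needed.

```xml
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet
  xmlns:f="http://ftrain.com/"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  version="2.0"
  exclude-result-prefixes="f xs">
  <xsl:output method="text"/>

  <!-- Hypothetical helper: takes an xs:dateTime string, returns a typed xs:date -->
  <xsl:function name="f:to-local-date" as="xs:date">
    <xsl:param name="iso" as="xs:string"/>
    <!-- xsl:sequence returns the value itself; xsl:value-of would return a text node -->
    <xsl:sequence select="xs:date(adjust-dateTime-to-timezone(xs:dateTime($iso)))"/>
  </xsl:function>

  <xsl:template match="/">
    <xsl:value-of select="f:to-local-date('2007-08-21T23:24:45+00:00')"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```

The exact output depends on the timezone of the machine running it. Note that the xs:date produced this way keeps the timezone of the xs:dateTime it came from, so it prints with an offset on the end; that is harmless for grouping and for format-date.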

The Feed

All righty. Now we can deal with the feed itself. First I create two variables, one containing the address for the RSS feed and the other with my name (so that I can strip it out of the text).


  <xsl:variable name="rss-uri" select="'http://twitter.com/statuses/user_timeline/6981492.rss'"/>
  <xsl:variable name="rss-to-strip" select="'Paul Ford: '"/>

Then I create a skeleton function that will turn an RSS feed into a sidebar here on Ftrain. It takes two parameters corresponding to the two variables we just defined.


  <xsl:function name="f:rss-to-sidebar">
    <xsl:param name="rss"/>
    <xsl:param name="strip"/>
  </xsl:function>

Now we need a root template to call the function.


  <xsl:template match="/">
    <div><xsl:sequence select="f:rss-to-sidebar(document($rss-uri), $rss-to-strip)"/></div>
  </xsl:template>

Which says, “fetch the document in $rss-uri and pass that, along with the contents of $rss-to-strip, into f:rss-to-sidebar.”

What I want to do next is take the flat list of RSS items that I passed to f:rss-to-sidebar and group the items by individual day. So back to f:rss-to-sidebar().


  <xsl:function name="f:rss-to-sidebar">
    <xsl:param name="rss"/>
    <xsl:param name="strip"/>

    <xsl:for-each-group select="$rss//item" group-by="xs:date(f:rfc822-to-date(pubDate))">
      <xsl:sort select="current-grouping-key()" order="descending"/>
      <h3><xsl:value-of select="format-date(current-grouping-key(),'[FNn], [D01] [MNn,*-3]')"/></h3>
    </xsl:for-each-group>

  </xsl:function>


I use the xsl:for-each-group instruction to do this, and as my grouping key I use the date-processing function I wrote above (xs:date(f:rfc822-to-date(pubDate))). XSLT takes all the <item>s from the RSS feed and turns their <pubDate>s into real xs:dates. Then it groups the items together by date: it turns the list of RSS <item>s into a list of days, and each day is associated with a list of <item>s.

Next, inside an <h3> tag, I format and print the grouping key (which represents the current date) using the “picture string” [FNn], [D01] [MNn,*-3]. Figuring that out is left as an exercise for the reader (it's 11 p.m. and I've got to get on the train and go home). But that turns 2007-08-21 into “Tuesday, 21 Aug.”
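
For anyone who would rather skip the exercise, here is a small sketch that runs each piece of the picture string on its own and then the whole thing. The comments give my reading of the XSLT 2.0 rules; names of days and months come out in the processor's default language, which I'm assuming is English.

```xml
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  version="2.0"
  exclude-result-prefixes="xs">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <xsl:variable name="d" select="xs:date('2007-08-21')"/>
    <!-- [FNn]: F is the day of the week; Nn asks for its name in title case -->
    <xsl:value-of select="format-date($d, '[FNn]')"/>
    <xsl:text>&#10;</xsl:text>
    <!-- [D01]: D is the day of the month; 01 asks for at least two digits -->
    <xsl:value-of select="format-date($d, '[D01]')"/>
    <xsl:text>&#10;</xsl:text>
    <!-- [MNn,*-3]: M is the month; Nn is its name in title case; *-3 caps the width at three characters -->
    <xsl:value-of select="format-date($d, '[MNn,*-3]')"/>
    <xsl:text>&#10;</xsl:text>
    <!-- The full picture: anything outside the square brackets is copied as-is -->
    <xsl:value-of select="format-date($d, '[FNn], [D01] [MNn,*-3]')"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```

With an English-language processor the four lines should read Tuesday, 21, Aug, and then the combined heading.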

Now, inside of the for-each-group, I add the code that actually displays the individual RSS items, which are sitting there grouped up into current-group(), waiting to be used.


  <xsl:function name="f:rss-to-sidebar">
    <xsl:param name="rss"/>
    <xsl:param name="strip"/>
    <xsl:for-each-group select="$rss//item" group-by="xs:date(f:rfc822-to-date(pubDate))">
      <xsl:sort select="current-grouping-key()" order="descending"/>
      <h3><xsl:value-of select="format-date(current-grouping-key(),'[FNn], [D01] [MNn,*-3]')"/></h3>

      <xsl:for-each select="current-group()">
        <xsl:sort select="f:rfc822-to-dateTime(pubDate)" order="descending"/>
        <p><a href="{link}"><xsl:value-of select="format-dateTime(f:rfc822-to-dateTime(pubDate), '[h1]:[m01] [P]')"/></a> - <xsl:value-of select="replace(description, $strip, '')"/></p>
      </xsl:for-each>

    </xsl:for-each-group>
  </xsl:function>

This takes every item in the current-group() and in descending chronological order (thanks to the xsl:sort), spits out a paragraph with a link to the item on Twitter, the time it was published (another picture string here), and the actual text with my name stripped out.
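
If you'd like to try the grouping without hitting Twitter, here is a sketch that runs the same two-level loop over made-up data. The $sample variable, its when attribute, and the item text are all hypothetical stand-ins for the feed; the timestamps are already in xs:dateTime form so the RFC 822 functions aren't needed. I've also left off the default XHTML namespace so that the unprefixed item in the XPath matches the sample elements.

```xml
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  version="2.0"
  exclude-result-prefixes="xs">
  <xsl:output method="xml" indent="yes"/>

  <!-- Hypothetical sample data standing in for the RSS items -->
  <xsl:variable name="sample">
    <item when="2007-08-21T08:05:00-04:00">morning, day two</item>
    <item when="2007-08-20T22:10:00-04:00">night, day one</item>
    <item when="2007-08-21T19:24:45-04:00">evening, day two</item>
  </xsl:variable>

  <xsl:template match="/">
    <div>
      <!-- One group per calendar day, newest day first -->
      <xsl:for-each-group select="$sample/item" group-by="xs:date(xs:dateTime(@when))">
        <xsl:sort select="current-grouping-key()" order="descending"/>
        <h3><xsl:value-of select="format-date(current-grouping-key(), '[FNn], [D01] [MNn,*-3]')"/></h3>
        <!-- Within a day, newest item first -->
        <xsl:for-each select="current-group()">
          <xsl:sort select="xs:dateTime(@when)" order="descending"/>
          <p>
            <xsl:value-of select="format-dateTime(xs:dateTime(@when), '[h1]:[m01] [P]')"/>
            <xsl:text> - </xsl:text>
            <xsl:value-of select="."/>
          </p>
        </xsl:for-each>
      </xsl:for-each-group>
    </div>
  </xsl:template>
</xsl:stylesheet>
```

You should get two headings, Tuesday's first with its evening item above its morning item, then Monday's. In the time picture, [h1] is the hour on a twelve-hour clock, [m01] is the minute padded to two digits, and [P] is the a.m./p.m. marker, whose exact spelling is up to the processor.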

The Code

The whole thing appears below. The content it produces appears in a column on the home page. The code is in the public domain if you want it.

I run it like this:

$ java -jar /Users/paul/bin/saxon8.jar rss2html.xsl rss2html.xsl

Command-line XSLT requires you to specify an XML file on which to operate. In this case I'm going outside to Twitter for our XML, so I just pass the xsl file itself (which is valid XML) as the source file.

So what's it good for? It's good for me. It'd be pretty easy to extend it to eat up a bunch of feeds and generate a page, but you probably already have a solution for that. XSLT2 is good for a lot of other stuff—I didn't use recursive templates or keys or any of the things that make the language awesome (nor did I type my params or do some other best-practice stuff, so do not kill me). But we have time, so what the hell. Maybe next week I'll go into some more for-each-group tricks for drawing complex HTML tables.

.  .  .  .  .  

File: rss2html.xsl:


<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet
  xmlns:f="http://ftrain.com/"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns="http://www.w3.org/1999/xhtml"
  version="2.0"
  exclude-result-prefixes="f xsl xs"
>
  <xsl:output method="xml" indent="yes"/>

  <xsl:variable name="rss-uri" select="'http://twitter.com/statuses/user_timeline/6981492.rss'"/>
  <xsl:variable name="rss-to-strip" select="'Paul Ford: '"/>

  <xsl:variable name="months">
    <xsl:for-each select="tokenize('Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec', ' ')">
      <f:m pos="{format-number(position(),'00')}"><xsl:value-of select="."/></f:m>
    </xsl:for-each>
  </xsl:variable>

  <xsl:template match="/">
    <div>
      <xsl:sequence select="f:rss-to-sidebar(document($rss-uri), $rss-to-strip)"/>
    </div>
  </xsl:template>

  <xsl:function name="f:rss-to-sidebar">
    <xsl:param name="rss"/>
    <xsl:param name="strip"/>
    <xsl:for-each-group select="$rss//item" group-by="xs:date(f:rfc822-to-date(pubDate))">
      <xsl:sort select="current-grouping-key()" order="descending"/>
      <h3><xsl:value-of select="format-date(current-grouping-key(),'[FNn], [D01] [MNn,*-3]')"/></h3>
      <xsl:for-each select="current-group()">
        <xsl:sort select="f:rfc822-to-dateTime(pubDate)" order="descending"/>
        <p>
          <a href="{link}">
            <xsl:value-of select="format-dateTime(f:rfc822-to-dateTime(pubDate), '[h1]:[m01] [P]')"/>
          </a> 
          — <xsl:value-of select="replace(description, $strip, '')"/>
        </p>
      </xsl:for-each>
    </xsl:for-each-group>
  </xsl:function>

  <xsl:function name="f:rfc822-to-dateTime">
    <xsl:param name="rfc822"/>
    <xsl:value-of select="adjust-dateTime-to-timezone(xs:dateTime(f:rfc822-to-xs($rfc822)))"/>
  </xsl:function>

  <xsl:function name="f:rfc822-to-date">
    <xsl:param name="rfc822"/>
    <xsl:value-of select="xs:date(substring(xs:string(f:rfc822-to-dateTime($rfc822)),1,10))"/>
  </xsl:function>

  <xsl:function name="f:rfc822-to-xs">
    <xsl:param name="rfc822"/>

    <xsl:variable name="date-list" select="tokenize($rfc822 ,' ')"/>
    <xsl:variable name="year" select="$date-list[4]"/>
    <xsl:variable name="month" select="$months//f:m[.=$date-list[3]]/@pos"/>
    <xsl:variable name="day" select="format-number(xs:integer($date-list[2]),'00')"/>
    <xsl:variable name="time" select="$date-list[5]"/>
    <xsl:variable name="zone-hour" select="substring($date-list[6],1,3)"/>
    <xsl:variable name="zone-minute" select="substring($date-list[6],4,2)"/>

    <xsl:value-of select="concat($year,'-',$month,'-',$day,'T',$time,$zone-hour,':',$zone-minute)"/>

  </xsl:function>

</xsl:stylesheet>



That's it.


Ftrain.com is the website of Paul Ford and his pseudonyms. It is showing its age. I'm rewriting the code but it's taking some time.


About the author: I've been running this website from 1997. For a living I write stories and essays, program computers, edit things, and help people launch online publications. (LinkedIn). I wrote a novel. I was an editor at Harper's Magazine for five years; then I was a Contributing Editor; now I am a free agent. I was also on NPR's All Things Considered for a while. I still write for The Morning News, and some other places.

If you have any questions for me, I am very accessible by email. You can email me at ford@ftrain.com and ask me things and I will try to answer. Especially if you want to clarify something or write something critical. I am glad to clarify things so that you can disagree more effectively.


© 1974-2011 Paul Ford
