Apache Abdera and ROME: alea jacta est!
Posted by pat 108 days ago
Abdera: Atom Stream Parsing for the rest of us!
James Snell sent the following proposal to the ROME developer list last week:
Hello,
I wanted to take a moment to point y'all to a new proposal I and my
colleagues Sam Ruby, and Robert Yates have submitted to the Apache
Incubator PMC to create a new incubation project focusing on the
development of a fully-featured Atom 1.0 format and publishing protocol
implementation. We're calling it "Abdera" for now.
http://wiki.apache.org/incubator/AbderaProposal
...
Early on in our development, we ran some comparisons between the Abdera
and ROME parsers and noted a significant difference. When parsing Tim
Bray's Atom feed, for instance, Rome's JDOM implementation consumed over
6.5 MB of RAM and used over 600+ million CPU cycles. Abdera's
Axiom/StAX based implementation used around 750k of RAM and around 90
million cycles.
...
In any case, let's talk. Is there a possibility that we can combine
efforts on a single project under the Apache banner?
My answer is: Wow! When can we start moving the codebase? I have wanted to look at StAX for a while, and Tim Bray has repeatedly asked me when we'll get a streaming parser for ROME: it will be much easier to leverage James's work and agree on a common bean model for Atom feeds and entries. We have also been talking about proposing ROME to Apache since last year, but have not acted on it since. The time seems ripe for this move, if Apache is interested.
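For readers who have not used StAX, here is a minimal sketch of what pull-style streaming looks like with the standard javax.xml.stream API. It is not Abdera's code (Abdera layers Axiom on top of StAX); the class name StaxEntryTitles and the method entryTitles are made up for illustration, and the sketch assumes plain-text titles. The point is that the parser walks the document event by event and keeps only what it needs, instead of building a full tree in memory as a JDOM-based parser does.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical illustration class, not part of Abdera or ROME.
class StaxEntryTitles {
    static final String ATOM_NS = "http://www.w3.org/2005/Atom";

    /** Collects the titles of atom:entry elements without building a document tree. */
    static List<String> entryTitles(String atomXml) {
        List<String> titles = new ArrayList<String>();
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(atomXml));
            boolean inEntry = false;
            try {
                while (reader.hasNext()) {
                    int event = reader.next();
                    boolean atom = (event == XMLStreamConstants.START_ELEMENT
                            || event == XMLStreamConstants.END_ELEMENT)
                            && ATOM_NS.equals(reader.getNamespaceURI());
                    if (!atom) {
                        continue;
                    }
                    String name = reader.getLocalName();
                    if (event == XMLStreamConstants.START_ELEMENT) {
                        if ("entry".equals(name)) {
                            inEntry = true;
                        } else if (inEntry && "title".equals(name)) {
                            // Assumption: the title is plain text (no nested XHTML).
                            titles.add(reader.getElementText());
                        }
                    } else if ("entry".equals(name)) {
                        inEntry = false;
                    }
                }
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Not well-formed XML", e);
        }
        return titles;
    }

    public static void main(String[] args) {
        String feed =
              "<feed xmlns=\"http://www.w3.org/2005/Atom\">"
            + "<title>Example Feed</title>"
            + "<entry><title>Atom-Powered Robots Run Amok</title>"
            + "<id>urn:uuid:1225c695-cfb8-4ebb-aaaa-80da344efa6a</id></entry>"
            + "</feed>";
        System.out.println(entryTitles(feed)); // prints [Atom-Powered Robots Run Amok]
    }
}
```

Only the list of titles is retained here, so memory use does not grow with the rest of the document. That is the general reason a streaming parser can get away with far less RAM than a tree-based one.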
Dave Johnson, who implemented Atom 1.0 support in ROME, seems to be game for it: this is great, since it is my understanding that the Atom client and server libraries that he presented at JavaOne 2006 live in Apache right now.
I haven't had much time to play with James's new API yet: I just built it, ran the samples, and skimmed through the very nice code (he was kind enough to include an Eclipse project in the codebase, which makes it easier to get started).
Not all good things come for free: Abdera carries quite a lot of dependencies, compared to just one in ROME:
~/goocode/eclipse/abdera] pat% find dependencies -name '*.jar' | wc -l
15
But if you get a roughly 7- to 9-fold decrease in CPU cycles and memory consumption for that price (going by the figures quoted above), what's not to like? It sounds like a good tradeoff to me.
Because I'm a lazy guy, I added a new run target to the Ant build file to let me run the samples without worrying about the classpath... and was able to parse an Atom feed successfully.
eclipse/abdera] pat% ant -f build/build.xml -Drun.class=simple.Parse run
Buildfile: build/build.xml
init:
[echo]
[echo] =====================================================
[echo] abdera, 0.1
[echo] Working directory: /Users/pat/goocode/eclipse/abdera/build/work
[echo] =====================================================
[echo]
compile.core:
compile.parser:
compile.test:
[copydir] DEPRECATED - The copydir task is deprecated. Use copy instead.
build:
run:
[java] Example Feed
[java] TEXT
[java] http://example.org/
[java] Sat Dec 13 10:30:02 PST 2003
[java] John Doe
[java] urn:uuid:60a76c80-d399-11d9-b93C-0003939e0af6
[java] Atom-Powered Robots Run Amok
[java] TEXT
[java] http://example.org/2003/12/13/atom03
[java] urn:uuid:1225c695-cfb8-4ebb-aaaa-80da344efa6a
[java] Sat Dec 13 10:30:02 PST 2003
[java] Some text.
[java] TEXT
BUILD SUCCESSFUL
Total time: 1 second
I chatted with Alejandro Abdelnur yesterday and he seems happy to collaborate, and this has nothing to do with the fact that he shares the first four letters of his last name with the project codename :-)
Nick Lothian, who created the ROME Fetcher and is also an Apache committer, seems to like the idea as well.
Abdera is the native city of the Greek philosopher Democritus, the father of Atomism:
Nothing exists except atoms and empty space; everything else is opinion.
This is a very apt name for the project. One way I recently described the importance of the Atom Publishing Protocol to a journalist was to say that with the Web, Tim Berners-Lee created the InfoSpace equivalent of nature.
Roy Fielding synthesized that by defining an early attempt at a web physics, called REST. REST defines an architectural style where entities are resources with URIs, and you use HTTP verbs to act upon them.
Tim Berners-Lee then switched directly to the equivalent of quantum physics with the Semantic Web vision, but it did not catch on because the folks implementing the web are mainly engineers, not physicists, and this was too big a conceptual jump. He tried to jump from Parmenides to Einstein without a pause at Newton.
The Atom Publishing Protocol builds on REST by defining what it is that you act upon with your HTTP verbs: Atom feeds and documents. It's the small Democritus advance that we need to build a web physics that engineers can build a world on. I'm sure Newton is on his way somewhere on the web.
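To make "HTTP verbs acting on Atom documents" a little more concrete, here is a small in-memory sketch of how the verbs map onto a collection of entries in the Atom Publishing Protocol: POST to the collection creates a member and yields its URI, while GET, PUT and DELETE act on that member URI. This is only an illustration under stated assumptions: there is no real HTTP here, and the class name AtomCollectionSketch, its methods, and the example.org URIs are all hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical in-memory model of an Atom collection; no networking involved.
class AtomCollectionSketch {
    private final Map<String, String> members = new LinkedHashMap<String, String>();
    private final String collectionUri;
    private int nextId = 1;

    AtomCollectionSketch(String collectionUri) {
        this.collectionUri = collectionUri;
    }

    /** POST to the collection URI: store a new entry and return the member URI
     *  (what a server would send back in the Location header). */
    String post(String entryXml) {
        String memberUri = collectionUri + "/" + nextId++;
        members.put(memberUri, entryXml);
        return memberUri;
    }

    /** GET on a member URI: the stored entry, or null (standing in for 404). */
    String get(String memberUri) {
        return members.get(memberUri);
    }

    /** PUT on a member URI: replace the entry; false stands in for 404. */
    boolean put(String memberUri, String entryXml) {
        if (!members.containsKey(memberUri)) {
            return false;
        }
        members.put(memberUri, entryXml);
        return true;
    }

    /** DELETE on a member URI: remove the entry; false stands in for 404. */
    boolean delete(String memberUri) {
        return members.remove(memberUri) != null;
    }

    public static void main(String[] args) {
        AtomCollectionSketch blog = new AtomCollectionSketch("http://example.org/entries");
        String uri = blog.post("<entry><title>Hello</title></entry>");
        System.out.println(uri);           // prints http://example.org/entries/1
        blog.put(uri, "<entry><title>Hello, again</title></entry>");
        System.out.println(blog.get(uri)); // prints <entry><title>Hello, again</title></entry>
        blog.delete(uri);
        System.out.println(blog.get(uri)); // prints null
    }
}
```

The four verbs and one document format cover the whole create, read, update, delete cycle, which is why the protocol feels like such a small step beyond plain REST.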
To come back to ROME and Abdera: in this post I propose to the ROME developer community that we move the project to Apache and merge ROME with Abdera and Dave Johnson's blog tools, in order to create a single coherent stack of Java tools for dealing with syndication data.
The roadmap I can see for the common project would look like this:
- Finish ROME 1.0 and end the beta
- Start ROME 2.0 using JDK 1.5 (Generics) with Abdera integration
- Agree on a common bean format for Atom
- Make Abdera a ROME parser for Atom
- Make streaming parsers for the other feed types
- Create a common repository of test cases against which to be "liberal"
- Create a universal feed service: an Atom store, usable on servers or client workstations, that handles feed fetching and exposes a query API, the ROME Mano filtering API, and the Google Data API. On the server, developers can use it to create any kind of GData server; on the client, it could become the Linux alternative to Microsoft's excellent client-side RSS APIs and infrastructure.
- Create a ROME module repository, à la Maven, where a set of extension modules will be maintained.
- JSON serializer for Mano
- Semantic web extensions, to make it easier to consume/generate semweb data from regular feeds. Danny Ayers could help with this.
- Build an aggregator on top of the server: Roller aggregator may be a good start.
- Microformat parsers and translators: build parsers to detect microformats in Atom payloads and translate them to popular XML extensions (ex: hCalendar to Google Data Calendar).
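As a rough idea of what the roadmap items on JDK 1.5 generics and a common Atom bean format could lead to, here is a sketch of a typed bean model. Nothing here has been agreed on: the names AtomBeansSketch, Feed and AtomEntry are placeholders, and a real model would carry many more fields (authors, links, content, dates). The point is only that a List<AtomEntry> lets callers iterate without casts.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical bean model for illustration; not an existing ROME or Abdera API.
class AtomBeansSketch {

    /** A minimal entry bean: just an id and a title. */
    static final class AtomEntry {
        private final String id;
        private final String title;

        AtomEntry(String id, String title) {
            this.id = id;
            this.title = title;
        }

        String getId() { return id; }
        String getTitle() { return title; }
    }

    /** A minimal feed bean holding a typed list of entries. */
    static final class Feed {
        private final String title;
        private final List<AtomEntry> entries = new ArrayList<AtomEntry>();

        Feed(String title) {
            this.title = title;
        }

        String getTitle() { return title; }

        void addEntry(AtomEntry entry) {
            entries.add(entry);
        }

        /** Typed, read-only view: callers get AtomEntry objects without casting. */
        List<AtomEntry> getEntries() {
            return Collections.unmodifiableList(entries);
        }
    }

    public static void main(String[] args) {
        Feed feed = new Feed("Example Feed");
        feed.addEntry(new AtomEntry(
                "urn:uuid:1225c695-cfb8-4ebb-aaaa-80da344efa6a",
                "Atom-Powered Robots Run Amok"));
        for (AtomEntry entry : feed.getEntries()) {
            System.out.println(entry.getTitle()); // prints Atom-Powered Robots Run Amok
        }
    }
}
```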
Building this infrastructure was what we had in mind when we started ROME at Sun 2 years ago: I hope that collaborating with Apache will help us achieve this vision.


