Remus: a new RSS/Atom feed parser for Clojure

Recently, I found plenty of code that I usually borrow from project to project without hosting it on GitHub. I think I may share some of that since it might be useful to anybody.

The first library I’d like to start with is a feed parser. There are already some Clojure wrappers for ROME and they even solve default cases. But mine wrapper is special since it supports lots of additional options and tweaks. So if you play with RSS a lot, try Remus, the library I’m introducing in this post.

It’s also built on top of ROME but uses clj-http for HTTP communication with all its features, e.g. redirects settings, connection pooling, conditional requests by Etag.

Remus fetches all the possible data from a feed including non-standard tags for further processing. Some feeds (YouTube, for example) distribute useful data with custom XML tags, e.g video statistics or geographical coordinates.

Start with including the library into a project:

:dependencies [[remus "0.1.0"]]

Import it:

(ns your.project
  (:require [remus :refer [parse-url parse-file parse-stream]]))

;;;;
;; or
;;;;

(require '[remus :refer [parse-url parse-file parse-stream]])

Now parse a URL:

(def result (parse-url "http://planet.clojure.in/atom.xml"))

The result will be a map with :response and :feed keys. The first one holds an HTTP response. You’ll need it to save some of the headers for further HTTP communication. The :feed key contains a huge map representing a parsed feed:

(def feed (:feed result))

(println feed)

;;;;
;; a small subset of the origin feed
;;;;

{:description nil,
 :feed-type "atom_1.0",
 :entries
 [{:description nil,
   :updated-date #inst "2018-08-13T10:00:00.000-00:00",
   :published-date nil,
   :title
   "PurelyFunctional.tv Newsletter 287: DataScript, GraphQL, CRDTs",
   :author "Eric Normand",
   :link
   "https://purelyfunctional.tv/issues/purelyfunctional-tv-newsletter-287-datascript-graphql-crdts/",
   :contributors (),
   :uri "https://purelyfunctional.tv/?p=28660",
   :contents
   ({:type "html",
     :mode nil,
     :value "long HTML here....."
     }),
 :extra {:tag :extra, :attrs nil, :content ()},
 :published-date #inst "2018-08-13T11:59:11.000-00:00",
 :entry-links
 ({:rel "alternate",
   :type nil,
   :href "http://planet.clojure.in/",
   :title nil,
   :length 0}
  {:rel "self",
   :type nil,
   :href "http://planet.clojure.in/atom.xml",
   :title nil,
   :length 0}),
 :title "Planet Clojure",
 :link "http://planet.clojure.in/",
 :webmaster nil,
 :uri "http://planet.clojure.in/atom.xml",
 :authors ()}

To parse a file:

(def feed (parse-file "/path/to/some/atom.xml"))

This function returns just a feed data structure.

Remus supports options to specify how to communicate via HTTP; how to deal with broken or wrong encoded feeds; how to prevent receiving data you’ve already processed. I described all of this in the readme file and do not want to clone that text here. Instead, please look through that file and examples. I hope you’ll find the library helpful and will use it one day.

← Что делать с залитым Маком

Clojure extended: Java interop book →