tkych / cl-feed-parser

Parse Atom and RSS feeds in Common Lisp.

Last modified : 2013-08-17 21:41:58 tkych

version 0.0.29 (alpha)

TODO:

  • Add: test! test! test!
  • Add: doc! doc! doc!
  • Change: namespace handling, Add: parse-option valid-namespaces
  • Add: parsing CDF
  • Add: microformats-parser like Beautiful Soup
  • Add: relative link resolution
  • Add: character encoding detection
  • !! TEST: Authorization for password-protected feed !!

CL-Feed-Parser

cl-feed-parser is a feed parser library for Common Lisp. The function parse-feed takes a URL, pathname, string or stream as its argument and returns a hash table holding the feed data. The goal of cl-feed-parser is to let you treat every feed as a single format by hiding the differences between the various feed formats (Atom 1.0, RSS 1.0, RSS 2.0, etc.).

This project is inspired by the Python library feedparser (a.k.a. Universal Feed Parser). feedparser was created by Mark Pilgrim and is now maintained by Kurt McKee.

Although cl-feed-parser's implementation differs from feedparser's, the API is almost the same.

Differences:

Unlike feedparser, cl-feed-parser has:

  • lazy evaluation,
  • access to feed elements through the accessor function ref,
  • parsed date-times represented as universal times.

Depends-on

Installation

  1. SHELL$ git clone https://github.com/tkych/cl-feed-parser.git
  2. CL-REPL> (push #p"/path-to-cl-feed-parser/cl-feed-parser/" asdf:*central-registry*)
  3. CL-REPL> (ql:quickload :cl-feed-parser) or (asdf:load-system :cl-feed-parser)

Examples

The following example is available at https://gist.github.com/tkych/6255855

;; Let's make a minimum feed reader!

CL-REPL> (let ((cache (make-hash-table :test #'equal)))
           (defun parse (feed-spec)
             (or (gethash feed-spec cache nil)
                 (setf (gethash feed-spec cache)
                       (feed-parser:parse-feed feed-spec)))))

CL-REPL> (defun show-entry-titles (feed-spec)
           (let ((f (parse feed-spec)))
             (loop
                :for i :from 0
                :for title := (feed-parser:ref f :entries i :title)
                :until (null title)
                :do (format t "~&[~D]: ~A" i title))))

CL-REPL> (defun read-nth-entry (nth feed-spec)
           (princ (or (feed-parser:ref (parse feed-spec)
                                       :entries nth :description)
                      (feed-parser:ref (parse feed-spec)
                                       :entries nth :summary)))
           nil)

CL-REPL> (defparameter f "http://www.whitehouse.gov/feed/press")

CL-REPL> (show-entry-titles f)

CL-REPL> (read-nth-entry 2 f)

CL-REPL> (setf f "http://planet.lisp.org/rss20.xml")

CL-REPL> (show-entry-titles f)

CL-REPL> (read-nth-entry 1 f)

Manual

[Function] PARSE-FEED input &key etag modified agent referer handlers request-headers response-headers => hash-table

Parses a feed and returns a hash table holding the feed data. input must be either a URL string, an XML string, a pathname or a stream. If input is a URL string, the feed is fetched first and then parsed.

[Function] REF feed-data &rest keys => feed-value

Gets a value from feed-data by following keys in order, like a method chain. feed-data is a hash table holding feed data, or a value taken from one. Each key is a string or a keyword that designates a feed element; a keyword is automatically converted to a string. If the value does not exist, ref returns NIL.

Examples: (suppose f is a hash table holding feed data)

(feed:ref f :entries 0 :title)
<=> (gethash "title" (nth 0 (gethash "entries" f)))

(feed:ref f :entries most-positive-fixnum :title)
=> NIL ;probably

Note:

  • The element names of RSS and Atom are interchangeable.

    • i.e. you can get an RSS value with an Atom element name (or vice versa).
    • e.g. (ref parsed-feed "items") <=> (ref parsed-feed "entries").
  • If an element is a lazy object, ref forces it.
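
The lookup rule described above can be illustrated with a small stand-alone sketch. It is not the library's implementation: my-ref, key->string and *demo-feed* are hypothetical names, and the sketch assumes only what the examples show (string keys in hash tables, entries kept in a list). It does not model the RSS/Atom name aliasing or the forcing of lazy objects.

```lisp
;; Hypothetical stand-in for REF, written only to illustrate the lookup rule.
(defun key->string (key)
  "Keywords become lowercase strings; other keys pass through unchanged."
  (if (keywordp key)
      (string-downcase (symbol-name key))
      key))

(defun my-ref (data &rest keys)
  "Follow KEYS through nested hash tables and lists; NIL if a step is missing."
  (dolist (key keys data)
    (setf data
          (cond ((null data) (return nil))
                ;; A non-negative integer indexes into a list (e.g. the entries).
                ((and (typep key '(integer 0)) (listp data))
                 (nth key data))
                ;; Any other key is looked up by name in a hash table.
                ((hash-table-p data)
                 (values (gethash (key->string key) data)))
                (t (return nil))))))

;; A tiny hand-built table shaped like the structure the examples assume.
(defparameter *demo-feed*
  (let ((feed (make-hash-table :test #'equal))
        (entry (make-hash-table :test #'equal)))
    (setf (gethash "title" entry) "Hello"
          (gethash "entries" feed) (list entry))
    feed))
```

With these definitions, (my-ref *demo-feed* :entries 0 :title) follows the same path as (gethash "title" (nth 0 (gethash "entries" *demo-feed*))), and a missing step anywhere along the way yields NIL instead of signalling an error.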

[Function] TO-ALIST feed-hash-table => NIL

Converts feed-hash-table into an alist.
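
For readers unfamiliar with the idiom, here is a minimal sketch of a hash-table-to-alist conversion in portable Common Lisp. hash-table->alist is a hypothetical name, and the conversion is shallow; whether TO-ALIST also converts nested tables or forces lazy values is not stated here, so treat this as an illustration of the idea only.

```lisp
;; Hypothetical helper: shallow conversion, nested tables are left as they are.
(defun hash-table->alist (table)
  "Return the entries of TABLE as a fresh list of (key . value) pairs."
  (let ((alist '()))
    (maphash (lambda (key value)
               (push (cons key value) alist))
             table)
    alist))
```

Because maphash visits entries in an unspecified order, look values up with (assoc key alist :test #'equal) rather than relying on position.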

[Macro] BE-LAZY form => lazy-object

Makes a lazy object. Useful when writing a custom sanitizer or a custom date-time parser.

[Function] LAZY-P x => boolean

Checks whether x is a lazy object. Useful when writing a custom accessor function.

[Function] FORCE lazy-object => forced-value

Forces lazy-object and returns its value. Useful when writing a custom accessor function.
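
To show what "lazy object" means in practice, the following stand-alone sketch implements the usual delay/force pattern with memoization. It is an assumption about the general technique, not the library's code: promise, my-be-lazy, my-lazy-p and my-force are hypothetical names chosen to mirror BE-LAZY, LAZY-P and FORCE.

```lisp
;; Hypothetical promise type; the library's own representation may differ.
(defstruct (promise (:constructor %make-promise (thunk)))
  thunk            ; zero-argument function, dropped once forced
  (forced-p nil)   ; true after the first force
  value)           ; cached result of the thunk

(defmacro my-be-lazy (form)
  "Wrap FORM in a promise without evaluating it."
  `(%make-promise (lambda () ,form)))

(defun my-lazy-p (x)
  "True if X is a promise created by MY-BE-LAZY."
  (promise-p x))

(defun my-force (x)
  "Evaluate a promise at most once; non-promises are returned unchanged."
  (cond ((not (promise-p x)) x)
        ((promise-forced-p x) (promise-value x))
        (t (setf (promise-value x) (funcall (promise-thunk x))
                 (promise-forced-p x) t
                 (promise-thunk x) nil)
           (promise-value x))))
```

In this sketch the wrapped form runs only on the first my-force and its result is cached, so expensive work such as sanitizing or date parsing is paid only for the elements that are actually read. Returning non-promises unchanged is a design choice of the sketch that keeps accessor code simple.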

[Special Variable] *USER-AGENT*

*USER-AGENT* holds the string that tells the server who is making the request (i.e. the User-Agent header value). If you embed cl-feed-parser in a larger program, you should change the value of *USER-AGENT* to your program's name and URL.
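
Because this is a special variable, it can be changed globally with setf or rebound with let for the duration of a call. The sketch below shows the let pattern with stand-ins only: *my-user-agent*, request-headers and headers-for-my-app are hypothetical names, and the agent string is a made-up example, so nothing here is the library's own code.

```lisp
;; Stand-in for a library's special variable holding the User-Agent string.
(defvar *my-user-agent* "library-default/0.0")

(defun request-headers ()
  "Stand-in for library code that reads the special variable at call time."
  (list (cons "User-Agent" *my-user-agent*)))

(defun headers-for-my-app ()
  "Rebind the variable only for the dynamic extent of this call."
  (let ((*my-user-agent* "MyReader/1.0 (+http://example.com/my-reader)"))
    (request-headers)))
```

The let form leaves the global value untouched once it exits, which is convenient when only some requests should carry the application's own identification.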

[Special Variable] *DATE-TIME-PARSER*

*DATE-TIME-PARSER* holds the function that parses a date-time string into a universal time. The default value is:

  (lambda (date-time-string)
    (be-lazy
     (ignore-errors
       (reduce #'+ (multiple-value-list
                    (cl-date-time-parser:parse-date-time
                     date-time-string))))))

The lazy function above returns a universal time (plus a fractional part, if there is one). If date-time-string is not in a date-time format, it returns NIL. To use another date-time parser, or to skip date-time parsing entirely, set this variable to that parser or to #'identity.
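
Since the parsed value may carry a fractional part, it needs to be truncated before it is handed to decode-universal-time. The helper below is a small sketch of that step using only standard Common Lisp; universal-time->iso is a hypothetical name and is not part of cl-feed-parser.

```lisp
;; Hypothetical helper for displaying a parsed date-time value.
(defun universal-time->iso (universal-time)
  "Format UNIVERSAL-TIME (possibly with a fractional part) as an ISO 8601 UTC string."
  (multiple-value-bind (whole fraction) (floor universal-time)
    (declare (ignore fraction))
    ;; Time zone 0 means UTC; daylight saving time is then not applied.
    (multiple-value-bind (sec min hour day month year)
        (decode-universal-time whole 0)
      (format nil "~4,'0D-~2,'0D-~2,'0DT~2,'0D:~2,'0D:~2,'0DZ"
              year month day hour min sec))))
```

For example, (universal-time->iso 0) returns "1900-01-01T00:00:00Z", because universal time counts seconds from the start of 1900 in GMT.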

[Special Variable] *SANITIZER*

*SANITIZER* holds the function that sanitizes an HTML string. The default value is:

  (lambda (html-string)
    (be-lazy (ignore-errors (sanitize:clean html-string))))

The lazy function above returns the sanitized HTML string. If html-string is not in HTML format, it returns NIL. To use another sanitizing function, or to skip sanitizing entirely, set this variable to that function or to #'identity.

Author, License, Copyright
