jimberlage / lambda-soup

Functional HTML scraping and rewriting with CSS in OCaml.

Home Page: http://aantron.github.io/lambda-soup


Lambda Soup version 0.6 Documentation BSD license Travis status

Lambda Soup is a functional HTML scraping and manipulation library for OCaml aimed at being easy to use.

Lambda Soup usage example

Here is a trivial self-contained example, entered in the top-level with the Soup module opened:

"<p class='Hello'>World!</p>" |> parse $ ".Hello" |> R.leaf_text;;
- : string = "World!"

And a mutation, which wraps the paragraph's text in a new element:

let soup = parse "<p class='Hello'>World!</p>" in
wrap (soup $ ".Hello" |> R.child) (create_element "strong");
soup |> to_string;;
- : string = "<p class=\"Hello\"><strong>World!</strong></p>"

For some more examples, see the Lambda Soup postprocessor that runs on Lambda Soup's own documentation after it is generated by ocamldoc.

Lambda Soup is simple. It provides a set of elementary traversals for getting from node to node, familiar functional combinators such as filter, map, and fold, and support for all CSS selectors that still make sense when not running in a browser (and a few obvious extensions on top of that).
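As a rough illustration of how those three pieces fit together, here is a minimal sketch. The HTML snippet and all binding names (html, links, hrefs, and so on) are made up for this example. It assumes the functions shown ($$, parent, children, elements, filter, fold, count, to_list, attribute, R.attribute) behave as described in the Soup module's signature, so check them against the documentation for your installed version.

```ocaml
open Soup

(* Hypothetical input, invented for this sketch. *)
let html = {|<ul id="menu">
  <li><a href="/a" class="int">A</a></li>
  <li><a href="http://example.com/b">B</a></li>
  <li><a class="int" href="/c">C</a></li>
</ul>|}

let soup = parse html

(* CSS selection: every <a> that is a descendant of #menu. *)
let links = soup $$ "#menu a"

(* Convert the node sequence to a list and read an attribute from each node.
   R.attribute raises if the attribute is missing. *)
let hrefs = links |> to_list |> List.map (R.attribute "href")

(* Combinators: keep only the links whose class attribute is "int", then count them. *)
let internal_count =
  links |> filter (fun a -> attribute "class" a = Some "int") |> count

(* fold accumulates over the sequence in order. *)
let labels = links |> fold (fun acc a -> acc ^ R.leaf_text a) ""

(* Elementary traversals: up to the parent, and down to the element children. *)
let first_link_parent =
  match parent (soup $ "a") with
  | Some p -> name p
  | None -> "(none)"

let menu_items = soup $ "#menu" |> children |> elements |> count

let () =
  List.iter print_endline hrefs;
  Printf.printf "%d internal, labels %s, parent <%s>, %d items\n"
    internal_count labels first_link_parent menu_items
```

One way to build it, assuming the file is saved as example.ml, is ocamlfind ocamlopt -package lambdasoup -linkpkg example.ml -o example.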

The library is tested thoroughly.

Installing

opam install lambdasoup

Starting from scratch

To use Lambda Soup interactively as in the GIF at the top of this README, you need to have done something like this:

your-package-manager install ocaml opam
opam init
eval `opam config env`          # Or restart your shell
opam install lambdasoup

and make sure your ~/.ocamlinit file looks something like this:

let () =
  try Topdirs.dir_directory (Sys.getenv "OCAML_TOPLEVEL_PATH")
  with Not_found -> ()
;;

#use "topfind";;

Then, run ocaml -short-paths to start the top-level, and scrape away!
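A first session might then look roughly like the sketch below. It assumes topfind has been loaded by ~/.ocamlinit as shown above; the HTML string and the names soup and items are invented for illustration.

```ocaml
(* Load the library through findlib, then bring its functions into scope. *)
#require "lambdasoup";;
open Soup;;

(* Parse a small document and collect the text of every <li>. *)
let soup = parse "<ul><li>one</li><li>two</li></ul>";;
let items = soup $$ "li" |> to_list |> List.map R.leaf_text;;
(* val items : string list = ["one"; "two"] *)
```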

Documentation

Lambda Soup's interface consists of one module, whose signature is documented here.

Lambda Soup is based on Markup.ml. As a consequence, it resolves entity references, detects character encodings automatically, and converts everything to UTF-8. You can also use Lambda Soup on XML by parsing the XML with Markup.ml and feeding the resulting signals to Lambda Soup.
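The XML route could look something like the following sketch. The sample XML and the names xml, titles, and ids are hypothetical. It assumes that Soup.from_signals accepts the synchronous signal stream produced by Markup.parse_xml, so verify this against the signatures of both libraries for the versions you have installed.

```ocaml
open Soup

(* Hypothetical sample document, invented for this sketch. *)
let xml =
  "<feed><entry id='1'><title>First</title></entry>\
   <entry id='2'><title>Second</title></entry></feed>"

(* Parse with Markup.ml's XML parser, then hand the signals to Lambda Soup. *)
let soup =
  Markup.string xml |> Markup.parse_xml |> Markup.signals |> from_signals

(* From here on, the usual selectors and accessors apply. *)
let titles = soup $$ "entry > title" |> to_list |> List.map R.leaf_text
let ids = soup $$ "entry" |> to_list |> List.map (R.attribute "id")

let () =
  List.iter2 (fun id title -> Printf.printf "%s: %s\n" id title) ids titles
```

Because Lambda Soup depends on Markup.ml, building with ocamlfind and -package lambdasoup should also make the Markup module available.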

Developing

See CONTRIBUTING. All feedback is welcome – open an issue on GitHub, or send me an email at antonbachin@yahoo.com. If you find yourself repeatedly writing the same helper on top of Lambda Soup's functions, perhaps we should add it to Lambda Soup.

License

Lambda Soup is distributed under the BSD license.
