staffanm / ferenda

Transform unstructured document collections to structured Linked Data

Ferenda is a Python library and framework for transforming unstructured document collections into structured Linked Data. It helps with downloading documents, parsing them to add explicit semantic structure and RDF-based metadata, finding relationships between documents, and publishing the results, including through a REST-based HTTP API.

Quick start

This example uses ferenda's project framework to download the 50 latest RFCs and W3C standards, parse them into structured, RDF-enabled XHTML documents, load all RDF metadata into a triplestore, and generate a web site of static HTML5 files that is usable offline:

pip install ferenda
ferenda-setup myproject
cd myproject
./ferenda-build.py ferenda.sources.tech.RFC enable
./ferenda-build.py ferenda.sources.tech.W3Standards enable
./ferenda-build.py all all --downloadmax=50 --staticsite --fulltextindex=False
open data/index.html
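
If these steps need to run unattended, the build commands above could be wrapped in a small script. The following is only a sketch: it shells out to ferenda-build.py with the same arguments shown above and does not use ferenda's own Python API. The helper names `build_commands` and `run_all` are hypothetical, and the script assumes that `ferenda-setup myproject` has already created the `myproject` directory.

```python
import subprocess

# Document repositories enabled in the quick start above.
REPOS = ["ferenda.sources.tech.RFC", "ferenda.sources.tech.W3Standards"]


def build_commands(repos, downloadmax=50):
    """Return the quick-start ferenda-build.py invocations as argument lists.

    Hypothetical helper: it only assembles the command lines shown above.
    """
    # One "enable" call per document repository.
    commands = [["./ferenda-build.py", repo, "enable"] for repo in repos]
    # A final call that runs all actions for all enabled repositories.
    commands.append([
        "./ferenda-build.py", "all", "all",
        f"--downloadmax={downloadmax}",
        "--staticsite",
        "--fulltextindex=False",
    ])
    return commands


def run_all(commands, cwd="myproject"):
    """Run each command in order inside the project directory.

    check=True raises CalledProcessError on the first failing command.
    """
    for cmd in commands:
        subprocess.run(cmd, cwd=cwd, check=True)


# To execute the build (assumes the myproject directory exists):
# run_all(build_commands(REPOS))
```

Keeping the command construction separate from the execution makes it possible to inspect or log the exact invocations before anything is run.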

The same functionality can also be accessed through a Python API if you want to use ferenda as part of a larger system. It's also possible to use only the parts of ferenda that you need (e.g. only the downloading and parsing features).

More information

See http://ferenda.readthedocs.org/ for in-depth documentation.

Copyright and license

Most of the code is written by Staffan Malmgren and licensed under the 2-clause BSD license.

Some bundled code is written by other authors and included in accordance with their respective licenses.
