TinoDidriksen / Transfuse

Extract formatted text from documents, transform it, then put back in place

Transfuse

Requirements

  • CMake
  • SQLite 3
  • libxml2
  • xxhash
  • libzip
  • pkg-config (for non-vcpkg platforms)
  • Debian/Ubuntu: sudo apt-get install build-essential cmake pkg-config libsqlite3-dev libxml2-dev libxxhash-dev libzip-dev
  • macOS MacPorts: sudo port install cmake pkgconfig sqlite3 libxml2 xxhash libzip

Usage

Given an HTML document, run tf-extract document.html, or pipe the document in with cat document.html | tf-extract, to extract text blocks with transformed inline tags.
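As a rough sketch, the two invocation forms above can be wrapped in small shell helpers. The function names run_extract and run_extract_stdin are hypothetical and not part of Transfuse; the sketch assumes tf-extract has already been built and is on PATH, and it makes no assumption about what tf-extract prints.

```shell
#!/bin/sh
# Hypothetical helpers (not part of Transfuse) wrapping the two documented
# invocation forms of tf-extract. Assumes tf-extract is built and on PATH.

run_extract() {
    # $1: path to an HTML document
    if ! command -v tf-extract >/dev/null 2>&1; then
        echo "tf-extract not found on PATH" >&2
        return 127
    fi
    tf-extract "$1"    # form 1: file name passed as an argument
}

run_extract_stdin() {
    # Reads the HTML document from standard input
    if ! command -v tf-extract >/dev/null 2>&1; then
        echo "tf-extract not found on PATH" >&2
        return 127
    fi
    tf-extract         # form 2: document piped in on stdin
}

# Usage:
#   run_extract document.html
#   run_extract_stdin < document.html
```

The PATH check only gives a clearer error when the tool has not been installed yet; both helpers otherwise pass through to tf-extract unchanged.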

About

License: GNU General Public License v3.0


Languages

  • C++ 90.3%
  • CMake 5.5%
  • HTML 2.8%
  • Roff 0.7%
  • Shell 0.6%