The core engine behind Ink's file processing. It currently supports docx and epub (2.0 and 3.0).
License: Apache 2.0
We haven't published this package on npm yet, but you can install it directly from the GitHub repository with npm:
npm install RebusFoundation/ink-engine
const engine = require("ink-engine");

const epubPath = "path/to/epubfile.epub";

// This callback is called for every file in the publication, including files
// the engine generates itself, such as the publication JSON file.
async function extract(vfile, resource, metadata) {
  // Do something with the vfile, e.g. upload it to Google Storage
  // (`metadata` is suitable for the upload's metadata property).
  // Then return the full URL of the uploaded resource (placeholder here).
  return "uploaded/" + resource.url;
}

// Named processPublication to avoid shadowing Node's global `process`.
async function processPublication(file) {
  const result = await engine(file, extract);
  // `result` is the publication metadata object. It conforms, for the most
  // part, to the W3C Web Publication note.
  console.dir(result);
}

processPublication(epubPath).catch((err) => {
  console.error(err);
  process.exitCode = 1;
});
The engine returns a Promise for a publication object that conforms (for the most part) to the W3C Web Publication note.
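The extract callback receives three arguments:
- vfile: a virtual file object generated by the vfile module.
- resource: a Web Publication LinkedResource object.
- metadata: an object suitable for the metadata property when uploading to Google Storage.
The callback should return the full URL of the uploaded resource.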
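The callback contract is easiest to see with a concrete target. The sketch below assumes you want to keep extracted files on local disk instead of Google Storage; extractToDisk and OUTPUT_DIR are hypothetical names, and which vfile property holds the file body depends on the vfile version the engine uses (contents in vfile 4 and earlier, value in vfile 5 and later).

```javascript
const fs = require("fs").promises;
const os = require("os");
const path = require("path");
const { pathToFileURL } = require("url");

// Hypothetical output directory for this sketch.
const OUTPUT_DIR = path.join(os.tmpdir(), "ink-engine-output");

// Sketch of an extract callback that writes every file it receives to
// local disk instead of uploading it to Google Storage.
async function extractToDisk(vfile, resource, metadata) {
  // Resolve the resource URL inside OUTPUT_DIR and reject anything that
  // would escape it (e.g. "../escape.txt").
  const target = path.resolve(OUTPUT_DIR, resource.url);
  if (!target.startsWith(OUTPUT_DIR + path.sep)) {
    throw new Error(`Refusing to write outside ${OUTPUT_DIR}: ${resource.url}`);
  }

  // vfile 4 and earlier keep the body on `contents`; vfile 5+ uses `value`.
  const body = vfile.value !== undefined ? vfile.value : vfile.contents;
  if (body === undefined) {
    throw new Error(`No file body found for ${resource.url}`);
  }

  await fs.mkdir(path.dirname(target), { recursive: true });
  await fs.writeFile(target, body);

  // `metadata` is meant for Google Storage uploads, so it is ignored here.
  // Return the full URL of the stored copy.
  return pathToFileURL(target).href;
}
```

You would pass it in place of the upload callback, e.g. engine(epubPath, extractToDisk).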
Options:
- options.sanitize (default: true): whether extracted files should be sanitized before they are handed over to the extractCallback. Sanitizing removes all JS files and sanitizes CSS, HTML, SVG and XHTML files.
- options.cssPrefix (default: #ink-engine): the selector prefix used to sandbox the selectors in extracted CSS.
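To make the documented defaults concrete, here is a hypothetical resolveOptions helper (not part of ink-engine's API) that merges caller options over those defaults and rejects values that could not work as described. It assumes options arrive as a plain object and only models the two options listed above.

```javascript
// Documented defaults for the engine options.
const DEFAULT_OPTIONS = Object.freeze({
  sanitize: true,
  cssPrefix: "#ink-engine",
});

// Hypothetical helper: merge caller options over the documented defaults
// and reject values that could not behave as described.
function resolveOptions(options = {}) {
  const resolved = { ...DEFAULT_OPTIONS, ...options };
  if (typeof resolved.sanitize !== "boolean") {
    throw new TypeError("options.sanitize must be a boolean");
  }
  if (typeof resolved.cssPrefix !== "string" || resolved.cssPrefix.trim() === "") {
    throw new TypeError("options.cssPrefix must be a non-empty selector string");
  }
  return resolved;
}
```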
Markup is sanitized using DOMPurify. CSS is sanitized using an internal PostCSS module that filters out all unknown properties, prefixes every selector with the cssPrefix selector, and removes position: fixed declarations. It also rewrites body and html element selectors to ink-body and ink-html for additional rendering control.
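The CSS rules above can be illustrated without PostCSS. This is a simplified, string-based sketch of the documented behaviour, not the engine's internal module: rewriteSelector and keepDeclaration are hypothetical names, selector lists are split naively on commas (so commas inside :is() or attribute values would be mishandled), and KNOWN is a tiny stand-in for a real list of CSS properties.

```javascript
// Rename html/body type selectors to ink-html/ink-body, then scope each
// selector in a comma-separated list under `prefix`.
function rewriteSelector(selector, prefix = "#ink-engine") {
  return selector
    .split(",")
    .map((part) => part.trim())
    .filter((part) => part.length > 0)
    // Match html/body only as type selectors: at the start or after a
    // combinator, and not followed by more identifier characters.
    .map((part) =>
      part.replace(/(^|[\s>+~])(html|body)(?=$|[\s>+~.#:\[])/g, "$1ink-$2")
    )
    .map((part) => `${prefix} ${part}`)
    .join(", ");
}

// Drop declarations for unknown properties and any `position: fixed`
// (with or without !important).
function keepDeclaration(property, value, knownProperties) {
  const name = property.trim().toLowerCase();
  if (!knownProperties.has(name)) return false;
  const bare = value.replace(/!important\s*$/i, "").trim().toLowerCase();
  if (name === "position" && bare === "fixed") return false;
  return true;
}

// Hypothetical, deliberately tiny allow-list for the example.
const KNOWN = new Set(["color", "margin", "position", "font-size"]);
```

For instance, rewriteSelector("html > body.dark", ".reader") yields ".reader ink-html > ink-body.dark", while a class named .body or an element like tbody is left alone.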
The engine also generates hast JSON files for all HTML and XHTML files. hast is an abstract syntax tree format used by the unified collection of processing tools. These JSON files let clients skip the HTML parsing stage when rendering the publications as part of a website. Each file also embeds a version of the publication object under data.book, the table of contents in object form (if available) under data.toc, and the LinkedResource object for the current file under data.resource.
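A client can use these files without reparsing any HTML. The sketch below assumes the embedded objects sit on the root node's data field (the description names data.book, data.toc and data.resource but not which node carries them); readHastDocument and hastToText are hypothetical helpers that rely only on the standard hast node shapes (root, element, text).

```javascript
// Read one hast JSON file and return its embedded data plus plain text.
function readHastDocument(json) {
  const root = JSON.parse(json);
  // Assumption: the embedded objects live on the root node's `data` field.
  const data = root.data || {};
  return {
    book: data.book,
    toc: data.toc,
    resource: data.resource,
    text: hastToText(root),
  };
}

// Concatenate the text nodes of a hast tree in document order,
// skipping <script> and <style> contents.
function hastToText(node) {
  if (node.type === "text") return node.value;
  if (node.type === "element" && (node.tagName === "script" || node.tagName === "style")) {
    return "";
  }
  if (!Array.isArray(node.children)) return "";
  return node.children.map(hastToText).join("");
}
```

If you need HTML again, the hast-util-to-html package from the same ecosystem can serialize the tree.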
The prerender directory contains code that uses the rehype-annotate module to process and prepare annotations and match them to the prerendered file.
Planned work:
- Broader format support in general
- WARC support
- Figure out if and how paged media formats can be supported (PDF, CBZ, CBR)
- Figure out if and how time-based media formats can be supported (video, audio, podcasts, audiobooks)
- Support the W3C's proposed Lightweight Packaging Format
- Make the publication object more compliant with the W3C Web Publications Manifest note.