ink-engine

The core engine behind Ink's file processing. Currently supports docx and epub (2.0 and 3.0).

License

Apache 2.0

Install

We haven't published this package on npm yet, but you can install it directly from the GitHub repository.

npm:

npm install RebusFoundation/ink-engine

Example

const engine = require("ink-engine");

const epubPath = "path/to/epubfile.epub";

// This callback is called for every file in the publication, including files
// generated by the engine, such as the publication JSON file itself.
async function extract(vfile, resource, metadata) {
  // Do something with the vfile, e.g. upload it to Google Storage,
  // then return the full URL for the uploaded resource.
  return "uploaded/" + resource.url;
}

// Named `processFile` so that it does not shadow Node's global `process` object.
async function processFile(file) {
  const result = await engine(file, extract);
  // `result` is the publication metadata object. It conforms to the W3C wpub standard for the most part.
  console.dir(result);
}

processFile(epubPath).catch(console.error);

API

`engine(filepath, extractCallback[, options])`

Returns a Promise for a publication object that conforms (for the most part) to the W3C Web Publication note.

`extractCallback(vfile, resource, metadata)`

vfile: a virtual file object generated by the vfile module.
resource: a Web Publication LinkedResource object.
metadata: an object suitable for use as the metadata property when uploading to Google Storage.
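
As a rough sketch of what an extract callback might look like when the target is the local disk rather than Google Storage, the example below writes each file under an output directory and returns a public URL for it. The names `targetPathFor`, `createDiskExtractor`, and `publicPrefix` are hypothetical and not part of ink-engine. It also assumes that `resource.url` is a relative path and that the file body is available on the vfile as `contents` (older vfile releases) or `value` (newer ones).

```javascript
const fs = require("fs");
const path = require("path");

// Hypothetical helper: work out where a resource should be written and refuse
// URLs that would land outside the output directory (e.g. "../secret").
function targetPathFor(outputDir, resourceUrl) {
  const root = path.resolve(outputDir);
  const target = path.resolve(root, resourceUrl);
  if (!target.startsWith(root + path.sep)) {
    throw new Error("Resource URL escapes the output directory: " + resourceUrl);
  }
  return target;
}

// Hypothetical factory: returns a callback with the
// (vfile, resource, metadata) signature described above.
function createDiskExtractor(outputDir, publicPrefix) {
  return async function extract(vfile, resource, metadata) {
    const target = targetPathFor(outputDir, resource.url);
    await fs.promises.mkdir(path.dirname(target), { recursive: true });
    // Assumption: the body lives on `value` or `contents`, depending on the vfile version.
    const body = vfile.value !== undefined ? vfile.value : vfile.contents;
    await fs.promises.writeFile(target, body);
    // `metadata` is ignored here; it is only useful for storage backends that accept it.
    return publicPrefix + resource.url;
  };
}
```

Under these assumptions, `createDiskExtractor("./out", "/books/")` could be passed as the second argument to `engine`.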

`options`

options.sanitize (default: true): whether the extracted files should be sanitized before being handed over to the extractCallback. This removes all JS files and sanitizes CSS, HTML, SVG and XHTML files.
options.cssPrefix (default: #ink-engine): the selector prefix that should be used to sandbox the selectors in extracted CSS.
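
A minimal sketch of building an options object from the defaults listed above. The `DEFAULT_OPTIONS` constant and the `engineOptions` helper are hypothetical conveniences, not part of ink-engine; only the two option names and their default values come from this README.

```javascript
// Defaults as documented above; this constant is illustrative, not exported by ink-engine.
const DEFAULT_OPTIONS = { sanitize: true, cssPrefix: "#ink-engine" };

// Hypothetical helper: overlay caller-supplied overrides on the documented defaults.
function engineOptions(overrides = {}) {
  return { ...DEFAULT_OPTIONS, ...overrides };
}

// Possible usage, assuming `engine`, `file` and `extract` are defined as in the Example:
// const result = await engine(file, extract, engineOptions({ cssPrefix: "#reader" }));
```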

Markup is sanitized using dompurify. CSS is sanitized using an internal PostCSS module that filters out all unknown properties, prefixes all of the selectors with your chosen selector, and removes position: fixed. It also transforms body and html element selectors to ink-body and ink-html for additional rendering control.
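
To give a feel for the selector rewriting described above, here is a simplified illustration in plain JavaScript. It is not the engine's PostCSS module. The function name `sandboxSelector` is hypothetical, and the sketch assumes the prefix is applied as an ancestor selector (prefix, a space, then the original selector). It handles only simple comma-separated selector lists and would mishandle commas inside `:is()` or `:not()`.

```javascript
// Illustration only: ink-engine performs this step with an internal PostCSS module.
function sandboxSelector(selectorList, prefix = "#ink-engine") {
  return selectorList
    .split(",")
    .map((selector) => {
      const renamed = selector
        .trim()
        // Rename bare `html` / `body` type selectors, but leave `.body`, `#html`
        // and `tbody` alone.
        .replace(/(^|[\s>+~])(html|body)(?![\w-])/g, "$1ink-$2");
      // Assumption: the prefix is prepended as an ancestor of the original selector.
      return `${prefix} ${renamed}`;
    })
    .join(", ");
}

console.log(sandboxSelector("body p, .note"));
// prints "#ink-engine ink-body p, #ink-engine .note"
```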

Generated `hast` JSON files

The engine also generates hast JSON files for all HTML and XHTML files. hast is an abstract syntax tree format used by the unified collection of processing tools. These JSON files let clients skip the HTML parsing stage when rendering the publications as part of a website. Each file also embeds a version of the publication object and, if available, a ToC in object form, under the data.book and data.toc properties respectively. The LinkedResource object for the current file is available under data.resource.
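
The sketch below shows one way a client might consume a parsed hast JSON file without any HTML parsing. It assumes that the `data` object sits on the root node of the tree and that the LinkedResource has a `url` property, as in the Example. The function names `textContent` and `describeChapter` are hypothetical.

```javascript
// Recursively collect the text of a hast node. Text nodes carry a `value`;
// element and root nodes carry `children`; other nodes contribute nothing.
function textContent(node) {
  if (node.type === "text") return node.value;
  return (node.children || []).map(textContent).join("");
}

// Hypothetical helper: summarise one generated hast file.
// Assumption: `data.book`, `data.toc` and `data.resource` live on the root node.
function describeChapter(tree) {
  const data = tree.data || {};
  return {
    url: data.resource ? data.resource.url : undefined,
    hasToc: Boolean(data.toc),
    text: textContent(tree).trim()
  };
}

// Possible usage, where `json` is the contents of one generated hast file:
// const summary = describeChapter(JSON.parse(json));
```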

Prerender

The prerender directory contains code that uses the rehype-annotate module to process, prepare, and match annotations to the prerendered file.

TODO

Broader format support in general
WARC support
Figure out if and how paged media formats can be supported (PDF, CBZ, CBR)
Figure out if and how time-based media formats can be supported (video, audio, podcasts, audiobooks)
Support the W3C's proposed Lightweight Packaging Format
Make the publication object more compliant with the W3C Web Publications Manifest note
