Popplonode

Extract the metadata and text of a PDF file

Why Popplonode

Popplonode is a native Node.js addon, which means it runs pure C++ code. It is clearly faster than PDF.js, and it is also faster than spawning Poppler's pdfinfo, because it only uses the specific Poppler C++ classes it needs.

Requirements

This version works with Node.js v10, v8 and v6 (LTS) on Linux and macOS. Windows support is in progress.

Install

npm install popplonode

INFO: If you want to use it with a particular version of Node.js (e.g. 8.5), you will need CMake and g++:

On Linux (Debian/Ubuntu):
sudo apt-get install cmake g++

On macOS:
brew install cmake

Usage

const Popplonode = require('popplonode');

const poppl = new Popplonode();

// We load the PDF file into poppl
poppl.load('path/to/my/file.pdf'); 

// We can access the metadata of the PDF file
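const metadata = poppl.getMetadata(); // e.g. metadata.TotalNbPages

// Extract the text of the first page (pages are zero-indexed)
poppl.getTextFromPage(0, (error, content) => {
  if (error) {
    console.error(error);
    return;
  }
  // do something with the page content
  console.log(content);
});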

API

load(string)

arguments:

string: path to your PDF file
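Since load() only takes a path, a quick sanity check before calling it can give a clearer error for a missing or non-PDF file. The sketch below is a hypothetical helper, not part of Popplonode; it relies only on the fact that PDF files begin with the "%PDF-" header, and it makes no assumption about how load() itself reports errors.

```javascript
const fs = require('fs');

// Hypothetical helper (not part of popplonode): returns true if `filePath`
// exists, is readable, and starts with the "%PDF-" header.
function isPdfFile(filePath) {
  let fd;
  try {
    fd = fs.openSync(filePath, 'r');
    const header = Buffer.alloc(5);
    const bytesRead = fs.readSync(fd, header, 0, 5, 0);
    return bytesRead === 5 && header.toString('latin1') === '%PDF-';
  } catch (err) {
    // Missing file, directory, permission error, ...
    return false;
  } finally {
    if (fd !== undefined) fs.closeSync(fd);
  }
}

// Usage with popplonode (assumes `poppl` was created as in Usage):
// if (isPdfFile(myPath)) poppl.load(myPath);
```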

getMetadata()

returns:

object: an object containing all of the PDF's metadata

// example of a returned metadata object
{ 
  CreationDate: 'D:20100304130800+01\'00\'',
  Author: 'manshanden',
  Creator: 'PScript5.dll Version 5.2',
  Producer: 'Acrobat Distiller 7.0.5 (Windows)',
  ModDate: 'D:20100304130837+01\'00\'',
  Title: 'Microsoft Word - Test document Word.doc',
  TotalNbPages: 1,
  PDFFormatVersion: '1.4'
}
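The CreationDate and ModDate values above are raw PDF date strings (format D:YYYYMMDDHHmmSSOHH'mm'). A minimal sketch of a hypothetical parsePdfDate helper, not part of Popplonode, that turns them into JavaScript Date objects; it assumes a missing timezone means UTC, which the PDF format itself leaves unspecified.

```javascript
// Hypothetical helper (not part of popplonode): converts a PDF date string
// such as "D:20100304130800+01'00'" into a JavaScript Date.
// Returns null if the string does not match the D:YYYYMMDDHHmmSSOHH'mm' format.
function parsePdfDate(str) {
  const m = /^D:(\d{4})(\d{2})?(\d{2})?(\d{2})?(\d{2})?(\d{2})?(?:([+\-Z])(\d{2})?'?(\d{2})?'?)?$/.exec(str);
  if (!m) return null;
  const year = Number(m[1]);
  const month = m[2] ? Number(m[2]) - 1 : 0; // JS months are 0-based
  const day = m[3] ? Number(m[3]) : 1;
  const hour = m[4] ? Number(m[4]) : 0;
  const minute = m[5] ? Number(m[5]) : 0;
  const second = m[6] ? Number(m[6]) : 0;
  // Offset from UTC in minutes; "Z" or a missing offset is treated as UTC (assumption).
  let offset = 0;
  if (m[7] === '+' || m[7] === '-') {
    const sign = m[7] === '+' ? 1 : -1;
    offset = sign * (Number(m[8] || 0) * 60 + Number(m[9] || 0));
  }
  // Date.UTC treats the local fields as UTC, so subtract the offset to get real UTC.
  return new Date(Date.UTC(year, month, day, hour, minute, second) - offset * 60000);
}
```

For the example above, parsePdfDate(metadata.CreationDate).toISOString() gives '2010-03-04T12:08:00.000Z'.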

getTextFromPage(number, function)

arguments:

number: page number (the first page is 0)
function: callback called with (error, content), where content is the page text
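Because getTextFromPage() is callback-based, collecting every page is easier with a Promise wrapper. The sketch below is a hypothetical helper, not part of Popplonode; it assumes the callback follows the Node.js (error, content) convention shown in the Usage example, that TotalNbPages in the metadata is the page count, and Node.js 8 or later (for util.promisify and async/await).

```javascript
const util = require('util');

// Hypothetical helper (not part of popplonode): returns a Promise resolving to
// an array with the text of every page, in order. `poppl` is assumed to be an
// instance on which load() has already been called.
async function getAllPagesText(poppl) {
  // Bind so getTextFromPage keeps `poppl` as its `this`.
  const getText = util.promisify(poppl.getTextFromPage.bind(poppl));
  const { TotalNbPages } = poppl.getMetadata();
  const pages = [];
  // Pages are zero-indexed, so the last page is TotalNbPages - 1.
  for (let i = 0; i < TotalNbPages; i++) {
    pages.push(await getText(i)); // rejects if the callback receives an error
  }
  return pages;
}
```

Pages are fetched one at a time, which keeps the order simple and avoids assuming the addon is safe to call concurrently on the same loaded document.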

Windows

If anyone could help us build Poppler on Windows, we could then build Popplonode for it :D

About

Just a Node.js module for the Poppler library

MIT License
