Returns the metadata of a PDF file and extracts its text.
Popplonode is a Node.js addon, which means it runs native C++ code. It is faster than PDF.js, and also faster than spawning Poppler's pdfinfo, because it uses only the specific Poppler C++ classes it needs.
This version works with Node.js v10, v8 and v6 (LTS) on Linux and OS X. Windows support is in progress.
npm install popplonode
INFO If you want to use it with a particular version of Node.js (e.g. 8.5), you will need:
sudo apt-get install cmake g++
brew install cmake
const Popplonode = require('popplonode');
const poppl = new Popplonode();
// We load the PDF file into poppl
poppl.load('path/to/my/file.pdf');
// We can access the metadata of the PDF file
const metadata = poppl.getMetadata();
// We can extract the text of a page (page numbers start at zero)
poppl.getTextFromPage(0, (error, content) => {
  // do something with the page content
});
poppl.load arguments:
- string: path to your PDF file
poppl.getMetadata returns:
- object: contains all of the PDF's metadata
// example of a returned metadata object
{
  CreationDate: 'D:20100304130800+01\'00\'',
  Author: 'manshanden',
  Creator: 'PScript5.dll Version 5.2',
  Producer: 'Acrobat Distiller 7.0.5 (Windows)',
  ModDate: 'D:20100304130837+01\'00\'',
  Title: 'Microsoft Word - Test document Word.doc',
  TotalNbPages: 1,
  PDFFormatVersion: '1.4'
}
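The CreationDate and ModDate values in this example follow the PDF date string format (D:YYYYMMDDHHmmSS followed by a timezone offset). Here is a minimal sketch of turning such a string into a JavaScript Date. Note that parsePdfDate is a hypothetical helper and not part of Popplonode; it treats a missing offset as UTC and returns null for anything it does not recognise.

```javascript
// Hypothetical helper (not part of Popplonode): converts a PDF date string
// such as "D:20100304130800+01'00'" into a JavaScript Date.
function parsePdfDate(value) {
  const pattern = /^(?:D:)?(\d{4})(\d{2})?(\d{2})?(\d{2})?(\d{2})?(\d{2})?(?:(Z)|([+-])(\d{2})'?(\d{2})?'?)?$/;
  const match = pattern.exec(value);
  if (!match) {
    return null; // not a recognisable PDF date string
  }

  // Every field after the year is optional, so fall back to defaults.
  const [
    , year, month = '01', day = '01',
    hour = '00', minute = '00', second = '00',
    , sign, offsetHours = '00', offsetMinutes = '00'
  ] = match;

  // Read the fields as if they were UTC, then remove the timezone offset.
  const asUtc = Date.UTC(+year, +month - 1, +day, +hour, +minute, +second);
  const offset = (sign === '-' ? -1 : 1) * (+offsetHours * 60 + +offsetMinutes);
  return new Date(asUtc - offset * 60000);
}

const created = parsePdfDate("D:20100304130800+01'00'");
console.log(created.toISOString()); // 2010-03-04T12:08:00.000Z
```

With a loaded file, the same helper could be applied to metadata.CreationDate or metadata.ModDate.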
poppl.getTextFromPage arguments:
- number: page number (the first page is zero)
- function: callback called with (error, content), where content is the text of the page
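To read a whole document, the callback-based getTextFromPage can be wrapped in a Promise and called once per page, with the TotalNbPages metadata field as the page count. The sketch below assumes the callback has the (error, content) shape shown in the usage example. Both getPageText and getAllText are hypothetical helpers, not part of Popplonode, and they take an already loaded poppl instance. Because it uses async/await, this sketch needs Node.js v8 or later.

```javascript
// Hypothetical helper (not part of Popplonode): wraps the callback-based
// getTextFromPage in a Promise. Assumes the callback receives (error, content).
function getPageText(poppl, pageNumber) {
  return new Promise((resolve, reject) => {
    poppl.getTextFromPage(pageNumber, (error, content) => {
      if (error) {
        reject(error);
      } else {
        resolve(content);
      }
    });
  });
}

// Hypothetical helper: returns an array with the text of every page, in order.
// Assumes getMetadata().TotalNbPages holds the number of pages.
async function getAllText(poppl) {
  const { TotalNbPages } = poppl.getMetadata();
  const pages = [];
  // Pages are zero-indexed; they are read one at a time, in order.
  for (let page = 0; page < TotalNbPages; page += 1) {
    pages.push(await getPageText(poppl, page));
  }
  return pages;
}
```

After poppl.load('path/to/my/file.pdf'), this could be called as getAllText(poppl).then((pages) => console.log(pages.join('\n'))). Reading pages sequentially is a conservative choice here, since nothing in this document says whether concurrent calls are safe.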
Windows: if anyone could help us build Poppler on Windows, we could then build it for Node.js :D