kcb0126 / google-pdf-scraper

Packagist library for filtering PDF documents in Google Drive by keyword.

Google PDF Scraper with Keywords

This is a PHP library, written for Daniel Fischl, that filters PDF documents in Google Drive by keyword.

To add it to your project, install it with Composer.

composer require tiefan/google-pdf-scraper

Extract text from PDF document

$text = PdfScraper::textFromDriveId(string $fileId);
$text = PdfScraper::textFromDriveUrl(string $url);
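The two methods above differ only in how the file is identified. As a rough illustration of that relationship, the sketch below pulls a file ID out of two common Google Drive link shapes. The helper name extractDriveFileId is hypothetical and not part of this library, and the library's own URL handling may work differently.

```php
<?php
// Hypothetical helper, NOT part of the library: extract the file ID from
// common Google Drive share links. Returns null when no ID is recognised.
function extractDriveFileId(string $url): ?string
{
    // Link shape 1: https://drive.google.com/file/d/<ID>/view
    if (preg_match('#/file/d/([A-Za-z0-9_-]+)#', $url, $m) === 1) {
        return $m[1];
    }

    // Link shape 2: https://drive.google.com/open?id=<ID> (or uc?id=<ID>)
    $query = parse_url($url, PHP_URL_QUERY);
    if (is_string($query)) {
        parse_str($query, $params);
        if (isset($params['id']) && is_string($params['id']) && $params['id'] !== '') {
            return $params['id'];
        }
    }

    return null;
}
```

An ID obtained this way could then be passed to PdfScraper::textFromDriveId, assuming the library accepts the same ID that appears in the share link.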

Check Document with "Begin" and "End" Keyword

$isThatDocument = PdfScraper::checkKeywordsFromDriveId(string $fileId, string $begin, string $end = null);
$isThatDocument = PdfScraper::checkKeywordsFromDriveUrl(string $url, string $begin, string $end = null);
$scraper = new PdfScraper($doc, $isURL = true); // $isURL: true for url, false for id
$isThatDocument = $scraper->checkKeywords(string $begin, string $end = null);
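This README does not spell out the exact matching rule. One plausible reading is that the extracted text must start with the "begin" keyword and, when an "end" keyword is given, finish with it. The sketch below shows that reading on plain strings. The function name textMatchesKeywords and the rule itself are assumptions for illustration, not the library's implementation.

```php
<?php
// Hypothetical helper illustrating ONE possible rule; the library's real
// matching logic may differ (for example, it might search anywhere in the text).
function textMatchesKeywords(string $text, string $begin, ?string $end = null): bool
{
    // Ignore leading/trailing whitespace left over from text extraction.
    $text = trim($text);

    // The text must open with the "begin" keyword.
    if (strncmp($text, $begin, strlen($begin)) !== 0) {
        return false;
    }

    // Without an "end" keyword, the "begin" match alone decides.
    if ($end === null) {
        return true;
    }

    // The text must close with the "end" keyword (an empty one always matches).
    return $end === '' || substr($text, -strlen($end)) === $end;
}
```

Under this reading, a call such as textMatchesKeywords($text, 'INVOICE', 'END OF DOCUMENT') would accept only text that is framed by those two markers.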

Using MySQL or MariaDB to process documents in bulk

The following code uses the database schema in Sample\db_pdf_scraper.sql.

$pdfDB = new PdfDB($host, $username, $password, $database);
$processed_count = $pdfDB->checkPdfs();
