compuphase / sphider-pdo

A simple search engine and spider in PHP

Some PDFs with non-UTF-8 characters caused an error when inserting fulltxt

Lenderboy opened this issue

Indexing PDFs would often return this error:

Sphider\admin\spider.php:300:HY000] Incorrect string value: '\xAD \xAD

I added the following code at line 296 of spider.php. It converts the non-UTF-8 characters in the PDF text to '?', so those PDFs are no longer skipped during indexing.

$title = $db->quote($title);  // line 294
$title = mb_convert_encoding($title, 'UTF-8'); // added so PDFs with non-UTF-8 characters are not skipped

$fulltxt = $db->quote($fulltxt); // line 297
$fulltxt = mb_convert_encoding($fulltxt, 'UTF-8'); // added at line 296 of spider.php
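For anyone applying the same idea elsewhere, here is a minimal sketch of the conversion as a standalone helper. The function name `to_valid_utf8` is hypothetical and not part of Sphider. It names the source encoding explicitly and sets the substitute character to '?', rather than relying on the mbstring defaults. It may also be safer to call it before `$db->quote()`, so that the quoting step sees the final bytes, but that ordering is a suggestion and not what the change above does.

```php
<?php
// Hypothetical helper (not part of Sphider): replace byte sequences that are
// not valid UTF-8 with '?', so the text can be stored in a UTF-8 column.
function to_valid_utf8(string $text): string
{
    // Remember the current substitute character so it can be restored.
    $previous = mb_substitute_character();
    mb_substitute_character(0x3F); // 0x3F is '?'

    // Converting UTF-8 to UTF-8 leaves valid text alone and substitutes
    // anything that is not a well-formed UTF-8 sequence.
    $clean = mb_convert_encoding($text, 'UTF-8', 'UTF-8');

    mb_substitute_character($previous);
    return $clean;
}

// Example with the bytes from the error message above.
$fulltxt = to_valid_utf8("\xAD \xAD");
// In spider.php the next step would be $db->quote($fulltxt), assuming $db
// is the PDO connection used there.
```

Valid UTF-8 input passes through unchanged, so the helper can be applied to every title and full text, not only to PDF content.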


I merged these changes in (commit f4dfad0).
Thanks.