Some PDFs with non-UTF-8 characters cause an error when inserting fulltxt
Lenderboy opened this issue
Indexing PDFs would often return this error:
Sphider\admin\spider.php:300:HY000] Incorrect string value: '\xAD \xAD
I added the following code at line 296 of spider.php. It converts the non-UTF-8 characters in the PDF text to '?', so the affected PDFs are no longer skipped during indexing.
$title = $db->quote($title); // line 294
$title = mb_convert_encoding($title, 'UTF-8'); // added this line to not skip over PDF with non utf-8 characters
$fulltxt = $db->quote($fulltxt); //line 297
$fulltxt = mb_convert_encoding($fulltxt, 'UTF-8'); //add this line to spider.php line 296
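For anyone who wants to see the effect in isolation, here is a minimal sketch of what the conversion does to a stray byte like the \xAD in the error above. The helper name to_valid_utf8 is hypothetical and not part of Sphider, and I pass the source encoding explicitly as 'UTF-8' as an assumption; the lines above omit it and so rely on mbstring's configured internal encoding.

```php
<?php
// Hypothetical helper (not part of Sphider): force a string to valid UTF-8.
function to_valid_utf8(string $text): string
{
    // Using 'UTF-8' as both target and source makes mbstring re-validate the
    // bytes; invalid sequences are replaced by the mbstring substitute
    // character, which is '?' unless it has been reconfigured.
    return mb_convert_encoding($text, 'UTF-8', 'UTF-8');
}

// A lone \xAD byte is not valid UTF-8 on its own.
$dirty = "soft\xADhyphen";
$clean = to_valid_utf8($dirty);

var_dump(mb_check_encoding($dirty, 'UTF-8')); // validity before conversion
var_dump(mb_check_encoding($clean, 'UTF-8')); // validity after conversion
```

Text that is already valid UTF-8 should pass through unchanged, so this should be safe to apply to every document. If the PDFs are known to be in a specific legacy encoding, passing that encoding as the third argument instead would keep the original characters rather than replacing them.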
I merged these changes in (commit f4dfad0).
Thanks.