kaliop-uk / eztika

eZ Publish 4 extension: a wrapper for the standalone Tika toolkit that allows conversion to plain text and indexing of a large variety of binary file types such as MS Word, MS Office, PDF, Excel, ODF, and others. Copied from http://svn.projects.ez.no/eztika

License for all but the tika.jar file: GNU GPL 2.0
   tika.jar is licensed under the ASF License (Apache)

Installation: see INSTALL.txt

Description:

eZ Tika is an extension that provides a handler for converting many binary file formats to plain text for use by the search engine (provided you have marked the corresponding attributes as searchable).

Currently, most common office formats are enabled (see also binaryfile.ini.append.php):

[application/pdf]
[application/msword]
[application/vnd.ms-excel]
[application/vnd.ms-powerpoint]
[application/vnd.visio]
[application/vnd.ms-outlook]
[application/xml]
[application/rtf]
[application/vnd.oasis.opendocument.text]
[application/vnd.oasis.opendocument.presentation]
[application/vnd.oasis.opendocument.spreadsheet]
[application/vnd.oasis.opendocument.formula]
[application/zip]
[application/vnd.openxmlformats-officedocument.wordprocessingml.document]
[application/vnd.openxmlformats-officedocument.spreadsheetml.sheet]
[application/vnd.openxmlformats-officedocument.presentationml.presentation]
[application/octet-stream]
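
As a rough illustration of the conversion step described above, the following shell sketch checks a MIME type against the list and then asks Tika for plain text. It is not the extension's actual handler code: the function names is_supported_mime and tika_to_text and the TIKA_JAR variable are hypothetical, and it assumes the bundled tika.jar exposes Tika's command-line interface with the --text option and that java is on the PATH.

```shell
#!/bin/sh
# Hypothetical sketch, not the extension's real handler.

# Assumed location of the Tika jar; override via the environment.
TIKA_JAR="${TIKA_JAR:-./tika.jar}"

# Return 0 if the MIME type appears in the list of enabled types above.
is_supported_mime() {
    case "$1" in
        application/pdf|application/msword|application/xml|application/rtf) return 0 ;;
        application/vnd.ms-excel|application/vnd.ms-powerpoint) return 0 ;;
        application/vnd.visio|application/vnd.ms-outlook) return 0 ;;
        application/vnd.oasis.opendocument.text) return 0 ;;
        application/vnd.oasis.opendocument.presentation) return 0 ;;
        application/vnd.oasis.opendocument.spreadsheet) return 0 ;;
        application/vnd.oasis.opendocument.formula) return 0 ;;
        application/vnd.openxmlformats-officedocument.wordprocessingml.document) return 0 ;;
        application/vnd.openxmlformats-officedocument.spreadsheetml.sheet) return 0 ;;
        application/vnd.openxmlformats-officedocument.presentationml.presentation) return 0 ;;
        application/zip|application/octet-stream) return 0 ;;
        *) return 1 ;;
    esac
}

# Write the plain-text content of a file to stdout.
# Usage: tika_to_text <mime-type> <file>
tika_to_text() {
    mime="$1"
    file="$2"
    if ! is_supported_mime "$mime"; then
        echo "unsupported MIME type: $mime" >&2
        return 1
    fi
    # --text asks the Tika command-line tool for plain text output (assumed available in this jar).
    java -jar "$TIKA_JAR" --text "$file"
}
```

For example, tika_to_text application/pdf report.pdf > report.txt would write the extracted text to report.txt, where report.pdf is a placeholder file name. Unsupported types are rejected before java is started, which mirrors the idea of enabling the handler only for the MIME types listed.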

