csenger / python-tika-with-deps

Python wrapper for Apache Tika, made to be easy_installed

Home Page: http://tika.apache.org/

python-tika - Python bindings for Apache Tika

Requirements

  • Java >= 1.5
  • JCC

Installation

$ python setup.py build
$ python setup.py install

Usage

To parse a document with Tika's AutoDetectParser, start the JVM once and then call the parser module:

import tika
tika.initVM()

from tika import parser

print(parser.from_buffer("<html><head><title>Hello world</title></head><body>Hello Cruel World</body></html>"))
# Or parse directly from a file:
# print(parser.from_file("/tmp/foo.doc"))

The call returns a dict holding the extracted text and the document metadata, for example:

{'content': u'Hello Cruel World',
 'metadata': {u'Content-Encoding': u'ISO-8859-1',
              u'Content-Type': u'text/html',
              u'title': u'Hello world'}
}
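
The dict shown above can be consumed like any other Python mapping. The snippet below is a minimal sketch that assumes only the 'content' and 'metadata' keys from that example. The helper describe_result is hypothetical and not part of python-tika, and the sample dict stands in for a real parser result so the snippet runs without Java or JCC installed.

```python
def describe_result(parsed):
    """Build a one-line summary from a parse result dict.

    Hypothetical helper for illustration; not part of python-tika.
    """
    metadata = parsed.get('metadata', {})
    # Fall back to placeholders, since not every document carries these fields.
    content_type = metadata.get('Content-Type', 'unknown')
    title = metadata.get('title', '(untitled)')
    # 'content' may be missing or None, so normalize it to a string first.
    text = (parsed.get('content') or '').strip()
    return '%s [%s]: %s' % (title, content_type, text)


# Stand-in for the value returned by parser.from_buffer(...) or
# parser.from_file(...); copied from the example output above.
sample = {
    'content': u'Hello Cruel World',
    'metadata': {
        u'Content-Encoding': u'ISO-8859-1',
        u'Content-Type': u'text/html',
        u'title': u'Hello world',
    },
}

print(describe_result(sample))
```

Reading metadata with .get() and a default keeps the code from raising KeyError when a field such as 'title' is absent from a given document.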
