Repositories under the tika-python topic:
Tika-Python is a Python binding to the Apache Tika™ REST services that allows Tika to be called natively from Python.
Tika-Similarity uses the Tika-Python package (the Python binding for Apache Tika) to compute file similarity based on metadata features.
An interactive image-similarity and visual search-and-retrieval application.
A suite of Machine Learning / Deep Learning Dockerfiles to allow Apache Tika to extract objects and to produce textual captions for images and video
The Distributed Release Audit Tool (DRAT) for code analysis and verification.
🚴‍♂️⛷ Data lake: performance tuning for text extraction from a large number of files.
tika-python packaged for Debian GNU/Linux and Ubuntu Linux.
Extracting information from PDF files.
A web application that predicts a document's type from its content 📝
A Python module for extracting text from URLs and PDFs.
USC DSCI 550 Assignment 3 - Spring 2021
This project showcases the application of LDA topic modelling and k-means clustering to extract information from PDF documents.
A compilation of my coding-practice notebooks covering topics from basic Python to web scraping and pandas.