apache-tika docker-container docker-image kubernetes k8s extracts-metadata text-to-speech document-to-text extract-text document-to-text-ui hacktoberfest

Apache Tika Implementation

The Apache Tika™ toolkit detects and extracts metadata and text from over a thousand different file types (such as PPT, XLS, and PDF). All of these file types can be parsed through a single interface, making Tika useful for search engine indexing, content analysis, translation, and much more.

Prerequisites

Kubernetes Cluster >= 1.26
ArgoCD (Optional)

Deployment

Kubernetes Deployment

Create the namespace via kubectl create ns web. Then, assuming you've checked out this repo, apply the manifests:

kubectl kustomize deployment/ | kubectl apply -f -

Or, to deploy via ArgoCD:

kubectl apply -f deployment/argocd/application.yml

NOTE: Remember to update the Ingress hostname.

Take it for a test drive:

Via CLI:

First, forward the service to your machine via kubectl port-forward -n web svc/tika-ui 8080

curl -d @test/url.json http://localhost:8080/ -H 'Content-Type: application/json'
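The same request can be sent from a script. Below is a minimal Python sketch that mirrors the curl command above using only the standard library. It assumes the port-forward is still running on localhost:8080; the helper name build_request is illustrative and not part of this repo. The response format is not documented here, so the sketch just prints whatever the service returns.

```python
import json
import urllib.request
from pathlib import Path


def build_request(payload_path, base_url="http://localhost:8080/"):
    """Build a POST request equivalent to the curl command above.

    build_request is a hypothetical helper for this sketch, not part of the repo.
    """
    body = Path(payload_path).read_bytes()
    # Fail early on malformed JSON instead of sending it to the service.
    json.loads(body)
    return urllib.request.Request(
        base_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # Assumes `kubectl port-forward -n web svc/tika-ui 8080` is running.
    request = build_request("test/url.json")
    with urllib.request.urlopen(request, timeout=30) as response:
        print(response.status)
        print(response.read().decode("utf-8", errors="replace"))
```

Run it from the repository root so that the relative path test/url.json resolves.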

Or, via Web UI:

Using a browser visit:

http://localhost:8080/

About

Apache Tika - Toolkit detects and extracts metadata

MIT License
