ametnes / nesis

Your AI-Powered Enterprise Knowledge Partner. Designed to be used at scale, ingesting large volumes of documents in formats such as PDF, DOCX, XLSX, PNG, JPG, JPEG, TIFF, MP3 and MP4. Integrates with S3, Windows Shares, Google Drive and more.

Home Page: https://ametnes.github.io/nesis

[Feature] Extract text from documents

mawandm opened this issue

Description
As a user, I'd like to extract text from the document.

Detail
Text extraction is useful as an intermediate step in document ingestion. It allows for other processes such as:

  1. Data cleansing
  2. Data exclusion based on an exclusion list.
  3. Approval workflows
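
To make the first two steps concrete, here is a minimal sketch in Python of cleansing extracted text and then dropping documents that match an exclusion list. It is not taken from the nesis codebase: the function names (`cleanse`, `is_excluded`, `prepare_for_ingestion`) are hypothetical, and simple case-insensitive substring matching is only an assumption about how an exclusion list might work.

```python
import re
from typing import Dict, List, Tuple


def cleanse(text: str) -> str:
    """Strip control characters and collapse runs of whitespace."""
    text = re.sub(r"[\x00-\x08\x0b\x0c\x0e-\x1f]", "", text)
    return re.sub(r"\s+", " ", text).strip()


def is_excluded(text: str, exclusion_list: List[str]) -> bool:
    """Assumption: a document is excluded if it contains any listed term."""
    lowered = text.lower()
    return any(term.lower() in lowered for term in exclusion_list)


def prepare_for_ingestion(
    extracted: Dict[str, str], exclusion_list: List[str]
) -> List[Tuple[str, str]]:
    """Return (document_id, cleansed_text) pairs that pass the exclusion list."""
    result = []
    for document_id, text in extracted.items():
        cleaned = cleanse(text)
        if cleaned and not is_excluded(cleaned, exclusion_list):
            result.append((document_id, cleaned))
    return result
```

An approval workflow could then operate on the returned pairs before anything is ingested.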

Acceptance Criteria

  1. An API endpoint, /v1/extractions/text, in the RAG microservice.
  2. An extraction path added to the API microservice during document processing.
  3. Extracted text persisted to an external SQL datasource.
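
As a rough illustration of the third criterion, the sketch below persists extracted text to a SQL table. Everything in it is an assumption for discussion: the `extracted_text` table, its columns, and the helper names are hypothetical, and SQLite from the Python standard library stands in for whatever external SQL datasource is eventually chosen.

```python
import sqlite3
from typing import Optional

# Hypothetical schema: one row of extracted text per document.
SCHEMA = """
CREATE TABLE IF NOT EXISTS extracted_text (
    document_id TEXT PRIMARY KEY,
    content     TEXT NOT NULL
)
"""


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open the datastore and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def save_extracted_text(conn: sqlite3.Connection, document_id: str, content: str) -> None:
    """Insert the text, replacing any earlier extraction of the same document."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO extracted_text (document_id, content) VALUES (?, ?)",
            (document_id, content),
        )


def load_extracted_text(conn: sqlite3.Connection, document_id: str) -> Optional[str]:
    """Return the stored text, or None if the document was never extracted."""
    row = conn.execute(
        "SELECT content FROM extracted_text WHERE document_id = ?",
        (document_id,),
    ).fetchone()
    return row[0] if row else None
```

Keying on the document ID means re-running extraction overwrites the earlier text instead of creating duplicates; whether that is the desired behaviour is an open design question.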