haiderasad / tabular_data_extraction

A repo that uses document table extraction models and serves them as a standalone API

Image Processing API for Table Extraction 📊🔍

Introduction 🌟

This API extracts tabular data from images using EasyOCR and the table detection models from microsoft/table-transformer. It is suited to digitizing documents and automating data extraction from a variety of table formats.

Installation 🛠

Using Conda

If you manage your Python environments with Conda, create a new environment and install the required packages as follows:

  1. Create a new Conda environment:
     conda create --name table_extraction python=3.8
  2. Activate the Conda environment:
     conda activate table_extraction
  3. Install the required packages:
     pip install -r requirements.txt

API Usage 🚀

Start the Flask server by running:

python main.py

To process an image, send a POST request with a base64-encoded image string. Use the following curl command as an example:

curl -X POST -H "Content-Type: application/json" -d '{"image_base64": "<base64_string>"}' http://localhost:5000/process-image

Alternatively, use inference.py to process an image.
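For callers who prefer Python over curl, the same request can be sketched with the standard library alone. This is a minimal sketch and not the repository's inference.py: the helper names `build_payload` and `process_image` are hypothetical, and the URL assumes the server is running locally on port 5000 as in the curl example.

```python
import base64
import json
import urllib.request

# Assumed to match the curl example; adjust host and port as needed.
API_URL = "http://localhost:5000/process-image"


def build_payload(image_bytes: bytes) -> bytes:
    """Encode raw image bytes as the JSON body the endpoint expects."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return json.dumps({"image_base64": encoded}).encode("utf-8")


def process_image(image_path: str, url: str = API_URL) -> dict:
    """POST an image file to the API and return the parsed JSON response."""
    with open(image_path, "rb") as f:
        body = build_payload(f.read())
    request = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

With the server running, calling `process_image("table.png")` (a placeholder path, not a file shipped with the repo) should return the response described in the API reference as a Python dict.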

API Reference 📚

Endpoint: /process-image

  • Method: POST

  • Body: JSON object containing a base64-encoded image string.

    Key: 'image_base64'

  • Response: JSON object with the extracted table data and the cropped table image, base64-encoded.

    Keys: 'json_result' and 'detected_tables_base64'
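The response keys listed above can be consumed along the lines of the sketch below. The exact shape of the values is not documented here, so this example assumes that 'detected_tables_base64' holds either a single base64 string or a list of them. The helper name `save_detected_tables` and the output file names are hypothetical.

```python
import base64
from pathlib import Path


def save_detected_tables(response: dict, out_dir: str = "tables") -> list:
    """Decode the cropped table images in an API response and write them to disk.

    Assumption: 'detected_tables_base64' is a base64 string or a list of
    base64 strings. Check the actual response before relying on this.
    """
    images = response.get("detected_tables_base64", [])
    if isinstance(images, str):
        images = [images]

    target = Path(out_dir)
    target.mkdir(parents=True, exist_ok=True)

    paths = []
    for index, encoded in enumerate(images):
        # The image format is not stated, so a neutral extension is used.
        path = target / f"table_{index}.bin"
        path.write_bytes(base64.b64decode(encoded))
        paths.append(path)
    return paths
```

The extracted table data under 'json_result' is already plain JSON, so it can be read directly from the parsed response without further decoding.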

iOS Conversion Compatibility 🍎

Hugging Face models can be deployed on iOS devices through optimization and conversion techniques such as quantization, pruning, and conversion to Core ML via an intermediary format like ONNX. Apple's Core ML framework runs machine learning models on-device, but careful optimization and testing are needed to confirm that a model performs acceptably on mobile hardware.

Converting Transformer models for iOS typically involves the following steps:

  1. Export to ONNX: First, export the PyTorch models to ONNX format, a popular open model format compatible across different ML frameworks.
  2. Optimize the ONNX Model: Use the ONNX Runtime to optimize the model for inference efficiency.
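  3. Convert to Core ML: Use the coremltools library to convert the optimized ONNX model to Core ML format. Recent coremltools releases no longer include the ONNX converter, so this step may require an older coremltools version or a direct PyTorch-to-Core ML conversion.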
  4. Integrate into iOS App: Finally, integrate the Core ML model into your iOS app using Xcode and the Core ML framework.
