There are 17 repositories under the data-extraction topic.
Converts a PDF file into a text file while preserving the layout of the original PDF. Useful, for instance, for extracting the content of a table in a PDF file. This is a subclass of the PDFTextStripper class (from the Apache PDFBox library).
📰 Let ChatGPT Summarize Hacker News for You
🚜 Parse text and tables from PDF files.
Pure Python, lightweight, Pillow-based solver for Amazon's text captcha.
A python client for the Sypht API
Benchmarking PDF libraries
This repository provides usage examples for the Python module Newspaper3k.
Structured HTML table data extraction from URLs in Go, with almost no external dependencies
A Python utility to digitize plots.
Line segmentation algorithm for Google Vision API.
Superpipe - optimized LLM pipelines for structured data
Scraping assistant tool for editing and maintaining CSS/XPath selectors across webpages.
Python client for Reincubate's ricloud API. Yes, it works with iOS 14 & iPhone 12 backups!
A Java client for the Sypht API
file metadata parsing, done cheap
This repository contains the code that extracts a table from an image and exports it to an Excel file.
⚡️ Next-generation data transformation framework for TypeScript that puts developer experience first
Domain-specific language for extracting structured data from HTML documents
DocWire SDK: award-winning modern data processing in C++20. SourceForge Community Choice & Microsoft support. AI-driven processing. Supports nearly 100 data formats, including email boxes and OCR. Boosts efficiency in text extraction, web data extraction, data mining, and document analysis. Offline processing is possible for security and confidentiality.
GoScrapy: Harnessing Go's power for blazingly fast web scraping, inspired by Python's Scrapy framework.
Extract receipt info
Fixed Width Data Visualizer plugin for Notepad++. Turns Notepad++ into Excel for fixed-width data files. Displays cursor position data. Jumps to specific fields. Folds record blocks. Extracts data. Built-in dialogs to configure file types, record types & fields; themes & colors; and folding. Handles homogeneous, mixed & multi-line records.
Google Maps scraper with a GUI
A Golang client for the Sypht API