Unstructured (Unstructured-IO)

Unstructured-IO

User data from GitHub: https://github.com/Unstructured-IO

0 followers, 0 following, 0 stars

Home Page:https://unstructured.io/

GitHub:@Unstructured-IO

Twitter:@UnstructuredIO

Unstructured's repositories

unstructured

Convert documents to structured data effortlessly. Unstructured is an open-source ETL solution for transforming complex documents into clean, structured formats for language models. Visit our website to learn more about our enterprise-grade Platform product for production-grade workflows, partitioning, enrichments, chunking, and embedding.

Language:HTML | License:Apache-2.0 | Stargazers:13144 | Issues:68 | Issues:1227
Language:Python | License:Apache-2.0 | Stargazers:830 | Issues:27 | Issues:142

pipeline-sec-filings

Preprocessing pipeline notebooks and API supporting text extraction from SEC documents

Language:Jupyter Notebook | License:Apache-2.0 | Stargazers:146 | Issues:20 | Issues:7

unstructured-python-client

A Python client for the Unstructured Platform API

Language:Python | License:MIT | Stargazers:107 | Issues:18 | Issues:41

unstructured-js-client

A JavaScript/TypeScript client for the Unstructured Platform API

Language:TypeScript | License:MIT | Stargazers:57 | Issues:22 | Issues:20

unstructured.PaddleOCR

Awesome multilingual OCR toolkits based on PaddlePaddle (a practical, ultra-lightweight OCR system that supports recognition of 80+ languages, provides data annotation and synthesis tools, and supports training and deployment on server, mobile, embedded, and IoT devices)

Language:Python | License:Apache-2.0 | Stargazers:40 | Issues:1 | Issues:0
Language:Jupyter Notebook | Stargazers:37 | Issues:11 | Issues:0

pipeline-paddleocr

Pipeline for converting PDFs to raw text with PaddleOCR

Language:Jupyter Notebook | License:Apache-2.0 | Stargazers:23 | Issues:21 | Issues:1

danswer

Gen-AI Chat for Teams - Think ChatGPT if it had access to your team's unique knowledge.

Language:Python | License:NOASSERTION | Stargazers:11 | Issues:0 | Issues:0

langchain

⚡ Building applications with LLMs through composability ⚡

Language:Python | License:MIT | Stargazers:8 | Issues:2 | Issues:0

pipeline-oer

Pipeline for extracting information from Army OERs

Language:Jupyter Notebook | License:Apache-2.0 | Stargazers:8 | Issues:21 | Issues:1
Language:Python | License:Apache-2.0 | Stargazers:8 | Issues:21 | Issues:0

docs

Documentation for all Unstructured products and libraries

Language:HTML | Stargazers:7 | Issues:0 | Issues:0
Language:Python | License:Apache-2.0 | Stargazers:6 | Issues:2 | Issues:0

base-images

Store Dockerfiles and Packer configs for images to use as a base to build upon

Language:Shell | License:Apache-2.0 | Stargazers:5 | Issues:3 | Issues:1

unstructured.pytesseract

A Python wrapper for Google Tesseract

Language:Python | License:Apache-2.0 | Stargazers:4 | Issues:1 | Issues:0
Language:Jupyter Notebook | Stargazers:2 | Issues:0 | Issues:0

azure-ai-hub-gateway-solution-accelerator

Reference architecture that provides a set of guidelines and best practices for implementing a central AI API gateway to empower various line-of-business units in an organization to leverage Azure AI services

Language:Bicep | License:MIT | Stargazers:1 | Issues:0 | Issues:0

model-cards

FedRAMP formatted model cards

Stargazers:1 | Issues:0 | Issues:0

rag-over-hybrid-data-sources

Pipeline from two sources (S3, Elasticsearch) to a RAG DB.

Language:Jupyter Notebook | Stargazers:1 | Issues:0 | Issues:0

js-client-batch

JS Client Batch Processing

Language:JavaScript | Stargazers:0 | Issues:0 | Issues:0

aws-blog-post-example

Script to accompany the AWS blog post on unstructured data ETL with the Unstructured Ingest library

Language:Python | License:Apache-2.0 | Stargazers:0 | Issues:3 | Issues:0

pairing-technical-challenge

Pairing Technical Challenge

Language:TypeScript | Stargazers:0 | Issues:2 | Issues:0
Language:Jupyter Notebook | Stargazers:0 | Issues:0 | Issues:0

wolfi-dev-os

Main package repository for production Wolfi images

Language:C | License:NOASSERTION | Stargazers:0 | Issues:0 | Issues:0