There are 7 repositories under the unstructured-data topic.
The open-source tool for building high-quality datasets and computer vision models
Towhee is a framework that is dedicated to making neural data processing pipelines simple and fast.
🔮 Instill Core is a full-stack AI infrastructure tool for data, model and pipeline orchestration, designed to streamline every aspect of building versatile AI-first applications
Handles all kinds of unstructured data, with applications such as reverse image search, audio search, molecular search, video analysis, question-answering systems, NLP, etc.
Neo4j graph construction from unstructured data using LLMs
A curated list of resources for the Document Understanding (DU) topic
Interact, analyze and structure massive text, image, embedding, audio and video datasets
A multi-modal vector database that supports upserts and vector queries using unified SQL (MySQL-compatible) on structured and unstructured data, while meeting the requirements of high concurrency and ultra-low latency.
Low-code ETL for structured and unstructured data. Generates Python code you can deploy anywhere.
No-code LLM platform for launching APIs and ETL pipelines that structure unstructured documents
Embedding Studio is a framework that allows you to transform your Vector Database into a feature-rich Search Engine.
Python implementation of jordansissel's grok regular expression library
Enforce structured output from LLMs 100% of the time
Radient turns many data types (not just text) into vectors for similarity search, RAG, regression analysis, and more.
Home of the AI workforce - Multi-agent system, AI agents & tools
Accurate, private and configurable document retrieval LLM
Programming language for symbolic computation with an unusual combination of pattern matching features: tree patterns, associative patterns, and expressions embedded in patterns.
A Jupyter notebook that uses the Watson Visual Recognition and Natural Language Understanding services to enrich Facebook Analytics and uses Cognos Dashboard Embedded to explore and visualize the results in Watson Studio
📺 Instill Console for 🔮 Instill Core: https://github.com/instill-ai/instill-core
Adansons Base is a data programming tool for error analysis of training results. It organizes the metadata of unstructured data and creates and manages datasets. It makes dataset creation more efficient, uses training results to help find low-quality data, and thereby improves AI performance.
Building Knowledge Graphs from Unstructured Text
⌨️ Instill CLI for 🔮 Instill Core: https://github.com/instill-ai/instill-core
⚗️ Instill Model contains components for AI model orchestration
⇋ A REST/gRPC server for Instill VDP API service
⇋ A REST/gRPC server for Instill Model API service
💙 Unstructured Data Connectors for Haystack 2.0
RL3 examples repository (information extraction, NER, NLP, web and text mining, etc.).
Extract tabular information from scanned documents (PDF to CSV)
How do we process data in different formats, such as DOCX and PDF, and generate insights that can be linked with structured data in a database? This pattern helps establish relations between structured and unstructured data to generate recommendations using Watson NLU and Watson Studio.
🔮 Instill Core contains components for supporting Instill VDP and Instill Model
The infoZilla unstructured software engineering data mining tool. It can find and extract source code regions, patches, stack traces, enumerations and itemizations from discussion threads.
A machine learning tool for creating training datasets quickly and easily using a smart Chrome extension