Jeffrey Wang's repositories
Airflow-PaddleOCR-test
Test scripts for running PaddleOCR in an Airflow Docker architecture.
anthropic-cookbook
A collection of notebooks/recipes showcasing some fun and effective ways of using Claude.
awesome-data-centric-ai
Open-Source Software, Tutorials, and Research on Data-Centric AI 🤖
cleanlab
The standard data-centric AI package for data quality and machine learning with messy, real-world data and labels.
data-engineering-zoomcamp
Free Data Engineering course!
extractnet
A fork of Dragnet that also extracts author, headline, date, and keywords from context, with built-in metadata extraction, all in one package.
justsubs
Download subtitles from YouTube as plain text.
Linux-tools
A few Linux tools and useful scripts.
neuralforecast
Scalable and user-friendly neural 🧠 forecasting algorithms.
nist-crc-2023
NIST Collaborative Research Cycle on Synthetic Data
POC-LSTM-sigmoid-labelling
Proof of concept for signal labelling using a PyTorch LSTM neural network.
ragas
SOTA metrics for evaluating Retrieval Augmented Generation (RAG) pipelines
RepoToText
Turn an entire GitHub repo into a single organized .txt file to use with LLMs (GPT-4, Claude Opus, Gemini, etc.).
Simplified_Langchain_proofofconcept
Quick test and overview demo of a simplified, scalable, but service-dependent question-answering system using Astra DB and LangChain, enhanced by vector search.
text-generation-webui
A Gradio web UI for Large Language Models. Supports transformers, GPTQ, AWQ, EXL2, llama.cpp (GGUF), Llama models.
Youtube-Video-Summarizer
TESTING: Takes a YouTube link, converts the video to text, and passes it through ChatGPT to summarize the key points. Made for analysis videos, podcasts, etc.