dlmacedo

David Macêdo, PhD's starred repositories

llm-course

Course to get into Large Language Models (LLMs) with roadmaps and Colab notebooks.

Language: Jupyter Notebook | Apache-2.0 | 32733 340 61

ollama

Get up and running with Llama 2, Mistral, and other large language models locally.

Language: Go | MIT | 32412 226 1238

stanford_alpaca

Code and documentation to train Stanford's Alpaca models, and generate the data.

Language: Python | Apache-2.0 | 29018 341 267

tinygrad

You like pytorch? You like micrograd? You love tinygrad! ❤️

Language: Python | MIT | 24592 267 628

vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

Language: Python | Apache-2.0 | 20742 195 2962

LLMs-from-scratch

Implementing a ChatGPT-like LLM in PyTorch from scratch, step by step

Language: Jupyter Notebook | NOASSERTION | 18956 220 41

gpt-crawler

Crawl a site to generate knowledge files to create your own custom GPT from a URL

Language: TypeScript | ISC | 18026 118 110

:mag: LLM orchestration framework to build customizable, production-ready LLM applications. Connect components (models, vector DBs, file converters) to pipelines or agents that can interact with your data. With advanced retrieval methods, it's best suited for building RAG, question answering, semantic search or conversational agent chatbots.

Language: Python | Apache-2.0 | 14224 129 3278

DeepLearningExamples

State-of-the-Art Deep Learning scripts organized by models - easy to train and deploy with reproducible accuracy and performance on enterprise-grade infrastructure.

Language: Jupyter Notebook | 12795 296 820

dspy

DSPy: The framework for programming—not prompting—foundation models

Language: Python | MIT | 12540 116 507

Qwen

The official repo of Qwen (通义千问) chat & pretrained large language model proposed by Alibaba Cloud.

Language: Python | Apache-2.0 | 11957 96 1018

latent-diffusion

High-Resolution Image Synthesis with Latent Diffusion Models

Language: Jupyter Notebook | MIT | 10874 97 333

taipy

Turns Data and AI algorithms into production-ready web applications in no time.

Language: Python | Apache-2.0 | 9456 61 570

streaming-llm

[ICLR 2024] Efficient Streaming Language Models with Attention Sinks

Language: Python | MIT | 6299 61 76

BERTopic

Leveraging BERT and c-TF-IDF to create easily interpretable topics.

Language: Python | MIT | 5686 52 1625

gpt-fast

Simple and efficient pytorch-native transformer text generation in <1000 LOC of python.

Language: Python | BSD-3-Clause | 5267 61 87

amazon-dsstne

Deep Scalable Sparse Tensor Network Engine (DSSTNE) is an Amazon developed library for building Deep Learning (DL) machine learning (ML) models

Language: C++ | Apache-2.0 | 4414 341 108

Anima

33B Chinese LLM, DPO QLORA, 100K context, AirLLM 70B inference with single 4GB GPU

Language: Jupyter Notebook | Apache-2.0 | 3401 98 131

KeyBERT

Minimal keyword extraction with BERT

Language: Python | MIT | 3277 32 191

Top2Vec

Top2Vec learns jointly embedded topic, document and word vectors.

Language: Python | BSD-3-Clause | 2864 38 327

ColBERT

ColBERT: state-of-the-art neural search (SIGIR'20, TACL'21, NeurIPS'21, NAACL'22, CIKM'22, ACL'23, EMNLP'23)

Language: Python | MIT | 2587 41 250

RAGatouille

Easily use and train state of the art late-interaction retrieval methods (ColBERT) in any RAG pipeline. Designed for modularity and ease-of-use, backed by research.

Language: Python | Apache-2.0 | 2302 22 150

sparrow

Data processing with ML and LLM

Language: Python | GPL-3.0 | 2149 35 50

dialoqbase

Create chatbots with ease

Language: TypeScript | MIT | 1491 25 152

ATLAS

A principled instruction benchmark on formulating effective queries and prompts for large language models (LLMs). Our paper: https://arxiv.org/abs/2312.16171

Language: Python | Apache-2.0 | 816 20 7

tab-ddpm

[ICML 2023] The official implementation of the paper "TabDDPM: Modelling Tabular Data with Diffusion Models"

Language: Python | MIT | 338 6 33

URIAL

Language: Python | Apache-2.0 | 256 1 7

awesome-graph-generation

233 60

boolformer

Language: Python | MIT | 157 5 1

ForestDiffusion

Generating and Imputing Tabular Data via Diffusion and Flow XGBoost Models

Language: Python | 108 7 10