Heng-Shiou Sheu's starred repositories
LLMs-from-scratch
Implementing a ChatGPT-like LLM in PyTorch from scratch, step by step
Stirling-PDF
#1 Locally hosted web application that allows you to perform various operations on PDF files
Chinese_spelling_Correction
Chinese Grammar Error and Spelling Error Correction System
AdvancedLiterateMachinery
A collection of original, innovative ideas and algorithms towards Advanced Literate Machinery. This project is maintained by the OCR Team in the Language Technology Lab, Tongyi Lab, Alibaba Group.
Large_dataset_translator
Translate large datasets into any language with the Google Translate API and multithreaded processing; no key required!
Adaptive-MT-LLM-Fine-tuning
Fine-tuning Mistral LLM for Adaptive Machine Translation
mt-metrics-eval
Tools for evaluating the performance of MT metrics on data from recent WMT metrics shared tasks.
VoiceStreamAI
Near-Realtime audio transcription using self-hosted Whisper and WebSocket in Python/JS
Transformers-Tutorials
This repository contains demos I made with the Transformers library by HuggingFace.
transformers_tasks
⭐️ NLP Algorithms with transformers lib. Supporting Text-Classification, Text-Generation, Information-Extraction, Text-Matching, RLHF, SFT etc.
aya-annotations-ui
Web UI & Backend for Data Annotations in Aya
instructor
Structured outputs for LLMs
how-to-train-tokenizer
How to train an LLM tokenizer
MoneyPrinter
Automate Creation of YouTube Shorts using MoviePy.
OneRingTranslator
Simple REST service for translating text, with plugin support. Automatically calculates BLEU/COMET metrics of translation quality.
self-translate
Do Multilingual Language Models Think Better in English?
List-of-Dirty-Naughty-Obscene-and-Otherwise-Bad-Words
List of Dirty, Naughty, Obscene, and Otherwise Bad Words
inference
Replace OpenAI GPT with another LLM in your app by changing a single line of code. Xinference gives you the freedom to use any LLM you need. With Xinference, you're empowered to run inference with any open-source language models, speech recognition models, and multimodal models, whether in the cloud, on-premises, or even on your laptop.
augmentoolkit
Convert Compute And Books Into Instruct-Tuning Datasets