Shu Li Zheng's starred repositories
trafilatura
Python & command-line tool to gather text on the Web: web crawling/scraping, extraction of text, metadata, comments
boilerpipe
Work-in-progress migration from Google Code
readability
A standalone version of the readability lib
Html2Article
Main-text extraction from HTML web pages
html-extractor
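Optimized general-purpose algorithm for extracting the main text of web pages, based on the line-block distribution function; implemented in Python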
AlphaCodium
Official implementation for the paper: "Code Generation with AlphaCodium: From Prompt Engineering to Flow Engineering"
TransformerEngine
A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper and Ada GPUs, to provide better performance with lower memory utilization in both training and inference.
List-of-Dirty-Naughty-Obscene-and-Otherwise-Bad-Words
List of Dirty, Naughty, Obscene, and Otherwise Bad Words
amber-data-prep
Data preparation code for Amber 7B LLM