Zekai Zhou (zzk0910)

Company: National University of Singapore

Location: Singapore

Zekai Zhou's starred repositories

DDParser

Baidu's open-source dependency parsing system

Language: Python | License: Apache-2.0 | Stargazers: 973 | Issues: 0

GloVe_Chinese_word_embedding

Pre-trained GloVe Chinese word embeddings built from the Chinese Wikipedia corpus

Language: Shell | Stargazers: 63 | Issues: 0

Chinese-Word-Vectors

100+ pre-trained Chinese word vectors

Language: Python | License: Apache-2.0 | Stargazers: 11734 | Issues: 0

LangueOne

Exercises: text mining on Toutiao's open-source data

Language: Python | Stargazers: 84 | Issues: 0

bm25_pt

Minimal PyTorch implementation of BM25 (with sparse tensors)

Language: Python | License: MIT | Stargazers: 80 | Issues: 0

rank_bm25

A Collection of BM25 Algorithms in Python

Language: Python | License: Apache-2.0 | Stargazers: 955 | Issues: 0

UltraChat

Large-scale, Informative, and Diverse Multi-round Chat Data (and Models)

Language: Python | License: MIT | Stargazers: 2193 | Issues: 0

PoetryLibrary

Chinese classical poetry database: over 820,000 poems in total (827,108), in CSV format, Simplified Chinese, ordered by number

License: MIT | Stargazers: 60 | Issues: 0

Firefly

Firefly: a training tool for large models, supporting Qwen2, Yi1.5, Phi-3, Llama3, Gemma, MiniCPM, Yi, Deepseek, Orion, Xverse, Mixtral-8x7B, Zephyr, Mistral, Baichuan2, Llama2, Llama, Qwen, Baichuan, ChatGLM2, InternLM, Ziya2, Vicuna, Bloom, and other large models

Language: Python | Stargazers: 5493 | Issues: 0

TigerBot

TigerBot: A multi-language multi-task LLM

Language: Python | License: Apache-2.0 | Stargazers: 2226 | Issues: 0

LoRA

Code for loralib, an implementation of "LoRA: Low-Rank Adaptation of Large Language Models"

Language: Python | License: MIT | Stargazers: 10086 | Issues: 0

Megatron-LM

Ongoing research training transformer models at scale

Language: Python | License: NOASSERTION | Stargazers: 9693 | Issues: 0

Megatron-DeepSpeed

Ongoing research training transformer language models at scale, including: BERT & GPT-2

Language: Python | License: NOASSERTION | Stargazers: 1300 | Issues: 0

examples

A set of examples around pytorch in Vision, Text, Reinforcement Learning, etc.

Language: Python | License: BSD-3-Clause | Stargazers: 22149 | Issues: 0

DeepSpeed

DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.

Language: Python | License: Apache-2.0 | Stargazers: 34347 | Issues: 0

awesome-llm-and-aigc

🚀🚀🚀 A collection of awesome public projects about Large Language Models, Vision Foundation Models, and AI-Generated Content.

Stargazers: 503 | Issues: 0

py-googletrans

(unofficial) Googletrans: Free and Unlimited Google Translate API for Python. Translates totally free of charge.

Language: Python | License: MIT | Stargazers: 3821 | Issues: 0

JioNLP

A Chinese NLP preprocessing and parsing package: accurate, efficient, and easy to use. www.jionlp.com

Language: Python | License: Apache-2.0 | Stargazers: 3213 | Issues: 0

dragnet

Just the facts -- web page content extraction

Language: Python | License: MIT | Stargazers: 1238 | Issues: 0

newspaper

newspaper3k is a news, full-text, and article metadata extraction library for Python 3. Advanced docs:

Language: Python | License: MIT | Stargazers: 13980 | Issues: 0

lxml

The lxml XML toolkit for Python

Language: Python | License: NOASSERTION | Stargazers: 2637 | Issues: 0

cc_net

Tools to download and clean up Common Crawl data

Language: Python | License: MIT | Stargazers: 950 | Issues: 0

fastText

Library for fast text representation and classification.

Language: HTML | License: MIT | Stargazers: 25795 | Issues: 0

LLaMA-Megatron-DeepSpeed

Ongoing research training transformer language models at scale, including: BERT & GPT-2

Language: Python | License: NOASSERTION | Stargazers: 66 | Issues: 0

Chinese-Names-Corpus

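Chinese personal-name corpus and name generator: Chinese full names, surnames, given names, forms of address, Japanese names, transliterated names, and English names. Useful for Chinese word segmentation and person-name entity recognition.

License: Apache-2.0 | Stargazers: 3920 | Issues: 0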

opencompass

OpenCompass is an LLM evaluation platform, supporting a wide range of models (Llama3, Mistral, InternLM2, GPT-4, LLaMa2, Qwen, GLM, Claude, etc.) over 100+ datasets.

Language: Python | License: Apache-2.0 | Stargazers: 3556 | Issues: 0

gpt-neo

An implementation of model parallel GPT-2 and GPT-3-style models using the mesh-tensorflow library.

Language: Python | License: MIT | Stargazers: 8193 | Issues: 0

hh-rlhf

Human preference data for "Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback"

License: MIT | Stargazers: 1536 | Issues: 0

Cornucopia-LLaMA-Fin-Chinese

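Cornucopia (聚宝盆): a series of open-source, commercially usable Chinese financial LLMs, together with an efficient, lightweight training framework for domain-specific LLMs (pretraining, SFT, RLHF, quantization, etc.)

Language: Python | License: Apache-2.0 | Stargazers: 579 | Issues: 0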

Luotuo-Chinese-LLM

Luotuo (骆驼): Open-sourced Chinese language models. Developed by Qiyuan Chen (陈启源) @ Central China Normal University, Lulu Li (李鲁鲁) @ SenseTime, and Ziang Leng (冷子昂) @ SenseTime

Language: Jupyter Notebook | License: Apache-2.0 | Stargazers: 3627 | Issues: 0