淡定's repositories
vector_search
Various vector search tools
bert4torch
A PyTorch implementation modeled on bert4keras
Chinese-Word-Vectors
100+ Chinese Word Vectors: over a hundred pre-trained Chinese word vectors
data-juicer
A one-stop data processing system to make data higher-quality, juicier, and more digestible for LLMs! 🍎 🍋 🌽 ➡️ ➡️🍸 🍹 🍷
datasets
🤗 The largest hub of ready-to-use datasets for ML models with fast, easy-to-use and efficient data manipulation tools
DeepLearning-500-questions
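Deep Learning 500 Questions: explains frequently asked questions in probability, linear algebra, machine learning, deep learning, computer vision, and other popular topics in question-and-answer form, to help the author and any interested readers. The book has 18 chapters and more than 500,000 characters. Given the author's limited expertise, readers are kindly asked to point out any errors. To be continued. For collaboration, contact scutjy2015@163.com. All rights reserved; infringement will be pursued. Tan 2018.06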
DeepSpeed
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
EasyNLP
EasyNLP: A Comprehensive and Easy-to-use NLP Toolkit
DB-GPT
Revolutionizing Database Interactions with Private LLM Technology
DB-GPT-Hub
A repository that contains models, datasets, and fine-tuning techniques for DB-GPT, with the purpose of enhancing model performance, especially in Text-to-SQL.
k8s_images
k8s image repository
kserve
Serverless Inferencing on Kubernetes
MrDoc
MrDoc (觅思文档), an online document system developed in Python. It is suitable for individuals and small to medium-sized teams to manage documents, wikis, knowledge bases, and notes.
NLP-Loss-Pytorch
Implementations of several losses for imbalanced data, such as focal_loss, dice_loss, DSC Loss, and GHM Loss, among others
public-apis
A collective list of free APIs
pytorch-loss
label-smooth, amsoftmax, partial-fc, focal-loss, triplet-loss, lovasz-softmax. Maybe useful
Scorecard-Bundle
A High-level Scorecard Modeling API | Everything for scorecard modeling in one place
scrapy
Scrapy, a fast high-level web crawling & scraping framework for Python.
sentence-transformers
Multilingual Sentence & Image Embeddings with BERT
Tabular-LLM
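This project aims to collect open-source datasets for table intelligence tasks (such as table question answering and table-to-text generation), convert the raw data into instruction-tuning format, and fine-tune LLMs on it, thereby strengthening LLMs' understanding of tabular data and ultimately building large language models specialized for table intelligence tasks.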
text2vec
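text2vec, text to vector. A text vector representation tool that converts text into vector matrices. It implements text representation and text similarity models such as Word2Vec, RankBM25, Sentence-BERT, and CoSENT, and works out of the box.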
text_classification
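Text classification with rnn, lstm, gru, fasttext, textcnn, dpcnn, rnn-att, and lstm-att. Compatible with huggingface/transformers, and also supports using transformers as the word-embedding model followed by cnn, rnn, attention, and similar layers. Includes a comparison of the models.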
torchrec
PyTorch domain library for recommendation systems
unilm
Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
vocab-coverage
Analysis of language models' Chinese cognitive ability
xgboost
Scalable, Portable and Distributed Gradient Boosting (GBDT, GBRT or GBM) Library, for Python, R, Java, Scala, C++ and more. Runs on single machine, Hadoop, Spark, Dask, Flink and DataFlow