xiaokc

Geek Repo

Location: Beijing

xiaokc's starred repositories

stable-diffusion

A latent text-to-image diffusion model

Language: Jupyter Notebook | License: NOASSERTION | Stargazers: 68111 | Issues: 557 | Issues: 713

CLIP

CLIP (Contrastive Language-Image Pretraining): predict the most relevant text snippet given an image

Language: Jupyter Notebook | License: MIT | Stargazers: 25593 | Issues: 324 | Issues: 402

Awesome-Chinese-LLM

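A curated collection of open-source Chinese large language models, focused on smaller models that can be deployed privately and are relatively cheap to train, covering base models, vertical-domain fine-tuning and applications, datasets, and tutorials.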

annoy

Approximate Nearest Neighbors in C++/Python optimized for memory usage and loading/saving to disk

Language: C++ | License: Apache-2.0 | Stargazers: 13202 | Issues: 318 | Issues: 399

nlp_course

YSDA course in Natural Language Processing

Language: Jupyter Notebook | License: MIT | Stargazers: 9778 | Issues: 363 | Issues: 46

nlp_chinese_corpus

Large Scale Chinese Corpus for NLP

text2vec

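text2vec, text to vector. A text vector representation tool that converts text into vector matrices. It implements text representation and text similarity models including Word2Vec, RankBM25, Sentence-BERT, and CoSENT, and works out of the box.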

Language: Python | License: Apache-2.0 | Stargazers: 4457 | Issues: 31 | Issues: 150

CLUEDatasetSearch

Search all Chinese NLP datasets, plus commonly used English NLP datasets

llm-attacks

Universal and Transferable Attacks on Aligned Language Models

Language: Python | License: MIT | Stargazers: 3408 | Issues: 33 | Issues: 95

SimCSE

[EMNLP 2021] SimCSE: Simple Contrastive Learning of Sentence Embeddings https://arxiv.org/abs/2104.08821

Language: Python | License: MIT | Stargazers: 3399 | Issues: 28 | Issues: 269

DeepClustering

Methods and Implementations of Deep Clustering

baby-llama2-chinese

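A repository for pretraining from scratch and then applying SFT to a small-parameter Chinese LLaMa2; a single 24 GB GPU is enough to obtain a chat-llama2 with basic Chinese question-answering ability.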

Language: Python | License: MIT | Stargazers: 2511 | Issues: 17 | Issues: 76

gpt2-ml

GPT2 for Multiple Languages, including pretrained models. Multilingual GPT2 support, with a 1.5-billion-parameter Chinese pretrained model.

Language: Python | License: Apache-2.0 | Stargazers: 1716 | Issues: 38 | Issues: 89

Research

Novel deep learning research works with PaddlePaddle

Language: Python | License: Apache-2.0 | Stargazers: 1716 | Issues: 48 | Issues: 150

data-augmentation-review

List of useful data augmentation resources, including some less common techniques, libraries, links to GitHub repos, papers, and more.

OpenNMT-tf

Neural machine translation and sequence learning using TensorFlow

Language: Python | License: MIT | Stargazers: 1455 | Issues: 63 | Issues: 424

Awesome-LLM-Safety

A curated list of safety-related papers, articles, and resources focused on Large Language Models (LLMs). This repository aims to provide researchers, practitioners, and enthusiasts with insights into the safety implications, challenges, and advancements surrounding these powerful models.

Safety-Prompts

Chinese safety prompts for evaluating and improving the safety of LLMs.

DataAug4NLP

Collection of papers and resources for data augmentation for NLP.

CValues

Research on evaluating and aligning the values of Chinese large language models

Language: Python | License: Apache-2.0 | Stargazers: 472 | Issues: 1 | Issues: 7

Prompt-BERT

PromptBERT: Improving BERT Sentence Embeddings with Prompts

ChatPLUG

A Chinese Open-Domain Dialogue System

Language: Python | License: Apache-2.0 | Stargazers: 313 | Issues: 11 | Issues: 15

Contrastive-Clustering

Code for the paper "Contrastive Clustering" (AAAI 2021)

Language: Python | License: MIT | Stargazers: 300 | Issues: 5 | Issues: 61

siamese-pytorch

Implementation of Siamese Networks for one-shot image learning in PyTorch, with training and testing on the Omniglot dataset

awesome-neural-adaptation-in-NLP

Awesome Neural Adaptation in Natural Language Processing. A curated list. https://arxiv.org/abs/2006.00632

COLDataset

The official repository of the paper: COLD: A Benchmark for Chinese Offensive Language Detection

SafeDecoding

Official Repository for ACL 2024 Paper SafeDecoding: Defending against Jailbreak Attacks via Safety-Aware Decoding

Language: Jupyter Notebook | License: MIT | Stargazers: 92 | Issues: 1 | Issues: 6

KeywordProcesser

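A simple trie structure implemented in Python that supports adding, finding, and deleting keywords, for keyword matching and stop-word removal in Chinese text.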

Language: Python | Stargazers: 65 | Issues: 3 | Issues: 0

vae_for_text

TensorFlow implementation of "Generating Sentences from a Continuous Space"

Language: Python | Stargazers: 22 | Issues: 1 | Issues: 0