송영숙's repositories
AwesomeKorean_Data
Links to Korean datasets
Chatbot_data
Chatbot_data_for_Korean
single_turn_dialogue
Data consisting only of dialogue example sentences extracted from dictionaries
2020LangconOnOff
Asking natural language processing data for directions.
Awesome_GhatGPT_News
A collection of useful ChatGPT blog posts
ConceptSpeechMood
A quality-control dataset for gpt-3.5-turbo generation results, built using word sets and speech acts
DAKSA-Domain_Adaptation_in_Korean_Speech_Act
Cross-Domain Speech Act Adaptation and Analysis
Awosome_KOITblog
Korean-language technical blogs
parsing_json
Example code for parsing 모두의 말뭉치 (the Modu Corpus)
CodeMixed-Text-Generator
This tool automatically generates grammatically valid synthetic code-mixed data by applying linguistic theories such as Equivalence Constraint Theory and Matrix Language Theory.
Ko-ATOMIC
Korean Commonsense Knowledge Graph
KoChatGPT
Korean datasets for each of the three steps of ChatGPT-style RLHF training
Korean-CommonGen
[Findings of NAACL2022] A Dog Is Passing Over The Jet? A Text-Generation Dataset for Korean Commonsense Reasoning and Evaluation
koSpeechAct
Generate natural language sentences that reflect speech acts
LLMLingua
Speeds up LLM inference and improves the model's perception of key information by compressing the prompt and KV cache, achieving up to 20x compression with minimal performance loss.
NeurIPS-2022-Submission-3358
Code for Submission 3358 at NeurIPS 2022.
project-dialogism-novel-corpus
The official repository for the Project Dialogism Novel Corpus, a dataset of annotated quotations in full-length English novels.
UnethicalQuestionsKor
Needs data augmentation with ethicalVsUnethicalQuestionsKor
XSum
Topic-Aware Convolutional Neural Networks for Extreme Summarization