llm-datasets

The following 12 repositories are listed under the llm-datasets topic.

neo4j-labs / text2cypher
collection of text2cypher datasets, evaluations, and finetuning instructions
cypher cypher-query-language graph llm llm-datasets llm-training llms neo4j text2cypher
Language: Jupyter Notebook · Stars: 139
dsdanielpark / open-llm-datasets
Repository for organizing datasets and papers used in Open LLM.
datasets large-language-models llm llm-datasets llm-training natural-language-processing
Stars: 91
discus-labs / discus
A data-centric AI package for ML/AI. Get the best high-quality data for the best results. Discord: https://discord.gg/t6ADqBKrdZ
fine-tuning fine-tuning-llm gpt gpt-4 huggingface-transformers large-language-models llm-datasets llm-training llms ner-data openai python synthetic-data synthetic-dataset-generation
Language: Python · Stars: 63
asimsinan / LLM-Research
A collection of LLM-related papers, theses, tools, datasets, courses, open-source models, and benchmarks.
arxiv-papers buyuk-dil-modelleri large-language-models llm llm-benchmarking llm-datasets llm-frameworks llm-research llm-theses llm-tools llms
Language: Python · Stars: 37
altunenes / rustysozluk
Efficiently fetch eksisozluk.com entries and perform sentiment analysis on them (Turkish only) using Rust.
duyguanalizi eksi-sozluk eksisozluk llm-datasets llm-training reqwest rust rust-lang rust-scraping scraper sentiment-analysis turkish webscraping
Language: Rust · Stars: 7
DefinetlyNotAI / LLM_Data
Source code from many well-known repositories, collected in this repo as plain local documents for training code AI models.
c code-examples cpp cuda data data-dum jupyter-notebook llm llm-code llm-datasets programming-data programming-data-sets python3
Language: Python · Stars: 3
arian-askari / SOLID
Synthetically Generating Intent-Aware Information-Seeking Dialogues! Useful for various tasks, such as training and evaluating user intent predictors, with the option to train or evaluate on real human dialogues. The backbone LLM of SOLID is Zephyr-7b-beta.
conversational-ai dataset-generation intent-classification llm-datasets llm-inference llm-training llm-conversations llm-dialogs solid intent-aware-conversation-generation solid-rl zephyr-7b-beta
Language: Python · Stars: 2
tiddly-gittly / TiddlyWiki-LLM-dataset
WikiText syntax dataset generation pipeline and open dataset for auto UI generation in TiddlyWiki. (WIP)
dataset llm tiddlywiki wikitext llm-datasets llm-training
Language: TypeScript · Stars: 2
redblock-ai / parrot-python
PARROT (Performance Assessment of Reasoning and Responses On Trivia) is a novel benchmarking framework designed to evaluate Large Language Models (LLMs) on real-world, complex, and ambiguous QA tasks.
benchmarking-framework llm-benchmarking llm-datasets llm-inference llm-qa-document
Language: Python · Stars: 1
aloobun / basedUX
A minimal dataset consisting of 363 Human & Assistant dialogs.
dataset llm-datasets
Stars: 0
aloobun / ccpem-modified
A modified dataset consisting of English dialogs between a user and an assistant discussing movie preferences in natural language.
dataset llm-datasets
Stars: 0
jsurrea / LLM-Latino
Collection of ETL scripts used to create a dataset of text in Spanish to train Large Language Models.
etl-pipeline google-cloud-platform llm-datasets python web-scraping
Language: Python