Repositories under the llm-datasets topic:
A collection of text2cypher datasets, evaluations, and fine-tuning instructions
Repository for organizing datasets and papers used in Open LLM.
A data-centric AI package for ML/AI. Get the best high-quality data for the best results. Discord: https://discord.gg/t6ADqBKrdZ
A collection of LLM-related papers, theses, tools, datasets, courses, open-source models, and benchmarks
Efficiently fetch entries from eksisozluk.com and run sentiment analysis on them (Turkish only), using Rust
The Python source code of many well-known repositories, collected in this repo as plain localdocs for training code AI
Synthetically generating intent-aware information-seeking dialogues. Useful for tasks such as training and evaluating user intent predictors, with the option of training or evaluating on real human dialogues. The backbone LLM of SOLID is Zephyr-7b-beta.
WikiText syntax dataset generation pipeline and open dataset for auto UI generation in TiddlyWiki. (WIP)
PARROT (Performance Assessment of Reasoning and Responses On Trivia) is a novel benchmarking framework designed to evaluate Large Language Models (LLMs) on real-world, complex, and ambiguous QA tasks.
A modified dataset of English dialogs in which a user and an assistant discuss movie preferences in natural language.
Collection of ETL scripts used to create a dataset of text in Spanish to train Large Language Models.