This repository hosts a diverse NLP dataset comprising 1,000 stories spanning 100 genres for comprehensive language understanding tasks.
license: cc-by-2.0  # personal or commercial use permitted, with attribution
language:
- en
size_categories:
- 1K<n<10K
pretty_name: Thousand Stories, Hundred Genres
task_categories:
- summarization
- text-generation
- text-classification
tags:
- data science
- Storytelling
- Genre Classification
- NLP
- LLM
- Deep Learning
This dataset contains 1,000 stories spanning 100 different genres. The stories are stored in tabular form (a dataframe), with one row per story holding a unique ID, a title, the story text, and its genre.
The list of all genres can be found in the genres.txt file.
Reading the genre list:

    import pickle

    # Only unpickle files from a source you trust: pickle.load can run arbitrary code.
    with open('story_genres.pkl', 'rb') as f:
        story_genres = pickle.load(f)

Sample of genre list:
genres = ['Sci-Fi', 'Comedy', ...]
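
For the text-classification task, genre names usually need to be mapped to integer labels. The following is a minimal sketch of one way to do that, not part of the dataset itself: `build_label_maps`, `label2id`, and `id2label` are hypothetical names, and the list below holds only the two genres shown in the sample above rather than the full set of 100.

```python
def build_label_maps(genres):
    """Map each distinct genre name to an integer label, and back."""
    unique_genres = sorted(set(genres))  # sort so the mapping is reproducible
    label2id = {genre: idx for idx, genre in enumerate(unique_genres)}
    id2label = {idx: genre for genre, idx in label2id.items()}
    return label2id, id2label


# Only the two genres visible in the sample; the real list has 100 entries.
genres = ["Sci-Fi", "Comedy"]
label2id, id2label = build_label_maps(genres)
```

Sorting before enumerating keeps the label assignment stable across runs, regardless of the order in which genres appear in the list.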
The dataset is structured in the following format:
- id: Unique identifier for each story.
- title: Title of the story.
- story: The content of the story.
- genre: The genre of the story.
id | title | story | genre |
---|---|---|---|
25235 | The Unseen Miracle | It was a stormy night in ... | Horror |
... | ... | ... | ... |
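
A quick sanity check of the structure above can be sketched as follows. This is a minimal sketch, assuming the rows are available as plain dictionaries (for example via `df.to_dict("records")` on a pandas dataframe). `check_rows` and `sample_rows` are hypothetical names, and the second sample row is made up purely for illustration.

```python
from collections import Counter

REQUIRED_COLUMNS = ("id", "title", "story", "genre")


def check_rows(rows):
    """Validate the four documented columns and id uniqueness.

    Returns a Counter with the number of stories per genre.
    """
    seen_ids = set()
    genre_counts = Counter()
    for row in rows:
        missing = [col for col in REQUIRED_COLUMNS if col not in row]
        if missing:
            raise ValueError(f"row is missing columns: {missing}")
        if row["id"] in seen_ids:
            raise ValueError(f"duplicate id: {row['id']}")
        seen_ids.add(row["id"])
        genre_counts[row["genre"]] += 1
    return genre_counts


# The first row mirrors the sample table; the second is a made-up placeholder.
sample_rows = [
    {"id": 25235, "title": "The Unseen Miracle",
     "story": "It was a stormy night in ...", "genre": "Horror"},
    {"id": 1, "title": "A Hypothetical Title",
     "story": "Placeholder text.", "genre": "Comedy"},
]
genre_counts = check_rows(sample_rows)
```

On the full dataset, the returned counter also gives a quick view of how evenly the 1,000 stories are spread across the 100 genres.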
- Title: 6 words
- Story: 960 words
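
Word counts like the ones above can be recomputed with a few lines of Python. The card does not say how the figures were obtained, so this sketch assumes whitespace-delimited words and a mean over all rows; `word_count` and `mean_word_counts` are hypothetical helper names.

```python
from statistics import mean


def word_count(text):
    """Count whitespace-delimited words in a string."""
    return len(text.split())


def mean_word_counts(rows):
    """Return the mean word count of the title and story fields.

    `rows` is an iterable of dicts with "title" and "story" keys.
    """
    rows = list(rows)
    return {
        "title": mean(word_count(row["title"]) for row in rows),
        "story": mean(word_count(row["story"]) for row in rows),
    }
```

A different tokenization (for example, splitting on punctuation) would give slightly different numbers.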
This dataset is licensed under CC BY 2.0, which permits personal and commercial use provided attribution is given.