sleepingcat4 / datasets

a project of sleeping ai


Sleeping AI has launched Datasets, a project to compile high-quality datasets for artificial intelligence, mainly in natural language processing, multimodality, and computer vision. Our first release, Ox-debate, provides audio recordings of debate sessions, public interviews, and events held at the Oxford Union over the past 12 years and published on the Union's official YouTube channel. The dataset currently includes recordings up to July 15, 2023.

We started this project because, although many high-quality NLP datasets exist, they closely resemble one another: they cover a wide variety of topics in a generalised manner. We wanted to change that, which is why we are releasing Ox-debate. It is a high-quality dataset that features experts and highly knowledgeable speakers giving their opinions in the form of debates at the Union. The speakers' style of speech, tone, and delivery make it especially valuable from an NLP perspective.

Ox-debate (Kaggle link): https://www.kaggle.com/datasets/sleepingcat4/ox-debate (size: 13 GB)
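
Once the archive has been downloaded from the Kaggle page and extracted, a quick inventory of the audio files can be a useful first check. The following is a minimal sketch, not part of the project itself: the folder name `ox-debate`, the set of audio extensions, and the helper names `list_audio_files` and `total_size_gb` are all assumptions made for illustration, since the file layout and audio format of the dataset are not described here.

```python
from pathlib import Path

# Assumed audio extensions; the actual format used in Ox-debate is not stated
# in this README, so adjust this set after inspecting the download.
AUDIO_EXTENSIONS = {".mp3", ".wav", ".m4a", ".flac", ".ogg"}


def list_audio_files(root, extensions=AUDIO_EXTENSIONS):
    """Return all audio files under `root` (searched recursively), sorted by path."""
    root = Path(root)
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() in extensions
    )


def total_size_gb(files):
    """Return the combined size of `files` in gigabytes (1 GB = 1e9 bytes)."""
    return sum(p.stat().st_size for p in files) / 1e9


if __name__ == "__main__":
    # "ox-debate" is a hypothetical local folder holding the extracted dataset.
    files = list_audio_files("ox-debate")
    print(f"{len(files)} audio files, {total_size_gb(files):.2f} GB in total")
```

If the reported total is far from the 13 GB listed on Kaggle, the download may be incomplete or the extension set may need adjusting.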

The dataset is distinctive in two ways:
  • It provides expert opinions and discussions.
  • It features structured speech and persuasive conversation patterns.
We hope that Ox-debate will be a valuable resource for the AI community in:
  1. Training large language models on expert opinions and persuasive language.
  2. Providing control over the tone of a model.
  3. Identifying the speaker's actual sentiment from persuasive language.
  4. Understanding passive bias in expert opinions.
  5. Evaluating the mental state of speakers from their voice.
It is also valuable for linguistics research in:
  1. Understanding linguistic jargon.
  2. Analysing tone and signature habits, as well as the progression of language over time.

In the coming weeks, we'll release more in-depth, high-quality datasets on exotic topics.


License: GNU General Public License v3.0