ryan-air / Custom-Alpaca-Style-Dataset-Generation

This project aimed to meet the company's large language model (LLM) training requirements by combining Stanford's Alpaca-style dataset system with an OpenAI language model. The objective was to create a domain-specific dataset that improves the LLM's performance on the company's intern-assigned tasks.


Custom-Alpaca-Style-Dataset-Generation

Professional work-related project

Project Overview

Leveraging Stanford's Dataset System and OpenAI's Language Model for Enhanced Dataset Generation

The company needed a more suitable, domain-specific dataset for training its large language model (LLM). To produce one, this project applied Stanford's Alpaca-style dataset system together with an OpenAI language model, with the company's intern-assigned tasks as the target use case.

The dataset was built in two steps. First, a few question-response examples covering the company's requirements and field of work were written by hand. Second, these examples were passed to an OpenAI model, which generated similar questions and responses. The resulting dataset can then be used to train an LLM.
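The seed-then-generate workflow described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the seed examples, the prompt wording, and the function names (`build_prompt`, `parse_pairs`, `generate_dataset`) are all hypothetical, and the model call is passed in as a plain callable so the sketch makes no assumption about which OpenAI endpoint or model was used. The output records use the `instruction` / `input` / `output` fields of the Alpaca data format.

```python
import json
from typing import Callable, Dict, List

# Hypothetical seed examples; the real project used company-specific ones.
SEED_EXAMPLES: List[Dict[str, str]] = [
    {"question": "What does the onboarding checklist cover?",
     "response": "It covers account setup, tool access, and first-week tasks."},
    {"question": "Who approves a change request?",
     "response": "The team lead reviews and approves each change request."},
]


def build_prompt(seeds: List[Dict[str, str]], n_new: int) -> str:
    """Turn the hand-written seeds into a single generation prompt."""
    lines = [
        f"Here are example question-response pairs. Write {n_new} new pairs "
        "in the same domain and style.",
        'Return only a JSON list of objects with the keys "question" and "response".',
        "",
    ]
    for i, ex in enumerate(seeds, 1):
        lines.append(f"{i}. Question: {ex['question']}")
        lines.append(f"   Response: {ex['response']}")
    return "\n".join(lines)


def parse_pairs(raw: str) -> List[Dict[str, str]]:
    """Parse the model's reply into Alpaca-style records, dropping bad items."""
    try:
        items = json.loads(raw)
    except json.JSONDecodeError:
        return []
    if not isinstance(items, list):
        return []
    records = []
    for item in items:
        if not isinstance(item, dict):
            continue
        question = str(item.get("question", "")).strip()
        response = str(item.get("response", "")).strip()
        if question and response:
            # Alpaca format: instruction, optional input, output.
            records.append({"instruction": question, "input": "", "output": response})
    return records


def generate_dataset(seeds: List[Dict[str, str]],
                     generate: Callable[[str], str],
                     n_new: int = 5) -> List[Dict[str, str]]:
    """Build the prompt, call the supplied model function, parse the reply."""
    return parse_pairs(generate(build_prompt(seeds, n_new)))
```

In practice `generate` would wrap a request to an OpenAI model and return the reply text. The records returned by `generate_dataset` can be written out with `json.dump` to give a file in the same shape as the Alpaca dataset.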


License: MIT License


Languages

Jupyter Notebook 69.4%, Python 30.6%