chowfi / FineTune-LLM-OnlineRL

Fine-tuning LLM agents with online RL for XiangQi (Chinese Chess)

This is a group project developed by a team of three individuals.

FineTune-LLM-OnlineRL

Game: Xiangqi

Main Idea - Fine-tuning an LLM Agent with Online RL (PPO & LoRA):

  1. A pre-trained LLM is used as the starting policy for the RL agent
  2. Observations from the environment are converted to text
  3. Each text observation prompts an action, and the outcome is then used to update the RL agent’s policy
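
The text interface in steps 2 and 3 can be sketched roughly as below. This is only an illustration, not the code from the notebooks: the board encoding (10 strings of 9 characters, "." for empty, uppercase for Red, lowercase for Black), the move notation, and the names `observation_to_text` and `parse_action` are all assumptions made for the example.

```python
from typing import List, Optional


def observation_to_text(board: List[str], side: str, legal_moves: List[str]) -> str:
    """Render a board observation and its legal moves as a text prompt.

    Hypothetical encoding: `board` holds 10 rows of 9 characters each,
    listed from rank 9 (top) down to rank 0 (bottom).
    """
    rows = "\n".join(f"{9 - i} {row}" for i, row in enumerate(board))
    moves = ", ".join(legal_moves)
    return (
        f"You are playing Xiangqi as {side}.\n"
        f"Board (rank 9 at top, files a-i left to right):\n{rows}\n"
        f"Legal moves: {moves}\n"
        "Reply with exactly one legal move."
    )


def parse_action(reply: str, legal_moves: List[str]) -> Optional[str]:
    """Return the first legal move mentioned in the model's reply, if any."""
    for token in reply.replace(",", " ").split():
        cleaned = token.strip(".!?\"'")
        if cleaned in legal_moves:
            return cleaned
    return None  # The caller decides how to handle an unparseable reply.


# Standard opening position in the assumed encoding.
START_BOARD = [
    "rnbakabnr",
    ".........",
    ".c.....c.",
    "p.p.p.p.p",
    ".........",
    ".........",
    "P.P.P.P.P",
    ".C.....C.",
    ".........",
    "RNBAKABNR",
]

if __name__ == "__main__":
    legal = ["h2e2", "b0c2"]  # Illustrative subset of legal moves.
    print(observation_to_text(START_BOARD, "Red", legal))
    print(parse_action("I will play h2e2.", legal))
```

In an online RL loop, the prompt would be sent to the LLM policy, the parsed move applied to the environment, and the reward fed to the PPO update. How the notebooks handle illegal or unparseable replies is not stated here, so the sketch simply returns `None`.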

Other Methods Implemented:

  1. Random
  2. Greedy
  3. DQN
  4. DDQN
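
For orientation, the baselines can be sketched in a few lines each. This is a minimal sketch under stated assumptions, not the notebooks' implementation: "Greedy" is assumed here to mean picking the move with the highest immediate value (for example, captured material), and all function names and the `move_value` mapping are hypothetical. The DQN and DDQN functions show only how the two methods differ in computing the bootstrap target.

```python
import random
from typing import Dict, List


def random_policy(legal_moves: List[str], rng: random.Random) -> str:
    """Pick a legal move uniformly at random."""
    return rng.choice(legal_moves)


def greedy_policy(legal_moves: List[str], move_value: Dict[str, float]) -> str:
    """Pick the move with the highest immediate value (assumed heuristic).

    Moves missing from `move_value` count as 0.0; ties go to the first move.
    """
    return max(legal_moves, key=lambda m: move_value.get(m, 0.0))


def dqn_target(reward: float, done: bool, gamma: float,
               q_target_next: List[float]) -> float:
    """DQN target: the target network both selects and evaluates the next action."""
    if done:
        return reward
    return reward + gamma * max(q_target_next)


def ddqn_target(reward: float, done: bool, gamma: float,
                q_online_next: List[float], q_target_next: List[float]) -> float:
    """Double DQN target: the online network selects the next action,
    the target network evaluates it."""
    if done:
        return reward
    best = max(range(len(q_online_next)), key=q_online_next.__getitem__)
    return reward + gamma * q_target_next[best]


if __name__ == "__main__":
    rng = random.Random(0)
    moves = ["h2e2", "b0c2", "h2h9"]
    print(random_policy(moves, rng))
    print(greedy_policy(moves, {"h2h9": 4.0}))
    print(dqn_target(1.0, False, 0.9, [0.5, 2.0]))
    print(ddqn_target(1.0, False, 0.9, [3.0, 1.0], [0.5, 2.0]))
```

Separating action selection from evaluation in the DDQN target is what reduces the overestimation that the plain `max` in the DQN target tends to produce.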

Languages

Language: Jupyter Notebook 100.0%