oobabooga / text-generation-webui

A Gradio web UI for Large Language Models.

Feature Request: Anti-slop / fine-tuning of model output in real time / on the fly for output quality enhancement.

David-AU-github opened this issue

Feature Description

Basically, this enhancement fixes model generation on the fly, so to speak, and drastically improves the performance of any model for specific tasks.

Although this is more involved than the "XTC" enhancement, this one is far stronger and will allow users to control the quality of generation of any model at the root level, customized to their use case(s).

This operates at the word/phrase level rather than the per-token level: roughly, it forces the model to regenerate token(s) whenever the generated token(s) match word(s)/phrase(s) in a JSON file. It is like a live-fire "regen" that forces the model to produce better/stronger output during generation.

Roughly, it is live proofreading/editing during generation, but it seems to do more than that, based on the examples at the project website.
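To make the mechanism concrete, here is a minimal, self-contained toy sketch of the backtracking idea (my own illustration, not the project's actual code): a stand-in uniform "model" emits words, and whenever the tail of the output matches a banned phrase, generation backtracks to where the phrase began, forbids the token that started it at that position, and regenerates. All names here (toy_next_token_probs, SLOP_PHRASES, etc.) are hypothetical; a real integration would hook the loaded model's sampler, work on token IDs, and downweight probabilities rather than hard-ban.

```python
import random

# Toy vocabulary standing in for a real tokenizer's output.
VOCAB = ["the", "sky", "was", "a", "tapestry", "canvas", "of", "stars", "."]

# Phrases to avoid; 0.0 means "ban outright" (format is an assumption,
# loosely modelled on the antislop-sampler idea).
SLOP_PHRASES = {"a tapestry of": 0.0}

def toy_next_token_probs(context):
    """Stand-in for a model forward pass: uniform over the toy vocab."""
    return {tok: 1.0 / len(VOCAB) for tok in VOCAB}

def sample(probs, banned=()):
    """Sample one token, excluding any tokens banned at this position."""
    items = [(t, p) for t, p in probs.items() if t not in banned and p > 0]
    total = sum(p for _, p in items)
    r = random.uniform(0, total)
    for tok, p in items:
        r -= p
        if r <= 0:
            return tok
    return items[-1][0]

def generate(max_tokens=20):
    tokens = []
    banned_at = {}  # position -> tokens already rejected at that position
    i = 0
    while i < max_tokens:
        probs = toy_next_token_probs(tokens)
        tok = sample(probs, banned_at.get(i, set()))
        tokens.append(tok)
        text = " ".join(tokens)
        # Check whether the tail of the output now completes a slop phrase.
        for phrase, mult in SLOP_PHRASES.items():
            if mult == 0.0 and text.endswith(phrase):
                # Backtrack to where the phrase started, forbid the token
                # that began it at that position, and regenerate from there.
                start = len(tokens) - len(phrase.split())
                first = tokens[start]
                tokens = tokens[:start]
                banned_at.setdefault(start, set()).add(first)
                i = start
                break
        else:
            i += 1
    return " ".join(tokens)

if __name__ == "__main__":
    print(generate())
```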

This may reduce tokens per second (T/s) but increase output quality.
It would allow smaller models to perform well above their "size", so to speak.

In some ways this enhancement almost acts like "live fine-tuning" of a model.

This is the project implementing this enhancement:
https://github.com/sam-paech/antislop-sampler

Here is a detailed test at EQ-Bench using this enhancement on a top-rated Gemma 9B model:
https://eqbench.com/results/creative-writing-v2/Gemma-2-Ataraxy-v2-9B%20[antislop].txt

Motivation

Drastic improvement of output quality on the fly, in real time, tunable by the user / use case.
Improvement for small and mid-size models across the board.
Larger models would also benefit; this could result in a drastic leap in coherence / generation and perhaps improved logic-solving.

This could match or exceed closed-source model performance, depending on implementation.
The quality enhancement for small and mid-sized models cannot be overstated.

Roughly, this allows fine-tuning of a model at the user / use-case level.

On the user level: far fewer (if any) "regens" needed to get good output quality.
That by itself is a game changer.
This would drastically improve the end-user experience across the board.

Possible Implementation

Based on the project noted above: one or more user-defined text/JSON files for specific use case(s).
This could be a "config.json" applied at the source level before quantizing and/or post-quant at usage time.

Option to download one or more of these files from Hugging Face to augment usage of model(s), much like a dataset is used to fine-tune a model.

At a minimum, I would suggest allowing one or more of these files to be selected at the quantization step (similar to embedding "config.json" in the GGUF) and/or at the inference step(s).

E.g., at the quantize step:
--optimize creative.json

OR when you "run" the model:

--enhance creative.json

This would then apply "creative.json" live during output generation, as per the project noted here:
https://github.com/sam-paech/antislop-sampler
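For illustration only, a hypothetical "creative.json" might map unwanted phrases to probability multipliers, with 0.0 banning a phrase outright. The exact schema here is an assumption; the antislop-sampler project ships a broadly similar phrase-to-adjustment list.

```json
{
  "a tapestry of": 0.1,
  "shivers down her spine": 0.0,
  "testament to": 0.3,
  "barely above a whisper": 0.0
}
```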

NOTE: I also submitted this to llama.cpp; no idea if it will be accepted.