daveshap / ACE_Framework

ACE (Autonomous Cognitive Entities) - 100% local and open source autonomous agents

COST TO RUN & DEPENDENCE on OpenAI

jayfalls opened this issue · comments

When watching Dave's demo of the project, a big standout was his remark about the API timing out after running the demo only briefly, along with the sheer number of inferences that will need to be generated.

I don't think this limitation is necessary, and depending on a third party is not ideal. The only real limit should be the amount of compute available, and getting this to run on consumer hardware would be best.

As such, I suggest using the dolphin-2.1-mistral-7b model: specifically a quantised version that can run with a maximum RAM requirement of only 7.63 GB and a download size of only 5.13 GB, via the llama-cpp-python bindings, which keeps the project Python-only as required.

There are benefits to doing it this way:

  • No dependence on a third party for the LLM (THE MOST ESSENTIAL COMPONENT)
  • No cost besides the electricity bill, and obviously upfront hardware cost

And benefits specific to this model:

  • Higher benchmark performance than Llama 70B
  • Apache 2.0, meaning commercially viable
  • Completely uncensored, which gives it better performance and stronger compliance with system and user prompts
  • Small model, which means faster inference and lower memory requirements
  • Quantised model, which means it can run with a maximum ram requirement of 7.63 GB
  • GGUF format, which has massive support across many different bindings, with CPU-only, GPU-only, or mixed CPU/GPU execution
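To make the suggestion concrete, here is a minimal sketch of what this could look like with llama-cpp-python. The model filename, context size, and prompt wording are assumptions for illustration; any GGUF build of dolphin-2.1-mistral-7b should work the same way. The prompt helper builds the ChatML template that the Dolphin models are tuned on.

```python
def format_chatml(system: str, user: str) -> str:
    """Build a ChatML prompt, the template dolphin-2.1-mistral-7b expects."""
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{user}<|im_end|>\n"
        f"<|im_start|>assistant\n"
    )

if __name__ == "__main__":
    # llama-cpp-python: pip install llama-cpp-python
    from llama_cpp import Llama

    # Hypothetical path to a local quantised checkpoint (~5 GB on disk).
    llm = Llama(
        model_path="./dolphin-2.1-mistral-7b.Q5_K_M.gguf",
        n_ctx=4096,       # context window
        n_gpu_layers=-1,  # offload all layers to GPU if available; 0 = CPU only
    )

    prompt = format_chatml(
        "You are a helpful agent in the ACE framework.",
        "Summarise your current task.",
    )
    out = llm(prompt, max_tokens=256, stop=["<|im_end|>"])
    print(out["choices"][0]["text"])
```

Because everything runs in-process, each ACE layer could share one loaded model instead of making per-inference API calls, so the only throughput limit is local compute.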

This is just a suggestion, and this particular model will be outdated within the week.
But I think this is truly the right way to go.

This does not belong here. Please move it to the Discussions tab.