e-p-armstrong / augmentoolkit

Convert Compute And Books Into Instruct-Tuning Datasets

Working with ollama returns the error "Generation step failed -- too many retries"

devdimit93 opened this issue

I installed Augmentoolkit on Windows 10 with Anaconda, but I'm not getting correct results in the output folder.
I installed it with the following commands.

git clone https://github.com/e-p-armstrong/augmentool.git
pip install protobuf sentencepiece transformers matplotlib nltk openai
cd augmentool
python processing.py  

When I tested it with any document, even the sample document from the raw_txt_input folder, it returned the following error:

2024-03-17 17:56:09,572 - ERROR - Above prompt resulted in error, probably the model's fault: Invalid \escape: line 48 column 46 (char 13094)
Traceback (most recent call last):
  File "C:\Users\Dmitrii\yolo_3d\augmentool\augmentoolkit\generation_functions\generation_step_class.py", line 119, in generate
    messages = json.loads(prompt_formatted)
  File "E:\anaconda\envs\llm_to_dataset\lib\json\__init__.py", line 346, in loads
    return _default_decoder.decode(s)
  File "E:\anaconda\envs\llm_to_dataset\lib\json\decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "E:\anaconda\envs\llm_to_dataset\lib\json\decoder.py", line 353, in raw_decode
    obj, end = self.scan_once(s, idx)
json.decoder.JSONDecodeError: Invalid \escape: line 48 column 46 (char 13094)
Q ERROR: Generation step failed -- too many retries!
Traceback (most recent call last):
  File "C:\Users\Dmitrii\yolo_3d\augmentool\augmentoolkit\control_flow_functions\control_flow_functions.py", line 1256, in generate_qatuples_from_para
    ) = await qatuples_generator.generate(
  File "C:\Users\Dmitrii\yolo_3d\augmentool\augmentoolkit\generation_functions\generation_step_class.py", line 141, in generate
    raise Exception("Generation step failed -- too many retries!")
Exception: Generation step failed -- too many retries!
100%|████████████████████████████████████████████████████████████████████| 7/7 [00:00<00:00,  9.46it/s]
-------------- QUESTIONS CREATED ------------- STATS SO FAR (may be wrong if run was continued from interruption):
Nones: 0
Non-nones: 0
Total: 0
---------------- ONTO EXAMPLES GENERATION-------------------
-------------- QUESTIONS REVISED ------------- STATS SO FAR:
Nones: 0
Non-nones: 0
Total: 0
---------------- ONTO EXAMPLES GENERATION-------------------
0it [00:00, ?it/s]
0it [00:00, ?it/s]
Conversion complete. Master list written to ./output/master_list.jsonl. Simplified data written to ./output/simplified_data.jsonl.
Conversion complete. The processed master list is written to 'processed_master_list.json'.

It prints the result text to the console, but there are no correct results in the output folder. Sometimes the program also returns an error between text blocks. A sample of the error:

  {
    "role": "user",
    "content": "Text details: ./raw_txt_input\on_war_clausewitz\\n\\nText to make questions from: \\n\"\"\"\\nNeither of these thinkers was concerned with the ethics of the struggle which each studied so exhaustively, but to both men the phase or condition presented itself neither as moral nor immoral, any more than are famine, disease, or other natural phenomena, but as emanating from a force inherent in all living organisms which can only be mastered by understanding its nature. It is in that spirit that, one after the other, all the Nations of the Continent, taught by such drastic lessons as Koniggrätz and Sedan, have accepted the lesson, with the result that to-day Europe is an armed camp, and peace is maintained by the equilibrium of forces, and will continue just as long as this equilibrium exists, and no longer. Whether this state of equilibrium is in itself a good or desirable thing may be open to argument. I have discussed it at length in my \"War and the World's Life\"; but I venture to suggest that to no one would a renewal of the era of warfare be a change for the better, as far as existing humanity is concerned. Meanwhile, however, with every year that elapses the forces at present in equilibrium are changing in magnitude--the pressure of populations which have to be fed is rising, and an explosion along the line of least resistance is, sooner or later, inevitable.\\n\"\"\""
  }
]
2024-03-17 17:56:09,556 - ERROR - Above prompt resulted in error, probably the model's fault: Invalid \escape: line 48 column 46 (char 13094)
Traceback (most recent call last):
  File "C:\Users\Dmitrii\yolo_3d\augmentool\augmentoolkit\generation_functions\generation_step_class.py", line 119, in generate
    messages = json.loads(prompt_formatted)
  File "E:\anaconda\envs\llm_to_dataset\lib\json\__init__.py", line 346, in loads
    return _default_decoder.decode(s)
  File "E:\anaconda\envs\llm_to_dataset\lib\json\decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "E:\anaconda\envs\llm_to_dataset\lib\json\decoder.py", line 353, in raw_decode
    obj, end = self.scan_once(s, idx)
json.decoder.JSONDecodeError: Invalid \escape: line 48 column 46 (char 13094)
2024-03-17 17:56:09,561 - ERROR - Error in Generation Step: Invalid \escape: line 48 column 46 (char 13094)
[
  {
    "role": "system",
    "content": "You are an expert educational AI that, given a paragraph or two from a text, will create suitable educational questions based on the paragraphs, and *only* based on the paragraphs. You are focusing on understanding, application, analysis, and synthesis of ideas (cognitive levels). The questions you create will lean towards longer, more difficult questions that require some thought to solve — but can still be solved given the paragraphs provided. Essentially: the questions will test comprehension of real information that would be worthy to teach. After the question, you will also write its answer.\\n\\nDo not explicitly mention the paragraphs in the questions themselves — just ask about the concepts related to the questions. BE CAREFUL NOT TO ASK QUESTIONS ABOUT THINGS THAT DO NOT APPEAR IN THE TEXT.\\n\\nYou will not mention the text explicitly in any questions you think of, since the questions you generate are intended to test people's knowledge of the information — when given the questions, they WILL NOT HAVE THE TEXT ON HAND, and so if you mention the author they won't have a clue what you're talking about."
  },

Here is my config.yaml

PATH:
  INPUT: "./raw_txt_input"
  OUTPUT: "./output"
  DEFAULT_PROMPTS: "./prompts" # the baseline prompt folder that Augmentoolkit falls back to if it can't find a step in the PROMPTS path
  PROMPTS: "./prompts" # Where Augmentoolkit first looks for prompts
API:
  API_KEY: "ollama" # Add the API key for your favorite provider here
  BASE_URL: "http://localhost:11434/v1/" # add the base url for a provider, or local server, here. Some possible values:  http://127.0.0.1:5000/v1/ # <- local models. # https://api.together.xyz # <- together.ai, which is real cheap, real flexible, and real high-quality, if a tad unreliable. # https://api.openai.com/v1/ # <- OpenAI. Will bankrupt you very fast. # anything else that accepts OAI-style requests, so basically any API out there (openrouter, fireworks, etc etc etc...)
  LOGICAL_MODEL: "llama2" # model used for everything except conversation generation at the very end
  LARGE_LOGICAL_MODEL: "llama2" # model used for conversation generation at the very end. A pretty tough task, if ASSISTANT_MODE isn't on.
SYSTEM:
  USE_FILENAMES: True # give the AI context from the filenames provided to it. Useful if the filenames are meaningful, otherwise turn them off.
  ASSISTANT_MODE: True # If False, the conversations generated are between a user and an AI assistant. If True, the generated convs are between fictional characters in historical or fictional settings, with randomized personalities (some are nsfw by default, because a lot of model creators make models for that purpose. Change this (or amplify it) in ./augmentoolkit/generation_functions/special_instructions.py, it only requires changes to some strings.)
  DOUBLE_CHECK_COUNTER: 1 # How many times to check a question and answer pair during each validation step. Majority vote decides if it passes that step. There are three steps. So most questions are by default checked around 9 times (fewer if the first two checks for a step pass, obviously).
  USE_SUBSET: True # Whether to take only the first 13 chunks from a text during the run. Useful for experimenting and iterating and seeing all the steps without costing too much money or time.
  REARRANGEMENTS_TO_TAKE: 1 # How many times to rearrange the questions and answers for generating different conversations from the same group of questions and answers.
  CONCURRENCY_LIMIT: 2 # Hard limit of how many calls can be run at the same time, useful for API mode (aphrodite automatically manages this and queues things, as far as I know)
  COMPLETION_MODE: False # Change to false if you want to use chat (instruct) mode; this requires .json files in your chosen prompts directory, in the OpenAI API format. Not all APIs support completion mode.
  MODE: "api" # can be one of "api"|"aphrodite"
  GRAPH: False

I tried ollama with the llama2 and mixtral models, reduced CONCURRENCY_LIMIT and DOUBLE_CHECK_COUNTER, and tried reducing "max_tokens", completion_mode, and the retry count in generation_step_class.py. I got the same error.
I also tried the previous commit, with the same result.

Any updates on this issue? I'm facing the same problem too.

It looks like this might be an issue with Windows paths containing backslashes, which get read as invalid escape characters when file paths that contain them end up in the JSON prompt. A quick workaround is to turn off USE_FILENAMES in the config: that keeps the file paths out of the prompt, so the JSON parser never sees them. I'll try to get to the actual fix later today when I have more time and am not at work. Thanks for the bug report!
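
For anyone curious, here is a minimal standalone reproduction of the failure mode, plus the blunt workaround; this is a hypothetical sketch, not Augmentoolkit's actual code:

import json

# A Windows-style path spliced into a JSON prompt string before parsing.
# "\o" in "\on_war_clausewitz" is not a legal JSON escape, so json.loads fails.
path = ".\\raw_txt_input\\on_war_clausewitz"
prompt_formatted = '[{"role": "user", "content": "Text details: %s"}]' % path

try:
    json.loads(prompt_formatted)
except json.JSONDecodeError as e:
    print(e)  # Invalid \escape: line 1 column ...

# Blunt workaround: double every backslash before parsing. Note this would
# also mangle intentional escapes like \n, which is why keeping the paths
# out of the prompt (USE_FILENAMES: False) is the safer option.
messages = json.loads(prompt_formatted.replace("\\", "\\\\"))
print(messages[0]["content"])  # Text details: .\raw_txt_input\on_war_clausewitz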

I think the latest push will have solved this (in fact, it should solve all future JSON decode issues). Please let me know if it works after pulling @suridol @devdimit93
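
For anyone on an older commit hitting similar errors: the escape-proof pattern is to parse the prompt template as JSON first, and only substitute variables into the parsed message dicts afterwards, so values like file paths never pass through the JSON parser. A rough sketch of that pattern (an illustration, not necessarily what the patch does):

import json

# Parse the template while the {placeholders} are still inert strings.
template = '[{"role": "user", "content": "Text details: {textname}"}]'
messages = json.loads(template)

# Then substitute values into the parsed dicts; backslash-laden paths
# never touch the JSON parser, so no invalid escapes are possible.
for message in messages:
    message["content"] = message["content"].format(
        textname=".\\raw_txt_input\\on_war_clausewitz"
    )
print(messages[0]["content"])  # Text details: .\raw_txt_input\on_war_clausewitz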

I had the same issue, but in my case it looks like it's because the tool can't connect to the API (I used tabbyAPI's OAI endpoint as the backend, http://127.0.0.1:5000/v1/, and it works in SillyTavern's OAI mode). A quick connectivity check is sketched after the config below.

  File "/home/omni/miniconda3/envs/tb/lib/python3.11/site-packages/openai/_base_client.py", line 1548, in _request
    raise APIConnectionError(request=request) from err
openai.APIConnectionError: Connection error.
  0%|                                                                                            | 0/13 [00:07<?, ?it/s]
Traceback (most recent call last):
  File "/home/omni/exllamav2/augmentool/processing.py", line 440, in <module>
    asyncio.run(main())
  File "/home/omni/miniconda3/envs/tb/lib/python3.11/asyncio/runners.py", line 190, in run
    return runner.run(main)
           ^^^^^^^^^^^^^^^^
  File "/home/omni/miniconda3/envs/tb/lib/python3.11/asyncio/runners.py", line 118, in run
    return self._loop.run_until_complete(task)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/omni/miniconda3/envs/tb/lib/python3.11/asyncio/base_events.py", line 654, in run_until_complete
    return future.result()
           ^^^^^^^^^^^^^^^
  File "/home/omni/exllamav2/augmentool/processing.py", line 189, in main
    await control_flow_functions.filter_all_questions(
  File "/home/omni/exllamav2/augmentool/augmentoolkit/control_flow_functions/control_flow_functions.py", line 1462, in filter_all_questions
    await future
  File "/home/omni/miniconda3/envs/tb/lib/python3.11/asyncio/tasks.py", line 615, in _wait_for_one
    return f.result()  # May raise f.exception().
           ^^^^^^^^^^
  File "/home/omni/exllamav2/augmentool/processing.py", line 105, in run_task_with_limit
    return await task
           ^^^^^^^^^^
  File "/home/omni/exllamav2/augmentool/augmentoolkit/control_flow_functions/control_flow_functions.py", line 1358, in determine_worthy
    judgement = await judge.generate(arguments={"text": p[0], "textname": p[1]})
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/omni/exllamav2/augmentool/augmentoolkit/generation_functions/generation_step_class.py", line 152, in generate
    raise Exception("Generation step failed -- too many retries!")
Exception: Generation step failed -- too many retries!

Here is my config.yaml:

PATH:
  INPUT: "./raw_txt_input"
  OUTPUT: "./output"
  DEFAULT_PROMPTS: "./prompts" # the baseline prompt folder that Augmentoolkit falls back to if it can't find a step in the PROMPTS path
  PROMPTS: "./prompts" # Where Augmentoolkit first looks for prompts
API:
  API_KEY: "b4c8725b9cfa36e1b76666dcf70f553d" # Add the API key for your favorite provider here
  BASE_URL: "http://127.0.0.1:5000/v1/" # add the base url for a provider, or local server, here. Some possible values:  http://127.0.0.1:5000/v1/ # <- local models. # https://api.together.xyz # <- together.ai, which is real cheap, real flexible, and real high-quality, if a tad unreliable. # https://api.openai.com/v1/ # <- OpenAI. Will bankrupt you very fast. # anything else that accepts OAI-style requests, so basically any API out there (openrouter, fireworks, etc etc etc...)
  LOGICAL_MODEL: "command-r-p-4.5bpw" # model used for everything except conversation generation at the very end
  LARGE_LOGICAL_MODEL: "command-r-p-4.5bpw" # model used for conversation generation at the very end. A pretty tough task, if ASSISTANT_MODE isn't on.
SYSTEM:
  USE_FILENAMES: False # give the AI context from the filenames provided to it. Useful if the filenames are meaningful, otherwise turn them off.
  ASSISTANT_MODE: False # If False, the conversations generated are between a user and an AI assistant. If True, the generated convs are between fictional characters in historical or fictional settings, with randomized personalities (some are nsfw by default, because a lot of model creators make models for that purpose. Change this (or amplify it) in ./augmentoolkit/generation_functions/special_instructions.py, it only requires changes to some strings.)
  DOUBLE_CHECK_COUNTER: 3 # How many times to check a question and answer pair during each validation step. Majority vote decides if it passes that step. There are three steps. So most questions are by default checked around 9 times (fewer if the first two checks for a step pass, obviously).
  USE_SUBSET: True # Whether to take only the first 13 chunks from a text during the run. Useful for experimenting and iterating and seeing all the steps without costing too much money or time.
  REARRANGEMENTS_TO_TAKE: 3 # How many times to rearrange the questions and answers for generating different conversations from the same group of questions and answers.
  CONCURRENCY_LIMIT: 50 # Hard limit of how many calls can be run at the same time, useful for API mode (aphrodite automatically manages this and queues things, as far as I know)
  COMPLETION_MODE: False # Change to false if you want to use chat (instruct) mode; this requires .json files in your chosen prompts directory, in the OpenAI API format. Not all APIs support completion mode.
  MODE: "api" # can be one of "api"|"aphrodite"
  GRAPH: False # Whether to show a pretty graph after filtering out stuff not worthy for questions, useful for seeing whether or not your text is suitable for making data from using Augmentoolkit by default. Will pause the pipeline's execution until you close the window, which is why this is false by default.
  STOP: True # True = Use stop tokens, False = do not use stop tokens. OpenAI's API restricts you to four stop tokens and all steps have way more than four stop tokens, so you'll need to turn this to False if you're using OAI's API. Also NOTE that if you turn this OFF while using COMPLETION MODE, EVERYTHING WILL BREAK and it will cost you money in the process. Don't do that.
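
For the connection error above, it helps to confirm the endpoint is actually reachable before running the pipeline. Here is a standalone sanity check using the same openai client the traceback shows (adjust base_url and api_key to match your config.yaml; this assumes the backend implements the /v1/models route):

from openai import OpenAI

# Not part of Augmentoolkit -- just a quick reachability test.
client = OpenAI(
    base_url="http://127.0.0.1:5000/v1/",
    api_key="b4c8725b9cfa36e1b76666dcf70f553d",
)
print(client.models.list())  # raises openai.APIConnectionError if the server is unreachable

If this raises APIConnectionError too, the problem is the server or the URL (wrong port, missing /v1/ suffix), not Augmentoolkit.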