JSON errors in `generate()` happen for certain base models (but not for others)
alexsherstinsky opened this issue · comments
Alex Sherstinsky commented
System Info
@jeffreyftang I am finding that LoRAX runs into JSON errors when used (via the Predibase SDK) to prompt the "gemma-2b" and "mistral-7b" base models, but not others ("phi-2" and "zephyr-7b-beta" work) — although one time "phi-2" failed as well. Which model triggers the error is inconsistent. When the error happens, the stack trace is:
````
>       result: GeneratedResponse = base_llm_deployment.generate(
            prompt=prompt,
            options=options,
        )

sdk/python/langchain/libs/community/langchain_community/llms/predibase.py:61:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <predibase.resource.llm.interface.LLMDeployment object at 0x3782fee60>
prompt = "Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that yo...ific tasks and log results.\nInstruction:\n\nQuestion: What are the approaches to Task Decomposition?\nHelpful Answer:"
options = {'details': False, 'max_new_tokens': 256, 'temperature': 0.1}

    def generate(
        self,
        prompt: str,
        options: Optional[Dict[str, Union[str, float]]] = None,
    ) -> GeneratedResponse:
        if not options:
            options = dict()

        # Need to do this since the lorax client sets this to True by default
        if "details" not in options:
            options["details"] = False

        options = self._override_adapter_options(options)

>       res = self.lorax_client.generate(prompt=prompt, **options)

sdk/python/predibase/resource/llm/interface.py:307:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <lorax.client.Client object at 0x3782fcca0>
prompt = "Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that yo...ific tasks and log results.\nInstruction:\n\nQuestion: What are the approaches to Task Decomposition?\nHelpful Answer:"
adapter_id = None, adapter_source = None, merged_adapters = None, api_token = None, do_sample = False, max_new_tokens = 256, ignore_eos_token = False, best_of = None, repetition_penalty = None, return_full_text = False, seed = None
stop_sequences = None, temperature = 0.1, top_k = None, top_p = None, truncate = None, typical_p = None, watermark = False, response_format = None, decoder_input_details = False, return_k_alternatives = None, details = False

    def generate(
        self,
        prompt: str,
        adapter_id: Optional[str] = None,
        adapter_source: Optional[str] = None,
        merged_adapters: Optional[MergedAdapters] = None,
        api_token: Optional[str] = None,
        do_sample: bool = False,
        max_new_tokens: Optional[int] = None,
        ignore_eos_token: bool = False,
        best_of: Optional[int] = None,
        repetition_penalty: Optional[float] = None,
        return_full_text: bool = False,
        seed: Optional[int] = None,
        stop_sequences: Optional[List[str]] = None,
        temperature: Optional[float] = None,
        top_k: Optional[int] = None,
        top_p: Optional[float] = None,
        truncate: Optional[int] = None,
        typical_p: Optional[float] = None,
        watermark: bool = False,
        response_format: Optional[Union[Dict[str, Any], ResponseFormat]] = None,
        decoder_input_details: bool = False,
        return_k_alternatives: Optional[int] = None,
        details: bool = True,
    ) -> Response:
        """
        Given a prompt, generate the following text

        Args:
            prompt (`str`):
                Input text
            adapter_id (`Optional[str]`):
                Adapter ID to apply to the base model for the request
            adapter_source (`Optional[str]`):
                Source of the adapter (hub, local, s3)
            merged_adapters (`Optional[MergedAdapters]`):
                Merged adapters to apply to the base model for the request
            api_token (`Optional[str]`):
                API token for accessing private adapters
            do_sample (`bool`):
                Activate logits sampling
            max_new_tokens (`Optional[int]`):
                Maximum number of generated tokens
            ignore_eos_token (`bool`):
                Whether to ignore EOS tokens during generation
            best_of (`int`):
                Generate best_of sequences and return the one if the highest token logprobs
            repetition_penalty (`float`):
                The parameter for repetition penalty. 1.0 means no penalty. See [this
                paper](https://arxiv.org/pdf/1909.05858.pdf) for more details.
            return_full_text (`bool`):
                Whether to prepend the prompt to the generated text
            seed (`int`):
                Random sampling seed
            stop_sequences (`List[str]`):
                Stop generating tokens if a member of `stop_sequences` is generated
            temperature (`float`):
                The value used to module the logits distribution.
            top_k (`int`):
                The number of highest probability vocabulary tokens to keep for top-k-filtering.
            top_p (`float`):
                If set to < 1, only the smallest set of most probable tokens with probabilities that add up to `top_p` or
                higher are kept for generation.
            truncate (`int`):
                Truncate inputs tokens to the given size
            typical_p (`float`):
                Typical Decoding mass
                See [Typical Decoding for Natural Language Generation](https://arxiv.org/abs/2202.00666) for more information
            watermark (`bool`):
                Watermarking with [A Watermark for Large Language Models](https://arxiv.org/abs/2301.10226)
            response_format (`Optional[Union[Dict[str, Any], ResponseFormat]]`):
                Optional specification of a format to impose upon the generated text, e.g.,:
                ```
                {
                    "type": "json_object",
                    "schema": {
                        "type": "string",
                        "title": "response"
                    }
                }
                ```
            decoder_input_details (`bool`):
                Return the decoder input token logprobs and ids
            return_k_alternatives (`int`):
                The number of highest probability vocabulary tokens to return as alternative tokens in the generation result
            details (`bool`):
                Return the token logprobs and ids for generated tokens

        Returns:
            Response: generated response
        """
        # Validate parameters
        parameters = Parameters(
            adapter_id=adapter_id,
            adapter_source=adapter_source,
            merged_adapters=merged_adapters,
            api_token=api_token,
            best_of=best_of,
            details=details,
            do_sample=do_sample,
            max_new_tokens=max_new_tokens,
            ignore_eos_token=ignore_eos_token,
            repetition_penalty=repetition_penalty,
            return_full_text=return_full_text,
            seed=seed,
            stop=stop_sequences if stop_sequences is not None else [],
            temperature=temperature,
            top_k=top_k,
            top_p=top_p,
            truncate=truncate,
            typical_p=typical_p,
            watermark=watermark,
            response_format=response_format,
            decoder_input_details=decoder_input_details,
            return_k_alternatives=return_k_alternatives
        )
        request = Request(inputs=prompt, stream=False, parameters=parameters)

        resp = requests.post(
            self.base_url,
            json=request.dict(by_alias=True),
            headers=self.headers,
            cookies=self.cookies,
            timeout=self.timeout,
        )

        # TODO: expose better error messages for 422 and similar errors
>       payload = resp.json()

/opt/homebrew/anaconda3/envs/predibase/lib/python3.10/site-packages/lorax/client.py:190:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <Response [503]>, kwargs = {}

    def json(self, **kwargs):
        r"""Returns the json-encoded content of a response, if any.

        :param \*\*kwargs: Optional arguments that ``json.loads`` takes.
        :raises requests.exceptions.JSONDecodeError: If the response body does not
            contain valid json.
        """

        if not self.encoding and self.content and len(self.content) > 3:
            # No encoding set. JSON RFC 4627 section 3 states we should expect
            # UTF-8, -16 or -32. Detect which one to use; If the detection or
            # decoding fails, fall back to `self.text` (using charset_normalizer to make
            # a best guess).
            encoding = guess_json_utf(self.content)
            if encoding is not None:
                try:
                    return complexjson.loads(self.content.decode(encoding), **kwargs)
                except UnicodeDecodeError:
                    # Wrong UTF codec detected; usually because it's not UTF-8
                    # but some other 8-bit codec. This is an RFC violation,
                    # and the server didn't bother to tell us what codec *was*
                    # used.
                    pass
                except JSONDecodeError as e:
                    raise RequestsJSONDecodeError(e.msg, e.doc, e.pos)

        try:
            return complexjson.loads(self.text, **kwargs)
        except JSONDecodeError as e:
            # Catch JSON-related errors and raise as requests.JSONDecodeError
            # This aliases json.JSONDecodeError and simplejson.JSONDecodeError
>           raise RequestsJSONDecodeError(e.msg, e.doc, e.pos)
E           requests.exceptions.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

/opt/homebrew/anaconda3/envs/predibase/lib/python3.10/site-packages/requests/models.py:975: JSONDecodeError

requests.exceptions.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
````
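Note that the failing frame shows `self = <Response [503]>`: the deployment is returning an HTTP 503 whose body is not JSON, and the bare `resp.json()` call then fails with the unhelpful `JSONDecodeError` above. As a minimal sketch of the kind of guard the `TODO` in `lorax/client.py` alludes to (this is a hypothetical helper, not part of the LoRAX or Predibase SDKs), the client could surface the status code and raw body instead:

```python
import requests


def parse_generate_response(resp: requests.Response) -> dict:
    """Return the JSON payload of a generate() response, or raise an error
    that includes the HTTP status and raw body when the server did not
    actually send JSON (e.g., a 503 error page from a proxy)."""
    try:
        return resp.json()
    except requests.exceptions.JSONDecodeError:
        # A 503 (or 422, or any proxy/gateway error page) typically has a
        # plain-text or HTML body; report it rather than the decode failure.
        raise RuntimeError(
            f"Server returned HTTP {resp.status_code} with a non-JSON body: "
            f"{resp.text[:200]!r}"
        )
```

With this pattern the user would see the 503 and the server's own error text, which makes the intermittent model-specific failures much easier to diagnose.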
Thank you.
Information
- Docker
- The CLI directly
Tasks
- An officially supported command
- My own modifications
Reproduction
Run `generate()` on "mistral-7b".
Expected behavior
There should be no errors when running `generate()` on any supported Predibase serverless model.
Alex Sherstinsky commented
Moving to predibase internal.