outlines-dev / outlines

Structured Text Generation

Home Page: https://outlines-dev.github.io/outlines/


`build_regex_from_schema` generates incorrect regexes when used with `"const"` and `"enum"` for booleans, nulls, and strings needing escaping

mwootten opened this issue

Describe the issue as clearly as possible:

  • The regex expects booleans to be capitalized as in Python (`True`/`False`), not lowercase as in JSON (`true`/`false`).
  • The regex expects nulls to be written as `None` as in Python, not `null` as in JSON.
  • The regex escapes strings for use in a regular expression, but doesn't account for the fact that a string containing quotes or backslashes must also be escaped in the JSON output.
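A minimal sketch of the behaviour these points ask for, assuming the fix is to serialize each `const`/`enum` value with `json.dumps` first and only then escape it for the regex. The helpers `const_to_regex` and `enum_to_regex` are hypothetical and are not part of the outlines API.

```python
import json
import re


# Hypothetical helper (not part of outlines): build a regex that matches
# exactly the JSON serialization of a "const" value.
def const_to_regex(value) -> str:
    # json.dumps produces the JSON spelling: true/false, null, and strings
    # with quotes and backslashes escaped (a lone '"' becomes "\"").
    # re.escape then protects any regex metacharacters in that text.
    return re.escape(json.dumps(value))


# Hypothetical helper: an "enum" is an alternation of its members' regexes.
def enum_to_regex(values) -> str:
    return "(" + "|".join(const_to_regex(v) for v in values) + ")"


print(const_to_regex(True))  # true
print(const_to_regex(None))  # null
print(const_to_regex('"'))   # "\\""
print(enum_to_regex([False, None, "a\\b"]))
```

One design caveat: `json.dumps` defaults to `ensure_ascii=True`, so a regex built this way would only accept non-ASCII characters in their `\uXXXX` escaped form.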

Steps/code to reproduce the bug:

```python
import json
from outlines.generate.json import build_regex_from_schema
print(build_regex_from_schema(json.dumps({"const": True}))) # => True
print(build_regex_from_schema(json.dumps({"const": None}))) # => None
print(build_regex_from_schema(json.dumps({"const": '"'})))  # => """
```

Expected result:

```
true
null
"\\""
```
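To sanity-check these expected regexes, one can confirm that each fully matches what `json.dumps` produces for the corresponding constant from the reproduction snippet, and does not match the value's Python `repr()`. This sketch uses only the standard library and does not call outlines.

```python
import json
import re

# Expected regexes paired with the constants from the reproduction snippet.
expected = {
    "true": True,
    "null": None,
    r'"\\""': '"',
}

for pattern, value in expected.items():
    serialized = json.dumps(value)
    # Each expected regex should match the JSON text exactly...
    assert re.fullmatch(pattern, serialized), (pattern, serialized)
    # ...and should not match the Python repr of the same value.
    assert re.fullmatch(pattern, repr(value)) is None, (pattern, repr(value))
    print(pattern, "matches", serialized)
```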

Error message:

No response

Outlines/Python version information:

```
# outlines version
0.0.44
# Python version
Python 3.12.3 (main, Apr 17 2024, 00:00:00) [GCC 13.2.1 20240316 (Red Hat 13.2.1-7)]
# pip freeze
aiohttp==3.9.5
aiosignal==1.3.1
annotated-types==0.7.0
anyio==4.4.0
attrs==23.2.0
beartype==0.15.0
certifi==2024.6.2
cfgv==3.4.0
chardet==5.2.0
charset-normalizer==3.3.2
cloudpickle==3.0.0
cmake==3.29.5.1
coverage==7.5.3
datasets==2.20.0
diff_cover==9.0.0
dill==0.3.8
diskcache==5.6.3
distlib==0.3.8
distro==1.9.0
filelock==3.15.1
flake8==7.0.0
frozenlist==1.4.1
fsspec==2024.5.0
h11==0.14.0
httpcore==1.0.5
httpx==0.27.0
huggingface-hub==0.23.4
identify==2.5.36
idna==3.7
iniconfig==2.0.0
interegular==0.3.3
Jinja2==3.1.4
jsonschema==4.22.0
jsonschema-specifications==2023.12.1
lark==1.1.9
llama_cpp_python==0.2.78
llvmlite==0.43.0
MarkupSafe==2.1.5
mccabe==0.7.0
mpmath==1.3.0
multidict==6.0.5
multiprocess==0.70.16
nest-asyncio==1.6.0
networkx==3.3
ninja==1.11.1.1
nodeenv==1.9.1
numba==0.60.0
numpy==1.26.4
nvidia-cublas-cu12==12.1.3.1
nvidia-cuda-cupti-cu12==12.1.105
nvidia-cuda-nvrtc-cu12==12.1.105
nvidia-cuda-runtime-cu12==12.1.105
nvidia-cudnn-cu12==8.9.2.26
nvidia-cufft-cu12==11.0.2.54
nvidia-curand-cu12==10.3.2.106
nvidia-cusolver-cu12==11.4.5.107
nvidia-cusparse-cu12==12.1.0.106
nvidia-nccl-cu12==2.20.5
nvidia-nvjitlink-cu12==12.5.40
nvidia-nvtx-cu12==12.1.105
openai==1.34.0
outlines @ file:///home/mwootten/Projects/outlines
packaging==24.1
pandas==2.2.2
platformdirs==4.2.2
pluggy==1.5.0
pre-commit==3.7.1
py-cpuinfo==9.0.0
pyairports==2.1.1
pyarrow==16.1.0
pyarrow-hotfix==0.6
pycodestyle==2.11.1
pycountry==24.6.1
pydantic==2.7.4
pydantic_core==2.18.4
pyflakes==3.2.0
Pygments==2.18.0
pytest==8.2.2
pytest-benchmark==4.0.0
pytest-cov==5.0.0
pytest-mock==3.14.0
python-dateutil==2.9.0.post0
pytz==2024.1
PyYAML==6.0.1
referencing==0.35.1
regex==2024.5.15
requests==2.32.3
responses==0.25.3
rpds-py==0.18.1
safetensors==0.4.3
setuptools==70.0.0
six==1.16.0
sniffio==1.3.1
sympy==1.12.1
tokenizers==0.19.1
torch==2.3.0
tqdm==4.66.4
transformers==4.41.2
typing_extensions==4.12.2
tzdata==2024.1
urllib3==2.2.1
virtualenv==20.26.2
wheel==0.43.0
xxhash==3.4.1
yarl==1.9.4
```

Context for the issue:

No response