xnuinside / simple-ddl-parser

Simple DDL Parser to parse SQL (HQL, TSQL, AWS Redshift, BigQuery, Snowflake and other dialects) ddl files to json/python dict with full information about columns: types, defaults, primary keys, etc. & table properties, types, domains, etc.

Escaped single quote in COMMENT is not handled properly

kukigai opened this issue

Describe the bug
Escaped single quote in COMMENT is not handled properly

To Reproduce
ddl = """
CREATE EXTERNAL TABLE test (
job_id STRING COMMENT 'test's'
)
STORED AS PARQUET LOCATION 'hdfs://test'
"""
from simple_ddl_parser import DDLParser
parse_results = DDLParser(ddl).run(output_mode="hql")

Expected behavior
Non-empty JSON should be returned

@kukigai but in your example the quote is not escaped. If I put it in a query editor, it parses 'test' as a string, but s' is marked as invalid.

As I see it, there must be some escaping character, like \' or '' (for SQL Server).

Sorry, I think GitHub removed the escape character when I pasted. You can try this to reproduce:

ddl = """
CREATE EXTERNAL TABLE test (
job_id STRING COMMENT 'test\'s'
)
STORED AS PARQUET LOCATION 'hdfs://test'
"""
from simple_ddl_parser import DDLParser
parse_results = DDLParser(ddl).run(output_mode="hql")

@kukigai this is the saddest ticket. I added some fixes in 0.19.6, but because of how Python handles strings, some things remain complicated. Escape characters are preserved only in raw strings in Python. If the text is first passed as a normal string and then converted to a raw string (inside the package logic), it will not work, because the escape characters are dropped as soon as the normal string is created.
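A minimal sketch of the behaviour described above, using only plain Python string literals (the variable names `normal` and `raw` are illustrative, not part of the package):

```python
# The same literal written as a normal string and as a raw string.
normal = 'test\'s'   # the backslash only escapes the quote and is then dropped
raw = r'test\'s'     # the backslash is kept as a real character

print(normal)  # test's
print(raw)     # test\'s

# Once the normal string exists, the backslash is gone,
# so no later conversion can bring it back.
print('\\' in normal)  # False
print('\\' in raw)     # True
```

So by the time a normal string reaches any library code, there is no backslash left for the parser to see.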

So there are 2 options for parsing escape characters in Python:

  1. The text that reaches the parser must contain \' rather than a bare ', which in a normal Python string means writing \\', like in this test case: https://github.com/xnuinside/simple-ddl-parser/blob/main/tests/test_simple_ddl_parser.py#L1941
  2. Or you can pass your DDL string not as a normal Python string but as a raw string, like here:
    https://github.com/xnuinside/simple-ddl-parser/blob/main/tests/test_simple_ddl_parser.py#L1990
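The two options can be sketched side by side. This only builds the DDL text and checks that both spellings give the same string; the names `ddl_escaped` and `ddl_raw` are made up for illustration, and the sketch does not call the parser itself.

```python
# Option 1: normal string, backslash written twice so that one survives.
ddl_escaped = """
CREATE EXTERNAL TABLE test (
job_id STRING COMMENT 'test\\'s'
)
STORED AS PARQUET LOCATION 'hdfs://test'
"""

# Option 2: raw string, backslash written once and kept as-is.
ddl_raw = r"""
CREATE EXTERNAL TABLE test (
job_id STRING COMMENT 'test\'s'
)
STORED AS PARQUET LOCATION 'hdfs://test'
"""

# Both spellings produce identical text containing \' in the comment.
print(ddl_escaped == ddl_raw)  # True
print("COMMENT 'test\\'s'" in ddl_raw)  # True
```

Either string can then be passed to `DDLParser(...).run(output_mode="hql")` as in the reproduction above.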

I would be glad to support this in a more user-friendly way, but it's not the first time I have bumped into escape characters in Python. So there are only those 2 ways. There is no way to pass a normal string and use only a single \ to escape. Very sorry about it.

@kukigai, can I help any further with this ticket, or can I close it?

Thanks! Yeah, I can work around it using one of the options.