macbre / sql-metadata

Uses tokenized query returned by python-sqlparse and generates query metadata

Home Page:https://pypi.python.org/pypi/sql-metadata

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Parser().tables fix for missing schema in ms_sql and sybase

davlee1972 opened this issue · comments

Can we modify the two functions below?

Microsoft SQL Server and Sybase both allow empty db schema(s).
If empty it defaults to either the user login or "dbo" depending if user.table or dbo.table exists.

This works..

>>> from sql_metadata import Parser
>>> Parser("select 1 from my_db.my_schema.my_table").tables
['my_db.my_schema.my_table']

But this does not:

>>> from sql_metadata import Parser
>>> Parser("select 1 from my_db..my_table").tables
['..my_table']

The code modifications will resolve to proper table names..

>>> from sql_metadata import Parser
>>> Parser("select 1 from my_db..my_table").tables
['my_db..my_table']

Code fix: Only step back one token if schema is missing.

    def _combine_qualified_names(self, index: int, token: SQLToken) -> None:
        """
        Combines names like <schema>.<table>.<column> or <table/sub_query>.<column>
        """
        value = token.value
        is_complex = True
        while is_complex:
            value, index, is_complex = self._combine_tokens(index=index, value=value)
        token.value = value

    def _combine_tokens(self, index: int, value: str) -> Tuple[str, int, bool]:
        """
        Checks if complex identifier is longer and follows back until it's finished
        """
        if index > 1 and str(self.non_empty_tokens[index - 1]) == ".":
            prev_value = self.non_empty_tokens[index - 2].value.strip("`").strip('"')
            """
            Only step back one token if schema is missing to get database name
            Otherwise step back two tokens to get database name
            """
            if prev_value == ".":
                index = index - 1
                value = f".{value}"
            else:
                index = index - 2
                value = f"{prev_value}.{value}"
            return value, index, True
        return value, None, False

Technical Docs..

From the documentation:

When database objects are referenced by using a one-part name, SQL Server first looks in the user's default schema. If the object is not found there, SQL Server looks next in the dbo schema. If the object is not in the dbo schema, an error is returned.

They say "one-part name" but this also applies when you reference an object with a pattern like:

[server].[database]..[name]

or

[database]..[name]

@davlee1972 - thanks! Can you prepare a PR with the fix you've proposed above?