'SECOND' is not recognized as a keyword in the Databricks dialect
beipang opened this issue
Search before asking
- I searched the issues and found no similar issues.
What Happened
We are migrating our SQL queries from Redshift to SparkSQL.
Our SQLFluff linter raises an error on the usage of the datediff function.
In the datediff documentation, you can see that this function accepts "SECOND" as a unit.
However, the linter throws errors on our query with regard to the unit "SECOND".
Expected Behaviour
The linter should accept SECOND as a keyword and return no error.
The same query is fine with the redshift dialect:
$ sqlfluff lint --dialect redshift example.sql --config xxx
All Finished 📜 🎉!
Observed Behaviour
The linter throws two errors:
- First, it forces "SECOND" to be changed to lower case because:
CP02 | Unquoted identifiers must be consistently lower case.
- After changing it to "second", the linter still raises an error:
RF02 | Unqualified reference 'second' found in select with more
| than one referenced table/view.
| [references.qualification]
This behavior suggests that the databricks dialect is treating "SECOND" as an identifier instead of as a keyword.
Proof that 'SECOND' is being treated as an identifier:
Run $ sqlfluff parse --dialect databricks example.sql
and look for the line "[L: 5, P: 16] | naked_identifier: 'SECOND'" in the output:
[L: 1, P: 1] |file:
[L: 1, P: 1] | statement:
[L: 1, P: 1] | select_statement:
[L: 1, P: 1] | select_clause:
[L: 1, P: 1] | keyword: 'SELECT'
[L: 1, P: 7] | newline: '\n'
[L: 2, P: 1] | whitespace: ' '
[L: 2, P: 5] | [META] indent:
[L: 2, P: 5] | select_clause_element:
[L: 2, P: 5] | column_reference:
[L: 2, P: 5] | naked_identifier: 'a'
[L: 2, P: 6] | comma: ','
[L: 2, P: 7] | newline: '\n'
[L: 3, P: 1] | whitespace: ' '
[L: 3, P: 5] | select_clause_element:
[L: 3, P: 5] | column_reference:
[L: 3, P: 5] | naked_identifier: 'b'
[L: 3, P: 6] | [META] dedent:
[L: 3, P: 6] | newline: '\n'
[L: 4, P: 1] | from_clause:
[L: 4, P: 1] | keyword: 'FROM'
[L: 4, P: 5] | whitespace: ' '
[L: 4, P: 6] | from_expression:
[L: 4, P: 6] | [META] indent:
[L: 4, P: 6] | from_expression_element:
[L: 4, P: 6] | table_expression:
[L: 4, P: 6] | table_reference:
[L: 4, P: 6] | naked_identifier: 'my_table'
[L: 4, P: 14] | [META] dedent:
[L: 4, P: 14] | newline: '\n'
[L: 5, P: 1] | where_clause:
[L: 5, P: 1] | keyword: 'WHERE'
[L: 5, P: 6] | [META] (implicit) indent:
[L: 5, P: 6] | whitespace: ' '
[L: 5, P: 7] | expression:
[L: 5, P: 7] | function:
[L: 5, P: 7] | function_name:
[L: 5, P: 7] | function_name_identifier: 'DATEDIFF'
[L: 5, P: 15] | bracketed:
[L: 5, P: 15] | start_bracket: '('
[L: 5, P: 16] | [META] indent:
[L: 5, P: 16] | expression:
[L: 5, P: 16] | column_reference:
[L: 5, P: 16] | naked_identifier: 'SECOND'
[L: 5, P: 22] | comma: ','
[L: 5, P: 23] | whitespace: ' '
[L: 5, P: 24] | expression:
[L: 5, P: 24] | column_reference:
[L: 5, P: 24] | naked_identifier: 'timestamp_a'
[L: 5, P: 35] | comma: ','
[L: 5, P: 36] | whitespace: ' '
[L: 5, P: 37] | expression:
[L: 5, P: 37] | column_reference:
[L: 5, P: 37] | naked_identifier: 'timestamp_b'
[L: 5, P: 48] | [META] dedent:
[L: 5, P: 48] | end_bracket: ')'
[L: 5, P: 49] | whitespace: ' '
[L: 5, P: 50] | comparison_operator:
[L: 5, P: 50] | raw_comparison_operator: '>'
[L: 5, P: 51] | whitespace: ' '
[L: 5, P: 52] | numeric_literal: '1'
[L: 5, P: 53] | [META] dedent:
[L: 5, P: 53] | newline: '\n'
[L: 6, P: 1] | [META] end_of_file:
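The check described above (looking for 'SECOND' under naked_identifier rather than keyword) can also be done mechanically. Below is a minimal sketch, not part of SQLFluff: it only scans the text printed by sqlfluff parse. The helper name node_types_for and the regular expression are assumptions made for illustration, and the sample string is a shortened copy of the output above.

```python
import re
from typing import List

# Hypothetical helper, not a SQLFluff API: it reads the plain-text output of
# `sqlfluff parse`. Matches lines such as
#   [L: 5, P: 16] | naked_identifier: 'SECOND'
NODE_RE = re.compile(r"\|\s*(?P<type>[a-z_]+):\s+'(?P<raw>[^']*)'")


def node_types_for(parse_output: str, raw: str) -> List[str]:
    """Return the segment types under which `raw` appears in the parse output."""
    types = []
    for line in parse_output.splitlines():
        match = NODE_RE.search(line)
        # Compare case-insensitively so 'second' and 'SECOND' are both found.
        if match and match.group("raw").upper() == raw.upper():
            types.append(match.group("type"))
    return types


# Shortened sample taken from the parse output in this issue.
sample = """\
[L: 1, P: 1] | keyword: 'SELECT'
[L: 5, P: 7] | function_name_identifier: 'DATEDIFF'
[L: 5, P: 16] | naked_identifier: 'SECOND'
"""

print(node_types_for(sample, "SECOND"))  # ['naked_identifier']
```

If the dialect treated SECOND as a date part keyword, the returned list would be expected to contain a type other than naked_identifier.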
How to reproduce
To reproduce the issue:
The example.sql:
SELECT
my_table.a,
other_table.b
FROM my_table
LEFT JOIN other_table
ON DATEDIFF(SECOND, my_table.timestamp_a, other_table.timestamp_b) > 1
Dialect
Databricks and SparkSQL
Version
sqlfluff, version 2.3.5
Python 3.9.18
Configuration
[sqlfluff]
templater = jinja
sql_file_exts = .sql
# L016 - Line Length (unnecessary given our macro logic)
# L029 - Keywords should not be used as identifiers (too many legacy tables rely on this)
# L034 - Statement ordering complexity (not-applicable)
exclude_rules = L016,L029,L034
large_file_skip_byte_limit=50000
[sqlfluff:indentation]
indent_unit = space
tab_space_size = 2
indented_joins = true
indented_using_on = true
template_blocks_indent = false
[sqlfluff:rules]
allow_scalar = True
unquoted_identifiers_policy = all
hanging_indents = False
[sqlfluff:layout:type:comma]
line_position = trailing
[sqlfluff:rules:capitalisation.literals]
capitalisation_policy = upper
[sqlfluff:rules:capitalisation.keywords]
# Inconsistent capitalisation of keywords
capitalisation_policy = upper
[capitalisation.identifiers]
# Inconsistent capitalisation of unquoted identifiers.
extended_capitalisation_policy = lower
unquoted_identifiers_policy = all
[sqlfluff:rules:capitalisation.functions]
extended_capitalisation_policy = upper
[sqlfluff:rules:references.quoting]
ignore_words=time,date
Are you willing to work on and submit a PR to address the issue?
- Yes I am willing to submit a PR!
Code of Conduct
- I agree to follow this project's Code of Conduct