Big Number with Trendline doesn't work with PostgreSQL's date field
ensky opened this issue · comments
Big Number with Trendline doesn't work with PostgreSQL's date field.
How to reproduce the bug
- Create a table with a column of type "date" in PostgreSQL
- Add that database and table to Superset as a dataset
- Try to create a Big Number with Trendline chart with that dataset
- It shows
Unexpected error
Error: '+08'
where +08 is the timezone offset from your PostgreSQL settings
Expected results
The chart should be drawn successfully
Actual results
Unexpected error
Error: '+08'
Additional Error Log
Traceback (most recent call last):
File "/usr/local/lib/python3.8/site-packages/flask/app.py", line 1950, in full_dispatch_request
rv = self.dispatch_request()
File "/usr/local/lib/python3.8/site-packages/flask/app.py", line 1936, in dispatch_request
return self.view_functions[rule.endpoint](**req.view_args)
File "/usr/local/lib/python3.8/site-packages/flask_appbuilder/security/decorators.py", line 190, in wraps
return f(self, *args, **kwargs)
File "/app/superset/utils/log.py", line 245, in wrapper
value = f(*args, **kwargs)
File "/app/superset/views/core.py", line 2351, in results
return self.results_exec(key)
File "/app/superset/views/core.py", line 2419, in results_exec
obj = _deserialize_results_payload(
File "/app/superset/views/utils.py", line 625, in _deserialize_results_payload
df = result_set.SupersetResultSet.convert_table_to_df(pa_table)
File "/app/superset/result_set.py", line 177, in convert_table_to_df
return table.to_pandas(integer_object_nulls=True)
File "pyarrow/array.pxi", line 757, in pyarrow.lib._PandasConvertible.to_pandas
File "pyarrow/table.pxi", line 1748, in pyarrow.lib.Table._to_pandas
File "/usr/local/lib/python3.8/site-packages/pyarrow/pandas_compat.py", line 789, in table_to_blockmanager
blocks = _table_to_blocks(options, table, categories, ext_columns_dtypes)
File "/usr/local/lib/python3.8/site-packages/pyarrow/pandas_compat.py", line 1130, in _table_to_blocks
return [_reconstruct_block(item, columns, extension_columns)
File "/usr/local/lib/python3.8/site-packages/pyarrow/pandas_compat.py", line 1130, in <listcomp>
return [_reconstruct_block(item, columns, extension_columns)
File "/usr/local/lib/python3.8/site-packages/pyarrow/pandas_compat.py", line 733, in _reconstruct_block
dtype = make_datetimetz(item['timezone'])
File "/usr/local/lib/python3.8/site-packages/pyarrow/pandas_compat.py", line 758, in make_datetimetz
tz = pa.lib.string_to_tzinfo(tz)
File "pyarrow/types.pxi", line 1927, in pyarrow.lib.string_to_tzinfo
File "pyarrow/error.pxi", line 143, in pyarrow.lib.pyarrow_internal_check_status
File "/usr/local/lib/python3.8/site-packages/pytz/__init__.py", line 188, in timezone
raise UnknownTimeZoneError(zone)
pytz.exceptions.UnknownTimeZoneError: '+08'
2022-05-23 03:27:32,051:ERROR:superset.views.base:'+08'
Environment
- browser type and version: any browser
- superset version: 1.5.0
- python version: 3.8.12
- any feature flags active:
- VERSIONED_EXPORT
- ENABLE_TEMPLATE_PROCESSING
Checklist
- I have checked the superset logs for python stacktraces and included it here as text if there are any.
- I have reproduced the issue with at least the latest released version of superset.
- I have checked the issue tracker for the same issue and I haven't found one similar.
Additional context
- This issue doesn't exist in Superset v1.4.0
The SQL in the chart is
SELECT DATE_TRUNC('day', publish_date) AS __timestamp,
count(*) AS count
FROM
(SELECT publish_date
FROM software.package_file) AS virtual_table
GROUP BY DATE_TRUNC('day', publish_date)
LIMIT 50000;
The problematic part seems to be the DATE_TRUNC function, which returns something like "2022-05-23 12:00:00+08". That format appears to be processed incorrectly by the pyarrow project.
According to the pyarrow source, the expected timezone format is like +08:00, so maybe that's the root cause? https://github.com/apache/arrow/blob/apache-arrow-5.0.0/python/pyarrow/types.pxi#L1915
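The mismatch is easy to verify in isolation: pytz only resolves IANA zone names, so the bare offset string from the traceback fails. A minimal sketch ("Asia/Taipei" is just an arbitrary example of a +08 zone, not taken from the report):

```python
import pytz

# A bare UTC-offset string like '+08' is not an IANA zone name,
# so pytz rejects it -- exactly the error in the traceback above.
try:
    pytz.timezone("+08")
except pytz.exceptions.UnknownTimeZoneError as exc:
    print("UnknownTimeZoneError:", exc)

# A proper IANA zone name resolves fine.
tz = pytz.timezone("Asia/Taipei")
```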
@ensky what is the datatype of publish_date in the software.package_file table?
@villebro It's the date type:
SELECT
table_name,
column_name,
data_type
FROM
information_schema.columns
WHERE
table_name = 'package_file' and column_name = 'publish_date';
@villebro I can reproduce it in SQL Lab, following your steps, with this query:
SELECT DATE_TRUNC('day', publish_date),
count(*) AS count
FROM
(SELECT '2021-01-01'::date as publish_date) AS virtual_table
GROUP BY DATE_TRUNC('day', publish_date)
LIMIT 50000;
pgsql version
PostgreSQL 12.7 (Debian 12.7-1.pgdg100+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 8.3.0-6) 8.3.0, 64-bit
I am facing the same issue with all Time-series charts (ECharts and legacy both). The issue seems to be the handling of the Date datatype in Postgres; casting such columns to Timestamp solves the issue.
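As a sketch of that workaround (reusing the publish_date column from the report), casting the date column to timestamp makes Postgres pick the timezone-naive DATE_TRUNC variant, so the driver never returns the offset-only timezone string:

```sql
-- Hypothetical workaround: explicit ::timestamp cast selects the
-- "timestamp without time zone" overload of DATE_TRUNC.
SELECT DATE_TRUNC('day', publish_date::timestamp) AS __timestamp,
       count(*) AS count
FROM (SELECT publish_date
      FROM software.package_file) AS virtual_table
GROUP BY DATE_TRUNC('day', publish_date::timestamp)
LIMIT 50000;
```

Without the cast, a plain date argument is implicitly promoted to timestamptz (the preferred type in Postgres's datetime category), which is what introduces the "+08" suffix.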
It looks like an issue that popped up somewhere else, and was fixed by upgrading psycopg2 to 2.9.1 in requirements.txt.
@ensky I'm not able to reproduce the error with the query you provided:
SELECT DATE_TRUNC('day', publish_date),
count(*) AS count
FROM
(SELECT '2021-01-01'::date as publish_date) AS virtual_table
GROUP BY DATE_TRUNC('day', publish_date)
LIMIT 50000;
Executed tests:
- master on Postgres 12
- master on Postgres 12.11
- 1.5.1rc1 on Postgres 12
- 1.5.1rc1 on Postgres 12.11
I am facing the same issue with all Time-series charts (ECharts and legacy both). The issue seems to be the handling of the Date datatype in Postgres; casting such columns to Timestamp solves the issue.
Seems like the issue is something else. A local Docker setup doesn't show the issue, but the Kubernetes setup does. I have an older version of Postgres in k8s, so that may be the issue.
but kubernetes setup shows issue
True. Our environment is indeed located on k8s.
What's the Postgres version in k8s?
@michael-s-molina As mentioned above, it's 12.7
#20155 (comment)
I misunderstood 😄 , I thought that when @kamalkeshavani-aiinside commented:
I have older version of postgres in k8s
he was already considering the previous comment and knew that we had tested on versions 12 and 12.11.
I am facing same issue with all Time-series charts
Me too. For me, this issue was introduced with the upgrade to 1.5.0. Unfortunately, downgrading does not work ("Databases" is empty after the downgrade).
Maybe the FutureWarning at the top of my (redacted) stack trace contains a valuable hint for one of you:
"Data is timezone-aware. Converting timezone-aware data to timezone-naive by passing dtype='datetime64[ns]' to DataFrame or Series is deprecated and will raise in a future version. Use pd.Series(values).dt.tz_localize(None) instead."
Jun 02 15:09:25 demeter gunicorn[1697193]: /srv/superset/local/lib/python3.8/site-packages/superset/common/query_context_processor.py:186: FutureWarning: Data is timezone-aware. Converting timezone-aware data to timezone-naive by passing dtype='datetime64[ns]' to DataFrame or Series is deprecated and will raise in a future version. Use `pd.Series(values).dt.tz_localize(None)` instead.
Jun 02 15:09:25 demeter gunicorn[1697193]: result = query_context.datasource.query(query_object.to_dict())
Jun 02 15:09:25 demeter gunicorn[1697193]: Query SELECT DATE_TRUNC('year', entry_date) AS __timestamp,
Jun 02 15:09:25 demeter gunicorn[1697193]: sum(our_fees) AS "SUM(our_fees)"
Jun 02 15:09:25 demeter gunicorn[1697193]: FROM
Jun 02 15:09:25 demeter gunicorn[1697193]: (SELECT i.date AS invoice_date,
Jun 02 15:09:25 demeter gunicorn[1697193]: e.date AS entry_date,
...
Jun 02 15:09:25 demeter gunicorn[1697193]: WHERE entry_date >= TO_TIMESTAMP('2012-01-01 00:00:00.000000', 'YYYY-MM-DD HH24:MI:SS.US')
Jun 02 15:09:25 demeter gunicorn[1697193]: AND entry_date < TO_TIMESTAMP('2022-01-01 00:00:00.000000', 'YYYY-MM-DD HH24:MI:SS.US')
Jun 02 15:09:25 demeter gunicorn[1697193]: AND number = 970
Jun 02 15:09:25 demeter gunicorn[1697193]: AND subnumber = 1
Jun 02 15:09:25 demeter gunicorn[1697193]: GROUP BY DATE_TRUNC('year', entry_date)
Jun 02 15:09:25 demeter gunicorn[1697193]: ORDER BY "SUM(our_fees)" DESC
Jun 02 15:09:25 demeter gunicorn[1697193]: LIMIT 10000 on schema public failed
Jun 02 15:09:25 demeter gunicorn[1697193]: Traceback (most recent call last):
Jun 02 15:09:25 demeter gunicorn[1697193]: File "/srv/superset/local/lib/python3.8/site-packages/superset/connectors/sqla/models.py", line 1719, in query
Jun 02 15:09:25 demeter gunicorn[1697193]: df = self.database.get_df(sql, self.schema, mutator=assign_column_label)
Jun 02 15:09:25 demeter gunicorn[1697193]: File "/srv/superset/local/lib/python3.8/site-packages/superset/models/core.py", line 453, in get_df
Jun 02 15:09:25 demeter gunicorn[1697193]: df = result_set.to_pandas_df()
Jun 02 15:09:25 demeter gunicorn[1697193]: File "/srv/superset/local/lib/python3.8/site-packages/superset/result_set.py", line 202, in to_pandas_df
Jun 02 15:09:25 demeter gunicorn[1697193]: return self.convert_table_to_df(self.table)
Jun 02 15:09:25 demeter gunicorn[1697193]: File "/srv/superset/local/lib/python3.8/site-packages/superset/result_set.py", line 177, in convert_table_to_df
Jun 02 15:09:25 demeter gunicorn[1697193]: return table.to_pandas(integer_object_nulls=True)
Jun 02 15:09:25 demeter gunicorn[1697193]: File "pyarrow/array.pxi", line 757, in pyarrow.lib._PandasConvertible.to_pandas
Jun 02 15:09:25 demeter gunicorn[1697193]: File "pyarrow/table.pxi", line 1748, in pyarrow.lib.Table._to_pandas
Jun 02 15:09:25 demeter gunicorn[1697193]: File "/srv/superset/local/lib/python3.8/site-packages/pyarrow/pandas_compat.py", line 789, in table_to_blockmanager
Jun 02 15:09:25 demeter gunicorn[1697193]: blocks = _table_to_blocks(options, table, categories, ext_columns_dtypes)
Jun 02 15:09:25 demeter gunicorn[1697193]: File "/srv/superset/local/lib/python3.8/site-packages/pyarrow/pandas_compat.py", line 1130, in _table_to_blocks
Jun 02 15:09:25 demeter gunicorn[1697193]: return [_reconstruct_block(item, columns, extension_columns)
Jun 02 15:09:25 demeter gunicorn[1697193]: File "/srv/superset/local/lib/python3.8/site-packages/pyarrow/pandas_compat.py", line 1130, in <listcomp>
Jun 02 15:09:25 demeter gunicorn[1697193]: return [_reconstruct_block(item, columns, extension_columns)
Jun 02 15:09:25 demeter gunicorn[1697193]: File "/srv/superset/local/lib/python3.8/site-packages/pyarrow/pandas_compat.py", line 733, in _reconstruct_block
Jun 02 15:09:25 demeter gunicorn[1697193]: dtype = make_datetimetz(item['timezone'])
Jun 02 15:09:25 demeter gunicorn[1697193]: File "/srv/superset/local/lib/python3.8/site-packages/pyarrow/pandas_compat.py", line 758, in make_datetimetz
Jun 02 15:09:25 demeter gunicorn[1697193]: tz = pa.lib.string_to_tzinfo(tz)
Jun 02 15:09:25 demeter gunicorn[1697193]: File "pyarrow/types.pxi", line 1927, in pyarrow.lib.string_to_tzinfo
Jun 02 15:09:25 demeter gunicorn[1697193]: File "pyarrow/error.pxi", line 143, in pyarrow.lib.pyarrow_internal_check_status
Jun 02 15:09:25 demeter gunicorn[1697193]: File "/usr/lib/python3/dist-packages/pytz/__init__.py", line 181, in timezone
Jun 02 15:09:25 demeter gunicorn[1697193]: raise UnknownTimeZoneError(zone)
Jun 02 15:09:25 demeter gunicorn[1697193]: pytz.exceptions.UnknownTimeZoneError: '+01'
Jun 02 15:09:25 demeter gunicorn[1697193]: 192.168.0.121 - - [02/Jun/2022:15:09:25 +0200] "POST /api/v1/chart/data?form_data=%7B%22slice_id%22%3A211%7D&dashboard_id=13&force HTTP/1.0" 400 27 "https://[...]" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) Gecko/20100101 Firefox/91.0"
Jun 02 15:09:25 demeter gunicorn[1697193]: [2022-06-02 15:09:25 +0200] [1697193] [DEBUG] Closing connection.
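The warning's own suggestion can be applied directly in pandas. A minimal sketch (the sample timestamps are made up for illustration):

```python
import pandas as pd

# Timezone-aware values, like what DATE_TRUNC('day', ...) yields
# when the column is promoted to timestamptz.
values = pd.to_datetime(["2022-05-23 12:00:00+08:00",
                         "2022-05-24 08:30:00+08:00"])

# The FutureWarning's suggested fix: drop the timezone explicitly
# instead of coercing with dtype='datetime64[ns]'.
naive = pd.Series(values).dt.tz_localize(None)
print(naive.dtype)  # datetime64[ns]
```

Note that tz_localize(None) keeps the local wall-clock time and only strips the offset, which is usually what a chart grouped by local day expects.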
It looks like an issue that popped up somewhere else, and was fixed by upgrading psycopg2 to 2.9.1 in requirements.txt.
I can confirm that with psycopg2==2.9.1 the problem is solved.
So it seems to be a problem with the k8s setup example at https://superset.apache.org/docs/installation/running-on-kubernetes, which instructs pip install psycopg2==2.8.5 in the dependency section.
Changing it to pip install psycopg2==2.9.1 solves this problem.
So it seems to be a problem with the k8s setup example at https://superset.apache.org/docs/installation/running-on-kubernetes, which instructs pip install psycopg2==2.8.5 in the dependency section.
Thanks for reporting this @ensky and also for submitting a PR to update the guide 🙂
I can confirm that with psycopg2==2.9.1 the problem is solved.
Changing it to pip install psycopg2==2.9.1 solves this problem.
I cannot confirm this resolution. I upgraded Superset to 1.5.1 via pip, and the problem persists for me; pip even upgrades to psycopg2==2.9.3. Manually downgrading to 2.9.1 does not resolve the issue either. The patch in 0b49ca6 still works.
This is really useful.
Setting psycopg2==2.9.3 works for me (using the Superset 2.0.0 Docker image).
Perhaps this is an issue with my PostgreSQL version (14). Anyway, c61ac76 is an updated patch to fix the timezone issue if (like me) the psycopg2 upgrade does not resolve it for you. This is better than my previous patch because it is also compatible with Prophet.