acroz / pylivy

A Python client for Apache Livy, enabling use of remote Apache Spark clusters.

download_sql does not return more than 1000 rows

muracstech opened this issue

Is there a way I can download 100K rows using download_sql?

I am having the same issue, and I would like to download as many rows as needed.

I faced the same issue and found that Livy restricts results to 1000 records. The Spark explain plan shows a global limit of 1000, and I am trying to find out how to raise it.

Would anyone be interested in an s3/hdfs redirect download feature (@acroz not sure if this would be within the scope of this project)?

Users could provide the following additional parameters to the session constructor:

  1. a prefix/directory for temporary storage (s3://, hdfs://, file://, etc.)
  2. a fetcher function that returns a dataframe given a URI as a string.

The download method could take an optional flag to override the default behavior: instead of streaming rows back through Livy, the dataframe is saved to temporary storage at a generated URI, e.g. uri = "TMP_DIR/DF_NAME.parquet", and fetcher(uri) is returned.
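The proposed flow could be sketched roughly like this. Note this is a hypothetical sketch, not pylivy's API: `download_via_storage`, `StubSession`, and the bucket path are all made up for illustration, and `session.run(...)` stands in for executing Spark code on the remote cluster.

```python
import uuid


class StubSession:
    """Stand-in for a remote Livy session; records the code it is asked to run."""

    def run(self, code):
        self.last_code = code


def download_via_storage(session, df_name, tmp_dir, fetcher):
    """Hypothetical redirect download: write to shared storage, fetch locally."""
    # Generate a unique URI under the caller-supplied temporary prefix.
    uri = f"{tmp_dir.rstrip('/')}/{df_name}-{uuid.uuid4().hex}.parquet"
    # Remote side: persist the full dataframe, bypassing Livy's row cap.
    session.run(f'{df_name}.write.parquet("{uri}")')
    # Local side: the user-supplied fetcher turns the URI into a dataframe
    # (e.g. via pandas.read_parquet for s3:// or file:// URIs).
    return fetcher(uri)


session = StubSession()
result = download_via_storage(session, "df", "s3://bucket/tmp/", lambda uri: uri)
```

The key design point is that only small control messages go through the Livy REST API; the bulk data moves through storage both sides can reach.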

With this spark_conf you can control the number of output rows:

```python
LivySession.create(livy_url, kind=SessionKind.SQL, spark_conf={'livy.rsc.sql.num-rows': '2000'})
```
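Building on that comment, a small helper can keep the override in one place. This is a sketch under the assumption that your Livy deployment honours `livy.rsc.sql.num-rows`; the helper name and the endpoint URL are made up for illustration.

```python
def make_session_kwargs(url, num_rows=100_000):
    """Build kwargs for LivySession.create with a raised SQL row cap.

    Assumption: the cluster honours livy.rsc.sql.num-rows (check your
    Livy version); values are passed as strings in spark_conf.
    """
    return {
        "url": url,
        "spark_conf": {"livy.rsc.sql.num-rows": str(num_rows)},
    }


kwargs = make_session_kwargs("http://livy-server:8998")  # hypothetical endpoint
# Usage: LivySession.create(kind=SessionKind.SQL, **kwargs)
```

Keep in mind that very large caps push all rows through the Livy REST API, so for truly large results the storage-redirect approach above the cap may still be preferable.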