prestodb / RPresto

DBI-based adapter for Presto for the statistical programming language R.

Support for row data type in Presto

chalioui opened this issue

I tried to run a query on a ROW column, but it does not seem to be supported:

Error in .json.tabular.to.data.frame(data.list, r.types, timezone = timezone) :
Unsupported column type for column [2], presto type: "row", R type: "NA"

Unfortunately, we do not currently support row types. If you are using dbGetQuery, you will need to expand the fields into separate columns in your query, or use a map if the types allow it.
If you are using dplyr-style access, you can use raw SQL to do the same, something like: tbl(con, sql("select row_column.field1 as field1 from table")).

Hello. Just a note for anyone still interested: I was trying to understand what the issue with rows is, and realized that if I simply wrap the row field in an array on the server side, the parser yields what one would expect from nested fields in R, namely a (possibly) nested list once the array wrapping is undone. For example, using

tidyr::unnest(dbGetQuery(conn, "select array[nested_field] as nested_field from table"), nested_field)

instead of

dbGetQuery(conn, "select nested_field from table")

EDIT: The downside is that the nested values in the resulting R data frame are unnamed lists, even when the ROW has named fields.

I wrote #137 to enable simple ROW support, so that users are at least no longer blocked from scanning a table with ROW fields. It allows users to select a ROW field and import it in its entirety into R as a list-column of named lists.

It still doesn't allow remote manipulation of the ROW field using dplyr (for instance, select(tbl, row_column.sub_field)), because dplyr won't know how to interpret the . operator or how to update the result column names. For now, users who wish to access the subfields in dplyr need to collect the entire ROW column and do further data wrangling within R. As a stretch goal, we could try to implement tidyr's rectangling functions as remote functions to enable more complicated remote ROW field operations.
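Once the whole ROW column has been collected, the sub-field wrangling can be done locally with tidyr's rectangling tools. This is a minimal sketch that assumes tidyr >= 1.0.0, which provides unnest_wider(), and that the collected column is a list-column of named lists as described for #137. The data frame `df`, the column `row_column` and the sub-fields `field1`/`field2` are hypothetical stand-ins, not output from a real Presto query.

```r
library(tibble)
library(tidyr)

# Hypothetical stand-in for a collected result: `row_column` mimics a ROW
# column imported as a list-column of named lists (one named list per row).
df <- tibble(
  id = 1:2,
  row_column = list(
    list(field1 = 1.5, field2 = "a"),
    list(field1 = 2.5, field2 = "b")
  )
)

# Spread each named list into regular columns, one column per ROW sub-field.
# The list names (field1, field2) become the new column names.
wide <- unnest_wider(df, row_column)
print(wide)
```

Because the names are carried by each list element, unnest_wider() can recover the sub-field column names locally, which is the step dplyr cannot yet translate into remote SQL.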

I'm planning a major refactoring of the data processing code to fully address this feature request. However, I first need to refactor some fundamental code in PrestoCursor and dbFetch() (see #153) to pave the way.

Closed by #158