r-dbi / DBI

A database interface (DBI) definition for communication between R and RDBMSs

Home Page: https://dbi.r-dbi.org

Consider relaxing `fetch()` to support non-table data

tzakharko opened this issue · comments

The fetch() function currently reports an error if an implementation returns non-table data. It would be useful to make this function more generic, so that it is compatible with different kinds of implementations. My concrete use case is that I would like to implement fetch() for database resource descriptors, not all of which resolve to table data. Of course, I can always use my own my_fetch(), but I'd prefer to keep more generic names for the sake of user convenience.

Thanks. What database and what kind of non-tabular data are you considering?

The concrete use case is an R package for the AUTOTYP database (https://github.com/autotyp/autotyp-data)

It consists of multiple (possibly nested) datasets, where each dataset is associated with descriptive metadata. My current plan is to expose database descriptor objects (e.g. AUTOTYP$Register would return the descriptor for the Register dataset) and let users call fetch() on the descriptor to actually retrieve the dataset. I have explored alternative options, and I think that exposing "fetchable" descriptors is the most consistent design (the other option would be to completely decouple datasets from their descriptions, which I don't think is a good user experience overall). Extending fetch() to support non-tabular data would allow me to consistently retrieve nested data components such as lists.
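The descriptor idea can be illustrated with a minimal base R sketch. Everything in it is an assumption made for illustration: `new_descriptor()`, the `dataset_descriptor` class, the package-local generic `fetch_data()` (named this way so it does not mask `DBI::fetch()`), and the toy data are all hypothetical, and none of it is DBI or AUTOTYP code.

```r
# Hypothetical constructor: a descriptor bundles metadata, a loader
# function, and optional child descriptors.
new_descriptor <- function(description, loader, children = list()) {
  structure(
    list(description = description, loader = loader, children = children),
    class = "dataset_descriptor"
  )
}

# `$` on a descriptor navigates to a child descriptor rather than
# exposing the internal fields.
`$.dataset_descriptor` <- function(x, name) {
  .subset2(x, "children")[[name]]
}

# Package-local S3 generic (hypothetical name).
fetch_data <- function(x, ...) UseMethod("fetch_data")

# Fetching a descriptor simply runs its loader; the result may be a
# table or any other R object.
fetch_data.dataset_descriptor <- function(x, ...) {
  .subset2(x, "loader")()
}

# Toy data: a table with one list column.
markers <- data.frame(Language = c("lang1", "lang2"))
markers$MarkerExpressedCategories <- list(c("case", "number"), "tense")

demo_db <- new_descriptor(
  "Toy database root",
  loader = function() list(GrammaticalMarkers = markers),
  children = list(
    GrammaticalMarkers = new_descriptor(
      "Toy table of grammatical markers",
      loader = function() markers,
      children = list(
        MarkerExpressedCategories = new_descriptor(
          "Categories expressed by each marker",
          loader = function() markers$MarkerExpressedCategories
        )
      )
    )
  )
)

# The same call shape retrieves a table or a nested list component.
tbl <- fetch_data(demo_db$GrammaticalMarkers)
cats <- fetch_data(demo_db$GrammaticalMarkers$MarkerExpressedCategories)
```

In this sketch the table and its list column are both reached through a descriptor path followed by one fetch call, which is the symmetry the proposal is after.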

Would nested data frames (with list columns) also work? This is already supported in duckdb.

What's the reason you suggest using fetch() and not dbFetch() ?

> Would nested data frames (with list columns) also work? This is already supported in duckdb.

I suppose I could limit top-level datasets to tables and only allow fetching those. This would, however, introduce an API asymmetry (e.g. AUTOTYP$GrammaticalMarkers$MarkerExpressedCategories would be a descriptor for a list column field, but to fetch it one would need to use fetch(AUTOTYP$GrammaticalMarkers)$MarkerExpressedCategories). My concern is building an API with the least possible surprise for the target users, who often do not have a technical education.

> What's the reason you suggest using fetch() and not dbFetch() ?

Oh, I prefer fetch() mostly for aesthetic reasons. The datasets will be stored locally as part of the package data anyway. I am merely looking for a suitable generic function that would clearly communicate the intent to the user and be easy to remember. Ideally I would use data() but that is pretty much impossible to make into a generic in a sane way.

I do realize that this is not the original intent of the DBI package. That said, name collision is a big issue in R, and I would appreciate it if the community could collectively take some steps towards simplifying access to "generic-like" functions. This is why I really like the idea behind the generics package; alas, it does not currently offer a generic with the semantics I am looking for. IMO, fetch() is a strong candidate for being considered one of the "common generic" functions.

P.S. I can of course always work around the check in fetch() by walking the call tree and forcing an early return from the main implementation. It is icky though... if a better solution is possible, I would really appreciate it.

Thanks. fetch() exists as a generic in this package, but many other issues raised here seem to be out of scope for DBI.

I do appreciate that name collisions, especially for generics, are a bit of a struggle. Could be that S7 does a better job: https://rconsortium.github.io/S7/articles/packages.html .

For presenting a relational dataset to the user, the dm package could be helpful.

> I do appreciate that name collisions, especially for generics, are a bit of a struggle. Could be that S7 does a better job: https://rconsortium.github.io/S7/articles/packages.html .

I did look at S7, but I don't see how I can extend fetch() without hiding the original implementation. I don't want to create yet another name collision. I'll keep looking though.

> For presenting a relational dataset to the user, the dm package could be helpful.

Thank you for the suggestion! I am building something similar, though the data model is different.

Anyway, thank you for the discussion. My goal was to explore the possibility of making fetch() more generic. Since it seems that it is out of scope for now, I will need to find a different solution.

What do you mean by "the data model is different"? The dm package helps with arbitrary relational data models.

> What do you mean by "the data model is different"? The dm package helps with arbitrary relational data models.

We make extensive use of nested tables (columns of tables) in lieu of normalized tables and foreign keys. This is a more natural representation of our data, as there is often a clear sense of hierarchy and ownership (one table "owns" another one). There are some relational structures, though these can occur at different levels. This is not a classical relational data model.
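To make the contrast concrete, here is a small base R sketch of the two representations. The table names, column names (`LID`, `Name`, `Marker`) and values are invented for illustration and are not taken from AUTOTYP.

```r
# Normalized form: two flat tables linked by a key (toy data).
languages <- data.frame(LID = c(1L, 2L), Name = c("lang1", "lang2"))
markers <- data.frame(
  LID = c(1L, 1L, 2L),
  Marker = c("m1", "m2", "m3")
)

# Nested form: each language row "owns" its markers as a data frame
# stored in a list column.
nested <- languages
nested$Markers <- lapply(languages$LID, function(id) {
  markers[markers$LID == id, "Marker", drop = FALSE]
})

# In the nested form, the markers of one language are read directly
# from the owning row; no join is needed.
owned <- nested$Markers[[1]]

# In the normalized form, the same information requires a join on
# the foreign key.
joined <- merge(languages, markers, by = "LID")
```

The nested form keeps ownership explicit in the structure itself, at the cost of no longer being a set of flat tables that relational tools can treat uniformly.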