r-lib / vctrs

Generic programming with typed R vectors

Home page: https://vctrs.r-lib.org

feature request: `vec_proxy_na()`

khusmann opened this issue

Hello, thanks for this awesome library!

I'm working on a package for working with "missing reason" data in R.

I'm experimenting with vctrs to see if I can build a generic Result<Value, MissingReason> vector type. The goal is for it to behave transparently like the value type, but store the reasons for missing values as an attribute. In my early experiments I've come really close: https://github.com/khusmann/interlacer/blob/vctrs/R/interlaced.R

It's 99% of the way there, but it has one little quirk: I want to be able to define what counts as a "missing value" for the vector.

Right now, as you know, vec_detect_missing() and friends all use vec_proxy_equal() to determine what is considered missing. This is a problem for me because I want equality to compare missing reasons (e.g. na("Reason 1") == na("Reason 1") and na("Reason 1") != na("Reason 2")). With the current behavior, that means vec_detect_missing() only treats an element as NA when both its value and its reason are missing.
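To make the problem concrete, here is a minimal sketch. It assumes a record type built with vctrs::new_rcrd() holding a `value` field and a `reason` field (the linked implementation stores reasons as an attribute instead), and the constructor new_interlaced() and class "interlaced_demo" are hypothetical names used only for illustration. Because a record's proxy is a data frame of its fields, vec_detect_missing() flags an element only when every field is NA:

```r
library(vctrs)

# Hypothetical constructor for illustration only: a record with a `value`
# field and a `reason` field (the reason is NA when the value is present).
new_interlaced <- function(value = double(), reason = character()) {
  new_rcrd(list(value = value, reason = reason), class = "interlaced_demo")
}

x <- new_interlaced(
  value  = c(1, NA, NA),
  reason = c(NA, "Reason 1", NA)
)

# The record's proxy is a data frame, and a data frame row counts as
# missing only when all of its columns are missing.
vec_detect_missing(x)
#> [1] FALSE FALSE  TRUE
```

The second element has no value but does carry a reason, so it is not reported as missing, which is exactly the behavior described above.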

I can sort of work around this by providing custom definitions for is.na(), but this is only a surface-level fix (I want something that propagates properly into the tidyverse, e.g. tidyr::replace_na()).
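The is.na() workaround might look roughly like the sketch below, again assuming a hypothetical record type with `value` and `reason` fields (new_interlaced() and "interlaced_demo" are illustrative names). It registers an S3 method for the internal generic is.na(), so code that calls is.na() sees the new definition, while vec_detect_missing() keeps working from the equality proxy and never consults is.na():

```r
library(vctrs)

# Hypothetical record type used only for illustration.
new_interlaced <- function(value = double(), reason = character()) {
  new_rcrd(list(value = value, reason = reason), class = "interlaced_demo")
}

# Surface-level workaround: an element is missing whenever its `value`
# field is NA, regardless of whether a reason is recorded.
is.na.interlaced_demo <- function(x) {
  is.na(field(x, "value"))
}

x <- new_interlaced(value = c(1, NA), reason = c(NA, "Reason 1"))

is.na(x)               # dispatches to the method above
#> [1] FALSE  TRUE
vec_detect_missing(x)  # still based on vec_proxy_equal(), so unaffected
#> [1] FALSE FALSE
```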

I want to propose a new proxy for this: vec_proxy_na(). By default it would just call vec_proxy_equal() (so it would be 100% backwards compatible), but when overridden it would give developers like me a hook into the missingness behavior of their vctrs.
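A rough sketch of how the proposed hook could behave, written entirely in user code because vec_proxy_na() does not exist in vctrs. The generic vec_proxy_na(), its default method, the helper detect_missing_na_proxy() (a stand-in for what vec_detect_missing() might do internally), and the record type are all hypothetical:

```r
library(vctrs)

# Hypothetical generic, NOT part of vctrs. The default falls back to
# vec_proxy_equal(), so existing types keep their current behavior.
vec_proxy_na <- function(x, ...) {
  UseMethod("vec_proxy_na")
}
vec_proxy_na.default <- function(x, ...) {
  vec_proxy_equal(x)
}

# Hypothetical stand-in for how vec_detect_missing() could consult the hook.
detect_missing_na_proxy <- function(x) {
  vec_detect_missing(vec_proxy_na(x))
}

# Hypothetical record type used only for illustration.
new_interlaced <- function(value = double(), reason = character()) {
  new_rcrd(list(value = value, reason = reason), class = "interlaced_demo")
}

# Override: only the `value` field decides missingness, while equality
# (vec_proxy_equal) can still compare reasons.
vec_proxy_na.interlaced_demo <- function(x, ...) {
  field(x, "value")
}

x <- new_interlaced(value = c(1, NA, NA), reason = c(NA, "Reason 1", NA))

detect_missing_na_proxy(x)         # uses the override
#> [1] FALSE  TRUE  TRUE
detect_missing_na_proxy(c(1, NA))  # default path via vec_proxy_equal()
#> [1] FALSE  TRUE
```

Because the default simply delegates to vec_proxy_equal(), any type that does not define a method would see no change in behavior.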

That said, if there's an alternate way to hook into vec_detect_missing(), please let me know!