matthieugomez / statar

R package for data manipulation — inspired by Stata's API

Inputs to the fuzzy_join function

xyya opened this issue · comments

commented

I'm having an issue with how the exact argument works in the fuzzy_join function. For example, here are the datasets:

library(data.table)
library(statar)

x <- data.table(id=c(4,5),name=c('sa','qwr'),num=c(1,2))
y <- data.table(id=c(4,5),name=c('saq','qwrw'),num=c(2,2))

This works as expected:

test1 <- fuzzy_join(x,y,fuzzy='name')

results:

     distance id.x name.x num.x id.y name.y num.y
1: 0.08888889    4     sa     1    4    saq     2
2: 0.05833333    5    qwr     2    5   qwrw     2

This doesn't work as expected:

test2 <- fuzzy_join(x,y,exact='id',fuzzy='name')

results:

    distance id.x name.x num.x id.y name.y num.y
1: 0.3333333    4     sa     1   NA     NA    NA
2: 0.3333333    5    qwr     2   NA     NA    NA

This isn't what I expected either:

test3 <- fuzzy_join(x,y,exact=c('id','num'),fuzzy='name')

I get the following error message with this example:

Error in .subset2(x, i, exact = exact) : subscript out of bounds

It'd be much appreciated if you can shed some light on how the exact argument works. Thank you.
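
As a point of comparison, the behaviour one might expect from exact (match rows only when the exact columns are equal, then compare the fuzzy column within each match) can be sketched in base R. This is only an illustration under stated assumptions, not statar's implementation: block_fuzzy_join is a hypothetical helper, and it uses Levenshtein distance from utils::adist scaled by the longer string's length, whereas the distances printed for test1 appear consistent with a Jaro-Winkler distance, which base R does not provide.

```r
# Hypothetical helper, NOT part of statar: an inner join on the `exact`
# columns followed by a string distance on the single `fuzzy` column.
# Assumption: normalized Levenshtein distance stands in for whatever
# metric fuzzy_join actually uses.
block_fuzzy_join <- function(x, y, exact, fuzzy) {
  x <- as.data.frame(x, stringsAsFactors = FALSE)
  y <- as.data.frame(y, stringsAsFactors = FALSE)
  x$.row <- seq_len(nrow(x))  # remember which row of x each pair came from

  # Rows pair up only when every column listed in `exact` is equal.
  m <- merge(x, y, by = exact, suffixes = c(".x", ".y"))

  fx <- as.character(m[[paste0(fuzzy, ".x")]])
  fy <- as.character(m[[paste0(fuzzy, ".y")]])

  # Edit distance for each candidate pair, scaled by the longer string.
  d <- vapply(seq_along(fx),
              function(i) as.numeric(utils::adist(fx[i], fy[i])),
              numeric(1))
  m$distance <- d / pmax(nchar(fx), nchar(fy))

  # Keep the closest candidate from y for each row of x.
  m <- m[order(m$.row, m$distance), , drop = FALSE]
  m <- m[!duplicated(m$.row), , drop = FALSE]
  m$.row <- NULL
  rownames(m) <- NULL
  m
}

x <- data.frame(id = c(4, 5), name = c("sa", "qwr"), num = c(1, 2),
                stringsAsFactors = FALSE)
y <- data.frame(id = c(4, 5), name = c("saq", "qwrw"), num = c(2, 2),
                stringsAsFactors = FALSE)

res_id   <- block_fuzzy_join(x, y, exact = "id", fuzzy = "name")
res_both <- block_fuzzy_join(x, y, exact = c("id", "num"), fuzzy = "name")
```

Under this sketch, exact = "id" pairs sa with saq and qwr with qwrw, while exact = c("id", "num") keeps only the id 5 pair, because the id 4 rows have different num values in x and y.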

Thanks for the input. I've just addressed these issues in a new commit.
Could you reinstall the package with devtools::install_github("matthieugomez/statar") and tell me if you still have issues?

commented

It works well now, thank you very much!