Inputs to the fuzzy_join function

Question

xyya opened this issue 9 years ago

I'm having an issue with how the exact argument works in the fuzzy_join function. For example, here are the datasets:

library(data.table)  # provides data.table()
library(statar)      # provides fuzzy_join()
x <- data.table(id=c(4,5),name=c('sa','qwr'),num=c(1,2))
y <- data.table(id=c(4,5),name=c('saq','qwrw'),num=c(2,2))

This works as expected:

test1 <- fuzzy_join(x,y,fuzzy='name')

results:

     distance id.x name.x num.x id.y name.y num.y
1: 0.08888889    4     sa     1    4    saq     2
2: 0.05833333    5    qwr     2    5   qwrw     2

This doesn't work as expected:

test2 <- fuzzy_join(x,y,exact='id',fuzzy='name')

results:

    distance id.x name.x num.x id.y name.y num.y
1: 0.3333333    4     sa     1   NA     NA    NA
2: 0.3333333    5    qwr     2   NA     NA    NA

This isn't what I expected either:

test3 <- fuzzy_join(x,y,exact=c('id','num'),fuzzy='name')

This example fails with the following error message:

Error in .subset2(x, i, exact = exact) : subscript out of bounds

It'd be much appreciated if you could shed some light on how the exact argument works. Thank you.

Matthieu Gomez · Answer 1 · Sun May 10 2015 10:10:33 GMT+0800 (China Standard Time)

Thanks for the input. I've just addressed these issues in a new commit.
Could you reinstall the package with install_github("matthieugomez/statar") and tell me if you still have issues?
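
For readers wondering what exact combined with fuzzy is expected to mean, here is a minimal sketch under stated assumptions. It is not statar's implementation. It assumes the intended semantics are "keep only row pairs that agree on every exact column, then compare the fuzzy column within those pairs". The helper name sketch_exact_fuzzy is hypothetical, and it uses base R's merge and utils::adist (Levenshtein edit distance), so the distance values will differ from the ones fuzzy_join reports.

```r
# Hypothetical helper, for illustration only (not part of statar).
sketch_exact_fuzzy <- function(x, y, exact, fuzzy) {
  # Step 1: keep only row pairs that agree on every `exact` column.
  pairs <- merge(x, y, by = exact, suffixes = c(".x", ".y"))

  # Step 2: compare the fuzzy column within each surviving pair.
  fx <- pairs[[paste0(fuzzy, ".x")]]
  fy <- pairs[[paste0(fuzzy, ".y")]]

  # Levenshtein edit distance from base R; an assumption swapped in
  # for whatever string distance fuzzy_join actually uses.
  pairs$distance <- vapply(
    seq_len(nrow(pairs)),
    function(i) as.numeric(utils::adist(fx[i], fy[i])),
    numeric(1)
  )
  pairs
}

# Same data as in the question, as plain data frames.
x <- data.frame(id = c(4, 5), name = c("sa", "qwr"), num = c(1, 2),
                stringsAsFactors = FALSE)
y <- data.frame(id = c(4, 5), name = c("saq", "qwrw"), num = c(2, 2),
                stringsAsFactors = FALSE)

# Matching on id alone pairs both rows.
print(sketch_exact_fuzzy(x, y, exact = "id", fuzzy = "name"))

# Matching on id and num keeps only id 5, because num differs for id 4.
print(sketch_exact_fuzzy(x, y, exact = c("id", "num"), fuzzy = "name"))
```

Under these assumed semantics, the exact=c('id','num') call from the question would return a single matched pair rather than an error.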

xyya · Answer 2 · Sun May 10 2015 10:23:42 GMT+0800 (China Standard Time)

It works well now, thank you very much!