Inputs to the fuzzy_join function
xyya opened this issue
I'm having an issue with how the exact argument works in the fuzzy_join function. For example, here are the datasets:
x <- data.table(id=c(4,5),name=c('sa','qwr'),num=c(1,2))
y <- data.table(id=c(4,5),name=c('saq','qwrw'),num=c(2,2))
This works as expected:
test1 <- fuzzy_join(x,y,fuzzy='name')
results:
     distance id.x name.x num.x id.y name.y num.y
1: 0.08888889    4     sa     1    4    saq     2
2: 0.05833333    5    qwr     2    5   qwrw     2
This doesn't work as expected:
test2 <- fuzzy_join(x,y,exact='id',fuzzy='name')
results:
    distance id.x name.x num.x id.y name.y num.y
1: 0.3333333    4     sa     1   NA     NA    NA
2: 0.3333333    5    qwr     2   NA     NA    NA
This isn't what I expected either:
test3 <- fuzzy_join(x,y,exact=c('id','num'),fuzzy='name')
I get the following error message with this example:
Error in .subset2(x, i, exact = exact) : subscript out of bounds
It'd be much appreciated if you can shed some light on how the exact argument works. Thank you.
Thanks for the input. I've just addressed these issues in a new commit.
Could you reinstall the package with install_github("matthieugomez/statar")
and tell me if you still have issues?
It works well now, thank you very much!
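For readers landing on this thread, here is a rough base-R sketch of what combining exact and fuzzy is presumably meant to do: restrict candidate pairs to rows that agree on the exact columns, then compute a string distance on the fuzzy column. This is an assumption about the intended semantics, not statar's implementation. It uses merge() for the exact step and adist() (Levenshtein edit distance) for the fuzzy step, so the distances differ from the ones fuzzy_join printed above.

```r
# Same toy data as in the issue, as plain data.frames (no data.table needed).
x <- data.frame(id = c(4, 5), name = c("sa", "qwr"), num = c(1, 2),
                stringsAsFactors = FALSE)
y <- data.frame(id = c(4, 5), name = c("saq", "qwrw"), num = c(2, 2),
                stringsAsFactors = FALSE)

# Step 1 (assumed meaning of exact = 'id'): keep only pairs with equal id.
# Shared non-key columns receive .x / .y suffixes.
pairs <- merge(x, y, by = "id", suffixes = c(".x", ".y"))

# Step 2 (stand-in for fuzzy = 'name'): edit distance between the two names
# of each exactly matched pair. adist() returns a matrix, hence [1, 1].
pairs$distance <- mapply(function(a, b) adist(a, b)[1, 1],
                         pairs$name.x, pairs$name.y, USE.NAMES = FALSE)
print(pairs)

# Analogue of exact = c('id', 'num'): only the row with id 5 and num 2
# agrees on both columns, so a single pair remains.
pairs2 <- merge(x, y, by = c("id", "num"), suffixes = c(".x", ".y"))
print(pairs2)
```

Under this reading, test2 would pair each row of x with the row of y that has the same id, and test3 would keep only the row where both id and num agree. The thread does not say whether the fixed fuzzy_join keeps unmatched rows or drops them.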