remove_duplicates - error
evezeyl opened this issue
Hi,
I am exploring your package and following your example (https://elizagrames.github.io/litsearchr/litsearchr_vignette_v030.html) with my own data.
I ran into a problem when calling litsearchr::remove_duplicates(import_search, "title", "exact"),
where import_search is my data frame. It reported the following error:
Error in synthesisr::find_duplicates(data = df, match_variable = field, :
unused argument (match_variable = field)
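So I tested and slightly modified the function (I am not as advanced in R as you are), and this seemed to work as intended:
remove_duplicates2 <- function(df, field, method = c("stringdist", "fuzzdist", "exact")) {
  method <- match.arg(method)  # use a single matching method (defaults to "stringdist")
  # Flag duplicate records in the chosen column, ignoring case and punctuation
  dups <- synthesisr::find_duplicates(df[[field]], match_function = method,
                                      to_lower = TRUE, rm_punctuation = TRUE)
  # Keep one record from each group of duplicates
  df <- synthesisr::extract_unique_references(df, matches = dups)
  return(df)
}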
#called as:
remove_duplicates2(import_search, "title", "stringdist")
I also got the same result more directly with:
synthesisr::deduplicate(import_search, match_by = "title", match_function = "stringdist", to_lower = TRUE, rm_punctuation = TRUE)
but perhaps you intend to extend your function further.
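To make the "exact" option with to_lower and rm_punctuation concrete, here is a minimal base-R sketch. It assumes that exact matching means two titles are identical after lowercasing, removing punctuation and collapsing whitespace. dedupe_exact_titles is a hypothetical helper, not a litsearchr or synthesisr function, it does not reproduce the string-distance ("stringdist") matching, and the small data frame is made up purely for illustration.

```r
# Hypothetical helper (not part of litsearchr or synthesisr): a base-R sketch of
# exact title matching after lowercasing and stripping punctuation.
dedupe_exact_titles <- function(df, field = "title") {
  # Normalise the matching key: lowercase, drop punctuation, collapse whitespace
  key <- tolower(df[[field]])
  key <- gsub("[[:punct:]]", "", key)
  key <- trimws(gsub("[[:space:]]+", " ", key))
  # Keep only the first record for each normalised title
  df[!duplicated(key), , drop = FALSE]
}

# Made-up example records: the first two titles differ only in case and punctuation
refs <- data.frame(
  title = c("Bats of Europe", "bats of europe.", "Birds: A review"),
  year  = c(2001, 2001, 2010),
  stringsAsFactors = FALSE
)
unique_refs <- dedupe_exact_titles(refs)
nrow(unique_refs)  # 2
```

Because only the first occurrence of each normalised title is kept, the original capitalisation and punctuation of that record are preserved in the output.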
Anyhow, I just wanted to let you know that something did not work when I tried it with my data.
And thank you so much for developing this package; it will be really helpful.
All the best
Eve
Yeah, we slightly reworked the dedupe functions in synthesisr and I forgot to update litsearchr to reflect the new argument names. Your workaround is exactly the fix, and it is now in the current master branch of litsearchr.