RefDeduR

License: GPL-3.0

RefDeduR is an R package that supports accurate and high-throughput reference deduplication. It is especially useful for large datasets and operates on standard bibliographic information; that is, it does not require fields, such as PMID, that cannot be retrieved from a mainstream search engine.

The deduplication pipeline is modularized into three stages: finely tuned text normalization, three-step exact matching, and two-step fuzzy matching. The package features a decision-tree algorithm and accounts for preprints and conference proceedings when they co-exist with a peer-reviewed version.
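
To make the three stages concrete, here is a minimal sketch in base R. It does not use RefDeduR's own functions, and it does not reproduce the package's three-step exact matching, two-step fuzzy matching, or decision tree. The data frame `refs`, the helper `normalize_title()`, and the `threshold` value are all hypothetical and chosen only for illustration.

```r
# Illustrative only: base R, not the RefDeduR API.
# Hypothetical toy dataset with a title and a publication year.
refs <- data.frame(
  title = c(
    "Deep Learning for Microbiome Analysis.",
    "deep learning for microbiome analysis",
    "Deep learning for microbiome analyses",
    "Indoor surface sampling methods"
  ),
  year = c(2020, 2020, 2020, 2019),
  stringsAsFactors = FALSE
)

# Stage 1, text normalization: lower-case, replace punctuation with spaces,
# collapse repeated whitespace, and trim the ends.
normalize_title <- function(x) {
  x <- tolower(x)
  x <- gsub("[[:punct:]]", " ", x)
  x <- gsub("\\s+", " ", x)
  trimws(x)
}
refs$title_norm <- normalize_title(refs$title)

# Stage 2, exact matching: drop records whose normalized title and year
# repeat an earlier record.
exact_dup <- duplicated(refs[, c("title_norm", "year")])
refs_exact <- refs[!exact_dup, ]

# Stage 3, fuzzy matching: compute pairwise edit distances with adist(),
# scale them by the longer title, and flag pairs above a similarity cutoff.
dist_mat <- adist(refs_exact$title_norm)
lens <- nchar(refs_exact$title_norm)
similarity <- 1 - dist_mat / outer(lens, lens, pmax)
threshold <- 0.9  # hypothetical cutoff, not a RefDeduR default
candidates <- which(similarity >= threshold & upper.tri(similarity),
                    arr.ind = TRUE)

# Each row of `candidates` indexes a pair of rows in refs_exact to review.
print(candidates)
```

In this sketch the fuzzy stage only flags candidate pairs for review rather than removing them, since near-identical titles can still be distinct works (for example, a preprint and its peer-reviewed version).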

Author

Jiaxian Shen

Department of Civil and Environmental Engineering, Northwestern University

Installation

You can install RefDeduR from GitHub with:

# install.packages("devtools")
devtools::install_github("jxshen311/RefDeduR")

Tutorial, website and publication

Citation

If you use RefDeduR, please cite: https://www.biorxiv.org/content/10.1101/2022.09.29.510210v1

Acknowledgement

We thank Yutong Wu for the illuminating discussions about the design of RefDeduR. We are also grateful to Ruochen Jiao and Alexander G. McFarland for their help with coding.

We thank Ahmad Roaayala, Eko Purnomo, and Vectors Point from Noun Project for allowing us to use the following icons to create the logo: Research Paper, Report Paper, report, and Stats Report.
