agoutsmedt / Ranystyle

Provides tools for automated extraction, parsing, and cleaning of bibliographic references from PDF and text documents.

Home page: https://agoutsmedt.github.io/Ranystyle/

Ranystyle

Lifecycle: experimental

Ranystyle (pronounced R-anystyle) is an R package that automates the extraction, parsing, and cleaning of bibliographic references from PDF and text documents, as well as from vectors of references stored in an R object. Built on the ‘anystyle’ Ruby gem, it segments references and converts them into structured formats suitable for analysis.

You can cite this package as:

citation("Ranystyle")
#> To cite biblionetwork in publications use:
#> 
#>   Goutsmedt, Aurélien, (2023). Ranystyle: Automated Bibliographic
#>   Reference Parsing and Cleaning. R package version 0.0.999.
#>   https://github.com/agoutsmedt/Ranystyle
#> 
#> A BibTeX entry for LaTeX users is
#> 
#>   @Manual{,
#>     title = {Ranystyle: Automated Bibliographic Reference Parsing and Cleaning},
#>     author = {Aurélien Goutsmedt},
#>     year = {2024},
#>     note = {R package version 0.0.999},
#>     url = {https://github.com/agoutsmedt/Ranystyle},
#>   }
#> 
#> As Ranystyle is continually evolving, you may want to cite its version
#> number. Find it with 'help(package=Ranystyle)'.

Installation

You can install the development version of Ranystyle from GitHub with:

# install.packages("devtools")
devtools::install_github("agoutsmedt/Ranystyle")

For the functions of Ranystyle to work, you first need to install Ruby and RubyGems manually. You can then call install_anystyle() to install the anystyle and anystyle-cli Ruby gems automatically.
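Before calling install_anystyle(), it can help to check that Ruby and RubyGems are actually reachable from R. The sketch below is a minimal check that assumes the executables are called `ruby` and `gem` and are on the system PATH; `missing_ruby_tools()` is a hypothetical helper written for this illustration, not part of Ranystyle.

```r
# Hypothetical helper (not part of Ranystyle): return the names of the
# command-line tools that cannot be found on the PATH.
missing_ruby_tools <- function(tools = c("ruby", "gem")) {
  paths <- Sys.which(tools)  # Sys.which() gives "" for programs it cannot find
  tools[paths == ""]
}

missing <- missing_ruby_tools()
if (length(missing) > 0) {
  message("Install these before calling install_anystyle(): ",
          paste(missing, collapse = ", "))
} else {
  message("Ruby and RubyGems found; you can now run Ranystyle::install_anystyle().")
}
```

After install_anystyle() has run, `missing_ruby_tools("anystyle")` returning `character(0)` suggests that the anystyle command-line executable is visible from R.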

Example

Here is a basic example of how to use Ranystyle to extract and parse references from one of the example PDF documents shipped with the package:

library(Ranystyle)
# Folder containing the example documents shipped with the package
pdf_path <- system.file("extdata", package = "Ranystyle")
files <- list.files(pdf_path)

# Extract references from the first example PDF
extracted_refs <- find_ref_to_df(input = file.path(pdf_path, files[1]))
#> [1] "anystyle -f json find C:/Users/goutsmedt/AppData/Local/Temp/RtmpMxc1Bs/temp_libpath2cc835e6e90/Ranystyle/extdata/example_doc_1.pdf "
#> [1] "anystyle --overwrite -f ref find C:/Users/goutsmedt/AppData/Local/Temp/RtmpMxc1Bs/temp_libpath2cc835e6e90/Ranystyle/extdata/example_doc_1.pdf ./"

# Print the extracted references
print(extracted_refs)
#> # A tibble: 81 × 23
#>    id_doc doc         id_ref author   title  year `container-title` volume pages
#>     <int> <chr>       <chr>  <list>   <chr> <int> <chr>             <chr>  <chr>
#>  1      1 example_do… 1_1    <tibble> ECB …  2023 Financial Times   <NA>   <NA> 
#>  2      1 example_do… 1_2    <tibble> Rule…  1983 Journal of Monet… 12     101–…
#>  3      1 example_do… 1_3    <tibble> Idea…  2009 Journal of Europ… 16     701–…
#>  4      1 example_do… 1_4    <tibble> A st…  1999 Scottish Journal… 46     17–39
#>  5      1 example_do… 1_5    <tibble> Tech…  2018 International Po… 12     328–…
#>  6      1 example_do… 1_6    <tibble> Late…  2003 Journal of Machi… 3      <NA> 
#>  7      1 example_do… 1_7    <tibble> Plan…  2022 Zeitschrift Für … 32     707–…
#>  8      1 example_do… 1_8    <tibble> Repu…  2010 <NA>              <NA>   <NA> 
#>  9      1 example_do… 1_9    <tibble> The …  2016 At the Macroecon… <NA>   <NA> 
#> 10      1 example_do… 1_10   <tibble> Effe…  2017 At the EUROFI Co… <NA>   <NA> 
#> # ℹ 71 more rows
#> # ℹ 14 more variables: location <chr>, publisher <chr>, type <chr>, date <chr>,
#> #   other_date <chr>, other_title <chr>, url <chr>, issue <chr>, doi <chr>,
#> #   edition <chr>, genre <chr>, note <chr>, editor <chr>, full_ref <chr>
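Because find_ref_to_df() returns an ordinary tibble, its output can be inspected with base R before any analysis. The sketch below uses a small made-up data frame, `refs`, that only borrows the column names `id_ref`, `title`, `year` and `doi` from the printed output; its values are placeholders. On real data you would use `extracted_refs` in place of `refs`.

```r
# Illustrative stand-in for the tibble returned by find_ref_to_df():
# placeholder values, column names copied from the printed output.
refs <- data.frame(
  id_ref = c("1_1", "1_2", "1_3", "1_4"),
  title  = c("Title A", "Title B", "Title C", "Title D"),
  year   = c(2023L, 1983L, 2009L, NA),
  doi    = c(NA, "10.1000/placeholder", NA, NA)
)

# Share of references for which a value was found in each field
field_coverage <- colMeans(!is.na(refs[, c("year", "doi")]))
print(field_coverage)

# Number of references per decade (missing years are dropped by table())
decade <- (refs$year %/% 10L) * 10L
refs_per_decade <- table(decade)
print(refs_per_decade)
```

A field with low coverage may point to references that anystyle parsed only partially, which can be worth checking against the `full_ref` column.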

See vignette("using_Ranystyle") for a more in-depth presentation of the package.

Credits

anystyle was developed by Alex Fenton, Sylvester Keil, Johannes Krtek and Ilja Srna. anystyle is Copyright 2011-2018 Sylvester Keil. All rights reserved. See the anystyle licence for details.

The Ranystyle logo was generated with DALL·E.

About


License: Other