ropensci / software-review

rOpenSci Software Peer Review.

rperseus -- Get texts from the Perseus Digital Library

daranzolin opened this issue · comments

Summary

Package: rperseus
Title: Gets texts from the Perseus Digital Library
Version: 0.0.0.9000
Authors@R: person("David", "Ranzolin", email = "daranzolin@gmail.com", role = c("aut", "cre"))
Description: Perseus is a collection of primary source texts. This package helps
    you get them.
Depends:
    R (>= 3.3.0)
License: MIT + file LICENSE
Encoding: UTF-8
LazyData: true
Imports:
    stringr,
    httr,
    xml2,
    dplyr,
    purrr,
    rvest,
    tidyr,
    stats,
    tibble
RoxygenNote: 6.0.1
Suggests: knitr,
    rmarkdown,
    testthat,
    covr
VignetteBuilder: knitr
URL: https://github.com/daranzolin/rperseus
BugReports: https://github.com/daranzolin/rperseus/issues

  • URL for the package (the development repository, not a stylized html page):
    https://github.com/daranzolin/rperseus

  • Please indicate which category or categories from our package fit policies this package falls under and why (e.g., data retrieval, reproducibility). If you are unsure, we suggest you make a pre-submission inquiry:

This package falls under data retrieval. With the recent surge in text mining and sentiment analysis, the works of past authors are now at our disposal.

  • Who is the target audience?

Classicists with technical training, as well as enthusiasts interested in text mining and analysis. The data obtained from this package could serve as a springboard to linguistic and literary analysis.

None that I'm aware of.

Requirements

Confirm each of the following by checking the box. This package:

  • [x] does not violate the Terms of Service of any service it interacts with.
  • [x] has a CRAN and OSI accepted license.
  • [x] contains a README with instructions for installing the development version.
  • [x] includes documentation with examples for all functions.
  • [x] contains a vignette with examples of its essential functions and uses.
  • [x] has a test suite.
  • [x] has continuous integration, including reporting of test coverage, using services such as Travis CI, Coveralls and/or CodeCov.
  • [x] I agree to abide by rOpenSci's Code of Conduct during the review process and in maintaining my package should it be accepted.

Publication options

  • [x] Do you intend for this package to go on CRAN?
  • Do you wish to automatically submit to the Journal of Open Source Software? If so:
    • The package contains a paper.md with a high-level description in the package root or in inst/.
    • The package is deposited in a long-term repository with the DOI:
    • (Do not submit your package separately to JOSS)

Detail

  • [x] Does R CMD check (or devtools::check()) succeed? Paste and describe any errors or warnings:

  • [x] Does the package conform to rOpenSci packaging guidelines? Please describe any exceptions:

  • If this is a resubmission following rejection, please explain the change in circumstances:

  • If possible, please provide recommendations of reviewers - those with experience with similar packages and/or likely users of your package - and their GitHub user names:

Editor checks:

  • Fit: The package meets criteria for fit and overlap
  • Automated tests: Package has a testing suite and is tested via Travis-CI or another CI service.
  • License: The package has a CRAN or OSI accepted license
  • Repository: The repository link resolves correctly
  • Archive (JOSS only, may be post-review): The repository DOI resolves correctly
  • Version (JOSS only, may be post-review): Does the release version given match the GitHub release (v1.0.0)?

Editor comments

Thank you for the submission @daranzolin! This package is a good fit and I am currently seeking reviewers.

Some notes: below is output from goodpractice::gp(). It is generally clean, but a few things should be addressed:

  • running lintr::lint_package() shows some additional extraneous whitespace and commented-out code
  • You may want to use linter exclusions (See the linter README: https://github.com/jimhester/lintr) for lines with long URLs
  • The "Namespaces in Imports" field error should be addressed. It appears you import packages that you do not use directly. This should be fixed before I assign reviewers.
  • I believe the UTF-8 NOTE is handled correctly and this is just a snafu in goodpractice, but I will seek a reviewer with related expertise to ensure that all Unicode is handled properly.
── GP rperseus ────────────────────────────────────

It is good practice to

  ✖ write unit tests for all functions, and all package code in general. 95% of code lines are covered by test cases.

    R/utils.R:28:NA
    R/utils.R:39:NA

  ✖ avoid long code lines, it is bad for readability. Also, many people prefer editor windows that are about 80 characters wide.
    Try make your lines shorter than 80 characters

    R/get_perseus_text.R:17:1
    R/get_perseus_text.R:22:1
    R/utils.R:37:1
    R/utils.R:45:1
    tests/testthat/test-get-perseus-text.R:17:1
    ... and 1 more lines

  ✖ not import packages as a whole, as this can cause name clashes between the imported packages. Instead, import only the specific
    functions you need.
  ✖ fix this R CMD check NOTE: Namespaces in Imports field not imported from: ‘stats’ ‘tibble’ ‘tidyr’ All declared Imports should
    be used.
  ✖ fix this R CMD check NOTE: Note: found 61 marked UTF-8 strings
───────────────────────────────────────────────

Reviewers:
Due date:

Much obliged @noamross! I updated the package accordingly: lintr::lint_package() now returns nothing, and I removed the extraneous imports.

Reviewer: @czeildi
Reviewer: @fmichonneau
Due date: 2017-09-23

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

  • As the reviewer I confirm that there are no conflicts of interest for me to review this work (If you are unsure whether you are in conflict, please speak to your editor before starting your review).

Summary

Overall I think this is a great package as it is very focused: does one thing and does it well. I could immediately get going based on the readme. I found it great and helpful that the package also has a website generated by pkgdown. I have a few suggestions regarding style, please see my comments at the end and after the packaging guidelines section.

Documentation

The package includes all the following forms of documentation:

  • A statement of need clearly stating problems the software is designed to solve and its target audience in README
  • Installation instructions: for the development version of package and any non-standard dependencies in README
  • Vignette(s) demonstrating major functionality that runs successfully locally
  • Function Documentation: for all exported functions in R help
  • Examples for all exported functions in R Help that run successfully locally
  • Community guidelines including contribution guidelines in the README or CONTRIBUTING, and DESCRIPTION with URL, BugReports and Maintainer (which may be autogenerated via Authors@R).
For packages co-submitting to JOSS

The package contains a paper.md matching JOSS's requirements with:

  • A short summary describing the high-level functionality of the software
  • Authors: A list of authors with their affiliations
  • A statement of need clearly stating problems the software is designed to solve and its target audience.
  • References: with DOIs for all those that have one (e.g. papers, datasets, software).

Functionality

  • Installation: Installation succeeds as documented.
  • Functionality: Any functional claims of the software have been confirmed.
  • Performance: Any performance claims of the software have been confirmed.
  • Automated tests: Unit tests cover essential functions of the package
    and a reasonable range of inputs and conditions. All tests pass on the local machine.
  • Packaging guidelines: The package conforms to the rOpenSci packaging guidelines
  • The package should contain top-level documentation for ?foobar, (or ?foobar-package if there is a naming conflict). Optionally, you can use both ?foobar and ?foobar-package for the package level manual file, using the @aliases roxygen tag.
  • Make sure to list packages used for testing (testthat), and documentation (knitr, roxygen2) in your Suggests section of package dependencies.
  • Make sure you include links to websites if you wrap a web API, scrape data from a site, etc. in the Description field of your DESCRIPTION file
  • The readme should start with the package name

Final approval (post-review)

  • The author has responded to my review and made changes to my satisfaction. I recommend approving this package.

Estimated hours spent reviewing: 5


Review Comments

These are my suggestions and things to consider in order of importance:

  • consider adding a different error message when the text_urn is valid but the API somehow is not available
  • use stringr::str_replace_all instead of gsub for consistency with other used stringr functions and more compatibility with the %>%.
  • data documentation:
    • document the columns in the order they appear
    • one / missing from the documentation of language
    • ... is not needed as all columns are explicitly documented
    • based on this, consider adding your perseus_catalog generator script under raw_data so that the code for this package is fully self-contained
  • consider extracting the following part to a util function so that each function is more focused:

get_text_url <- function(text_urn, text_index) {
  BASE_URL <- "http://cts.perseids.org/api/cts"
  httr::modify_url(BASE_URL,
                   query = list(
                     request = "GetPassage",
                     urn = paste(text_urn, text_index, sep = ":")
                   ))
}

This extraction could make the following test clearer as well: "Lots Jesus talk extracted from the Gospel of Matthew"

  • do not use explicit return statements: they are usually reserved for early returns from functions
  • extend the documentation of get_perseus_text to describe what it returns: something like "the whole text in sections, with metadata in columns"
  • small typo in the description part of readme: API endpoint instead of API end point
  • consider adding internal roxygen2 documentation to get_full_text_index as it was the hardest for me to understand even after running it for multiple examples step-by-step.
  • news should contain information most relevant to the user so I would not include that unit test have been added
  • consider group_name instead of groupname as a column name in your perseus_catalog data
  • consider renaming your local 'r' variable to response or similar
  • I also have a question about row_number in the second-to-last row of get_perseus_text: which package does it come from? I believe it is not the row_number function from dplyr. Maybe consider using that for consistency.
  • further improvement idea, maybe for a later version: make it possible to directly query by label (and language) instead of urn
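
The different-error-message suggestion in the first bullet could be sketched roughly as follows (perseus_get and the message wording are hypothetical, not part of the package):

```r
# Hypothetical sketch: distinguish an unreachable API from a bad response.
perseus_get <- function(text_url) {
  resp <- tryCatch(
    httr::GET(text_url),
    error = function(e) NULL  # connection-level failure (no network, DNS, ...)
  )
  if (is.null(resp)) {
    stop("The Perseus API could not be reached; check your connection ",
         "or try again later.", call. = FALSE)
  }
  if (httr::http_error(resp)) {
    stop("The Perseus API returned status ", httr::status_code(resp),
         " for this urn.", call. = FALSE)
  }
  resp
}
```

The split matters because an invalid urn typically yields an HTTP error status, while an outage surfaces as a connection error caught by tryCatch.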

My sincere thanks to Ildikó, both for the speed and comprehensiveness of the review. I've learned a lot about best practices already!

I've tried to address each concern in the latest commits:

use stringr::str_replace_all instead of gsub for consistency with other used stringr functions and more compatibility with the %>%.

Done.

data documentation:
document the columns in the order they appear
one / missing from the documentation of language
... is not needed as all columns are explicitly documented
based on this, consider adding your perseus_catalog generator script under raw_data so that the code for this package is fully self-contained

Done.

consider extracting...to a util function so that each function is more focused (get_text_url)

Done.

do not use explicit return statements: they are usually saved for early return from functions

Done. Is it best practice to leave the bare object on the last line, or the last function call that produced it?

extend the documentation of get_perseus_text in what does it return: something like whole text in sections with metadata in columns

Done.

small typo in the description part of readme: API endpoint instead of API end point

Done.

consider adding internal roxygen2 documentation to get_full_text_index as it was the hardest for me to understand even after running it for multiple examples step-by-step.

Done. Is that also best practice? What criteria should I use to document unexported functions?

news should contain information most relevant to the user so I would not include that unit test have been added

Done.

consider group_name instead of groupname as a column name in your perseus_catalog data

Done.

consider renaming your local 'r' variable to response or similar

Done.

I also have a question: row_number in the last but one row of get_perseus_text: which package does it come from as I believe it is not the row_number function from dplyr? Maybe consider using that for consistency.

I admit I am also confused by this. I assumed it was from dplyr, but when I substituted dplyr::row_number, the function throws an error:

Error in mutate_impl(.data, dots) : 
  Evaluation error: argument "x" is missing, with no default.

A reprex:

df <- data.frame(x = runif(10))
df %>% dplyr::mutate(rn = row_number())
           x rn
1  0.4389244  1
2  0.1549839  2
3  0.8573690  3
4  0.7328806  4
5  0.9063636  5
6  0.3541390  6
7  0.5007385  7
8  0.3527843  8
9  0.7526521  9
10 0.1082413 10

Now explicitly indicating the namespace:


df <- data.frame(x = runif(10))
df %>% dplyr::mutate(rn = dplyr::row_number())
Error in mutate_impl(.data, dots) : 
  Evaluation error: argument "x" is missing, with no default.

I'm not sure what exactly is happening here.

further improvement idea, maybe for a later version: make it possible to directly query by label (and language) instead of urn

This crossed my mind too, but my initial thoughts were that trying to figure out the precise spelling
of an author and text would be less user-friendly than just scanning for and pasting the urn.

hi, just a quick comment about row_number: dplyr::row_number should be called with one of the columns of your data frame, e.g. section = dplyr::row_number(urn). My question still holds, but your solution seems perfectly fine; I just do not understand this. I will look at the other parts of your update later.
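
For illustration, the two calling conventions can be compared in a small sketch (the column name is made up):

```r
library(dplyr)

df <- data.frame(urn = c("a", "b", "c"))

# Bare row_number() works inside mutate() because dplyr supplies the context:
df %>% mutate(section = row_number())            # section is 1, 2, 3

# The fully qualified form needs a column to rank, as noted above:
df %>% mutate(section = dplyr::row_number(urn))  # also 1, 2, 3 here
```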

Hi @daranzolin , the changes look great!

You forgot to apply the groupname --> group_name change in the README and vignette as well, so these no longer run.

Answering your question about return statements: I believe it is accepted practice not to use explicit return, although I failed to quickly find a reference for this. In R, the last evaluated expression of a function is its return value, so for example the last lines of extract_text could be this:

dplyr::data_frame(text = text) %>%
    dplyr::filter(text != "")

Usually I name my return variables only if the name can add meaningful information or if there are many piped statements after each other.

Regarding documenting internal functions: I am not aware of any guideline or best practice; in my personal experience I have found it useful to document functions which are especially difficult to understand. But this is quite subjective.

I believe your code needs a few minor changes to pass packaging guidelines:

  • Packaging guidelines: The package conforms to the rOpenSci packaging guidelines
  • The package should contain top-level documentation for ?foobar, (or ?foobar-package if there is a naming conflict). Optionally, you can use both ?foobar and ?foobar-package for the package level manual file, using @Aliases roxygen tag.

You can find direction here

  • Make sure to list packages used for testing (testthat), and documentation (knitr, roxygen2) in your Suggests section of package dependencies.

I also noticed that your readme loads tidyverse and tidytext; you should list the packages you use in your readme and vignettes under the Suggests field. Reference. I believe from tidyverse you only use dplyr and purrr, which are already imported, so I advise loading these in your readme rather than the whole tidyverse.

  • Make sure you include links to websites if you wrap a web API, scrape data from a site, etc. in the Description field of your DESCRIPTION file
  • The readme should start with the package name

I also have a couple more comments:

  • I see you did go forward with my suggestion about stringr::str_replace_all; in this case you might want to follow through and use it in extract_text as well. You use the %>% a lot in your code, so I think it would make these lines slightly more readable:
text <- purrr::map(r_list$reply$passage$TEI$text$body$div,
                     ~ paste(unlist(.), collapse = " ")) %>%
    stringr::str_replace_all("\\s+", " ") %>%
    stringr::str_replace_all("\\*", "") %>%
    stringr::str_trim()
  • I see that you included dplyr:: in many places. In this case I believe you do not have to import the whole dplyr package, only the pipe from dplyr: @importFrom dplyr %>% This avoids unnecessary name collisions when someone loads your package but does not want to use dplyr. (Although dplyr is very widely used, it is generally good practice to have as few dependencies as possible when that does not hurt convenience much.) The package-level documentation file is a great place to include imports like %>% which you use in many of your functions.
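
A package-level documentation file carrying the pipe import could look roughly like this (a sketch of an R/rperseus.R; the title and description text are assumed):

```r
#' rperseus: Get texts from the Perseus Digital Library
#'
#' Fetch classical texts and their metadata from the Perseus API.
#'
#' @importFrom dplyr %>%
#' @docType package
#' @name rperseus
NULL
```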

These comments are more a question of taste so feel free to apply those you can agree with. Great work!

Woops, thanks for the reminder! Fixed.

Regarding package level documentation--is this correct?

Also addressed:

  1. Added packages to "suggests" in DESCRIPTION
  2. Included link to website in DESCRIPTION
  3. README starts with package name

Thanks for your rapid review, @czeildi, and your rapid responses @daranzolin! Just an FYI, you may want to hold off on making additional changes in response until the second reviewer gets his review in, as its possible there are related or conflicting opinions.

@fmichonneau A friendly reminder that your review is due this Saturday, 9/23.

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

  • As the reviewer I confirm that there are no conflicts of interest for me to review this work (If you are unsure whether you are in conflict, please speak to your editor before starting your review).

Documentation

The package includes all the following forms of documentation:

  • A statement of need clearly stating problems the software is designed to solve and its target audience in README

    I think the goal of the package could be made a little clearer by being more direct. Something like: "rperseus can be used to download classic texts (and in some cases their English translations) from Homer to Cicero to Boethius, to be analyzed in R." Also define, or link to, what CapiTainS is.

  • Installation instructions: for the development version of package and any non-standard dependencies in README

  • Vignette(s) demonstrating major functionality that runs successfully locally

  • Function Documentation: for all exported functions in R help

    • get_full_text_index isn't exported but is documented. I think a more
      realistic example of this function (if it's actually useful) could be
      included. Indeed, it seems to be most useful as part of a pipe (otherwise,
      how would a user construct a modified urn?), and it would be good to see
      that reflected in the example.
    • in the documentation of the package, maybe start by describing the goal of
      the package before writing about what it contains. You could also consider
      pointing to the vignette to let users know how to get started.
    • I think the dataset (perseus_catalog) is documented twice, once in
      R/data.R and once in R/rperseus.R.
  • Examples for all exported functions in R Help that run successfully locally

    • It seems it would be useful to show in the examples of get_perseus_text()
      how to provide URNs from the dataset.
  • Community guidelines including contribution guidelines in the README or CONTRIBUTING, and DESCRIPTION with URL, BugReports and Maintainer (which may be autogenerated via Authors@R).

    • I don't see a CONTRIBUTING file or mention on how to contribute in the
      README.
For packages co-submitting to JOSS

The package contains a paper.md matching JOSS's requirements with:

  • A short summary describing the high-level functionality of the software
  • Authors: A list of authors with their affiliations
  • A statement of need clearly stating problems the software is designed to solve and its target audience.
  • References: with DOIs for all those that have one (e.g. papers, datasets, software).

Functionality

  • Installation: Installation succeeds as documented.
  • Functionality: Any functional claims of the software have been confirmed.
  • Performance: Any performance claims of the software have been confirmed.
  • Automated tests: Unit tests cover essential functions of the package
    and a reasonable range of inputs and conditions. All tests pass on the local machine.
  • Packaging guidelines: The package conforms to the rOpenSci packaging guidelines
    • add a link to CONDUCT.md in the README file

Final approval (post-review)

  • The author has responded to my review and made changes to my satisfaction. I recommend approving this package.

Estimated hours spent reviewing: 6


Review Comments

This is a simple and elegant package to retrieve classical full texts. The package is straightforward to use, and the vignette makes it easy to get started. I have a few comments in the checklist above and below. I also found that R CMD check returned a NOTE, which I addressed in this PR.

  • I always find it useful to have a link to the API's URL in the README, and
    potentially to its documentation.
  • How often is the catalog updated? Is it a concern that the package could get
    out of date and not include works that are being added to the catalog? Would
    it make sense to provide a function to generate the catalog locally?
  • It would be good to validate that the user only provides a single text_urn
    to get_perseus_text() to provide an informative error message.
  • The vignette indicates how to use map_df to fetch multiple texts at once,
    maybe add this information also in the README so it's obvious to users how to
    achieve this easily.
  • What is the difference between the "Perseus Collection" and the "Perseus
    Catalog"? Why do some texts listed in the collection not appear in the
    catalog? For instance, it seems that all the texts from the 19th century
    collection are not in the catalog. It might not be relevant for the package
    per se but it might be informative to the users who might check the web
    interface first to find a work, only to realize they actually can't get it
    from the API.
  • Running the checks of the package on r-hub, I see:
    • NOTE: checking data for non-ASCII characters Note: found 61 marked UTF-8 strings (both on Windows and on R-devel on Linux)
    • Error: processing vignette 'rperseus-vignette.Rmd' failed with diagnostics: could not find function "xpath_class" Execution halted (on R-devel on Linux)
      (but I haven't had a chance to investigate the origin of these).
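
The single-urn validation suggested above might look something like this at the top of get_perseus_text() (a sketch; the message wording is mine):

```r
# Hypothetical validation sketch for get_perseus_text(text_urn, ...):
if (!is.character(text_urn) || length(text_urn) != 1) {
  stop("`text_urn` must be a single character string. To fetch several ",
       "texts at once, map over urns, e.g. with purrr::map_df().",
       call. = FALSE)
}
```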

Thanks for your review, @fmichonneau! @daranzolin, let us know when you have made changes in response to this and @czeildi's remaining comments.

Thanks @fmichonneau! Comments below.

I always find it useful to have a link to the API's URL in the README, and
potentially to its documentation.

Done.

How often is the catalog updated? Is it a concern that the package could get
out of date and not include works that are being added to the catalog? Would
it make sense to provide a function to generate the catalog locally?

I've contacted Tufts for clarification. In an earlier version of the package, get_perseus_catalog would generate the catalog locally, but I thought it would simplify things to just include the catalog on load.

It would be good to validate that the user only provides a single text_urn
to get_perseus_text() to provide an informative error message.

Done.

The vignette indicates how to use map_df to fetch multiple texts at once,
maybe add this information also in the README so it's obvious to users how to
achieve this easily.

I have an example of this under the tidyverse and tidytext heading.

What is the difference between the "Perseus Collection" and the "Perseus
Catalog"? Why do some texts listed in the collection not appear in the
catalog? For instance, it seems that all the texts from the 19th century
collection are not in the catalog. It might not be relevant for the package
per se but it might be informative to the users who might check the web
interface first to find a work, only to realize they actually can't get it
from the API.

The "Perseus Catalog" is admittedly my own moniker. And the catalog XML I parsed for this package only included the Greek and Roman texts, but perhaps I should look further into incorporating the other literature. If the reviewers consider it worthwhile, I'll work on it next week.

As for the r-hub checks, I will need further guidance on how to address them.

Hi @daranzolin, sorry for taking so long to get back to you. Thank you for addressing my comments.

It looks like you missed the comments I included under the checkboxes that I didn't check. Please address them, and I'll look more into the r-hub check results after that.

EDIT: Looking more closely at the code, it actually looks like you already addressed some of them.

Let me just jump in to make sure we're on the same page. @daranzolin, do you think at this point you've addressed everything from @czeildi's and @fmichonneau's initial reviews? Let us know. If so, @fmichonneau and @czeildi, let us know if you agree by commenting below and checking off the remaining boxes on your reviews. If not, let us know what is left from your perspective.

Hi everyone, @noamross @czeildi @fmichonneau

Apologies for the delayed response--I see now that I have not yet addressed all of @fmichonneau's initial review. I'll have responses/updates later this week.

Thanks for checking in!

Hi everyone, @noamross @czeildi @fmichonneau

Okay, I think I caught up where we left off:

A statement of need clearly stating problems the software is designed to solve and its target audience in README

Updated README with a new goal: The goal of rperseus is to furnish classicists, textual critics, and R enthusiasts with texts from the Classical World. While the English translations of most texts are available through gutenbergr, rperseus returns these works in their original language--Greek and Latin.

in the documentation of the package, maybe start by describing the goal of
the package before writing about what it contains. You could also consider
pointing to the vignette to let users know how to get started.

I now link to the vignette in the README.

I think the dataset (perseus_catalog) is documented twice, once in
R/data.R and once in R/rperseus.R.

Removed from R/rperseus.R. It is now only documented once.

I also heard back from Tufts regarding the updating of the catalog. The Managing Editor informed me that there is no regular update schedule at this time.

Hi @daranzolin and @noamross ,

@daranzolin responded to my comments, and I recommend approving this package.

I have one minor comment: although the last chunk of the vignette is set to eval = FALSE, it would be nice to change map_df to purrr::map_df, or to load purrr, so that it can be run.

Great work!

One quick editor's comment, regarding the lazy-loaded catalog: There's sometimes reason to load stuff with the package for workflow reasons, but it's pretty rare because adding to the namespace is unexpected behavior for loading. I don't think this case merits going beyond the conventional ways of making data available: Via either data("perseus_catalog") or perseus_catalog() (the latter can simply take no arguments and return the data frame, or if you want to get fancy take some kind of filter or search as an argument).
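
The perseus_catalog() accessor mentioned here could be as simple as the following sketch (it assumes the catalog ships as an internal data frame, here named catalog, and the optional language filter is illustrative):

```r
#' Return the Perseus catalog, optionally filtered by language.
#'
#' Sketch of the no-argument accessor suggested above; `catalog` stands for
#' the data frame stored internally with the package (e.g. R/sysdata.rda).
perseus_catalog <- function(language = NULL) {
  out <- catalog
  if (!is.null(language)) {
    out <- out[out$language %in% language, ]
  }
  out
}
```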

Thanks @daranzolin for addressing my comments. It looks great! 👌

I submitted a minor PR to add links in the documentation across the functions and fix little issues associated with the CRAN checks.

There are 2 little things that remain.

First I noticed is that get_full_text_index is documented but not exported. It might be confusing for users to see this function in the help, but not being able to call it directly. I would advise to either:

  • comment out the documentation (it is useful to have the function documented in the code but not necessarily to be available publicly)
  • export the function (if you think users might use it)
  • add a note in the help that this function is not exported and therefore not intended to be used outside of the package.

Second, you should include mention of the code of conduct in the README. Typically, rOpenSci packages put this information and other details under a "Meta" header (e.g., https://github.com/ropensci/geojsonio#meta)

Once these are taken care of, I'd be happy to recommend the inclusion of this package in rOpenSci 🎉

Thanks everyone! I'm giddy with excitement.

@czeildi -- added library(purrr) to README chunk.

@fmichonneau -- I merged pull request #5 "Small fixes", commented out the roxygen for get_full_text_index, and added a Meta section to the README.

@noamross -- Hadley recommends setting LazyData: true in R Packages. Is this an instance where it should be false? Happy to make the change, but can you point me to an example of how to make the catalog available via data or perseus_catalog?

Great. Sorry, @daranzolin--my brain was pulling up advice from old standards; you're right. I've been typing data('something') for so long that I forget stuff is lazy-loaded anyway.

@fmichonneau, please check off the "responded and approved" box on your original review if everything looks on the up-and-up.

I just checked the box. Great work @daranzolin!

🎉 Approved! Thanks @daranzolin for submitting, and @fmichonneau and @czeildi for your reviews!

To-dos:

  • Transfer the repo to the rOpenSci organization under "Settings" in your repo. I have invited you to a team that should allow you to do so. You'll be made admin once you do.
  • Add the rOpenSci footer to the bottom of your README
[![ropensci_footer](https://ropensci.org/public_images/ropensci_footer.png)](https://ropensci.org)
  • Add the "Peer-reviewed" status badge to your README:
[![](https://badges.ropensci.org/145_status.svg)](https://github.com/ropensci/onboarding/issues/145)
  • Fix any links in badges for CI and coverage to point to the ropensci URL. (We'll turn on the services on our end as needed)

Welcome aboard! We'd also love a blog post about your package, either a short-form intro to it (https://ropensci.org/tech-notes/) or a long-form post with more narrative about its development (https://ropensci.org/blog/). I for one am very interested in the types of analyses this package facilitates. Let us know if you are interested, and if so @stefaniebutland will be in touch about it.

Thanks @noamross ! This has been a wonderful, illuminating process, and I'm truly humbled by the attention of your reviewers.

I updated everything but have one lingering concern: Travis isn't rebuilding from the latest commits. Do you know why that is?

Hi @daranzolin. Are you leaning toward writing a blog post about rperseus?

We'd also love a blog post about your package, either a short-form intro to it (https://ropensci.org/tech-notes/) or a long-form post with more narrative about its development (https://ropensci.org/blog/). I for one am very interested in the types of analyses this package facilitates.

I'm happy to answer any questions to help you decide

Hi @stefaniebutland!

Yes, I am willing to do a blog. Disclaimer: I am not a scientist, and my use cases for this package are for literary analysis.

I am also attempting to add several new features, such as a morphological parser and a stop-words dictionary. Perhaps postponing the blog would make it more interesting down the road?

Let me know what you think.

Great to hear that @daranzolin!

My strong bias would be to put something out sooner rather than later. The end part of the post could briefly outline your specific plans for new features (and maybe even point to a couple of issues that might attract new contributors?).

This is not a problem:

Disclaimer: I am not a scientist, and my use cases for this package are for literary analysis.

I don't feel fully qualified to point out blog examples that directly relate to your niche, but these might be relevant: https://ropensci.org/tags/text-analysis/ and several package / blog post contributors like Thomas Leeper and Lincoln Mullen are not "scientists" either.

Do you think a post in late November or early December would work for you?
Technical details: https://github.com/ropensci/roweb2#contributing-a-blog-post

@stefaniebutland late November is doable. I'll contact you if I encounter any roadblocks along the way.