ropensci / software-review

rOpenSci Software Peer Review.

frictionless: Read and Write Frictionless Data Packages

peterdesmet opened this issue · comments

Date accepted: 2022-02-10
Submitting Author Name: Peter Desmet
Submitting Author Github Handle: @peterdesmet
Other Package Authors Github handles: @damianooldoni
Repository: https://github.com/frictionlessdata/frictionless-r
Version submitted: 0.9.0
Submission type: Standard

Editor: @melvidoni
Reviewers: @zambujo, @beatrizmilz

Due date for @zambujo: 2022-02-06

Due date for @beatrizmilz: 2022-02-09
Archive: TBD
Version accepted: TBD
Language: en


  • Paste the full DESCRIPTION file inside a code block below:
Package: frictionless
Title: Read and Write Frictionless Data Packages
Version: 0.9.0.9000
Authors@R: c(
    person("Peter", "Desmet", , "peter.desmet@inbo.be", role = c("aut", "cre"),
           comment = c(ORCID = "0000-0002-8442-8025")),
    person("Damiano", "Oldoni", , "damiano.oldoni@inbo.be", role = "aut",
           comment = c(ORCID = "0000-0003-3445-7562")),
    person("Research Institute for Nature and Forest (INBO)", , , 
           "info@inbo.be", role = c("cph"))
  )
Description: Read and write Frictionless Data Packages. A Data Package 
  (<https://specs.frictionlessdata.io/data-package/>) is a simple container 
  format and standard to describe and package a collection of (tabular) data. 
  It is typically used to publish FAIR and open datasets.
License: MIT + file LICENSE
URL: https://github.com/frictionlessdata/frictionless-r,
    https://frictionlessdata.github.io/frictionless-r/
BugReports: https://github.com/frictionlessdata/frictionless-r/issues
Imports:
    assertthat,
    dplyr,
    glue,
    httr,
    jsonlite,
    purrr,
    readr (>= 2.1.0),
    stringr
Suggests:
    knitr,
    hms,
    lubridate,
    testthat (>= 3.0.0),
    rmarkdown
Config/testthat/edition: 3
Encoding: UTF-8
LazyData: true
Roxygen: list(markdown = TRUE)
RoxygenNote: 7.1.2
VignetteBuilder: knitr

Scope

  • Please indicate which category or categories from our package fit policies this package falls under: (Please check an appropriate box below. If you are unsure, we suggest you make a pre-submission inquiry.):

    • data retrieval
    • data extraction
    • data munging
    • data deposition
    • data validation and testing (listed as category, but not in issue template)
    • workflow automation
    • version control
    • citation management and bibliometrics
    • scientific software wrappers
    • field and lab reproducibility tools
    • database software bindings
    • geospatial data
    • text analysis
  • Explain how and why the package falls under these categories (briefly, 1-2 sentences):

frictionless allows users to read and write Frictionless Data Packages, an open and general-purpose standard to structure and describe (tabular) datasets, typically used to publish FAIR datasets. The package allows users to read (local and remote) Data Packages (data retrieval), load their data resources into data frames (data extraction), return errors if the Data Package is malformed (data validation and testing), add data frames as new resources (data munging) and write Data Packages back to disk (data deposition).

  • Who is the target audience and what are scientific applications of this package?

Anyone who wants to read or create datasets structured as Frictionless Data Packages. The community is referred to as the Frictionless Data community and typically includes researchers, data scientists and data engineers, often interested in (publishing) open data.

  • Are there other R packages that accomplish the same thing? If so, how does yours differ or meet our criteria for best-in-category?

Yes, datapackage.r: it has an object-oriented design (using a Package class) and offers validation. frictionless, on the other hand, allows users to quickly read and write Data Package data to and from R data frames, getting out of your way for the rest of your analysis. It is designed to be lightweight, follows tidyverse principles and supports piping. The main functionality (reading data into a data frame, adding a data frame as a resource to a package, writing a Data Package to disk) is offered as functions, rather than the class properties in datapackage.r.

Not applicable

  • If you made a pre-submission inquiry, please paste the link to the corresponding issue, forum post, or other discussion, or @tag the editor you contacted.

Not applicable

Technical checks

Confirm each of the following by checking the box.

Note that the link to the guide for authors above (in the issue template) returns a 404. It should be https://devguide.ropensci.org/authors-guide.html. I tried to use pkgcheck, but I got the error: package ‘pkgcheck’ is not available for this version of R

This package:

Publication options

  • Do you intend for this package to go on CRAN?

  • Do you intend for this package to go on Bioconductor?

  • Do you wish to submit an Applications Article about your package to Methods in Ecology and Evolution? If so:

MEE Options
  • The package is novel and will be of interest to the broad readership of the journal.
  • The manuscript describing the package is no longer than 3000 words.
  • You intend to archive the code for the package in a long-term repository which meets the requirements of the journal (see MEE's Policy on Publishing Code)
  • (Scope: Do consider MEE's Aims and Scope for your manuscript. We make no guarantee that your manuscript will be within MEE scope.)
  • (Although not required, we strongly recommend having a full manuscript prepared when you submit here.)
  • (Please do not submit your package separately to Methods in Ecology and Evolution)

Code of conduct

Note that this package falls under the Frictionless Data Code of Conduct.

Missing values: author1, repourl, submission-type, language

@ropensci-review-bot I have now included the missing <!--> tags in the issue body.

Thanks, about to send the query.

🚀

Editor check started

👋

Checks for frictionless (v0.9.0.9000)

git hash: dc9daa6a

  • ✔️ Package name is available
  • ✔️ has a 'CITATION' file.
  • ✖️ does not have a 'codemeta.json' file.
  • ✔️ has a 'contributing' file.
  • ✔️ uses 'roxygen2'.
  • ✔️ 'DESCRIPTION' has a URL field.
  • ✔️ 'DESCRIPTION' has a BugReports field.
  • ✔️ Package has at least one HTML vignette
  • ✔️ All functions have examples.
  • ✔️ Package has continuous integration checks.
  • ✔️ Package coverage is 100%.
  • ✔️ R CMD check found no errors.
  • ✔️ R CMD check found no warnings.

Important: All failing checks above must be addressed prior to proceeding

Package License: MIT + file LICENSE


1. Statistical Properties

This package features some noteworthy statistical properties which may need to be clarified by a handling editor prior to progressing.

Details of statistical properties (click to open)

The package has:

  • code in R (100% in 15 files) and
  • 2 authors
  • 1 vignette
  • 1 internal data file
  • 8 imported packages
  • 8 exported functions (median 18 lines of code)
  • 26 non-exported functions in R (median 24 lines of code)

Statistical properties of package structure as distributional percentiles in relation to all current CRAN packages
The following terminology is used:

  • loc = "Lines of Code"
  • fn = "function"
  • exp/not_exp = exported / not exported

The final measure (fn_call_network_size) is the total number of calls between functions (in R), or more abstract relationships between code objects in other languages. Values are flagged as "noteworthy" when they lie in the upper or lower 5th percentile.

measure value percentile noteworthy
files_R 15 73.0
files_vignettes 1 68.4
files_tests 10 90.7
loc_R 528 49.2
loc_vignettes 119 31.1
loc_tests 1192 89.2
num_vignettes 1 64.8
data_size_total 1364 61.3
data_size_median 1364 66.0
n_fns_r 34 44.1
n_fns_r_exported 8 38.3
n_fns_r_not_exported 26 48.5
n_fns_per_file_r 1 21.7
num_params_per_fn 2 11.9
loc_per_fn_r 20 59.8
loc_per_fn_r_exp 18 42.5
loc_per_fn_r_not_exp 24 70.4
rel_whitespace_R 13 42.4
rel_whitespace_vignettes 41 38.1
rel_whitespace_tests 13 80.4
doclines_per_fn_exp 30 34.5
doclines_per_fn_not_exp 0 0.0 TRUE
fn_call_network_size 46 64.6

1a. Network visualisation

Click to see the interactive network visualisation of calls between objects in package


2. goodpractice and other checks

Details of goodpractice and other checks (click to open)

3a. Continuous Integration Badges

R-CMD-check

GitHub Workflow Results

name conclusion sha date
pages build and deployment success 96a3d1 2022-01-03
pkgdown success dc9daa 2022-01-03
R-CMD-check success dc9daa 2022-01-03
test-coverage success dc9daa 2022-01-03

3b. goodpractice results

R CMD check with rcmdcheck

rcmdcheck found no errors, warnings, or notes

Test coverage with covr

Package coverage: 100

Cyclocomplexity with cyclocomp

No functions have cyclocomplexity >= 15

Static code analyses with lintr

lintr found the following 12 potential issues:

message number of times
Lines should not be more than 80 characters. 12


Package Versions

package version
pkgstats 0.0.3.59
pkgcheck 0.0.2.205


Editor-in-Chief Instructions:

Processing may not proceed until the items marked with ✖️ have been resolved.

@lwinfree just fyi, here's the (start of the) rOpenSci software peer review thread for the "frictionless" R package.
Folks, Lilly Winfree is Product Manager @ Frictionless Data.

A codemeta.json file has now been added.

Thanks, about to send the query.

🚀

Editor check started

👋

Checks for frictionless (v0.9.0.9000)

git hash: 794ca7f6

  • ✔️ Package name is available
  • ✔️ has a 'CITATION' file.
  • ✔️ has a 'codemeta.json' file.
  • ✔️ has a 'contributing' file.
  • ✔️ uses 'roxygen2'.
  • ✔️ 'DESCRIPTION' has a URL field.
  • ✔️ 'DESCRIPTION' has a BugReports field.
  • ✔️ Package has at least one HTML vignette
  • ✔️ All functions have examples.
  • ✔️ Package has continuous integration checks.
  • ✔️ Package coverage is 100%.
  • ✔️ R CMD check found no errors.
  • ✔️ R CMD check found no warnings.

Package License: MIT + file LICENSE


1. Statistical Properties

This package features some noteworthy statistical properties which may need to be clarified by a handling editor prior to progressing.

Details of statistical properties (click to open)

The package has:

  • code in R (100% in 15 files) and
  • 2 authors
  • 1 vignette
  • 1 internal data file
  • 8 imported packages
  • 8 exported functions (median 18 lines of code)
  • 26 non-exported functions in R (median 24 lines of code)

Statistical properties of package structure as distributional percentiles in relation to all current CRAN packages
The following terminology is used:

  • loc = "Lines of Code"
  • fn = "function"
  • exp/not_exp = exported / not exported

The final measure (fn_call_network_size) is the total number of calls between functions (in R), or more abstract relationships between code objects in other languages. Values are flagged as "noteworthy" when they lie in the upper or lower 5th percentile.

measure value percentile noteworthy
files_R 15 73.0
files_vignettes 1 68.4
files_tests 10 90.7
loc_R 528 49.2
loc_vignettes 119 31.1
loc_tests 1192 89.2
num_vignettes 1 64.8
data_size_total 1364 61.3
data_size_median 1364 66.0
n_fns_r 34 44.1
n_fns_r_exported 8 38.3
n_fns_r_not_exported 26 48.5
n_fns_per_file_r 1 21.7
num_params_per_fn 2 11.9
loc_per_fn_r 20 59.8
loc_per_fn_r_exp 18 42.5
loc_per_fn_r_not_exp 24 70.4
rel_whitespace_R 13 42.4
rel_whitespace_vignettes 41 38.1
rel_whitespace_tests 13 80.4
doclines_per_fn_exp 30 34.5
doclines_per_fn_not_exp 0 0.0 TRUE
fn_call_network_size 46 64.6

1a. Network visualisation

Click to see the interactive network visualisation of calls between objects in package


2. goodpractice and other checks

Details of goodpractice and other checks (click to open)

3a. Continuous Integration Badges

R-CMD-check

GitHub Workflow Results

name conclusion sha date
pages build and deployment success 48cca6 2022-01-04
pkgdown success 794ca7 2022-01-04
R-CMD-check success 794ca7 2022-01-04
test-coverage success 794ca7 2022-01-04

3b. goodpractice results

R CMD check with rcmdcheck

rcmdcheck found no errors, warnings, or notes

Test coverage with covr

Package coverage: 100

Cyclocomplexity with cyclocomp

No functions have cyclocomplexity >= 15

Static code analyses with lintr

lintr found the following 12 potential issues:

message number of times
Lines should not be more than 80 characters. 12


Package Versions

package version
pkgstats 0.0.3.72
pkgcheck 0.0.2.205


Editor-in-Chief Instructions:

This package is in top shape and may be passed on to a handling editor

@peterdesmet Thanks for the submission - an editor will be assigned as soon as possible, but it may take a few days.

Assigned! @melvidoni is now the editor

Hello @peterdesmet, I'll be the handling editor. I'll start looking for reviewers, and let you know once they are assigned. Please bear with me for a bit.

Hi @melvidoni, 2 questions:

  1. I'm about to merge a PR with updated functionality into the package. Would it be ok if the reviewers review the resulting 0.10.0 version?
  2. Can I add the peer review badge?

Hello @peterdesmet . 1) They will. None of those contacted replied yet, so they will review the latest once they accept. 2) Not yet, once the reviewing process has finished.

@zambujo added to the reviewers list. Review due date is 2022-02-06. Thanks @zambujo for accepting to review! Please refer to our reviewer guide.

@zambujo: If you haven't done so, please fill this form for us to update our reviewers records.

Hello @peterdesmet I'm still searching for another reviewer. The reviewing deadline for @zambujo is 2022-02-06

@melvidoni @zambujo Thanks!

Version 0.10.0 of the package has just been released, which would be the preferred version for review.

Yes, that would be the version to review. Could you please make the link clearer and/or merge to master?

@melvidoni version 0.10.0 has been merged to the default branch (main), but that branch is also used for further development.

To install 0.10.0 specifically (recommended):

devtools::install_github("frictionlessdata/frictionless-r@v0.10.0")

To install the latest development version (0.10.0.9000):

devtools::install_github("frictionlessdata/frictionless-r")

@beatrizmilz added to the reviewers list. Review due date is 2022-02-09. Thanks @beatrizmilz for accepting to review! Please refer to our reviewer guide.

@beatrizmilz: If you haven't done so, please fill this form for us to update our reviewers records.

Hi! Thanks for inviting. Here is my review:

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

  • Briefly describe any working relationship you have (had) with the package authors.
  • As the reviewer I confirm that there are no conflicts of interest for me to review this work (if you are unsure whether you are in conflict, please speak to your editor before starting your review).

Documentation

The package includes all the following forms of documentation:

  • A statement of need: clearly stating problems the software is designed to solve and its target audience in README
  • Installation instructions: for the development version of package and any non-standard dependencies in README
  • Vignette(s): demonstrating major functionality that runs successfully locally
  • Function Documentation: for all exported functions
  • Examples: (that run successfully locally) for all exported functions
  • Community guidelines: including contribution guidelines in the README or CONTRIBUTING, and DESCRIPTION with URL, BugReports and Maintainer (which may be autogenerated via Authors@R).

Functionality

  • Installation: Installation succeeds as documented.
  • Functionality: Any functional claims of the software been confirmed.
  • Performance: Any performance claims of the software been confirmed.
  • Automated tests: Unit tests cover essential functions of the package and a reasonable range of inputs and conditions. All tests pass on the local machine.
  • Packaging guidelines: The package conforms to the rOpenSci packaging guidelines.

Estimated hours spent reviewing: 3h

  • Should the author(s) deem it appropriate, I agree to be acknowledged as a package reviewer ("rev" role) in the package DESCRIPTION file.

Review Comments

Congratulations to the authors. Everything worked well. The comments I wrote are mostly ideas that I think could improve the user experience, but the package looks great as it is. Nice testing suite!

README

  • In the README, the authors use an abbreviation (FAIR). It might be good to clarify what it stands for (Findability, Accessibility, Interoperability, and Reusability) for readers who are not familiar with the concept of FAIR. Also, what do the authors think about linking to a text that explains it?

Vignette

  • In the vignette, the pipe from magrittr is used but it is not loaded. So if someone tries to follow the code in the vignette, they get a could not find function "%>%" error.

  • Just for the record: When executing library(frictionless) there is a message saying that The following object is masked from ‘package:usethis’: create_package: https://usethis.r-lib.org/reference/create_package.html

  • There are some examples that access list elements directly. What do the authors think about using functions instead, to be more standardized? Example: create a function resources(package) to use instead of package$resource_names.

  • Adding descriptions to the schema does not seem trivial. There is an example with the purrr package, but the example might not be simple to understand for someone who is not used to purrr.

I'm talking about this piece of code:

iris_schema <- create_schema(iris)

# Remove description for first field
iris_schema$fields[[1]]$description <- NULL

# Set descriptions for all fields
descriptions <- c(
  "Sepal length in cm.",
  "Sepal width in cm.",
  "Petal length in cm.",
  "Petal width in cm.",
  "Iris species."
)
iris_schema$fields <- purrr::imap(
  iris_schema$fields,
  ~ c(.x, description = descriptions[.y])
)

Do the authors think it is possible to create a function to add descriptions to the schema, one that is used in a similar way to the other functions of the package? Example of the idea:

iris_schema <- create_schema(iris) |>
  add_description(
    c(
      "Sepal length in cm.",
      "Sepal width in cm.",
      "Petal length in cm.",
      "Petal width in cm.",
      "Iris species."
    )
  )
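
A minimal sketch of what such a helper could look like in base R, assuming a schema is a plain list with a `fields` element (one list per field), as in the purrr example above. `add_description()` and `toy_schema` are hypothetical names used for illustration only; neither is part of frictionless, and `toy_schema` merely stands in for the output of create_schema().

```r
# Hypothetical helper, not part of frictionless:
# set one description per field, replacing any existing description.
add_description <- function(schema, descriptions) {
  stopifnot(
    is.list(schema$fields),
    length(descriptions) == length(schema$fields)
  )
  schema$fields <- Map(
    function(field, description) {
      field$description <- description
      field
    },
    schema$fields,
    descriptions
  )
  schema
}

# Stand-in for the output of create_schema(): a list with a `fields` element.
toy_schema <- list(
  fields = list(
    list(name = "Sepal.Length", type = "number"),
    list(name = "Species", type = "string")
  )
)

toy_schema <- add_description(
  toy_schema,
  c("Sepal length in cm.", "Iris species.")
)
toy_schema$fields[[1]]$description
```

Because the helper takes the schema as its first argument and returns the modified schema, it could be chained with a pipe in the same way as the other functions of the package.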
  • The vignette says that in order to validate the Data Package, users need to use Python. Are there any plans to implement that functionality in R as well?

Validate your Data Package before depositing. You can do this in Python with the Frictionless Framework using frictionless validate datapackage.json.

  • In the vignette, there are some instructions to zip the CSV files to reduce their size:

Zip the individual csv files (and update their paths in datapackage.json) to reduce size, not the entire Data Package. That way, users still have direct access to the datapackage.json file. See this example.

Do the authors think it is a good idea to add an argument to write_package() to write compressed CSVs, to facilitate this step? Something like:

write_package(my_package, "my_directory", compress = TRUE) 

Since the function uses readr::write_csv(), if compress = TRUE it could add .gz at the end of the filepath and readr would zip it on the fly.

Example:

readr::write_csv(mtcars, "mtcars.csv")
file.size("mtcars.csv")
#> [1] 1281

readr::write_csv(mtcars, "mtcars_zipped.csv.gz")
file.size("mtcars_zipped.csv.gz")
#> [1] 558
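
As a complement, here is a small base R sketch (so it does not depend on readr) showing that a gzip-compressed CSV can be read back without a manual decompression step. The file name and the temporary directory are assumptions chosen for illustration.

```r
# Write mtcars to a gzip-compressed CSV in a temporary directory.
path <- file.path(tempdir(), "mtcars.csv.gz")
con <- gzfile(path, open = "wt")
write.csv(mtcars, con, row.names = FALSE)
close(con)

# Read it back through a gzfile() connection; no manual unzip is needed.
roundtrip <- read.csv(gzfile(path))
dim(roundtrip)
```

The row and column counts of `roundtrip` should match those of `mtcars`, which is the property a `compress = TRUE` option would need to preserve.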

I hope the review is useful. Again, congratulations to the authors.

Full report is here: https://github.com/beatrizmilz/ropensci_reviews/blob/main/frictionless/review.md

Thanks @beatrizmilz for your review! My feedback:

  • README: FAIR -> Will link to https://www.go-fair.org/fair-principles/ and not expand in text, so sentence remains readable.
  • Vignette: Right, %>% was loaded via dplyr in the hidden first code chunk of the vignette, but that is not visible to users. Will adapt. I'm opting to load it via dplyr, since that package is already a dependency.
  • Masking of usethis::create_package(): Yeah, it is a bit unfortunate that the term package is used for different things in Frictionless vs R (as explained at the start of the vignette). Luckily in R it is often referred to as pkg in function names, reducing masking. In the Frictionless Community "Data Package" does seem to be consistently referred to as package in implementations in other languages, not dp, seldom as datapackage, which is why I adopted that term for frictionless functions and parameters. I think alternatives like create_datapackage(), create_data_package(), create_dataset() are less desirable, but 👉 feedback welcome 👈. Was the term package confusing in any way?
  • Add resources(package) function: Good idea, added as todo in frictionlessdata/frictionless-r#97
  • Add add_description() function: Agree that this is not trivial with purrr, which is why I included an example (also as a reminder for myself how to do it ☺️). There is already an issue that discusses a dedicated function to add field properties (description is one of those). I added your suggestion there: frictionlessdata/frictionless-r#70 Might be implemented in a future release.
  • Package validation: "The vignette says that in order to validate the Data Package, users need to use Python. Are there any plans to implement that functionality in R as well?" Data Package validation is vast and must perform at scale, so this is a daunting task. 😅 Would be cool to have it in frictionless, but I currently have no plans to include this.
  • Zipped resources: Since v0.10.0, add_resource() supports adding (zipped) CSVs directly from disk, but that is indeed not the case for added data frames. So write_package(my_package, "my_directory", compress = TRUE) is a cool idea, and likely sufficient to support at a package level (rather than deciding for each resource). Added as todo in frictionlessdata/frictionless-r#98

Dear all, apologies for the delay. Please find my comments below:

Package Review

  • As the reviewer I confirm that there are no conflicts of interest for me to review this work (if you are unsure whether you are in conflict, please speak to your editor before starting your review).

Documentation

The package includes all the following forms of documentation:

  • A statement of need: clearly stating problems the software is designed to solve and its target audience in README
  • Installation instructions: for the development version of package and any non-standard dependencies in README
  • Vignette(s): demonstrating major functionality that runs successfully locally
  • Function Documentation: for all exported functions
  • Examples: (that run successfully locally) for all exported functions
  • Community guidelines: including contribution guidelines in the README or CONTRIBUTING, and DESCRIPTION with URL, BugReports and Maintainer (which may be autogenerated via Authors@R).

Functionality

  • Installation: Installation succeeds as documented.
  • Functionality: Any functional claims of the software been confirmed.
  • Performance: Any performance claims of the software been confirmed.
  • Automated tests: Unit tests cover essential functions of the package and a reasonable range of inputs and conditions. All tests pass on the local machine.
  • Packaging guidelines: The package conforms to the rOpenSci packaging guidelines.

Estimated hours spent reviewing: 4h

  • Should the author(s) deem it appropriate, I agree to be acknowledged as a package reviewer ("rev" role) in the package DESCRIPTION file.

Review Comments

The package is part of the frictionlessdata collection of libraries, which facilitates the packaging of tabular text data along with their schemas across different programming languages. The framework provides a set of tools intended to facilitate the creation of "FAIR-compliant" datasets. The umbrella project is led by the Open Knowledge Foundation and the framework is best known for its command-line tool written in Python.

Regarding the R package, the documentation is well organised and complete. The same applies to the unit tests. (Apropos, I like seeing how the authors handle errors with abundant assertions directly on the main code.) I was unable to find any relevant issues and have only a few minor optional suggestions as well as some open points/questions for discussion. All in all, the package has been beautifully crafted. Well done!

Optional suggestions/questions:

(in no particular order)

  • check_path(path) (in utils.R): would it make sense to support protocols other than http by using something like grepl("^http://|^https://|^ftp://|^sftp://", path) instead of starts_with("http")?
  • in unique_sorted() (in utils.R), I was wondering whether stats::aggregate() is necessary. Intuitively I would have used table() from base R: names(sort(table(x), decreasing = TRUE)), provided it passes the unit test.
  • Is there a reason to pass a default value to the file name parameter in read_package()? Certain functions such as bmp() provide names by default with automatic file numbering, but I think it is more common not to provide any default value.
  • would it make sense to replace {readr} with {vroom} to improve reading and writing times for large files?
  • if I understand correctly, when creating a data package, factors are converted to character strings. Would it make sense to package extra dedicated lookup tables to account for the information contained in factor levels and their order?
  • in the future, would it make sense to develop {frictionless-r} more towards being a frictionless-py wrapper (somewhat similarly to spacyr/spacy), to make it easier to keep the R package in sync with the Python project?
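
A compact way to express the protocol check from the first suggestion, with every alternative anchored to the start of the string so that a scheme appearing later in a path does not match. `is_remote_path()` is a hypothetical name used for illustration; it is not a function in frictionless.

```r
# Hypothetical helper: TRUE when `path` starts with one of the listed URL schemes.
is_remote_path <- function(path) {
  grepl("^(https?|s?ftp)://", path)
}

is_remote_path(c(
  "https://example.org/datapackage.json",
  "sftp://example.org/datapackage.json",
  "data/datapackage.json",
  "mirror/sftp://not-a-url"
))
#> [1]  TRUE  TRUE FALSE FALSE
```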

Logged review for beatrizmilz (hours: 3)

Logged review for zambujo (hours: 4)

Thank you both @beatrizmilz and @zambujo for the thoughtful reviews!

@peterdesmet please proceed with the outstanding changes whenever you have time. I'll ask both reviewers to stay tuned to see how your changes are being addressed.

Thanks @zambujo for your review. My feedback:

  • Other protocols for check_path(): Other protocols like FTP could indeed be implemented, but the specs for "URL or path" state that only http, https, and local POSIX paths are allowed. It makes sense though to allow (S)FTP, so I have asked the Frictionless community for guidance.
  • Using table() in unique_sorted(): Nice! More elegant indeed. Only had to update to handle all NA_character_ values.
  • No default value for read_package(): Although it is unlikely that a datapackage.json will be present in the working directory, I'm tempted to keep it, because according to the specs a descriptor file must be named datapackage.json, so the default name hints that the user should provide a path to such a file.
  • vroom: {readr} 2.0.0 and up use {vroom} under the hood. frictionless-r requires readr >= 2.1.0 and thus {vroom}, so I don't think there is going to be a speed difference using {vroom} directly. Even though many functions in {readr} can be exchanged for {vroom} functions, I'll keep {readr} because 1) I communicate in the documentation that {readr} is used by some functions and that is a more well-known package to end users than {vroom} and 2) I rely on readr::guess_encoding().
  • factors: when reading a data package, str/integer/numeric values that have an enum in their schema are converted to factors, with the enum values as levels, in the order they are listed in enum. When writing a data package, factors keep their data type and the levels are written in an enum field. No (re)ordering is done.
    {
      "name": "str_factor",
      "type": "string",
      "constraints": {
        "enum": ["foo", "bar"]
      }
    },
    {
      "name": "num_factor",
      "type": "number",
      "constraints": {
        "enum": [3.1, 3.2, 3.3]
      }
    },
    {
      "name": "int_factor",
      "type": "integer",
      "constraints": {
        "enum": [3, 4, -1]
      }
    }
  • R package as wrapper around Python: An option indeed, but:
    1. Currently out of my area of expertise
    2. Python dev is going fast and currently a bit of a moving target
    3. The main target to keep in sync with are the specs and they luckily don't change that fast.
    4. In my opinion the previous R package {datapackage.r} suffers from being un-R-like, introducing OO concepts that are foreign to most R users, which might happen again on a wrapper.
      However, for Data Package validation specifically (currently not in scope for {frictionless-r}), wrapping around the Python toolbox is likely useful!
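
The factor handling described in the factors bullet above can be illustrated with base R alone. This is a sketch of the idea rather than the package's internal code; the vectors `values` and `enum` are made-up inputs mirroring the str_factor field in the JSON snippet.

```r
# Values as they might be read from a CSV column, plus the enum from its schema.
values <- c("bar", "foo", "bar")
enum <- c("foo", "bar")

# Reading: levels follow the order listed in enum, not alphabetical order.
str_factor <- factor(values, levels = enum)
levels(str_factor)
#> [1] "foo" "bar"

# Writing: the levels can be emitted as the enum constraint again, in the same order.
constraints <- list(enum = levels(str_factor))
```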

Many thanks @peterdesmet. You have addressed all my comments and questions. Impressive work!

Ps. I have to confess that I had to update my packages to be able to review frictionless-r. Incidentally, I did notice a huge improvement in the performance of {readr} when I ran some code this morning. 🤓

Thanks @zambujo. The suggested change for unique_sorted() is implemented in frictionlessdata/frictionless-r#101.
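
For readers unfamiliar with the idiom, a sketch of a table()-based version with the all-NA case handled explicitly. `unique_by_frequency()` is a hypothetical name and this is not the code from the pull request; it only illustrates the approach discussed in the review.

```r
# Hypothetical re-implementation, not the package's code:
# unique values of `x`, most frequent first, ignoring NA.
unique_by_frequency <- function(x) {
  counts <- table(x)            # table() drops NA by default
  if (length(counts) == 0) {
    return(character(0))        # e.g. when every value is NA
  }
  names(sort(counts, decreasing = TRUE))
}

unique_by_frequency(c("b", "a", "b", NA, "c", "b", "a"))
#> [1] "b" "a" "c"
```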

@melvidoni, the comments suggested by @beatrizmilz are addressed in #495 (comment) and where actionable, all implemented in the latest version of the package. Both reviewers were included with rev roles: thanks to you both!

One lingering question I have for the reviewers is the use of the word package. I'm copy/pasting my question from higher up:

  • Masking of usethis::create_package(): Yeah, it is a bit unfortunate that the term package is used for different things in Frictionless vs R (as explained at the start of the vignette). Luckily in R it is often referred to as pkg in function names, reducing masking. In the Frictionless Community "Data Package" does seem to be consistently referred to as package in implementations in other languages, not dp, seldom as datapackage, which is why I adopted that term for frictionless functions and parameters. I think alternatives like create_datapackage(), create_data_package(), create_dataset() are less desirable, but 👉 feedback welcome 👈. Was the term package confusing in any way?

Since you both didn't remark on that, I assume that the word package was not confusing in read_package(), create_package() or write_package(), but I want to make sure.

Okay, given that @zambujo gave the okay, we are only missing @beatrizmilz's comments on the latest changes, and the answer for your question. Let's wait for her, then.

Hi! Peter, the word package was not confusing for me, since there was an explanation in the documentation! I pointed out the masking with usethis because users who only use library() and are not familiar with possible conflicts between functions with the same name can eventually encounter errors and ask for help (and that is not a problem with the package!). I think I thought about that because I answer a lot of questions in forums in Portuguese, and questions about errors caused by masking are frequent.

You have addressed all the questions, and as @zambujo said, this is impressive work. Congratulations!

@ropensci-review-bot approve frictionless

Approved! Thanks @peterdesmet for submitting and @zambujo, @beatrizmilz for your reviews! 😁

To-dos:

  • Transfer the repo to rOpenSci's "ropensci" GitHub organization under "Settings" in your repo. I have invited you to a team that should allow you to do so.
  • After transfer write a comment @ropensci-review-bot finalize transfer of <package-name> where <package-name> is the repo/package name. This will give you admin access back.
  • Fix all links to the GitHub repo to point to the repo under the ropensci organization.
  • Delete your current code of conduct file if you had one since rOpenSci's default one will apply, see https://devguide.ropensci.org/collaboration.html#coc-file
  • If you already had a pkgdown website and are ok relying only on rOpenSci central docs building and branding,
    • deactivate the automatic deployment you might have set up
    • remove styling tweaks from your pkgdown config but keep that config file
    • replace the whole current pkgdown website with a redirecting page
    • replace your package docs URL with https://docs.ropensci.org/package_name
    • In addition, in your DESCRIPTION file, include the docs link in the URL field alongside the link to the GitHub repository, e.g.: URL: https://docs.ropensci.org/foobar (website) https://github.com/ropensci/foobar
  • Fix any links in badges for CI and coverage to point to the new repository URL.
  • Increment the package version to reflect the changes you made during review. In NEWS.md, add a heading for the new version and one bullet for each user-facing change, and each developer-facing change that you think is relevant.
  • We're starting to roll out software metadata files to all rOpenSci packages via the Codemeta initiative, see https://docs.ropensci.org/codemetar/ for how to include it in your package, after installing the package - should be easy as running codemetar::write_codemeta() in the root of your package.
  • You can add this installation method to your package README install.packages("<package-name>", repos = "https://ropensci.r-universe.dev") thanks to R-universe.

Should you want to acknowledge your reviewers in your package DESCRIPTION, you can do so by making them "rev"-type contributors in the Authors@R field (with their consent).

Welcome aboard! We'd love to host a post about your package - either a short introduction to it with an example for a technical audience or a longer post with some narrative about its development or something you learned, and an example of its use for a broader readership. If you are interested, consult the blog guide, and tag @stefaniebutland in your reply. She will get in touch about timing and can answer any questions.

We maintain an online book with our best practices and tips; this chapter starts the 3rd section, which is about guidance for after onboarding (with advice on releases, package marketing, GitHub grooming). The guide also features CRAN gotchas. Please tell us what could be improved.

Last but not least, you can volunteer as a reviewer via filling a short form.

@melvidoni Is it required to transfer the {frictionless-r} repository to rOpenSci? Because this is also a question of branding. Ping @lwinfree @sapetti9 @roll from Frictionless Data.

  • Package repository URL: I would prefer to keep https://github.com/frictionlessdata/frictionless-r. That way 1) it is clearer that it is endorsed/maintained by Frictionless Data and 2) it keeps living alongside other implementations of Frictionless Data standards, such as Python and JS. -> Keep under frictionlessdata
  • Package website: currently https://frictionlessdata.github.io/frictionless-r/ with generic pkgdown (Bootstrap v5). We could adopt the rOpenSci central docs building and branding, so it looks more like https://docs.ropensci.org/wateRinfo/. That will visually tie it to rOpenSci and require an update of its URL (including a redirect of the old URL). Branding-wise, my opinion is that it is fine to adopt the rOpenSci branding, because I hope to add a logo to the package soon, which will visually tie it to Frictionless Data.
  • rOpenSci package family: how can I get the package listed under https://ropensci.org/packages/all/ ?
  • rOpenSci R universe: how can I get the package added to the rOpenSci R universe?
  • Code of conduct: Which code of conduct should we adopt, Frictionless Data's or rOpenSci's? I think Frictionless Data's, since the package is maintained there and conflicts should be resolved there. -> Use Frictionless

Once those questions are answered I can make the necessary changes and then hopefully submit to CRAN! 🎉🤞

TODO based on #495 (comment)

  • Transfer the repo to rOpenSci's "ropensci" GitHub organization under "Settings" in your repo. I have invited you to a team that should allow you to do so.
  • After transfer write a comment @ropensci-review-bot finalize transfer of <package-name> where <package-name> is the repo/package name. This will give you admin access back.
  • Fix all links to the GitHub repo to point to the repo under the ropensci organization.
  • Delete your current code of conduct file if you had one since rOpenSci's default one will apply, see https://devguide.ropensci.org/collaboration.html#coc-file
  • Customize sidebar so COC appears there.
  • If you already had a pkgdown website and are ok relying only on rOpenSci central docs building and branding,
    • deactivate the automatic deployment you might have set up
    • remove styling tweaks from your pkgdown config but keep that config file
    • replace the whole current pkgdown website with a redirecting page
    • replace your package docs URL with https://docs.ropensci.org/package_name
    • In addition, in your DESCRIPTION file, include the docs link in the URL field alongside the link to the GitHub repository, e.g.: URL: https://docs.ropensci.org/foobar (website) https://github.com/ropensci/foobar
  • Fix any links in badges for CI and coverage to point to the new repository URL.
  • Increment the package version to reflect the changes you made during review. In NEWS.md, add a heading for the new version and one bullet for each user-facing change, and each developer-facing change that you think is relevant. Done in https://frictionlessdata.github.io/frictionless-r/news/index.html
  • We're starting to roll out software metadata files to all rOpenSci packages via the Codemeta initiative, see https://docs.ropensci.org/codemetar/ for how to include it in your package, after installing the package - should be easy as running codemetar::write_codemeta() in the root of your package.
  • You can add this installation method to your package README install.packages("<package-name>", repos = "https://ropensci.r-universe.dev") thanks to R-universe.
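If the repository stays under frictionlessdata, as proposed above, the DESCRIPTION URL field from the checklist might look like the sketch below. The docs URL is an assumption: it follows the https://docs.ropensci.org/package_name pattern with the package name frictionless. Multiple URLs in this field are separated by commas.

```
URL: https://docs.ropensci.org/frictionless,
    https://github.com/frictionlessdata/frictionless-r
```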

Hi all! First of all, it has been really lovely to watch this process unfold, so a big thank you to everyone that has been involved!

Speaking as product manager of Frictionless Data:

  • we would like to keep the repo under the Frictionless organization as Peter suggests
  • we would like to keep the Frictionless code of conduct to not confuse potential users if that is OK

everything else looks great to me!

Thanks!

Thanks @lwinfree!

@melvidoni What would be the instructions to do the remaining points in #495 (comment) i.e. using the rOpenSci CI and website building for a repo not under rOpenSci (cf. https://github.com/CornellLabofOrnithology/auk/)?

Hello all. Please, bear with me while I discuss with the other Associate Editors. In the meantime, complete what you can, please.

Update @peterdesmet @lwinfree. We are discussing the CoC issue. Will get back to you soon-ish, please bear with us.

Thanks for your work on this package. 😸

Thanks @maelle!

  • I've added a small .github/CODE_OF_CONDUCT.md page that points to the Frictionless Data COC. This makes it appear in the sidebar.
  • Rather than setting up a frictionlessdata.github.io repository (required when the repository moves organizations - not the case here), I have placed a redirect index.html on the gh-pages branch of the repository, disabled automatic pkgdown building, but kept GitHub Pages active. https://frictionlessdata.github.io/frictionless-r/ now successfully redirects.
  • I have kept some minor styling tweaks to _pkgdown.yml for local pkgdown building. I don't think it will affect building at rOpenSci docs.
  • I have kept the other GitHub Actions (e.g. test-coverage.yaml) intact. Or are those checks provided by rOpenSci CI and thus not necessary in the repository?

Thank you!

I have kept some minor styling tweaks to _pkgdown.yml for local pkgdown building. I don't think it will affect building at rOpenSci docs.

Indeed, we override those in https://github.com/ropensci-org/rotemplate

I have kept the other GitHub Actions (e.g. test-coverage.yaml) intact. Or are those checks provided by rOpenSci CI and thus not necessary in the repository?

It is good to keep them indeed. R-universe does build the package but you wouldn't get notified and you can't share credentials for instance. You can see the R-universe status of your package at https://ropensci.r-universe.dev/ui#builds