New submission: tidyhydat

Question

New submission: tidyhydat

boshek opened this issue 7 years ago · comments

Summary

What does this package do? (explain in 50 words or less): tidyhydat provides functions to extract historical and real-time national hydrometric data from Water Survey of Canada data sources
Paste the full DESCRIPTION file inside a code block below:

Package: tidyhydat
Title: Extract and Tidy Canadian Hydrometric Data
Version: 0.2.9
Authors@R: c(person("Sam", "Albers", email = "sam.albers@gov.bc.ca", role = c("aut", "cre"),
           ), person("David", "Hutchinson", email = "david.hutchinson@canada.ca", role = "ctb"),
           person("Province of British Columbia", role = "cph"))
Description: tidyhydat provides functions to extract historical and real-time national hydrometric
  data from Water Survey of Canada data sources (http://dd.weather.gc.ca/hydrometric/csv/ and
  http://collaboration.cmc.ec.gc.ca/cmc/hydrometrics/www/) and then apply tidy data principles.
Depends: R (>= 3.4.0)
License: Apache License (== 2.0) | file LICENSE
URL: https://github.com/bcgov/tidyhydat
BugReports: https://github.com/bcgov/tidyhydat/issues
Encoding: UTF-8
Imports:
    dplyr (>= 0.7.3),
    readr (>= 1.1.1),
    lubridate (>= 1.6.0),
    tidyr (>= 0.7.1),
    DBI (>= 0.7),
    RSQLite (>= 2.0),
    tibble (>= 1.3.4),
    httr (>= 1.3.1)
Suggests:
    tidyverse,
    dbplyr,
    knitr,
    rmarkdown,
    testthat
LazyData: true
RoxygenNote: 6.0.1
VignetteBuilder: knitr

URL for the package (the development repository, not a stylized html page): https://github.com/bcgov/tidyhydat
Please indicate which category or categories from our package fit policies this package falls under *and why:
- data retrieval: tidyhydat retrieves data from the Water Survey of Canada datamart via the download_realtime_dd() function from http://dd.weather.gc.ca/hydrometric/csv/. download_realtime_ws provides access via the httr package to a new password protected web service provided by Environment and Climate Change Canada (ECCC). Differences between the download realtime functions are outlined here
- data extraction: The majority of tidyhydat's exported functions provide easy access to ECCC's publicly available and open-licenced HYDAT sqlite3 database (http://collaboration.cmc.ec.gc.ca/cmc/hydrometrics/www/) of historical river and lake levels in Canada. HYDAT is updated quarterly and is distributed via the Canadian Open Government Licence. HYDAT is not included with this package because it is regularly updated and because the file size is prohibitively large (>1GB) for inclusion in an R package. Rather tidyhydat provides the download_hydat() function that downloads it for the user.
Who is the target audience?
- Anyone who is interested in making use of the Water Survey of Canada data sources in R including industry, government and academics.
Are there other R packages that accomplish the same thing? If so, how does
yours differ or meet our criteria for best-in-category?
- A search of the term "hydat" on github reveals two R packages that provide similar (but not all the same) functionality:
- HYDAT: the original author of HYDAT has contributed to this package and further development or maintenance of HYDAT is not imminent.
- hydatr was developed at approximately the same time as tidyhydat.
- tidyhydat documentation is more comprehensive than both packages.
- Through the realtime functions tidyhydat provides added functionality not present in both packages. Neither HYDAT nor hydatr provide any functionality to access the web service and hydatr does not access any real-time datamart data.
- An express goal of tidyhydat is to provide data in a tidy format. This conceptual data science goal provides a clear objective that is missing from both other packages.

Requirements

Confirm each of the following by checking the box. This package:

does not violate the Terms of Service of any service it interacts with.
has a CRAN and OSI accepted license.
contains a README with instructions for installing the development version.
includes documentation with examples for all functions.
contains a vignette with examples of its essential functions and uses.
has a test suite.
has continuous integration, including reporting of test coverage, using services such as Travis CI, Coeveralls and/or CodeCov.
I agree to abide by ROpenSci's Code of Conduct during the review process and in maintaining my package should it be accepted.

Publication options

Do you intend for this package to go on CRAN?
Do you wish to automatically submit to the Journal of Open Source Software? If so:
- The package has an obvious research application according to JOSS's definition.
- The package contains a paper.md matching JOSS's requirements with a high-level description in the package root or in inst/.
- The package is deposited in a long-term repository with the DOI:
- (Do not submit your package separately to JOSS)

Detail

Does R CMD check (or devtools::check()) succeed? Paste and describe any errors or warnings:
Does the package conform to rOpenSci packaging guidelines? Please describe any exceptions:
If this is a resubmission following rejection, please explain the change in circumstances:
If possible, please provide recommendations of reviewers - those with experience with similar packages and/or likely users of your package - and their GitHub user names:

Taken from the list reviewers as folks that might be appropriate:

Joseph Stachelek: @jsta
Jeffrey Hollister: @jhollist
Alison Appling: @aappling-usgs
Laura DeCicco: @ldecicco-USGS

Laura DeCicco commented 7 years ago

Agreed!

Luke Winslow commented 7 years ago

nope.

Sam Albers · Answer 1 · Thu Sep 14 2017 07:11:14 GMT+0800 (China Standard Time)

Results of goodpractice (Updated 2017-09-14)

> goodpractice::gp()
Preparing: covr
Preparing: cyclocomp
* installing *source* package 'tidyhydat' ...
** R
** data
*** moving datasets to lazyload DB
** inst
** preparing package for lazy loading
** help
*** installing help indices
** building package indices
** installing vignettes
** testing if installed package can be loaded
*** arch - i386
*** arch - x64
* DONE (tidyhydat)
Preparing: description
Preparing: lintr
Preparing: namespace
Preparing: rcmdcheck
── GP tidyhydat ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────

It is good practice to

  ✖ write unit tests for all functions, and all package code in general. 69% of code lines are covered by test cases.

    R/ANNUAL_INSTANT_PEAKS.R:43:NA
    R/ANNUAL_INSTANT_PEAKS.R:44:NA
    R/ANNUAL_INSTANT_PEAKS.R:48:NA
    R/ANNUAL_INSTANT_PEAKS.R:49:NA
    R/ANNUAL_INSTANT_PEAKS.R:50:NA
    ... and 391 more lines

  ✖ avoid long code lines, it is bad for readability. Also, many people prefer editor windows that are about 80 characters
    wide. Try make your lines shorter than 80 characters

    inst\doc\tidyhydat.R:2:1
    R\ANNUAL_INSTANT_PEAKS.R:84:1
    R\ANNUAL_INSTANT_PEAKS.R:87:1
    R\ANNUAL_INSTANT_PEAKS.R:90:1
    R\ANNUAL_INSTANT_PEAKS.R:97:1
    ... and 397 more lines

  ✖ not import packages as a whole, as this can cause name clashes between the imported packages. Instead, import only the
    specific functions you need.
  ✖ fix this R CMD check NOTE: Note: found 20 marked UTF-8 strings
────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────

Scott Chamberlain · Answer 2 · Fri Sep 15 2017 00:16:11 GMT+0800 (China Standard Time)

thanks for your submission @boshek - We're discussing this now and will get back to you soon

Scott Chamberlain · Answer 3 · Fri Sep 15 2017 03:15:59 GMT+0800 (China Standard Time)

Editor checks:

Fit: The package meets criteria for fit and overlap
Automated tests: Package has a testing suite and is tested via Travis-CI or another CI service.
License: The package has a CRAN or OSI accepted license
Repository: The repository link resolves correctly
Archive (JOSS only, may be post-review): The repository DOI resolves correctly
Version (JOSS only, may be post-review): Does the release version given match the GitHub release (v1.0.0)?

Editor comments

again, thanks for your submission @boshek

Thanks for including goodpractice output

Seeking reviewers now.

Reviewers: @ldecicco-USGS @lawinslow
Due date: 2017-10-18

Sam Albers · Answer 4 · Tue Sep 26 2017 06:37:13 GMT+0800 (China Standard Time)

NOTE: Version update since original submission. goodpractice output is unchanged and the DESCRIPTION file above has updated accordingly. Changes are catalogued in the NEWS.md file in the tidyhydat repo.

Scott Chamberlain · Answer 5 · Thu Sep 28 2017 02:51:43 GMT+0800 (China Standard Time)

Reviewers are @ldecicco-USGS and @lawinslow - thanks for reviewing! due date above

Scott Chamberlain · Answer 6 · Thu Oct 12 2017 11:26:25 GMT+0800 (China Standard Time)

😸 @ldecicco-USGS / @lawinslow reminder that your reviews are due in 1 week 😸

Luke Winslow · Answer 7 · Mon Oct 16 2017 23:16:17 GMT+0800 (China Standard Time)

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

As the reviewer I confirm that there are no conflicts of interest for me to review this work (If you are unsure whether you are in conflict, please speak to your editor before starting your review).

Documentation

The package includes all the following forms of documentation:

A statement of need clearly stating problems the software is designed to solve and its target audience in README
Installation instructions: for the development version of package and any non-standard dependencies in README
Vignette(s) demonstrating major functionality that runs successfully locally
Function Documentation: for all exported functions in R help
Examples for all exported functions in R Help that run successfully locally
Community guidelines including contribution guidelines in the README or CONTRIBUTING, and DESCRIPTION with URL, BugReports and Maintainer (which may be autogenerated via Authors@R).

For packages co-submitting to JOSS

The package has an obvious research application according to JOSS's definition

The package contains a paper.md matching JOSS's requirements with:

A short summary describing the high-level functionality of the software

Authors: A list of authors with their affiliations

A statement of need clearly stating problems the software is designed to solve and its target audience.

References: with DOIs for all those that have one (e.g. papers, datasets, software).

Functionality

Installation: Installation succeeds as documented.
Functionality: Any functional claims of the software been confirmed.
Performance: Any performance claims of the software been confirmed.
Automated tests: Unit tests cover essential functions of the package
and a reasonable range of inputs and conditions. All tests pass on the local machine.
Packaging guidelines: The package conforms to the rOpenSci packaging guidelines

Final approval (post-review)

The author has responded to my review and made changes to my satisfaction. I recommend approving this package.

Estimated hours spent reviewing: 5

Review Comments

Overall

The tidyhydat package offers interfaces to Environment and Climate Change Canada stream and river data.
It primarily interfaces with a downloadable database, "HYDAT", that contains a great deal of historical data.
It also provides functions for accessing some real-time data through select web interfaces.

Overall, the scientific computing world requires more tools like these. Many a student, reseracher, or manager
has had to re-create the wheel to access available data and providing such interfaces goes a long way to
accelerating and streamlining such data use.

I do have a few reservations and suggestions about the current implementation of tidyhydat that I think, if addressed,
would improve the package functionality and enable broader use.

One issue I encountered was the database download, with personal handling of the database path, is awkward. It seems that much of the work
of downloading the database, and then storing the path could be automated. The storage location could be supported by
the rappdirs package. There are existing examples of such file downloading and use, such as getlandsat,
hydrolinks, and LAGOS to name a few
with various implementations. This would make it easier for users to quickly get started using the data and eliminate
the need to muck around in the Renviron file.

Another overarching issue (which is somewhat subjective) is to the outside reviewer, the choice of function capitalization
seems arbitrary and not helpful. While the choices of function style are subject to personal preferences, I would suggest
the usage of consistent capitalization scheme, preferrably a lowercase-based one. I am personally especially partial to
the consistent prefix-style function naming, which is used in a number of rOpenSci
and other related packages. It eases function discovery as well as makes packages
seem more unified and consistent.

Documentation

First thing I noticed were the typos and grammatical errors on the Github README. I'd suggest reviewing it for
grammatical issues if it will be the github face of tidyhydat (noted mistakes in first two bullet points).

I would suggest you build and commit your Rd files to github. Otherwise, devtools::install_github
will end up installing a package with no documentation. Took me 5 minutes of trying to get documentation before
I realized why no functions were documented.

Line 29 zzz.R I would drop the #' comments which results in R trying to build an Rd file for .onLoad.

Lastly. I would consider revising your tidyhydat-package documentation. This is frequently the first interaction
a user might have with the package. Currently it is very focused on the data format and output style (tidy),
not the data itself. This documentation (along with a Vignette) would be an excellent place to better give an overview
of the data available. What data are availble through this package? Stream/river level, flow, and it seems, sediment
flux estimates at daily and monthly timescales (also real-time in some cases)?

On the overview side, I'd recommend, if possible, pointing to further HYDAT documentation. A lot of the documentation here
relies on the user's knowledge of the HYDAT database. For example, much of the function documentation refers to the functions
as "Provides wrapper to turn the XXX table in HYDAT into a tidy data frame." which implies knowledge of the HYDAT database.

There seems to be no indication of the units for any of the data. The only unit I could find is in the DATA_TYPES
table, though that was only for Sediment. This is something I feel is important for data packages such as this as units
are critical to understanding the data, and are often the one piece of information I find myself spending excessive time
hunting for (or trying to reverse engineer).

Lastly, a natural question that comes to my mind is, "are there other data"? How about temperature? Or Nutrients? Or
spatial data that reflect the local watersheds and flowlines.

Functionality

One thing about the examples. Using devtools I can run them, but a bunch of them are hard-coded
to use h:/Hydat.sqlite3. It would be better if these examples followed the style outlined in the github
README which uses an .Renviron variable to set the path (which would then allow anyone to quickly use/run examples).

Example for SED_MONTHLY_LOADS is not working. I played around a bit with it but couldn't get it to run. This
may require better output of empty return values as it seems like it should be ok that a query returns no data.

> SED_MONTHLY_LOADS(PROV_TERR_STATE_LOC = "PE", hydat_path = "H:/Hydat.sqlite3", start_date = "2001-01-01", end_date = "2011-01-01")
 
 Error: !length(data) || is_named(data) is not TRUE 
11.stop(msg, call. = FALSE, domain = NA) 
10.stopifnot(!length(data) || is_named(data)) 
9.env_bind_impl(env, dots_list(...)) 
8.child_env(env_, ...) 
7.env_bury(overscope_top, !(!(!data))) 
6.vars_select_eval(.vars, quos) 
5.tidyselect::vars_select(names(data), !(!(!quos))) 
4.unname(tidyselect::vars_select(names(data), !(!(!quos)))) 
3.gather.data.frame(sed_monthly_loads, variable, temp, -(STATION_NUMBER:NO_DAYS)) 
2.tidyr::gather(sed_monthly_loads, variable, temp, -(STATION_NUMBER:NO_DAYS)) at SED_MONTHLY_LOADS.R#107
1.SED_MONTHLY_LOADS(PROV_TERR_STATE_LOC = "PE", hydat_path = "H:/Hydat.sqlite3", 
    start_date = "2001-01-01", end_date = "2011-01-01")

I found that this same function failed on the testthat unit tests as well.

Overall, I look forward to the tidyhydat package being available on CRAN and using its functionality to
integrate hydrologic data at large scales.

Sam Albers · Answer 8 · Tue Oct 17 2017 00:18:18 GMT+0800 (China Standard Time)

Thank you for this review @lawinslow. At first glance all changes suggested seem entirely appropriate. I will wait for @ldecicco-USGS's review, however, to ensure there are no conflicting changes requested.

Luke Winslow · Answer 9 · Tue Oct 17 2017 01:01:37 GMT+0800 (China Standard Time)

👍 Sounds great!

Laura DeCicco · Answer 10 · Thu Oct 19 2017 03:52:46 GMT+0800 (China Standard Time)

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

As the reviewer I confirm that there are no conflicts of interest for me to review this work (If you are unsure whether you are in conflict, please speak to your editor before starting your review).

Documentation

The package includes all the following forms of documentation:

A statement of need clearly stating problems the software is designed to solve and its target audience in README
Installation instructions: for the development version of package and any non-standard dependencies in README
Vignette(s) demonstrating major functionality that runs successfully locally
Function Documentation: for all exported functions in R help
Examples for all exported functions in R Help that run successfully locally
Community guidelines including contribution guidelines in the README or CONTRIBUTING, and DESCRIPTION with URL, BugReports and Maintainer (which may be autogenerated via Authors@R).

For packages co-submitting to JOSS

The package has an obvious research application according to JOSS's definition

The package contains a paper.md matching JOSS's requirements with:

A short summary describing the high-level functionality of the software

Authors: A list of authors with their affiliations

A statement of need clearly stating problems the software is designed to solve and its target audience.

References: with DOIs for all those that have one (e.g. papers, datasets, software).

Functionality

Installation: Installation succeeds as documented.
Functionality: Any functional claims of the software been confirmed.
Performance: Any performance claims of the software been confirmed.
Automated tests: Unit tests cover essential functions of the package
and a reasonable range of inputs and conditions. All tests pass on the local machine.
Packaging guidelines: The package conforms to the rOpenSci packaging guidelines

Final approval (post-review)

The author has responded to my review and made changes to my satisfaction. I recommend approving this package.

Estimated hours spent reviewing: 5

Review Comments

Random train-of-thoughts:

LOVE the hex icon...totally jealous.

The example:

file.edit("~/.Renviron")

That path does not automatically work for Windows users I think. You might want to clarify that you will probably want a .Renviron file in the same directory as your RStudio project (assuming you use RStudio)...or something like that.

In download.R, there's a message: stop("No exists for this station query")...might want to change to "No data...".
For me, none of the download_realtime_ws examples worked for me (I did get the token working). Each example ended with "No exists for this station query"

I'm not sure CRAN will be able run those examples (in the download_realtime_ws function specifically). You probably will need to wrap them in dontrun like:
https://github.com/USGS-R/dataRetrieval/blob/master/R/getWebServiceData.R#L30

Generally, I'm a big fan of every exported function having at least one functional example. It helps immensely, especially when teaching new R users. A package such as this could be very useful for new R users with a flavor for Canadian hydrology.

Always good to explicitly state the units for the data. Some might be obvious, but some not.

I agree with Luke that it was frustrating to have the examples use a path to the database that I did not use. If there was a cleaner way to deal with this (he gave a few example packages to check out), that would make life for the user easier. You could probably have the examples work if you pointed them to:

tinyflow <- DLY_FLOWS(STATION_NUMBER = c("08LA001","08NL071"),
                      hydat_path = system.file("test_db/tinyhydat.sqlite3", package = "tidyhydat"))

of course...you'd need to adjust stations and what-not that are in the tiny db.

It would be really cool to have (maybe another) vignette that works through a complete analysis. Maybe from looking for data, to getting the data, to visualizing the data. Alternatively, maybe expanding the help files a bit to describe more completely what is returned from each function/data set. It wasn't obvious to me until I ran the function and looked at the result.

Cleaning up the style (some caps, some name_function) would be great.

Great job on using dplyr::, it's something I'd like to refactor on a lot of my old code.

I noticed ifelse(FULL_MONTH == 1, "Yes", "No") in SED_MONTHLY_SUSCON...wouldn't it make more sense to be a logic? ie:

sed_monthly_suscon <- dplyr::mutate(sed_monthly_suscon, FULL_MONTH = FULL_MONTH == 1)

Running the tests locally, I got:

  1. Error: SED_MONTHLY_LOADS accepts single and multiple province arguments (@test_SED_MONTHLY_LOADS.R#5) 
  2. Error: SED_MONTHLY_LOADS accepts single and multiple province arguments (@test_SED_MONTHLY_LOADS.R#12) 
  3. Error: SED_MONTHLY_LOADS can accept both arguments for backward compatability (@test_SED_MONTHLY_LOADS.R#22) 
  4. Error: download_realtime_ws returns the correct data header (@test_download_realtime.R#6)

At least for the download tests, you'll probably want to add testthat::skip_on_cran()

Notes on the implications of focusing on "tidy":
When I saw the name of the package, I thought that there was going to be something like:

find_stations(bbox=c(w,x,y,z)) %>%
  get_data(parameter="flow") %>%
  calc_stats("annual")

I see what was meant is that it converts the data from messy to tidy....but just a side not that it would be cool get some hydro tidy tools too....maybe another day!
https://bookdown.org/rdpeng/RProgDA/gaining-your-tidyverse-citizenship.html

The dplyr coding within the functions looks pretty straightforward...so for that reason I probably wouldn't change it, but it might be good time to look into some of the newer features of dplyr that don't require the utils::globalVariables call .onLoad. There's a good collection of blogs and stuff here:
https://maraaverick.rbind.io/2017/08/tidyeval-resource-roundup/

IN CONCLUSION:
Mostly the functions made sense and worked as expected (the download and SED_MONTHLY_LOADS being the exceptions). In the grand scheme of things, my comments here were for improving the help files and examples. That is something that is never really done. Therefore, I'd like to see those few things cleaned up, the tests to pass, and then I'd say it would be good-to-go (realizing improving documentation should be a goal continuously).

Sam Albers · Answer 11 · Thu Oct 19 2017 06:02:57 GMT+0800 (China Standard Time)

Thank you to the reviewers. Below are some clarifications or questions. For dealing with the full review I will provide a more lengthy response addressing each comment. This is going to vastly improve the package.

Some general questions/clarifications:
@lawinslow comments:

Function capitalization: I am open to moving all functions to lowercase. I had originally capitalized all functions that access HYDAT to distinguise those in lower case which that accessed realtime data. From the comments it seems like the reviewers would prefer lowercase. That is fine. So just to confirm @lawinslow recommends switching from DLY_FLOWS() to hydat_dly_flows() with all other HYDAT functions following a similar convention.

I would suggest you build and commit your Rd files to github. Otherwise, devtools::install_github will end up installing a package with no documentation. Took me 5 minutes of trying to get documentation before I realized why no functions were documented.

This is weird. I think they are committed? Here the man folder. How did you install the package? What version are you using?

Lastly, a natural question that comes to my mind is, "are there other data"? How about temperature? Or Nutrients? Or spatial data that reflect the local watersheds and flowlines.

The webservice (accessed with download_realtime_ws()) outputs temperature data where it exists. Additional nutrient or temperature data could be added as it becomes available in future versions. I think including spatial data in this particular package removes this package from the axiom "do one thing and do it well". I think there is a need for spatial data that work well with this package but I think ultimately that should reside in another package.

@ldecicco-USGS comments:

I noticed ifelse(FULL_MONTH == 1, "Yes", "No") in SED_MONTHLY_SUSCON...wouldn't it make more sense to be a logic? ie:
sed_monthly_suscon <- dplyr::mutate(sed_monthly_suscon, FULL_MONTH = FULL_MONTH == 1)

This is a good point and raises another. Could you, @sckott or @lawinslow comment on artificial data codes in general? For example would the preference be to have FULL_MONTH as a 1 or 0 OR as a YES or NO. The YES and NO option seems clearer to me but would add to the size of the returned tibble. There are several instances of this in tidyhydat.

Luke Winslow · Answer 12 · Thu Oct 19 2017 06:17:57 GMT+0800 (China Standard Time)

This is weird. I think they are committed? Here the man folder. How did you install the package? What version are you using?

Weird. I now have no issue getting them. I swear there was a glitch in the matrix or something. Feel free to disregard.

I think including spatial data in this particular package removes this package from the axiom "do one thing and do it well". I think there is a need for spatial data that work well with this package but I think ultimately that should reside in another package.

Sure. Fair point. But I'll point out that "one thing" might refer to "all useful data pertaining to the hydrologic monitoring network". I mainly pointed it out because I was thinking about the type of data researchers might want that connects with these data. Data you might classify as "metadata" definitely came to mind. Of course one person's metadata is another person's independent package.

So just to confirm @lawinslow recommends switching from DLY_FLOWS() to hydat_dly_flows() with all other HYDAT functions following a similar convention.

Yes, my preference is for the lowercase version.

Sam Albers · Answer 13 · Tue Oct 24 2017 06:00:50 GMT+0800 (China Standard Time)

@lawinslow @ldecicco-USGS @sckott

Two questions:

Given that I have incorporated Luke's suggestion to rappdirs and am looking to standardize documentation from a user perspective, is it better to provide examples that use the included tiny test database in which the user can use the package right away:
- DLY_FLOWS(STATION_NUMBER = c("08LA001","08NL071"), hydat_path = system.file("test_db/tinyhydat.sqlite3", package = "tidyhydat"))

Or would it better to have examples that use the full hydat database like this:

DLY_FLOWS(STATION_NUMBER = c("08LA001","08NL071"))
Knowing that users would have to download the full database before being able to use those functions.

Any opinions on which approach is less annoying? The "get started right away" approach or the "learn it how I am going to use it" approach?

The recommendation was to switch from DLY_FLOWS() to hydat_dly_flows(). However I did not consider the length of other function names. In particular this function ANNUAL_INSTANT_PEAKS() becomes hydat_annual_instant_peaks() (similar situation with SED_MONTHLY_SUSCON()). This results in, at least in my estimation, a function name that is too long. Are there any objections to omitting the hydat_* portion of the function name to save space?

Scott Chamberlain · Answer 14 · Tue Oct 24 2017 06:28:26 GMT+0800 (China Standard Time)

hi @boshek , i can't decipher this text above

is it better to provide examples that user the tiny database like this which the user can use right away

can you clarify?

Sam Albers · Answer 15 · Tue Oct 24 2017 06:33:01 GMT+0800 (China Standard Time)

@sckott Updated the offending lines. Does that help?

Luke Winslow · Answer 16 · Tue Oct 24 2017 19:40:13 GMT+0800 (China Standard Time)

If I were you, I'd be careful with the ambiguity of a "working" function without downloading the database. i.e., would it be possible for the user to see that it is working via the example, and then try using it in a fully-functional manner and simply not get any data?

That's definitely the mistake I would make. Immediately download the package, not read the documentation, and start trying to run the examples. Seeing they run, I would then try to expand to all sites (and start plotting data), but find that I could only get data for one or two sites.

Just a thought and a user-case (non-documentation reader but example user) that you might want to consider.

Scott Chamberlain · Answer 17 · Wed Oct 25 2017 00:02:29 GMT+0800 (China Standard Time)

For function names, prefixes are a good idea to avoid namespace conflicts with other packages. Looking at your exported fxn names, they don't strike me as too likely to conflict, but some might, e.g., stations, You could choose a shorter prefix like e.g., hy or so.

Sam Albers · Answer 18 · Thu Nov 02 2017 04:15:10 GMT+0800 (China Standard Time)

Thank you to @lawinslow , @ldecicco-USGS and @sckott for the super helpful input on this package. I have tracked my changes to the package via a series of issues. As far as I can tell I have addressed the reviewers comments. Those are summarized in the NEWS section of v0.3.0 which is replicated below. I have also created a PR merging the review branch into the master with all the changes requested which can be reviewed (if desired) here: ropensci/tidyhydat#58

List of changes

tidyhydat 0.3.0

NEW FEATURES

New NEWS template!
Moved station_number to first argument to facilitate piped analysis (ropensci/tidyhydat#54)
search_stn_name and search_stn_number now query both realtime and historical data sources and have tests for a more complete list (ropensci/tidyhydat#56)
With credential stored in .Renviron file, ws_token can successfully be called by ws_token().
.onAttach() checks if HYDAT is downloaded on package load.

MINOR IMPROVEMENTS

Significant function and argument name changes (see below)
Adding rappdirs to imports and using to generate download path for download_hydat() (ropensci/tidyhydat#44)
Adding rappdirs so that all the hy_* functions access hydat from rappdirs::user_data_dir() via hy_dir() (ropensci/tidyhydat#44)
Revised and cleaned up documentation including two vignettes (ropensci/tidyhydat#48)
FULL MONTH evaluate to a logic (ropensci/tidyhydat#51)
All download tests are skipped on cran (ropensci/tidyhydat#53)
Removed time limit for download_realtime_ws() with some documentation on actual limits. (3234c22)

BUG FIXES

Add informative error message for a single missing station input (ropensci/tidyhydat#38)
Fixed failing behaviour of get_ws_token (ropensci/tidyhydat#43)
Fixed httr:content parsing error in download_realtime_ws. (ropensci/tidyhydat#42)
No longer trying to build .Rd file for .onload (ropensci/tidyhydat#47)
Fixed SED_MONTHLY_LOADS (ropensci/tidyhydat#51)
Wrapped download_realtime_ws examples in dontrun (ropensci/tidyhydat#52)

FUNCTION NAME CHANGES (ropensci/tidyhydat#45)

hy_agency_list <- AGENCY_LIST
hy_annual_instant_peaks <- ANNUAL_INSTANT_PEAKS
hy_annual_stats <- ANNUAL_STATISTICS
hy_daily_flows <- DLY_FLOWS
hy_daily_levels <- DLY_LEVELS
hy_monthly_flows <- MONTHLY_FLOWS
hy_monthly_levels <- MONTHLY_LEVELS
hy_sed_daily_loads <- SED_DLY_LOADS
hy_sed_daily_suscon <- SED_DLY_SUSCON
hy_sed_monthly_loads <- SED_MONTHLY_LOADS
hy_sed_monthly_suscon <- SED_MONTHLY_SUSCON
hy_sed_samples <- SED_SAMPLES
hy_sed_samples_psd <- SED_SAMPLES_PSD
hy_stations <- STATIONS
hy_stn_remarks <- STN_REMARKS
hy_stn_datum_conv <- STN_DATUM_CONVERSION
hy_stn_datum_unrelated <- STN_DATUM_UNRELATED
hy_stn_data_range <- STN_DATA_RANGE
hy_stn_data_coll <- STN_DATA_COLLECTION
hy_stn_op_schedule <- STN_OPERATION_SCHEDULE
hy_stn_regulation <- STN_REGULATION
hy_agency_list <- AGENCY_LIST
hy_reg_office_list <- REGIONAL_OFFICE_LIST
hy_datum_list <- DATUM_LIST
hy_version <- VERSION
realtime_dd <- download_realtime_dd
realtime_ws <- download_realtime_ws
token_ws <- get_ws_token
realtime_stations <- realtime_network_meta
search_stn_name <- search_name
search_stn_number <- search_number

ARGUMENT NAME CHANGES (ropensci/tidyhydat#45)

station_number <- STATION_NUMBER
prov_terr_state_loc <- PROV_TERR_STATE_LOC

Scott Chamberlain · Answer 19 · Sat Nov 04 2017 01:37:26 GMT+0800 (China Standard Time)

Ran through some examples and such - no comments from me.

@lawinslow and @ldecicco-USGS - are you happy with the changes?

Luke Winslow · Answer 20 · Sat Nov 04 2017 01:37:57 GMT+0800 (China Standard Time)

👍 yup, ship it!

Scott Chamberlain · Answer 21 · Tue Nov 07 2017 04:26:10 GMT+0800 (China Standard Time)

Approved! Thanks again for your submission @boshek

Please transfer the package to ropensci- I've invited you to a team within our ropensci organization. After you accept you should be able to move it.
Make sure to
- add rOpenSci footer to README
  [![ropensci_footer](https://ropensci.org/public_images/ropensci_footer.png)](https://ropensci.org)
- Change any needed links, such those for CI badges
- Travis CI should update to the new location automatically - you may have to update other CI systems manually
Add dotgithub files, see https://github.com/ropensci/dotgithubfiles to make sure users report session info, etc. and to give guidance on how to contribute - issue and PR template probably can be used as is, but feel free to modify CONTRIBUTING as you like
Add the "Peer-reviewed" status badge to your README:

[![](https://badges.ropensci.org/152_status.svg)](https://github.com/ropensci/onboarding/issues/152)

Are you interested in doing a blog post for our blog https://ropensci.org/blog/ ? either a short-form intro to it (https://ropensci.org/technotes/) or long-form post with more narrative about its development (https://ropensci.org/blog/). If so, i'll get you the details on that

Sam Albers · Answer 22 · Wed Nov 08 2017 01:26:14 GMT+0800 (China Standard Time)

@sckott Thanks! A few questions:

I still would like to submit the paper to JOSS. What is the process there? Whatever the case, I would still like to opportunity to review it once more.
I am interested in doing a blog post. One idea I had was to outline all that I learned. One of the things that I love about R is that no matter what your skill level is, your input can always provide value to someone who is just learning that particular skill. I learned 5 or 6 really useful things when making this package that could be wrapped up into a blog post. Is that something that would be useful? I could also simply submit a "how-to" if that was more desirable.

Sam Albers · Answer 23 · Wed Nov 08 2017 01:32:30 GMT+0800 (China Standard Time)

Also this should probably be changed to "tidyhydat"

https://github.com/orgs/ropensci/teams/tidyhat/members

Scott Chamberlain · Answer 24 · Wed Nov 08 2017 01:35:36 GMT+0800 (China Standard Time)

@boshek Sorry about that, forgot about the JOSS. Will ping back soon with some thoughts.

team name changed.

I like that idea for a blog post! The community would really benefit from that I think. @stefaniebutland thoughts on his idea for a post?

Sam Albers · Answer 25 · Wed Nov 08 2017 01:37:55 GMT+0800 (China Standard Time)

@sckott Great! One other question - I am working through something right now and would prefer to have it all sorted out prior publicizing a release. How does ropensci decide when to publicize it's releases? Should only take a day or so to figure this out.

Scott Chamberlain · Answer 26 · Wed Nov 08 2017 01:47:04 GMT+0800 (China Standard Time)

What do you mean by publicize a release? Do you mean to CRAN or on twitter? or somewhere else?

Sam Albers · Answer 27 · Wed Nov 08 2017 02:06:44 GMT+0800 (China Standard Time)

I guess mostly twitter. When does that happen or it is automated?

Scott Chamberlain · Answer 28 · Wed Nov 08 2017 02:14:32 GMT+0800 (China Standard Time)

If you're thinking of this account https://twitter.com/roknowtifier that is an automated bot. It sends a tweet when there is a new package on CRAN or a new version of an existing package on CRAN - so if the pkg is not on CRAN yet, there's no tweets about the pkg.

In ropensci news https://ropensci.github.io/biweekly/ it's a pattern similar to the twitter bot where we mention new packages on CRAN and new versions on CRAN.

Stefanie Butland · Answer 29 · Wed Nov 08 2017 02:31:14 GMT+0800 (China Standard Time)

Sam! @boshek I'm thrilled for so many reasons - your work to get BC gov't to allow this package to be hosted at rOpenSci, the fact that they have allowed this (Leadership!!), and that you'd like to contribute a blog post. Thank you!!

One idea I had was to outline all that I learned. One of the things that I love about R is that no matter what your skill level is, your input can always provide value to someone who is just learning that particular skill. I learned 5 or 6 really useful things when making this package that could be wrapped up into a blog post.

This is 💯 a great theme for the post. As Scott said, the community would really appreciate and benefit from this perspective. It shows the value you gained by doing something (submit to review and incorporate the feedback) that takes more work on your part than just using the code and moving on to the next thing on your to-do list.

Would also be great to give a shoutout to BC Gov't (your dept) for showing leadership in submitting code for review and making it open. Feels like a good PR opportunity for your department (but I'm an idealist).

Here are editorial and technical details for submitting a draft post: https://github.com/ropensci/roweb2#contributing-a-blog-post.

What do you think of Nov 28 deadline for submitting draft? We would publish Dec 5 or 12. Ping me when draft is submitted and I'll review it and give feedback.

Stefanie Butland · Answer 30 · Wed Nov 08 2017 02:32:37 GMT+0800 (China Standard Time)

regarding publicizing, I tweet about new blog post from @ropensci when it goes up

Scott Chamberlain · Answer 31 · Thu Nov 09 2017 01:53:22 GMT+0800 (China Standard Time)

@boshek for JOSS thing, can you

generate a new release with a DOI. this doesn't mean you have to submit to CRAN of course
Submit to JOSS at http://joss.theoj.org/papers/new

Sam Albers · Answer 32 · Thu Nov 16 2017 05:37:05 GMT+0800 (China Standard Time)

What do @sckott , @lawinslow , @ldecicco-USGS feel about having their names listed on this package as reviewers a la: https://github.com/ropensci/rrricanes/blob/cbb39aecec6dd82b78c80a81a72408e8c177de77/DESCRIPTION#L10-L12

Any objection?

Scott Chamberlain · Answer 33 · Thu Nov 16 2017 05:44:39 GMT+0800 (China Standard Time)

thx, but no need to include me @boshek - laura and luke did the lion's share

Noam Ross · Answer 34 · Sat Dec 16 2017 00:17:43 GMT+0800 (China Standard Time)

@boshek Question: I see that you put this on CRAN with the rev category for your contributors. Did you get a CRAN notes when doing so, or did you build and submit using r-devel? We've been quietly encouraging this but not too much because there were some snafus with CRAN submission. If it went through with no issues that's a nice milestone.

Sam Albers · Answer 35 · Sat Dec 16 2017 01:46:08 GMT+0800 (China Standard Time)

@noamross No snafus! Some UTF-8 string notes that I haven't figured out how to deal with yet but the rev category worked fine. I included the link to the onboarding issue but I think that I would omit that as it quite ugly on the CRAN package page.

Sam Albers · Answer 36 · Sat Dec 16 2017 02:49:19 GMT+0800 (China Standard Time)

Removing the links is particularly important because I actually linked to wrong issue. GAH!

Sam Albers · Answer 37 · Thu Dec 21 2017 01:12:46 GMT+0800 (China Standard Time)

@noamross @sckott Any ideas on why the rOpenSci review badge shows "Peer reviewed" as the badge but when rendered in the tidyhydat README still shows "Under Review'? Also renders locally displaying the "Peer Reviewed" badge and over at CRAN. Thoughts?

Karthik Ram · Answer 38 · Thu Dec 21 2017 01:21:02 GMT+0800 (China Standard Time)

@boshek I suspect that might have to do with GitHub caching? I've set the badges to expire immediately. I'll look into this and get back to you.

Karthik Ram · Answer 39 · Thu Dec 21 2017 01:21:41 GMT+0800 (China Standard Time)

Looking over your README, it does show as peer reviewed to me as well.

Sam Albers · Answer 40 · Thu Dec 21 2017 01:24:00 GMT+0800 (China Standard Time)

@karthik Yep you are right. Caching was the issue. Thanks.