alan-turing-institute / environmental-ds-book

A computational notebook community for open environmental data science 🌎

Home Page: https://edsbook.org

[NBI] River Ice Variables

Jacqui-123 opened this issue · comments

What is the notebook about?

This notebook would cover the exploration, tidying, and analysis of ice-cycle variables calculated from the open database of the Water Survey of Canada (https://wateroffice.ec.gc.ca/).

Climate change is causing shifts in the timing and volume of river flow, ice coverage, and ice break-up. Northern boreal rivers maintain thick ice coverage every winter, followed by a predictable spring break-up season called the freshet. This ice break-up is linked to key ecological processes: for example, it creates ice jams and floods, which replenish nearby wetlands with water and sustain rich habitat for aquatic life. In some cases, these flood events are the main source of water, and many species rely on these freshwater inputs to complete their life cycles and reproduce.

With climate change, it is unclear how the timing of freeze-up and spring break-up events is changing, and what this might mean for life downstream. Understanding historical ice cycles in northern rivers, and how they might be changing, can help scientists, managers, and communities adapt in the face of this change.

In this notebook, I will extract, tidy, explore, and clean open hydrological data, and calculate the onset and timing of the freshet, the freeze-up and break-up dates, and the length of continuous ice coverage each year. I will run a Mann-Kendall test, a non-parametric test that detects a monotonic trend (upwards or downwards) in the data. I will also visualize the data using line graphs.
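To make the trend test concrete, here is a minimal base-R sketch of the Mann-Kendall statistic. It is an illustration only, not the notebook's code: it assumes a series with no missing values and no ties, the function name `mann_kendall` is hypothetical, and the break-up dates are invented example values. The notebook itself is expected to rely on the `trend` package listed in the resources.

```r
# Mann-Kendall trend test (sketch): assumes no NA values and no tied values.
mann_kendall <- function(x) {
  n <- length(x)
  # d[i, j] = sign(x[j] - x[i]); summing the upper triangle covers all pairs i < j.
  d <- outer(x, x, function(a, b) sign(b - a))
  s <- sum(d[upper.tri(d)])
  # Variance of S under the null hypothesis of no trend (no-ties formula).
  var_s <- n * (n - 1) * (2 * n + 5) / 18
  # Continuity-corrected standard normal score.
  z <- if (s > 0) (s - 1) / sqrt(var_s) else if (s < 0) (s + 1) / sqrt(var_s) else 0
  list(S = s, z = z, p.value = 2 * pnorm(-abs(z)))
}

# Invented example: break-up day of year for ten consecutive years.
break_up_doy <- c(130, 128, 131, 127, 125, 126, 122, 123, 120, 118)
print(mann_kendall(break_up_doy))
```

A negative `S` with a small p-value would suggest that break-up is happening earlier over time; a positive `S` would suggest the opposite.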

Data is open-source, from the Water Survey of Canada: https://wateroffice.ec.gc.ca/
R package (tidyhydat) used to extract the data from the database: https://cran.r-project.org/web/packages/tidyhydat/index.html
R package (trend) used for change analysis: https://cran.r-project.org/web/packages/trend/index.html
Functions to calculate the ice-variables: functions created by author: https://github.com/Jacqui-123/EFlows-Project/blob/main/Eflows_FUNCTIONS.R
Other packages used: tidyverse (including ggplot2)
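As a rough illustration of what an ice-variable calculation might look like, the sketch below finds the longest run of consecutive ice-affected days in one winter of daily records. It is not the author's implementation from Eflows_FUNCTIONS.R. It assumes one row per day with no gaps, a `Date` column, and a `Symbol` column in which ice-affected days carry the HYDAT symbol "B"; the function name `longest_ice_run` and the example data are hypothetical.

```r
# Longest continuous ice period in a set of daily records (sketch).
# Assumptions: one row per day with no date gaps; ice-affected days have Symbol == "B".
longest_ice_run <- function(daily) {
  daily <- daily[order(daily$Date), ]
  is_ice <- !is.na(daily$Symbol) & daily$Symbol == "B"
  runs <- rle(is_ice)
  ends <- cumsum(runs$lengths)        # last row index of each run
  starts <- ends - runs$lengths + 1   # first row index of each run
  ice <- which(runs$values)           # runs that are ice-affected
  if (length(ice) == 0) return(NULL)
  k <- ice[which.max(runs$lengths[ice])]
  data.frame(
    freeze_up = daily$Date[starts[k]],  # first day of the longest ice run
    break_up = daily$Date[ends[k]],     # last day of the longest ice run
    ice_days = runs$lengths[k]
  )
}

# Invented example: a short early ice spell followed by the main winter ice period.
dates <- seq(as.Date("2000-10-01"), as.Date("2001-06-30"), by = "day")
symbol <- ifelse(dates >= as.Date("2000-11-10") & dates <= as.Date("2001-04-20"), "B", NA)
symbol[dates >= as.Date("2000-10-20") & dates <= as.Date("2000-10-22")] <- "B"
winter <- data.frame(Date = dates, Symbol = symbol)
print(longest_ice_run(winter))
```

Taking the longest run rather than the first ice day keeps a brief early freeze from being counted as the start of continuous coverage. Whether that matches the definitions used in the notebook is an open choice.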

Data Science Component

  • Exploration
  • Preprocessing
  • Modelling
  • Post-processing
  • Other (e.g. Reproducibility):

Submission type

  • Standard
  • Special Issue
  • Other (e.g. CI2023 Reproducibility Challenge):

Programming language

  • Python
  • R
  • Julia
  • Other:

Checklist:

  • Input data, pipeline and/or model are public with license/citation
  • The proposed notebook reuses existing codebase
  • The proposed notebook uses open-source packages
  • The proposed notebook is associated to existing publication(s)

Additional information

@Jacqui-123 thanks for submitting the notebook idea. The analysis of ice cycles is fundamental to a better characterization of the impacts of climate change across boreal ecosystems. The proposed notebook is within the scope of EDS book notebooks too.

The code snippets in R to compute ice variables are also very interesting. I strongly encourage you to use them in a first working version of the notebook. Reviewers will provide suggestions about the code as part of their constructive, collaborative feedback. Depending on their feedback and progress on the notebook, you can work on an additional (optional) section to validate the results of your scripts against any established R package, if relevant.

Finally, may I ask how you plan to fetch the input data from the Water Survey of Canada? What is the size of the datasets? Is there any particular R package to fetch dataframes from remote locations? For instance, most Python notebooks in EDS book use the pooch package (see here) to retrieve datasets from remote sources.

Thank you for the comment, @acocac. Yes, I will submit the functions I have written along with the notebook for feedback; that will be much appreciated.

As for fetching the input data, there is an R package called tidyhydat: https://cran.r-project.org/web/packages/tidyhydat/index.html. Users download the relatively small SQLite database to the same location as the R script, and then use the functions built into the tidyhydat package to fetch tidied data.
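A minimal sketch of that workflow is shown below. It assumes tidyhydat is installed and that the HYDAT database has already been downloaded once with `tidyhydat::download_hydat()`. The wrapper name `fetch_station_flows`, the station number, and the output file name are illustrative assumptions, not choices made in this thread.

```r
# Fetch daily flows for a handful of stations from a local HYDAT database (sketch).
# Assumption: tidyhydat::download_hydat() has been run beforehand.
fetch_station_flows <- function(stations, start_date, end_date) {
  if (!requireNamespace("tidyhydat", quietly = TRUE)) {
    stop("Package 'tidyhydat' is required: install.packages('tidyhydat')")
  }
  # Returns a tidy table that includes Date, Value, and Symbol columns.
  tidyhydat::hy_daily_flows(
    station_number = stations,
    start_date = start_date,
    end_date = end_date
  )
}

# Usage (not run here; the station number is only an example):
# flows <- fetch_station_flows("08MF005", "1990-01-01", "2020-12-31")
# write.csv(flows, "sample_flows.csv", row.names = FALSE)
```

Writing the result to a small CSV is one way to produce a sample dataset covering only the stations the notebook needs.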

However, for this notebook, it's not necessary to have all of the 400+ water stations available, and only a few will be enough. I could download a few example stations, but I'm not sure where or how to host this sample dataset. Do you have any ideas? It doesn't seem like pooch is available in R. I'm sorry that I don't have a simple answer.

Thank you for your feedback and questions!

> Thank you for the comment, @acocac. Yes, I will submit the functions I have written along with the notebook for feedback; that will be much appreciated.

Awesome. One of the main goals of EDS book is to improve skills in software writing, so I hope reviewers can provide feedback on the custom functions.

> As for fetching the input data, there is an R package called tidyhydat: https://cran.r-project.org/web/packages/tidyhydat/index.html. Users download the relatively small SQLite database to the same location as the R script, and then use the functions built into the tidyhydat package to fetch tidied data.

Great. tidyhydat looks like a very well-suited package for your analysis, given that your data comes from the Water Survey of Canada.

> However, for this notebook, it's not necessary to have all of the 400+ water stations available, and only a few will be enough. I could download a few example stations, but I'm not sure where or how to host this sample dataset. Do you have any ideas? It doesn't seem like pooch is available in R. I'm sorry that I don't have a simple answer.

It'd be great if you can sample on-the-fly using the tidyhydat package. Otherwise, I'd suggest double-checking the license of the dataset and seeing whether you can archive a sample dataset in Zenodo. We've done this for some previous EDS book notebooks, see for instance: https://zenodo.org/record/6567018. If you aren't familiar with Zenodo, here's a quick guide on how to upload datasets. Please note you can tag the Environmental Data Science book community in the communities section. This will make the input data of your notebook easier to discover for EDS book readers and the wider community.
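For the Zenodo route, a pooch-like fetch can be approximated in base R. The sketch below downloads a file only if it is not already cached and optionally verifies its MD5 checksum. The function name `fetch_cached` is hypothetical, and the URL in the usage comment is a placeholder rather than a real file location.

```r
# Download a remote file once and verify its checksum (sketch of a pooch-like helper).
fetch_cached <- function(url, dest, md5 = NULL) {
  # Skip the download if a local copy already exists.
  if (!file.exists(dest)) {
    utils::download.file(url, destfile = dest, mode = "wb", quiet = TRUE)
  }
  # Optionally compare the file's MD5 checksum with the expected value.
  if (!is.null(md5)) {
    actual <- unname(tools::md5sum(dest))
    if (!identical(actual, md5)) {
      stop("Checksum mismatch for ", dest, ": expected ", md5, ", got ", actual)
    }
  }
  dest
}

# Usage (not run; replace the placeholders with the archived record's file URL and checksum):
# path <- fetch_cached("https://zenodo.org/record/<record-id>/files/<file-name>",
#                      "sample_flows.csv", md5 = "<md5-from-zenodo>")
# flows <- read.csv(path)
```

Zenodo lists an MD5 checksum next to each archived file, which can be passed as `md5` to guard against corrupted or changed downloads.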

I hope the above info is helpful.

Thanks @acocac - much appreciated! I'll look into using Zenodo. I think that's probably fine given that the water data is open, but I'll check the license to be sure.

> But I'll check the license to be sure.

Yes please, this is a very healthy step. Let me know if you need any help creating the sample dataset. After that, I think it's OK to proceed to the next step of preparing the notebook repository (:

Great, looks like the water data is approved for upload to Zenodo, as long as there is a citation:
https://wateroffice.ec.gc.ca/contactus/faq_e.html#Q18

I have been busy but will work on this next week - thank you.

> Great, looks like the water data is approved for upload to Zenodo, as long as there is a citation: https://wateroffice.ec.gc.ca/contactus/faq_e.html#Q18

Glad to hear the data providers allow mirroring sample data in Zenodo (:

> I have been busy but will work on this next week - thank you.

No worries. Feel free to post updates here at your own pace.

Just an update: I have been working on this but have run into some challenges with running an R kernel in Jupyter and importing data from Zenodo using R. I will keep troubleshooting. Thank you :)

> Just an update: I have been working on this but have run into some challenges with running an R kernel in Jupyter and importing data from Zenodo using R. I will keep troubleshooting. Thank you :)

Thanks for the update! May I ask whether the issue occurs when using the R notebook repository template? If you are not using it yet, I encourage you to do so and share the link to the public repo here. That would make it easier to look at the kernel and Zenodo data issues together. Hope this helps (:

@Jacqui-123 I was wondering whether you have made any progress using the suggested R notebook repository template. Feel free to mention here any technical difficulties you run into. You are also welcome to open issues if the documentation on how to contribute isn't clear.

Hi @acocac, thank you for checking in. Work has been a little bit crazy, but thank you for following up on this. I don't think the issue is with the R notebook repository template itself, but rather it is my own inexperience in using Jupyter notebooks and Anaconda. I didn't know that it was necessary to install packages in an Anaconda environment and run an environment before opening the notebook, etc, so it has been a bit of a learning curve for me. But it's always good to learn new things. I have been able to run a Jupyter kernel in R, and to install most of the R packages that are needed. I will keep working on this, sorry again for the delay!

@Jacqui-123 thanks for the reply. No rush, please prioritize your work/research and get back to the notebook when it's more convenient for you. It seems it would be useful to add some instructions, or links to tutorials, on how to set up R packages in an Anaconda environment. I'll open an issue on this (: