jacobbien / litr-project

Writing R Packages with Literate Programming

Home Page: https://jacobbien.github.io/litr-project/

Set up github actions

jacobbien opened this issue · comments

Make it so that when a user pushes create-pkg.Rmd (and any accompanying source files), it gets litr-knitted, generating both create-pkg.html and pkg/.

I've been working on a github action that should be run on the push of a new change to the create-pkg.Rmd file. I wanted to run the structure by you and check whether I am missing anything. This version of the workflow adds an additional commit with the re-built create-pkg.html and pkg/ directory.

This workflow is derived from the render-rmarkdown workflow from r-lib. The first part specifies that we only want it to run when a push changes a top-level create-*.Rmd file or anything under source-files/.

# Workflow derived from https://github.com/r-lib/actions/tree/v2/examples
# Need help debugging build failures? Start at https://github.com/r-lib/actions#where-to-find-help
on:
  push:
    paths: ['create-*.Rmd', 'source-files/**']

name: render-litr

This second part sets up R and pandoc for knitting the file. The most important change here is that we install litr after installing knitr and rmarkdown. We might not need to install knitr and rmarkdown first since they are dependencies for litr.

jobs:
  render-litr:
    runs-on: ubuntu-latest
    env:
      GITHUB_PAT: ${{ secrets.GITHUB_TOKEN }}
    steps:
      - name: Checkout repo
        uses: actions/checkout@v3
        with:
          fetch-depth: 0

      - uses: r-lib/actions/setup-pandoc@v2

      - uses: r-lib/actions/setup-r@v2

      - uses: r-lib/actions/setup-renv@v2
      
      - name: Install litr and dependencies
        run: |
          Rscript -e 'install.packages(c("knitr", "rmarkdown"))'
          Rscript -e 'remotes::install_github("jacobbien/litr-project@*release", subdir = "litr")'

This last part is the meat of the workflow. I've kept in the debugging printing since I still need to test this on a repo. TOP_LEVEL_RMD uses find to find the create-pkg.Rmd file and we allow for different capitalizations of Rmd just to be safe. RMD_FILENAME strips the file extension to get create-pkg and PKG_NAME pulls out pkg from create-pkg.

RMD_PATH looks for our top-level Rmd file among the files changed by the push (via git diff).

We then call litr::render on that file and commit the new versions of the HTML output and the package directory.

      - name: Render Rmarkdown files and Commit Results
        run: |
          TOP_LEVEL_RMD=($(find . -maxdepth 1 -type f  -regex '.*create-.*\.[Rr][Mm][Dd]' -exec basename {} \;))
          echo "$TOP_LEVEL_RMD"
          RMD_FILENAME=${TOP_LEVEL_RMD/.*/}
          PKG_NAME=$(echo "$RMD_FILENAME" | rev | cut -d- -f1 | rev)
          echo "$PKG_NAME"
          RMD_PATH=($(git diff --name-only ${{ github.event.before }} ${{ github.sha }} | grep "$TOP_LEVEL_RMD"))
          echo "$RMD_PATH"
          Rscript -e 'for (f in commandArgs(TRUE)) if (file.exists(f)) litr::render(f)' ${RMD_PATH[*]}
          git config --local user.name "$GITHUB_ACTOR"
          git config --local user.email "$GITHUB_ACTOR@users.noreply.github.com"
          git commit "${RMD_FILENAME}.html" "${PKG_NAME}/"  -m 'Re-build Litr file and package' || echo "No changes to commit"
          git push origin || echo "No changes to commit"
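
As a standalone illustration of the name handling above, here is a minimal sketch that uses plain shell parameter expansion instead of `rev`/`cut`. The function names `rmd_stem` and `pkg_name_from_rmd` are made up for this example, and it assumes the file follows the `create-<pkg>.Rmd` naming pattern.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of litr or the workflow above).

# Strip only the final extension: "create-pkg.Rmd" -> "create-pkg"
rmd_stem() {
  local file="$1"
  printf '%s\n' "${file%.*}"
}

# Keep everything after the last hyphen: "create-pkg" -> "pkg"
# (same result as the rev | cut -d- -f1 | rev pipeline)
pkg_name_from_rmd() {
  local stem
  stem="$(rmd_stem "$1")"
  printf '%s\n' "${stem##*-}"
}

pkg_name_from_rmd "create-pkg.Rmd"   # prints: pkg
```

One difference worth noting: `${file%.*}` removes only the last extension, whereas `${TOP_LEVEL_RMD/.*/}` cuts at the first dot, so the two would disagree for a package name that itself contains a dot.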

This only handles the case of a single .Rmd file and we will likely need to create a different workflow for packages that use bookdown.

Next step is to try this out on a minimal example and iron out any syntactical issues that come up.

This looks great! To litr-knit a bookdown, one runs litr::render("index.Rmd"), so potentially the only difference would be that instead of looking for create-pkg.Rmd we would look for index.Rmd?
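
A rough sketch of that selection logic, in case it helps: prefer `index.Rmd` when it exists, otherwise fall back to a `create-*.Rmd` file. The helper name `find_entry_rmd` is hypothetical, and the sketch assumes exact `.Rmd` capitalization and at most one `create-*.Rmd` at the top level.

```shell
#!/usr/bin/env bash
# Hypothetical helper: pick the file to pass to litr::render().
# Assumes a bookdown project has index.Rmd at the top level and a
# single-file project has one create-*.Rmd there.
find_entry_rmd() {
  local dir="${1:-.}"
  local candidate
  if [ -f "$dir/index.Rmd" ]; then
    printf '%s\n' "index.Rmd"
    return 0
  fi
  for candidate in "$dir"/create-*.Rmd; do
    # If the glob matched nothing, the literal pattern fails the -f test
    if [ -f "$candidate" ]; then
      printf '%s\n' "${candidate##*/}"
      return 0
    fi
  done
  return 1   # nothing to render
}
```

The function returns a non-zero status when neither kind of file is present, so a workflow step could skip rendering in that case.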

Good point about the change needed for bookdown. I've created a proof of concept github actions repo here: https://github.com/patrickvossler18/litr_test

The biggest issue right now is figuring out how to cache R packages properly to decrease the run time. It takes ~20 minutes to install litr and all of its dependencies. This might not be a big issue if we don't mind a delay between the push of the new Rmd file and the addition of the output file and the package directory.

Thanks @patrickvossler18 ! Oh yeah, having to wait 20 minutes would not be good. In terms of speeding this up, I guess you've seen this or equivalent?

Great that caching is working! Current times appear to be at most about 2 minutes.

Additional thoughts:

  • If anything in source-files/ or any top-level .Rmd files are modified, then this should trigger rendering. Should we have a .gitignore that ignores all files other than these?
  • Should outputs of the Github action be put onto main branch or onto a separate branch? (Is there a standard convention for where downstream outputs go?)
  • Currently only the .html and the package itself are pushed, but I think we want all files that were created to be pushed. (I'm thinking about when a _book/ and a docs/ with a pkgdown site are created.)
  • It'd be great to write a vignette for litr that describes how to setup Github actions with litr.

Another thought:

  • Should devtools::test() be run on the package after it is created?

* [ ]  If anything in `source-files/` or any top-level .Rmd files are modified, then this should trigger rendering.  Should we have a .gitignore that ignores all files other than these?

Hmm this might be a good idea. Git allows for branch-specific .gitignore files so we can have a .gitignore.main file and a general .gitignore file for the build_output branch. The .gitignore.main would look something like:

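*
!*.Rmd
!source-files/
!source-files/**

where the * tells git to ignore everything and each ! pattern re-includes the matching files. The !source-files/ line is needed because git cannot re-include a file whose parent directory is still excluded.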

* [ ]  Should outputs of the Github action be put onto main branch or onto a separate branch?  (Is there a standard convention for where downstream outputs go?)

I haven't been able to find other repos where they are trying to store the outputs of an action in the repo. Typically action outputs (test results, etc.) are stored as artifacts that can be downloaded but aren't version-controlled.

I’m trying out a setup where the built package is pushed to a build_output branch, so if we wanted, the main branch would only contain the Rmd file and the workflow YAML file, as shown here: https://github.com/patrickvossler18/litr_test/tree/main (we could keep other folders and files needed for building litr on the main branch, like source-files/, etc.). The drawback of this approach is that I need to force push to the build_output branch, but there might be a better way to do this? EDIT: also implemented this with our bookdown test example here: https://github.com/patrickvossler18/litr_bookdown_test/tree/main

* [ ]  Currently only the .html and the package itself are pushed, but I think we want all files that were created to be pushed.  (I'm thinking about when a `_book/` and a `docs/` with a pkgdown site are created.)

I've generalized the script to handle the _book/ case, but I will need to think about how to handle the possibility of _book/ or docs/.

* [ ]  Should `devtools::test()` be run on the package after it is created?

I added in running devtools::test. Do we want it to stop on failure?
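
On the stop-on-failure question, one option is sketched below. It assumes `devtools::test()` is called with `stop_on_failure = TRUE`, so that failing tests raise an R error and `Rscript` exits with a non-zero status. The wrapper name `run_pkg_tests` and the `RSCRIPT_BIN` override are hypothetical; the override is only there so the wrapper can be exercised without R installed.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around devtools::test() for a workflow step.
# RSCRIPT_BIN is an assumption of this sketch; it defaults to Rscript.
run_pkg_tests() {
  local pkg_dir="$1"
  local rscript="${RSCRIPT_BIN:-Rscript}"
  # With stop_on_failure = TRUE a failing test becomes an R error,
  # so Rscript exits non-zero and this function returns that status.
  "$rscript" -e 'devtools::test(commandArgs(TRUE)[1], stop_on_failure = TRUE)' "$pkg_dir"
}

# In a workflow step this would be called as:
#   run_pkg_tests "$PKG_NAME"
# A non-zero status then fails the step, so nothing gets committed.
```

If we would rather still push the rendered output when tests fail, the call could be followed by `|| echo "Tests failed"` instead, at the cost of a green check on a broken package.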

This is great!! I tried it out on my fork and it mostly worked but then I got an error at this step:

Run git config --local user.name "$GITHUB_ACTOR"
  
Switched to a new branch 'build_output'
fatal: the requested upstream branch 'origin/build_output' does not exist
hint: 
hint: If you are planning on basing your work on an upstream
hint: branch that already exists at the remote, you may need to
hint: run "git fetch" to retrieve it.
hint: 
hint: If you are planning to push out a new local branch that
hint: will track its remote counterpart, you may want to use
hint: "git push -u" to set the upstream config as you push.
hint: Disable this message with "git config advice.setUpstreamFailure false"
Error: Process completed with exit code 128.

Ugh, so close! Here is a fixed version of the action file:

# Workflow derived from https://github.com/r-lib/actions/tree/v2/examples
# Need help debugging build failures? Start at https://github.com/r-lib/actions#where-to-find-help
on:
  push:
    # this should run on any changes to *.Rmd files and any file in source-files
    paths: ['*.Rmd', 'source-files/**']

name: render-litr

jobs:
  render-litr:
    runs-on: ubuntu-latest
    env:
      GITHUB_PAT: ${{ secrets.GITHUB_TOKEN }}
    steps:
      - name: Checkout repo
        uses: actions/checkout@v3
        with:
          fetch-depth: 0

      - uses: r-lib/actions/setup-pandoc@v2

      - uses: r-lib/actions/setup-r@v2

      - name: Install Ubuntu dependencies to avoid installation errors
        run: sudo apt-get install libcurl4-openssl-dev libharfbuzz-dev libfribidi-dev
      
      - uses: r-lib/actions/setup-renv@v2
        with: 
          cache-version: 2

      - name: Install litr
        run: renv::install("jacobbien/litr-project:litr@*release")
        shell: Rscript {0}

      - name: Set environment variables
        run: |
          IS_BOOKDOWN=false
          # find doesn't allow capture groups, so we have to run separate searches
          SEARCH_FOR_INDEX=($(find . -maxdepth 1 -type f  -regex '.*index\.[Rr][Mm][Dd]' -exec basename {} \;))
          
          # if SEARCH_FOR_INDEX is not an empty string
          if [[ -n  "$SEARCH_FOR_INDEX" ]]; then
            # if the file name is "index" then we need to look for the package name elsewhere
            # we will look for the *-package.R file and get the package name
            IS_BOOKDOWN=true
            TOP_LEVEL_RMD=$(echo "$SEARCH_FOR_INDEX")
          else
            TOP_LEVEL_RMD=($(find . -maxdepth 1 -type f  -regex '.*create-.*\.[Rr][Mm][Dd]' -exec basename {} \;))
            RMD_FILENAME=${TOP_LEVEL_RMD/.*/}
            PKG_NAME=$(echo "$RMD_FILENAME" | rev | cut -d- -f1 | rev)  
          fi
          
          RMD_PATH=($(git diff --name-only ${{ github.event.before }} ${{ github.sha }} | grep "$TOP_LEVEL_RMD"))

          # add variables to GH env to access them in the next step
          echo "RMD_PATH=$RMD_PATH" >> $GITHUB_ENV
          echo "RMD_FILENAME=$RMD_FILENAME" >> $GITHUB_ENV
          echo "PKG_NAME=$PKG_NAME" >> $GITHUB_ENV
          echo "TOP_LEVEL_RMD=$TOP_LEVEL_RMD" >> $GITHUB_ENV
          echo "IS_BOOKDOWN=$IS_BOOKDOWN" >> $GITHUB_ENV

      - name: Render Rmarkdown files and Generate Package 
        run: |
          Rscript -e 'for (f in commandArgs(TRUE)) if (file.exists(f)) litr::render(f)' ${RMD_PATH[*]}
          
      - name: Run tests on the knitted package 
        run: |
          # For bookdown, get the package name after everything has been rendered
          if [[ $IS_BOOKDOWN = true ]]; then
            RMD_FILENAME=($(find . -type f -regex '.*-package\.R' -exec basename {} \;))
            PKG_NAME=$(echo "$RMD_FILENAME" | cut -d- -f1 )
            echo "PKG_NAME=$PKG_NAME" >> $GITHUB_ENV
          fi
          Rscript -e 'args = commandArgs(TRUE); pkg_path = paste0(args[1], "/"); devtools::test(pkg_path)' ${PKG_NAME}

      - name: Commit Results
        run: |
          git config --local user.name "$GITHUB_ACTOR"
          git config --local user.email "$GITHUB_ACTOR@users.noreply.github.com"
          git config --local push.autoSetupRemote true
          git fetch --all
          git checkout -B build_output
          if [[ $IS_BOOKDOWN = true ]]; then
            git add ${PKG_NAME}/ _book/ 
          else
            # TODO: check that this won't add the .Rmd file
            git add ${RMD_FILENAME}.* ${PKG_NAME}/
          fi
          git commit -m 'Re-build Litr file and package' || echo "No changes to commit"
          git push -fu origin build_output || echo "No changes to commit"
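
On the TODO in the commit step: the glob `${RMD_FILENAME}.*` does match the `.Rmd` source as well as the rendered outputs. Running `git add` on an unmodified tracked file is a no-op, so this is probably harmless, but the outputs could also be selected explicitly. A minimal sketch, using a hypothetical helper named `rendered_outputs`:

```shell
#!/usr/bin/env bash
# Hypothetical helper: list files named <stem>.* in a directory,
# skipping the R Markdown source (any capitalization of .Rmd).
rendered_outputs() {
  local dir="$1" stem="$2" f
  for f in "$dir/$stem".*; do
    [ -e "$f" ] || continue            # glob matched nothing
    case "${f##*.}" in
      [Rr][Mm][Dd]) continue ;;        # skip the source file
    esac
    printf '%s\n' "${f##*/}"
  done
}
```

For a directory containing `create-pkg.Rmd` and `create-pkg.html`, calling `rendered_outputs . create-pkg` lists only `create-pkg.html`, and that list could then be handed to `git add`.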

I set up a clean repo at https://github.com/patrickvossler18/litr_bookdown_test_clean/ and got everything to work properly, so hopefully this fixed version will work for you too.