Question - setting up html proofer to skip new pages added to a website which will return a 404

lwasser opened this issue 5 months ago · comments

hey there 👋 Happy early new year!

i've been trying to understand how to implement html-proofer so it ignores new files (that have new links that are not online yet). these files of course return a 404.

i found this in the readme file but i don't understand how i'd implement that approach in a github action such as this one where i'm calling the htmlproofer action

      - name: Check HTML using htmlproofer
        uses: chabad360/htmlproofer@master
        with:
          directory: "_site"
          arguments: |
            --ignore-urls "https://fonts.googleapis.com,https://fonts.gstatic.com,_site/_posts/README/index.html"
            --ignore-files "/.+\/_posts\/README.md/"
            --ignore-status-codes "0,403, 429, 503, 999"

can someone provide me with some guidance so that pr's with new pages don't result in red x's in CI?
do i need to create a vanilla ruby script to run in the workflow? or can i somehow use the chabad360 workflow action but add something custom?
Many thanks!!

Garen Torikian · Answer 1 · Wed Jan 03 2024 02:24:23 GMT+0800 (China Standard Time)

You have a couple of options here:

- If the new links point to known URLs, you can ignore the URLs directly: htmlproofer --ignore-urls "/www.github.com/,/foo.com/"
- If you are able to edit the HTML directly, adding a data-proofer-ignore attribute to any element skips all checks on it: <a href="https://notareallink" data-proofer-ignore>Not checked.</a>
- You can also explicitly exclude specific files from checks: htmlproofer --ignore-files "/dir_of_new_files/, new_file.html"

Would any of these work for you? The GitHub Action example just collects all the new files and passes them into --ignore-files; you can provide your own list or a directory if that's easier.
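
For anyone calling the Ruby API rather than the CLI, the first and third options above appear to map onto the ignore_urls and ignore_files options. A minimal sketch, assuming html-proofer 4 or later (where the options carry these names); proofer_options and check_site are hypothetical helper names, not part of the library:

```ruby
# Hypothetical helper (not part of html-proofer): builds an options hash
# for HTMLProofer.check_directory from plain lists.
def proofer_options(url_fragments: [], ignored_files: [])
  {
    # Each fragment becomes a regex, similar to "/www.github.com/" on the CLI,
    # except that special characters such as "." are escaped here.
    ignore_urls: url_fragments.map { |fragment| Regexp.new(Regexp.escape(fragment)) },
    # Strings or regexes naming files that should not be checked at all.
    ignore_files: ignored_files
  }
end

# Hypothetical wrapper: the gem is only loaded when a check is actually run.
def check_site(directory, options)
  require "html-proofer"
  HTMLProofer.check_directory(directory, options).run
end

options = proofer_options(
  url_fragments: ["www.github.com", "foo.com"],
  ignored_files: [%r{dir_of_new_files/}, "new_file.html"]
)
# check_site("_site", options)  # uncomment to run against a built site
```

The data-proofer-ignore route needs no configuration at all, since it lives in the HTML itself.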

Garen Torikian · Answer 2 · Wed Jan 03 2024 02:25:01 GMT+0800 (China Standard Time)

(Closing this not because I won't keep helping if you have questions, but because I like to keep a clean issue list in my repos.)

Leah Wasser · Answer 3 · Tue Jan 09 2024 09:48:22 GMT+0800 (China Standard Time)

hi 👋 thank you!! i definitely understand issue lists becoming unwieldy!
i think i may have poorly described the issue.

Essentially we are creating a new piece of content in the PR, so the new link points to a new page on the website. please see here: for an example (screenshot below as well). installable-code.html is a new page in our guidebook that we are adding, so every time we add a new page html proofer can't find it because it isn't online yet.

In the html-proofer readme, i see a section on ignoring new files.

the code is below and i think it's trying to parse through the files but skipping the newly added file (maybe)?:

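require "html-proofer"

# Source directories whose new files should be ignored
directories = ['content']
# Commit where this branch diverged from production
merge_base = %x(git merge-base origin/production HEAD).chomp
# Files added (A) or copied (C) since then, NUL-separated
diffable_files = %x(git diff -z --name-only --diff-filter=AC #{merge_base}).split("\0")
diffable_files = diffable_files.select do |filename|
  next true if directories.include?(File.dirname(filename))

  filename.end_with?(".md")
end.map { |f| Regexp.new(File.basename(f, File.extname(f))) } # content/new-page.md -> /new-page/

# Skip links whose URL matches the name of a new file
HTMLProofer.check_directory("./output", { ignore_urls: diffable_files }).run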

but i'm not sure how to implement this in my github action here.
And i'm also not sure: if i do implement that fix, will it also skip checking the new page itself for bad links and such?
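
To see what the README snippet would ignore, its filtering step can be pulled out into a function and fed a hand-written file list instead of git diff output. This is a sketch under that assumption; new_page_patterns and the example paths are hypothetical. As far as I can tell, ignore_urls only decides which link targets are skipped, so the new page itself would still be parsed and its other links checked:

```ruby
# Hypothetical refactoring of the README snippet's select/map step.
# Keeps files that sit directly in one of `directories` or end in ".md",
# then turns each base name (without extension) into an unanchored regex.
def new_page_patterns(filenames, directories: ["content"])
  filenames.select do |filename|
    next true if directories.include?(File.dirname(filename))

    filename.end_with?(".md")
  end.map { |f| Regexp.new(File.basename(f, File.extname(f))) }
end

# Stand-in for the `git diff --name-only` output (made-up paths).
changed = ["content/installable-code.md", "lib/helper.rb", "docs/guide/intro.md"]
patterns = new_page_patterns(changed)
puts patterns.map(&:source).inspect # prints ["installable-code", "intro"]

# A link to the not-yet-published page matches one of the patterns.
link = "https://example.org/guide/installable-code.html"
puts patterns.any? { |pattern| pattern.match?(link) } # prints true
```

Because the patterns are unanchored, a short file name such as intro would also cause any other URL containing that text to be skipped.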

i hope this makes more sense! Essentially each time i create a new website page, our build breaks because the new page is not yet online and as such it's a broken link according to HTML proofer (i think anyway that is what is happening). many thanks again!!
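
If the chabad360 action is kept, one possible route is to compute the --ignore-urls value in an earlier step and feed it into the action's arguments input. The sketch below only builds that string, in the comma-separated /regex/ form shown in Answer 1. ignore_urls_argument is a hypothetical helper, and how the value reaches the action (for example via a step output) is an assumption that is not shown here:

```ruby
# Entries already ignored in the workflow from the original question.
EXISTING_IGNORES = [
  "https://fonts.googleapis.com",
  "https://fonts.gstatic.com",
  "_site/_posts/README/index.html"
].freeze

# Hypothetical helper: append one escaped /regex/ entry per new file
# and join everything into a single comma-separated value.
def ignore_urls_argument(existing, new_files)
  page_patterns = new_files.map do |path|
    name = File.basename(path, File.extname(path))
    # Wrapped in slashes, following the regex form in the maintainer's example.
    "/#{Regexp.escape(name)}/"
  end
  (existing + page_patterns).join(",")
end

# "installable-code.md" stands in for a file reported by git diff.
argument = ignore_urls_argument(EXISTING_IGNORES, ["installable-code.md"])
puts %(--ignore-urls "#{argument}")
```

A file name containing a comma would break the comma-separated format, so this sketch assumes plain page names.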