Question - setting up html proofer to skip new pages added to a website which will return a 404
lwasser opened this issue
hey there! Happy early new year!
i've been trying to understand how to implement html-proofer so it ignores new files (that have new links that are not online yet). these files of course return a 404.
i found this in the readme file but i don't understand how i'd implement that approach in a github action such as this one where i'm calling the htmlproofer action
- name: Check HTML using htmlproofer
  uses: chabad360/htmlproofer@master
  with:
    directory: "_site"
    arguments: |
      --ignore-urls "https://fonts.googleapis.com,https://fonts.gstatic.com,_site/_posts/README/index.html"
      --ignore-files "/.+\/_posts\/README.md"
      --ignore-status-codes "0,403, 429, 503, 999"
can someone provide me with some guidance so that pr's with new pages don't result in red x's in CI?
do i need to create a vanilla ruby script to run in the workflow? or can i somehow use the chabad360 workflow action but add something custom?
Many thanks!!
You have a couple of options here:
- If the new links point to known URLs, you can ignore the URLs directly:
htmlproofer --ignore-urls "/www.github.com/,/foo.com/"
- If you are able to edit the HTML directly, adding a data-proofer-ignore attribute to any element ignores any checks:
<a href="https://notareallink" data-proofer-ignore>Not checked.</a>
- You can also explicitly ignore specific files from checks:
htmlproofer --ignore-files "/dir_of_new_files/, new_file.html"
Would any of these work for you? The GitHub Action example just collects all the new files and passes them into --ignore-files; you can provide your own list or a directory if that's easier.
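As a rough illustration of the "provide your own list" route, here is a minimal Ruby sketch that turns a list of newly added source files into a comma-separated string of regex patterns in the same /pattern/ form used in the examples above. The helper name ignore_argument and the file paths are hypothetical, and how you collect the list of new files (for example from git diff) is left out.

```ruby
# Hypothetical helper: build a comma-separated list of /regex/ patterns
# from the base names of newly added source files.
def ignore_argument(new_files)
  new_files
    .map { |path| File.basename(path, File.extname(path)) } # "docs/new-page.md" -> "new-page"
    .uniq                                                   # avoid duplicate patterns
    .map { |stem| "/#{Regexp.escape(stem)}/" }              # escape regex metacharacters
    .join(",")
end

# Example paths are made up for illustration.
puts ignore_argument(["docs/new-page.md", "docs/other_page.md"])
```

A workflow step could run a script like this and hand the resulting string to --ignore-urls or --ignore-files, depending on whether you want to skip links to the new pages or the new pages themselves.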
(Closing this not because I won't keep helping if you have questions, but because I like to keep a clean issue list in my repos.)
hi! thank you!! i definitely understand issue lists becoming unwieldy!
i think i may have poorly described the issue.
Essentially we are creating a new piece of content in the PR so the new link is a new page on the website. please see here: for an example (screenshot below as well). Essentially installable-code.html is a new page in our guidebook that we are adding. so every time we add a new page html proofer can't find it because it isn't online yet.
In the html-proofer readme, i see a section on ignoring new files.
the code is below and i think it's trying to parse through the files but skipping the newly added file (maybe)?:
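require "html-proofer"

# Source directories whose newly added files should be ignored.
directories = ['content']
# Commit where the current branch diverged from production.
merge_base = %x(git merge-base origin/production HEAD).chomp
# Files added (A) or copied (C) since then; -z separates names with NUL bytes.
diffable_files = %x(git diff -z --name-only --diff-filter=AC #{merge_base}).split("\0")
diffable_files = diffable_files.select do |filename|
  next true if directories.include?(File.dirname(filename))
  filename.end_with?(".md")
end.map { |f| Regexp.new(File.basename(f, File.extname(f))) } # one regex per new file's base name
HTMLProofer.check_directory("./output", { ignore_urls: diffable_files }).run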
but
- i'm not sure how to implement this in my github action here.
- And i'm also not sure if i do implement that fix will it totally skip checking the new page for bad links and such as well?
i hope this makes more sense! Essentially each time i create a new website page, our build breaks because the new page is not yet online and as such it's a broken link according to HTML proofer (i think anyway that is what is happening). many thanks again!!
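On the second question, my understanding is that ignore_urls only filters link targets: the new page itself is still scanned, and only links whose URL matches one of the patterns are skipped (ignore_files is the option that skips whole files). The sketch below mimics, in plain Ruby, how the patterns from the README snippet would split a list of links. The example.org URLs and the helper names patterns_for and partition_links are made up for illustration, and this does not call html-proofer itself.

```ruby
# Build patterns the same way the README snippet does: one unanchored
# regex per new file's base name (extension stripped).
def patterns_for(new_files)
  new_files.map { |f| Regexp.new(File.basename(f, File.extname(f))) }
end

# Split link targets into those an ignore list would skip and the rest.
def partition_links(urls, patterns)
  urls.partition { |url| patterns.any? { |re| re.match?(url) } }
end

patterns = patterns_for(["content/installable-code.md"])
skipped, checked = partition_links(
  ["https://example.org/guide/installable-code.html", # link to the not-yet-published page
   "https://example.org/guide/intro.html"],           # ordinary link, still checked
  patterns
)
puts "skipped: #{skipped.inspect}"
puts "checked: #{checked.inspect}"
```

One thing to watch for: because the patterns are unanchored, a new file with a very generic name (say index.md) would produce a pattern that matches many unrelated URLs, so those links would silently go unchecked too.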