create-report-index fails for URLs without a trailing slash
thibaudcolas opened this issue
Some of the logic in create-report-index does not support URLs without a trailing slash. I spotted this while testing another site, and it doesn’t affect any of the URLs currently being monitored on consumerfinance.gov, so feel free to discard this issue if irrelevant.
Current behavior
If you test a URL like /news/news.php?feature=7790, the Lighthouse auditing works as expected, but create-report-index will fail with:
2021-01-04T15:01:15.332Z error: TypeError: Cannot read property '1' of null
TypeError: Cannot read property '1' of null
at /cfgov-lighthouse/scripts/lib/reports.js:81:25
at Array.map (<anonymous>)
at processManifestRuns (/cfgov-lighthouse/scripts/lib/reports.js:76:15)
at reducer (/cfgov-lighthouse/scripts/create-reports-index.js:59:27)
at async /cfgov-lighthouse/scripts/create-reports-index.js:68:19
(see #21 for the separate issue of error reporting).
Expected behavior
No crash for all valid URLs. Again, this works as expected for all URLs currently tested in this repository – it only fails when adding new URLs that do not have the trailing slash.
Steps to replicate behavior (include URLs)
- Run an audit on a URL that does not have a trailing slash, for example https://www.jpl.nasa.gov/news/news.php?feature=7790. This will create a report filename of www_jpl_nasa_gov-_news_news_php-2021_01_04_14_58_51.report.json.
- Run create-report-index
Looking at the code, I can see the issue comes from logic that extracts the report’s slug and date from the filename:
cfgov-lighthouse/scripts/lib/reports.js
Lines 75 to 81 in 0df4053
The regex assumes slugs end with _-. It won’t if there is no trailing slash (www_jpl_nasa_gov-_news_news_php-2021_01_04_14_58_51.report.json).
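To make the failure mode concrete, here is a minimal sketch that reproduces the same kind of crash. The regex and the `extractSlugAndDate` helper are hypothetical stand-ins, not the actual code from scripts/lib/reports.js, and the trailing-slash filename is an assumed example of what a currently monitored URL might produce.

```javascript
// Hypothetical pattern (NOT the real one from reports.js): it requires the
// slug to end with "_-" immediately before the timestamp.
const SLUG_AND_DATE = /^(.+)_-(\d{4}_\d{2}_\d{2}_\d{2}_\d{2}_\d{2})\.report\.json$/;

// Assumed filename for a URL WITH a trailing slash.
const withSlash =
  'www_consumerfinance_gov-_about_us_-2021_01_04_14_58_51.report.json';
// Filename from this issue, for a URL WITHOUT a trailing slash.
const withoutSlash =
  'www_jpl_nasa_gov-_news_news_php-2021_01_04_14_58_51.report.json';

function extractSlugAndDate(filename) {
  const match = filename.match(SLUG_AND_DATE);
  // When the pattern does not match, `match` is null and the next line throws.
  return { slug: match[1], date: match[2] };
}

const parsed = extractSlugAndDate(withSlash);
console.log(parsed.slug); // www_consumerfinance_gov-_about_us
console.log(parsed.date); // 2021_01_04_14_58_51

let failure = null;
try {
  extractSlugAndDate(withoutSlash);
} catch (error) {
  failure = error;
}
console.log(failure instanceof TypeError); // true
```

The second filename never contains `_-` in front of the timestamp, so `match()` returns null and indexing into it throws the TypeError shown in the stack trace above.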
For my case I decided to fix this by changing the report filename pattern so there is a more predictable separator. There might be other viable approaches. Here is the relevant part of my lighthouserc.js:
module.exports = {
  ci: {
    /* […] */
    upload: {
      target: 'filesystem',
      outputDir: path.join(REPORTS_ROOT, timestamp),
      reportFilenamePattern:
        '%%HOSTNAME%%-%%PATHNAME%%___%%DATETIME%%.report.%%EXTENSION%%',
    },
  },
}
You’d then need to update the corresponding logic to match that ___ separator.
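As a rough sketch of what that updated logic could look like, the function below splits a filename on the last `___`. The name `parseReportFilename` and the sample filenames are assumptions made for illustration; the filenames are what I would expect the pattern above to produce, not output copied from a real run.

```javascript
const REPORT_SEPARATOR = '___';
const REPORT_SUFFIX = '.report.json';

// Hypothetical helper: returns { slug, date }, or null when the filename
// does not follow the "<slug>___<datetime>.report.json" layout.
function parseReportFilename(filename) {
  if (!filename.endsWith(REPORT_SUFFIX)) {
    return null;
  }
  const stem = filename.slice(0, -REPORT_SUFFIX.length);
  // Use the LAST separator: a slug that ends with "_" (trailing-slash URL)
  // yields four underscores in a row, and the timestamp itself only ever
  // contains single underscores.
  const index = stem.lastIndexOf(REPORT_SEPARATOR);
  if (index === -1) {
    return null;
  }
  return {
    slug: stem.slice(0, index),
    date: stem.slice(index + REPORT_SEPARATOR.length),
  };
}

// Assumed filenames under the new pattern, without and with a trailing slash.
const noSlash = parseReportFilename(
  'www_jpl_nasa_gov-_news_news_php___2021_01_04_14_58_51.report.json'
);
const trailingSlash = parseReportFilename(
  'www_consumerfinance_gov-_about_us____2021_01_04_14_58_51.report.json'
);
console.log(noSlash.slug); // www_jpl_nasa_gov-_news_news_php
console.log(trailingSlash.slug); // www_consumerfinance_gov-_about_us_
console.log(parseReportFilename('unrelated.txt')); // null
```

Returning null for unexpected filenames, rather than indexing into a failed match, also avoids the crash when a stray file ends up in the reports directory.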
In addition to this trailing-slash issue, I think there is also a problem with the "form factor" logic for URLs that already contain a query string. Just like the above, this isn’t an issue with URLs currently tested in the repository – I only stumbled upon this while testing another site / reviewing the code.
The code that generates URLs to test by Lighthouse correctly handles the query string and generates the appropriate URL, since it uses the URL interface rather than processing URLs as strings. Here is the generated URL from the Lighthouse logs:
Running Lighthouse 3 time(s) on https://www.jpl.nasa.gov/news/news.php?feature=7790&mobile=1
You can see that this doesn’t contain the ? expected by the report index code:
cfgov-lighthouse/scripts/lib/reports.js
Lines 86 to 88 in 0df4053
I imagine it would work to use the URL interface here as well to remove the query parameter / check for its presence regardless of where it is in the query string.
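A minimal sketch of that idea, assuming the form-factor marker is the `mobile` query parameter seen in the logged URL above. The `splitFormFactor` name and its return shape are hypothetical, not taken from the repository.

```javascript
// Parameter name taken from the logged URL (…&mobile=1); treating it as the
// only form-factor marker is an assumption.
const FORM_FACTOR_PARAM = 'mobile';

// Hypothetical helper: reports whether the tested URL carries the marker and
// returns the URL with the marker removed, wherever it sits in the query.
function splitFormFactor(testedUrl) {
  const url = new URL(testedUrl);
  const isMobile = url.searchParams.has(FORM_FACTOR_PARAM);
  url.searchParams.delete(FORM_FACTOR_PARAM);
  return { url: url.toString(), isMobile };
}

const audited = splitFormFactor(
  'https://www.jpl.nasa.gov/news/news.php?feature=7790&mobile=1'
);
console.log(audited.url); // https://www.jpl.nasa.gov/news/news.php?feature=7790
console.log(audited.isMobile); // true

const desktop = splitFormFactor(
  'https://www.jpl.nasa.gov/news/news.php?feature=7790'
);
console.log(desktop.isMobile); // false
```

Because `searchParams` works on parsed parameters, the check behaves the same whether the marker follows `?` or `&`, which is the case the string-based check misses.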
Thanks for reporting these @thibaudcolas! We really appreciate it and it's nice to see others trying to use this code. I'll give your suggestions a shot and open some PRs to address.
Lovely, let me know if further details would help.
@thibaudcolas I've opened #24 to address the first issue you mention -- would this solution work for you?
Indeed, based on the code alone, this looks like it would do just as well, and is much simpler!
#24 above has been merged, addressing the title of this issue.
@thibaudcolas I've opened #25 to address the second issue you discovered about query strings in tested URLs. Please give it a try if you get a chance!