RedHatProductSecurity / advisory-parser

A library for parsing security advisories

MySQL advisory parser not parsing full URL

tausif-rh opened this issue

The MySQL advisory parser does not pull the full URL from the advisory for the external reference. It pulls "/security-alerts/cpujan2020.html", while it should pull the full "https://www.oracle.com/security-alerts/cpujan2020.html".

This happens because the advisory parser attempts to extract the URL of the main advisory / CPU page from the "Advisory" link on the "verbose" / Text Form page. That link is no longer a full URL including the host name. The minimal fix is to prefix that path with https://www.oracle.com.
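A minimal sketch of that prefixing fix, assuming the extracted link is available as a plain string. The names `ORACLE_BASE` and `absolutize` are hypothetical and are not taken from the advisory-parser code base; only `urllib.parse.urljoin` from the standard library is used.

```python
from urllib.parse import urljoin

# Assumed host for Oracle CPU pages (taken from the URLs quoted in this issue).
ORACLE_BASE = "https://www.oracle.com"


def absolutize(href, base=ORACLE_BASE):
    """Return an absolute URL for a link found on the verbose page.

    Host-relative paths get the base prepended; links that are already
    absolute are returned unchanged.
    """
    return urljoin(base, href)


print(absolutize("/security-alerts/cpujan2020.html"))
```

Using `urljoin` rather than plain string concatenation keeps the helper safe if the link ever becomes a full URL again.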

However, I'm rather considering redoing the URL juggling in the MySQL parser. The current approach of extracting the CPU page link from the "verbose" page was required because the URLs of those pages used to contain some sort of page id that wasn't predictable. So it wasn't possible to derive the CPU URL from the verbose page URL, or the other way round, without fetching the page and finding the right link there.

However, those numeric ids are gone now, and it's easy to figure out one URL from the other. Example:

https://www.oracle.com/security-alerts/cpuapr2020.html
https://www.oracle.com/security-alerts/cpuapr2020verbose.html

So my idea is to make the parser accept either of the two URLs (and no longer require the verbose URL), strip the trailing .html or verbose.html (possibly with a trailing #AppendixMSQL or #MSQL or #anything), and then append either verbose.html for the URL of the page to fetch, or .html#AppendixMSQL for advisory_url.
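The idea above could be sketched roughly as follows. This is only an illustration of the described normalization, not the change that was eventually merged; the function name `derive_urls` and the regular expression are assumptions made for the example.

```python
import re

# Matches a trailing ".html" or "verbose.html", optionally followed by
# any "#fragment" (e.g. "#AppendixMSQL" or "#MSQL").
_SUFFIX_RE = re.compile(r"(?:verbose)?\.html(?:#.*)?$")


def derive_urls(url):
    """Return (url_to_fetch, advisory_url) for either form of a CPU URL.

    Accepts the main CPU page URL or the verbose page URL, with or
    without a fragment. (Hypothetical helper, for illustration only.)
    """
    base, count = _SUFFIX_RE.subn("", url)
    if count != 1:
        raise ValueError("Unexpected CPU URL format: {}".format(url))
    url_to_fetch = base + "verbose.html"
    advisory_url = base + ".html#AppendixMSQL"
    return url_to_fetch, advisory_url


for example in (
    "https://www.oracle.com/security-alerts/cpuapr2020.html",
    "https://www.oracle.com/security-alerts/cpuapr2020verbose.html#MSQL",
):
    print(derive_urls(example))
```

Both example inputs yield the same pair: the verbose page as the URL to fetch and the main page with the #AppendixMSQL fragment as advisory_url.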

I believe this was addressed in #16 so we can close this issue.