rushmorem / publicsuffix

An implementation of Mozilla's Public Suffix List in Rust

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Trivial case does not parse correctly

dead10ck opened this issue · comments

I hit a case where cdn.fbsbx.com does not parse correctly:

let domain = dbg!(list.parse_domain("cdn.fbsbx.com")).unwrap();
assert_eq!(Some("com"), domain.suffix());

This outputs:

[src/main.rs:46] list.parse_domain("cdn.fbsbx.com") = Ok(
    Domain {
        full: "cdn.fbsbx.com",
        typ: None,
        suffix: None,
        registrable: None,
    },
)
thread 'main' panicked at 'assertion failed: `(left == right)`
  left: `Some("com")`,
 right: `None`', src/main.rs:47:5
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace.

The same thing happens with fbsbx.com, foo.cdn.fbsbx.com, foo.fbsbx.com, etc. The closest thing in the public suffix list I can find is apps.fbsbx.com, so I'm not sure what's happening.

I believe I see where the issue is:

In Domain::find_match, when looking for the suffix of the input domain, it will keep traversing the tree until it encounters a label that is not in the chain, and then check if the node it stopped at is a "leaf." If not, it assumes that there was no suffix in the input. This will lead to incorrect results in the case of any subdomain of fbsbx.com. For example, the chain for apps.fbsbx.com looks like:

com → fbsbx → apps

so when it receives cdn.fbsbx.com, it will first find com:

com   → fbsbx → apps
(com)

then fbsbx

com   → fbsbx       → apps
        (fbsbx.com)

Then it will stop at the fbsbx node, because the next label was not apps. Since fbsbx.com is not a suffix, it is not a "leaf."

I believe in order to be correct, you must track the longest suffix you've seen as you traverse the tree. In this case, after the first iteration, it should see that com is a suffix and remember that we encountered a valid suffix. When it stops traversing, instead of checking the node where we stopped in the traversal, we need to check if we encountered any valid suffix along the way.