ebassi / xdg-mime-rs

Rust crate for querying the shared-mime-info database

HTML files without doctype detected as plain text, despite extension

chmln opened this issue

Hi @ebassi

Here's a reproduction of the bug.

test.html: (incorrect: mime guess is text/plain despite html extension)

<p>test</p>

test_doctype.html (correct: mime guess is text/html)

<!DOCTYPE html>
asdf

Thanks for your patience; I'll have a look as soon as I can.

If I had to venture a guess, I'd say that test.html, with some XML in it, ends up matching some content rule, and thus the extension gets ignored.

In the meantime, you can always start from a purely file-name-based guess, and use the content-based one only if that guess is uncertain.
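A minimal sketch of that workaround, for illustration only. The `Guess` struct and `guess_name_first` function are hypothetical stand-ins written for this example, not the crate's API; in real code the two guesses would come from whatever name-based and content-based lookups your MIME library offers.

```rust
/// Hypothetical stand-in for a guess result; not the crate's own type.
#[derive(Debug, Clone, PartialEq)]
struct Guess {
    mime: String,
    uncertain: bool,
}

/// Start from the name-based guess. The content-based guess is computed
/// lazily, and only when the name-based guess is flagged as uncertain.
fn guess_name_first<F>(name_guess: Guess, content_guess: F) -> Guess
where
    F: FnOnce() -> Option<Guess>,
{
    if !name_guess.uncertain {
        return name_guess;
    }
    // Fall back to the name-based guess if the content yields nothing.
    content_guess().unwrap_or(name_guess)
}

fn main() {
    // test.html: the extension gives a certain text/html guess,
    // so the text/plain content guess is never consulted.
    let by_name = Guess { mime: "text/html".to_string(), uncertain: false };
    let result = guess_name_first(by_name, || {
        Some(Guess { mime: "text/plain".to_string(), uncertain: false })
    });
    println!("{}", result.mime);
}
```

Passing the content guess as a closure means the file does not have to be read at all when the name alone is enough.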

Is there any plan to address this issue, @ebassi?

Not really; as I said: an HTML file is not defined to be some plain text file with XML markup thrown in. If you pass <p>foo</p>, then the extension takes lower precedence than some other rule that looks into the file contents.

The appropriate algorithm if you have a file name is:

  • check if the extension has a high confidence match
  • if you have some data, check if there's a high confidence match
  • if the two matches disagree, you will need to figure something out in your code—like presenting a choice of applications to the user
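The steps above could be sketched roughly as follows. The `Resolution` enum and `resolve` function are hypothetical names invented for this example. Each `Option` argument is assumed to be `Some` only when the corresponding lookup produced a high confidence match; how "high confidence" is determined is left to the caller.

```rust
/// Hypothetical outcome of comparing the two guesses.
#[derive(Debug, PartialEq)]
enum Resolution {
    /// Both lookups produced the same MIME type.
    Agreed(String),
    /// Only one lookup produced a high confidence match.
    Single(String),
    /// The lookups disagree; the caller has to decide, for example
    /// by presenting a choice of applications to the user.
    Conflict { by_name: String, by_content: String },
    /// Neither lookup produced a high confidence match.
    Unknown,
}

fn resolve(by_name: Option<&str>, by_content: Option<&str>) -> Resolution {
    match (by_name, by_content) {
        (Some(n), Some(c)) if n == c => Resolution::Agreed(n.to_string()),
        (Some(n), Some(c)) => Resolution::Conflict {
            by_name: n.to_string(),
            by_content: c.to_string(),
        },
        (Some(m), None) | (None, Some(m)) => Resolution::Single(m.to_string()),
        (None, None) => Resolution::Unknown,
    }
}

fn main() {
    // The case from this issue: extension says HTML, content says plain text.
    println!("{:?}", resolve(Some("text/html"), Some("text/plain")));
}
```

Returning the conflict as data, rather than silently picking a winner, leaves the policy decision to the calling code.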

File names and extensions lie all the time: there's no way to rely on something just because of what it says it is.