clipperhouse / jargon

Tokenizers and lemmatizers for Go

Home Page: https://clipperhouse.com/jargon/

Detect HTML for command line

clipperhouse opened this issue

The jargon command should use the HTML tokenizer (instead of the plain-text tokenizer) in two situations:

  • Fetching via the -u flag, where Content-Type starts with text/html
  • Reading a file via the -f flag, and the file extension is .html or .htm
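
The two checks above could be sketched roughly as follows. This is only an illustration: the helper names isHTMLContentType and isHTMLFile are hypothetical and not taken from the jargon source, the case-insensitive matching is an assumption, and the actual change in the commit may look different.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// isHTMLContentType is a hypothetical helper. It reports whether a
// Content-Type header value starts with "text/html", so values that
// carry parameters such as "text/html; charset=utf-8" also match.
// Ignoring case and surrounding whitespace is an assumption.
func isHTMLContentType(contentType string) bool {
	ct := strings.ToLower(strings.TrimSpace(contentType))
	return strings.HasPrefix(ct, "text/html")
}

// isHTMLFile is a hypothetical helper. It reports whether the file
// name ends in .html or .htm. Ignoring case is an assumption.
func isHTMLFile(name string) bool {
	ext := strings.ToLower(filepath.Ext(name))
	return ext == ".html" || ext == ".htm"
}

func main() {
	// A response fetched via -u would be checked by its Content-Type header.
	fmt.Println(isHTMLContentType("text/html; charset=utf-8"))
	fmt.Println(isHTMLContentType("text/plain"))

	// A file read via -f would be checked by its extension.
	fmt.Println(isHTMLFile("index.htm"))
	fmt.Println(isHTMLFile("notes.txt"))
}
```

A caller would pick the HTML tokenizer when either check returns true and fall back to the plain-text tokenizer otherwise.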

Done, see 8daf4fb