microformats / tests

Microformats test suite

Home Page: http://microformats.org/wiki/microformats2

add meta charset=utf-8 to tests

kylewm opened this issue

mf2py is having trouble handling test cases with "Tantek Çelik" (e.g. microformats-v2/h-event/attendees.html). It guesses windows-1252 (or ISO-8859-2 with chardet installed), but the test case outputs are written in utf-8.
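To illustrate the mismatch described above, here is a minimal sketch in plain Python (it does not use mf2py or chardet). It assumes the fixture bytes are UTF-8, as stated, and shows what a parser gets if it decodes them with a wrongly guessed windows-1252 codec.

```python
# The name from the test case, encoded the way the test files are saved.
raw = "Tantek Çelik".encode("utf-8")

# Decoding with the correct codec round-trips cleanly.
as_utf8 = raw.decode("utf-8")

# Decoding with a wrongly guessed codec does not raise an error;
# it silently turns the two bytes of "Ç" into two unrelated characters.
as_cp1252 = raw.decode("windows-1252")

print(as_utf8)
print(as_cp1252)
```

Because the wrong decode succeeds without an exception, the failure only shows up later, when the parsed output is compared against the expected UTF-8 JSON.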

One possible fix would be to add <meta charset="utf-8"> to the top of (affected? all?) tests. Would that be acceptable?
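As a rough sketch of why the meta tag would help, the snippet below looks for a charset declaration near the top of the raw bytes before decoding. The helper names `sniff_meta_charset` and `decode_html` are hypothetical, the regular expression is a simplification rather than the full HTML encoding-sniffing algorithm, and the windows-1252 fallback is an assumption chosen to mirror the guess described above. It is not how mf2py works internally.

```python
import codecs
import re
from typing import Optional

# Simplified pattern: matches <meta charset="..."> and the older
# <meta http-equiv="Content-Type" content="...; charset=..."> form.
_META_CHARSET = re.compile(
    rb"<meta[^>]+charset\s*=\s*[\"']?([A-Za-z0-9_\-]+)", re.IGNORECASE
)


def sniff_meta_charset(raw: bytes, limit: int = 1024) -> Optional[str]:
    """Return the declared charset found in the first `limit` bytes, if any."""
    match = _META_CHARSET.search(raw[:limit])
    if match is None:
        return None
    name = match.group(1).decode("ascii")
    try:
        codecs.lookup(name)  # ignore declarations Python cannot decode
    except LookupError:
        return None
    return name


def decode_html(raw: bytes, fallback: str = "windows-1252") -> str:
    """Decode with the declared charset, else with the assumed fallback."""
    return raw.decode(sniff_meta_charset(raw) or fallback, errors="replace")


with_meta = b'<meta charset="utf-8"><span class="p-name">Tantek \xc3\x87elik</span>'
without_meta = b'<span class="p-name">Tantek \xc3\x87elik</span>'

print(decode_html(with_meta))
print(decode_html(without_meta))
```

With the declaration present, the fragment carries its own encoding and no guessing is needed; without it, the result depends entirely on the fallback.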

I don't think this is Python-specific, but if it somehow is, I can look for another workaround.

This is a little bit like the <base href="http://example.org"/> issue, where page-level context is missing for the parser because we are testing only a fragment of HTML.

People seemed unhappy about <base href="http://example.org"/> being added to every test, and I think the same will be true of <meta charset="utf-8">.

If there are just a couple of affected tests, do you want to add the meta tag to the top of the HTML and create a pull request? The HTML parsing tools I am currently using do not have this issue, so it is hard for me to spot tests that need to be changed. If a lot of tests are affected, maybe we should look at other solutions.
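One way to estimate how many tests are affected is to scan for HTML files that contain non-ASCII bytes but no charset declaration. The sketch below is a rough heuristic, not part of the test suite: the function name `find_affected_tests` is hypothetical, and it assumes the tests are `.html` files under a single directory and that any declaration appears within the first 1024 bytes.

```python
from pathlib import Path
from typing import List


def find_affected_tests(root: str) -> List[Path]:
    """List .html files with non-ASCII bytes and no charset declaration."""
    affected = []
    for path in sorted(Path(root).rglob("*.html")):
        raw = path.read_bytes()
        # Pure-ASCII files decode identically under utf-8 and windows-1252.
        has_non_ascii = any(byte > 0x7F for byte in raw)
        # Crude check: any mention of "charset" near the top of the file.
        declares_charset = b"charset" in raw[:1024].lower()
        if has_non_ascii and not declares_charset:
            affected.append(path)
    return affected
```

Calling `find_affected_tests(".")` from the repository root would return the candidate files; each still needs a manual check, since the heuristic cannot tell whether a parser actually mis-decodes it.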

I have often had issues with HTML parsers being a bit of a black box and getting this wrong. It happened a lot in the C# microformats parser I wrote. The Node.js HTML parsers are pretty good at this stuff.