ext-mbstring should be required
alecpl opened this issue
Aleksander Machniak commented
Here's why:
- Masterminds\HTML5\Parser\CharacterReference::lookupDecimal() uses mb_decode_numericentity() unconditionally.
- Looking at Masterminds\HTML5\Parser\UTF8Utils::convertToUTF8(), either iconv or mbstring must be available (if the input encoding is not 'auto').
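
To illustrate the first point, here is a minimal sketch of a decimal character reference lookup built on mb_decode_numericentity(). The function name decodeDecimalEntity() and the conversion map are assumptions made for this example, not the library's actual code. Because nothing guards the call, it fails with an undefined-function error when ext-mbstring is not loaded.

```php
<?php
// Hypothetical helper for illustration only; not the html5-php implementation.
function decodeDecimalEntity(int $codePoint): string
{
    // Build a numeric entity such as "&#233;" from the code point.
    $entity = '&#' . $codePoint . ';';

    // Conversion map: [start, end, offset, mask], covering all of Unicode.
    $map = [0x0, 0x10FFFF, 0, 0x1FFFFF];

    // This call requires ext-mbstring; there is no fallback path.
    return mb_decode_numericentity($entity, $map, 'UTF-8');
}
```

For example, decodeDecimalEntity(233) yields the UTF-8 encoding of "é".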
Requiring mbstring would allow us to:
- Get rid of iconv() use. In my experience mbstring is really a better solution.
- Remove use of utf8_decode() which is not really valid and not needed when mbstring is available.
- Get rid of the fallback code.
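
The simplification above could look roughly like the following sketch, which relies on mbstring alone. The function name convertToUtf8Sketch() and the handling of 'auto' are assumptions for illustration; the real UTF8Utils::convertToUTF8() may differ in how it detects encodings and treats invalid bytes.

```php
<?php
// Hypothetical mbstring-only conversion; not the library's actual method.
function convertToUtf8Sketch(string $data, string $encoding = 'UTF-8'): string
{
    if ($encoding === 'auto') {
        // Assumption: fall back to UTF-8 when strict detection fails.
        $detected = mb_detect_encoding($data, mb_detect_order(), true);
        $encoding = $detected !== false ? $detected : 'UTF-8';
    }

    // A single code path: no iconv(), no utf8_decode(), no fallback branch.
    return mb_convert_encoding($data, 'UTF-8', $encoding);
}
```

With the extension declared in the require section of composer.json (for example "ext-mbstring": "*"), Composer would refuse to install the package on a PHP build that lacks mbstring, so a function like this would not need a runtime check.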
Aleksander Machniak commented
Actually, utf8_decode() is deprecated as of PHP 8.2 and will be removed in a later version. So this is more of a bug now.
Asmir Mustafic commented
Sorry for the late reply. What you are suggesting makes sense. I would be happy to see a PR.