voku / simple_html_dom

📜 Modern Simple HTML DOM Parser for PHP

Parsing a JSON variable in <script> leaves $domReplaceHelper values

ruud-altenburg opened this issue

What is this feature about (expected vs actual behaviour)?

I'm parsing a <script> variable that contains JSON. The characters listed in $domReplaceHelper are apparently replaced with placeholders when the page is parsed, but they are not restored when the data is returned.

How can I reproduce it?

See example.txt for a real-world script plus my (slightly crude) code to extract the data.

Does it take minutes, hours or days to fix?

I suppose minutes.

Any additional information?

I forgot to add that the example returns "RBR Holt 00626 SIMPLE_HTML_DOM__VOKU__AMP RBR Holt 00732".

I added a test case for your problem here: 2e65479#diff-f9e35e3ee28495a595a36e0f7a4ae154R1454

The main problem here is that we use a special internal encoding to preserve the input encoding, but that internal encoding then has to be decoded again via HtmlDomParser->fixHtmlOutput().
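To illustrate the round trip described above, here is a minimal, self-contained sketch in plain PHP. It does not use the library's real code: `PLACEHOLDERS`, `encodeInternal()` and `decodeInternal()` are hypothetical names, the token text is copied from the output reported in this issue, and the mapping of that token to `&` is an assumption based on its name. The sample JSON likewise assumes the original string contained `&`.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-in for the library's internal replacement table.
// The token text comes from the output reported in this issue; mapping it
// to '&' is an assumption, not something confirmed by the thread.
const PLACEHOLDERS = [
    '&' => 'SIMPLE_HTML_DOM__VOKU__AMP',
];

/** Swap protected characters for placeholder tokens (done before parsing). */
function encodeInternal(string $html): string
{
    return str_replace(array_keys(PLACEHOLDERS), array_values(PLACEHOLDERS), $html);
}

/** Restore the original characters in text that is read back out. */
function decodeInternal(string $text): string
{
    return str_replace(array_values(PLACEHOLDERS), array_keys(PLACEHOLDERS), $text);
}

// Assumed original content of the <script> variable.
$json = '{"title":"RBR Holt 00626 & RBR Holt 00732"}';

// What the parser holds internally: '&' has become the placeholder token.
$stored = encodeInternal($json);

// Skipping this decode step is what leaves the token in the returned data.
$data = json_decode(decodeInternal($stored), true);

echo $data['title'], PHP_EOL;
```

Until the decoding is applied on the library side, a caller who sees a leftover token could run a similar replacement over the returned string before passing it to `json_decode()`.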