voku / simple_html_dom

📜 Modern Simple HTML DOM Parser for PHP

"Not valid HTML fragment!" on default WordPress theme HTML

Inclushe opened this issue

What is this feature about (expected vs actual behaviour)?

I was using @patrickposner's simply-static WordPress plugin and came across a bug when generating static files through the plugin with the 'twentytwentytwo' theme. The related issue is here: Simply-Static/simply-static#27
@patrickposner believed the theme was not producing valid HTML according to the W3C validator, but static file generation works once you remove the <style id='wp-block-image-inline-css'>...</style> tag. The CSS inside that style tag is also valid according to the W3C CSS validator (you'll need to paste it in yourself).

How can I reproduce it?

I created this repository with the bare minimum code that reproduces this bug: https://github.com/Inclushe/voku-simple-html-dom-style-bug
test.html comes from https://twentytwentytwodemo.wordpress.com/

It should show this error when run:

Fatal error: Uncaught RuntimeException: Not valid HTML fragment!
.wp-block-image%7Bmargin%3A001em%7D.wp-block-imageimg%7Bvertical-align%3Abottom%7D.wp-block-image%3Anot%28.is-style-rounded%29%3Ea%2C.wp-block-image%3Anot%28.is-style-rounded%29img%7Bborder-radius%3Ainherit%7D.wp-block-image.aligncenter%7Btext-align%3Acenter%7D.wp-block-image.alignfullimg%2C.wp-block-image.alignwideimg%7Bheight%3Aauto%3Bwidth%3A100%25%7D.wp-block-image.aligncenter%2C.wp-block-image.alignleft%2C.wp-block-image.alignright%7Bdisplay%3Atable%7D.wp-block-image.aligncenter%3Efigcaption%2C.wp-block-image.alignleft%3Efigcaption%2C.wp-block-image.alignright%3Efigcaption%7Bcaption-side%3Abottom%3Bdisplay%3Atable-caption%7D.wp-block-image.alignleft%7Bfloat%3Aleft%3Bmargin%3A.5em1em.5em0%7D.wp-block-image.alignright%7Bfloat%3Aright%3Bmargin%3A.5em0.5em1em%7D.wp-block-image.aligncenter%7Bmargin-left%3Aauto%3Bmargin-right%3Aauto%7D.wp-block-imagefigcaption%7Bmargin-bottom%3A1em%3Bmargin-top%3A.5em%7D.wp-block-image.is-style-circle-maskimg%2C.wp-block-imag in C:\Users\ejw98\Projects\voku-simple-html-dom-style-bug\vendor\voku\simple_html_dom\src\voku\helper\SimpleHtmlDom.php on line 196

Removing the <style id='wp-block-image-inline-css'>...</style> tag from test.html and running the script again produces no errors.

PHP Version: 7.4.27

Does it take minutes, hours or days to fix?

No clue.

Any additional information?

Nope.

Thanks for the bug report. The problem is that PHP does not support HTML5 or SVG by default, so I added one more hack.

Almost!

Getting the same error in a different place:

[2022-02-25 04:25:40] Error: (1) Uncaught RuntimeException: Not valid HTML fragment! .wp-block-image%7Bmargin%3A001em%7D.wp-block-imageimg%7Bvertical-align%3Abottom%7D.wp-block-image%3Anot%28.is-style-rounded%29%3Ea%2C.wp-block-image%3Anot%28.is-style-rounded%29img%7Bborder-radius%3Ainherit%7D.wp-block-image.aligncenter%7Btext-align%3Acenter%7D.wp-block-image.alignfullimg%2C.wp-block-image.alignwideimg%7Bheight%3Aauto%3Bwidth%3A100%25%7D.wp-block-image.aligncenter%2C.wp-block-image.alignleft%2C.wp-block-image.alignright%7Bdisplay%3Atable%7D.wp-block-image.aligncenter%3Efigcaption%2C.wp-block-image.alignleft%3Efigcaption%2C.wp-block-image.alignright%3Efigcaption%7Bdisplay%3Atable-caption%3Bcaption-side%3Abottom%7D.wp-block-image.alignleft%7Bfloat%3Aleft%3Bmargin%3A.5em1em.5em0%7D.wp-block-image.alignright%7Bfloat%3Aright%3Bmargin%3A.5em0.5em1em%7D.wp-block-image.aligncenter%7Bmargin-left%3Aauto%3Bmargin-right%3Aauto%7D.wp-block-imagefigcaption%7Bmargin-top%3A.5em%3Bmargin-bottom%3A1em%7D.wp-block-image.is-style-circle-maskimg%2C.wp-block-image.is-style-roundedimg%7Bborder-radius%3A9999px%7D%40supports%28%28-webkit-mask-image%3Anone%29or%28mask-image%3Anone%29%29or%28-webkit-mask-image%3Anone%29%7B.wp-block-image.is-style-circle-maskimg%7B-webkit-mask-image%3Aurl%28%27data%3Aimage%2Fsvg+xml%3Butf8%2C____simple_html_dom__voku__broken_html____197036189%27%29%3Bmask-image%3Aurl%28%27data%3Aimage%2Fsvg+xml%3Butf8%2C____simple_html_dom__voku__broken_html____197036189%27%29%3Bmask-mode%3Aalpha%3B-webkit-mask-repeat%3Ano-repeat%3Bmask-repeat%3Ano-repeat%3B-webkit-mask-size%3Acontain%3Bmask-size%3Acontain%3B-webkit-mask-position%3Acenter%3Bmask-position%3Acenter%3Bborder-radius%3A0%7D%7D.wp-block-imagefigure%7Bmargin%3A0%7D.wp-block-imagefigcaption%7Bcolor%3A%23555%3Bfont-size%3A13px%3Btext-align%3Acenter%7D.is-dark-theme.wp-block-imagefigcaption%7Bcolor%3Ahsla%280%2C0%25%2C100%25%2C.65%29%7D 
.wp-block-image%7Bmargin%3A001em%7D.wp-block-imageimg%7Bvertical-align%3Abottom%7D.wp-block-image%3Anot%28.is-style-rounded%29%3Ea%2C.wp-block-image%3Anot%28.is-style-rounded%29img%7Bborder-radius%3Ainherit%7D.wp-block-image.aligncenter%7Btext-align%3Acenter%7D.wp-block-image.alignfullimg%2C.wp-block-image.alignwideimg%7Bheight%3Aauto%3Bwidth%3A100%25%7D.wp-block-image.aligncenter%2C.wp-block-image.alignleft%2C.wp-block-image.alignright%7Bdisplay%3Atable%7D.wp-block-image.aligncenter%3Efigcaption%2C.wp-block-image.alignleft%3Efigcaption%2C.wp-block-image.alignright%3Efigcaption%7Bdisplay%3Atable-caption%3Bcaption-side%3Abottom%7D.wp-block-image.alignleft%7Bfloat%3Aleft%3Bmargin%3A.5em1em.5em0%7D.wp-block-image.alignright%7Bfloat%3Aright%3Bmargin%3A.5em0.5em1em%7D.wp-block-image.aligncenter%7Bmargin-left%3Aauto%3Bmargin-right%3Aauto%7D.wp-block-imagefigcaption%7Bmargin-top%3A.5em%3Bmargin-bottom%3A1em%7D.wp-block-image.is-style-circle-maskimg%2C.wp-block-image.is-style-roundedimg%7Bborder-radius%3A9999px%7D%40supports%28%28-webkit-mask-image%3Anone%29or%28mask-image%3Anone%29%29or%28-webkit-mask-image%3Anone%29%7B.wp-block-image.is-style-circle-maskimg%7B-webkit-mask-image%3Aurl%28%27data%3Aimage%2Fsvg+xml%3Butf8%2C%3Csvgviewbox%3D%2200100100%22xmlns%3D%22http%3A%2F%2Fwww.w3.org%2F2000%2Fsvg%22%3E%3Ccirclecx%3D%2250%22cy%3D%2250%22r%3D%2250%22%3E%3C%2Fsvg%3E%27%29%3Bmask-image%3Aurl%28%27data%3Aimage%2Fsvg+xml%3Butf8%2C%3Csvgviewbox%3D%2200100100%22xmlns%3D%22http%3A%2F%2Fwww.w3.org%2F2000%2Fsvg%22%3E%3Ccirclecx%3D%2250%22cy%3D%2250%22r%3D%2250%22%3E%3C%2Fsvg%3E%27%29%3Bmask-mode%3Aalpha%3B-webkit-mask-repeat%3Ano-repeat%3Bmask-repeat%3Ano-repeat%3B-webkit-mask-size%3Acontain%3Bmask-size%3Acontain%3B-webkit-mask-position%3Acenter%3Bmask-position%3Acenter%3Bborder-radius%3A0%7D%7D.wp-block-imagefigure%7Bmargin%3A0%7D.wp-block-imagefigcaption%7Bcolor%3A%23555%3Bfont-size%3A13px%3Btext-align%3Acenter%7D.is-dark-theme.wp-block-imagefigcaption%7Bcolor%3Ahsla%280%2C0%25%2C
100%25%2C.65%29%7D in C:\MAMP\htdocs\fresh-install-with-composer-update\wp-content\plugins\simply-static\vendor\voku\simple_html_dom\src\voku\helper\SimpleHtmlDom.php:199 Stack trace: #0 C:\MAMP\htdocs\fresh-install-with-composer-update\wp-content\plugins\simply-static\vendor\voku\simple_html_dom\src\voku\helper\AbstractSimpleHtmlDom.php(160): voku\helper\SimpleHtmlDom->replaceChildWithString('.wp-block-image...', false) #1 C:\MAMP\htdocs\fresh-install-with-composer-update\wp-content\plugins\simply-static\src\class-ss-url-extractor.php(289): voku\helper\AbstractSimpleHtmlDom->__set('innerhtmlkeep', '.wp-block-image...') #2 C:\MAMP\htdocs\fresh-install-with-composer-update\wp-content\plugins\simply-static\src\class-ss-url-extractor.php(157): Simply_Static\Url_Extractor->extract_and_replace_urls_in_html() #3 C:\MAMP\htdocs\fresh-install-with-composer-update\wp-content\plugins\simply-static\src\tasks\class-ss-fetch-urls-task.php(129): Simply_Static\Url_Extractor->extract_and_update_urls() #4 C:\MAMP\htdocs\fresh-install-with-composer-update\wp-content\plugins\simply-static\src\tasks\class-ss-fetch-urls-task.php(97): Simply_Static\Fetch_Urls_Task->handle_200_response(Object(Simply_Static\Page), true, true) #5 C:\MAMP\htdocs\fresh-install-with-composer-update\wp-content\plugins\simply-static\src\class-ss-archive-creation-job.php(122): Simply_Static\Fetch_Urls_Task->perform() #6 C:\MAMP\htdocs\fresh-install-with-composer-update\wp-content\plugins\simply-static\vendor\a5hleyrich\wp-background-processing\classes\wp-background-process.php(301): Simply_Static\Archive_Creation_Job->task('fetch_urls') #7 C:\MAMP\htdocs\fresh-install-with-composer-update\wp-content\plugins\simply-static\vendor\a5hleyrich\wp-background-processing\classes\wp-background-process.php(175): WP_Background_Process->handle() #8 C:\MAMP\htdocs\fresh-install-with-composer-update\wp-includes\class-wp-hook.php(307): WP_Background_Process->maybe_handle('') #9 
C:\MAMP\htdocs\fresh-install-with-composer-update\wp-includes\class-wp-hook.php(331): WP_Hook->apply_filters('', Array) #10 C:\MAMP\htdocs\fresh-install-with-composer-update\wp-includes\plugin.php(474): WP_Hook->do_action(Array) #11 C:\MAMP\htdocs\fresh-install-with-composer-update\wp-admin\admin-ajax.php(202): do_action('wp_ajax_nopriv_...') #12 {main} thrown in C:\MAMP\htdocs\fresh-install-with-composer-update\wp-content\plugins\simply-static\vendor\voku\simple_html_dom\src\voku\helper\SimpleHtmlDom.php on line 199

Commenting out the check here "fixes" it, but of course it needs some tooling: https://github.com/voku/simple_html_dom/blob/3636fe85b0bdc96fc397a4718942455b86d8f986/src/voku/helper/SimpleHtmlDom.php#L198L204

And just to clarify, does innerhtml also use the hack?

Never mind! I just set up a pull request: Simply-Static/simply-static#28

And just to clarify, does innerhtml also use the hack?

innerhtml also uses the hack, but if we replace some tags in the middle of the process, we need to keep those changes via innerhtmlKeep so that we can still parse the document.
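
For anyone landing here later, this is roughly what the difference looks like at the call site. It is only a sketch, assuming voku/simple_html_dom is installed through Composer; the sample HTML, the `demo-inline-css` id, and the colour swap are made up for illustration and are not taken from the plugin. Per the stack trace above, assigning to innerhtmlKeep ends up in replaceChildWithString() with `false` as the second argument.

```php
<?php
// Sketch only: assumes `composer require voku/simple_html_dom` has been run
// and that this file sits next to the generated vendor/ directory.
require __DIR__ . '/vendor/autoload.php';

use voku\helper\HtmlDomParser;

// Hypothetical input, much smaller than the real theme markup.
$html = '<html><head><style id="demo-inline-css">.a{color:red}</style></head>'
      . '<body><p class="a">Hello</p></body></html>';

$dom = HtmlDomParser::str_get_html($html);

foreach ($dom->find('style') as $style) {
    // Read the current CSS, change it, and write it back.
    // The rewrite step here (red -> blue) is purely illustrative.
    $css = str_replace('red', 'blue', $style->innerhtml);

    // innerhtmlKeep instead of innerhtml/innertext, so that replacements
    // made earlier in the process are kept when the content is written back.
    $style->innerhtmlKeep = $css;
}

$result = $dom->html();
echo $result;
```

If the style content is only read and never written back, plain innerhtml should be enough; the Keep variant matters for the write.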

PS: Do you still see the "Not valid HTML fragment!" error? If so, can you give me an example? Thanks.

I had forgotten to replace all instances of innertext with innerhtmlKeep in the plugin code; doing that fixed the problem. The bug fix is already merged. Thanks for the assistance ✌️