voku / simple_html_dom

📜 Modern Simple HTML DOM Parser for PHP

Selector support for DOMText is missing

heldchen opened this issue

What is this feature about (expected vs actual behaviour)?

In the original simple_html_dom, the text keyword can be used in a CSS selector to get the DOMText element back (i.e. the equivalent of XPath's text()). In voku/simple_html_dom this unfortunately fails: the same selector returns an empty string.

How can I reproduce it?

$html = '<div> foo <br /> bar </div>';

$dom = (new voku\helper\HtmlDomParser())->loadHtml($html);
var_dump($dom->find('div text', 0)->plaintext);

$dom = (new simple_html_dom())->load($html);
var_dump($dom->find('div text', 0)->plaintext);

output:

string '' (length=0)
string ' foo ' (length=5)

Does it take minutes, hours or days to fix?

hours

Any additional information?

There already seems to be some support for selecting the text node, just not in combination with a CSS selector:

$html = '<div> foo <br /> bar </div>';

$dom = (new voku\helper\HtmlDomParser())->loadHtml($html);
var_dump($dom->find('div', 0)->find('text', 0)->plaintext);

$dom = (new simple_html_dom())->load($html);
var_dump($dom->find('div', 0)->find('text', 0)->plaintext);

output:

string 'foo' (length=3)
string ' foo ' (length=5)

I think CSS selector support could be added by checking whether the last token in the CSS selector is text. If it is, strip it from the selector, then apply the existing //text() XPath replacement to the resulting node set. Having text appear anywhere else in the selector does not make much sense. It does require a bit of logic though, since the CSS selector could be using multiple targets (i.e. ->find('div text, span text', 0)), so each comma-separated part has to be checked on its own.
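
A rough sketch of the splitting step described above, in plain PHP with no library calls. The function name splitTextSelectors and the returned array shape are made up for illustration and are not part of voku/simple_html_dom. The comma split is deliberately naive: it assumes no commas inside attribute values or inside pseudo-classes such as :not(a, b).

```php
<?php

/**
 * Hypothetical helper (not part of voku/simple_html_dom): split a selector
 * list on commas and flag every part whose last token is "text".
 *
 * Assumption: commas only separate selector groups. A real implementation
 * would need a proper tokenizer for cases like [title="a,b"] or :not(a, b).
 *
 * @return array<int, array{selector: string, text: bool}>
 */
function splitTextSelectors(string $selector): array
{
    $result = [];

    foreach (explode(',', $selector) as $part) {
        $part = trim($part);
        if ($part === '') {
            continue; // skip empty groups such as a trailing comma
        }

        // Tokenize on whitespace; only a trailing "text" token is special.
        $tokens = preg_split('/\s+/', $part);
        $wantsText = end($tokens) === 'text';

        if ($wantsText) {
            array_pop($tokens); // strip "text" so the rest stays plain CSS
        }

        $result[] = [
            // An empty selector means "text" was used on its own.
            'selector' => implode(' ', $tokens),
            'text'     => $wantsText,
        ];
    }

    return $result;
}

var_dump(splitTextSelectors('div text, span text'));
```

Each returned part could then be run through the normal CSS lookup, with the text node lookup applied afterwards only to the parts flagged with text set to true.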

That said, the current ->find('text') implementation unfortunately behaves a bit oddly: it trims the whitespace, which more often than not is an important part of the result when explicitly looking for text nodes.
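
For comparison, here is a minimal sketch that reads the text nodes with PHP's built-in DOM extension (DOMDocument and DOMXPath) instead of either parser. It only shows what a raw text() query returns for the same markup and is not how voku/simple_html_dom implements find('text'). As far as I can tell, the node values come back untrimmed, so any trimming would be a choice made by the caller.

```php
<?php

$html = '<div> foo <br /> bar </div>';

// Plain ext-dom, no third-party parser involved.
$doc = new DOMDocument();
$doc->loadHTML($html);

$xpath = new DOMXPath($doc);

// Collect the raw value of every text node below the div, without trimming.
$texts = [];
foreach ($xpath->query('//div//text()') as $node) {
    $texts[] = $node->nodeValue;
}

var_dump($texts);
```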