voku / simple_html_dom

📜 Modern Simple HTML DOM Parser for PHP

Can't remove anchor tag from within p tag?

rgbaman opened this issue

What is this feature about (expected vs actual behaviour)?

I don't know if I've misunderstood how to use this or whether this is a bug.

Any additional information?

I'm trying to parse this HTML to display only the content in the p tag, stripping the a tags out completely:

<section class="section" id="section-47" data-period="#period1">
   <div class="content">
      <p>The quick brown fox
         <a class="noteRef commentary F" href="#c563672">F1</a>
         jumps over 
         <a class="noteRef commentary F" href="#c563672">F1</a> 
         the lazy
         <a class="noteRef commentary F" href="#c1844523">F5</a>
         dog.
      </p>
   </div>
</section>

This is what I have tried so far, following the examples and API docs, with no luck:

$html = HtmlDomParser::file_get_html($url); // Gets full HTML - above is just a snippet.

foreach ($html->find('section .content p') as $section) {
   foreach($section->find('p a') as $a) {
      $a->outertext = '';
   }
   $content[] = $section->save;
}

My result is blank. Following the example, I also tried adding parentheses to the save call, i.e. $content[] = $section->save(); but that throws an error:

BadMethodCallException
Method does not exist

I'm using Laravel.

$templateHtml = '
<div>
    <section class="section" id="section-46" data-period="#period1">
       <div class="content">
          <p>The quick brown fox
             <a class="noteRef commentary F" href="#c563672">F1</a>
             jumps over 
             <a class="noteRef commentary F" href="#c563672">F1</a> 
             the lazy
             <a class="noteRef commentary F" href="#c1844523">F5</a>
             dog.
          </p>
       </div>
    </section>
    <section class="section" id="section-47" data-period="#period1">
       <div class="content">
          <p>The quick brown bear
             <a class="noteRef commentary F" href="#c563672">F2</a>
             jumps over 
             <a class="noteRef commentary F" href="#c563672">F1</a> 
             the lazy
             <a class="noteRef commentary F" href="#c1844523">F6</a>
             cat.
          </p>
       </div>
    </section>
</div>
';

// remove: "<a>" from "<p>"
$htmlTmp = HtmlDomParser::str_get_html($templateHtml);
foreach ($htmlTmp->find('a.noteRef') as $a) {
    $a->outertext = '';
}

$templateHtml = $htmlTmp->save();

// dump contents
echo $templateHtml;

result:

<div>
    <section class="section" id="section-46" data-period="#period1">
       <div class="content">
          <p>The quick brown fox
             
             jumps over 
              
             the lazy
             
             dog.
          </p>
       </div>
    </section>
    <section class="section" id="section-47" data-period="#period1">
       <div class="content">
          <p>The quick brown bear
             
             jumps over 
              
             the lazy
             
             cat.
          </p>
       </div>
    </section>
</div>
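
If you want one cleaned string per paragraph (as the original loop tried to build in $content) rather than the whole document, something along these lines may work. This is only a sketch: it assumes Composer autoloading, and it assumes the plaintext property is readable on each matched element. The $sampleHtml variable and the whitespace collapsing are illustrative and not part of the library. Note that in this thread save() is only shown working on the parsed document, not on a single element, so the sketch reads a property from each element instead.

```php
<?php
// Assumption: the library was installed with Composer (voku/simple_html_dom).
require __DIR__ . '/vendor/autoload.php';

use voku\helper\HtmlDomParser;

// Abbreviated, illustrative version of the markup from the question.
$sampleHtml = '
<div>
    <section class="section" id="section-46">
       <div class="content">
          <p>The quick brown fox
             <a class="noteRef commentary F" href="#c563672">F1</a>
             jumps over the lazy
             <a class="noteRef commentary F" href="#c1844523">F5</a>
             dog.
          </p>
       </div>
    </section>
    <section class="section" id="section-47">
       <div class="content">
          <p>The quick brown bear
             <a class="noteRef commentary F" href="#c563672">F2</a>
             jumps over the lazy cat.
          </p>
       </div>
    </section>
</div>';

$dom = HtmlDomParser::str_get_html($sampleHtml);

// Step 1: remove the note anchors on the whole document, as in the answer above.
foreach ($dom->find('a.noteRef') as $a) {
    $a->outertext = '';
}

// Step 2: collect each paragraph's text and collapse the leftover
// line breaks and indentation into single spaces.
$content = [];
foreach ($dom->find('section .content p') as $p) {
    // Assumption: "plaintext" returns the element's text without tags.
    $content[] = trim((string) preg_replace('/\s+/', ' ', $p->plaintext));
}

print_r($content);
```

Removing the anchors first and querying the paragraphs afterwards avoids relying on a nested find() inside each paragraph, which is where the original attempt went wrong.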

Thank you for your help. Am I right in thinking you cannot use this on a remote URL without converting it to a string and storing it in a variable first?

HtmlDomParser::file_get_html($url)

should also work

Thank you very much for your help, I've now got it working with file_get_html($url).

I'm not sure if you have a support forum. I have another question, but I don't want to clog up the issue tracker. Is there anywhere I can post it?

Just use this issue tracker and write e.g. "Question:" in the subject.

Hint 💡: if you are looking for good examples, the test directory is always a good place to look.