oalders / html-restrict

HTML::Restrict - Strip away unwanted HTML tags

Breaks with formatting like a > li

youradds opened this issue:

For some reason this breaks with:

    <a title='Contact' href='contact.html'>
       <li>Contact</li>
    </a>

I've tried:

    use HTML::Restrict;
    my %rules = (
        a       => [qw( href li )],
        b       => [],
        caption => [],
        center  => [],
        em      => [],
        i       => [],
        #img     => [qw( alt border height width src style )],
        li      => [],
        ol      => [],
        p       => [],
        span    => [],
        strong  => [],
        sub     => [],
        sup     => [],
        table   => [qw()],
        tbody   => [],
        td      => [],
        tr      => [],
        u       => [],
        ul      => [],
        title   => [],
        head    => [],
        div     => [],
        meta    => [qw(name content property)],
        html    => [],
        iframe  => [qw(src)]
    );

    my $hr = HTML::Restrict->new( rules => \%rules, uri_schemes => [ 'undef', 'http', 'https', 'tel', 'mailto', 'li' ] );

For some reason, it always outputs:

    <a>
    <li>Contact</li>
    </a>

Why would it do that?

Thanks

Andy

youradds commented:

You know what, scrap that. I just realised how BAD their HTML is!

<a><li>xxx</li></a> isn't even valid HTML!!!!

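oalders commented:

    #!/usr/bin/env perl

    use strict;
    use warnings;
    use feature qw( say );

    use HTML::Restrict;

    # Allow title and href on <a>, and bare <li> tags.
    my %rules = (
        a  => [qw( href title )],
        li => [],
    );

    # A real undef (no quotes) permits URIs with no scheme, e.g. relative links.
    my $hr = HTML::Restrict->new(
        rules       => \%rules,
        uri_schemes => [ undef, 'http', 'https', 'tel', 'mailto' ]
    );

    my $content = q[
                   <a title='Contact' href='contact.html'>
                      <li>Contact</li>
                   </a>
                   ];

    # Print the sanitized markup.
    say $hr->process($content);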

Output:

    <a href="contact.html" title="Contact">
                      <li>Contact</li>
                   </a>
    

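A few things:

• We need to add title to the a tag rules. href was already there, but title was not, so it was being stripped.
• In uri_schemes, undef needs to have the quotes removed; otherwise it's interpreted as a string with the literal content undef. A real undef is what permits URIs with no scheme, such as the relative contact.html, which is why href was being stripped too.
• li is not an attribute of a (it's a tag), so listing it in the a rules has no effect on the outcome and can be removed.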
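To make those points concrete, here is a toy sketch in plain Perl. It is not HTML::Restrict's actual implementation: the helpers scheme_of, scheme_allowed and filter_attrs are hypothetical and only mimic the behaviour described in the bullets above. An attribute survives only if it is listed in the tag's rule, and an href survives only if its scheme is whitelisted, where an undef entry stands for "no scheme" (a relative URL).

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use feature qw( say );

# Hypothetical illustration only: NOT HTML::Restrict's real code.

# Return the URI scheme, or undef for a relative URL like 'contact.html'.
sub scheme_of {
    my ($url) = @_;
    return $url =~ /^([a-zA-Z][a-zA-Z0-9+.\-]*):/ ? lc $1 : undef;
}

# True if the URL's scheme (or lack of one) is in the whitelist.
sub scheme_allowed {
    my ( $url, @schemes ) = @_;
    my $scheme = scheme_of($url);
    for my $allowed (@schemes) {
        if ( !defined $scheme ) {
            return 1 if !defined $allowed;    # only a real undef matches "no scheme"
        }
        elsif ( defined $allowed && $allowed eq $scheme ) {
            return 1;
        }
    }
    return 0;
}

# Keep only whitelisted attributes; drop href if its scheme is not allowed.
sub filter_attrs {
    my ( $attrs, $allowed_attrs, $schemes ) = @_;
    my %kept;
    for my $name (@$allowed_attrs) {
        next unless exists $attrs->{$name};    # 'li' never matches an attribute
        next if $name eq 'href' && !scheme_allowed( $attrs->{$name}, @$schemes );
        $kept{$name} = $attrs->{$name};
    }
    return \%kept;
}

my %attrs = ( title => 'Contact', href => 'contact.html' );

# Original config: 'li' listed, title missing, 'undef' quoted.
my $before = filter_attrs( \%attrs, [qw( href li )], [ 'undef', 'http', 'https' ] );

# Fixed config: title listed, real undef allows relative URLs.
my $after = filter_attrs( \%attrs, [qw( href title )], [ undef, 'http', 'https' ] );

say join( ',', sort keys %$before ) || '(none)';    # prints "(none)"
say join( ',', sort keys %$after );                 # prints "href,title"
```

With the original config nothing survives: li never matches an attribute, title is not listed, and the string 'undef' never matches the missing scheme of contact.html, which lines up with the bare <a> reported in the issue.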
youradds commented:

Hi @oalders, thanks for the reply :)

Ahhh OK, so it seems to have been the 'undef' that broke it. If I keep it as 'undef', it doesn't work. As soon as I take it out of quotes, it's all good and works as expected. Man, now I feel stupid that I didn't see that!

Thanks again :)

oalders commented:

Thanks for letting me know, @youradds. I'm glad to hear you got it sorted out. :)
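Since the root cause turned out to be the quoted 'undef' in uri_schemes, a small sanity check can catch that mistake before the HTML::Restrict object is built. This is a minimal sketch; quoted_undef_entries is a hypothetical helper, not part of HTML::Restrict, and it simply uses defined() to tell a real undef apart from the five-character string 'undef'.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use feature qw( say );

# Hypothetical helper (not part of HTML::Restrict): return the entries of a
# uri_schemes list that are the literal string 'undef' instead of a real undef.
sub quoted_undef_entries {
    my (@schemes) = @_;
    return grep { defined $_ && $_ eq 'undef' } @schemes;
}

my @quoted   = ( 'undef', 'http', 'https', 'tel', 'mailto' );
my @unquoted = ( undef,   'http', 'https', 'tel', 'mailto' );

# In scalar context the grep returns a count of offending entries.
say 'quoted:   ', scalar quoted_undef_entries(@quoted);      # prints "quoted:   1"
say 'unquoted: ', scalar quoted_undef_entries(@unquoted);    # prints "unquoted: 0"

# The first element is defined in one list and undefined in the other.
say 'quoted[0] defined?   ', ( defined $quoted[0]   ? 'yes' : 'no' );    # yes
say 'unquoted[0] defined? ', ( defined $unquoted[0] ? 'yes' : 'no' );    # no
```

A check like this could be run with warn or die just before calling HTML::Restrict->new, so the typo surfaces immediately rather than as silently stripped href attributes.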