thunderer / Shortcode

Advanced shortcode (BBCode) parser and engine for PHP

Home Page:http://kowalczyk.cc

Strip out <p> elements

Firesphere opened this issue

For example, TinyMCE wraps everything in <p> tags. Block elements, such as embedded elements, should be stripped of this tag for HTML5 compliance.
So if the block-element is <p>[embedcode]</p>, the resulting output should be just the embedcode, without the <p> elements.

Hi @Firesphere, thanks for reporting this issue. As you said, it's TinyMCE that inserts extra <p> and this behavior can be turned off. I don't think this is something that should be done inside this library.

If you have any specific use-case that would make me reconsider that, please update the issue. If this answer is sufficient for you, just let me know and close the issue.

@Firesphere It's me again, did you resolve your issue? Do you have anything to add or can this be closed?

Hi Thunderer, sorry for getting back this late.
I agree it can be turned off in TinyMCE, but I don't think that really solves the issue at hand: turning off the p-tags in TinyMCE turns them off for everything, while they should only be dropped around the embedded shortcode.
Ping @tractorcow or @chillu, I'm not sure which of you wrote the P-stripper for the SilverStripe embedding, but it might be useful here?

@Firesphere I agree that turning off <p> tags everywhere does not solve the issue. I just wrote a code fragment (below) that attempts to strip <p> tags from every line that contains a shortcode, but it does not take into account any edge cases, such as:

  • what happens if there are multiple shortcodes in one line,
  • what if these shortcodes contain multiline content that should be (or shouldn't be) fixed,
  • what if there is any content that should be left as-is before or after the shortcode,
  • what if those unwanted tags are different, eg. <div>, how to provide customization,
  • etc.

In general this is a tricky issue that is local to the target environment, eg. will be different for TinyMCE, CKEditor, any home-cooked WYSIWYG, customised CMS, and so on. If you can get the people you pinged to point me to any working generic and customizable solution, I'll see what I can do then.

$text = '/* INPUT TEXT */';

$parser = new RegularParser();
$shortcodes = array_reverse($parser->parse($text));
/** @var $shortcode ParsedShortcodeInterface */
foreach($shortcodes as $shortcode) {
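    $lineStart = strrpos(substr($text, 0, $shortcode->getOffset()), "\n");
    // strrpos() returns false when the shortcode sits on the first line; use -1 so that $lineStart + 1 is 0.
    $lineStart = false === $lineStart ? -1 : $lineStart;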
    $shortcodeEnd = $shortcode->getOffset() + strlen($shortcode->getText());
    $lineEnd = strpos($text, "\n", $shortcodeEnd) ?: strlen($text);

    $before = substr($text, $lineStart + 1, $shortcode->getOffset() - $lineStart - 1);
    $after = substr($text, $shortcodeEnd, $lineEnd - $shortcodeEnd);

    if('<p>' === $before && '</p>' === $after) {
        $text = substr_replace($text, $shortcode->getText(), $lineStart + 1, $lineEnd - $lineStart - 1);
    }
}

Sorry, I don't know anything about <p> stripping code in TinyMCE. :)

Could you configure some shortcodes to have a pre-defined wrapper, e.g. <div>, to suppress the addition of non-block container elements?

Hi @Firesphere, @tractorcow, @sminnee, @chillu, and others from the @silverstripe community! I'm sorry this issue had to wait so long. Today I found #5487, #5535, #5987, RFC#1, and RFC#2, which gave me the necessary context for the described problem (BTW golonka/bbcodeparser will use Shortcode from v3.0.0: #29, #33). I don't know why, but while answering the original question from May I forgot that in February I introduced an event subsystem for the v0.6.0 release, which makes this case easy to solve. The solution is to add a REPLACE_SHORTCODES event handler that extends the replacement over the unwanted preceding and following fragments. This may be a good addition to Shortcode's core, but for now you can use the code below. Note that you should customise the detection logic (the preg_match calls) in the inner condition. Please let me know if that solves your issue.

final class ReplaceAroundEventHandler
{
    public function __invoke(ReplaceShortcodesEvent $event)
    {
        $event->setResult(array_reduce(array_reverse($event->getReplacements()), function($state, ReplacedShortcode $r) {
            $offset = $r->getOffset();
            $length = mb_strlen($r->getText());
            $prefix = mb_substr($state, 0, $offset);
            $postfix = mb_substr($state, $offset + $length);

            if(preg_match('~(<p>\s*)$~', $prefix, $prefixMatch) && preg_match('~(^\s*</p>)~', $postfix, $postfixMatch)) {
                $prefix = mb_substr($state, 0, $offset - mb_strlen($prefixMatch[0]));
                $postfix = mb_substr($state, $offset + $length + mb_strlen($postfixMatch[0]));
            }

            return $prefix.$r->getReplacement().$postfix;
        }, $event->getText()));
    }
}

I wrote a simple script to test the idea:

$handlers = new HandlerContainer();
$handlers->add('code', function() { return 'inner'; });
$events = new EventContainer();
$events->addListener(Events::REPLACE_SHORTCODES, new ReplaceAroundEventHandler());

$processor = new Processor(new RegularParser(), $handlers);
$processor = $processor->withEventContainer($events);

echo $processor->process('random ><p> [code /] </p>< string')."\n";
echo $processor->process('random ><p>[code /]</p>< string')."\n";
echo $processor->process('random ><p>  '."\n".' [code /]    '."\n\n\n".'</p>< string')."\n";
echo $processor->process('random ><div>[code /]</div>< string')."\n";

Output:

random >inner< string
random >inner< string
random >inner< string
random ><div>inner</div>< string

I've left my comment over at silverstripe/silverstripe-framework#5987 (comment) 🗡️

I think this addresses our issue. We'll probably look at your proof of concept and clean it up to suit our needs. Thanks very much for your help!

There does not seem to be more discussion going on about Shortcode integration in SilverStripe, so I'm closing this issue. If I can help in any way, please open a new one.

Anyways, having the ReplaceAroundEventHandler as a standard class in this repo would be awesome.

@thunderer It seems to me that this approach always removes the <p> tags around all shortcodes.

Do you think there is a way to do this per-handler?

In my use case, some shortcodes (imagine [current_year]) should just insert text into the $content as-is, without removing surrounding <p>s.

Other shortcodes, for example [copyright_block], are supposed to insert complete HTML blocks, so the surrounding <p>s need to be removed.

@mpdude I agree about having the class in the repository, but I didn't find a way to implement it in a generic, configurable way. I'd be happy to talk about that if you want to contribute such a solution.

As for your other question, the example class above can easily be modified to skip certain shortcodes by adding false === in_array($r->getName(), ['current_year'], true) to the condition inside the __invoke() method. Shortcodes whose names are in that array then keep their surrounding prefix and postfix and are replaced with the computed replacement only.
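A minimal sketch of that extra condition, written as a standalone function so it can be run without the library. The function name replaceAround() and the $keepWrapper parameter are hypothetical; in the real handler the offset, text length, name, and replacement would come from the ReplacedShortcode object instead of being passed in by hand.

```php
<?php
/**
 * Replaces one shortcode inside $text. Unless the shortcode name is listed
 * in $keepWrapper, a directly surrounding <p>...</p> pair is removed as well.
 * $offset and $length are counted in characters, not bytes.
 */
function replaceAround(string $text, int $offset, int $length, string $name, string $replacement, array $keepWrapper): string
{
    $prefix = mb_substr($text, 0, $offset, 'UTF-8');
    $postfix = mb_substr($text, $offset + $length, null, 'UTF-8');

    // The in_array() check is the only addition compared to the handler above.
    if (false === in_array($name, $keepWrapper, true)
        && preg_match('~(<p>\s*)$~u', $prefix, $prefixMatch)
        && preg_match('~(^\s*</p>)~u', $postfix, $postfixMatch)
    ) {
        // Cut the matched "<p>" off the end of the prefix and "</p>" off the start of the postfix.
        $prefix = mb_substr($prefix, 0, mb_strlen($prefix, 'UTF-8') - mb_strlen($prefixMatch[0], 'UTF-8'), 'UTF-8');
        $postfix = mb_substr($postfix, mb_strlen($postfixMatch[0], 'UTF-8'), null, 'UTF-8');
    }

    return $prefix.$replacement.$postfix;
}

// [current_year] is inline text, so its <p> wrapper is kept.
echo replaceAround('<p>[current_year]</p>', 3, 14, 'current_year', '2016', ['current_year']), "\n";
// [copyright_block] produces a block element, so the wrapper is dropped.
echo replaceAround('<p>[copyright_block]</p>', 3, 17, 'copyright_block', '<footer>(c)</footer>', ['current_year']), "\n";
```

The first call prints <p>2016</p> and the second prints <footer>(c)</footer>. Listing the names to skip keeps the default behaviour (strip the wrapper) for every other shortcode; inverting the list is an equally valid choice if most shortcodes are inline.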

@mpdude I just tinkered a bit with the idea of a generic ReplaceAroundEventHandler, and this is an untested prototype that should work (example at the bottom). The configuration is quite flexible, although 1) it needs to be specified for every shortcode, 2) there is no way to provide a default configuration, and 3) I'm not sure it covers all cases, for example what should happen if the string manipulation is too complex to be handled by regular expressions.

final class ReplaceAroundEventHandler
{
    private $options;

    public function __construct(array $options)
    {
        foreach($options as $name => $option) {
            if(empty($option)) {
                continue;
            }
            if(array_diff_key($option, ['prefix' => 1, 'postfix' => 2])) {
                throw new \InvalidArgumentException(sprintf('Invalid replace around configuration for shortcode `%s`!', $name));
            }
        }

        $this->options = $options;
    }

    public function __invoke(ReplaceShortcodesEvent $event)
    {
        $event->setResult(array_reduce(array_reverse($event->getReplacements()), function($state, ReplacedShortcode $r) {
            $name = $r->getName();
            $offset = $r->getOffset();
            $length = mb_strlen($r->getText());
            $prefix = mb_substr($state, 0, $offset);
            $postfix = mb_substr($state, $offset + $length);

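            // No (or empty) configuration for this shortcode: replace it without touching its surroundings.
            if(empty($this->options[$name])) {
                return $prefix.$r->getReplacement().$postfix;
            }
            // Either key may be omitted, so check that it exists before matching.
            if(isset($this->options[$name]['prefix']) && preg_match($this->options[$name]['prefix'], $prefix, $prefixMatch)) {
                $prefix = mb_substr($state, 0, $offset - mb_strlen($prefixMatch[0]));
            }
            if(isset($this->options[$name]['postfix']) && preg_match($this->options[$name]['postfix'], $postfix, $postfixMatch)) {
                $postfix = mb_substr($state, $offset + $length + mb_strlen($postfixMatch[0]));
            }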

            return $prefix.$r->getReplacement().$postfix;
        }, $event->getText()));
    }
}

$handler = new ReplaceAroundEventHandler([
    'year' => [],
    'copy' => ['prefix' => '~(<p>\s*)$~', 'postfix' => '~(^\s*</p>)~'],
]);

Thanks @thunderer for following up so quickly!

I still need to have a look at your suggested solution above – due to urgency, I had to deploy an (intermediate) solution this morning that works with your snippet above.

Two short remarks:

  • The ReplaceShortcodesEvent will have its $replacements field set to null when no replacements are given. Either initialize that field as an array in the constructor, or add a check in the above code to catch the $event->getReplacements() === null case (array_reverse will complain otherwise).
  • Might be a good idea to explicitly pass utf-8 to all the mb_* functions in the code above.

You might want to update the code in case anybody comes along and tries to use it.
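A rough sketch of how both remarks could be applied, again standalone so it runs without the library. The function replaceAll() and the array shape used for each replacement are hypothetical stand-ins for the event and ReplacedShortcode objects, and returning the text unchanged when there are no replacements is an assumption about the desired behaviour.

```php
<?php
/**
 * Stand-in for the handler body. $replacements is either null or a list of
 * ['offset' => int, 'text' => string, 'replacement' => string] arrays,
 * ordered by offset, with offsets counted in characters.
 */
function replaceAll(string $text, ?array $replacements): string
{
    // Guard: array_reverse() does not accept null, so treat it as "nothing to replace".
    $replacements = $replacements ?? [];

    // Work from the last shortcode to the first so earlier offsets stay valid.
    return array_reduce(array_reverse($replacements), function (string $state, array $r): string {
        // Pass the encoding explicitly instead of relying on mb_internal_encoding().
        $length = mb_strlen($r['text'], 'UTF-8');
        $prefix = mb_substr($state, 0, $r['offset'], 'UTF-8');
        $postfix = mb_substr($state, $r['offset'] + $length, null, 'UTF-8');

        return $prefix.$r['replacement'].$postfix;
    }, $text);
}

// No replacements reported: the text comes back unchanged.
echo replaceAll('zażółć [code /]', null), "\n";
// Multi-byte characters before the shortcode do not shift the character offset.
echo replaceAll('zażółć [code /] gęślą', [
    ['offset' => 7, 'text' => '[code /]', 'replacement' => 'inner'],
]), "\n";
```

The second call prints zażółć inner gęślą. The <p> detection from the handler is left out here to keep the focus on the null guard and the explicit encoding.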