thunderer / Shortcode

Advanced shortcode (BBCode) parser and engine for PHP

Home Page: http://kowalczyk.cc

(question) Also retrieve content outside shortcodes when parsing

thdebay opened this issue

Hello,

I discovered this library while searching for a solution to parse content with WordPress shortcodes. It's fantastic; thank you for all the effort you've put into this project :)

I'm trying to make sure I'm using it the right way. What I need to achieve is transforming this initial content:

Block of regular text
[my-shortcode id="1"]Shortcode content[/my-shortcode]
Other block of regular text

into an array of blocks:

[
  0 => [
    'type' => 'text',
    'content' => 'Block of regular text'
  ],
  1 => [
    'type' => 'my-shortcode',
    'id' => 1,
    'content' => 'Shortcode content'
  ],
  2 => [
    'type' => 'text',
    'content' => 'Other block of regular text'
  ]
]

Parsing and extracting content from shortcodes works great, but is there also a way to extract the content outside the shortcodes?

Hi @thdebay, thanks for the kind words! I still have an idea for a shortcode "AST" (Abstract Syntax Tree): instead of a list of shortcodes, you would get a tree with all the metadata about the elements' positions, structure, etc.

In your case, I think what you need are the shortcode offsets, which let you split the text after parsing. Look at the example below: the final variable $parts contains a list of arrays with type (text or shortcode) and text keys. You can tweak it to keep the shortcode instance instead. As for the "ID", you can use the offset, since it is unique within the whole text. Also note that RegularParser (unlike WordPress or any other regex-based parser) properly handles nesting: nested shortcodes stay inside their parent's content, so you need to repeat the process on that content to reach them. Hope that helps!

<?php
declare(strict_types=1);
namespace X;

use Thunder\Shortcode\Parser\RegularParser;
use Thunder\Shortcode\Shortcode\ParsedShortcodeInterface;
use Thunder\Shortcode\Syntax\CommonSyntax;

require_once __DIR__.'/vendor/autoload.php';

$text = <<<EOF
Block of regular text
[first /][my-shortcode id="1"]Shortócode❤️ content[/my-shortcode][last /]
Other block of [inner] regular text
EOF;

$parser = new RegularParser(new CommonSyntax());
/** @var ParsedShortcodeInterface[] $shortcodes */
$shortcodes = $parser->parse($text);
$parts = [];
// Offsets are character positions (hence mb_substr() below), so every length
// must be measured with mb_strlen() too, including the length of the full text.
$offsets = [0]; // always start at 0; an empty leading segment is skipped below
$shortcodeOffsets = [];
foreach ($shortcodes as $shortcode) {
    $shortcodeOffsets[] = $shortcode->getOffset();
    $offsets[] = $shortcode->getOffset();
    $offsets[] = $shortcode->getOffset() + mb_strlen($shortcode->getText(), 'UTF-8');
}
$offsets[] = mb_strlen($text, 'UTF-8');
for($i = 0; $i < count($offsets) - 1; $i++) {
    if($offsets[$i] === $offsets[$i + 1]) {
        continue; // empty segment, e.g. one shortcode right after the other
    }
    $parts[] = [
        'type' => in_array($offsets[$i], $shortcodeOffsets, true) ? 'shortcode' : 'text',
        'text' => mb_substr($text, $offsets[$i], $offsets[$i + 1] - $offsets[$i], 'UTF-8'),
    ];
}

var_dump($parts); // your actual logic here

Result (compressed a bit; the actual result keeps the newlines):

[
    0 => ['type' => 'text', 'text' => 'Block of regular text'],
    1 => ['type' => 'shortcode', 'text' => '[first /]'],
    2 => ['type' => 'shortcode', 'text' => '[my-shortcode id="1"]Shortócode❤️ content[/my-shortcode]'],
    3 => ['type' => 'shortcode', 'text' => '[last /]'],
    4 => ['type' => 'text', 'text' => 'Other block of '],
    5 => ['type' => 'shortcode', 'text' => '[inner]'],
    6 => ['type' => 'text', 'text' => ' regular text'],
]
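If you want the exact block format from the question rather than the text/shortcode $parts, a variation along these lines could work. It is a minimal sketch, not part of the library: toBlocks() and FakeShortcode are hypothetical names, and it assumes each parsed shortcode exposes getOffset(), getText(), getName(), getParameters() and getContent(), with offsets counted in characters as in the mb_substr()-based example above. The shortcode's name becomes the block type, its parameters are merged in, and its offset is kept as the unique ID suggested earlier.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: split $text into the "array of blocks" format from the
 * question. $shortcodes are the top-level shortcodes found in $text, in order.
 * Assumes offsets are character (not byte) positions, hence the mb_* calls.
 */
function toBlocks(string $text, array $shortcodes): array
{
    $blocks = [];
    $position = 0; // character position right after the previous block

    foreach ($shortcodes as $shortcode) {
        $offset = $shortcode->getOffset();
        if ($offset > $position) {
            // plain text between the previous block and this shortcode
            $blocks[] = [
                'type' => 'text',
                'content' => mb_substr($text, $position, $offset - $position, 'UTF-8'),
            ];
        }
        // name becomes the type, offset serves as a unique ID, parameters are
        // merged in (array union: on key collisions the leftmost key wins)
        $blocks[] = ['type' => $shortcode->getName(), 'offset' => $offset]
            + $shortcode->getParameters()
            + ['content' => $shortcode->getContent()];
        $position = $offset + mb_strlen($shortcode->getText(), 'UTF-8');
    }

    $length = mb_strlen($text, 'UTF-8');
    if ($position < $length) {
        // trailing text after the last shortcode (or the whole text if none)
        $blocks[] = [
            'type' => 'text',
            'content' => mb_substr($text, $position, $length - $position, 'UTF-8'),
        ];
    }

    return $blocks;
}

// Hypothetical stand-in for a parsed shortcode so the sketch runs without the
// library; with thunderer/shortcode you would pass $parser->parse($text) instead.
final class FakeShortcode
{
    private $name;
    private $parameters;
    private $content;
    private $text;
    private $offset;

    public function __construct(string $name, array $parameters, ?string $content, string $text, int $offset)
    {
        $this->name = $name;
        $this->parameters = $parameters;
        $this->content = $content;
        $this->text = $text;
        $this->offset = $offset;
    }

    public function getName(): string { return $this->name; }
    public function getParameters(): array { return $this->parameters; }
    public function getContent(): ?string { return $this->content; }
    public function getText(): string { return $this->text; }
    public function getOffset(): int { return $this->offset; }
}

$text = "Block of regular text\n[my-shortcode id=\"1\"]Shortcode content[/my-shortcode]\nOther block of regular text";
$raw = '[my-shortcode id="1"]Shortcode content[/my-shortcode]';
$shortcodes = [
    new FakeShortcode('my-shortcode', ['id' => '1'], 'Shortcode content', $raw, mb_strpos($text, $raw, 0, 'UTF-8')),
];

var_dump(toBlocks($text, $shortcodes));
```

Text blocks keep the surrounding newlines; trim() them if you want the exact output from the question. For nested shortcodes, call toBlocks() again on a shortcode's content together with the shortcodes parsed from that content.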

Thanks a lot, @thunderer, this is exactly what I needed. I had thought of another solution based on str_replace(), but yours, built on mb_substr() and getOffset(), is much better and more efficient. Again, thank you, I really appreciate your help! I'll implement your solution in my project, and I'd be glad to help with testing if this feature makes it into the library someday.

No problem, @thdebay. I'm happy it helped, and hopefully I'll find some time to tinker with the AST I mentioned above. I'm closing this issue now, but feel free to open a new one if you have more questions!