(question) Also retrieve content outside shortcodes when parsing

Question

thdebay opened this issue 2 years ago

Hello,

I discovered this library while searching for a solution to parse content with WordPress shortcodes. It's fantastic; thank you for all the effort you've put into this project :)

I'm trying to make sure I'm using it the right way. What I need to achieve is transforming this initial content:

Block of regular text
[my-shortcode id="1"]Shortcode content[/my-shortcode]
Other block of regular text

into an array of blocks:

[
  0 => [
    'type' => 'text',
    'content' => 'Block of regular text'
  ],
  1 => [
    'type' => 'my-shortcode',
    'id' => 1,
    'content' => 'Shortcode content'
  ],
  2 => [
    'type' => 'text',
    'content' => 'Other block of regular text'
  ]
]

Parsing shortcodes and extracting their content works great, but is there also a way to extract the content outside the shortcodes?

Tomasz Kowalczyk · Answer 1 · Tue Nov 29 2022 20:37:30 GMT+0800 (China Standard Time)

Hi @thdebay, thanks for the kind words! I still have in mind an idea for a shortcode "AST" (Abstract Syntax Tree): instead of a list of shortcodes, you would get a tree with all the metadata about the elements' positions, structure, etc.

In your case, I think what you need are the shortcode offsets, which let you split the text after parsing. Look at the example below: the final variable $parts contains a list of arrays with type (text or shortcode) and text properties. You can tweak it to keep the shortcode instance. As for the "ID", you can use the offset value, since it is unique for each shortcode in the parsed text.

Also note that RegularParser (unlike WordPress or any other regex-based parser) properly handles nesting: it returns the top-level shortcodes with the nested ones left inside their content, so you need to repeat the process for nested shortcodes. Hope that helps!

<?php
declare(strict_types=1);
namespace X;

use Thunder\Shortcode\Parser\RegularParser;
use Thunder\Shortcode\Shortcode\ParsedShortcodeInterface;
use Thunder\Shortcode\Syntax\CommonSyntax;

require_once __DIR__.'/vendor/autoload.php';

$text = <<<EOF
Block of regular text
[first /][my-shortcode id="1"]Shortócode❤️ content[/my-shortcode][last /]
Other block of [inner] regular text
EOF;

$parser = new RegularParser(new CommonSyntax());
/** @var ParsedShortcodeInterface[] $shortcodes */
$shortcodes = $parser->parse($text);
$parts = [];
// Always start at 0: a zero-length leading part is skipped below, and text
// without any shortcodes still comes back as a single text part.
$offsets = [0];
$shortcodeOffsets = [];
foreach ($shortcodes as $shortcode) {
    $shortcodeOffsets[] = $shortcode->getOffset();
    $offsets[] = $shortcode->getOffset();
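    // getOffset() counts characters, so lengths must use mb_strlen(), not strlen().
    $offsets[] = $shortcode->getOffset() + mb_strlen($shortcode->getText(), 'UTF-8');
}
$offsets[] = mb_strlen($text, 'UTF-8');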
for($i = 0; $i < count($offsets) - 1; $i++) {
    if($offsets[$i] === $offsets[$i + 1]) {
        continue; // one shortcode right after the other
    }
    $parts[] = [
        'type' => in_array($offsets[$i], $shortcodeOffsets, true) ? 'shortcode' : 'text',
        'text' => mb_substr($text, $offsets[$i], $offsets[$i + 1] - $offsets[$i]),
    ];
}

var_dump($parts); // your actual logic here

Result (compressed a bit; the actual output keeps the newlines in the text parts):

[
    0 => ['type' => 'text', 'text' => 'Block of regular text'],
    1 => ['type' => 'shortcode', 'text' => '[first /]'],
    2 => ['type' => 'shortcode', 'text' => '[my-shortcode id="1"]Shortócode❤️ content[/my-shortcode]'],
    3 => ['type' => 'shortcode', 'text' => '[last /]'],
    4 => ['type' => 'text', 'text' => 'Other block of '],
    5 => ['type' => 'shortcode', 'text' => '[inner]'],
    6 => ['type' => 'text', 'text' => ' regular text'],
]
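The same offset technique can be pulled into a standalone function, which also makes the splitting logic testable without the parser. This is a minimal sketch, not part of the Shortcode library: `splitByOffsets()` is a hypothetical name, and it assumes, as the answer's code does, that `getOffset()` counts characters rather than bytes. The demo fakes the parser by locating two shortcodes with `mb_strpos()`.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper, not part of the Shortcode library: splits $text into
 * 'text' and 'shortcode' parts, given each top-level shortcode as an
 * [offset, length] pair counted in characters and sorted by offset.
 *
 * @param array<int, array{0: int, 1: int}> $spans
 * @return array<int, array{type: string, text: string}>
 */
function splitByOffsets(string $text, array $spans): array
{
    $parts = [];
    $position = 0; // character index right after the last emitted part

    foreach ($spans as [$offset, $length]) {
        // Plain text between the previous shortcode (or the start) and this one;
        // nothing is emitted when two shortcodes touch.
        if ($offset > $position) {
            $parts[] = ['type' => 'text', 'text' => mb_substr($text, $position, $offset - $position, 'UTF-8')];
        }
        $parts[] = ['type' => 'shortcode', 'text' => mb_substr($text, $offset, $length, 'UTF-8')];
        $position = $offset + $length;
    }

    // Trailing text after the last shortcode (or the whole text if there are none).
    $end = mb_strlen($text, 'UTF-8');
    if ($end > $position) {
        $parts[] = ['type' => 'text', 'text' => mb_substr($text, $position, $end - $position, 'UTF-8')];
    }

    return $parts;
}

// Demo: a stand-in for the parser. With the library, each span would instead be
// [$shortcode->getOffset(), mb_strlen($shortcode->getText(), 'UTF-8')].
$text = "Intro [a /][b]żółw[/b] tail";
$spans = [];
foreach (['[a /]', '[b]żółw[/b]'] as $shortcodeText) {
    $spans[] = [mb_strpos($text, $shortcodeText, 0, 'UTF-8'), mb_strlen($shortcodeText, 'UTF-8')];
}

$parts = splitByOffsets($text, $spans);
print_r($parts);
```

Tracking a single `$position` instead of building a list of offsets means adjacent shortcodes never produce empty text parts, and text with no shortcodes at all comes back as one text part.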

Thomas Debay · Answer 2 · Thu Dec 01 2022 04:11:51 GMT+0800 (China Standard Time)

Thanks a lot, @thunderer, this is exactly what I needed. I had thought of another solution based on str_replace(), but yours is much better and more efficient, using mb_substr() and getOffset(). Again, thank you, I really appreciate your help! I'll implement your solution in my project, and I'd be glad to help with testing if this feature ever makes it into the library.

Tomasz Kowalczyk · Answer 3 · Wed Dec 07 2022 00:52:42 GMT+0800 (China Standard Time)

No problem, @thdebay. I'm happy it helped, and hopefully I'll find some time to tinker with the AST I mentioned above. I'm closing this issue now, but feel free to open a new one if you have more questions!