Parsedown: get all image links
MarkMessa opened this issue
Is it possible to get all image links parsed by Parsedown?
I'm considering something like:
$Parsedown = new Parsedown();
$file = file_get_contents('filename.txt');
echo $Parsedown->text($file);
# output
image1.png
image2.png
filename.txt
![][image1]
Lorem ipsum dolor sit amet, consectetur adipiscing elit. Aliquam porttitor nulla id luctus hendrerit.
![](image2.png)
Integer sed ultricies ante, sed mattis mauris. Donec et nisl sapien.
[image1]: image1.png
Hook into the inlineImage method and capture every src value in a public property:
class ParsedownGetImageSrc extends Parsedown {
    public $imageSrcData = [];
    public function inlineImage($Excerpt) {
        if ($Inline = parent::inlineImage($Excerpt)) {
            if (isset($Inline['element']['attributes']['src'])) {
                $this->imageSrcData[] = $Inline['element']['attributes']['src'];
            }
        }
        return $Inline;
    }
}
$parser = new ParsedownGetImageSrc;
$text = $parser->text(' ... ');
# All image `src` data now stored in `imageSrcData`
echo json_encode($parser->imageSrcData);
OK, seems to work fine. Thanks!
php > require 'Parsedown.php';
php > require 'ParsedownGetImageSrc.php';
php > $parser = new ParsedownGetImageSrc;
php >
php > $parser->text('Lorem ![](filename1.ext) ipsum.
php ' Dolor ![][image] sit amet.
php ' [image]: filename2.ext');
php >
php > echo json_encode($parser->imageSrcData);
["filename1.ext","filename2.ext"]
Since your extension carries the overhead of running the whole Parsedown parser, I was considering a lighter alternative, such as a regex:
\!\[.*\]\((\S+)\s*.*\)
to match ![title](filename.ext 'alt')
\[.+\]\:\s(\S+)(?:\s".*")?
to match [image1]: image1.png "some title"
Any comment?
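For concreteness, here is a minimal sketch of how the two patterns above might be applied to the sample file from the original question. The helper name findImageLinksByRegex and the line-by-line application are assumptions made for illustration, not part of Parsedown or of the proposal itself. Note that it reports every reference definition, whether or not an image actually uses it.

```php
<?php
// The two patterns proposed above, wrapped in PHP regex delimiters.
const INLINE_IMAGE = '/\!\[.*\]\((\S+)\s*.*\)/';
const REFERENCE_DEFINITION = '/\[.+\]\:\s(\S+)(?:\s".*")?/';

// Hypothetical helper: test each line against both patterns (assumption: one link per line).
function findImageLinksByRegex(string $markdown): array
{
    $links = [];
    foreach (explode("\n", $markdown) as $line) {
        if (preg_match(INLINE_IMAGE, $line, $m)) {
            $links[] = $m[1];
        } elseif (preg_match(REFERENCE_DEFINITION, $line, $m)) {
            $links[] = $m[1];
        }
    }
    return $links;
}

// Contents of filename.txt from the question.
$sample = implode("\n", [
    '![][image1]',
    'Lorem ipsum dolor sit amet, consectetur adipiscing elit.',
    '![](image2.png)',
    'Integer sed ultricies ante, sed mattis mauris.',
    '[image1]: image1.png',
]);

$links = findImageLinksByRegex($sample);
echo json_encode($links); // prints ["image2.png","image1.png"]
```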
Your regex will fail on these cases:
![a](b)
aaa ![a](b) bbbb
![a](b)
~~~
![a](b)
~~~
aaa `![a](b)` bbb
It also fails with escaped references:
![a](b)
![c](d)
\![a](b)
~~~
![a](b)
~~~
`![a](b)`
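The false positives can be reproduced with a short script. This is a sketch that assumes the inline-image pattern proposed earlier is tested against each line separately; the variable names are illustrative only.

```php
<?php
// Inline-image pattern from the regex proposal.
$pattern = '/\!\[.*\]\((\S+)\s*.*\)/';

// The sample above, one array entry per line.
$lines = [
    '![a](b)',
    '![c](d)',
    '\![a](b)',   // escaped, not an image
    '~~~',
    '![a](b)',    // inside a fenced code block
    '~~~',
    '`![a](b)`',  // inside a code span
];

// Record the 1-based line number and captured target of every match.
$matched = [];
foreach ($lines as $index => $line) {
    if (preg_match($pattern, $line, $m)) {
        $matched[$index + 1] = $m[1];
    }
}

foreach ($matched as $number => $target) {
    echo "line $number: $target\n";
}
```

The pattern matches lines 1, 2, 3, 5 and 7, although only the first two are images. Telling them apart requires knowing about escapes, fences and code spans, which is exactly the context a parser tracks.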
Any idea how to fix that?
Not possible without parsing it. The other solution is to parse the Markdown syntax to HTML and search for <img> tags with DOMDocument and such. So you don't need to extend the Parsedown class.
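A minimal sketch of that approach, assuming PHP's DOM extension is available. The function name extractImageSources is hypothetical, and the HTML string is hand-written to stand in for what a Markdown parser would return; in practice it would come from something like (new Parsedown())->text($markdown).

```php
<?php
// Hypothetical helper: collect the src attribute of every <img> element in an HTML fragment.
function extractImageSources(string $html): array
{
    $dom = new DOMDocument();

    // Collect libxml warnings internally instead of emitting them for loose markup.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $sources = [];
    foreach ($dom->getElementsByTagName('img') as $img) {
        $sources[] = $img->getAttribute('src');
    }
    return $sources;
}

// Hand-written stand-in for parser output: one real image, one code block.
$html = '<p>Lorem <img src="filename1.ext" alt="" /> ipsum.</p>'
      . '<pre><code>![a](b)</code></pre>';

$sources = extractImageSources($html);
echo json_encode($sources); // prints ["filename1.ext"]
```

Because the code block reaches the DOM as plain text rather than as an element, it is never reported as an image.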
Not possible without parsing it.
Parsing the document against the full Parsedown syntax to get just the image links is somewhat inefficient.
The other solution is to parse the Markdown syntax to HTML and search for <img> tags with DOMDocument and such.
Again, this seems inefficient. There is a lot of overhead in creating a full HTML version and then searching it for tags. It would be better to search for image links directly in the Markdown source.
Then just match every image URL. You should be able to find a pattern for that somewhere on the internet.
/^https?:\/\/\S+\.(?:gif|jpe?g|png|svg)$/
This way you will get the URLs from <img>, but also from <a>, which is not what I want.
Besides, it will fail in the following cases:
# local file path instead of url
![title](filename.ext)
# escaped reference
\![a](b)
# code block
~~~
![a](b)
~~~
# code span
`![a](b)`
Note: The currently accepted answer is already fine for me. The overhead is just an observation rather than an actual bottleneck.