thunderer / Shortcode

Advanced shortcode (BBCode) parser and engine for PHP

Home Page:http://kowalczyk.cc

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

On large HTML files, RegularParser does not fire any handler.

devnix opened this issue · comments

I've been implementing this library on a project in my company, which scraps a whole local Wordpress with httrack and then modifies the files in order to fit to our needs.

To make some parts dynamic, we're implementing shortcodes (which we translate to custom PHP code). I've been testing your library without any problems, but when I gave it a definitive file, it didn't triggered any handler, and the parser pushed PHP memory consumption over 800MB.

After several tests, I've decided to try the RegexParser. It parsed the files correctly, and using a riddiculous amount of time and memory compared with the standard one.

I can understand the extra time and memory consumption (uncer certain limits), but I don't get why the parser didn't saw any of my tags, I am the only one who suffered this issue?

BTW, thank you very much for your work, your library is awesome and I'm enjoying it a lot!

Hi @devnix, thanks for your kind words! I'm happy that you like this library.

Can you send me a script to reproduce the issue along with the problematic file? Also, can you verify that you're using the latest version of Shortcode (v0.6.5)?

@devnix did you see my previous comment? Can you provide the information I need?

Yes, excuse me!

I have to get some time to prepare you a couple of files to reproduce the issue.

No problem, I just wanted to know what is the current status. Send them my way when you're ready.

Hi @devnix, did you have the chance to prepare the test data? It's fine if you need more time, just let me know if I can help you in any way. Meanwhile, you may be interested in #72, I found over 100x performance gain in RegularParser so there should be no penalty of running it vs other parsers.

@devnix good news, during verification phase I managed to reproduce your issue and I'm happy to tell you that it's fixed! Once #72 will be checked by others and merged, I'm releasing a new version right away. Unfortunately the fix ate some performance gains I mentioned above, but the result is still very good - above 10x gains in both time and memory.

This also means that you don't need to prepare anything for me, but I'd be very grateful if you could report back and confirm what I've found. Thanks!

Wow, that's great! I'm afraid I'm very busy lately and I've tried to do more things that I've been able to, so it was difficult for me to prepare the test data removing all the sensible info to replicate the issue...

I think I can confirm you in the following days if the patch works. Should I use the latest version of master?

@devnix Yes, please use the latest master. I'm currently preparing the next release so it would be great to have your confirmation before tagging it.

It seems like it worked well! I think I can close the issue and thank you a lot for your work, besides I could not give you all the help I would liked to.

@devnix v0.7.0 was just tagged, thanks for your cooperation!

FYI, I still think there is an issue for large documents. I'm working on moving some Grav documentation to use a new prism.js shortcode, and it does not render the shortcodes at all with the Regular parser. Regex parser works in this case though:

https://raw.githubusercontent.com/getgrav/grav-learn/feature/learn4/pages/06.forms/01.blueprints/01.fields-available/docs.md

@rhukster can you open a new issue for that? Please include an example code to reproduce the problem and describe the expected result, ie. which handlers are not fired. I'll be happy to look at it in the next few days. Thanks in advance!