marcioAlmada / annotations

The KISS PHP annotations library.

Deflate Parser + Speed Improvements

marcioAlmada opened this issue

Parser method is too long and it's getting complicated. CRAP index is currently around 8. This needs a fix before version 2.

15% faster now af0b221

Nice work !

😆 still we have so much room for improvements, though

This latest improvement ebabf85 benefits dynamic annotations only, usually the most frequent ones.

Hi,
Is it possible to add your speed tests to the source code? That way, if someone makes a change or tries to improve the code, they can test it directly against your performance tests.

Hi @nyamsprod,

I'm currently using the unit tests as benchmark. Steps:

  1. disable xdebug extension (very important)
  2. run phpunit tests at least 300 times against old code: phpunit --repeat 300
  3. wait for cpu to settle down (just in case it's necessary)
  4. run phpunit tests the same number of times against optimized code: phpunit --repeat 300
  5. compare time and memory usage

I also just added some test groups [parser, bag, facade] so it gets easier to aim at some specific code. For this issue, I'm using just the parser group for measurements: phpunit --group parser.
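For quick checks outside phpunit, the same idea (repeat many times, then compare time and memory) can be sketched as a small timing helper. This is only an illustration: the `benchmark()` function and the sample workload are hypothetical and are not part of the library or of the method described above.

```php
<?php
// Hypothetical helper: run a callable repeatedly and report elapsed time
// and peak memory, mirroring the "time and memory" comparison in step 5.
function benchmark(callable $fn, int $iterations = 300): array
{
    // Step 1 above: xdebug distorts timings, so warn if it is loaded.
    if (extension_loaded('xdebug')) {
        fwrite(STDERR, "Warning: xdebug is loaded, timings will be inflated\n");
    }

    $start = hrtime(true); // monotonic clock in nanoseconds (PHP >= 7.3)
    for ($i = 0; $i < $iterations; $i++) {
        $fn();
    }
    $seconds = (hrtime(true) - $start) / 1e9;

    return [
        'seconds' => $seconds,
        'peak_memory_mb' => memory_get_peak_usage(true) / 1048576,
    ];
}

// Sample workload (an assumption): decode a small JSON value.
$result = benchmark(function () {
    json_decode('{"a": [1, 2, 3]}');
});

printf("Time: %.4f seconds, Memory: %.2fMb\n", $result['seconds'], $result['peak_memory_mb']);
```

Run it once against the old code and once against the optimized code, then compare the two lines of output.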

Hope it was helpful. Cheers!

PS: For licensing reasons, some Linux distros ship a different JSON extension. The parser relies heavily on json_decode, so results may vary (just a little). Usually ext-json is much faster than pecl-json-c. To check which extension you're using, just run defined('JSON_C_VERSION'). True means you're using pecl-json-c, false means you're using ext-json.
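The check described in the PS can be run as a tiny script. A minimal sketch, assuming (as stated above) that the JSON_C_VERSION constant is defined only when pecl-json-c is in use; note that the constant name is passed to `defined()` as a string:

```php
<?php
// defined() takes the constant name as a string.
$usingJsonC = defined('JSON_C_VERSION');

if ($usingJsonC) {
    // constant() avoids referencing an undefined constant on ext-json builds.
    echo 'pecl-json-c ', constant('JSON_C_VERSION'), PHP_EOL;
} else {
    echo 'ext-json', PHP_EOL;
}
```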

The current bottleneck is the implicit boolean type. It depends too much on incremental string scanning 😉

if ($line->scanImplicitBoolean($identifier_pattern)) { // if implicit boolean
    $parameters[$key][] = true;
    while (! $line->hasTerminated()) {
        $line->skip("/\\{$identifier_pattern}/");
        $key = $line->scanKey($key_pattern);
        $parameters[$key][] = true;
    }
    continue;
}

Thanks for the response, I'll use your method for test standardisation then. As for the json extension, I'm well aware of the differences and I'm already taking them into account in my code.

I have a problem with your optimizations on the implicit boolean loop.
First, I've added the following property to the AnnotationsFixture class:

    /**
     * @get @post @ajax float 2.1
     */
    private $multiple_values_fixture2;

Then I've added the following test in the ParserTest class:

    public function testParseMultipleValuesFixture2()
    {
        $res = $this->getParser('multiple_values_fixture2')->parse();
        $this->assertSame(['get' => true, 'post' => true, 'ajax' => 2.1], $res);
    }

This test does not just fail: your code enters an infinite loop, so the unit test run never ends. I'm still working on the code, but I think this bug is important enough to be listed here.

Interesting! In fact, this has never been tested before. I always assumed that no one would ever do this, so I never even tried to see what happens.

Well, you found one more reason to stop using while + incremental scanning. A simple one-pass preg_match_all is the best solution IMO, both for performance and to avoid creating parser black holes like this one ;)

Would you mind creating another issue for this? Also, to keep things consistent, we should name the fixture like this:

/**
 * @get @post @ajax float 2.1
 */
private $bad_implicit_boolean_fixture;

The original intention with implicit boolean annotations was that they shouldn't have explicit values (because their presence already means the value is true) and that only they can be declared together on the same line. Annotations with declared values should never be on the same line as others.

I've resolved the problem while keeping the "while" :) Soon on GitHub. I've also rewritten the Scanner class. I'll upload my changes so that you can see what I did, but it's still a work in progress.

When you say "rewrote", you mean "from scratch"?

No .. I have simplified the class. I've just pushed the code to my account: https://github.com/nyamsprod/annotations/tree/parser-improvement

Well, these are some very substantial changes. Did you achieve any relevant optimization?

yes .. but the latest stable does not work on my local machine, and when I try phpunit --group parser --repeat 30, phpunit does not run the code and I don't know why. When the options are used separately (group or repeat) it works,

so I want to confirm the performance gain before stating that the code is OK. The main idea behind the performance optimization is fewer function calls while still keeping the code readable and well decoupled. For instance, the json detection is done only once and I avoid wrapping native PHP functions in methods.

I filed an issue reporting this phpunit bug yesterday, sebastianbergmann/phpunit#1085. No response yet.
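The "detect once" idea can be sketched as follows. This is a hedged illustration, not the code from the parser-improvement branch: the `JsonValueParser` class name is hypothetical, and it assumes the JSON_PARSER_NOTSTRICT flag mentioned later in this thread exists only under pecl-json-c.

```php
<?php
// Hypothetical class: resolve the json_decode() flags once, in the
// constructor, instead of re-checking the extension on every call.
final class JsonValueParser
{
    /** @var int flags passed to json_decode(), resolved once */
    private $flags;

    public function __construct()
    {
        // Assumption: JSON_PARSER_NOTSTRICT is only defined by pecl-json-c.
        $this->flags = defined('JSON_PARSER_NOTSTRICT')
            ? constant('JSON_PARSER_NOTSTRICT')
            : 0;
    }

    public function parse(string $value)
    {
        // Native function called directly, with no extra wrapper layers.
        return json_decode($value, false, 512, $this->flags);
    }
}

$parser = new JsonValueParser();
var_dump($parser->parse('2.1'));
```

The per-call cost is then a single property read plus the native `json_decode()` call.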

But testing with phpunit --repeat 300 might be enough to reveal any relevant optimization. A bit noisy, but the relevant numbers are the ones that can emerge from noise anyway.

yes .. but the last stable does not work on my local machine and when I'm trying...

The latest stable phpunit or minime/annotations?

The latest stable minime/annotations. I get an error on the float fixture test.

We really need to track this down. It might be related to ext-json being superseded by the pecl-json-c extension. Could you please create a new issue reporting the problem? You can also use issue #20.

Yes, of course. Let me do this now, otherwise I might forget 👍

ok, thanks

I have:

  • upgraded my json-c lib
  • added the ReaderTest class into the unit test facade group
  • added a ScannerTest class to have a 100% code coverage for my Scanner class and added it into the unit test parser group
  • compared the master branch to the parser-improvement branch with the following settings:
    phpunit --exclude-group bag,facade --repeat 300

Here are my results:

Time: 3.4 minutes, Memory: 35.25Mb (master branch)
Time: 1.55 minutes, Memory: 20.25Mb (parser-improvement branch)

So there's a 45% improvement... but you should run the tests to see for yourself.

I'm sorry, you really meant minutes? I usually run phpunit --repeat 300 in 2.8 seconds or less...

must be xdebug presence .. I should disable it :)

Okay now with xdebug disabled I get:

Time: 15.64 seconds, Memory: 7.25Mb (master branch)
Time: 7.72 seconds, Memory: 5.00Mb (parser-improvement branch)

So the conclusion stays the same. I should point out that my dev computer is not very fast :) (Intel® Pentium(R) D CPU 2.80GHz × 2), so with a more modern computer I'm sure it would be faster.

Tried to merge the code on a test branch. Sooo many conflicts.

Yes, probably because of the unit tests I had to rewrite :( I think you can completely remove the ParserTest and replace it with the one from the master branch. I have the bad habit of renaming functions too eagerly to use the standard test prefix ... my fault
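For reference, the relative reduction can be computed directly from the xdebug-free figures reported above. A minimal sketch; the `relativeReduction()` helper is hypothetical and only illustrates the arithmetic:

```php
<?php
// Hypothetical helper: percentage saved going from $before to $after.
function relativeReduction(float $before, float $after): float
{
    return round(($before - $after) / $before * 100, 1);
}

// Figures from the run with xdebug disabled (master vs parser-improvement).
printf(
    "time: %.1f%%, memory: %.1f%%\n",
    relativeReduction(15.64, 7.72),
    relativeReduction(7.25, 5.00)
);
```

With these inputs the time reduction comes out at about 50.6% and the memory reduction at about 31.0%.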

Yes, that has prevented me from merging a lot of your contributions lately 😄. Could you please create another branch based on the current develop branch and commit your changes there? Without the unnecessary code, of course. Develop is always ahead of master, so if you base your work on master there will always be many conflicts to solve.

Also, if you're optimizing something, you shouldn't touch existing unit tests, just add new ones. Leave the modified tests for another pull request.

Code standard fixes should go in a separate pull request too, but please skip all the PSR-1 and PSR-2 related stuff; this is done automatically with code fixing tools.

Please do that so I can test the code here too without losing track of the really important changes.

Ok I'll do that tomorrow morning

Nice, I'll comment on the code in your branch.

@nyamsprod I just cloned your repository and ran the tests against current develop. I don't understand how you got this 45% improvement rating... your code is running slower than the current develop branch, even without the latest applied JSON_PARSER_NOTSTRICT patch.

I found it quite strange that you said it was running so much faster, since none of the commits seemed to really improve speed. Here are the results:

pecl-json-c 1.3.2

[Benchmark screenshot: pecl-json-c]

[Benchmark screenshot: ext-json]

With ext-json, the difference against the improvements gets even more noticeable:

[Benchmark screenshot: ext-json]

I guess you're probably running your benchmarks against a very, very outdated master, right? Something has to be wrong, because one doesn't simply gain 45% in speed on one specific machine with source code changes that aren't that critical.

Yes, I've added an upstream branch on my local machine to keep up with the changes in master, and now the performance gain is indeed lower.

I guess we should focus on improving this https://github.com/marcioAlmada/annotations/blob/master/src/Minime/Annotations/Parser.php#L67 before any big step:

if ($line->scanImplicitBoolean($identifier_pattern)) { // if implicit boolean
    $parameters[$key][] = true;
    while (! $line->hasTerminated()) {
        $line->skip("/\\{$identifier_pattern}/");
        $key = $line->scanKey($key_pattern);
        $parameters[$key][] = true;
    }
    continue;
}

To something like this:

if ($line->checkImplicitBoolean($identifier_pattern)) { // if implicit boolean
    //  a single regex call here: `preg_match_all`
    // merge results into parameters all at once
    continue;
}
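The single-pass idea above could look roughly like the sketch below. It is not the library's implementation: the `parseImplicitBooleans()` function, its simplified key pattern, and the choice to return null for lines that carry explicit values are all assumptions made for illustration.

```php
<?php
// Hypothetical helper: extract every implicit boolean annotation on a line
// with a single preg_match_all call instead of incremental scanning.
function parseImplicitBooleans(string $line, string $identifier = '@'): ?array
{
    $id = preg_quote($identifier, '/');
    $key = '[A-Za-z_][A-Za-z0-9_.-]*'; // simplified key pattern (assumption)

    // The line must contain nothing but identifier + key pairs. A line such
    // as "@get @post @ajax float 2.1" is rejected up front, so there is no
    // loop that could fail to advance.
    if (!preg_match('/^\s*(?:' . $id . $key . '\s*)+$/', $line)) {
        return null;
    }

    // One pass: capture all keys at once.
    preg_match_all('/' . $id . '(' . $key . ')/', $line, $matches);

    // Merge results all at once: every captured key maps to true.
    return array_fill_keys($matches[1], true);
}

var_dump(parseImplicitBooleans('@get @post @ajax'));
var_dump(parseImplicitBooleans('@get @post @ajax float 2.1'));
```

In the real parser the result would still have to be merged into `$parameters` in whatever shape the rest of the code expects (the original loop appends to `$parameters[$key][]`).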

Or maybe refactor this https://github.com/marcioAlmada/annotations/blob/master/src/Minime/Annotations/Parser.php#L61, so that it handles the implicit boolean types before we get into the parsing loop.

The thing I don't get is why you want to remove the while. On its own, the while construct is not a bottleneck in the code. Micro-optimization can be dangerous if done everywhere. The code as it stands has a good CRAP index (around 3 or 4, I think) and is easily readable; adding more optimization can lead to bugs like the implicit boolean one. And don't forget that you want to add multi-line value parsing, so an if will have to be added somewhere in the parse method at some point, which will lead to a higher CRAP index.

The problem is not the while itself, it's the string scanning operations.

The StrScan class uses regex to advance towards the end of the string. This is like walking step by step, as someone does when they can't see and don't know the path.

When using a single regex call (using a more complex regex), it's like doing a big jump, like when someone can see what's in front and can just jump to the end of the path. This was done before here: af0b221

Oki, no problem .. I've updated my parser-improvement branch with your suggestion, so I no longer require StrScan at all .. I did not change the composer file .. I'll leave that change to you. The boost is significant .. I've gained almost 2 seconds 👍

I cloned your fork again with the latest updates and ran the same tests again. 😄 WE'RE FASTER NOW (10~12%) + your code looks nice and clean. So we have:

  • no more incremental string scanning
  • less code to maintain

[Benchmark screenshot: bench2]

Awesome result! I would love to merge the new optimized code.

Not yet, I've found a bug in the data_pattern regex. I'll work on a fix tomorrow.

Thanks for the improvements, especially @nyamsprod. I think we can close this one for now and reopen if necessary.

Cheers!