cviebig / lib-sql

SQL parse tree in C++

Support identifiers without extra allocation

eyalroz opened this issue:

At the moment, identifiers incur the stiff penalty of constructing a string, including allocation and copying of data from the unparsed string. It would be useful if that could be made optional, so that we store a mere std::string_view (or gsl::string_view) pointing into the text we parsed from.

Possible? It should speed up parsing...
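
To make the request concrete, here is a minimal sketch of the two behaviors side by side. The helpers `scan_identifier` and `copy_identifier` are hypothetical names invented for illustration and are not part of lib-sql; the identifier syntax (`[A-Za-z_][A-Za-z0-9_]*`) is likewise an assumption.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helper (not lib-sql API): scans an identifier of the assumed
// form [A-Za-z_][A-Za-z0-9_]* starting at `pos`. The result is a view into
// `input`, so no allocation or copy takes place. An empty view means that
// no identifier starts at `pos`.
inline std::string_view scan_identifier(std::string_view input, std::size_t pos) {
    assert(pos <= input.size());
    auto is_start = [](unsigned char c) { return std::isalpha(c) || c == '_'; };
    auto is_part  = [](unsigned char c) { return std::isalnum(c) || c == '_'; };
    if (pos == input.size() || !is_start(input[pos])) return {};
    std::size_t end = pos + 1;
    while (end < input.size() && is_part(input[end])) ++end;
    return input.substr(pos, end - pos);
}

// The copying behavior for comparison: same scan, but the characters are
// copied into a std::string (which may allocate for longer identifiers).
inline std::string copy_identifier(std::string_view input, std::size_t pos) {
    return std::string(scan_identifier(input, pos));
}
```

With the input `"SELECT name FROM users"`, `scan_identifier(sql, 7)` yields a view of `name` whose `data()` pointer lies inside the original buffer, whereas `copy_identifier` returns an independent string.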

I would expect, though I did not check it, that the allocation only takes place after successfully parsing the rule, and so, at least for identifiers, probably happens rarely. Still, I see that when large numbers of strings are parsed, for example in inserts without parameters, this could indeed be an issue.

Maybe we can make it configurable by introducing a macro? Using C++17 variable template parameters for grammar rules could be interesting too, as it could also help with enabling extension points for non-standard syntax amendments. What do you think?

Another performance issue that might be worth looking into is the parsing of subqueries and precedence climbing. If I remember correctly, that caused noticeable slowdowns.

Well, even if you only had a single allocation after parsing, you would still have a very high number of allocations; and an allocation is very costly - hundreds up to tens of thousands of clock cycles; using a string_view would require no further allocation at all. Also, when you parse, you try to match rules, and some of that might be speculative, so there might be even more allocations.

Another possible detriment of allocation is exceeding cache sizes, but that's secondary.

About making this configurable: I would think it first needs to be figured out whether there's any benefit to using std::strings. Also, there are actually multiple options:

  1. The current behavior with std::string
  2. Using a std::string with a custom allocator (we might pre-allocate space the size of the query, then sub-allocate space for individual strings within it, then finally trim the area).
  3. Use of string_views, no allocation
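
Option 2 could look roughly like the following sketch, which swaps in the C++17 `std::pmr` facilities rather than a hand-written allocator. `QueryArena` and `make_string` are hypothetical names, and the final "trim" step from the list is not shown. Passing `std::pmr::null_memory_resource()` as the upstream is a deliberate choice for the sketch: if the pre-allocated block runs out, the allocation throws `std::bad_alloc` instead of silently falling back to the heap.

```cpp
#include <cstddef>
#include <memory_resource>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical arena (not lib-sql API): one up-front block, from which all
// strings created through make_string() are sub-allocated.
struct QueryArena {
    std::vector<std::byte> storage;                // the single pre-allocated block
    std::pmr::monotonic_buffer_resource resource;  // bump allocator over `storage`

    // `bytes` is chosen by the caller, e.g. derived from the query length.
    explicit QueryArena(std::size_t bytes)
        : storage(bytes),
          resource(storage.data(), storage.size(),
                   std::pmr::null_memory_resource()) {}

    // Copies `text` into the arena. Strings short enough for the small-string
    // optimization may not touch the arena at all.
    std::pmr::string make_string(std::string_view text) {
        return std::pmr::string(text.data(), text.size(), &resource);
    }
};
```

The strings still copy their characters, so this removes the per-string heap allocation but not the copying; the arena must also outlive every string created from it.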

Hey, I'm sorry for not answering earlier. I must have missed the notification e-mail.

True, with large numbers of allocations this will not be efficient. I favored a non-string-view solution because of my personal preference for self-contained data structures. When using string views, you have to make sure to always keep the original input text alive. Also, I was expecting SQL parsing not to be in the hot path, but I see the point of not wasting resources.
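
One way to reconcile views with a self-contained result, sketched under assumptions: the parse result keeps the source text alive itself through a shared pointer, so the views cannot outlive the buffer. `OwnedParse` and `add_token` are hypothetical names, not anything from lib-sql.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// Hypothetical result type (not lib-sql API): owns the input text and stores
// tokens as views into it.
class OwnedParse {
public:
    explicit OwnedParse(std::string text)
        : source_(std::make_shared<const std::string>(std::move(text))) {}

    // Records the token at [pos, pos + len) as a view into the owned buffer.
    // substr() throws std::out_of_range if pos is past the end of the text.
    void add_token(std::size_t pos, std::size_t len) {
        tokens_.push_back(std::string_view(*source_).substr(pos, len));
    }

    const std::vector<std::string_view>& tokens() const { return tokens_; }

private:
    // Copies and moves of OwnedParse share this one string object, so its
    // character buffer never relocates while any copy is alive.
    std::shared_ptr<const std::string> source_;
    std::vector<std::string_view> tokens_;
};
```

Holding the text behind a `shared_ptr` rather than as a plain `std::string` member matters here: moving a `std::string` with small-string optimization can relocate its characters and would leave the views dangling.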

Do you think that such a pool or arena allocator could also be of use for allocations of recursive variant types? Though http://en.cppreference.com/w/cpp/utility/variant mentions that std::variant does not support allocators. I guess I have to read up a bit on this?

Umm, I don't see how std::variant should need allocators though... even recursive variants actually have the maximum size of the largest possible value over the entire variant options "tree".

But how can they determine the maximum size without knowing how deep the recursion will be at runtime? :-) https://stackoverflow.com/a/39456323/1531656 clarifies that std::variant does not support recursion on its own and thus does not allocate, while boost::variant indeed does.
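
A small sketch of the point being settled here: a std::variant has a fixed size and never allocates by itself, so a recursive tree needs explicit indirection, and it is that indirection which allocates. The `Expr`, `BinaryExpr`, `make_binary` and `evaluate` names are made up for illustration; `std::unique_ptr` stands in for what boost::recursive_wrapper does internally.

```cpp
#include <memory>
#include <utility>
#include <variant>

struct BinaryExpr;  // forward declaration; completed below

// The variant holds either a leaf value or a pointer to a child node.
// Its size is fixed at compile time regardless of how deep a tree gets.
using Expr = std::variant<int, std::unique_ptr<BinaryExpr>>;

struct BinaryExpr {
    char op;   // '+' or '*' in this sketch
    Expr lhs;
    Expr rhs;
};

// Each inner node costs one heap allocation, made by make_unique and not by
// std::variant. This is where a pool or arena could help.
inline Expr make_binary(char op, Expr lhs, Expr rhs) {
    return std::make_unique<BinaryExpr>(
        BinaryExpr{op, std::move(lhs), std::move(rhs)});
}

inline int evaluate(const Expr& e) {
    if (const int* leaf = std::get_if<int>(&e)) return *leaf;
    const BinaryExpr& node = *std::get<std::unique_ptr<BinaryExpr>>(e);
    const int l = evaluate(node.lhs);
    const int r = evaluate(node.rhs);
    return node.op == '+' ? l + r : l * r;
}
```

For example, `make_binary('+', 1, make_binary('*', 2, 3))` builds a two-node tree with two allocations, and `evaluate` on it returns 7.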