benbalter / jekyll-include-cache

A Jekyll plugin to cache the rendering of Liquid includes

Potential memory bloat

ashmaroli opened this issue · comments

Tl;dr

This ticket is to pitch the possibility of the following:

  • Bail out earlier if the rendering won't eventually be cached.
  • Consider hashing the params (params.hash) before passing them to Digest.
  • Cache the generated Digest key itself.

Summary

Using {% include ... %} within a Liquid loop is known to increase build times. But in such cases,
this plugin should consider memory usage (within practical limits) as well.

While I agree that the cache itself will grow over time, the intermediate memory usage can be reduced.

Details

This ticket is regarding the following lines (the L# references below point to line numbers in the plugin's source file, not to this excerpt):

path = path(context)
params = parse_params(context) if @params
key = key(path, params)
return unless path

def key(path, params)
  Digest::MD5.hexdigest(path.to_s + params.to_s)
end

👉 L#10 is executed irrespective of whether the tag will eventually be rendered (and therefore allocates memory even if the method returns at L#11).

The allocation due to L#10 can be huge in certain situations. For example (sourced from an actual repo):

<ul class="list">
  {% for post in page.posts %}
    {% if post.categories contains 'links' %}
      <li class="list__item">
        <div class="card card--link">
    =>    {% include_cached components/link-card.html link=post %}
        </div>
      </li>
    {% else %}
      <li class="list__item list__item--large">
        <div class="card card--article">
    =>    {% include_cached components/post-card.html page=post %}
        </div>
      </li>
    {% endif %}
  {% endfor %}
</ul>

In both uses above, #parse_params is going to yield the post object jsonified (or perhaps the actual Jekyll::Document object), neither of which is a small object.
Consequently, when the params Hash is stringified at L#31, the entire JSON string (or the result of Jekyll::Document#to_s) gets passed to Digest::MD5.

Since this happens before the cache is even checked, significant memory is allocated on every render of the tag.
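
For a rough sense of the difference, here is a standalone illustration (the post stand-in below is obviously far smaller than a real Jekyll::Document, and the numbers are only indicative):

require "digest"

post   = { "title" => "A post", "content" => "x" * 50_000 } # stand-in for a Jekyll::Document
path   = "components/link-card.html"
params = { "link" => post }

# Current behaviour: the whole params Hash is serialized into one large String,
# only to be discarded once the digest is computed.
Digest::MD5.hexdigest(path + params.to_s)

# Pitched alternative: Hash#hash is a small Integer derived from the contents,
# so only a short String is built before hashing.
Digest::MD5.hexdigest(path + params.hash.to_s)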

@ashmaroli thanks for this. Would the solution be to return after the path is calculated?

To me the solution is multi-pronged, as listed under the Tl;dr above. To elaborate further:

  1. Yes, 👍 to return unless path right after calculating path.
  2. However, if path is valid, #key is still going to receive the path and the large params object (per the cited example) as arguments. I was pitching for key = key(path, params.hash) instead.
  3. That said, the key technically has to be computed for every instance of the tag before the Cache can be checked. Therefore, a layout containing {% include_cached file.html foo='bar' %} will generate a new String object (with the same Digest value) on every page render. So it'd be awesome if the digest key itself could be cached as well. A rough sketch combining all three points follows below.
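
Put together, a rough sketch of all three points against the snippet quoted above (not a tested patch; the @digest_cache memo and its key are placeholders, and the actual cache lookup is elided):

def render(context)
  path = path(context)
  return unless path                 # 1. bail out before touching params

  params = parse_params(context) if @params
  key    = key(path, params.hash)    # 2. pass the small Integer, not the Hash itself

  # ... existing cache lookup / rendering, unchanged ...
end

def key(path, params_hash)
  # 3. memoize the digest so the same tag doesn't rebuild the String on every render
  @digest_cache ||= {}
  @digest_cache[[path, params_hash]] ||= Digest::MD5.hexdigest(path.to_s + params_hash.to_s)
end

One caveat: Hash#hash is only stable within a single Ruby process, which should be fine for an in-memory cache but would matter if the cache were ever persisted across builds.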

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.