readium / readium-sdk

A C++ ePub renderer SDK

Really slow retrieval of manifest item by relative path with massive spines.

mickael-menu-mantano opened this issue

Part of this pull request is fixing this issue:
#198

Media Overlays code reference points (copy / paste from the email discussion):

I fixed a similar performance issue a long time ago with the Media
Overlays path matcher (I resorted to a cached map).

std::map<string, std::shared_ptr<ManifestItem>> cache_smilRelativePathToManifestItem;

std::map<std::shared_ptr<ManifestItem>, string> cache_manifestItemToAbsolutePath;
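For readers who have not seen the Media Overlays code, here is a minimal sketch of the cached-map idea. It is not the actual readium-sdk implementation: `Item` is a hypothetical stand-in for `ManifestItem`, and `PathIndex` is an invented name. The assumption is that each item's absolute path is computed once when the index is built, so every later lookup is a single map search instead of a scan that rebuilds paths for every manifest item.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for ePub3::ManifestItem; only the path matters here.
struct Item {
    std::string absolutePath;
};

// Hypothetical index, built once per package. Each lookup is then O(log n)
// rather than a linear scan that recomputes every item's path.
class PathIndex {
public:
    explicit PathIndex(const std::vector<std::shared_ptr<Item>>& items) {
        for (const auto& item : items) {
            assert(item != nullptr);
            // emplace keeps the first item if two items share a path.
            byPath_.emplace(item->absolutePath, item);
        }
    }

    // Returns nullptr when no item has this path.
    std::shared_ptr<Item> find(const std::string& path) const {
        auto it = byPath_.find(path);
        return it == byPath_.end() ? nullptr : it->second;
    }

private:
    std::map<std::string, std::shared_ptr<Item>> byPath_;
};
```

With a massive spine, the one-time cost of building the index is paid back as soon as more than a handful of paths are resolved. A `std::unordered_map` would work equally well if ordering is not needed.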

So, a lot of string concatenation / manipulations occur at that point:

string ManifestItem::AbsolutePath() const
https://github.com/readium/readium-sdk/blob/develop/ePub3/ePub/manifest.cpp#L183

BaseHref
https://github.com/readium/readium-sdk/blob/develop/ePub3/ePub/manifest.cpp#L203

Also, note that when the first loop iteration fails, we have to check
for lower/upper-case percent encoding mismatches! Hopefully, your test
did not include this codepath? (otherwise it would have added a huge
processing cost)
https://github.com/readium/readium-sdk/blob/develop/ePub3/ePub/package.cpp#L173
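One way to avoid the second pass entirely would be to normalize the percent-encoding case once, before paths are compared or used as cache keys. The sketch below is only an illustration under that assumption; `NormalizePercentEncoding` is a hypothetical helper, not a function in readium-sdk, and it assumes the mismatch is limited to the case of the hex digits.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper: upper-cases the two hex digits after each '%' so that
// "%c3%a9" and "%C3%A9" compare equal. All other characters are untouched.
std::string NormalizePercentEncoding(const std::string& path) {
    std::string out = path;
    for (std::string::size_type i = 0; i + 2 < out.size(); ++i) {
        const unsigned char h1 = static_cast<unsigned char>(out[i + 1]);
        const unsigned char h2 = static_cast<unsigned char>(out[i + 2]);
        if (out[i] == '%' && std::isxdigit(h1) && std::isxdigit(h2)) {
            out[i + 1] = static_cast<char>(std::toupper(h1));
            out[i + 2] = static_cast<char>(std::toupper(h2));
            i += 2;  // skip the two digits just normalized
        }
    }
    return out;
}
```

If both the manifest hrefs and the requested path go through the same normalization, a single exact-match lookup covers the lower/upper-case variants.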

see getReferencedManifestItem():
https://github.com/readium/readium-sdk/blob/develop/ePub3/ePub/media-overlays_smil_model.cpp#L616

This improved performance by an order of magnitude!

I found that this is super slow:
string::size_type i = iri.find_first_of('#');

compared to plain old:

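const char * str = iri.c_str();
const string::size_type size = iri.size(); // assumed: 'size' is the IRI length
string::size_type i = string::npos;
for (string::size_type j = 0; j < size; j++) {
    if (str[j] == '#') { i = j; break; } // stop at the first '#', like find_first_of
}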

See the timer I used to measure the difference:

4ee989d
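The commit above holds the timer that was actually used. For anyone who wants to reproduce the comparison on their own platform, a rough sketch with `std::chrono::steady_clock` might look like this. The names `FindHashStd`, `FindHashLoop` and `TimeMicros` are made up for illustration, and the result will depend heavily on the standard library and compiler flags, so no particular outcome is implied.

```cpp
#include <chrono>
#include <string>

// Variant 1: the standard library search.
std::string::size_type FindHashStd(const std::string& iri) {
    return iri.find_first_of('#');
}

// Variant 2: the hand-written loop over the raw character buffer.
std::string::size_type FindHashLoop(const std::string& iri) {
    const char* str = iri.c_str();
    const std::string::size_type size = iri.size();
    for (std::string::size_type j = 0; j < size; ++j) {
        if (str[j] == '#') return j;
    }
    return std::string::npos;
}

// Runs fn(iri) 'repeats' times and returns the elapsed microseconds.
// The summed results are written to *sink so the optimizer cannot drop the calls.
template <typename Fn>
long long TimeMicros(Fn fn, const std::string& iri, int repeats,
                     std::string::size_type* sink) {
    const auto start = std::chrono::steady_clock::now();
    std::string::size_type acc = 0;
    for (int r = 0; r < repeats; ++r) acc += fn(iri);
    const auto end = std::chrono::steady_clock::now();
    *sink = acc;
    return std::chrono::duration_cast<std::chrono::microseconds>(end - start).count();
}
```

Calling `TimeMicros(FindHashStd, iri, n, &sink)` and `TimeMicros(FindHashLoop, iri, n, &sink)` on the same IRI, in an optimized build, gives two numbers that can be compared directly.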

Closed, see PR #208