facebookarchive / flashback

Capture and replay real mongodb workloads

flashback and pcap_converter: add getmore support

timvaillancourt opened this issue

It looks like getmore isn't really supported in flashback or pcap_converter, so I'm tracking it here.

The executor in flashback currently does nothing for it: https://github.com/ParsePlatform/flashback/blob/master/ops_executor.go#L106-L108.

I'm curious what thoughts are out there on how to add this. Right now my best idea is tracking query -> cursorId mappings so that getmores can be called on real cursors, but I'm guessing this would introduce memory usage problems under some workloads. I might start a branch to try this out unless there are other ideas.
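To make the mapping idea concrete, here is a minimal sketch of a bounded map from cursor IDs seen in the recording to cursor IDs returned by the replay target. `CursorMap` and its methods are hypothetical names, not part of flashback. The sketch assumes the recorded cursor ID for each query is available, and it simply refuses new entries once a size cap is reached so that memory stays bounded.

```go
package main

import (
	"fmt"
	"sync"
)

// CursorMap is a hypothetical helper, not part of flashback. It maps a cursor
// ID seen in the recorded workload to the cursor ID the replay target returned.
type CursorMap struct {
	mu      sync.Mutex
	live    map[int64]int64
	maxSize int // cap on tracked cursors, to bound memory
}

func NewCursorMap(maxSize int) *CursorMap {
	return &CursorMap{live: make(map[int64]int64), maxSize: maxSize}
}

// Track stores recorded -> live. It returns false when either ID is 0 (the
// server reports 0 when no cursor is left open) or when the map is full.
func (m *CursorMap) Track(recorded, live int64) bool {
	if recorded == 0 || live == 0 {
		return false
	}
	m.mu.Lock()
	defer m.mu.Unlock()
	if _, exists := m.live[recorded]; !exists && len(m.live) >= m.maxSize {
		return false
	}
	m.live[recorded] = live
	return true
}

// Resolve returns the live cursor ID to use for a recorded getmore.
func (m *CursorMap) Resolve(recorded int64) (int64, bool) {
	m.mu.Lock()
	defer m.mu.Unlock()
	id, ok := m.live[recorded]
	return id, ok
}

// Release drops a mapping once the cursor is exhausted or killed.
func (m *CursorMap) Release(recorded int64) {
	m.mu.Lock()
	defer m.mu.Unlock()
	delete(m.live, recorded)
}

func main() {
	cursors := NewCursorMap(2)
	cursors.Track(1001, 7)
	if live, ok := cursors.Resolve(1001); ok {
		fmt.Println("replay getmore against live cursor", live)
	}
	cursors.Release(1001)
}
```

A getmore whose recorded cursor fails to resolve would have to be skipped or counted as unreplayable. That is a policy choice this sketch leaves open.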

If you make multiple passes through the replay file before playing (or at least do some significant read-ahead), you could build a list ahead of time of the queries that will need query -> cursorId mappings. That would minimize your memory footprint, since you wouldn't be building a query -> cursorId map for every query.
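A rough sketch of that pre-pass, assuming each recorded op carries a cursor ID (for a query, the one returned in the recorded reply; for a getmore, the one it asks for). `RecordedOp` and both functions are hypothetical and do not reflect flashback's actual op format.

```go
package main

import "fmt"

// RecordedOp is a hypothetical, simplified view of one recorded op.
type RecordedOp struct {
	Type     string // "query", "getmore", ...
	CursorID int64  // query: cursor ID from the recorded reply; getmore: cursor ID requested
}

// CursorsWithGetmores scans the recording once and returns the set of cursor
// IDs that at least one getmore refers to.
func CursorsWithGetmores(ops []RecordedOp) map[int64]struct{} {
	needed := make(map[int64]struct{})
	for _, op := range ops {
		if op.Type == "getmore" && op.CursorID != 0 {
			needed[op.CursorID] = struct{}{}
		}
	}
	return needed
}

// NeedsTracking reports whether a query's cursor must be mapped at replay
// time, i.e. whether a later getmore depends on it.
func NeedsTracking(op RecordedOp, needed map[int64]struct{}) bool {
	if op.Type != "query" {
		return false
	}
	_, ok := needed[op.CursorID]
	return ok
}

func main() {
	ops := []RecordedOp{
		{Type: "query", CursorID: 0},
		{Type: "query", CursorID: 1001},
		{Type: "getmore", CursorID: 1001},
	}
	needed := CursorsWithGetmores(ops)
	for i, op := range ops {
		fmt.Println(i, op.Type, NeedsTracking(op, needed))
	}
}
```

With a bounded read-ahead window instead of a full pass, the same check would apply only to the ops currently in the window.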

I think you also have to bind the getmores that you want to replay to the connection that issued the original query, so that could affect the current worker model which just blindly grabs ops off the channel to replay them. Depending on the workload you could have a significant percentage of the worker pool dealing with getmores and not able to work on other kinds of ops, which could affect throughput in ways that are not representative of the original workload. This might warrant a separate worker pool that only handles the query/getmore operations.
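One possible shape for this, sketched below, is to route every op by the connection it was recorded on, so that a query and its getmores always land on the same worker (which would own one connection to the replay target). `ConnOp`, `workerFor`, and `Dispatch` are hypothetical names, and the sketch assumes the recording includes a connection identifier per op. It does not address the throughput concern by itself. A dedicated query/getmore pool could reuse the same routing.

```go
package main

import (
	"fmt"
	"sync"
)

// ConnOp is a hypothetical recorded op tagged with its original connection.
type ConnOp struct {
	ConnID int64
	Type   string
}

// workerFor picks a stable worker index for a connection ID. n must be > 0.
func workerFor(connID int64, n int) int {
	idx := int(connID % int64(n))
	if idx < 0 {
		idx += n
	}
	return idx
}

// Dispatch starts n workers, each with its own channel, and sends every op to
// the worker chosen by workerFor. Ops from one recorded connection are handled
// by a single worker, in their recorded order.
func Dispatch(ops []ConnOp, n int, handle func(worker int, op ConnOp)) {
	chans := make([]chan ConnOp, n)
	var wg sync.WaitGroup
	for i := range chans {
		chans[i] = make(chan ConnOp, 16)
		wg.Add(1)
		go func(w int, ch <-chan ConnOp) {
			defer wg.Done()
			for op := range ch {
				handle(w, op)
			}
		}(i, chans[i])
	}
	for _, op := range ops {
		chans[workerFor(op.ConnID, n)] <- op
	}
	for _, ch := range chans {
		close(ch)
	}
	wg.Wait()
}

func main() {
	ops := []ConnOp{
		{ConnID: 1, Type: "query"},
		{ConnID: 2, Type: "query"},
		{ConnID: 1, Type: "getmore"},
	}
	Dispatch(ops, 2, func(worker int, op ConnOp) {
		fmt.Printf("worker %d: conn %d %s\n", worker, op.ConnID, op.Type)
	})
}
```

The trade-off is that one slow connection can back up its worker's channel and stall dispatch to the others, so the channel buffer size and the number of workers would need tuning against the real workload.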

As to how to tie the original queries to subsequent getmores - I'm not sure how you would do this with the original python 'record' tool. AFAIK the cursorid is not exposed in the profiler for the original query, nor is it in the mongodb logs. You could definitely get this out of the pcap file though since it is returned to the client.
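For the pcap route, here is a minimal sketch of pulling the cursor ID out of a legacy OP_REPLY message. It assumes the bytes passed in are one complete, already reassembled wire-protocol message. The function name is hypothetical, and this is not how pcap_converter is known to work. In the legacy wire protocol the 16-byte header is followed by a 4-byte `responseFlags` field and then the 8-byte `cursorID`, all little-endian. The header's `responseTo` field carries the `requestID` of the query being answered, which is what lets a reply be tied back to its query.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

const opReply = 1 // legacy OP_REPLY opcode

// ReplyCursorID extracts responseTo and cursorID from one raw OP_REPLY message.
// Layout: messageLength, requestID, responseTo, opCode (4 bytes each), then
// responseFlags (4 bytes) and cursorID (8 bytes).
func ReplyCursorID(msg []byte) (responseTo int32, cursorID int64, err error) {
	if len(msg) < 28 {
		return 0, 0, errors.New("message too short for OP_REPLY")
	}
	if op := int32(binary.LittleEndian.Uint32(msg[12:16])); op != opReply {
		return 0, 0, fmt.Errorf("not an OP_REPLY: opcode %d", op)
	}
	responseTo = int32(binary.LittleEndian.Uint32(msg[8:12]))
	cursorID = int64(binary.LittleEndian.Uint64(msg[20:28]))
	return responseTo, cursorID, nil
}

func main() {
	// Build a synthetic reply: fixed fields only, no documents.
	msg := make([]byte, 36)
	binary.LittleEndian.PutUint32(msg[0:4], uint32(len(msg)))
	binary.LittleEndian.PutUint32(msg[8:12], 7)   // responseTo: requestID of the query
	binary.LittleEndian.PutUint32(msg[12:16], opReply)
	binary.LittleEndian.PutUint64(msg[20:28], 1001) // cursorID

	responseTo, cursorID, err := ReplyCursorID(msg)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("query request", responseTo, "opened cursor", cursorID)
}
```

A cursor ID of 0 in the reply means the server left no cursor open, so only non-zero IDs would be worth recording.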

Yikes! Thanks @tredman. This is a bit more difficult than I expected.

Yes, I think you're right that the cursor IDs should be bound to a connection, or at least I'd be worried to find out what happens if you don't.

I'm not sure if I'll try to tackle this, but I'll leave this here for anyone who wants to pick it up if I don't.