Make import faster
aronwoost opened this issue · comments
Problem
Importing large volumes of email takes very long, since emails are imported one by one, sequentially. The Gmail API takes up to 5 seconds to handle a single email. This is independent of network conditions; I tested from EC2 and Google Compute Engine.
Possible solutions
Batch requests could be used. The upside is that the import would still work somewhat sequentially, by importing chunks of mails. The downside is that this lib would require a serious rework, since we could no longer rely on the handy service wrapper.
Or API calls could be fired in parallel (see Gmail quotas). This, of course, requires additional error and retry logic, but it is still the lower-hanging fruit, IMO.
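To make the parallel option a bit more concrete, here is a minimal sketch using only the standard library. It is not code from this project: `with_retries`, `import_in_parallel`, and the `import_one` callable (whatever function uploads a single message) are hypothetical names, and the worker count and backoff values are placeholders that would need to be tuned against the quotas.

```python
import random
import time
from concurrent.futures import ThreadPoolExecutor


def with_retries(func, arg, num_retries=10, base_delay=0.5):
    """Call func(arg), retrying on any exception with exponential backoff."""
    for attempt in range(num_retries + 1):
        try:
            return func(arg)
        except Exception:
            if attempt == num_retries:
                raise  # out of retries, surface the last error
            # Back off exponentially, with a little jitter so that
            # workers do not all retry at the same moment.
            delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
            time.sleep(delay)


def import_in_parallel(messages, import_one, max_workers=4, **retry_kwargs):
    """Run import_one over messages concurrently; results keep input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [
            pool.submit(with_retries, import_one, message, **retry_kwargs)
            for message in messages
        ]
        # result() re-raises the exception of any message that kept failing.
        return [future.result() for future in futures]
```

One caveat if `import_one` wraps google-api-python-client: the underlying `httplib2.Http` objects are not thread-safe, so each worker thread would need its own HTTP object or service instance.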
I'm aware that this project is not actively maintained. Still, this might be a good starting point for an intern coming to the Gmail team.
Gmail message import is blocking AFAIK, so only one message can get imported at a time. Even if we parallelize or batch, the improvement won't be huge (unless most messages are large, in which case this may make upload time more efficient).
If you do want to parallelize, as a workaround, you can run multiple instances of the script at the same time on separate mbox files. The script is set to 10 retries by default, but you can change it using the --num_retries argument.
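If all your mail is in one large mbox file, it first has to be split before several instances can work on it. A rough sketch using Python's standard `mailbox` module follows; `split_mbox` and the file names are made up for illustration and are not part of the script.

```python
import mailbox


def split_mbox(src_path, dest_paths):
    """Distribute the messages of one mbox file round-robin over several files.

    Returns the number of messages written to each destination.
    """
    src = mailbox.mbox(src_path)
    dests = [mailbox.mbox(path) for path in dest_paths]  # created if missing
    counts = [0] * len(dests)
    try:
        for index, message in enumerate(src):
            target = index % len(dests)
            dests[target].add(message)
            counts[target] += 1
    finally:
        for dest in dests:
            dest.close()  # close() flushes pending messages to disk
        src.close()
    return counts
```

For example, `split_mbox("all.mbox", ["part0.mbox", "part1.mbox"])` would produce two files of roughly equal message count, each of which could then be handed to its own instance of the script.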
Alternatively, you can change the script to use insert instead of import_. This means much faster inserts, but has several disadvantages, such as no threading, no deduplicating, and no classification.
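In terms of the Python client, the change amounts to calling `users().messages().insert(...)` where the script calls `users().messages().import_(...)`. The helper below is only a sketch of that switch, not the script's actual code: `build_upload_request` and its parameters are hypothetical, `service` is assumed to be a Gmail API client built with `googleapiclient.discovery.build`, and `raw_media` is assumed to be a media upload object wrapping the raw message.

```python
def build_upload_request(service, raw_media, label_ids,
                         use_insert=False, user_id="me"):
    """Return an un-executed request that uploads one message.

    use_insert=False -> messages.import_ (slower; normal delivery processing)
    use_insert=True  -> messages.insert  (faster; skips that processing)
    """
    messages = service.users().messages()
    # import_ has a trailing underscore because "import" is a Python keyword.
    method = messages.insert if use_insert else messages.import_
    return method(
        userId=user_id,
        body={"labelIds": label_ids},
        media_body=raw_media,
    )
```

The caller would still run `.execute()` on the returned request, so any existing retry handling around that call could stay as it is.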
Watch this video and follow it step by step; it may help you: https://youtu.be/beNmXaXQkr4
You can easily import all of your Gmail mbox files into your Gmail account without using any software.