google / import-mailbox-to-gmail

Import .mbox files into Google Workspace

Make import faster

aronwoost opened this issue

Problem

Importing large volumes of email takes a very long time, since emails are imported sequentially, one by one. The Gmail API takes up to 5 seconds to handle a single email. This is independent of network conditions: I tested from EC2 and Google Compute Engine.

Possible solutions

Batch requests could be used. The upside is that the import would still run somewhat sequentially, in chunks of mails. The downside is that this lib would require serious rework, since we could no longer rely on the handy service wrapper.
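To make the chunking idea concrete, here is a minimal sketch. It assumes a `service` object built with google-api-python-client and uses that client's `new_batch_http_request` helper; `chunked`, `import_in_batches`, and the `batch_size` of 50 are hypothetical names and values chosen for illustration, not part of the script. It also assumes the message is sent as a base64url `raw` field rather than as a media upload, which is not how the current script uploads.

```python
from itertools import islice


def chunked(items, size):
    """Yield successive lists of at most `size` items."""
    it = iter(items)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def import_in_batches(service, raw_messages, user_id="me", batch_size=50):
    """Queue one import_ call per message and send them chunk by chunk.

    `raw_messages` is an iterable of base64url-encoded RFC 822 strings.
    Returns a list of (request_id, response, exception) tuples, one per
    message, so failed items can be retried by the caller.
    """
    results = []

    def on_result(request_id, response, exception):
        # Called once per queued request when the batch completes.
        results.append((request_id, response, exception))

    for chunk in chunked(raw_messages, batch_size):
        batch = service.new_batch_http_request(callback=on_result)
        for raw in chunk:
            batch.add(
                service.users().messages().import_(
                    userId=user_id, body={"raw": raw}
                )
            )
        batch.execute()
    return results
```

Each chunk is still sent as one HTTP round trip, so whether this helps depends on how the server processes the individual calls inside a batch.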

Alternatively, API calls could be fired in parallel (see the Gmail quotas). This of course requires additional error-handling and retry logic, but it is still the lower-hanging fruit, IMO.
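A rough sketch of what the parallel variant with retry logic could look like, using only the standard library. `import_one` is a hypothetical callable that uploads a single message (for example a thin wrapper around the existing import call); `with_retries`, `import_parallel`, and the default values are illustrative assumptions too. Since the HTTP transport used by google-api-python-client is documented as not thread-safe, each worker would probably need its own service object.

```python
import random
import time
from concurrent.futures import ThreadPoolExecutor


def with_retries(func, arg, num_retries=10, base_delay=1.0, sleep=time.sleep):
    """Call func(arg), retrying with exponential backoff plus jitter."""
    for attempt in range(num_retries + 1):
        try:
            return func(arg)
        except Exception:
            if attempt == num_retries:
                raise  # out of retries: let the caller see the error
            sleep(base_delay * (2 ** attempt) + random.random())


def import_parallel(import_one, messages, max_workers=4, **retry_kwargs):
    """Run import_one over messages in a thread pool.

    Returns one entry per message, in input order: either the return
    value of import_one or the exception that exhausted the retries.
    """
    def task(message):
        try:
            return with_retries(import_one, message, **retry_kwargs)
        except Exception as exc:
            return exc

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(task, messages))
```

Returning exceptions instead of raising them keeps one bad message from aborting the whole run; the caller can log or re-queue the failures afterwards.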

I'm aware that this project is not actively maintained. Still, this might be a good starting point for an intern joining the Gmail team.

Gmail message import is blocking AFAIK, so only one message can get imported at a time. Even if we parallelize or batch, the improvement won't be huge (unless most messages are large, in which case overlapping the uploads may save some time).

If you do want to parallelize, as a workaround, you can run multiple instances of the script at the same time on separate mbox files. The script is set to 10 retries by default, but you can change it using the --num_retries argument.
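For the workaround above, one way to get separate mbox files is to split a large one with Python's standard `mailbox` module. This is a sketch under assumptions: `split_mbox` and the `part-N` directory layout are made up for illustration, and each part keeps the original file name in its own subdirectory on the assumption that the importer derives something (such as a label) from the file name.

```python
import mailbox
import os


def split_mbox(source_path, out_dir, parts):
    """Distribute the messages of one mbox round-robin over `parts` files.

    Each part is written to out_dir/part-<i>/<original file name>.
    Returns the list of paths written.
    """
    name = os.path.basename(source_path)
    paths = []
    for i in range(parts):
        part_dir = os.path.join(out_dir, f"part-{i}")
        os.makedirs(part_dir, exist_ok=True)
        paths.append(os.path.join(part_dir, name))

    outputs = [mailbox.mbox(p) for p in paths]
    source = mailbox.mbox(source_path, create=False)
    try:
        for index, message in enumerate(source):
            outputs[index % parts].add(message)
    finally:
        for out in outputs:
            out.flush()
            out.close()
        source.close()
    return paths
```

Each `part-N` directory would then still need to be arranged in whatever folder layout the script expects before running one instance per part.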

Alternatively, you can change the script to use insert instead of import_. This means much faster inserts, but has several disadvantages, such as no threading, no deduplicating, and no classification.
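The difference between the two calls can be sketched as follows. This is not the script's actual code: `upload_message` and its `fast` flag are hypothetical, `service` is assumed to be a Gmail API service object from google-api-python-client, and the message is passed as a base64url `raw` field rather than as a media upload.

```python
import base64


def upload_message(service, rfc822_bytes, user_id="me", fast=False,
                   label_ids=None):
    """Upload one RFC 822 message with either insert or import_.

    fast=False uses messages.import_ (slower, full processing).
    fast=True uses messages.insert (faster, with the trade-offs
    described above).
    """
    body = {"raw": base64.urlsafe_b64encode(rfc822_bytes).decode("ascii")}
    if label_ids:
        body["labelIds"] = list(label_ids)

    messages = service.users().messages()
    if fast:
        request = messages.insert(
            userId=user_id, body=body, internalDateSource="dateHeader"
        )
    else:
        request = messages.import_(
            userId=user_id,
            body=body,
            internalDateSource="dateHeader",
            neverMarkSpam=True,
        )
    return request.execute()
```

Keeping the choice behind a single flag would make it easy to compare both paths on a small test mailbox before committing to one.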

Watch this video and follow it step by step; it may help you: https://youtu.be/beNmXaXQkr4

You can easily import all your Gmail mbox files into your Gmail account without using any software.