The current process sends an attachment to the user only when the user requests the file. This causes two problems:
- There is a lag between the time the user requests the file and the time they receive it (due to the file size).
- The same file is uploaded and sent again every time it is requested.
In addition, the change proposed here also solves the problem of transferring files from one bucket to another.

The proposed solution:
- Leverage the attachment_id that Facebook returns on upload, so each file is uploaded once and then reused by ID.
- Create an additional collection in MongoDB that uses the URL as its primary key, mapping each URL to its attachment_id.
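A minimal sketch of that cache collection, assuming pymongo; the database and collection names (`bot`, `attachment_cache`) are placeholders, not the actual schema:

```python
from typing import Optional

from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
cache = client["bot"]["attachment_cache"]  # hypothetical names


def save_attachment_id(url: str, attachment_id: str) -> None:
    # Using the URL as Mongo's _id makes it the primary key, so each
    # URL maps to exactly one attachment_id and re-runs are idempotent.
    cache.update_one(
        {"_id": url},
        {"$set": {"attachment_id": attachment_id}},
        upsert=True,
    )


def get_attachment_id(url: str) -> Optional[str]:
    # A cache hit means the file was already uploaded once:
    # reuse the ID instead of sending the file again.
    doc = cache.find_one({"_id": url})
    return doc["attachment_id"] if doc else None
```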
There are a few hundred links in our database for each client (each file is 300 KB-15 MB), so uploading them synchronously would take a long time: the work is IO-bound. The migration therefore needs to:
- Convert all links to the attachment_id provided by Facebook (a sketch of the upload call follows this list).
- Use asynchronous access for the downloads and uploads.
- Migrate the files from the dev bucket to the production bucket.
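A sketch of that upload call against the Messenger Attachment Upload API; the Graph API version, the `"file"` attachment type, and the `page_token` parameter are assumptions to adjust for the real deployment:

```python
import aiohttp

GRAPH_URL = "https://graph.facebook.com/v2.6/me/message_attachments"


async def fetch_attachment_id(session: aiohttp.ClientSession,
                              url: str, page_token: str) -> str:
    # Upload once with is_reusable=True; Facebook returns an
    # attachment_id that later sends can reference directly.
    payload = {
        "message": {
            "attachment": {
                "type": "file",  # or "image"/"video"/"audio" per file type
                "payload": {"is_reusable": True, "url": url},
            }
        }
    }
    async with session.post(GRAPH_URL,
                            params={"access_token": page_token},
                            json=payload) as resp:
        resp.raise_for_status()
        data = await resp.json()
        return data["attachment_id"]
```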
Tools:
- Python
- aiohttp (asynchronous HTTP client, used for the downloads and the Facebook uploads)
- aiofiles (asynchronous file IO, used to write downloaded files to disk)
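A sketch of a single download using these two libraries, streaming to disk so a 15 MB file never has to sit fully in memory; the chunk size is an arbitrary choice:

```python
import aiofiles
import aiohttp


async def download(session: aiohttp.ClientSession,
                   url: str, dest: str) -> None:
    # Stream the response body to disk in 64 KB chunks.
    async with session.get(url) as resp:
        resp.raise_for_status()
        async with aiofiles.open(dest, "wb") as f:
            async for chunk in resp.content.iter_chunked(64 * 1024):
                await f.write(chunk)
```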
Migration steps (a sketch of the full pipeline follows the list):
- Get all URLs from the database.
- Filter out invalid and dead URLs, keeping only working ones.
- Download all files (async).
- Upload each file to the new cloud bucket (blocking).
- Upload each file to Facebook to obtain its attachment ID (async).
- Update the database entries (remove the existing attachment_id and update the URL).
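A sketch tying the steps together, reusing the helpers sketched above (`download`, `fetch_attachment_id`, `save_attachment_id`); `upload_to_bucket`, `PAGE_TOKEN`, and `valid_urls` are placeholders for the real bucket client, page access token, and the filtered URL list from step 2:

```python
import asyncio

import aiohttp

PAGE_TOKEN = "..."  # placeholder: real page access token
SEM_LIMIT = 8       # see the note below; tune for network stability


async def process_url(sem: asyncio.Semaphore,
                      session: aiohttp.ClientSession, url: str) -> None:
    async with sem:  # at most SEM_LIMIT URLs in flight at once
        path = "/tmp/" + url.rsplit("/", 1)[-1]
        await download(session, url, path)            # step 3 (async)
        # upload_to_bucket is a placeholder for the real blocking client.
        new_url = upload_to_bucket(path)              # step 4 (blocking)
        attachment_id = await fetch_attachment_id(    # step 5 (async)
            session, new_url, PAGE_TOKEN)
        save_attachment_id(new_url, attachment_id)    # step 6


async def main(valid_urls: list) -> None:
    sem = asyncio.Semaphore(SEM_LIMIT)
    async with aiohttp.ClientSession() as session:
        await asyncio.gather(
            *(process_url(sem, session, u) for u in valid_urls))
```

Note that the blocking bucket upload stalls the event loop while it runs; if that becomes a bottleneck, it could be pushed onto a thread with `loop.run_in_executor`.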
P.S. The program's semaphore is set to 8; it can be raised or lowered depending on network stability.