alphagov / asset-manager

Manages uploaded assets (images, PDFs etc.) for applications on GOV.UK

Home Page: https://docs.publishing.service.gov.uk/apps/asset-manager.html

Allow Asset Manager API to return size

chrislo opened this issue

The AssetManagerAndQuarentinedFileStorage engine implements a size method that uses the quarantined file to calculate the size.

We plan to switch the storage engine for AttachmentUploader to AssetManagerStorage and therefore store all files exclusively in Asset Manager. This means we will need to implement the size method on AssetManagerStorage. Because the assets will no longer be on disk, the size will have to be calculated by making a call to the Asset Manager API instead. We have migrated all the existing attachments to Asset Manager and are adding new ones to both the disk and Asset Manager. That means we can make this change now - it will make it easier to subsequently store the attachments exclusively in Asset Manager.
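As a rough illustration of the idea (not the actual Whitehall code), a size method that delegates to an API client rather than the filesystem might look like the sketch below. The class names, the `asset_attributes` method and the `"size"` key are all hypothetical, and the client is a stand-in so the example runs without a network.

```ruby
# Hypothetical sketch: a storage-engine file object that asks an API client
# for the asset's size instead of reading a file on disk.
class AssetManagerFile
  def initialize(asset_path, client)
    @asset_path = asset_path
    @client = client # anything responding to #asset_attributes(path) => Hash
  end

  # Returns the size in bytes as reported by the API, or nil if the
  # response has no "size" key (the key name is an assumption).
  def size
    attributes = @client.asset_attributes(@asset_path)
    attributes["size"]
  end
end

# Stand-in for a real HTTP client (hypothetical).
class FakeAssetManagerClient
  def initialize(assets)
    @assets = assets
  end

  def asset_attributes(path)
    @assets.fetch(path)
  end
end

example_path = "/government/uploads/example.pdf"
client = FakeAssetManagerClient.new(example_path => { "size" => 1024 })
file = AssetManagerFile.new(example_path, client)
puts file.size
```

Injecting the client keeps the size lookup testable without making HTTP requests.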

  • Store file size information in the Asset Manager Database when a new asset is added
  • Add file size information to the JSON returned by Asset Manager
  • Run db:set_size_for_all_uploaded_assets on integration
  • Run db:set_size_for_all_uploaded_assets on staging
  • Run db:set_size_for_all_uploaded_assets on production
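The steps above can be sketched in plain Ruby, assuming a simplified `Asset` class rather than the real Asset Manager model. The `set_size!` and `set_size_for_all` names and the JSON shape are hypothetical. The sketch also backfills synchronously, whereas the real migration was worker-based.

```ruby
require "json"
require "tempfile"

# Hypothetical plain-Ruby stand-in for an asset record.
class Asset
  attr_reader :path, :size

  def initialize(path)
    @path = path
    @size = nil
  end

  # Record the file's size in bytes, read from disk.
  def set_size!
    @size = File.size(@path)
  end

  # JSON representation including the size field (field names assumed).
  def to_json(*args)
    { "name" => File.basename(@path), "size" => @size }.to_json(*args)
  end
end

# Backfill: set the size on every asset that does not yet have one.
def set_size_for_all(assets)
  assets.select { |asset| asset.size.nil? }.each(&:set_size!)
end

upload = Tempfile.new("asset")
upload.write("hello")
upload.flush

asset = Asset.new(upload.path)
set_size_for_all([asset])
puts asset.to_json
```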

The code in PR #483 has been deployed to production, so we are now saving asset sizes for newly uploaded assets.

The worker-based migration added in #484 is currently being run on integration and staging.

The migration took just under an hour to run (rake task + processing all jobs) on staging. It's considerably slower on integration (around 150 job/s compared to almost 3000 job/s on staging).

[Screenshot: 2018-02-20 at 10:22:21]

I've asked 2ndline to run the task on production.

The migration completed in around 80 mins on production.

[Screenshot: 2018-02-20 at 11:34:52]

And it completed in around 4.5 hours on integration.

[Screenshot: 2018-02-20 at 12:52:36]

Following continued work switching Whitehall AttachmentUploader to use the AssetManagerStorage engine, which prompted me to create this issue, I've discovered that we don't yet need to fetch the size of the asset into Whitehall. It was used to validate replaced assets, but I've found a way to avoid having to fetch the actual size. The size of the attachment is also displayed to the public, but Whitehall currently caches that in its database on initial upload.

I think it would be good to use the size from Asset Manager rather than a cached version in Whitehall, as it would mean we could make attachments behave more like other asset types. So for now, I don't think it's worthwhile removing the size field I added in this issue from Asset Manager, but I am going to close this issue.