shrinerb / shrine

File Attachment toolkit for Ruby applications

Home Page: https://shrinerb.com

Hard to wrap uploads in a transaction

mcelicalderon opened this issue · comments

Brief Description

I find it difficult, or at least not optimal, to keep the DB in a consistent state when using the versions plugin and an exception is raised during processing. If I wrap the creation of an AR entity in a transaction and it fails while processing the versions, I still end up with the entity persisted in the DB. Even worse, the attached field, which I expect to return a hash with the versions, returns an uploader instead.

Expected behavior

If the transaction is rolled back, no trace of it should remain in the DB.

Actual behavior

Traces of the new or updated entity are persisted to the DB.

Simplest self-contained example code to demonstrate issue

require 'active_record'
require 'shrine'
require 'shrine/storage/file_system'
require 'tmpdir'
require 'down'

Shrine.storages = {
  cache: Shrine::Storage::FileSystem.new(File.join(Dir.tmpdir, 'shrine')),
  store: Shrine::Storage::FileSystem.new(File.join(Dir.tmpdir, 'shrine'))
}

Shrine.plugin :activerecord
Shrine.plugin :delete_promoted
Shrine.plugin :remove_invalid
Shrine.plugin :determine_mime_type, analyzer: :marcel
Shrine.plugin :infer_extension
Shrine.plugin :processing
Shrine.plugin :versions
Shrine.plugin :copy
Shrine.plugin :remote_url, max_size: 50 * 1024 * 1024
Shrine.plugin :validation_helpers, default_messages: {
  max_size: ->(max) { I18n.t('errors.file.max_size', max_size: max) },
  mime_type_inclusion: ->(whitelist) { I18n.t('errors.file.mime_type_inclusion', whitelist: whitelist) }
}

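class MyUploader < Shrine
  # Runs during promotion (cache -> store), which is triggered after commit.
  process(:store) do |io, _context|
    io.download do |download|
      # Simulate a processing failure before any versions are returned.
      raise 'Something went wrong!'

      { original: io }
    end
  end
end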

ActiveRecord::Base.establish_connection(adapter: 'sqlite3', database: ':memory:')
ActiveRecord::Base.connection.create_table(:posts) { |t| t.text :image_data }

class Post < ActiveRecord::Base
  include MyUploader::Attachment.new(:image)
end

post = Post.new(image: Down.download('https://dummyimage.com/600x400/000/fff'))

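begin
  Post.transaction do
    post.save
  end
rescue StandardError
  puts post.id  # an id is present, so the row was persisted despite the error
  p post.image  # inspect the attachment: not the expected hash of versions
end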

I rescued there just to check that the post has actually been persisted to the DB and has an uploader in post.image.
This happens because finalization of the upload happens in an after_commit callback. So the transaction is already committed when it tries to generate and upload versions.
Maybe this could be fixed by using different callbacks, perhaps after_save and after_update. I tested that locally and it worked for my code snippet, but I'm not sure whether it conflicts with another plugin or another part of the code.

I have been doing something like this in my code to work around it. This way the exception is raised before the transaction is committed.

Post.transaction do
  post.save
  post_attacher = post.image_attacher
  post_attacher.finalize if post_attacher.changed?
end

Not sure if there is already a better way to do this. Let me know what you think.

System configuration

Ruby version: 2.5.3

Shrine version: 2.16.0

Shrine keeps promotion (which in this case includes processing versions) outside of a DB transaction on purpose. DB transactions should be kept open for as short a time as possible: transactions that write data take out locks, so leaving them open can block other queries. Promotion can take a long time: first the original cached file is downloaded, then processing is performed, and finally each of the processed files is uploaded. How long that takes depends on the speed of processing and the size of the files.

Another reason Shrine keeps promotion outside of a DB transaction is to make backgrounding easily pluggable. If you want to delay promotion into a background job, the background job needs to be spawned after the transaction commits. Otherwise the background job can start up so early that it cannot yet find the record in the DB, because the transaction in which the record was created hasn't been committed yet. Having promotion triggered outside of the transaction allows the backgrounding plugin to simply override the Attacher#_promote method and spawn a background job instead.
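To make the ordering concrete, here is a minimal toy model in plain Ruby. It is only a sketch: ToyTransaction and run_and_report are hypothetical names invented for illustration, and this is not how ActiveRecord or Shrine implement callbacks. It shows that work deferred until after the commit point (where promotion runs) can fail without undoing the committed row, while a failure inside the block commits nothing.

```ruby
# Toy model only: a hypothetical stand-in for a DB transaction with
# after-commit callbacks. Not ActiveRecord's implementation.
class ToyTransaction
  attr_reader :committed_rows

  def initialize
    @committed_rows = [] # rows visible to everyone once committed
    @after_commit = []   # callbacks deferred until after the commit point
  end

  def after_commit(&block)
    @after_commit << block
  end

  # Runs the block, commits whatever it queued, then fires the callbacks.
  def run
    pending = []
    yield pending                   # an error here commits nothing
    @committed_rows.concat(pending) # commit point
    @after_commit.each(&:call)      # an error here is too late to roll back
  end
end

# Returns "ok", or the error message if anything raised.
def run_and_report(transaction)
  transaction.run { |rows| rows << { id: 1, image_data: "cached file" } }
  "ok"
rescue RuntimeError => e
  e.message
end

tx = ToyTransaction.new
tx.after_commit { raise "processing failed" } # stands in for promotion

puts run_and_report(tx)       # processing failed
puts tx.committed_rows.length # 1
```

The same structure is what makes backgrounding straightforward in this model: the callback could enqueue a job instead of doing the work, and the job would be guaranteed to find the committed row.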

While promotion is outside of the transaction that saves the model initially, I think it should never be triggered if the transaction was aborted. @mcelicalderon , are you saying you're seeing promotion happen even if the transaction was aborted? If so, that could be a bug or mis-design?

As I understand it, one of the main reasons to do it outside of the initial transaction, is to make sure it is done only when the transaction actually succeeds/is committed. If you were doing it inside the transaction, you could 'promote' the file to the 'store' location, then the transaction could be aborted, then the 'orphaned' file in store location would still be there, not actually pointed to by any db records.

But you seem to be seeing something different? I'm a bit confused what's going on.

Another thing to be clear about is that a file is written to the cache location on assignment even before save. So it will be there in cache location, even if the transaction was aborted (or even if the model was never saved). But the file should never be in store location unless the model was saved and the transaction was committed.

On the one hand you say:

So the transaction is already committed when it tries to generate and upload versions.

That is expected.

but then you also suggest you are seeing things you don't expect even if a transaction is aborted. That part is confusing me a bit, and your reproduction example isn't clarifying it for me.

Ah wait, I think I understand -- you expect your raise in process(:store) to be able to abort the transaction? Yeah, that can't happen. Because instead, the design is ensuring that process(:store) can only happen if the original transaction was committed. Instead of allowing "promotion" to abort the original transaction, the design ensures that promotion will never be triggered from an aborted transaction. I don't think it's possible to do both, and shrine chooses the latter, to be able to ensure that there will never be "orphaned" files in the store storage.
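The trade-off can be sketched in plain Ruby, under loud assumptions: promote and promote_inside_transaction are hypothetical helpers, temporary directories stand in for the cache and store storages, and an array stands in for committed DB rows. None of this is Shrine's API. It illustrates that if promotion runs inside a transaction that later aborts, the copied file stays in store with no record pointing at it.

```ruby
require "fileutils"
require "tmpdir"

# Hypothetical stand-in for promotion: copy the cached file into store.
def promote(cache_path, store_dir)
  FileUtils.cp(cache_path, store_dir)
  File.join(store_dir, File.basename(cache_path))
end

# Promotes inside a toy "transaction" and optionally aborts afterwards.
# Returns [committed_records, file_names_left_in_store].
def promote_inside_transaction(abort_after_promotion:)
  Dir.mktmpdir do |dir|
    cache_dir = File.join(dir, "cache")
    store_dir = File.join(dir, "store")
    FileUtils.mkdir_p([cache_dir, store_dir])
    cached = File.join(cache_dir, "image.png")
    File.write(cached, "fake image bytes")

    records = [] # stands in for committed DB rows
    begin
      pending = [{ image: promote(cached, store_dir) }]
      raise "a later step in the transaction failed" if abort_after_promotion

      records.concat(pending) # commit
    rescue RuntimeError
      # rolled back: the row is gone, but the copied file is not
    end

    [records, Dir.children(store_dir)]
  end
end

records, leftovers = promote_inside_transaction(abort_after_promotion: true)
p records   # []
p leftovers # ["image.png"]
```

Avoiding the orphan in this model would need an explicit cleanup step in the rescue branch, which is extra work a DB rollback cannot do on its own.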

If the transaction is rolled back, no trace of it should remain in the DB.

That is quite true. It's just that your example code wasn't aborting the transaction.

I think you have a use case though, that we should probably help you figure out how to accomplish. Without necessarily talking about how it relates to transactions, what are you actually trying to do? Based on what condition, what do you want to happen? This stuff gets confusing to talk about because it can be so abstract. If you can give us your very specific use case, like what actual condition you want to trigger on and what should happen then, it might be easier to talk about.

Of course it couldn't be that simple, I completely overlooked other plugins. So, I see what you guys are saying. My workaround is working for now anyway, as I force versions to be processed and uploaded before committing the transaction just in case something goes wrong (which should be common, as file processing has many ways to fail). I do have to be careful with orphan files in store because of that, but in my scenario exceptions would be raised during processing rather than after it.

Anyway, even if it's not simple, there might still be a way to handle this: some way to opt in to a different way of processing files when the scenario requires entity creation to be fully atomic. Perhaps another plugin could save the state of the attached field (maybe even in the DB's JSON field) if the entity already existed in the DB, so it can be rolled back in case the after_commit callback fails, and could even delete the record from the DB if it was a new one (even with backgrounding). Just an idea; of course it would require a lot of extra considerations. I'll try to explore this idea further.

So, just to be clear on the scenario I'm facing @jrochkind, my main problem is specifically with the versions plugin. I expect a model's attachment field never to exist without versions, so if version processing and creation fails for some reason, the error should be raised and no trace of the record should remain. Likewise, if I update a record that, let's say, already had versions, and the new file I upload fails during processing, the previously correct entity (with versions) is left in an invalid state. Using the code example, calling post.image should always return a hash, instead of an uploader as happens when promotion from cache to store fails for some reason.

Another alternative might be to process the versions on cache, so promoting the files would just require the files to be moved from cache to store, like this:

class MyUploader < Shrine
  process(:cache) do |io, _context|
    { original: io }
  end
end

And this actually just worked, the only problem I'm getting here is with the validation plugin, as it doesn't expect a hash here. But removing the validations just works, and like that I could do processing even before thinking about touching the DB. For validation, perhaps the uploaded file could be checked before the file is moved into the cache storage where the processing would be happening? Not sure about that.
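The "validate first, then process" ordering can be sketched in plain Ruby. This is only an illustration under assumptions: validation_errors, build_versions and cache_with_versions are hypothetical helpers, the size limit and allowed types are made-up values, and none of it is Shrine's validation API. The point is that the checks run against the raw upload, so they never see the hash of versions.

```ruby
require "stringio"

# Hypothetical checks on the raw upload. Returns a list of error messages.
def validation_errors(io, mime_type, max_size:, allowed_types:)
  errors = []
  errors << "is too large (max is #{max_size} bytes)" if io.size > max_size
  errors << "has a disallowed type: #{mime_type}" unless allowed_types.include?(mime_type)
  errors
end

# Hypothetical processing step, mirroring the { original: io } hash above.
def build_versions(io)
  { original: io }
end

# Validate the single raw file first; only valid uploads get processed.
def cache_with_versions(io, mime_type, max_size: 1024, allowed_types: %w[image/png image/jpeg])
  errors = validation_errors(io, mime_type, max_size: max_size, allowed_types: allowed_types)
  raise ArgumentError, errors.join(", ") unless errors.empty?

  build_versions(io)
end

versions = cache_with_versions(StringIO.new("fake png bytes"), "image/png")
puts versions.keys.inspect # [:original]

begin
  cache_with_versions(StringIO.new("x" * 2048), "text/plain")
rescue ArgumentError => e
  puts e.message # lists both the size and the type problem
end
```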

Well, let me know what you think. I'll drop it if it doesn't really make sense for the gem :D, anyway, thank you for looking into it.

To me, orphaned files that were "promoted" even though no saved model refers to them are a worse problem.

As I understand it, the model of shrine, with the two stage cache->store (called "promotion"), is that you could see a file at any time that isn't "promoted" yet, and it's your app's responsibility to realize it could have these "in progress" files. Normally you can check to see if you have one by checking post.image.stored? -- if false, it hasn't been stored yet. If an exception was raised when "promoting", it might never be "stored?", it's up to you to notice/log/etc.

The versions plugin definitely makes this all even more confusing, because it makes the API of post.image change on promotion. I think the shrine maintainers are a bit down on the versions plugin, thinking in retrospect that there is a better way to do this, and hoping to come up with one.

But if you want to force the promotion to happen in your transaction, it might be possible to do so. I am not sure if this would work:

  Post.transaction do
    post.save
    post.image.promote if post.image
  end

Yes, I'm currently doing

Post.transaction do
  post.save
  post_attacher = post.image_attacher
  post_attacher.finalize if post_attacher.changed?
end

to force the processing of versions and it's working (just copied what the AR callback does on commit).

Yes, I could check post.image_attacher.stored? and know that something went wrong. But how would that be handled for updates? The previous value is already gone from the DB, so there is no way to roll back. I could find those records, but I would still lose those that might have been valid once if I remove them or something like that.

And you are right, orphan files in the store storage are worse. But those in the cache storage are not as bad, as we can do this. So allowing process(:cache) might only leave orphan files in cache if something else goes wrong after the versions have been created. That could even be done outside of the transaction, and I would actually stop needing the transaction if my only or main concern was something failing during processing. The only thing left that could fail would be promoting the already created versions (a network error or something).

So I'm still unsure, but I think there might be a way to save the previous state in the DB (none if it is a new record), so that during promotion we can roll back to the previous version, or delete the record if there was none yet. That would even work with backgrounding; could it work as a plugin maybe? So, not a transaction, but it would feel like one, as it would take care of leaving the DB as it was before trying to promote a new file in case of failure (it might do something like rescue StandardError, roll back manually and then raise the error again). Kind of like the remove_invalid plugin.
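A rough plain-Ruby sketch of that idea, under stated assumptions: ToyRecord and promote_or_restore are hypothetical names, a Struct stands in for the model, and the caller is assumed to capture the previous attachment data before assigning the new file. It is not an existing Shrine plugin. On failure it restores the old data (or marks a new record as deleted) and then re-raises.

```ruby
# Hypothetical record: an attachment data column plus a persisted flag.
ToyRecord = Struct.new(:image_data, :persisted)

# Runs the promotion block. If it raises, puts the record back the way it
# was before the new file was assigned, then re-raises the error.
def promote_or_restore(record, previous_data, was_new_record)
  yield record
  record
rescue StandardError
  if was_new_record
    record.persisted = false # stands in for deleting the new row
    record.image_data = nil
  else
    record.image_data = previous_data # manual "rollback" of the update
  end
  raise
end

# Update case: the record already had valid versions.
record = ToyRecord.new('{"versions":"old"}', true)
previous = record.image_data              # snapshot before assignment
record.image_data = '{"cached":"new"}'    # new file assigned and saved

begin
  promote_or_restore(record, previous, false) { raise "processing failed" }
rescue RuntimeError => e
  puts "#{e.message}; restored: #{record.image_data}"
end
```

In this sketch the snapshot lives in memory; with backgrounding it would have to be stored somewhere the job can read, which is the part that needs the extra considerations.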

Still, just throwing ideas out there. I see the value in having some way to do transactional uploads, not sure if that also makes sense to you.

So this is clearly not a bug, so, should I close the issue? I would still love to get feedback if you think this is something that might work for the gem.

Closing this one as it's not really an issue. I'll still keep thinking about this, transactional updates especially, and post updates here if a clearer idea comes up. Thank you for the feedback.