Undefined method *_data for nil:NilClass in plugins/entity
patrykk21 opened this issue · comments
Brief Description
I'm finally opening an issue because I haven't been able to understand or debug why this happens.
It's also really hard to replicate: so far it has only occurred in the production environment, where it keeps happening consistently to date.
If I manage to reproduce the issue I promise to create a docker-compose configuration to test it with, but for now I'm not able to.
The issue is that the upload seems to fail at random times in Attacher.promote_block.
The problematic code looks like so:
# frozen_string_literal: true

class TextUploader < Shrine
  Attacher.promote_block do
    cached_data = file_data
    # Synchronous in order to get the file data immediately so we can store it in the DB
    self.atomic_promote
    # Asynchronous since we can delete the cached data eventually
    Uploaders::DestroyWorker.perform_async(self.class, cached_data)
  end
end
Shrine configuration
# frozen_string_literal: true

require 'oj'
require 'shrine'
require 'shrine/storage/s3'

# The `:cache` store is temporary, contrary to `:store` being permanent.
# The cache store is used to upload the file before the data is persisted and finalised.
Shrine.storages =
  if Rails.env.development? && ENV['DOCS_BUCKET_USE_LOCAL'] == 'true'
    {
      cache: Shrine::Storage::FileSystem.new('public', prefix: 'uploads/backlog'),
      store: Shrine::Storage::FileSystem.new('public', prefix: 'uploads/store')
    }
  else
    docs_bucket = Rails.application.secrets[:docs_bucket]

    s3_options = {
      bucket: docs_bucket[:bucket_name],
      access_key_id: docs_bucket[:access_key_id],
      secret_access_key: docs_bucket[:secret_access_key],
      region: docs_bucket[:region],
      public: true,
      force_path_style: true
    }
    s3_options[:endpoint] = docs_bucket[:endpoint] if docs_bucket[:endpoint].present?

    {
      cache: Shrine::Storage::S3.new(prefix: 'backlog', **s3_options),
      store: Shrine::Storage::S3.new(prefix: 'store', **s3_options)
    }
  end

Shrine.plugin :mongoid
Shrine.plugin :model
Shrine.plugin :restore_cached_data
Shrine.plugin :cached_attachment_data
Shrine.plugin :backgrounding
Shrine.plugin :column, serializer: Oj
The model that uploads, stripped of unnecessary methods, looks like this:
# frozen_string_literal: true

class BinaryDocument < ApplicationModel
  include Mongoid::Document
  include Mongoid::Timestamps
  ...
  include TextUploader::Attachment(:document)

  DELEGATE_TO_DOCUMENT_METHODS = %i[url data storage metadata storage_key uploader read rewind].freeze

  field :id, type: String, default: -> { SecureRandom.uuid }
  field :owner_id, type: String
  field :timestamp, type: DateTime, default: -> { ::BinaryDocument.default_timestamp }
  field :document_data, type: Hash
  field :asset_id, type: String

  validates :id, :owner_id, :timestamp, presence: true
  validates :id, length: { maximum: 255 }

  before_save :set_asset_id

  index({ id: 1, owner_id: 1 }, unique: true)
  index(timestamp: -1)

  default_scope -> { order_by(timestamp: :desc) }

  delegate(*DELEGATE_TO_DOCUMENT_METHODS, to: :document)

  def self.default_timestamp
    DateTime.now.utc
  end

  def read_from_beginning
    ...
  end

  def stored_id
    document&.id
  end

  def cached?
    storage_key == :cache
  end

  def stored?
    storage_key == :store
  end

  private

  def set_asset_id
    self.asset_id = stored_id
  end
end
Stacktrace
Expected behavior
The expected behaviour is for uploads to succeed consistently, and for a failed upload to return an error like "ImageDamaged" or "S3UploadFailed".
Actual behavior
The actual behaviour is that the promotion sometimes fails, and it is unclear why this happens.
Simplest self-contained example code to demonstrate issue
I'm sorry I haven't provided replication steps, as I wasn't able to reproduce it locally.
Should I create the linked template anyway?
System configuration
Ruby version:
ruby 2.5.7p206
Shrine version:
shrine (3.2.1)
shrine-mongoid (1.0.0)
LMK if I can help with more details.
Any pointers or directions on what to check/debug/try would be highly appreciated.
Thank you
A wild, wild guess, but the only thing I could think of is that the cached files are removed before the promotion takes place.
Uploaders::DestroyWorker.perform_async(self.class, cached_data) # consider removing this line and seeing if you still have the same problem?
Not a solution, but a side note: you can set a lifecycle rule on your bucket (if that is feasible) so that cached files are removed after a set period of time, e.g. 24 hours. That would obviate the need to run a background job to delete the cached files; the deletion can be handled by Amazon directly.
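For concreteness, such a rule could look roughly like this. The rule id is made up, the prefix mirrors the :cache prefix from the configuration above, and the hash is the shape the aws-sdk-s3 gem's put_bucket_lifecycle_configuration expects — a sketch, not a tested setup:

```ruby
# Illustrative S3 lifecycle rule: expire cached Shrine uploads after ~24h.
# The id is an assumption; "backlog/" mirrors the :cache prefix used in
# the Shrine configuration earlier in this issue.
lifecycle_rule = {
  id: "expire-shrine-cache",
  status: "Enabled",
  filter: { prefix: "backlog/" },
  expiration: { days: 1 }
}

# With the aws-sdk-s3 gem, this could then be applied along these lines
# (bucket name is a placeholder):
#
#   Aws::S3::Client.new.put_bucket_lifecycle_configuration(
#     bucket: "your-docs-bucket",
#     lifecycle_configuration: { rules: [lifecycle_rule] }
#   )

p lifecycle_rule
```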
Mmm, that is a good guess.
I could try a .perform_in(10.minutes, .. or something. Will let you know if this makes our app happier.
About Amazon, thank you for this tip. I was aware of this option, however the idea of not having control over it really scares me, in the sense of the following scenario:
- A promotion fails for whatever reason
- I get back to work on Monday and check the issue
- I notice Amazon deleted the file and I'm not able to restore it anymore
I know I could set a retention time of weeks or something, however that would still mean I have a deadline to fix the issue.
For now I opted for this approach, as it also lets me check how many files are kept in the cache directory of S3 at any time. If this keeps increasing, it means we have a leak somewhere: we're creating files but never promoting them.
Does it make sense?
Thank you for your reply
It looks like you're nesting the Document record within the BinaryDocument record: you include the attaching logic in BinaryDocument and then delegate methods to Document. Why is this the case? You could have all the attaching logic in Document itself, without delegation. My concern is that Shrine makes use of after-commit callbacks, so I'm unsure of the interplay between those callbacks on the BinaryDocument record and the record holding the actual storage data (the Document record), how that interacts with backgrounding, and whether all methods that need to be delegated are in fact delegated. Perhaps someone more knowledgeable in the library can comment.
If I understand your message correctly, your concept would be: model Document has one model BinaryDocument.
However, we just have BinaryDocument, plain and simple. document is just a field in MongoDB for that model, in which we store the document data for later retrieval from S3.
So instead of a chain like binary_document.document.url we can just do binary_document.url.
Does this answer what you wrote?
From your previous message, I misunderstood how your BinaryDocument model works - please disregard my previous message.
Hi there: has this issue been resolved? Were you able to determine whether this was a Shrine bug or not?
Hello there :) Sorry for the delayed answer, I was on vacation.
We tested .perform_in(10.minutes, .. in production and eventually received the same error.
I will try to recreate a sandbox environment and replicate it, but it seems very hard.
What does Mongoid::Document#reload do when the underlying document has been deleted? This line in shrine-mongoid relies on Mongoid raising an exception if the document belonging to the model instance has been deleted (e.g. Active Record would raise ActiveRecord::RecordNotFound in this case). If Mongoid happens to return nil here instead, that would cause the error you're seeing.
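The suspected failure mode can be sketched with plain-Ruby stand-ins (FakeModel is hypothetical, not a Mongoid or Shrine class): when reload raises, the failure is explicit; when reload returns nil, the subsequent *_data read blows up with exactly the NoMethodError on nil from the issue title.

```ruby
# Plain-Ruby sketch of the suspected failure mode. FakeModel is a stand-in
# for a Mongoid model; it is not part of Shrine or shrine-mongoid.
class FakeModel
  attr_accessor :document_data

  def initialize(deleted: false, raise_not_found: true)
    @deleted = deleted
    @raise_not_found = raise_not_found
    @document_data = { "id" => "abc" }
  end

  def reload
    return self unless @deleted
    raise "DocumentNotFound" if @raise_not_found
    nil # Mongoid with raise_not_found_error: false returns nil instead
  end
end

# shrine-mongoid presumably does something like
# `record.reload.send(:"#{name}_data")` when reloading the attachment column
def reloaded_data(model)
  model.reload.document_data
end

p reloaded_data(FakeModel.new) # happy path: returns the data hash

begin
  reloaded_data(FakeModel.new(deleted: true, raise_not_found: false))
rescue NoMethodError => e
  puts e.message # "undefined method `document_data' for nil..."
end
```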
This is just a guess. I would check myself, but I've since uninstalled MongoDB from my laptop to save on disk space.
It indeed returns nil:
irb(main):001:0> c = Model.last
irb(main):002:0> c.delete
=> true
irb(main):003:0> c.reload
=> nil
The reason for this is that we are using raise_not_found_error: false in mongoid.yml.
This was due to maintaining compatibility when switching from DynamoDB to MongoDB.
However, we never delete data, so I don't really understand the issue yet.
I will work on replicating it outside of work and let you know.
I really believe this is caused by the raise_not_found_error: false setting, and that some documents are indeed being deleted. I looked at shrine-mongoid, but there is really no clean way to handle it other than raising an error ourselves, which probably defeats the purpose of that setting.
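The "raise an error ourselves" option can be sketched with plain-Ruby doubles so it runs without the gems. AttachmentDeleted, promote_with_guard, and the record doubles below are illustrative stand-ins, not real Shrine or Mongoid classes; the point is simply to check the reload result before touching the data:

```ruby
# Sketch of raising an error ourselves: treat a nil reload as a deleted
# record instead of letting nil propagate into a `*_data` call.
# AttachmentDeleted and the record doubles are hypothetical stand-ins.
AttachmentDeleted = Class.new(StandardError)

def promote_with_guard(record)
  reloaded = record.reload
  raise AttachmentDeleted, "record was deleted before promotion" if reloaded.nil?
  reloaded.document_data # safe: reloaded is known to be non-nil here
end

# A record that still exists: reload returns itself.
LiveRecord = Struct.new(:document_data) do
  def reload
    self
  end
end

# A deleted record under raise_not_found_error: false: reload returns nil.
gone = Object.new
def gone.reload
  nil
end

p promote_with_guard(LiveRecord.new({ "id" => "abc" }))

begin
  promote_with_guard(gone)
rescue AttachmentDeleted => e
  puts "guarded: #{e.message}"
end
```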
Since I don't currently have enough information to reproduce this bug, I will close this issue for now. Let me know if you get more information.