shrinerb / shrine

File Attachment toolkit for Ruby applications

Home Page:https://shrinerb.com

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

[Bug]: Corruption of downloaded files when using `net-protocol v0.2.0`

tmimura39 opened this issue · comments

Report

There is a problem with corruption when downloading files that have been uploaded to S3.
We have not been able to identify the types of files that reproduce, but PDF files have a high probability of reproduction.
Corrupted PDF files will not be able to be opened with common PDF Viewer.

Expected Behavior

You can download the same file you uploaded.

Actual Behavior

A corrupted file that is different from the uploaded file will be downloaded.

Steps to Reproduce the Problem

# Upload some file.
original_file = File.open('sample.pdf')
uploaded_file = Shrine.upload(original_file, :store)

# Download that file.
download_file = uploaded_file.download

# Differing file contents.
File.read(original_file) == File.read(download_file)

Ruby Version

Ruby 3.1.2

Shrine Version

3.4.0

Anything else?

#609 as well as net-protocol v0.2.0 I think this is an effect of ruby/net-protocol@781e400.

AWS S3 is used for storage.
Other storage has not been tested.

I have this too after upgrading to net-protocol 0.2.0. Pinning to net-protocol "< 0.2.0"" resolved problem for me (for now, until it's fixed otherwise).

(You say 0.2.2, but I think net-protocol 0.2.0 released Dec 5 is actually the latest release of net-protocol?)

It is a tricky one, because there are no exceptions, just corrupted files.

You say 0.2.2, but I think net-protocol 0.2.0 released Dec 5 is actually the latest release of net-protocol?

oh, sorry.
That's right. My mistake in stating this.
I have corrected the description.

Thanks.

👋 author of ruby/net-protocol@781e400 here. I'd like to reproduce to investigate but help would be welcome.

Is there a way to reproduce that doesn't require AWS credentials?

Thank you casperisfine 👍

Is there a way to reproduce that doesn't require AWS credentials?

Sorry, I have currently only found a way to reproduce this using AWS S3.
We will report additional information as soon as we find other simple ways to reproduce it.

@casperisfine You can see #609 (comment) where I identified which change was most likely breaking for Shrine. TL;DR: Down::ChunkedIO that Shrine's S3 uses relies on streamed chunks to all be different string objects, but in net-protocol 0.2.0 they're reused.

Ok, I managed to repro using fakes3 and a monkey patch:

require 'shrine'
require 'shrine/storage/s3'

module AWSPatch
  def start_session(endpoint)
    if endpoint == "https://shrine-debug.s3.us-east1.amazonaws.com"
      super("http://localhost:4567") # using fakes3
    else
      super
    end
  end
end

Seahorse::Client::NetHttp::ConnectionPool.prepend(AWSPatch)

Shrine.storages = {
  store: Shrine::Storage::S3.new(
    bucket: "shrine-debug",
    region: "us-east1",
    access_key_id: "FAKE",
    secret_access_key: "FAKE",
  ),
}
original_file = File.open('/Users/byroot/Downloads/protocol-repro.pdf')
uploaded_file = Shrine.upload(original_file, :store)

# Download that file.
download_file = uploaded_file.download

# Differing file contents.
p File.read(original_file) == File.read(download_file)

I'll dig more. And thanks for the pointers @janko. If necessary we'll revert the net-protocol changes, but hopefully we find a way to fix this without losing the extra perf.

Hum, actually I think fakes3 just wasn't working correctly, now using s3rver it works. That said I don't go through TLS which might be part of why it doesn't repro.

I guess I'll have to setup some S3 bucket.

I will investigate how difficult it would be to change Down::ChunkedIO to handle chunks objects being reused. That's probably something I shouldn't have relied on in the first place 🙂

So I have a fix for this in net-protocol, I'll make sure a new version is cut ASAP and that Ruby 3.2.0 ships with a working version: ruby/net-protocol#19

Version 0.2.1 of net-protocol was just released with the fix by @casperisfine, which resolves this issue 🤘🏻

Thank you very much @casperisfine and @janko , I appreciate your work!

I wonder if shrine (or down) should add "net-protocol", "!= 0.2.0" to it's dependencies, to save people the confusion of accidentally using the buggy release, especially because it ships with ruby 3.1. (It's annoying because a "corrupt bytes" problem is in my experience especially unlikely to be covered by one's tests in depending project, and is confusing to debug).

But of course that would only apply to new shrine/down release anyway, so might just make bundle update more confusing, not sure. Just thought I'd bring it up for consideration.

especially because it ships with ruby 3.1

According to stdgems.org, the version of net-protocol shipped with Ruby 3.1 is 0.1.2.

But of course that would only apply to new shrine/down release anyway, so might just make bundle update more confusing, not sure. Just thought I'd bring it up for consideration.

Yeah, that was my initial thought as well, but Shrine technically doesn't depend on net-protocol, in this case it was aws-sdk-s3 being used in tandem with Down, where aws-sdk-s3's usage of net-http (and thus net-protocol) is an implementation detail, tomorrow it could theoretically switch to http.rb. So, I don't see a place where we can warn about this.

So, I don't see a place where we can warn about this

You could warn when instantiating the S3 store. e.g.: warn "blah blah" if Net::Protocol::VERSION == "0.2.0".

I was still considering if the actual gemspec requirement would still be good (although I was going to keep it to myself), but @casperisfine's idea seems pretty good to me. To be extra safe (to say hypothetical future AWS gems that don't depend on Net::Protocol at all so it's not in dependencies anymore) if defined?(Net::Protocol::VERSION) && Net::Protocol::VERSION == "0.2.0"

Although just a log warning is easy to miss, at least it's something. (I forget if stderr will go to console in eg Rails rspec). I think I would have deployed the corrupting code to production if I hadn't happened to notice the issues filed here.