JavaScript part allocates the entire file in memory at stream creation
jespersh opened this issue · comments
Describe the bug
I'm trying to create a "chunk" stream for System.Net.Http.StreamContent
without the browser allocating GBs of memory for the entire file. Sending the file natively in the browser doesn't show this behavior, and neither does testing with a console application.
The native test:
HttpClient httpClient = new HttpClient();
httpClient.BaseAddress = new Uri("https://localhost:5001/");
using (FileStream fs = File.Open("D:\\bigGBtest.zip", FileMode.Open, FileAccess.Read))
{
    using (var formData = new MultipartFormDataContent())
    {
        var streamContent = new StreamContent(fs);
        streamContent.Headers.ContentDisposition = new System.Net.Http.Headers.ContentDispositionHeaderValue("form-data") { Name = "file", FileName = "Test.zip" };
        streamContent.Headers.ContentType = new System.Net.Http.Headers.MediaTypeHeaderValue("application/octet-stream");
        formData.Add(streamContent);
        var resp = await httpClient.PostAsync("api/v2/version/upload", formData);
    }
}
To Reproduce
Any of these, with a multi-GB file, allocates the entire file into memory:
Using CreateMemoryStreamAsync:
var file = (await fileReaderService.CreateReference(fileInputElement).EnumerateFilesAsync()).FirstOrDefault();
await using (var fileStream = await file.CreateMemoryStreamAsync(65536))
{
    // Browser memory shoots up after CreateMemoryStreamAsync
    byte[] buffer = new byte[1000];
    await fileStream.ReadAsync(buffer, 0, 1000);
}
Using OpenReadAsync:
var file = (await fileReaderService.CreateReference(fileInputElement).EnumerateFilesAsync()).FirstOrDefault();
await using (var fileStream = await file.OpenReadAsync())
{
    // Browser memory shoots up after OpenReadAsync
    byte[] buffer = new byte[1000];
    await fileStream.ReadAsync(buffer, 0, 1000);
}
Expected behavior
The call to ReadAsync decides how much memory is allocated.
Project type
Client-side/CSB
Environment
Browser: Edge (Chromium-based)
BlazorFileReader: 1.5.0.20109
.net SDK: 3.1.301
.net host: 3.1.5
Additional context
A possible fix could be this: https://stackoverflow.com/a/28318964
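The gist of that answer, as a hedged sketch (illustrative names, not the library's actual code): slice the File per read, so only the requested chunk is ever materialized rather than the whole file.

```javascript
// Illustrative sketch (not the library's actual code): consume a Blob/File in
// fixed-size chunks. Blob.slice() only creates a lazy view into the file;
// bytes are materialized one chunk at a time when each slice is read.
async function readInChunks(blob, chunkSize, onChunk) {
  for (let offset = 0; offset < blob.size; offset += chunkSize) {
    const chunk = blob.slice(offset, offset + chunkSize);
    // The linked answer uses FileReader.readAsArrayBuffer per slice;
    // Blob.arrayBuffer() is the modern promise-based equivalent.
    onChunk(new Uint8Array(await chunk.arrayBuffer()), offset);
  }
}
```

With this shape, peak memory is bounded by chunkSize instead of the file size.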
This is a regression introduced in this commit. I think I ran a memory analysis and everything, but either I failed to recognize this error, or Chrome has changed the way the buffer is allocated.
In any case, it's a glaring bug that should be easy to fix. Nice catch.
@jespersh Please let me know if you have the time to test and give feedback on this.
On an 800 MB file I've measured a ~80-100 MB bump in RAM usage, which I attribute to the slow, single-threaded GC rather than anything I can do better.
I guess it's always possible to do better; this is a tight loop. But one thing is for sure: moving away from the model that caused this bug is a 500% slow-down. Not very noticeable on small files, but quite painful to go from 1 second to 7 seconds for an 800 MB file.
I'll try to dig a bit into this as soon as I can, but I am wondering if you could reuse the FileReader between reads, since I already called OpenRead, so it can be expected one would keep it alive for some time.
How big are your read chunks? I'd test with ~32KB
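A rough way to compare chunk sizes is a micro-benchmark like the following sketch (illustrative only; `benchmarkChunkSizes` is a hypothetical helper, and real numbers depend on the browser and the per-callback overhead):

```javascript
// Rough sketch: time how long it takes to consume a Blob at various chunk
// sizes. The per-chunk async round-trip is the cost being compared.
async function benchmarkChunkSizes(blob, chunkSizes) {
  const results = [];
  for (const chunkSize of chunkSizes) {
    const start = Date.now();
    for (let offset = 0; offset < blob.size; offset += chunkSize) {
      // Materialize one chunk; smaller chunks mean more async round-trips.
      await blob.slice(offset, offset + chunkSize).arrayBuffer();
    }
    results.push({ chunkSize, ms: Date.now() - start });
  }
  return results;
}
```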
Finally got the time to make some tests.
My experiments show that FileReader instantiation is basically free, no impact whatsoever; it's probably cached. Chunk size has a huge impact on speed and a small impact on RAM usage. No matter the chunk size, RAM usage is 80-150 MB over rest during the process. I'm testing with an 800 MB file. A chunk size of 82 KB takes 16 seconds; 330 KB takes 5 seconds.
So my conclusion is that what is costly here is the asynchronous callback, which to my knowledge I have no way of avoiding. I could possibly implement a second level of buffering that could be configured somehow, but I'm stopping here for now in favour of other features.
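The second level of buffering mentioned above could look roughly like this sketch (a hypothetical design, not the library's implementation): pay the expensive async round-trip once per large chunk, then serve many small reads from the cached buffer.

```javascript
// Hypothetical read-ahead buffer (not the library's code): large slices go
// through the costly async path once; small reads are served from the cache.
class BufferedBlobReader {
  constructor(blob, bufferSize = 512 * 1024) {
    this.blob = blob;
    this.bufferSize = bufferSize;
    this.position = 0;
    this.bufferStart = 0;
    this.buffer = null; // Uint8Array holding the currently cached chunk
  }

  // Returns up to `count` bytes, or null at end of file. Like Stream.Read,
  // it may return fewer than `count` bytes near a buffer boundary.
  async read(count) {
    if (this.position >= this.blob.size) return null;
    const inBuffer =
      this.buffer !== null &&
      this.position >= this.bufferStart &&
      this.position < this.bufferStart + this.buffer.length;
    if (!inBuffer) {
      // One expensive async fetch covers many subsequent small reads.
      const slice = this.blob.slice(this.position, this.position + this.bufferSize);
      this.buffer = new Uint8Array(await slice.arrayBuffer());
      this.bufferStart = this.position;
    }
    const start = this.position - this.bufferStart;
    const bytes = this.buffer.subarray(start, start + count);
    this.position += bytes.length;
    return bytes;
  }
}
```

The trade-off is memory (one bufferSize chunk resident) against the number of async callbacks, which the measurements above suggest dominate the cost.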