Tewr / BlazorFileReader

Library for creating read-only file streams from file input elements or drop targets in Blazor.

Javascript part allocates the entire file in memory at stream creation

jespersh opened this issue · comments

Describe the bug
I'm trying to create a "chunked" stream for System.Net.Http.StreamContent without the browser allocating GBs of memory for the entire file. Uploading the file natively in the browser doesn't show this behavior, and neither does testing with a console application.

The native test:

HttpClient httpClient = new HttpClient();
httpClient.BaseAddress = new Uri("https://localhost:5001/");
using (FileStream fs = File.Open("D:\\bigGBtest.zip", FileMode.Open, FileAccess.Read))
{
    using (var formData = new MultipartFormDataContent())
    {
        var streamContent = new StreamContent(fs);
        streamContent.Headers.ContentDisposition = new System.Net.Http.Headers.ContentDispositionHeaderValue("form-data") { Name = "file", FileName = "Test.zip" };
        streamContent.Headers.ContentType = new System.Net.Http.Headers.MediaTypeHeaderValue("application/octet-stream");
        formData.Add(streamContent);
        var resp = await httpClient.PostAsync("api/v2/version/upload", formData);
    }
}

To Reproduce
Any of the following, with a multi-GB file, allocates the entire file into memory.

Using CreateMemoryStreamAsync:

var file = (await fileReaderService.CreateReference(fileInputElement).EnumerateFilesAsync()).FirstOrDefault();
await using (var fileStream = await file.CreateMemoryStreamAsync(65536))
{ // Browser memory shoots up after CreateMemoryStreamAsync
  byte[] buffer = new byte[1000];
  await fileStream.ReadAsync(buffer, 0, 1000);
}

Using OpenReadAsync:

var file = (await fileReaderService.CreateReference(fileInputElement).EnumerateFilesAsync()).FirstOrDefault();
await using (var fileStream = await file.OpenReadAsync())
{ // Browser memory shoots up after OpenReadAsync
  byte[] buffer = new byte[1000];
  await fileStream.ReadAsync(buffer, 0, 1000);
}

Expected behavior
The call to ReadAsync should decide how much memory is allocated.

Screenshots
(screenshot not preserved)

Project type
Client-side/CSB

Environment
Browser: Chromium-based Edge
BlazorFileReader: 1.5.0.20109
.net SDK: 3.1.301
.net host: 3.1.5

Additional context
A possible fix could be this: https://stackoverflow.com/a/28318964
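The linked answer's approach is to read the file incrementally with Blob.slice(), materializing one chunk at a time instead of loading the whole file into a single ArrayBuffer. A minimal sketch of that idea (not the library's actual code; it runs in browsers and in Node.js 18+, where Blob is a global):

```javascript
// Read a Blob chunk-by-chunk via Blob.slice(). slice() creates a lazy
// view, so only one chunk at a time is materialized in memory.
async function* readInChunks(blob, chunkSize = 32 * 1024) {
  for (let offset = 0; offset < blob.size; offset += chunkSize) {
    // slice() copies nothing yet; bytes materialize only in arrayBuffer().
    const slice = blob.slice(offset, offset + chunkSize);
    yield new Uint8Array(await slice.arrayBuffer());
  }
}

// Usage: read a 100 KB blob in 32 KB chunks and count the bytes.
(async () => {
  const blob = new Blob([new Uint8Array(100 * 1024).fill(1)]);
  let total = 0;
  for await (const chunk of readInChunks(blob)) {
    total += chunk.length;
  }
  console.log(total); // 102400
})();
```

Peak JS-side allocation is then one chunk rather than the whole file, which is what the expected behavior above describes.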

commented

This is a regression introduced in this commit. I think I ran a memory analysis and everything, but either I failed to recognize this error, or Chrome has changed the way the buffer is allocated.

In any case, it's a glaring bug and should be easy to fix. Nice catch.

commented

@jespersh Please let me know if you have the time to test and give feedback on this.
On an 800 MB file I've measured a ~80-100 MB bump in RAM usage, which I attribute to the slow, single-threaded GC rather than anything I can do better.

I guess it's always possible to do better; this is a tight loop. But one thing is for sure: moving away from the model that caused this bug is a 500% slow-down. Not very noticeable on small files, but quite painful to go from 1 s to 7 s on an 800 MB file.

I'll try to dig a bit into this as soon as I can, but I am wondering if the FileReader could be reused between reads: OpenRead has already been called at that point, so it can be expected that one would keep it alive for some time.

How big are your read chunks? I'd test with ~32 KB.

commented

Finally got the time to make some tests.

My experiments show that FileReader instantiation is basically free, with no impact whatsoever; it's probably cached. Chunk size has a huge impact on speed and a small impact on RAM usage. No matter the chunk size, RAM usage is 80-150 MB over rest during the process. I'm testing with an 800 MB file: a chunk size of 82 KB takes ~16 s, 330 KB takes ~5 s.
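A rough sketch of how the chunk-size/speed trade-off above can be measured (illustrative only; an in-memory Blob stands in for a real file, so absolute timings will differ):

```javascript
// Time how long it takes to read a Blob end-to-end at a given chunk size.
// Runs in browsers and Node.js 18+ (Blob and performance are globals).
async function timeChunkedRead(blob, chunkSize) {
  let bytes = 0;
  const t0 = performance.now();
  for (let off = 0; off < blob.size; off += chunkSize) {
    // slice() clamps the end offset to blob.size automatically.
    const buf = await blob.slice(off, off + chunkSize).arrayBuffer();
    bytes += buf.byteLength;
  }
  return { ms: performance.now() - t0, bytes };
}

// Usage: compare the two chunk sizes mentioned above on an 8 MB blob.
const blob = new Blob([new Uint8Array(8 * 1024 * 1024)]);
for (const size of [82 * 1024, 330 * 1024]) {
  timeChunkedRead(blob, size).then(r =>
    console.log(`${size / 1024} KB chunks: ${r.ms.toFixed(1)} ms`));
}
```

The per-chunk cost is dominated by the await round-trip, which is why fewer, larger chunks finish faster.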

So my conclusion is basically that what is costly here is the asynchronous callback, which to my knowledge I have no way of avoiding. I could possibly implement a second level of buffering that could be configured somehow, but I'm stopping here for now in favour of other features.
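The "second level of buffering" idea could be sketched like this: serve many small reads from one larger JS-side chunk, so the expensive async round-trip happens once per buffer refill rather than once per read. Everything below (class and method names) is hypothetical, not the library's API:

```javascript
// Hypothetical buffered reader: small read() calls are served from an
// in-memory chunk; the Blob is touched only when that chunk runs out.
// Runs in browsers and Node.js 18+ (Blob is a global).
class BufferedBlobReader {
  constructor(blob, bufferSize = 256 * 1024) {
    this.blob = blob;
    this.bufferSize = bufferSize;
    this.buffer = null;    // currently materialized chunk
    this.bufferStart = 0;  // file offset of buffer[0]
    this.position = 0;     // current read position in the file
  }

  // Returns up to `count` bytes, or null at end of file.
  async read(count) {
    if (this.position >= this.blob.size) return null;
    const inBuffer =
      this.buffer !== null &&
      this.position >= this.bufferStart &&
      this.position < this.bufferStart + this.buffer.length;
    if (!inBuffer) {
      // Refill: materialize only one bufferSize-long slice of the blob.
      this.bufferStart = this.position;
      const slice = this.blob.slice(
        this.bufferStart, this.bufferStart + this.bufferSize);
      this.buffer = new Uint8Array(await slice.arrayBuffer());
    }
    const offset = this.position - this.bufferStart;
    const chunk = this.buffer.subarray(offset, offset + count);
    this.position += chunk.length;
    return chunk;
  }
}

// Usage: 1000-byte reads; only a buffer refill touches the Blob.
(async () => {
  const reader = new BufferedBlobReader(new Blob([new Uint8Array(10000)]));
  let total = 0, chunk;
  while ((chunk = await reader.read(1000)) !== null) total += chunk.length;
  console.log(total); // 10000
})();
```

The trade-off is the one discussed above: a larger bufferSize means fewer async callbacks (faster) but a bigger transient allocation.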