JavaScript part allocates the entire file in memory at stream creation
jespersh opened this issue · comments
Describe the bug
I'm trying to create a "chunk" stream for System.Net.Http.StreamContent
without the browser allocating GBs of memory for the entire file. Sending the file natively in the browser doesn't show this behavior, and neither does testing with a console application.
The native test:
HttpClient httpClient = new HttpClient();
httpClient.BaseAddress = new Uri("https://localhost:5001/");
using (FileStream fs = File.Open("D:\\bigGBtest.zip", FileMode.Open, FileAccess.Read))
{
    using (var formData = new MultipartFormDataContent())
    {
        var streamContent = new StreamContent(fs);
        streamContent.Headers.ContentDisposition = new System.Net.Http.Headers.ContentDispositionHeaderValue("form-data") { Name = "file", FileName = "Test.zip" };
        streamContent.Headers.ContentType = new System.Net.Http.Headers.MediaTypeHeaderValue("application/octet-stream");
        formData.Add(streamContent);
        var resp = await httpClient.PostAsync("api/v2/version/upload", formData);
    }
}
To Reproduce
Any of these, with a multi-GB file, allocates the entire file into memory:
Using CreateMemoryStreamAsync:
var file = (await fileReaderService.CreateReference(fileInputElement).EnumerateFilesAsync()).FirstOrDefault();
await using (var fileStream = await file.CreateMemoryStreamAsync(65536))
{
    // Browser memory shoots up after CreateMemoryStreamAsync
    byte[] buffer = new byte[1000];
    await fileStream.ReadAsync(buffer, 0, 1000);
}
Using OpenReadAsync:
var file = (await fileReaderService.CreateReference(fileInputElement).EnumerateFilesAsync()).FirstOrDefault();
await using (var fileStream = await file.OpenReadAsync())
{
    // Browser memory shoots up after OpenReadAsync
    byte[] buffer = new byte[1000];
    await fileStream.ReadAsync(buffer, 0, 1000);
}
Expected behavior
The call to ReadAsync decides how much memory is allocated.
Project type
Client-side/CSB
Environment
Browser: Edge (Chromium-based)
BlazorFileReader: 1.5.0.20109
.net SDK: 3.1.301
.net host: 3.1.5
Additional context
A possible fix could be this: https://stackoverflow.com/a/28318964
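The gist of that answer, as a hedged sketch (illustrative names, not the library's actual code): slice the File per read, so only the requested chunk is ever materialized rather than the whole file.

```javascript
// Illustrative sketch (not the library's actual code): consume a Blob/File in
// fixed-size chunks. Blob.slice() only creates a lazy view into the file;
// bytes are materialized one chunk at a time when each slice is read.
async function readInChunks(blob, chunkSize, onChunk) {
  for (let offset = 0; offset < blob.size; offset += chunkSize) {
    const chunk = blob.slice(offset, offset + chunkSize);
    // The linked answer uses FileReader.readAsArrayBuffer per slice;
    // Blob.arrayBuffer() is the modern promise-based equivalent.
    onChunk(new Uint8Array(await chunk.arrayBuffer()), offset);
  }
}
```

With this shape, peak memory is bounded by chunkSize instead of the file size.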
This is a regression introduced in this commit. I think I ran a memory analysis and everything, but either I failed to recognize this error, or Chrome has changed the way the buffer is allocated.
In any case, it's a glaring bug that should be easy to fix. Nice catch.
@jespersh Please let me know if you have the time to test and give feedback on this.
On an 800 MB file I've measured a ~80-100 MB bump in RAM usage, which I attribute to the slow, single-threaded GC rather than anything I can do better.
I guess it's always possible to do better; this is a tight loop. But one thing is for sure: moving away from the model that caused this bug is a 500% slow-down. Not very noticeable on small files, but quite painful to go from 1 second to 7 seconds for an 800 MB file.
I'll try to dig a bit into this as soon as I can, but I am wondering if you could reuse the FileReader between reads, since I already called OpenRead, so it can be expected one would keep it alive for some time.
How big are your read chunks? I'd test with ~32KB
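A rough way to compare chunk sizes is a micro-benchmark like the following sketch (illustrative only; `benchmarkChunkSizes` is a hypothetical helper, and real numbers depend on the browser and the per-callback overhead):

```javascript
// Rough sketch: time how long it takes to consume a Blob at various chunk
// sizes. The per-chunk async round-trip is the cost being compared.
async function benchmarkChunkSizes(blob, chunkSizes) {
  const results = [];
  for (const chunkSize of chunkSizes) {
    const start = Date.now();
    for (let offset = 0; offset < blob.size; offset += chunkSize) {
      // Materialize one chunk; smaller chunks mean more async round-trips.
      await blob.slice(offset, offset + chunkSize).arrayBuffer();
    }
    results.push({ chunkSize, ms: Date.now() - start });
  }
  return results;
}
```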
Finally got the time to make some tests.
My experiments show that FileReader instantiation is basically free, no impact whatsoever; it's probably cached. Chunk size has a huge impact on speed and a small impact on RAM usage. No matter the chunk size, RAM usage is 80-150 MB over rest during the process. I'm testing with an 800 MB file. A chunk size of 82 KB takes 16 seconds; 330 KB takes 5 seconds.
So my conclusion is that what is costly here is the asynchronous callback, which to my knowledge I have no way of avoiding. I could possibly implement a second level of buffering that could be configured somehow, but I'm stopping here for now in favour of other features.
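The second level of buffering mentioned above could look roughly like this sketch (a hypothetical design, not the library's implementation): pay the expensive async round-trip once per large chunk, then serve many small reads from the cached buffer.

```javascript
// Hypothetical read-ahead buffer (not the library's code): large slices go
// through the costly async path once; small reads are served from the cache.
class BufferedBlobReader {
  constructor(blob, bufferSize = 512 * 1024) {
    this.blob = blob;
    this.bufferSize = bufferSize;
    this.position = 0;
    this.bufferStart = 0;
    this.buffer = null; // Uint8Array holding the currently cached chunk
  }

  // Returns up to `count` bytes, or null at end of file. Like Stream.Read,
  // it may return fewer than `count` bytes near a buffer boundary.
  async read(count) {
    if (this.position >= this.blob.size) return null;
    const inBuffer =
      this.buffer !== null &&
      this.position >= this.bufferStart &&
      this.position < this.bufferStart + this.buffer.length;
    if (!inBuffer) {
      // One expensive async fetch covers many subsequent small reads.
      const slice = this.blob.slice(this.position, this.position + this.bufferSize);
      this.buffer = new Uint8Array(await slice.arrayBuffer());
      this.bufferStart = this.position;
    }
    const start = this.position - this.bufferStart;
    const bytes = this.buffer.subarray(start, start + count);
    this.position += bytes.length;
    return bytes;
  }
}
```

The trade-off is memory (one bufferSize chunk resident) against the number of async callbacks, which the measurements above suggest dominate the cost.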