gulpjs / vinyl

Virtual file format.

Cloned stream file content is not drained

sergeche opened this issue

Steps to reproduce:

  1. Create a file large enough to exceed the highWaterMark for file streams. In my case it's a 131 KB file.
  2. This code works fine:
var vfs = require('vinyl-fs');

vfs.src('./large-file.txt', {buffer: false})
.pipe(vfs.dest('./dest'))
.on('end', function() {
    console.log('done!');
});

but this doesn't: no done! message is printed and the file isn't written:

var vfs = require('vinyl-fs');
var through = require('through2');

vfs.src('./large-file.txt', {buffer: false})
.pipe(through.obj(function(file, enc, next) {
    next(null, file.clone());
}))
.pipe(vfs.dest('./dest'))
.on('end', function() {
    console.log('done!');
});

Tested on node v0.10.38 and io.js v2.3.1.

After debugging, I found that reading the file contents stops after the second 64 KB chunk; it looks like the writing stream is unable to drain properly and resume reading.

While playing with File.clone(), I commented out this line https://github.com/wearefractal/vinyl/blob/master/index.js#L68 and this test case worked fine.

Do you know if there are side-effects of this solution?

@sergeche That line seems extraneous, want to send a PR to remove it?

I will, but first I have to do more tests. I think the right solution might be much more complex. E.g. if I create two clones attached to a single file read stream and read the contents of one of them, will the second clone restart the read stream when I read from it?

The simplest effective way is to call createReadStream again: https://github.com/snowyu/abstract-file.js/blob/master/src/attributes.coffee#L55

Even with PassThrough, data is lost if one through stream is not ready to receive it in time:

fs = require 'fs'
stream = require 'stream'

contents = fs.createReadStream './README.md'
stream1 = contents.pipe(new stream.PassThrough())
stream2 = contents.pipe(new stream.PassThrough())

stream1.on 'data', (data) -> console.log 's1', data.toString()
stream1.on 'end', ->
  # stream2 cannot receive any data here: it was paused while
  # stream1 consumed the shared reader.
  stream2.on 'data', (data) -> console.log 's2', data.toString()

I did some research on this problem.

Looks like the problem is that when you clone a file's contents stream, you end up with two consumers (the original and the cloned file) attached to a single reader. When you start reading from one of these streams, the other stream also receives data. But since that other stream is still in paused mode, it pauses the shared reader.
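
Here's a minimal sketch of that stall (the file name is just an example; any file larger than the streams' internal buffers will do):

var fs = require('fs');
var stream = require('stream');

// One shared reader, two piped consumers (the same shape clone() produces).
// We read only from `consumed`; `ignored` stays paused.
var reader = fs.createReadStream('./large-file.txt');
var consumed = reader.pipe(new stream.PassThrough());
var ignored = reader.pipe(new stream.PassThrough());

var bytes = 0;
consumed.on('data', function(chunk) {
  bytes += chunk.length;
});
consumed.on('end', function() {
  // For a large enough file this never fires: once `ignored` buffers past
  // its highWaterMark, its write() returns false, pipe() pauses the shared
  // reader, and `consumed` stops receiving data too.
  console.log('read', bytes, 'bytes');
});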

The quick and fairly valid solution, as @snowyu noted, is to create a new reader stream for the cloned file. The downside of this solution is that you lose everything piped onto the original file's contents stream.
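
A rough sketch of that approach (freshContents is a hypothetical helper, not vinyl API; it assumes the file still exists on disk at file.path):

var fs = require('fs');

// Hypothetical helper: give a cloned vinyl file its own reader instead of
// a consumer shared with the original. Re-opening from file.path makes the
// clone independent, but anything already piped onto the original contents
// stream is not carried over (the downside mentioned above).
function freshContents(file) {
  file.contents = fs.createReadStream(file.path);
  return file;
}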

@sergeche We don't want to lose all the piped streams. What else can be done here?

@phated I think the only solution is to clone all the piped streams and arrange them into a new pipeline. And the only way to do that properly is to get access to each stream's constructor.
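
Purely hypothetical sketch of what that would require: a recorded list of constructor/options pairs for every stage piped after the source. Node streams don't expose this, which is why it can't be done generically:

// `stages` is an assumed array of { Ctor: StreamClass, opts: {...} }
// describing each stage of the original pipeline; fresh instances are
// constructed and piped up in the same order for the clone.
function rebuildPipeline(source, stages) {
  return stages.reduce(function(src, stage) {
    return src.pipe(new stage.Ctor(stage.opts));
  }, source);
}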

I looked over the repo issues and didn't find any similar reports. Looks like there aren't many users who need to read files as streams :) So I guess simply creating a new file read stream will be OK.

This example #55 (comment) works fine on node v4 and v5, but not on v0.10 and v0.12.

@mcollina thanks, the example in the original issue works in node v5 also. I kept the technical, stream-specific discussion in the readable-stream issue.

This problem seems to be caused by the Streams2 implementation and is fixed in Streams3. If you upgrade to node v4 or v5, the sample from the initial issue works perfectly.