DList#materialise creates file handles eagerly, not lazily
blever opened this issue
When DList#materialize is used, SequenceFileReader objects are created eagerly for every part file in the underlying bridge store. This becomes a problem if the number of part files exceeds the file descriptor limit set on the client machine. In that case, accessing the resulting DObject can result in an HDFS MissingBlockException (even though the block is present).
To avoid this problem, it should be possible to refactor BridgeStoreIterator to create SequenceFileReader objects lazily, so that once each part file has been iterated over, its associated SequenceFileReader and the underlying file descriptor are released.
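A rough sketch of the lazy approach, not the actual BridgeStoreIterator code. PartReader, LazyPartIterator and the open function are hypothetical names; PartReader stands in for SequenceFileReader so the example runs without Hadoop. The iterator opens a reader only when it reaches that part file and closes it as soon as the part is exhausted, so at most one handle is open at a time.

```scala
import java.io.Closeable

// Hypothetical stand-in for a per-part-file reader such as SequenceFileReader.
trait PartReader[A] extends Closeable {
  def read(): Option[A] // None once the part file is exhausted
}

// Iterates over the records of all parts, holding at most one open reader.
class LazyPartIterator[P, A](parts: Iterator[P], open: P => PartReader[A])
    extends Iterator[A] {
  private var current: Option[PartReader[A]] = None
  private var buffered: Option[A] = None

  // Fill `buffered` with the next record, opening and closing readers as needed.
  private def advance(): Unit =
    while (buffered.isEmpty && (current.isDefined || parts.hasNext)) {
      val reader = current.getOrElse {
        val r = open(parts.next()) // opened only when this part is reached
        current = Some(r)
        r
      }
      reader.read() match {
        case None => reader.close(); current = None // release the handle
        case some => buffered = some
      }
    }

  def hasNext: Boolean = { advance(); buffered.isDefined }

  def next(): A = {
    if (!hasNext) throw new NoSuchElementException("no more records")
    val a = buffered.get
    buffered = None
    a
  }
}

// In-memory demo that tracks how many readers are open at once.
object LazyPartIteratorDemo {
  def run(): (List[Int], Int, Int) = {
    var openNow = 0
    var maxOpen = 0
    def open(xs: List[Int]): PartReader[Int] = new PartReader[Int] {
      private var rest = xs
      openNow += 1
      maxOpen = math.max(maxOpen, openNow)
      def read(): Option[Int] = rest match {
        case h :: t => rest = t; Some(h)
        case Nil    => None
      }
      def close(): Unit = openNow -= 1
    }
    val parts = List(List(1, 2), Nil, List(3))
    val records = new LazyPartIterator(parts.iterator, open).toList
    (records, maxOpen, openNow)
  }
}
```

One caveat with this sketch: if the caller abandons iteration partway through a part, the current reader stays open, so a real implementation would probably also need an explicit close.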