NICTA / scoobi

A Scala productivity framework for Hadoop.

Home Page: http://nicta.github.com/scoobi/


DList#materialise creates file handles eagerly, not lazily

blever opened this issue

When DList#materialise is used, SequenceFileReader objects are created eagerly for every part file in the underlying bridge store. This becomes a problem if the number of part files exceeds the file descriptor limit set on the client machine. In that case, accessing the resulting DObject can fail with an HDFS MissingBlockException, even though the block is present.

To avoid this problem, it should be possible to refactor BridgeStoreIterator to create SequenceFileReader objects lazily, so that once a part file has been fully iterated over, its SequenceFileReader and the underlying file descriptor are released.
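
A rough sketch of the lazy approach, not taken from the Scoobi code base: `PartReader` and `LazyPartIterator` are hypothetical names, and `PartReader` only stands in for whatever reader BridgeStoreIterator actually wraps. The idea is to open a reader when iteration reaches its part file and to close it as soon as that part is exhausted, so that at most one file descriptor is held at a time.

```scala
import java.io.Closeable

/** Hypothetical stand-in for a per-part-file reader; not Scoobi's or Hadoop's API. */
trait PartReader[A] extends Closeable {
  /** Returns the next record, or None once the part file is exhausted. */
  def readNext(): Option[A]
}

/**
 * Iterates over the records of many part files while keeping at most one
 * reader open. `open` is only called when iteration reaches a part, and the
 * reader is closed as soon as that part has no more records.
 */
final class LazyPartIterator[P, A](parts: Iterator[P], open: P => PartReader[A])
    extends Iterator[A] {

  private var current: Option[PartReader[A]] = None // reader for the part in progress
  private var buffered: Option[A] = None            // record read ahead by hasNext

  /** Ensures `buffered` holds the next record, if there is one. */
  @annotation.tailrec
  private def fill(): Unit =
    if (buffered.isEmpty) current match {
      case Some(reader) =>
        reader.readNext() match {
          case some @ Some(_) => buffered = some
          case None =>
            reader.close() // release the descriptor before moving on
            current = None
            fill()
        }
      case None =>
        if (parts.hasNext) {
          current = Some(open(parts.next())) // open lazily, one part at a time
          fill()
        }
    }

  def hasNext: Boolean = { fill(); buffered.nonEmpty }

  def next(): A = {
    fill()
    val record = buffered.getOrElse(throw new NoSuchElementException("no more records"))
    buffered = None
    record
  }
}
```

One caveat with this sketch: a caller that abandons the iterator part-way through leaves the current reader open, so a real refactoring would probably also need an explicit way to close it.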