google / mr4c

A question about the keyspace.

   I have run the examples, and I have a question about the keyspace. I read the description of it:
   "The keyspace is an index of unique elements in the dataset.
    Each key refers to a particular piece of the data without having to keep track of a lot of paths. This can be especially handy when we are operating on a large cluster where all of the files are not necessarily local."
   I think this means that every file has a key, rather than every record within a file. Am I right?

Yes, every file has a key. This framework wasn't designed for the standard Hadoop example of iterating through the records in a file. It was designed to support working with binary files, such as images.
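
To make the "one key per file" idea concrete, here is a minimal sketch in plain Python. It is not MR4C's API: `build_keyspace`, the choice of the file name stem as the key, and the `hdfs://cluster/images/...` URIs are all hypothetical, picked only to illustrate an index where each key stands for a whole file rather than a record inside it.

```python
from pathlib import PurePosixPath


def build_keyspace(file_uris):
    """Return a dict with one key per file (hypothetical helper, not MR4C code).

    The key is assumed to be the file name without its extension.
    """
    keyspace = {}
    for uri in file_uris:
        key = PurePosixPath(uri).stem
        if key in keyspace:
            # Keys must be unique across the dataset.
            raise ValueError(f"duplicate key: {key}")
        keyspace[key] = uri
    return keyspace


# Made-up file locations; the files need not be local to the caller.
uris = [
    "hdfs://cluster/images/scene_001.jpg",
    "hdfs://cluster/images/scene_002.jpg",
]

keyspace = build_keyspace(uris)
for key in sorted(keyspace):
    print(key, "->", keyspace[key])
```

Code that works on the dataset would then refer to `scene_001` or `scene_002` and leave resolving the actual location to the index, which is roughly the convenience the quoted description is pointing at.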