jingw / pyhdfs

Python HDFS client

Home Page: https://pyhdfs.readthedocs.io/en/latest/

Is it possible to support 'getmerge'?

Yodeser opened this issue

As the title says.
getmerge: download all the files under a path and concatenate them into a single local file.

Hi, the getmerge command isn't anything special. It just loops through all the files and copies them. It's a client-side feature, not a server-side RPC. There's no WebHDFS endpoint for it.
https://github.com/apache/hadoop/blob/03cfc852791c14fad39db4e5b14104a276c08e59/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/shell/CopyCommands.java#L101

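You can implement this yourself with something along the lines of the following, repeated for each file in the directory:

# hdfs is a pyhdfs.HdfsClient; write in binary mode because read() returns bytes
with open('dest', 'wb') as f:
    with hdfs.open('thing') as f2:
        f.write(f2.read())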
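
To make the "loop through all the files" part concrete, here is a rough sketch of a client-side getmerge. It is not part of pyhdfs: the function name `getmerge` and its `add_newline` parameter are made up for illustration, and it assumes `client` is a `pyhdfs.HdfsClient` whose `list_status()` entries expose `pathSuffix` and `type`. Sorting by file name and skipping subdirectories are choices of this sketch, not a claim about what the Hadoop shell command does.

```python
import posixpath
import shutil


def getmerge(client, src_dir, dest_path, add_newline=False):
    """Concatenate every regular file directly under src_dir into dest_path.

    `client` is assumed to be a pyhdfs.HdfsClient, or any object that
    exposes list_status() and open() in the same way.
    """
    # Sort by name so the output order is deterministic (a choice of this sketch).
    statuses = sorted(client.list_status(src_dir), key=lambda s: s.pathSuffix)
    with open(dest_path, 'wb') as dest:
        for status in statuses:
            if status.type != 'FILE':
                continue  # skip subdirectories; this sketch is not recursive
            src_path = posixpath.join(src_dir, status.pathSuffix)
            with client.open(src_path) as src:
                # Stream in chunks rather than reading each file fully into memory.
                shutil.copyfileobj(src, dest)
            if add_newline:
                dest.write(b'\n')
```

Usage would look something like `getmerge(pyhdfs.HdfsClient(hosts='namenode:50070'), '/some/dir', 'merged.out')`, where the host and both paths are placeholders.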