jingw / pyhdfs

Python HDFS client

Home Page: https://pyhdfs.readthedocs.io/en/latest/

Client should return some info when it successfully creates a file

cosven opened this issue

For example, the HDFS server may return a response with headers like this:

HTTP/1.1 201 Created
Location: webhdfs://<HOST>:<PORT>/<PATH>
Content-Length: 0

I want to get the Location from the response headers; however, client.create does not return anything.
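In the meantime, one way to capture that header yourself is to issue the WebHDFS CREATE call directly with requests, skipping pyhdfs. This is only a sketch: the proxy host and path are hypothetical, and a real cluster may require extra query parameters such as user.name.

```python
import requests

def create_url(proxy, path):
    # Build the WebHDFS CREATE URL (hypothetical proxy host and path).
    return f"http://{proxy}/webhdfs/v1{path}?op=CREATE&overwrite=true"

def create_and_get_location(proxy, path, data):
    # Step 1: the NameNode/proxy answers CREATE with a 307 redirect
    # whose Location header points at a DataNode.
    first = requests.put(create_url(proxy, path), allow_redirects=False)
    datanode_url = first.headers["Location"]
    # Step 2: PUT the file content to the DataNode; the 201 Created
    # response carries the Location header discussed in this issue.
    final = requests.put(datanode_url, data=data)
    final.raise_for_status()
    return final.headers.get("Location")
```

Doing the two-step redirect by hand (instead of `-L`-style auto-following) also keeps the file body from being sent twice.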

What's the use case of this? The HDFS Java API doesn't return such information either.
https://hadoop.apache.org/docs/stable/api/org/apache/hadoop/fs/FileSystem.html#create(org.apache.hadoop.fs.Path)

ummm...

Intention: I use the HDFS REST API to upload a file to HDFS, and I want to get the file's URI on HDFS when the upload finishes. For example:

curl -L -X PUT http://hdfs_proxy.com/webhdfs/v1/{path}?op=CREATE -T xxx.tar.gz

The HDFS web server may return a response like this:

HTTP/1.1 201 Created
Location: hdfs://hdfs_name/{path}
Content-Length: 0

I want to get this file URI on HDFS: hdfs://hdfs_name/{path}

Currently, I don't have any way to do this.

If you're uploading a file to HDFS at some path, don't you already know the path?

I only have the HTTP proxy URL; however, hdfs_name is unknown.
By the way, I do not care about hdfs_name when I upload a file.
Is that reasonable?

If I can't get the file URI from the response header, I will have to keep a {hdfs_name: web REST API URI} mapping in my source code in order to construct the final file URI.
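For what it's worth, if the Location header were exposed, splitting the cluster name out of the returned URI would be a one-liner with the standard library. A sketch, using a hypothetical URI:

```python
from urllib.parse import urlparse

def split_hdfs_uri(location):
    # "hdfs://mycluster/user/me/file.tar.gz"
    #   -> ("mycluster", "/user/me/file.tar.gz")
    parsed = urlparse(location)
    return parsed.netloc, parsed.path
```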

Maybe it would help to describe your setup and problem in more detail. I don't see the purpose of getting either the NameNode host name or the DataNode host name; the HDFS Java API doesn't provide such information either.

  • If you want to access the NameNode, you have your proxy.
  • If you want to access the DataNode, that hostname is likely to change as the cluster evolves, and you're better off getting a current hostname using the OPEN operation.
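The second bullet can be sketched like this: ask the proxy to OPEN the file without following the redirect, and the 307 Location header names a DataNode currently serving it. The proxy host and path are hypothetical.

```python
import requests
from urllib.parse import urlparse

def open_url(proxy, path):
    # Build the WebHDFS OPEN URL (hypothetical proxy host and path).
    return f"http://{proxy}/webhdfs/v1{path}?op=OPEN"

def current_datanode(proxy, path):
    # The NameNode/proxy answers OPEN with a 307 redirect; its
    # Location header names a live DataNode for the file.
    resp = requests.get(open_url(proxy, path), allow_redirects=False)
    return urlparse(resp.headers["Location"]).hostname
```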

What I mean by hdfs_name is the HDFS cluster name.

More detail:

I have an HTTP proxy: proxy.in.company.com
namenode: debian01.prod.company.com
datanode: debianxx.prod.company.com
I don't know the HDFS cluster name.

So when I want to upload a file to HDFS through the HTTP REST API:

curl -i -L -X PUT proxy.in.company.com/webhdfs/v1/<path> -T file.tar.gz

I want to get the full file URI from this response: hdfs://<cluster_name>/<path>

Sorry, I'll be more inclined to add this if it shows up in the official HDFS client. But feel free to patch this into your own version of the code.