Python wrapper around Hadoop's WebHDFS interface.
```python
import webhdfs
w = webhdfs.API(prefix="http://localhost:14000/webhdfs/v1", user="webhdfs")
```
Returns the output of `LISTSTATUS` on `path`.
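Under the hood, a WebHDFS operation such as `LISTSTATUS` is a plain HTTP GET against the configured prefix. A minimal sketch of how such a request URL might be composed (the `build_url` helper is hypothetical, not part of this library's API):

```python
from urllib.parse import urlencode

def build_url(prefix, path, op, **params):
    # Hypothetical helper: join the WebHDFS prefix, the HDFS path,
    # and the operation name into a single request URL.
    query = urlencode({"op": op, **params})
    return f"{prefix}{path}?{query}"

url = build_url("http://localhost:14000/webhdfs/v1", "/tmp", "LISTSTATUS",
                **{"user.name": "webhdfs"})
# e.g. http://localhost:14000/webhdfs/v1/tmp?op=LISTSTATUS&user.name=webhdfs
```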
A generator which, much like Linux's `find`, yields files matching `name`.
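Such a generator can be built by recursing over `LISTSTATUS` results. A sketch under stated assumptions: the `find` function below is hypothetical, and `FAKE_FS` stands in for real `LISTSTATUS` responses (which do use `pathSuffix` and `type` fields in the WebHDFS `FileStatus` JSON):

```python
import fnmatch

# Toy listing data standing in for LISTSTATUS responses.
FAKE_FS = {
    "/data": [{"pathSuffix": "logs", "type": "DIRECTORY"},
              {"pathSuffix": "a.txt", "type": "FILE"}],
    "/data/logs": [{"pathSuffix": "app.log", "type": "FILE"},
                   {"pathSuffix": "db.log", "type": "FILE"}],
}

def find(path, name):
    # Depth-first walk, yielding full paths whose basename matches the
    # glob pattern, much like `find <path> -name <name>`.
    for entry in FAKE_FS.get(path, []):
        full = f"{path}/{entry['pathSuffix']}"
        if entry["type"] == "DIRECTORY":
            yield from find(full, name)
        elif fnmatch.fnmatch(entry["pathSuffix"], name):
            yield full

matches = list(find("/data", "*.log"))
# → ['/data/logs/app.log', '/data/logs/db.log']
```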
Returns the contents of `path`. Raises a `TypeError` if `path` is a directory.
Returns the contents of `path`, but avoids reading the content into memory for large objects. Raises a `TypeError` if `path` is a directory.
Returns a boolean indicating whether `path` exists in HDFS.
Returns a boolean indicating whether `path` is a directory.
Creates a file in HDFS at `path` using either the contents of `file` or raw `data`. Raises an `IOError` if `path` already exists.
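For context, a WebHDFS `CREATE` is a two-step PUT: the namenode answers with a 307 redirect, and the data is then sent to the indicated datanode. The existence guard can be modeled client-side; a toy sketch (the `put` helper and the dict-backed store are assumptions for illustration only):

```python
def put(path, data, store):
    # Model the overwrite guard: a real WebHDFS CREATE with
    # overwrite=false would be rejected server-side instead.
    if path in store:
        raise IOError(f"{path} already exists")
    store[path] = data

store = {}
put("/tmp/a", b"hello", store)
try:
    put("/tmp/a", b"again", store)
except IOError:
    pass  # the second write is rejected, first contents survive
```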
Deletes a file in HDFS at `path`. Optionally, if `recursive` is `True`, deletes all content beneath it in the hierarchy. Raises an `IOError` if `path` does not exist.
Returns the HDFS checksum for `path`.
Copies the contents of `path` to `output`. If `path` is a single file, this is the same as `hadoop fs -get`. If `path` is a directory, this is the same as `hadoop fs -getmerge`.
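The directory case amounts to concatenating each file's bytes into the destination in order. A sketch with in-memory streams standing in for HDFS files and the local output (the `getmerge` helper here is illustrative, not necessarily this library's signature):

```python
import io

def getmerge(sources, output):
    # Concatenate each source stream into output, as
    # `hadoop fs -getmerge` does for the files of a directory.
    for src in sources:
        output.write(src.read())

out = io.BytesIO()
getmerge([io.BytesIO(b"part-0\n"), io.BytesIO(b"part-1\n")], out)
# out.getvalue() == b"part-0\npart-1\n"
```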
A generator which yields individual lines from `path`. If `decompress` is `True`, files are decompressed en route. As with `getmerge()`, `path` can be a single file or a directory.
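Decompressing "en route" can be done by wrapping the byte stream in a streaming gzip reader before iterating lines, so the whole file is never held in memory. A sketch assuming gzip input (the `iter_lines` name is hypothetical):

```python
import gzip
import io

def iter_lines(raw, decompress=False):
    # Optionally wrap the byte stream in a streaming gzip reader,
    # then yield one line at a time.
    stream = gzip.GzipFile(fileobj=raw) if decompress else raw
    for line in stream:
        yield line.rstrip(b"\n")

payload = gzip.compress(b"alpha\nbeta\n")
lines = list(iter_lines(io.BytesIO(payload), decompress=True))
# → [b"alpha", b"beta"]
```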