pyarrow.hdfs.connect(host='default', port=0, user=None, kerb_ticket=None, driver='libhdfs', extra_conf=None)
You have to be sure that libhdfs.so is in $HADOOP_HOME/lib/native as well as in $ARROW_LIBHDFS_DIR.
For Hadoop:

bash-3.2$ ls $ARROW_LIBHDFS_DIR
examples libhadoop.so.1.0.0 libhdfs.a libnativetask.a
libhadoop.a libhadooppipes.a libhdfs.so libnativetask.so
libhadoop.so libhadooputils.a libhdfs.so.0.0.0 libnativetask.so.1.0.0
As far as I know, the latest version is Hadoop 3.2.0.
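
With the library in place, connecting from Python is straightforward. Here is a minimal sketch; the hostname, port, user, and library path below are placeholders of mine, not values from this post:

import os
import pyarrow as pa

# Assumption: libhdfs.so lives in this directory (placeholder path).
os.environ["ARROW_LIBHDFS_DIR"] = "/opt/hadoop/lib/native"

# Hypothetical NameNode host/port; adjust to your cluster.
fs = pa.hdfs.connect(host="nn.example.com", port=8020, user="hdfs")

# List the HDFS root to confirm the connection works.
print(fs.ls("/"))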
You can load any native shared library using DistributedCache for distributing and symlinking the library files.
This example shows you how to distribute a shared library, mylib.so, and load it from a MapReduce task.
- First copy the library to the HDFS:
bin/hadoop fs -copyFromLocal mylib.so.1 /libraries/mylib.so.1
- The job launching program should contain the following:
DistributedCache.createSymlink(conf);
DistributedCache.addCacheFile("hdfs://host:port/libraries/mylib.so.1#mylib.so", conf);
- The MapReduce task can contain:
System.loadLibrary("mylib.so");
Note: If you downloaded or built the native Hadoop library, you don't need to use DistributedCache to make the library available to your MapReduce tasks.
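
For completeness: if the task itself is a Python process (for example a Hadoop Streaming job), the same symlinked library can be opened with ctypes. A small sketch, assuming only the mylib.so symlink from the example above:

import ctypes

# DistributedCache creates the "mylib.so" symlink in the task's
# working directory, so a relative path is enough to load it.
mylib = ctypes.CDLL("./mylib.so")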