Monday, 24 June 2019

How to connect to HDFS using pyarrow in Python

pyarrow.hdfs.connect(host='default', port=0, user=None, kerb_ticket=None, driver='libhdfs', extra_conf=None)
You have to make sure that libhdfs.so is present in $HADOOP_HOME/lib/native, and that the directory $ARROW_LIBHDFS_DIR points to also contains it.
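
As a quick sketch (the host name, port, and user below are only placeholders for your own cluster), a connection and a small smoke test could look like this:

import pyarrow as pa

# host/port/user are placeholders; with host='default' the value of fs.defaultFS from the cluster config is used
fs = pa.hdfs.connect(host='namenode', port=8020, user='hadoop')

# list the HDFS root as a quick check that libhdfs was found and the connection works
print(fs.ls('/'))
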
For Hadoop, $ARROW_LIBHDFS_DIR should contain:
bash-3.2$ ls $ARROW_LIBHDFS_DIR
examples libhadoop.so.1.0.0 libhdfs.a libnativetask.a
libhadoop.a libhadooppipes.a libhdfs.so libnativetask.so
libhadoop.so libhadooputils.a libhdfs.so.0.0.0 libnativetask.so.1.0.0
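
If pyarrow still cannot find libhdfs.so, one option is to set the environment from Python before connecting. This is only a sketch and the native directory path below is an assumption, so adjust it to your install; note that libhdfs also needs the Hadoop jars on the CLASSPATH, which `hadoop classpath --glob` prints.

import os
import subprocess
import pyarrow as pa

# point pyarrow at the directory that contains libhdfs.so (example path, adjust to your install)
os.environ['ARROW_LIBHDFS_DIR'] = '/opt/hadoop-3.2.0/lib/native'

# libhdfs needs the Hadoop jars on the CLASSPATH; 'hadoop classpath --glob' prints them
os.environ['CLASSPATH'] = subprocess.check_output(
    ['hadoop', 'classpath', '--glob']).decode('utf-8').strip()

fs = pa.hdfs.connect()  # host='default' picks up fs.defaultFS from core-site.xml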

The latest version, as far as I know, is Hadoop 3.2.0.

You can load any native shared library using DistributedCache for distributing and symlinking the library files.
This example shows you how to distribute a shared library, mylib.so, and load it from a MapReduce task.
  1. First copy the library to HDFS: bin/hadoop fs -copyFromLocal mylib.so.1 /libraries/mylib.so.1 (a pyarrow sketch of this step follows after the list)
  2. The job launching program should contain the following:
    DistributedCache.createSymlink(conf); DistributedCache.addCacheFile("hdfs://host:port/libraries/mylib.so.1#mylib.so", conf);
  3. The MapReduce task can contain: System.loadLibrary("mylib.so");
Note: If you downloaded or built the native Hadoop library, you don’t need to use DistributedCache to make the library available to your MapReduce tasks.
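
For step 1, the copy can also be done from Python through the pyarrow connection shown earlier; this is only a sketch, with the paths taken from the example above:

import pyarrow as pa

fs = pa.hdfs.connect()  # uses the default NameNode from the cluster configuration

# equivalent of: bin/hadoop fs -copyFromLocal mylib.so.1 /libraries/mylib.so.1
if not fs.exists('/libraries'):
    fs.mkdir('/libraries')
with open('mylib.so.1', 'rb') as local_file, fs.open('/libraries/mylib.so.1', 'wb') as remote_file:
    remote_file.write(local_file.read())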
