
Welcome to our Coding with Python page! Here you will find various code covering PHP, Python, AI, Cyber, etc., as well as Electricity, Energy, and Nuclear Power.

Monday, 24 June 2019

How to connect to HDFS using pyarrow in Python

pyarrow.hdfs.connect(host='default', port=0, user=None, kerb_ticket=None, driver='libhdfs', extra_conf=None)
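A minimal usage sketch is shown below (namenode.example.com, port 8020, and the paths are placeholders, not values from this post):

    import pyarrow as pa

    # Connect with the legacy pyarrow.hdfs API (newer pyarrow releases
    # replace it with pyarrow.fs.HadoopFileSystem).
    fs = pa.hdfs.connect(host='namenode.example.com', port=8020, user='hdfs')

    # List a directory and read a file back as bytes.
    print(fs.ls('/user/hdfs'))
    with fs.open('/user/hdfs/example.txt', 'rb') as f:
        data = f.read()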
You have to make sure that libhdfs.so is in $HADOOP_HOME/lib/native as well as in $ARROW_LIBHDFS_DIR.
For Hadoop, $ARROW_LIBHDFS_DIR should contain something like this:
bash-3.2$ ls $ARROW_LIBHDFS_DIR
examples libhadoop.so.1.0.0 libhdfs.a libnativetask.a
libhadoop.a libhadooppipes.a libhdfs.so libnativetask.so
libhadoop.so libhadooputils.a libhdfs.so.0.0.0 libnativetask.so.1.0.0
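If ARROW_LIBHDFS_DIR is not already exported in your shell, one option is to set it from Python before connecting. This is only a sketch, and /opt/hadoop is an assumed install path:

    import os
    import subprocess

    # Point Arrow at the directory that contains libhdfs.so
    # (example path; use your own $HADOOP_HOME/lib/native).
    os.environ['ARROW_LIBHDFS_DIR'] = '/opt/hadoop/lib/native'

    # libhdfs also needs the Hadoop jars on the CLASSPATH.
    os.environ['CLASSPATH'] = subprocess.check_output(
        ['hadoop', 'classpath', '--glob']).decode().strip()

    import pyarrow as pa
    fs = pa.hdfs.connect()  # host='default' uses fs.defaultFS from core-site.xml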

As far as I know, the latest Hadoop release is 3.2.0.

You can load any native shared library using DistributedCache for distributing and symlinking the library files.
This example shows you how to distribute a shared library, mylib.so, and load it from a MapReduce task.
  1. First copy the library to the HDFS: bin/hadoop fs -copyFromLocal mylib.so.1 /libraries/mylib.so.1
  2. The job launching program should contain the following:
    DistributedCache.createSymlink(conf); DistributedCache.addCacheFile("hdfs://host:port/libraries/mylib.so.1#mylib.so", conf);
  3. The MapReduce task can contain: System.loadLibrary("mylib.so");
Note: If you downloaded or built the native Hadoop library, you don't need to use DistributedCache to make the library available to your MapReduce tasks.
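The loadLibrary call above is Java, as in the Hadoop documentation; if your task is a Python script instead (for example under Hadoop Streaming), a hypothetical equivalent is to load the symlinked file with ctypes:

    import ctypes

    # The DistributedCache symlink ("mylib.so") appears in the task's
    # working directory, so it can be loaded by relative path.
    mylib = ctypes.CDLL('./mylib.so')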
