Is It Possible To Save Files In Hadoop Without Saving Them In Local File System?
Solution 1:
Hadoop has REST APIs that allow you to create files via WebHDFS. So you could write your own create based on the REST APIs, using a Python library like requests for the HTTP calls. However, there are also several Python libraries that support Hadoop/HDFS and either already use the REST APIs or use the RPC mechanism via libhdfs:
- pydoop
- hadoopy
- snakebite
- pywebhdfs
- hdfscli
- pyarrow
Just make sure you look for how to create a file rather than having the Python library call hdfs dfs -put or hadoop fs -put.
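For the do-it-yourself route, the WebHDFS create call can be sketched roughly as below. This is a minimal sketch, not a definitive client: the function names webhdfs_create_url and webhdfs_create are made up for illustration, the host, port and user in the usage comment are placeholders, and it assumes simple (non-Kerberos) authentication via the user.name parameter. The session argument is expected to behave like requests.Session. WebHDFS creates a file in two steps: the NameNode answers the first PUT with a 307 redirect, and the data goes to the DataNode named in the Location header.

```python
from urllib.parse import quote, urlencode


def webhdfs_create_url(host, port, hdfs_path, user, overwrite=False):
    """Build the WebHDFS URL for op=CREATE on an absolute HDFS path."""
    query = urlencode({
        "op": "CREATE",
        "user.name": user,  # assumption: simple auth, no Kerberos
        "overwrite": "true" if overwrite else "false",
    })
    return f"http://{host}:{port}/webhdfs/v1{quote(hdfs_path)}?{query}"


def webhdfs_create(session, url, data):
    """Write `data` (bytes or an iterable of byte chunks) to a new HDFS file.

    `session` is expected to behave like requests.Session.
    """
    # Step 1: ask the NameNode where to write. It replies with a redirect
    # to a DataNode instead of accepting the data itself.
    first = session.put(url, allow_redirects=False)
    if first.status_code != 307:
        raise RuntimeError(f"expected a 307 redirect, got {first.status_code}")
    # Step 2: send the actual bytes to the DataNode named in Location.
    second = session.put(first.headers["Location"], data=data)
    second.raise_for_status()
    return second.status_code


# Usage (placeholder host, port and user; needs the requests package):
#   import requests
#   url = webhdfs_create_url("namenode.example.com", 9870, "/tmp/hello.txt", "hadoop")
#   with requests.Session() as s:
#       webhdfs_create(s, url, b"hello from WebHDFS\n")
```

Because data may be an iterable of byte chunks, the same function can forward a streamed download or upload without staging it on the local disk.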
See the following for more information:
- pydoop vs hadoopy - hadoop python client
- List all files in HDFS Python without pydoop
- A Guide to Python Frameworks for Hadoop
- Native Hadoop file system (HDFS) connectivity in Python
- PyArrow
- https://github.com/pywebhdfs/pywebhdfs
- https://github.com/spotify/snakebite
- https://crs4.github.io/pydoop/api_docs/hdfs_api.html
- https://hdfscli.readthedocs.io/en/latest/
- WebHDFS REST API:Create and Write to a File
Solution 2:
Here's how to download a file directly to HDFS with Pydoop:
    import os
    import requests
    import pydoop.hdfs as hdfs

    def dl_to_hdfs(url, hdfs_path):
        # Stream the response so the file is never fully held in memory
        r = requests.get(url, stream=True)
        # Fail early instead of writing an HTTP error page to HDFS
        r.raise_for_status()
        with hdfs.open(hdfs_path, 'w') as f:
            for chunk in r.iter_content(chunk_size=1024):
                f.write(chunk)

    URL = "https://www.python.org/ftp/python/3.7.0/Python-3.7.0.tar.xz"
    dl_to_hdfs(URL, os.path.basename(URL))
The above snippet works for a generic URL. If you already have the file as a Django UploadedFile, you can probably use its .chunks() method to iterate through the data.
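A minimal sketch of that idea follows. The helper name copy_upload and the 64 KiB chunk size are illustrative assumptions. It only relies on the uploaded object having a chunks() method, as Django's UploadedFile does, and on dest being a writable binary file object such as the one returned by hdfs.open(path, 'w').

```python
def copy_upload(uploaded_file, dest, chunk_size=64 * 1024):
    """Stream an uploaded file into `dest` chunk by chunk.

    Returns the number of bytes written. Nothing is staged on the
    local file system by this function itself.
    """
    written = 0
    for chunk in uploaded_file.chunks(chunk_size):
        dest.write(chunk)
        written += len(chunk)
    return written


# Usage inside a Django view (the field name "file" is a placeholder):
#   import pydoop.hdfs as hdfs
#   upload = request.FILES["file"]
#   with hdfs.open(upload.name, 'w') as f:
#       copy_upload(upload, f)
```

Note that Django may itself spool large uploads to a temporary file before the view runs, depending on its upload handler settings, so this avoids an extra local copy in your own code rather than guaranteeing none exists.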
Solution 3:
Python installed on your Linux machine works with the local file system by default. Its built-in file functions cannot directly access files in HDFS.
To put files directly into HDFS, you can use one of the following:
Spark: use DStreams (Spark Streaming) for streaming files.
Kafka: a matter of setting up a configuration file. Best for streaming data.
Flume: set up a configuration file. Best for static files.