Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1, worked perfectly on local
Solution 1:
If your issue is not caused by your Python code or a missing Python library, it may come from the first lines of the Python file (the shebang and encoding comments) and how your OS launches the script.
In my case, on macOS, after installing Hadoop locally by following a tutorial, the Python mapper and reducer failed to run.
Errors:
java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
or
java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 127
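The number at the end of the message is the exit status of the mapper or reducer process. As a general rule (not specific to Hadoop), a POSIX shell reports 127 when it cannot find the command it was asked to run, and the Python interpreter exits with 1 after an uncaught exception. The sketch below reproduces both cases locally; the command and module names are made up so that they are guaranteed not to exist. The task logs remain the place to confirm what actually happened in a given job.

```python
import subprocess
import sys

def exit_code(argv):
    """Run argv and return its exit status, discarding any output."""
    done = subprocess.run(argv, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return done.returncode

# A command the shell cannot find (hypothetical name): POSIX shells report 127.
missing = exit_code(["sh", "-c", "no_such_interpreter_xyz"])

# A Python script that dies on an uncaught exception (here a failed import
# of a hypothetical module): the interpreter exits with 1.
crashed = exit_code([sys.executable, "-c", "import no_such_module_xyz"])

print(missing, crashed)
```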
My configuration:
- Hadoop 3.2.1_1
- Python 3.7.6
- macOS Mojave 10.14.6
- the Java version from the tutorial (adoptopenjdk8): "1.8.0_252"
To launch the job with Python, I use the newer command mapred streaming
instead of hadoop jar /xxx/hadoop-mapreduce/hadoop-streaming-xxx.jar
from the Hadoop documentation.
(Be careful: I think the examples in that documentation are out of date regarding generic options. -file is deprecated and -files replaces it.)
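If you launch the job from a script, the argument list can be assembled in Python first. This is only a sketch: streaming_command is a hypothetical helper, and the file names and HDFS paths are the ones used in this answer. It places the generic option -files ahead of the streaming options, which is the order the Hadoop Streaming documentation asks for.

```python
import shlex

def streaming_command(files, input_path, output_path, mapper, reducer):
    """Build an argv list for `mapred streaming` (hypothetical helper)."""
    return [
        "mapred", "streaming",
        # Generic option: placed before the streaming options.
        "-files", ",".join(files),
        "-input", input_path,
        "-output", output_path,
        "-mapper", mapper,
        "-reducer", reducer,
    ]

cmd = streaming_command(
    ["WordCountMapper.py", "WordCountReducer.py"],
    "/data/input/README.TXT",
    "/data/output",
    "python WordCountMapper.py",
    "python WordCountReducer.py",
)

# Print a shell-safe rendering; nothing is executed here.
print(" ".join(shlex.quote(part) for part in cmd))
```

The list can then be handed to subprocess.run(cmd) on a machine where mapred is on the PATH.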
I found two possibilities:
- Keep the Python file untouched, with this first line:
# -*-coding:utf-8 -*
In that case, only this command works for me:
mapred streaming -files WordCountMapper.py,WordCountReducer.py \
-input /data/input/README.TXT \
-output /data/output \
-mapper "python WordCountMapper.py" \
-reducer "python WordCountReducer.py"
This assumes that I want to count the words of /data/input/README.TXT, already copied to my HDFS volume (hadoop fs -copyFromLocal /absolute-local-folder/data/input/README.TXT /data/input), using the local Python files WordCountMapper.py and WordCountReducer.py.
Code for WordCountMapper.py:
#!/usr/bin/python
# -*-coding:utf-8 -*
import sys

for line in sys.stdin:
    # Strip leading and trailing whitespace
    line = line.strip()
    # Split the line into words
    words = line.split()
    # Map step: emit the pair (word, 1) for each word
    for word in words:
        print("%s\t%d" % (word, 1))
Code for WordCountReducer.py:
#!/usr/bin/python
# -*-coding:utf-8 -*
import sys

total = 0
lastword = None
for line in sys.stdin:
    line = line.strip()
    # Read the key and the value, and convert the value to int
    word, count = line.split()
    count = int(count)
    # Move on to the next word (one run of the program may see several keys)
    if lastword is None:
        lastword = word
    if word == lastword:
        total += count
    else:
        print("%s\t%d occurrences" % (lastword, total))
        total = count
        lastword = word
# Emit the count for the last word
if lastword is not None:
    print("%s\t%d occurrences" % (lastword, total))
- Edit the Python files so that they can be executed directly:
2.1. Make the Python files executable:
chmod +x WordCountMapper.py
chmod +x WordCountReducer.py
2.2. Make sure each file starts with these two lines:
first line : `#!/usr/bin/python`
second line : `# -*-coding:utf-8 -*`
Then use this command:
mapred streaming -files WordCountMapper.py,WordCountReducer.py \
-input /data/input/README.TXT \
-output /data/output \
-mapper ./WordCountMapper.py \
-reducer ./WordCountReducer.py
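Steps 2.1 and 2.2 can be checked before submitting the job. The sketch below is one possible pre-flight check, not part of Hadoop: preflight is a hypothetical helper that reports a missing shebang or a missing execute permission, the two conditions the second possibility depends on. The demo runs on a throwaway file with a placeholder script body.

```python
import os
import stat
import tempfile

def preflight(path):
    """List reasons why running ./<script> directly would fail to start."""
    problems = []
    with open(path, "rb") as handle:
        first_line = handle.readline()
    if not first_line.startswith(b"#!"):
        problems.append("no shebang on the first line")
    if not os.access(path, os.X_OK):
        problems.append("execute permission not set")
    return problems

# Demo on a throwaway file; the script body is a placeholder.
with tempfile.TemporaryDirectory() as folder:
    script = os.path.join(folder, "WordCountMapper.py")
    with open(script, "w", encoding="utf-8") as handle:
        handle.write("# -*-coding:utf-8 -*\nimport sys\n")
    os.chmod(script, 0o644)
    before = preflight(script)

    # Apply step 2.2 (add the shebang), then step 2.1 (equivalent of chmod +x).
    with open(script, "w", encoding="utf-8") as handle:
        handle.write("#!/usr/bin/python\n# -*-coding:utf-8 -*\nimport sys\n")
    mode = os.stat(script).st_mode
    os.chmod(script, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    after = preflight(script)

print(before)
print(after)
```

The check does not verify that the interpreter named in the shebang exists on the cluster nodes, which still has to be confirmed there.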
Solution 2:
I checked the job's error logs, then placed the required Python files that are not predefined libraries (that is, my own modules) into the Python directory. After that, I ran the Hadoop streaming command, passing each of those Python files:
hadoop jar /usr/hdp/3.0.0.0-1634/hadoop-mapreduce/hadoop-streaming-3.1.0.3.0.0.0-1634.jar \
-Dmapred.reduce.tasks=0 \
-file /home/mapper3.py -mapper mapper3.py \
-file /home/reducer3.py -reducer reducer3.py \
-file /home/ErrorHandle.py \
-file /home/ExceptionUtil.py \
-input /data/studentMapReduce/student1.txt \
-output outputMapReduceFile.txt
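To find out which of your own modules a script needs, one option is to parse its imports and keep those that resolve to .py files sitting next to it. This is a minimal sketch under that assumption: local_imports is a hypothetical helper, and the contents written to mapper3.py are invented for the demo (the file names echo the command above). It only looks at top-level module names and will miss imports built dynamically.

```python
import ast
import os
import tempfile

def local_imports(script_path):
    """Return sibling .py files that script_path imports (top-level names only)."""
    folder = os.path.dirname(os.path.abspath(script_path))
    with open(script_path, encoding="utf-8") as handle:
        tree = ast.parse(handle.read())
    names = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            names.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            names.add(node.module.split(".")[0])
    # Keep only the names backed by a file in the same folder as the script.
    return sorted(
        name + ".py" for name in names
        if os.path.isfile(os.path.join(folder, name + ".py"))
    )

# Demo in a throwaway directory with invented file contents.
with tempfile.TemporaryDirectory() as folder:
    for name in ("ErrorHandle.py", "ExceptionUtil.py"):
        open(os.path.join(folder, name), "w").close()
    mapper = os.path.join(folder, "mapper3.py")
    with open(mapper, "w", encoding="utf-8") as handle:
        handle.write("import sys\nimport ErrorHandle\nfrom ExceptionUtil import *\n")
    found = local_imports(mapper)

print(found)
```

Each file it reports is a candidate for shipping with the job alongside the mapper and reducer.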