
Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): Subprocess Failed With Code 1, Worked Perfectly On Local

I have googled this error on every forum, but no luck. I get the error shown below:

18/08/29 00:24:53 INFO mapreduce.Job:  map 0% reduce 0%
18/08/29 00:24:59 INFO ma

Solution 1:

If your issue is not a problem with Python libraries or your code, it might be about the Python file's first lines (shebang and encoding comments) and your OS.

For me, on macOS, after installing Hadoop locally with a tutorial, the Python mapper/reducer didn't execute well. Errors: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1 or java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 127

My configuration:

  • Hadoop 3.2.1_1
  • Python 3.7.6
  • macOS Mojave 10.14.6
  • Java from the tutorial (AdoptOpenJDK 8): "1.8.0_252"

To launch your job with Python, I use the new command mapred streaming instead of hadoop jar /xxx/hadoop-mapreduce/hadoop-streaming-xxx.jar from the Hadoop documentation (be careful: I think that documentation's examples are not good regarding generic options; -file is deprecated, the new option is -files).

I found two possibilities:

  1. Keep the Python files untouched, with the first line: # -*- coding: utf-8 -*-

Only this command works for me:

mapred streaming -files WordCountMapper.py,WordCountReducer.py \
-input /data/input/README.TXT \
-output /data/output \
-mapper "python WordCountMapper.py" \
-reducer "python WordCountReducer.py"

This assumes that I want to count the words of /data/input/README.TXT, which has already been copied into my HDFS volume (hadoop fs -copyFromLocal /absolute-local-folder/data/input/README.TXT /data/input), using the local Python files WordCountMapper.py and WordCountReducer.py.

Code for WordCountMapper.py:

#!/usr/bin/python
# -*- coding: utf-8 -*-
import sys

for line in sys.stdin:
    # Remove surrounding whitespace
    line = line.strip()
    # Get the words
    words = line.split()

    # Map step: for each word, emit the pair (word, 1)
    for word in words:
        print("%s\t%d" % (word, 1))

Code for WordCountReducer.py:

#!/usr/bin/python
# -*- coding: utf-8 -*-
import sys

total = 0
lastword = None

for line in sys.stdin:
    line = line.strip()

    # Get the key and the value, and convert the value to an int
    word, count = line.split()
    count = int(count)

    # Move on to the next word (several keys are possible within a single run of the program)
    if lastword is None:
        lastword = word
    if word == lastword:
        total += count
    else:
        print("%s\t%d occurrences" % (lastword, total))
        total = count
        lastword = word

if lastword is not None:
    print("%s\t%d occurrences" % (lastword, total))
  2. Edit the Python files for execution:

2.1. Add execute permission to the Python files:

chmod +x WordCountMapper.py

chmod +x WordCountReducer.py

2.2. Make sure the first two lines are:

first line: `#!/usr/bin/python`

second line: `# -*- coding: utf-8 -*-`

Use this command:

mapred streaming -files WordCountMapper.py,WordCountReducer.py \
-input /data/input/README.TXT \
-output /data/output \
-mapper ./WordCountMapper.py \
-reducer ./WordCountReducer.py
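
Whichever of the two possibilities you use, you can reproduce the streaming pipeline locally before submitting the job, so that a non-zero exit code shows up directly instead of as PipeMapRed.waitOutputThreads(): subprocess failed with code 1. Below is a minimal local test sketch (not part of the original answer); it assumes README.TXT and both scripts sit in the current directory and uses sorted() as a stand-in for Hadoop's shuffle/sort:

#!/usr/bin/python
# -*- coding: utf-8 -*-
# Minimal local test harness (illustrative sketch, not from the original answer):
# it emulates the Hadoop Streaming pipeline (mapper | sort | reducer) so that a
# "subprocess failed with code 1" can be reproduced and debugged locally.
import subprocess
import sys

# Run the mapper on a local copy of the input file.
with open("README.TXT", "rb") as infile:
    mapped = subprocess.run(
        [sys.executable, "WordCountMapper.py"],
        stdin=infile, stdout=subprocess.PIPE, check=True,
    ).stdout

# Hadoop sorts the map output by key before the reduce phase; sorted() emulates that.
shuffled = b"\n".join(sorted(mapped.splitlines())) + b"\n"

# Run the reducer on the sorted pairs.
reduced = subprocess.run(
    [sys.executable, "WordCountReducer.py"],
    input=shuffled, stdout=subprocess.PIPE, check=True,
).stdout

sys.stdout.write(reduced.decode("utf-8"))

If either script fails, check=True makes subprocess.CalledProcessError propagate with the same exit code that Hadoop would report for the failed streaming subprocess.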

Solution 2:

I checked the job error logs and placed the required Python files (the ones that are not predefined libraries) into the Python directory. Then I ran the Hadoop streaming command with those Python files:

hadoop jar /usr/hdp/3.0.0.0-1634/hadoop-mapreduce/hadoop-streaming-3.1.0.3.0.0.0-1634.jar \
-Dmapred.reduce.tasks=0 \
-file /home/mapper3.py -mapper mapper3.py \
-file /home/reducer3.py -reducer reducer3.py \
-file /home/ErrorHandle.py \
-file /home/ExceptionUtil.py \
-input /data/studentMapReduce/student1.txt \
-output outputMapReduceFile.txt
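
To show why those extra -file arguments matter, here is a minimal, hypothetical sketch of mapper3.py (the real file is not shown in the answer): ErrorHandle and ExceptionUtil are custom modules rather than predefined libraries, so if they are not shipped to the task nodes the import fails there and the streaming subprocess exits with a non-zero code, which surfaces as the error above.

#!/usr/bin/python
# -*- coding: utf-8 -*-
# Hypothetical sketch of mapper3.py (the real file is not shown in the answer).
import sys

try:
    import ErrorHandle      # custom module, shipped via -file /home/ErrorHandle.py
    import ExceptionUtil    # custom module, shipped via -file /home/ExceptionUtil.py
except ImportError as exc:
    # Without the -file options these modules are absent on the task node, the import
    # fails, and the message below ends up in the job error logs mentioned above.
    sys.stderr.write("missing helper module: %s\n" % exc)
    sys.exit(1)

for line in sys.stdin:
    line = line.strip()
    if line:
        # Hypothetical record handling; the real logic depends on the format of student1.txt.
        print("%s\t%d" % (line.split()[0], 1))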
