Saturday, 3 January 2015

What happens to a MapReduce job with a reducer class when the number of reduce tasks is set to zero?

When the number of reduce tasks is set to zero, no reduce tasks are executed, even if a reducer class is configured. The shuffle and sort phase is skipped as well, and each mapper writes its output directly to HDFS, where it becomes the output of the job. For example, if 10 mappers are spawned for a job and the number of reduce tasks is zero, we get 10 output files.
The output files are named like part-m-00000, part-m-00001, ..., part-m-00009.
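
As a quick illustration of the naming pattern, the sketch below generates the file names expected for a given number of mappers. It only mimics the pattern (a "part-m-" prefix followed by a zero-padded, five-digit task number); it is not Hadoop code, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: mimics the map-only output naming pattern.
class MapOnlyOutputNames {

    // Builds the name for one map task, e.g. task 3 gives "part-m-00003".
    static String outputName(int mapTaskId) {
        return String.format("part-m-%05d", mapTaskId);
    }

    // Lists the output file names for a job with the given number of mappers.
    static List<String> outputNames(int numMappers) {
        List<String> names = new ArrayList<>();
        for (int i = 0; i < numMappers; i++) {
            names.add(outputName(i));
        }
        return names;
    }

    public static void main(String[] args) {
        // With 10 mappers this prints part-m-00000 through part-m-00009.
        for (String name : outputNames(10)) {
            System.out.println(name);
        }
    }
}
```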
We can set the number of reduce tasks to zero either from the program or from the command line.

In the program, call the following method on the Job object in the driver:
job.setNumReduceTasks(0);
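
For context, here is a minimal sketch of a map-only driver, assuming the newer org.apache.hadoop.mapreduce API and the Hadoop client libraries on the classpath. The class name MapOnlyDriver, the job name, and the use of the identity Mapper are illustrative assumptions, not part of the original post. The driver runs through ToolRunner, which is also what lets generic command-line options be parsed.

```java
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

// Hypothetical driver class: runs a job with mappers only.
public class MapOnlyDriver extends Configured implements Tool {

    @Override
    public int run(String[] args) throws Exception {
        // getConf() already contains any generic options parsed by ToolRunner.
        Job job = Job.getInstance(getConf(), "map-only example");
        job.setJarByClass(MapOnlyDriver.class);

        // Assumption: the base Mapper class, which passes records through unchanged.
        job.setMapperClass(Mapper.class);
        job.setOutputKeyClass(LongWritable.class);
        job.setOutputValueClass(Text.class);

        // Zero reduce tasks: mapper output is written straight to the output path.
        job.setNumReduceTasks(0);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        System.exit(ToolRunner.run(new MapOnlyDriver(), args));
    }
}
```

Because the value is hard-coded here, it takes precedence over anything passed on the command line. Remove the setNumReduceTasks call if the number of reducers should be controlled from outside the program.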

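From the command line, the same result can be achieved by passing the property below. Generic options such as -D are picked up only when the driver runs through ToolRunner (GenericOptionsParser). In Hadoop 2 and later the property is named mapreduce.job.reduces; the older name mapred.reduce.tasks is deprecated but still accepted.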
-Dmapred.reduce.tasks=0
