Saturday, 3 January 2015

What happens to a MapReduce job if the user sets the number of reduce tasks to one?

When the number of reduce tasks is set to one, only one reduce task is executed for the entire job. All the intermediate map outputs are shuffled to that single reducer, so the reduce phase runs without any parallelism and can become a bottleneck for large inputs. The reducer processes the complete map output and writes the result to a single file in HDFS, named part-r-00000, inside the job's output directory.
To set the number of reduce tasks to one, add the following call in the driver class:
job.setNumReduceTasks(1);
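To see why a single reduce task ends up with every key, here is a minimal sketch in plain Java (no Hadoop dependency). It assumes the job uses the default hash partitioning, which picks a partition as (hashCode & Integer.MAX_VALUE) % numReduceTasks. The class name SingleReducerSketch, its helper methods, and the sample keys are hypothetical and exist only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Illustrative only: mimics how map output keys are assigned to reduce tasks.
public class SingleReducerSketch {

    // Same formula as the default hash partitioner (assumed to be in use).
    static int partitionFor(String key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Reducer output files are named part-r-<five digit partition number>.
    static String outputFileName(int partition) {
        return String.format(Locale.ROOT, "part-r-%05d", partition);
    }

    // Groups the keys by the output file their reducer would write.
    static Map<String, List<String>> route(List<String> keys, int numReduceTasks) {
        Map<String, List<String>> byFile = new TreeMap<>();
        for (String key : keys) {
            String file = outputFileName(partitionFor(key, numReduceTasks));
            byFile.computeIfAbsent(file, f -> new ArrayList<>()).add(key);
        }
        return byFile;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("apple", "banana", "cherry", "date");
        // With one reduce task every key maps to partition 0.
        System.out.println(route(keys, 1));
        // With three reduce tasks the keys are spread over up to three files.
        System.out.println(route(keys, 3));
    }
}
```

Any value modulo 1 is 0, so with one reduce task every key is routed to partition 0 and the only output file is part-r-00000.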

