Saturday, 3 January 2015

When will the reduce tasks start execution in a MapReduce job?

The reduce logic starts executing only after all map tasks have completed. Sometimes the job progress shows the reduce tasks at 15% while the map tasks are only at 90%. This doesn't mean that the reduce logic has started executing. The reduce logic has to be executed on the complete output of all the mappers, not on partial output.
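To see why partial output is not enough, here is a minimal sketch in plain Java. It does not use the Hadoop API. The class PartialReduceDemo, its reduce method and the sample counts are hypothetical and only stand in for a word-count style reducer.

```java
import java.util.List;
import java.util.Map;

public class PartialReduceDemo {

    // Hypothetical stand-in for a word-count reduce():
    // sums every count that the given mappers emitted for one key.
    static int reduce(String key, List<Map<String, Integer>> mapOutputs) {
        int total = 0;
        for (Map<String, Integer> output : mapOutputs) {
            total += output.getOrDefault(key, 0);
        }
        return total;
    }

    public static void main(String[] args) {
        // Made-up output of three mappers.
        List<Map<String, Integer>> mapOutputs = List.of(
                Map.of("hadoop", 2, "job", 1),
                Map.of("hadoop", 1),
                Map.of("hadoop", 3, "job", 4));

        // Reducing while the third mapper is still running misses its records.
        System.out.println("partial: " + reduce("hadoop", mapOutputs.subList(0, 2)));

        // Reducing after all mappers have finished sees every record for the key.
        System.out.println("complete: " + reduce("hadoop", mapOutputs));
    }
}
```

With these sample values the partial run counts 3 occurrences of "hadoop" and the complete run counts 6, so a reducer that ran early would report a wrong total.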

The reduce() method will not be called until all of the mappers have completed. If the reduce logic is applied to partial map output, we will get incorrect output. The 15% (or whatever value is shown) of reduce progress before the completion of the map phase is not the progress of the reduce logic. It represents the transfer of mapper output to the reducers. The mapred.reduce.slowstart.completed.maps property (named mapreduce.job.reduce.slowstart.completedmaps in newer Hadoop releases) specifies the fraction of mappers, as a value between 0 and 1, that must complete before the reducers can start receiving data from the completed mappers. Once all the mappers have completed, the reducers start executing on top of the map outputs.
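The two conditions described above can be sketched as a small model in plain Java. This is a simplified illustration, not Hadoop's scheduler code. The class SlowstartDemo and its method names are hypothetical, and rounding the threshold up to a whole number of map tasks is an assumption of this sketch.

```java
public class SlowstartDemo {

    // True when enough mappers have completed for reducers to begin
    // fetching map output. slowstart is the configured fraction (0 to 1).
    static boolean canStartCopying(int completedMaps, int totalMaps, double slowstart) {
        // Assumption of this sketch: the threshold is rounded up to whole map tasks.
        int required = (int) Math.ceil(slowstart * totalMaps);
        return completedMaps >= required;
    }

    // True only when every mapper has completed, which is the
    // precondition for calling reduce().
    static boolean canRunReduceLogic(int completedMaps, int totalMaps) {
        return completedMaps == totalMaps;
    }

    public static void main(String[] args) {
        int totalMaps = 10;
        double slowstart = 0.5; // example value, not a recommendation

        for (int completed = 4; completed <= totalMaps; completed += 3) {
            System.out.println(completed + "/" + totalMaps
                    + " maps done, copy allowed: "
                    + canStartCopying(completed, totalMaps, slowstart)
                    + ", reduce() allowed: "
                    + canRunReduceLogic(completed, totalMaps));
        }
    }
}
```

In this model, with 7 of 10 mappers finished the reducers may already be copying data and reporting progress, but reduce() is still not allowed to run until the count reaches 10.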
