Sunday, 4 January 2015

Role of the partitioner in MapReduce execution

A MapReduce job has two main phases: the map phase and the reduce phase. The output of the mappers is passed to the reducers. The input and output of a mapper are key-value pairs, while the input to a reducer is a key together with the list of values associated with that key. Each reducer receives either a portion of the map output or all of it. The map output is partitioned by key into multiple partitions, and each partition is sent to a reducer. The partitioning is done so that all records with the same key are grouped into the same partition and therefore go to the same reducer; otherwise the final output would be incorrect. The default partitioner in Hadoop assigns a record to a partition based on the hash value of its key. This behaviour can be overridden by writing a user-defined partitioner.
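
As an illustration, here is a minimal sketch of a user-defined partitioner written against the Hadoop MapReduce Java API. The class name AlphabetPartitioner and the Text/IntWritable key and value types are assumptions made for this example; the API only requires extending Partitioner and implementing getPartition.

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

// A minimal sketch of a custom partitioner. The class name and the
// Text/IntWritable types are illustrative assumptions, not from the post.
public class AlphabetPartitioner extends Partitioner<Text, IntWritable> {

    @Override
    public int getPartition(Text key, IntWritable value, int numReduceTasks) {
        // With a single reducer, every record goes to partition 0.
        if (numReduceTasks <= 1) {
            return 0;
        }
        String k = key.toString();
        if (k.isEmpty()) {
            return 0;
        }
        // Send keys starting with 'a'-'m' to partition 0 and all other keys
        // to partition 1. Because the decision depends only on the key,
        // every record with the same key lands in the same partition and
        // therefore reaches the same reducer.
        char first = Character.toLowerCase(k.charAt(0));
        return (first >= 'a' && first <= 'm') ? 0 : 1;
    }
}

Such a partitioner would be plugged into a job with job.setPartitionerClass(AlphabetPartitioner.class), typically together with job.setNumReduceTasks(2) for this two-partition scheme. When nothing is set, Hadoop falls back to HashPartitioner, which computes (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks.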

