A MapReduce job has two main phases: the map phase and the reduce phase. Both the input and the output of the mapper are key-value pairs, and the mapper's output is passed to the reducer, whose input is a key together with the list of values for that key. Each reducer receives a portion of the map output (or, when there is only one reducer, all of it). The map output is partitioned by key into multiple partitions, and each partition is sent to one reducer. The partitioning guarantees that all records with the same key land in the same partition and are therefore processed by the same reducer; without this guarantee the job would produce incorrect output, because the values for one key would be split across reducers. Hadoop's default partitioner uses the hash value of the key to assign map outputs to partitions. This behaviour can be overridden by writing a user-defined partitioning function.
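The hash partitioning described above can be sketched in a few lines of Python. This is an illustration, not Hadoop code: `java_string_hashcode` mimics Java's `String.hashCode()`, and `get_partition` applies the same formula Hadoop's default HashPartitioner uses, `(key.hashCode() & Integer.MAX_VALUE) % numReduceTasks`. The function names and sample records are made up for the example.

```python
# Sketch of Hadoop-style default hash partitioning.
# Names below (java_string_hashcode, get_partition) are illustrative,
# not part of any Hadoop API.

def java_string_hashcode(s: str) -> int:
    """Mimic Java's String.hashCode(): h = 31*h + ch, as a signed 32-bit int."""
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    return h - 0x1_0000_0000 if h >= 0x8000_0000 else h

def get_partition(key: str, num_partitions: int) -> int:
    """Same formula as HashPartitioner: (hashCode & Integer.MAX_VALUE) % numReduceTasks."""
    return (java_string_hashcode(key) & 0x7FFFFFFF) % num_partitions

if __name__ == "__main__":
    # Toy map output: word-count style (key, value) pairs.
    map_output = [("apple", 1), ("banana", 1), ("apple", 1), ("cherry", 1)]
    num_reducers = 3

    # Group the map output into one bucket per reducer.
    partitions = {i: [] for i in range(num_reducers)}
    for key, value in map_output:
        partitions[get_partition(key, num_reducers)].append((key, value))

    # Every record with the same key lands in the same bucket, so a single
    # reducer sees the complete list of values for that key.
    print(partitions)
```

A user-defined partitioner would replace `get_partition` with its own logic, but it must keep the same invariant: identical keys always map to the same partition index, and the index stays within `[0, num_partitions)`.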
Sunday, 4 January 2015
Role of partitioner in mapreduce execution