Saturday, 3 January 2015

What is the best tool for creating workflows or chaining jobs in Hadoop?

Sometimes we need to chain MapReduce jobs, Hive jobs, Pig jobs, and so on. We can chain these jobs in our own way, with custom programs or scripts, but the best way to chain jobs in the Hadoop ecosystem is to use Oozie.

Oozie is a workflow and orchestration framework in the Hadoop ecosystem. With Oozie, we don't need to worry about handling the various scenarios that would have to be considered when developing our own chaining tool. Oozie is simple to use, and a workflow is defined in an XML file. For more details, refer to the Oozie website.
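
As a rough illustration of what such an XML file can look like, here is a minimal sketch of a workflow that runs a MapReduce job and then a Hive script. The workflow name, action names, mapper and reducer classes, the script name load_results.q, and the ${...} parameters are all assumptions made up for this example, and the schema versions may need adjusting to match your Oozie installation.

```xml
<!-- workflow.xml: hypothetical two-step chain (MapReduce, then Hive) -->
<workflow-app name="chain-example-wf" xmlns="uri:oozie:workflow:0.5">
    <start to="mr-step"/>

    <!-- Step 1: a MapReduce job; the class names are placeholders -->
    <action name="mr-step">
        <map-reduce>
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <configuration>
                <property>
                    <name>mapred.mapper.class</name>
                    <value>com.example.SampleMapper</value>
                </property>
                <property>
                    <name>mapred.reducer.class</name>
                    <value>com.example.SampleReducer</value>
                </property>
                <property>
                    <name>mapred.input.dir</name>
                    <value>${inputDir}</value>
                </property>
                <property>
                    <name>mapred.output.dir</name>
                    <value>${stagingDir}</value>
                </property>
            </configuration>
        </map-reduce>
        <!-- Run the Hive step only if the MapReduce job succeeds -->
        <ok to="hive-step"/>
        <error to="fail"/>
    </action>

    <!-- Step 2: a Hive script that reads the MapReduce output -->
    <action name="hive-step">
        <hive xmlns="uri:oozie:hive-action:0.2">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <script>load_results.q</script>
            <param>INPUT=${stagingDir}</param>
        </hive>
        <ok to="end"/>
        <error to="fail"/>
    </action>

    <!-- Any failed action routes here and stops the workflow -->
    <kill name="fail">
        <message>Workflow failed at [${wf:lastErrorNode()}]</message>
    </kill>

    <end name="end"/>
</workflow-app>
```

The chaining itself is expressed by the ok and error transitions on each action: the next job starts only when the previous one succeeds, and a failure ends the workflow at the kill node. Values such as ${jobTracker} and ${nameNode} would typically be supplied through a job.properties file when the workflow is submitted.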

