Wednesday 19 November 2014

Hadoop Interview Questions

1) What is the name of Hadoop's file system?
Ans: HDFS

2) What is the full form of HDFS?
Ans: Hadoop Distributed File System

3) What is the processing layer of Hadoop?
Ans: MapReduce
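To make the map, shuffle and reduce phases concrete, here is a minimal word-count sketch in plain Java. It is only a single-process simulation of the MapReduce programming model, assuming nothing beyond the JDK (Java 9 or later). It does not use the Hadoop API, and the class and method names (`WordCountSketch`, `map`, `shuffle`, `reduce`, `run`) are hypothetical. In a real Hadoop job, map tasks would run in parallel on many nodes and the framework itself would perform the shuffle.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java simulation of the MapReduce word-count flow (map -> shuffle -> reduce).
// Hypothetical sketch: it does NOT use the Hadoop API, it only illustrates the model.
public class WordCountSketch {

    // Map phase: emit a (word, 1) pair for every word in one input line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) {
                pairs.add(Map.entry(word, 1));
            }
        }
        return pairs;
    }

    // Shuffle phase: group all emitted values by key; the TreeMap keeps keys sorted.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    // Reduce phase: sum all the values collected for one key.
    static int reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // Runs the three phases sequentially over the input lines.
    static Map<String, Integer> run(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            pairs.addAll(map(line)); // on a cluster, map tasks would run in parallel
        }
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : shuffle(pairs).entrySet()) {
            counts.put(e.getKey(), reduce(e.getKey(), e.getValue()));
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> input = List.of("Hadoop stores data", "Hadoop processes data");
        System.out.println(run(input)); // prints {data=2, hadoop=2, processes=1, stores=1}
    }
}
```

The keys come out in sorted order only because this sketch groups them in a `TreeMap`; the point is the division of work into independent map calls followed by per-key reduce calls.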

4) In which language is the Hadoop framework written?
Ans: Java

5) What is the licensing cost of Hadoop?
Ans: Hadoop is open-source software released under the Apache License 2.0, so it is free to use.

6) Who is known as the father of Hadoop?
Ans: Doug Cutting

7) How does Hadoop differ from other data processing technologies?
Ans: Hadoop is a framework that provides both distributed storage (HDFS) and a distributed processing layer (MapReduce). The basic idea behind Hadoop is to bring the processing to where the data is stored, instead of moving large volumes of data to the processing layer. Hadoop scales horizontally, by adding more machines, so high-end server-grade hardware is not required; commodity hardware is enough.
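The idea of bringing processing to the storage can be illustrated with a toy scheduler. The sketch below is hypothetical and far simpler than Hadoop's real schedulers: given which nodes hold a replica of each block, it places the task for each block on the least-loaded node that already stores that block, so the block is read locally instead of being copied over the network. The class name `LocalitySketch`, the method `assign`, and the node and block IDs are illustrative assumptions, not Hadoop APIs.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy scheduler illustrating data locality ("move computation to the data").
// It is NOT Hadoop's scheduler; it only shows the idea of preferring nodes that store the block.
public class LocalitySketch {

    // blockReplicas: block ID -> nodes holding a replica of that block.
    // allNodes: every worker node; used only when a block has no known replica.
    static Map<String, String> assign(Map<String, List<String>> blockReplicas, List<String> allNodes) {
        Map<String, Integer> load = new HashMap<>();            // tasks assigned to each node so far
        Map<String, String> assignment = new LinkedHashMap<>(); // block ID -> chosen node
        for (Map.Entry<String, List<String>> e : blockReplicas.entrySet()) {
            // Prefer data-local nodes; fall back to any node (remote read) if none are known.
            List<String> candidates = e.getValue().isEmpty() ? allNodes : e.getValue();
            String best = null;
            for (String node : candidates) {
                // Least-loaded candidate wins; on a tie the earlier node is kept.
                if (best == null || load.getOrDefault(node, 0) < load.getOrDefault(best, 0)) {
                    best = node;
                }
            }
            assignment.put(e.getKey(), best);
            load.merge(best, 1, Integer::sum);
        }
        return assignment;
    }

    public static void main(String[] args) {
        Map<String, List<String>> replicas = new LinkedHashMap<>();
        replicas.put("block-1", List.of("node1", "node2"));
        replicas.put("block-2", List.of("node1", "node3"));
        replicas.put("block-3", List.of("node1", "node2"));
        // prints {block-1=node1, block-2=node3, block-3=node2}
        System.out.println(assign(replicas, List.of("node1", "node2", "node3")));
    }
}
```

Every task in the example lands on a node that already holds its block, and the load is spread across all three nodes, which is the effect the answer above describes.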

8) Is Hadoop good for real-time processing?
Ans: Not directly. Hadoop is a batch processing framework, so on its own it cannot be used for real-time processing. However, it can work along with other technologies to produce real-time outputs.

9) Is Hadoop a replacement for RDBMS?
Ans: Not in general. Hadoop is not suitable for processing small or medium amounts of data. Since Hadoop is a batch processing framework, it will not return results quickly. What Hadoop is designed for is handling very large data reliably: with data volumes that other data processing technologies cannot handle, Hadoop performs well.

10) If Hadoop is open source and free, who maintains and enhances it?
Ans: Hadoop is an Apache project. People all over the world contribute to it and add enhancements. Many companies that use Hadoop also contribute to it.

11) Why did Hadoop become so popular?
Ans: Extracting hidden insights from data has become an important part of almost every organisation. In general, the more data there is, the more accurate the insights. Nowadays the usage of the internet and social media is very high, so even that data alone lets us analyse people to some extent. In the same way, we can analyse almost anything using historical data. This is one reason. Real-time monitoring and decision making have also become very important. This is another factor. A licensed tool or product would carry a very high licensing cost, whereas Hadoop is open source and free. Hadoop also runs on commodity hardware, so the infrastructure cost is low. Together, these factors made Hadoop a hot commodity in the market.

12) What do you mean by a pseudo-distributed Hadoop cluster?
Ans: If all the Hadoop daemons run on a single node, the setup is called pseudo-distributed mode. It is not used in production; it is meant only for development and learning.
