Coder In Boots: November 2014

Saturday, 29 November 2014

Python program to check whether a number is Odd or Even

This is a very basic Python program that checks whether a given number is odd or even.

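__author__ = 'coder'

def OddEven(number):
    """Print whether the given integer is zero, even or odd."""
    try:
        remainder = number % 2
        if number == 0:
            print("Number is Zero")
        elif remainder == 0:
            print("Number is Even")
        else:
            print("Number is Odd")
    except TypeError:
        # number does not support the % operator with an int (e.g. None)
        print("Error while processing")

if __name__ == '__main__':
    OddEven(11)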

Wednesday, 19 November 2014

Hadoop Interview Questions

1) What is the name of Hadoop's file system?
Ans: HDFS

2) What is the full form of HDFS?
Ans: Hadoop Distributed File System

3) What is the processing layer of Hadoop?
Ans: MapReduce

4) In which language is the Hadoop framework written?
Ans: Java

5) What is the licensing cost of Hadoop?
Ans: Hadoop is an open-source technology, so it is free.

6) Who is known as the father of Hadoop?
Ans: Doug Cutting

7) How does Hadoop differ from other data processing technologies?
Ans: Hadoop is a framework that provides both distributed storage and a distributed processing layer. The basic idea behind Hadoop is to bring the processing down to where the data is stored. Hadoop scales horizontally, so high-end server-grade hardware is not required; commodity hardware is enough.

8) Is Hadoop good for real-time processing?
Ans: Not directly. Hadoop is a batch processing framework, so on its own it can't be used for real-time processing. It can, however, work along with other technologies to produce real-time outputs.

9) Is Hadoop a replacement for an RDBMS?
Ans: No. Hadoop is not suitable for processing small or medium amounts of data. Since Hadoop is a batch processing framework, it will not give fast responses. What Hadoop offers is the ability to handle very large data sets: where other data processing technologies can't cope with the volume of data, Hadoop performs well.

10) If Hadoop is open source and free, who maintains and enhances it?
Ans: Hadoop is an Apache project. People all over the world contribute to it and add enhancements. Many companies that use Hadoop also contribute to it.

11) Why did Hadoop become so popular?
Ans: Extracting hidden insights from data has become a very important part of almost every organisation. The more data there is, the more reliable the insights tend to be. Nowadays the usage of the internet and social media is very high, so by collecting that data alone we can analyse people's behaviour to some extent. In the same way, almost anything can be analysed using historical data. This is one reason. Real-time monitoring and decision making have also become very important; this is another factor. A licensed tool or product can carry a very high licensing cost, whereas Hadoop is open source and free. Hadoop also runs on commodity hardware, so the cost of the infrastructure is lower. All of this put Hadoop in high demand in the market.

12) What do you mean by a pseudo-distributed Hadoop cluster?
Ans: If all the Hadoop daemons are running on a single node, it is called pseudo-distributed mode. It is not used in production; it is meant only for development and learning.

Creating a random file of any size in Linux

Sometimes we need a random file of a specific size for performance testing, for example of file transfers. There are several ways to create such files.

Method 1
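If you just want a file of a specific size, you can use the following command.

dd if=/dev/urandom of=dummyfile.txt bs=1M count=2048

Here bs is the block size and count is the number of blocks to write, so 1 MB * 2048 = 2 GB. If you make the count 4096, a 4 GB random file will be generated. A single huge block such as bs=2G is best avoided: dd has to hold the whole block in memory, and a read from /dev/urandom may return fewer bytes than requested, which leaves the file smaller than expected.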
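
Where the file has to be created from a script instead of the shell, the same idea can be sketched in Python 3. This is only a minimal sketch, not a replacement for dd; the function name create_random_file and the 1 MB default chunk size are assumptions made for this example.

```python
import os

def create_random_file(path, size_bytes, chunk_size=1024 * 1024):
    """Write size_bytes of random data to path, one chunk at a time."""
    remaining = size_bytes
    with open(path, "wb") as out:
        while remaining > 0:
            # Never write more than what is still missing
            n = min(chunk_size, remaining)
            out.write(os.urandom(n))
            remaining -= n
    # Report the final size on disk so the caller can verify it
    return os.path.getsize(path)
```

Calling create_random_file("dummyfile.txt", 2 * 1024 ** 3) would ask for a 2 GB file. Writing in chunks keeps the memory usage at roughly one chunk, whatever the file size.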

Method 2
If you are concerned about the schema, you can generate the data using a simple approach.
First create a file with a sample data set of 1 or 2 records. Let us call that file A.txt

Then, using a simple shell script, we can make it very big.

Depending on the size requirement, you can increase the value of limit to any number.
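
The shell script itself is not reproduced above. The same approach can be sketched in Python 3, assuming that limit means the number of times the sample records are repeated; the function name grow_file and the output file name are hypothetical and chosen only for this example.

```python
def grow_file(sample_path, output_path, limit):
    """Write the contents of sample_path to output_path limit times."""
    with open(sample_path, "rb") as src:
        sample = src.read()
    # Make sure the last record ends with a newline so repeats don't merge
    if sample and not sample.endswith(b"\n"):
        sample += b"\n"
    with open(output_path, "wb") as out:
        for _ in range(limit):
            out.write(sample)
    # Number of bytes written to output_path
    return len(sample) * limit
```

For example, grow_file("A.txt", "big.txt", 1000000) repeats the records of A.txt one million times, so every line in the output still follows the original schema.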

Method 3
Use an online or offline data generation tool. This is needed only if you want random data with a specific schema and discrete values.
Some useful links are listed below.

1) http://www.generatedata.com/
2) http://c2.com/cgi/wiki?TestDataGenerator

Downloading a file from the Linux command line

While working with Linux, we often need to download files. If the Linux machine has no GUI, most people download the file on Windows and then transfer it to the Linux environment.
This is not required if your Linux machine has internet access. Linux provides a lot of features; we only have to learn them and use them properly.
Get the proper download URL and execute the following command.

wget <download url>

Eg: If you want to download tomcat, the command will be as shown below.

wget http://apache.mirrors.hoobly.com/tomcat/tomcat-7/v7.0.57/bin/apache-tomcat-7.0.57.zip
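
If wget is not installed, a download can also be done with the Python 3 standard library. The following is a minimal sketch; the function name download is an assumption for this example, and it has none of wget's extras such as retries or resuming.

```python
import shutil
import urllib.request

def download(url, destination):
    """Stream the resource at url into the file destination."""
    with urllib.request.urlopen(url) as response, open(destination, "wb") as out:
        # copyfileobj copies in blocks, so large files are not held in memory
        shutil.copyfileobj(response, out)
    return destination
```

Usage is download("<download url>", "<local file name>"), with the same URL you would pass to wget.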

Various File Editors in Linux

When we talk about Linux, most people think of a black command line. Working with that command line is not as difficult as most people think. The most basic operation that most of us do on the Linux command line is creating or editing a file, which is required for all configuration files. For this we need a file editor. Here I am listing some popular file editors in Linux.

1) vi

2) vim

3) nano

4) gedit (This is a desktop editor)

5) gvim

6) emacs

Setting Java home in Linux and Windows Environments

Java is a very popular programming language. Many people use Java directly or indirectly. Most of the time we need to set the JAVA_HOME environment variable. This is a very basic activity, but I am posting it here because it may help someone. When I started my career, I also searched the internet for the same thing.

Setting up JAVA_HOME in LINUX environments

1) First we have to install Java. Java can be downloaded from the Oracle website. Based on your operating system architecture (32 or 64 bit) and your requirement, download the proper Java installer.
2) If it is a tarball, extract it and keep the folder in the /opt directory.
3) Go to the Java home folder and type pwd. Suppose the result is /opt/jdk1.7.0
4) Open the following file and add the entries as below.

For Ubuntu:
open the /etc/bash.bashrc file or the ~/.bashrc file

For CentOS and Red Hat:
open the /etc/bashrc file or the ~/.bashrc file

Add the following lines to the file

export JAVA_HOME=<path to java home directory>
export PATH=$JAVA_HOME/bin:$PATH

Save the file and exit.
Then reload whichever file you edited using the source command, for example
source ~/.bashrc

Setting up JAVA_HOME in Windows environments

1) Download the JDK from the Oracle website and install it on your machine
2) Click on the Start button
3) Right click on My Computer and choose Properties
4) Click on Advanced system settings
5) Click on Environment Variables
6) Depending on the use, choose either System Variables or User Variables. User variables are limited to the particular user account, whereas system variables are accessible to all users.
7) Click on New and give the variable name as JAVA_HOME and the variable value as the complete path to the Java installation folder on your Windows machine.
Eg: C:\Program Files\Java\jdk1.7.0_60
8) Edit the Path variable and append the following entry to the end.
%JAVA_HOME%\bin
Don't forget to put a semicolon delimiter between the existing values and the newly added value.

Now your JAVA_HOME is set. :)
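
To confirm the setting on either platform, open a new terminal or command prompt and check the environment. A small Python 3 sketch of such a check is given below; the function name check_java_home is hypothetical, and it only inspects the environment variables without checking that a JDK really exists at that location.

```python
import os

def check_java_home(env=None):
    """Return a list of problems found with JAVA_HOME and PATH (empty if none)."""
    env = os.environ if env is None else env
    java_home = env.get("JAVA_HOME")
    if not java_home:
        return ["JAVA_HOME is not set"]
    problems = []
    bin_dir = os.path.normpath(os.path.join(java_home, "bin"))
    # os.pathsep is ':' on Linux and ';' on Windows
    entries = [os.path.normpath(p)
               for p in env.get("PATH", "").split(os.pathsep) if p]
    if bin_dir not in entries:
        problems.append("%s is not on PATH" % bin_dir)
    return problems

if __name__ == "__main__":
    issues = check_java_home()
    print("JAVA_HOME looks fine" if not issues else "\n".join(issues))
```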
