Sunday, May 3, 2015

Hadoop Framework



Knowledge shared is knowledge gained :)

Wish you all a happy day, folks. Let us look at the Hadoop architecture in today's post.

Hadoop is a framework developed by Apache to handle large volumes of data, i.e. big data.

Take a close look at the picture below.

·       The Hadoop framework is designed around a master/slave architecture

Let me first explain the architecture in layman's terms, then we will move on to the actual terminology used in Hadoop.





Simple explanation

Imagine a master, Rob, who has three slaves named A, B and C.

·       Rob notes down all of A, B and C's information in his employee register: where each one lives, how much potential each one possesses, etc.

·       A, B and C know who the master is after interacting with him

·       Rob periodically (say, every hour) checks whether A, B and C are performing their duties. If anyone is missing or not responding, something is wrong with them

·       Rob is not sure when A, B or C will leave him, so whatever A knows, Rob orders him to share with B and C. Similarly, B shares with A and C, and C shares with A and B

·       After performing their duties, A, B and C report their current workload, pending work, etc.

·       Rob, as a single person, cannot monitor every employee, so he appoints Tom as a resource manager to monitor A, B and C

·       Rob maintains a log of who did what, and when

·       If Rob loses his register and log, it will be very difficult to track A, B and C's wages, capabilities, past achievements, etc. Hence Bob copies this information from Rob once in a while

Let us map the above scenario to Hadoop terminology:

1.    Rob → Master Node or Name Node

2.    A/B/C → Data Nodes or Slave Nodes

3.    Rob maintains A/B/C info → Metadata (FsImage), which has all the information about the cluster

4.    Rob's log → Edit log, which holds transactional information

5.    Rob checks periodically → Heartbeat, sent every 3 seconds by default, to make sure a slave is not down

6.    Tom → Resource Manager, a process (daemon) that manages the slaves A/B/C

7.    Sharing info → Replication factor, which makes sure data can still be retrieved even if one machine goes down

8.    Bob → Secondary Name Node, a housekeeping node that periodically gets the metadata from the Name Node

9.    A/B/C performing work → Read/write operations
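The heartbeat bookkeeping above can be sketched in a few lines of Python. This is only a toy illustration of the idea, not Hadoop's actual implementation; the class name and timeout value are made up (real Hadoop marks a Data Node dead only after roughly ten minutes of silence):

```python
# Toy sketch: a master ("Rob") tracking heartbeats from slaves ("A/B/C").
# ToyNameNode and dead_after_secs are invented names for illustration only.
class ToyNameNode:
    def __init__(self, dead_after_secs=600):
        self.last_heartbeat = {}            # "employee register": node -> last report time
        self.dead_after_secs = dead_after_secs

    def receive_heartbeat(self, node_id, now):
        # A slave checks in; note the time in the register.
        self.last_heartbeat[node_id] = now

    def dead_nodes(self, now):
        # Anyone silent for too long is presumed down.
        return [n for n, t in self.last_heartbeat.items()
                if now - t > self.dead_after_secs]

master = ToyNameNode(dead_after_secs=600)
master.receive_heartbeat("A", now=0)
master.receive_heartbeat("B", now=0)
master.receive_heartbeat("C", now=0)
master.receive_heartbeat("A", now=700)      # only A keeps reporting
print(master.dead_nodes(now=700))           # B and C have gone silent
```

Just as Rob then redistributes the missing slave's work, the real Name Node re-replicates the blocks that lived on a dead Data Node.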

Process Explanation

HDFS - Hadoop Distributed File System

                   We know file systems like the UNIX file system, NTFS and FAT32. Similarly, the Hadoop Distributed File System is the file system used in the big data world. This file system determines

·       How the data is stored

·       What are the directory structures?

All the big data tools developed on top of Hadoop have to comply with the Hadoop file system.
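Two things HDFS determines about "how the data is stored" are block splitting and replication. Here is a toy Python sketch of both, assuming the Hadoop 2.x defaults of 128 MB blocks and a replication factor of 3 (the function names are invented, and real HDFS places replicas rack-aware rather than round-robin):

```python
# Toy sketch of HDFS-style storage: split a file into fixed-size blocks,
# then assign each block to several Data Nodes (replication).
BLOCK_SIZE = 128 * 1024 * 1024     # 128 MB, the Hadoop 2.x default
REPLICATION = 3                    # default replication factor

def split_into_blocks(file_size, block_size=BLOCK_SIZE):
    """Return the size of each block a file of file_size bytes occupies."""
    full, rest = divmod(file_size, block_size)
    return [block_size] * full + ([rest] if rest else [])

def place_replicas(num_blocks, datanodes, replication=REPLICATION):
    """Simple round-robin placement; real HDFS is rack-aware."""
    return {b: [datanodes[(b + r) % len(datanodes)] for r in range(replication)]
            for b in range(num_blocks)}

blocks = split_into_blocks(300 * 1024 * 1024)           # a 300 MB file
print(len(blocks))                                      # 3 blocks: 128 + 128 + 44 MB
print(place_replicas(len(blocks), ["A", "B", "C", "D"]))
```

So a 300 MB file becomes three blocks, and each block lives on three different slaves, which is why losing one machine does not lose any data.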

YARN Processing

                   YARN – Yet Another Resource Negotiator

YARN was introduced in Hadoop 2.0. It is the processing and resource-management layer of the Hadoop framework. Some of its responsibilities are

·       Tracking jobs

·       Scheduling jobs
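To make the scheduling part concrete, here is a toy Python sketch of one scheduling policy, a first-in-first-out queue in the spirit of YARN's FIFO scheduler. The class is invented for illustration, and real YARN also ships Capacity and Fair schedulers:

```python
# Toy sketch of FIFO job scheduling: jobs run strictly in submission
# order, and a job waits until enough memory is free.
from collections import deque

class ToyFifoScheduler:
    def __init__(self, total_memory_mb):
        self.free_mb = total_memory_mb
        self.queue = deque()               # jobs wait in submission order
        self.running = []

    def submit(self, job_name, memory_mb):
        self.queue.append((job_name, memory_mb))
        self._schedule()

    def finish(self, job_name):
        for job in self.running:
            if job[0] == job_name:
                self.running.remove(job)
                self.free_mb += job[1]     # give the memory back
                break
        self._schedule()

    def _schedule(self):
        # Launch waiting jobs in arrival order while resources last.
        while self.queue and self.queue[0][1] <= self.free_mb:
            job = self.queue.popleft()
            self.running.append(job)
            self.free_mb -= job[1]

sched = ToyFifoScheduler(total_memory_mb=4096)
sched.submit("job-1", 3072)
sched.submit("job-2", 2048)                # must wait: only 1024 MB free
print([j[0] for j in sched.running])       # ['job-1']
sched.finish("job-1")
print([j[0] for j in sched.running])       # ['job-2']
```

The Resource Manager (our "Tom") plays exactly this role at cluster scale: it holds the queue of submitted jobs and hands out containers on the slaves as memory and CPU free up.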
