Sunday, May 17, 2015

Hadoop Architecture


Hadoop Architecture Explained

In this post, I explain the Hadoop 1.0 and Hadoop 2.0 architectures:

·        Hadoop 1.0 architecture

·        Hadoop 2.0 architecture

a.     Federation architecture

b.     High availability architecture

c.     Federation architecture with high availability

Hadoop 1.0 architecture

The Hadoop 1.0 architecture has two main components:

·        HDFS – Hadoop Distributed File System

·        Processing framework – MapReduce
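To make the MapReduce model concrete, here is a single-process Python sketch of the classic word-count job. It is not Hadoop's Java API (real jobs implement Mapper and Reducer classes). The function names are hypothetical, and the shuffle that Hadoop performs across the network between the map and reduce phases is simulated with an in-memory dictionary.

```python
from collections import defaultdict


def map_phase(line):
    # Mapper: emit a (word, 1) pair for every word in one input line.
    return [(word, 1) for word in line.split()]


def shuffle(pairs):
    # Shuffle/sort: group all values by key, as the framework does between map and reduce.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(key, values):
    # Reducer: sum the counts collected for one word.
    return key, sum(values)


def word_count(lines):
    mapped = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())


print(word_count(["hadoop stores data", "hadoop processes data"]))
# {'data': 2, 'hadoop': 2, 'processes': 1, 'stores': 1}
```

In a real cluster, each call to the mapper would run as a separate map task close to its input block, and each reducer would receive only the keys assigned to its partition.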

 
 
HDFS
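·        Defines how data is stored and distributed across the DataNodes in the cluster
·        The NameNode (the master node) holds the metadata: the file system namespace, the mapping of each file to its blocks, and the mapping of each block to the DataNodes that store it
·        The DataNodes (the slave nodes) store the actual data blocks
·        Each file is split into blocks (64 MB by default in Hadoop 1.0), and each block is copied to several DataNodes according to the replication factor (3 by default)
·        For a read or write request, the client first contacts the NameNode to find out which DataNodes hold (or should receive) the blocks, and then transfers the data directly to or from those DataNodes
·        The Secondary NameNode periodically fetches the NameNode's metadata (the fsimage file and the edit log), merges them into a new checkpoint, and sends it back, which keeps the edit log from growing without bound
·        The Secondary NameNode is not a failover node: if the NameNode fails, the Secondary NameNode cannot take over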
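The division of work between the NameNode, the DataNodes, and the client can be sketched as a toy in-memory model. This is an illustration under stated assumptions, not HDFS code: the class and function names are hypothetical, replica placement is simple round-robin (real HDFS placement is rack-aware), and the client writes every replica itself instead of using the DataNode write pipeline. Note that the NameNode object only ever holds metadata, while the bytes live in a separate per-DataNode store.

```python
import math
from dataclasses import dataclass, field

BLOCK_SIZE = 64 * 1024 * 1024  # Hadoop 1.0 default block size (64 MB)
REPLICATION = 3                # HDFS default replication factor


@dataclass
class NameNode:
    """Toy NameNode: stores only metadata, never the file contents."""
    datanodes: list
    files: dict = field(default_factory=dict)            # file name -> [block ids]
    block_locations: dict = field(default_factory=dict)  # block id -> [DataNodes]
    next_block_id: int = 0

    def allocate(self, name, size, block_size=BLOCK_SIZE, replication=REPLICATION):
        """Split a file of `size` bytes into blocks and choose DataNodes for each replica."""
        replication = min(replication, len(self.datanodes))
        num_blocks = max(1, math.ceil(size / block_size))
        blocks = []
        for _ in range(num_blocks):
            block_id = self.next_block_id
            self.next_block_id += 1
            # Round-robin placement for illustration; real HDFS placement is rack-aware.
            start = block_id % len(self.datanodes)
            self.block_locations[block_id] = [
                self.datanodes[(start + i) % len(self.datanodes)] for i in range(replication)
            ]
            blocks.append(block_id)
        self.files[name] = blocks
        return [(b, self.block_locations[b]) for b in blocks]

    def locate(self, name):
        """Answer a client's metadata query: which DataNodes hold each block of `name`."""
        return [(b, self.block_locations[b]) for b in self.files[name]]


def write_file(namenode, storage, name, data, block_size=BLOCK_SIZE):
    """Client write: get a block placement from the NameNode, then store each replica on its DataNode."""
    placement = namenode.allocate(name, len(data), block_size=block_size)
    for index, (block_id, replicas) in enumerate(placement):
        chunk = data[index * block_size:(index + 1) * block_size]
        for datanode in replicas:
            # Simplification: real HDFS pipelines the block from one DataNode to the next.
            storage.setdefault(datanode, {})[block_id] = chunk


def read_file(namenode, storage, name, dead=()):
    """Client read: ask the NameNode for block locations, then read each block from a live replica."""
    data = b""
    for block_id, replicas in namenode.locate(name):
        live = next(dn for dn in replicas if dn not in dead)  # StopIteration if every replica is gone
        data += storage[live][block_id]
    return data


namenode = NameNode(["dn1", "dn2", "dn3", "dn4"])
storage = {}  # DataNode name -> {block id: bytes}
write_file(namenode, storage, "/logs/a.txt", b"hello hadoop world", block_size=8)
print(namenode.locate("/logs/a.txt"))
# [(0, ['dn1', 'dn2', 'dn3']), (1, ['dn2', 'dn3', 'dn4']), (2, ['dn3', 'dn4', 'dn1'])]
print(read_file(namenode, storage, "/logs/a.txt", dead={"dn1", "dn2"}))
# b'hello hadoop world'
```

The tiny 8-byte block size is only there to force several blocks in the example. The second read shows why replication matters: even with two of the four DataNodes down, every block still has a live replica.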
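JobTracker
·        The JobTracker is the master of the MapReduce processing framework
·        When a client submits a job, the JobTracker creates map and reduce tasks for it, schedules those tasks, and monitors their progress
·        Each slave node runs a TaskTracker alongside its DataNode; the JobTracker assigns tasks to TaskTrackers, preferring nodes that already store a task's input block (data locality), and the TaskTrackers execute the map and reduce tasks and report their status through periodic heartbeats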
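The locality-aware assignment described above can be sketched in a few lines of Python. This is a simplified, single-process illustration under stated assumptions: each TaskTracker asks for work through a heartbeat whenever it has a free map slot, there is one map task per input block, and the JobTracker hands out a data-local task if one exists. The function names and data layout are hypothetical, and the real JobTracker, whose scheduler is pluggable, also considers rack locality and reduce tasks.

```python
def assign_on_heartbeat(pending, block_locations, tracker):
    """JobTracker side: `tracker` reported a free map slot in its heartbeat.

    Prefer a pending map task whose input block is stored on the tracker's host
    (data-local); otherwise hand out any pending task. Returns (block, is_local) or None.
    """
    for block in pending:
        if tracker in block_locations[block]:
            pending.remove(block)  # safe: we return immediately after removing
            return block, True
    if pending:
        return pending.pop(0), False
    return None


def run_heartbeats(block_locations, trackers, rounds=1):
    """Simulate TaskTrackers heartbeating in turn, one free map slot each per round."""
    pending = sorted(block_locations)  # one map task per input block
    assignments = {}
    for _ in range(rounds):
        for tracker in trackers:
            result = assign_on_heartbeat(pending, block_locations, tracker)
            if result is not None:
                block, is_local = result
                assignments[block] = (tracker, is_local)
    return assignments


# Block id -> hosts holding a replica; each TaskTracker is named after its host.
blocks = {0: ["dn1", "dn2", "dn3"], 1: ["dn2", "dn3", "dn4"],
          2: ["dn3", "dn4", "dn1"], 3: ["dn1", "dn2", "dn3"]}
print(run_heartbeats(blocks, ["dn1", "dn2", "dn3", "dn4"]))
# {0: ('dn1', True), 1: ('dn2', True), 2: ('dn3', True), 3: ('dn4', False)}
```

Block 3 has no replica on dn4, so when dn4 asks for work the JobTracker falls back to a non-local task, and that task must read its input over the network.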
 
Disadvantages of the Hadoop 1.0 architecture
·        Could not scale beyond roughly 4,000 nodes per cluster
·        The JobTracker was overloaded, because a single process both managed cluster resources and scheduled and monitored every job
·        No high availability: the NameNode is a single point of failure, and so is the JobTracker
 
<Yet to update: Hadoop 2.0 architecture>
