Hadoop Architecture Explained
· Hadoop 1.0 architecture
· Hadoop 2.0 architecture
a. Federation architecture
b. High availability architecture
c. Federation architecture with High availability
Hadoop 1.0 architecture
The Hadoop architecture has two important components:
· HDFS – Hadoop Distributed File System
· Processing framework – MapReduce
HDFS
· Defines how data is stored and distributed across the data nodes
· The Name node (the master node) holds the metadata information
· The Data nodes are where the actual data is stored
· Data is split into blocks, and each block is copied to several data nodes according to the replication factor
· On a read or write request, the client process first contacts the Name node, and the Data nodes in turn serve or store the data
· The Name node's metadata is periodically copied to the Secondary name node
· The Secondary name node is not a failover node; it cannot take over if the Name node goes down
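
The storage and read path described above can be sketched in a few lines of Python. This is a toy in-memory model, not the HDFS API: the class names, `write_file`, `read_file`, and the round-robin placement are all assumptions made for illustration (real HDFS placement is rack-aware, and blocks are 64 MB by default in Hadoop 1.x rather than 64 bytes).

```python
from dataclasses import dataclass, field

BLOCK_SIZE = 64    # bytes, for illustration only; Hadoop 1.x defaults to 64 MB
REPLICATION = 3    # default HDFS replication factor


@dataclass
class DataNode:
    """Hypothetical data node: holds the actual block contents."""
    name: str
    blocks: dict = field(default_factory=dict)      # block id -> bytes


@dataclass
class NameNode:
    """Hypothetical name node: holds metadata only, never file contents."""
    files: dict = field(default_factory=dict)       # path -> ordered block ids
    locations: dict = field(default_factory=dict)   # block id -> data node names


def write_file(namenode, datanodes, path, data,
               block_size=BLOCK_SIZE, replication=REPLICATION):
    """Split data into blocks and store each block on `replication` nodes."""
    replication = min(replication, len(datanodes))
    block_ids = []
    for i, offset in enumerate(range(0, len(data), block_size)):
        block_id = f"{path}#blk{i}"
        chunk = data[offset:offset + block_size]
        # Simplified round-robin placement (assumption; not the real policy).
        targets = [datanodes[(i + r) % len(datanodes)] for r in range(replication)]
        for node in targets:
            node.blocks[block_id] = chunk
        namenode.locations[block_id] = [node.name for node in targets]
        block_ids.append(block_id)
    namenode.files[path] = block_ids


def read_file(namenode, datanodes, path):
    """Ask the name node where the blocks are, then fetch them from data nodes."""
    by_name = {node.name: node for node in datanodes}
    result = b""
    for block_id in namenode.files[path]:
        for name in namenode.locations[block_id]:
            node = by_name[name]
            if block_id in node.blocks:   # first replica that still has the block
                result += node.blocks[block_id]
                break
        else:
            raise IOError(f"no replica available for {block_id}")
    return result
```

Because every block lives on several data nodes, `read_file` still succeeds in this model if one data node loses its blocks. Losing the `NameNode` object, on the other hand, loses the only map from files to blocks, which is why the missing failover node matters.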
Job Tracker
· This is the processing unit of the Hadoop system
· Once a request is received, the Job tracker schedules the job and monitors it
· The Job tracker hands tasks to the Task trackers on the data nodes, which execute the actual map and reduce tasks there
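
As a rough illustration of this scheduling flow, the sketch below models a Job tracker that assigns map tasks to Task trackers, preferring a node that already holds the input data. The classes, the slot counts, and the `submit` method are hypothetical and only mimic the idea; they are not Hadoop's real interfaces.

```python
class TaskTracker:
    """Hypothetical task tracker running on one data node."""

    def __init__(self, node, slots=2):
        self.node = node
        self.slots = slots      # how many tasks this node can run at once
        self.running = []

    def has_free_slot(self):
        return len(self.running) < self.slots

    def launch(self, task_id):
        self.running.append(task_id)


class JobTracker:
    """Hypothetical job tracker: schedules tasks and records their status."""

    def __init__(self, trackers):
        self.trackers = trackers
        self.status = {}        # task id -> (state, node)

    def submit(self, job_id, splits):
        """splits: list of (split id, [names of nodes holding that split])."""
        for split_id, replica_nodes in splits:
            task_id = f"{job_id}/map-{split_id}"
            tracker = self._pick_tracker(replica_nodes)
            if tracker is None:
                self.status[task_id] = ("PENDING", None)
                continue
            tracker.launch(task_id)
            self.status[task_id] = ("RUNNING", tracker.node)

    def _pick_tracker(self, replica_nodes):
        free = [t for t in self.trackers if t.has_free_slot()]
        # Prefer a tracker on a node that already stores the data.
        for tracker in free:
            if tracker.node in replica_nodes:
                return tracker
        # Otherwise fall back to any tracker with a free slot.
        return free[0] if free else None
```

Note that one object here does both the scheduling and the status bookkeeping for every task in the cluster, which mirrors the load that a single Job tracker carries.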
Disadvantages of Hadoop 1.0 architecture
· Could not scale beyond about 4,000 nodes per cluster
· The Job tracker was overloaded, because it was responsible for both scheduling and monitoring jobs
· No high availability mechanism, so the Name node was a single point of failure
<Yet to Update Hadoop2.0 architecture>