It is all about data! Data Science, Data Analytics, Data Warehouse

Hadoop Architecture Explained

This post explains the Hadoop 1.0 and Hadoop 2.0 architectures:

· Hadoop 1.0 architecture

· Hadoop 2.0 architecture

a. Federation architecture

b. High availability architecture

c. Federation architecture with High availability

Hadoop 1.0 architecture

The Hadoop 1.0 architecture has two main components:

· HDFS – Hadoop Distributed File System, the storage layer

· Processing framework – MapReduce

HDFS

· Defines how the data is stored and distributed across the different data nodes

· The Name node (the master node) holds the metadata, such as file names and the location of each block

· The Data nodes are where the actual data is stored

· A file is split into multiple blocks, and each block is copied to several data nodes according to the replication factor

· When there is a read or write request, the client process contacts the Name node to find out which blocks are involved, and the Data nodes in turn serve or store the data

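· The Name node's metadata is copied to the Secondary name node periodically, where the file system image is merged with the edit log into a checkpoint

· The Secondary name node is not a failover node; it cannot take over when the Name node goes down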
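
The block and replication points above can be sketched in a few lines of Python. This is only a toy model and not real HDFS code: the function names `split_into_blocks` and `place_blocks` and the node names `dn1` to `dn4` are made up for illustration, and the round-robin placement is an assumption (real HDFS placement is rack-aware). The 64 MB block size and replication factor of 3 are the usual Hadoop 1.x defaults.

```python
BLOCK_SIZE = 64 * 1024 * 1024  # 64 MB, the usual default block size in Hadoop 1.x
REPLICATION = 3                # the usual default replication factor


def split_into_blocks(file_size, block_size=BLOCK_SIZE):
    """Return the sizes (in bytes) of the blocks a file is cut into."""
    full_blocks, remainder = divmod(file_size, block_size)
    return [block_size] * full_blocks + ([remainder] if remainder else [])


def place_blocks(file_size, data_nodes, replication=REPLICATION,
                 block_size=BLOCK_SIZE):
    """Toy version of the Name node metadata: block index -> data nodes.

    Replicas are placed round-robin, which is an assumption made for
    illustration only; it is not the real HDFS placement policy.
    """
    if replication > len(data_nodes):
        raise ValueError("need at least as many data nodes as replicas")
    blocks = split_into_blocks(file_size, block_size)
    placement = {}
    for block_id in range(len(blocks)):
        placement[block_id] = [
            data_nodes[(block_id + r) % len(data_nodes)]
            for r in range(replication)
        ]
    return placement


# A 200 MB file on a hypothetical four-node cluster.
metadata = place_blocks(200 * 1024 * 1024, ["dn1", "dn2", "dn3", "dn4"])
for block_id, nodes in metadata.items():
    print(block_id, nodes)
```

The dictionary returned by `place_blocks` plays the role of the Name node here: it holds no file data, only the information about which data nodes hold each block.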

Job Tracker

· The Job tracker is the master of the processing layer in a Hadoop 1.0 cluster

· Once a job request is received, the Job tracker schedules the job and monitors it

· The Job tracker assigns tasks to the Task trackers, which run on the data nodes and execute the actual map and reduce tasks there
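
The flow above can be illustrated with a small word count in plain Python. This is a sketch under simplifying assumptions and does not use the Hadoop API: `run_job`, `map_task`, `reduce_task` and the tracker names `tt1` and `tt2` are hypothetical, everything runs in one process, and tasks are handed out round-robin, whereas a real Job tracker also considers where the data is located.

```python
from collections import defaultdict


def map_task(line):
    """Map step: emit a (word, 1) pair for every word in the input split."""
    return [(word.lower(), 1) for word in line.split()]


def reduce_task(word, counts):
    """Reduce step: add up all the counts emitted for one word."""
    return word, sum(counts)


def run_job(splits, task_trackers):
    """Toy Job tracker: assign each split to a Task tracker and run the job.

    Round-robin assignment is an assumption made for illustration only.
    """
    assignments = {}
    intermediate = defaultdict(list)
    for split_id, split in enumerate(splits):
        # Scheduling: decide which Task tracker gets this map task.
        assignments[split_id] = task_trackers[split_id % len(task_trackers)]
        # The Task tracker would run the map task; here it runs in-process.
        for word, count in map_task(split):
            intermediate[word].append(count)
    # Group by key, then run one reduce call per word.
    result = dict(reduce_task(w, c) for w, c in intermediate.items())
    return assignments, result


assignments, counts = run_job(["big data big cluster", "data node"],
                              ["tt1", "tt2"])
print(assignments)
print(counts)
```

Note that `run_job` does both the scheduling and the bookkeeping for the whole job, which mirrors the double duty of the Job tracker in Hadoop 1.0.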

Disadvantages of Hadoop 1.0 architecture

· Could not scale beyond a cluster of about 4,000 nodes

· The Job tracker was overloaded, because the same process had to both schedule and monitor all jobs

· No high availability mechanism: the Name node was a single point of failure

Sunday, May 17, 2015