Hadoop Framework
Knowledge shared is knowledge gained :)
Wish you all a happy day folks. Let us look at the Hadoop architecture in today's topic.
Hadoop is a framework developed by Apache to handle large volumes of data, i.e. big data.
Have a look at the picture below closely.
· The Hadoop framework is designed based on a Master/Slave architecture
Let me explain the architecture in layman's terms first, then we will move on to the real terminology used in Hadoop.
Simple explanation
Just imagine Master Rob has three slaves named A, B and C.
· Rob notes down all of A, B and C's information in his employee register – where each one lives, how much potential each one possesses, etc.
· A, B and C know who the master is after interacting with him
· Rob periodically (every hour) checks whether A/B/C are performing their duties. If anyone is not replying or goes missing, something is wrong with them
· Rob is not sure when A, B or C will leave him. Hence whatever A knows, he orders him to share with B and C; similarly, B shares with A and C, and C shares with A and B
· After performing their duties, A/B/C report back their current amount of work, pending work, etc.
· Rob, as a single person, cannot monitor all the employees. He appoints Tom as a resource manager to monitor A/B/C
· Rob maintains a log of who did what and when
· If Rob loses his register and log, it will be very difficult to track A, B and C's wages, capabilities, past achievements, etc. Hence Bob gets a copy of this information once in a while
Let us map the above scenario to Hadoop terminology
1. Rob → Master Node or Name Node
2. A/B/C → Data Node or Slave Node
3. Rob maintains A/B/C info → Metadata (FsImage). This has all the information about the cluster
4. Rob's log → Edit logs, which hold the transactional information
5. Rob checks periodically → Heartbeat. Each slave reports in every few seconds, and a slave that stays silent for about 30 seconds is suspected to be down
6. Tom → Resource Manager, a process or daemon that manages the slaves A/B/C
7. Sharing info → Replication factor. This makes sure the data can still be retrieved even if one machine goes down
8. Bob → Secondary Name Node. This housekeeping node periodically gets the metadata information from the Name Node
9. A/B/C performing work → Read/Write operations
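The heartbeat idea in point 5 can be sketched as a toy Python simulation. This is purely illustrative and not Hadoop code; the 30-second timeout and the node names A/B/C are taken from the analogy above.

```python
import time

HEARTBEAT_TIMEOUT = 30  # seconds of silence before a node is suspect (assumed for illustration)

# Toy Name-Node-side view: last heartbeat time per Data Node
last_heartbeat = {"A": time.time(), "B": time.time(), "C": time.time()}

def receive_heartbeat(node):
    """A Data Node reporting in, like A/B/C replying to Rob."""
    last_heartbeat[node] = time.time()

def dead_nodes():
    """Nodes that have been silent longer than the timeout."""
    now = time.time()
    return [n for n, t in last_heartbeat.items() if now - t > HEARTBEAT_TIMEOUT]

# Simulate: A and C keep reporting, but B went silent 40 seconds ago
receive_heartbeat("A")
receive_heartbeat("C")
last_heartbeat["B"] -= 40  # pretend B's last heartbeat was 40 s in the past
print(dead_nodes())  # → ['B']
```

Once B is declared dead, the master would re-replicate B's data from the copies held by A and C – which is exactly why point 7 (replication) exists.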
Process explanation
HDFS – Hadoop Distributed File System
We know file systems like the UNIX file system, NTFS and FAT32. Similarly, the Hadoop Distributed File System is the file system used in the big data world. This file system determines
· How the data is stored
· What the directory structures are
All the big data tools developed have to comply with the Hadoop file system.
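To make "how the data is stored" concrete: HDFS splits a file into fixed-size blocks and stores several copies of each block across the Data Nodes. Here is a rough Python sketch of that arithmetic; the 128 MB block size and replication factor 3 are the common HDFS defaults (`dfs.blocksize` and `dfs.replication`), and the function is an illustration, not real HDFS code.

```python
import math

BLOCK_SIZE_MB = 128  # common default HDFS block size (dfs.blocksize)
REPLICATION = 3      # common default replication factor (dfs.replication)

def hdfs_footprint(file_size_mb):
    """How a file lands on HDFS: split into blocks, each block replicated.
    Returns (number of blocks, total raw storage used in MB)."""
    num_blocks = math.ceil(file_size_mb / BLOCK_SIZE_MB)
    raw_storage_mb = file_size_mb * REPLICATION
    return num_blocks, raw_storage_mb

# A 500 MB file becomes 4 blocks (128 + 128 + 128 + 116)
# and occupies 3 × 500 = 1500 MB of raw disk across the cluster
print(hdfs_footprint(500))  # → (4, 1500)
```

The 3× storage cost is the price paid so that, as in the Rob analogy, losing any one slave never loses the data.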
YARN processing
YARN – Yet Another Resource Negotiator
YARN was introduced in Hadoop 2.0. It is the processing unit of the Hadoop framework. Some of its operations are
· Tracking the jobs
· Scheduling the jobs
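To give a feel for job scheduling, here is a toy FIFO scheduler in Python. Real YARN scheduling (FIFO, Capacity or Fair Scheduler) is far richer; this sketch only illustrates the queue-and-slots idea, and all the names in it are invented for illustration.

```python
from collections import deque

class ToyFifoScheduler:
    """Toy FIFO scheduler: jobs run in submission order, one per free slot.
    A sketch of the idea behind FIFO scheduling, not real YARN code."""

    def __init__(self, slots):
        self.slots = slots    # free containers/slots in the cluster
        self.queue = deque()  # submitted but not yet running jobs
        self.running = []

    def submit(self, job):
        self.queue.append(job)
        self._schedule()

    def finish(self, job):
        self.running.remove(job)
        self.slots += 1
        self._schedule()

    def _schedule(self):
        # Launch queued jobs, oldest first, while slots remain
        while self.queue and self.slots > 0:
            self.running.append(self.queue.popleft())
            self.slots -= 1

sched = ToyFifoScheduler(slots=2)
for job in ["job1", "job2", "job3"]:
    sched.submit(job)
print(sched.running, list(sched.queue))  # → ['job1', 'job2'] ['job3']
sched.finish("job1")                     # a slot frees up, job3 starts
print(sched.running)                     # → ['job2', 'job3']
```

Tracking which jobs are queued, running or finished is the "track the job" part; deciding who gets a free slot next is the "scheduling" part.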