YARN = Yet Another Resource Negotiator
The NameNode only tracks HDFS metadata: which blocks exist and which DataNodes hold them. It does not allocate compute resources. Deciding which resources go to which user or application is a separate job, and that is what YARN does.
YARN is the Hadoop processing layer that contains:
- A resource manager
- A job scheduler
YARN allows multiple data processing engines to run on a single Hadoop cluster.
- Batch programs (e.g. Spark, MapReduce)
- Interactive SQL (e.g. Impala)
- Advanced analytics (e.g. Spark, Impala)
- Streaming (e.g. Spark Streaming)
YARN Daemons
Resource Manager (RM)
- Runs on the master node.
- Global resource scheduler
- Arbitrates system resources between competing applications.
- Has a pluggable scheduler to support different algorithms (Capacity Scheduler, Fair Scheduler, etc.)
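To make "pluggable scheduler" concrete, here is a minimal toy sketch in Python. It is not YARN code and does not reproduce the real Capacity or Fair Scheduler logic. All names (`App`, `ToyResourceManager`, `fifo_scheduler`, `fair_share_scheduler`) are hypothetical. The only point is that the RM delegates the "who goes next" decision to a policy that can be swapped.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class App:
    name: str
    submit_order: int   # lower value = submitted earlier
    allocated_mb: int   # memory the app already holds


# In this toy model a "scheduler" is just a function that picks the next app.
Scheduler = Callable[[List[App]], App]


def fifo_scheduler(apps: List[App]) -> App:
    # Serve whichever app was submitted first.
    return min(apps, key=lambda a: a.submit_order)


def fair_share_scheduler(apps: List[App]) -> App:
    # Serve the app holding the least memory; break ties by submit order.
    return min(apps, key=lambda a: (a.allocated_mb, a.submit_order))


class ToyResourceManager:
    def __init__(self, scheduler: Scheduler):
        self.scheduler = scheduler  # the pluggable part

    def next_app(self, apps: List[App]) -> App:
        return self.scheduler(apps)


apps = [App("etl", submit_order=0, allocated_mb=4096),
        App("adhoc-sql", submit_order=1, allocated_mb=0)]

fifo_choice = ToyResourceManager(fifo_scheduler).next_app(apps).name
fair_choice = ToyResourceManager(fair_share_scheduler).next_app(apps).name
```

With the same two applications, the FIFO policy picks the older `etl` job, while the fair-share policy picks `adhoc-sql` because it holds no resources yet.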
Node Manager (NM) - a logical entity (a daemon process).
- Runs on slave nodes.
- Communicates with the RM.
Running an Application in YARN
Containers
- Created by the RM upon request.
- Allocate a certain amount of resources (memory, CPU) on a slave node.
- Applications run in one or more containers.
Application Master (AM)
- One per application
- Framework/application specific.
- Runs in a container.
- Requests more containers to run application tasks.
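The flow above can be sketched as a small simulation. This is a toy model under stated assumptions, not the YARN API: `Node`, `Container`, and `ToyRM` are hypothetical names, placement is a simple first-fit over nodes, and the memory and vcore numbers are made up. It shows one container being granted for the AM and the AM then asking for more containers for its tasks.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    name: str
    free_mb: int
    free_vcores: int


@dataclass
class Container:
    node: str
    memory_mb: int
    vcores: int


@dataclass
class ToyRM:
    nodes: List[Node]

    def allocate(self, memory_mb: int, vcores: int) -> Optional[Container]:
        # First-fit: grant the container on the first node with enough room.
        for node in self.nodes:
            if node.free_mb >= memory_mb and node.free_vcores >= vcores:
                node.free_mb -= memory_mb
                node.free_vcores -= vcores
                return Container(node.name, memory_mb, vcores)
        return None  # no node can satisfy the request right now


rm = ToyRM([Node("worker1", free_mb=4096, free_vcores=4),
            Node("worker2", free_mb=8192, free_vcores=8)])

# The application's first container hosts its Application Master.
am_container = rm.allocate(memory_mb=1024, vcores=1)

# The AM then requests more containers to run the application's tasks.
task_containers = [rm.allocate(memory_mb=2048, vcores=2) for _ in range(3)]
task_nodes = [c.node for c in task_containers]

# A request larger than any node's free capacity is not granted.
oversized = rm.allocate(memory_mb=16384, vcores=1)
```

In this run the AM and the first task land on `worker1`. The remaining two tasks go to `worker2` once `worker1` no longer has 2048 MB free.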
Cluster Metrics:
- How many apps submitted?
- How many apps pending?
- How many apps running?
- How many apps completed?
- How many containers running?
- How much memory used?
- How much memory total?
- How much memory reserved?
- What are the active nodes?
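These questions map onto fields of the ResourceManager's REST endpoint `/ws/v1/cluster/metrics`, which returns a JSON object under the key `clusterMetrics`. The sketch below is a minimal reader for that response. The host `rm-host` is a placeholder, port 8088 is the usual default for the RM web UI, and the sample payload is invented for illustration. Note that this endpoint reports only the number of active nodes, not their names.

```python
import json
from urllib.request import urlopen

# Questions from the notes -> field names in the "clusterMetrics" object.
FIELDS = {
    "apps submitted": "appsSubmitted",
    "apps pending": "appsPending",
    "apps running": "appsRunning",
    "apps completed": "appsCompleted",
    "containers running": "containersAllocated",
    "memory used (MB)": "allocatedMB",
    "memory total (MB)": "totalMB",
    "memory reserved (MB)": "reservedMB",
    "active nodes": "activeNodes",
}


def summarize(payload: dict) -> dict:
    """Pick the fields of interest out of a cluster-metrics response."""
    metrics = payload["clusterMetrics"]
    return {label: metrics[key] for label, key in FIELDS.items()}


def fetch_metrics(rm_url: str) -> dict:
    """Query a live ResourceManager, e.g. rm_url='http://rm-host:8088'."""
    with urlopen(f"{rm_url}/ws/v1/cluster/metrics") as resp:
        return summarize(json.load(resp))


# Invented sample payload, shaped like the real response, for offline use.
sample = {"clusterMetrics": {
    "appsSubmitted": 12, "appsPending": 1, "appsRunning": 3,
    "appsCompleted": 8, "containersAllocated": 9,
    "allocatedMB": 18432, "totalMB": 65536, "reservedMB": 0,
    "activeNodes": 4,
}}

summary = summarize(sample)
for label, value in summary.items():
    print(f"{label}: {value}")
```

Against a real cluster, `fetch_metrics("http://rm-host:8088")` would return the same dictionary shape, assuming the RM web service is reachable without authentication.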