The (most) important paper on YARN
Problems with the original MapReduce (Hadoop 1.0), which are the problems YARN sets out to solve:
1,tight coupling of a specific programming model with the resource management infrastructure, forcing developers to abuse the MapReduce programming model
2,centralized handling of jobs' control flow, which resulted in endless scalability concerns for the scheduler
So YARN was developed: the new architecture decouples the programming model from the resource management infrastructure and delegates many scheduling functions (e.g., task fault-tolerance) to per-application components.
Requirements for YARN:
1,Scalability
2,Multi-tenancy
3,Serviceability
4,Locality awareness
5,High Cluster Utilization
6,Reliability/Availability
7,Secure and auditable operation
8,Support for programming model diversity
9,Flexible Resource Model
10,Backward compatibility
YARN components:
Resource Manager (RM): A daemon on a dedicated machine that acts as the central authority arbitrating resources among various competing applications.
Application Master (AM): Coordinates the logical plan of a single job by requesting resources from the RM, generating a physical plan from the resources it receives, and coordinating the execution of that plan around faults.
Node Manager (NM): A special system daemon running on each node.
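The division of labor among the three components can be sketched as a toy model. This is not the Hadoop API: the class names, the first-fit allocation rule, and the `handle_failure` helper are all assumptions made for illustration. It only tries to show that the RM arbitrates resources while the AM turns a logical plan (a list of tasks) into a physical plan (task to node) and recovers from faults itself.

```python
from dataclasses import dataclass, field


@dataclass
class NodeManager:
    """Toy per-node daemon: only tracks how much memory is still free."""
    name: str
    free_mb: int


@dataclass
class ResourceManager:
    """Toy central arbiter (assumption: first node that fits wins)."""
    nodes: list

    def allocate(self, mem_mb):
        for node in self.nodes:
            if node.free_mb >= mem_mb:
                node.free_mb -= mem_mb
                return node.name  # the grant is bound to this node
        return None  # nothing fits right now


@dataclass
class ApplicationMaster:
    """Toy per-job coordinator: maps logical tasks onto granted resources."""
    rm: ResourceManager
    tasks: list
    plan: dict = field(default_factory=dict)

    def build_physical_plan(self, mem_mb):
        # Request one grant per task and record where each task will run.
        for task in self.tasks:
            node = self.rm.allocate(mem_mb)
            if node is not None:
                self.plan[task] = node
        return self.plan

    def handle_failure(self, task, mem_mb):
        # Fault handling lives in the AM, not the RM: ask for a replacement.
        # Simplification: the failed task's memory is treated as lost.
        self.plan.pop(task, None)
        node = self.rm.allocate(mem_mb)
        if node is not None:
            self.plan[task] = node
        return node


rm = ResourceManager([NodeManager("node-a", 4096), NodeManager("node-b", 4096)])
am = ApplicationMaster(rm, ["map-0", "map-1", "map-2"])
plan = am.build_physical_plan(2048)
retry_node = am.handle_failure("map-0", 2048)
```

In this sketch the RM never learns what a "task" is; it only sees memory requests, which is the decoupling the notes above describe.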
Key points:
The RM dynamically allocates leases, called containers, to applications to run on particular nodes. A container is a logical bundle of resources (e.g., <2GB RAM, 1 CPU>) bound to a particular node.
All containers in YARN, including AMs, are described by a container launch context (CLC).
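As a rough illustration of these two ideas, the sketch below models a container as a resource bundle tied to one node, and a CLC as a record saying what to start inside it. The field names (`command`, `environment`) and the `describe_launch` helper are assumptions for this example, not the actual YARN record layout.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Resource:
    """Logical resource bundle, e.g. <2GB RAM, 1 CPU>."""
    memory_mb: int
    vcores: int


@dataclass(frozen=True)
class Container:
    """A lease: a resource bundle bound to one particular node."""
    container_id: int
    node: str
    resource: Resource


@dataclass
class ContainerLaunchContext:
    """Hypothetical CLC: what to run inside a container (fields are assumptions)."""
    command: list
    environment: dict = field(default_factory=dict)


def describe_launch(container, clc):
    """Return a one-line summary of what would be started and where."""
    return (f"container {container.container_id} on {container.node} "
            f"<{container.resource.memory_mb}MB, {container.resource.vcores}CPU>: "
            f"{' '.join(clc.command)}")


# Even the AM itself is launched this way: a lease plus a CLC.
lease = Container(1, "node-a", Resource(memory_mb=2048, vcores=1))
am_clc = ContainerLaunchContext(["run_app_master"], {"APP_ID": "demo"})
summary = describe_launch(lease, am_clc)
```

Keeping the lease (where and how much) separate from the launch context (what to run) mirrors the point that the same description mechanism covers both AMs and ordinary task containers.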