Capacity Scheduler
The capacity scheduler is pretty straight forward. It assigns each user a percentage of the parent queue that they are allowed to use. The queue object corresponding to your user, will have an “absoluteCapacity” field containing a double. This is the percentage of the entire cluster (both cpu and memory) that is available for jobs submitted by your user.
Thus you need to multiply this percentage, by the total resource available on the cluster. These metrics can be retrieved from the “clusterMetrics” api which contains both an “availableVirtualCores” and an “availableMB” field.
Finding total memory and total vCores available on the cluster | |
API Call: | http://<rm http address:port>/ws/v1/cluster/metrics |
Path to Json | clusterMetrics→ availableVirtualCores → availableMB |
(total_vcores,total_memory) =http://<rm http address:port>/ws/v1/cluster/metrics -> “clusterMetrics” -> {“availableVirtualCores”, “availableMB”}
Finding percent of resources available to your user if the queue type is capacity scheduler | |
API Call: | http://<rm http address:port>/ws/v1/cluster/scheduler |
Path to Json | scheduler→ schedulerInfo → rootQueue → queues ….find your queue → myUserName → absoluteCapacity |
Then we can calculate the cores and memory available for our job with the following equations
Available vcores for Spark Application = absoluteCapacity x availableVirtualCores
Available memory (mb) for Spark Application = absoluteCapacity x availableMB
Fair Scheduler
The fair scheduler is a little more complicated. Essentially resources are divided equally among other active users on the cluster. If the scheduler type is the fairScheduler, than to get the resource available to our queue we have to query the queue object for the “maxResources” object, which contains a “vCores” and “memory” field. These values represent the resource available on each node. Thus, to get the total resource available in our queue we need to multiply each of these numbers by the number of active nodes on the cluster (which can be retrieved from the cluster metrics API.
Finding the active nodes on the cluster | |
API Call: | http://<rm http address:port>/ws/v1/cluster/metrics |
Path to Json | clusterMetrics→ activeNodes |
Finding percent of resources available to your user if the queue type is fair scheduler | |
API Call: | http://<rm http address:port>/ws/v1/cluster/scheduler |
Path to Json | scheduler→ schedulerInfo → rootQueue → queues ….find your queue → myUserName → maxResources → vCores → memory |
Then we can calculate the cores and memory available to our Spark Application with the following equations
Available v-cores = activeNodes x v cores per node (see above)
Available memory = activeNodes x memory per node (see above)
How To Calculate Memory Overhead
This is different depending on whether you are running with yarn client or yarn cluster mode. In yarn cluster mode, both the executor memory overhead and driver memory overhead can be set manually in the conf with the “spark.yarn.executor.memoryOverhead” and “spark.yarn.driver.memoryOverhead” values respectively. If these values are unset the memory can be calculated with the following equation:
memory overhead = Max(MEMORY_OVERHEAD_FACTOR *requested memory, MEMORY_OVERHEAD_MINIMUM).
Where MEMORY_OVERHEAD_FACTOR = 0.10 and
MEMORY_OVERHEAD_MINIMUM = 384
That does it for our three-part series on using the YARN API within Chorus. We’re constantly working to deliver new enterprise functionality so check back with our blog to stay up to date and informed on the latest features.