Apache Spark on YARN: Resource Planning

This is the second article of a four-part series about Apache Spark on YARN. As Apache Spark is an in-memory distributed data processing engine, application performance is heavily dependent on the resources allocated to it, such as executors, cores, and memory. The resources required by an application depend on its characteristics, such as storage and computation.

A few performance bottlenecks were identified in the SFO Fire Department call service dataset use case running under the YARN cluster manager. One of them was the improper use of resources in the YARN cluster and the execution of the applications with the default Spark configuration.

To understand the use case and the performance bottlenecks identified, refer to our previous blog, Apache Spark on YARN: Performance and Bottlenecks. In this blog, let's discuss resource planning for the same use case and how it improves the performance of the Spark application.

Spark Resource Planning Principles

The general principles to be followed while deciding resource allocation for a Spark application are applied step by step in the resource planning steps below.

Understanding Use Case Performance

The Spark applications are executed in YARN cluster mode. The resource allocation for the use case Spark applications is illustrated in the below table:

The observations from the Spark UI are as follows:

Fire Service Call Output

Let's understand the YARN resources before performing Spark application resource tuning.

Understanding YARN Resources

A cluster is set up and the YARN resource availability from the YARN configuration is illustrated in the below table:


The maximum memory and vcores available per node are 8 GB and 3 cores. In total, we have 16 GB and 6 cores, as shown in the below diagram:

YARN Cluster Metrics

If the underlying instance has more memory and cores, the above configuration can be increased. Let's stick with this YARN configuration and tune the resources.

If the resources allocated to the Spark application exceed these limits, the application will be terminated with error messages such as the following:

Executor Memory Exceeds Cluster Memory Error

Executor Core Exceeds Cluster Vcore Error
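As an illustration, a hypothetical submission (the 12 GB figure below is made up for this example and is not from the test runs) that asks for more executor memory than the 8 GB YARN offers per node fails with the first error above:

# Hypothetical example: 12 GB per executor exceeds the 8 GB available per YARN node,
# so YARN cannot allocate the executor containers and the submission fails.
./bin/spark-submit --name FireServiceCallAnalysisOverLimit --master yarn --deploy-mode cluster --executor-memory 12g --executor-cores 2 --num-executors 1 --class com.treselle.fscalls.analysis.FireServiceCallAnalysisDF /data/SFFireServiceCall/SFFireServiceCallAnalysis.jar /user/tsldp/FireServiceCallDataSet/Fire_Department_Calls_for_Service.csv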

Hopefully, you now understand the use case performance and the resources available in YARN.

Spark on YARN: Resource Planning

Let's find out the reasonable resources to execute the Spark application in YARN.

Memory available per node: 8 GB
Cores available per node: 3

To find the number of executors, cores per executor, and memory per executor that work for our use case with a notable performance improvement, perform the following steps:

Step 1

Allocate 1 GB of memory and 1 core per node for the driver. The driver can be launched on any one of the nodes at runtime. If an action returns more data to the driver (for example, more than 1 GB), then the driver memory must be increased accordingly.

Memory available per node: 7 GB
Cores available per node: 2
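For reference, a minimal sketch of how this driver allocation is expressed with spark-submit (the driver flags are standard Spark options; the class and file paths are reused from the use case):

# Sketch: reserve 1 GB of memory and 1 core for the driver in YARN cluster mode.
./bin/spark-submit --master yarn --deploy-mode cluster --driver-memory 1g --driver-cores 1 --class com.treselle.fscalls.analysis.FireServiceCallAnalysisDF /data/SFFireServiceCall/SFFireServiceCallAnalysis.jar /user/tsldp/FireServiceCallDataSet/Fire_Department_Calls_for_Service.csv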

Step 2

Assign 1 GB of memory and 1 core per instance for OS and Hadoop daemon overhead.

Let's look at the instance used to launch the cluster.

Instance details: m4.xlarge (4 cores, 16 GB RAM)

YARN is configured with 8 GB RAM and 3 cores per node, which leaves 1 core and 8 GB RAM per instance free; these freed-up resources cover the OS and Hadoop daemon overhead. The memory and cores available per node therefore remain unchanged after Step 2.

Step 3

Find out the number of cores per executor.

As 2 cores per node are available, set the number of cores per executor to 2.

Note: If you have more cores per instance (for example, 16 – 1 (overhead) = 15), then stick with 5 cores per executor when running on YARN with HDFS, as more than about 5 concurrent tasks per executor tends to hurt HDFS I/O throughput.
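As a hedged illustration on a hypothetical larger node (these figures are not from the use case cluster): with 16 cores and 64 GB RAM, reserving 1 core and roughly 1 GB for overhead leaves 15 cores and about 63 GB for YARN; at 5 cores per executor that is 3 executors on the node, and 63 GB / 3 = 21 GB per executor, or about 19 GB of executor memory after leaving room for the roughly 7-10% YARN memory overhead:

# Hypothetical sketch for a single 16-core, 64 GB node (not the use case cluster):
# 15 usable cores / 5 cores per executor = 3 executors; ~19 GB executor memory each.
./bin/spark-submit --master yarn --deploy-mode cluster --executor-cores 5 --executor-memory 19g --num-executors 3 --class com.treselle.fscalls.analysis.FireServiceCallAnalysisDF /data/SFFireServiceCall/SFFireServiceCallAnalysis.jar /user/tsldp/FireServiceCallDataSet/Fire_Department_Calls_for_Service.csv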

Step 4

Find out the number of executors and the memory per executor.

With 2 cores per executor and 2 cores available per node, we get one executor per node, that is, 2 executors in total, and the 7 GB available per node leaves roughly 6 GB of usable memory per executor after the YARN memory overhead. This calculation works well for our use case except for the memory per executor: the input dataset is only 1.5 GB, and allocating 6 GB per executor to process 1.5 GB over-allocates memory.

Executor memory was therefore tested starting at 2 GB and increased up to 7 GB per executor. 2 GB per executor was chosen, as there was no additional performance improvement when increasing the executor memory from 2 GB to 7 GB.

The decided resource allocation derived from the above steps for the use case Spark applications is illustrated in the below table:

Note: Different organizations have different workloads, and the above steps may not work well for all cases, but they give you an idea of how to calculate executors, cores, and memory.
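Putting the steps together, the full submission for the decided allocation looks like the following sketch; it combines the driver settings from Step 1 with the executor settings derived above, and the executor flags match the balanced run shown later in this post:

# Decided allocation: 1 GB / 1 core for the driver, 2 executors with 2 cores and 2 GB each.
./bin/spark-submit --master yarn --deploy-mode cluster --driver-memory 1g --driver-cores 1 --executor-memory 2g --executor-cores 2 --num-executors 2 --class com.treselle.fscalls.analysis.FireServiceCallAnalysisDF /data/SFFireServiceCall/SFFireServiceCallAnalysis.jar /user/tsldp/FireServiceCallDataSet/Fire_Department_Calls_for_Service.csv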

Running Spark on YARN With Tuned Resource

Let's look at the execution at different granularity and resource levels.

DataFrame Implementation of Spark Application

The DataFrame implementation of the Spark application is executed at the most granular, least granular, and balanced (which we have calculated) resource levels.

Most Granular Level Resource Allocation

./bin/spark-submit --name FireServiceCallAnalysisDataFrameTest2 --master yarn --deploy-mode cluster   --executor-memory 1g --executor-cores 1  --num-executors 7 --class com.treselle.fscalls.analysis.FireServiceCallAnalysisDF /data/SFFireServiceCall/SFFireServiceCallAnalysis.jar /user/tsldp/FireServiceCallDataSet/Fire_Department_Calls_for_Service.csv

Fire Service Call Analysis DataFrame Test2 Executors Stats

Least Granular Level Resource Allocation

./bin/spark-submit --name FireServiceCallAnalysisDataFrameTest1 --master yarn --deploy-mode cluster   --executor-memory 7g --executor-cores 2  --num-executors 1 --class com.treselle.fscalls.analysis.FireServiceCallAnalysisDF /data/SFFireServiceCall/SFFireServiceCallAnalysis.jar /user/tsldp/FireServiceCallDataSet/Fire_Department_Calls_for_Service.csv

Fire Service Call Analysis DataFrame Test1 Executors Stats

Balanced Resource Allocation

./bin/spark-submit --name FireServiceCallAnalysisDataFrameTest --master yarn --deploy-mode cluster --executor-memory 2g --executor-cores 2 --num-executors 2 --class com.treselle.fscalls.analysis.FireServiceCallAnalysisDF /data/SFFireServiceCall/SFFireServiceCallAnalysis.jar /user/tsldp/FireServiceCallDataSet/Fire_Department_Calls_for_Service.csv

Fire Service Call Analysis DataFrame Test Executor Stats

The balanced resource allocation provides a notable performance improvement, from 1.8 minutes to 1.3 minutes.

FireServiceCallAnalysisSPTuneOutput

RDD Implementation of Spark Application

The RDD implementation of the Spark application is executed at the most granular, least granular, and balanced (which we have calculated) resource levels.

Most Granular Level Resource Allocation

./bin/spark-submit --name FireServiceCallAnalysisRDDTest2 --master yarn --deploy-mode cluster  --executor-memory 1g --executor-cores 1  --num-executors 7 --class com.treselle.fscalls.analysis.FireServiceCallAnalysis /data/SFFireServiceCall/SFFireServiceCallAnalysis.jar /user/tsldp/FireServiceCallDataSet/Fire_Department_Calls_for_Service.csv

Least Granular Level Resource Allocation

./bin/spark-submit --name FireServiceCallAnalysisRDDTest1 --master yarn --deploy-mode cluster   --executor-memory 7g --executor-cores 2  --num-executors 1 --class com.treselle.fscalls.analysis.FireServiceCallAnalysis /data/SFFireServiceCall/SFFireServiceCallAnalysis.jar /user/tsldp/FireServiceCallDataSet/Fire_Department_Calls_for_Service.csv

Balanced Resource Allocation

./bin/spark-submit --name FireServiceCallAnalysisRDDTest --master yarn --deploy-mode cluster --executor-memory 2g --executor-cores 2 --num-executors 2 --class com.treselle.fscalls.analysis.FireServiceCallAnalysis /data/SFFireServiceCall/SFFireServiceCallAnalysis.jar /user/tsldp/FireServiceCallDataSet/Fire_Department_Calls_for_Service.csv

The RDD implementation with balanced resource allocation is 2 times faster than the execution with the default Spark configuration. The default Spark configuration produced results in 22 minutes; after resource tuning, the results are produced in 11 minutes.

FireServiceCallAnalysisRDDSPTuneOutput

Spark Applications With Default Configuration

Fire Service Call Output

Spark Application After Resource Tuning

FireServiceCallAnalysisRDDSPTuneOutput

FireServiceCallAnalysisSPTuneOutput

Conclusion

In this blog, we discussed Spark resource planning principles and examined the use case performance and the YARN resource configuration before tuning the resources for the Spark application.

We followed a set of steps to calculate the resources (executors, cores, and memory) for the Spark application. The results are as follows:

After performance tuning and fixing the bottlenecks, the final durations to complete the application with both the high-level (DataFrame) and low-level (RDD) APIs are:

Straggler Fix Output
