Fine-Tuning Performance and Resolving Common Issues in a FinTech Application With MySQL
Relational databases are the bedrock of any FinTech application, especially for OLTP (Online Transaction Processing). This foundational component of any application architecture usually poses scaling challenges as the business expands rapidly, so it is imperative that all database activity is monitored closely in the production environment and that issues such as long-running queries are tracked and resolved.
This article explores a case study of a FinTech company that has built a lending platform. The company uses a MySQL database hosted on AWS as part of the AWS RDS service, with multiple microservices using different database schemas on the same database instance. The MVP product was launched a few years back, and the company has been adding new features to the platform ever since. We will cover the database issues commonly identified and what was done to resolve them.
Common Mistakes
This section discusses the common mistakes identified, the steps taken to resolve them, and additional guidelines.
Using Database Views Inside Stored Procedures
As the database started to grow, the major issue identified in the early days was that the use of views inside stored procedures resulted in long-running queries and full table scans. The pattern we saw was that developers built multiple reusable views and then used them inside stored procedures. The example below is a stored procedure invoking a view, uv_consultations:
CREATE PROCEDURE `usp_get_consultation`(
in_dso_ref_id varchar(45)
)
BEGIN
select *
from uv_consultations
where
dso_ref_id=in_dso_ref_id;
END
Here is what the view looks like:
CREATE OR REPLACE VIEW uv_consultations AS
SELECT
c.id AS consultation_id,
c.dso_ref_id AS dso_ref_id,
c.pat_id AS pat_id,
p.first_name AS pat_first_name,
p.last_name AS pat_last_name,
p.dso_pat_id AS dso_pat_id,
p.dob AS pat_dob,
COUNT(cn.id) AS notifs
FROM
((((consultation c
JOIN patient p ON ((c.pat_id = p.id)))
LEFT JOIN application app ON ((c.id = app.consultation_id)))
LEFT JOIN responsible_party rp ON ((app.primary_rp_id = rp.id)))
LEFT JOIN consultation_notif cn ON (((cn.consultation_id = c.id)
AND (cn.read = 0)
AND (cn.hide = 0))))
GROUP BY c.id
Looking at the view, it has multiple joins and a GROUP BY; it is a complex view query. MySQL executes views in two ways: view merging and view materialization.
View Merging
In this approach, MySQL merges the view query with the outer query in which the view is referenced. The biggest advantage is that the merged query can use the indexes of the underlying base tables, improving query time. However, there are limitations to view merging: the query within the view must be a relatively simple SELECT with no aggregation, GROUP BY, DISTINCT, UNION, or subqueries in the select list.
View Materialization
When MySQL cannot merge a view, it falls back to view materialization. In this approach, MySQL stores the view's result set in an internal temporary table and then runs the query on the view against that temporary table. The drawback is that the temporary table does not have the base-table indexes, so queries on the view run slower. Constructs such as GROUP BY, aggregation, and DISTINCT trigger view materialization.
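One quick way to see which strategy the optimizer chose is to run EXPLAIN against a query on the view; when the view is materialized, the plan typically shows a DERIVED table. A minimal check (the dso_ref_id value here is a made-up example):
EXPLAIN
SELECT *
FROM uv_consultations
WHERE dso_ref_id = 'DSO-12345';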
In the above example, the view has aggregation and multiple joins, so executing the stored procedure results in view materialization, which causes it to run slowly. To mitigate this, the stored procedure was refactored to use the full SQL query directly and avoid the intermediate layer of views.
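One possible shape of the refactored procedure, with the view's query inlined (a sketch based on the view definition above):
CREATE PROCEDURE `usp_get_consultation`(
in_dso_ref_id varchar(45)
)
BEGIN
SELECT
    c.id AS consultation_id,
    c.dso_ref_id AS dso_ref_id,
    c.pat_id AS pat_id,
    p.first_name AS pat_first_name,
    p.last_name AS pat_last_name,
    p.dso_pat_id AS dso_pat_id,
    p.dob AS pat_dob,
    COUNT(cn.id) AS notifs
FROM consultation c
JOIN patient p ON c.pat_id = p.id
LEFT JOIN application app ON c.id = app.consultation_id
LEFT JOIN responsible_party rp ON app.primary_rp_id = rp.id
LEFT JOIN consultation_notif cn ON cn.consultation_id = c.id
    AND cn.read = 0
    AND cn.hide = 0
WHERE c.dso_ref_id = in_dso_ref_id
GROUP BY c.id;
END
With the query inlined, the optimizer works directly against the base tables and their indexes instead of a materialized temporary table.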
Leading Wildcard String Comparison
If we use the LIKE operator with a leading wildcard, e.g., LIKE '%AMOL', the query will not use the index on that column. Below is a sample query:
SELECT
COUNT(*) INTO v_ext_partner_service_count
FROM
loan_ext_payload
WHERE
loan_acct_id = v_loan_acct_id
AND ext_partner_service LIKE '%CREDIT_PULL'
For LIKE 'CREDIT_PULL%', MySQL uses the index efficiently because B-tree indexes are ordered by the leading characters of the indexed value, which makes prefix searches inherently fast. On the contrary, for the leading wildcard LIKE '%CREDIT_PULL', the MySQL execution engine has to look at every index entry to check whether it ends with 'CREDIT_PULL'. The index is optimized for prefix matching (CREDIT_PULL%), not suffix matching (%CREDIT_PULL), so its performance benefit is wiped out by leading-wildcard string comparisons.
The recommendation is to avoid leading-wildcard string comparisons or searches. If this is unavoidable, use a full-text search.
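As a rough sketch of that fallback, assuming the ext_partner_service values tokenize sensibly for MySQL's full-text parser, a FULLTEXT index and a MATCH ... AGAINST predicate can replace the leading-wildcard LIKE:
-- One-time: add a full-text index on the searched column
ALTER TABLE loan_ext_payload ADD FULLTEXT INDEX ft_ext_partner_service (ext_partner_service);
-- Query using the full-text index instead of LIKE '%CREDIT_PULL'
SELECT
COUNT(*) INTO v_ext_partner_service_count
FROM
loan_ext_payload
WHERE
loan_acct_id = v_loan_acct_id
AND MATCH(ext_partner_service) AGAINST ('+CREDIT_PULL' IN BOOLEAN MODE);
Whether this matches the original semantics depends on how the stored values are tokenized, so the rewritten predicate should be verified against real data before adoption.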
Using Functions as Part of the WHERE Clause
Using functions to filter records in a WHERE condition can harm query performance. For instance, using the FIND_IN_SET() function as in the query below:
SELECT
code,
code_type,
message,
message_es
FROM
locale_messages
WHERE
code_type = v_code_type
AND FIND_IN_SET(code, v_codes) > 0;
FIND_IN_SET returns the position of a string within a comma-separated list of strings. Using FIND_IN_SET in the query causes a full table scan because it cannot use the index: MySQL has to evaluate the function for each record, which can be very expensive for larger tables.
Mathematical, string, and date functions in the WHERE clause have the same side effect. Below are examples:
SELECT
loan_id, loan_acct_id
FROM
loan_application
WHERE YEAR(create_date) = 2024;
SELECT
loan_id, loan_acct_id
FROM loan_application
WHERE ABS(merchant_discount_fee) > 5;
Casting columns in the WHERE clause likewise leads to a full table scan, impacting query performance. For example:
SELECT
loan_id, loan_acct_id
FROM
loan_application
WHERE
CAST(approval_datetime AS DATE) = '2024-06-01';
As a resolution, full-text search was enabled for the FIND_IN_SET case. For the other function-based WHERE conditions, function-based indexes were created.
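As a sketch, assuming MySQL 8.0.13 or later (which supports functional key parts), function-based indexes for the examples above could look like this; whether the optimizer actually uses them should be confirmed with EXPLAIN:
-- Index on the year portion of create_date
CREATE INDEX idx_loan_app_create_year ON loan_application ((YEAR(create_date)));
-- Index on the absolute value of the merchant discount fee
CREATE INDEX idx_loan_app_abs_fee ON loan_application ((ABS(merchant_discount_fee)));
-- Index on the date portion of approval_datetime
CREATE INDEX idx_loan_app_approval_date ON loan_application ((CAST(approval_datetime AS DATE)));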
Performing Deletes as Part of Regular Transactions
Frequent delete operations as part of regular transaction processing can degrade database performance. Deleting rows from a table leads to index rebalancing: MySQL has to update every index on that table to remove the corresponding entries. For rollback and recovery, transaction logs are written for all delete operations, which can lead to rapid log growth. Rows deleted in random order also fragment tables and indexes, degrading performance due to scattered reads.
To mitigate regular delete operations, rows can be soft-deleted during transaction processing, and an offline batch job can perform the hard delete during non-peak hours, when the system can use its resources more freely. Batched or bulk deletes are also more performant than deletes spread across regular transaction processing.
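A minimal sketch of this pattern, assuming a hypothetical user_access_token table with an is_deleted flag:
-- During regular transactions: soft-delete instead of removing the row
UPDATE user_access_token
SET is_deleted = 1
WHERE token_id = 12345;
-- Off-peak batch job: hard-delete soft-deleted rows in bounded chunks,
-- repeating until no rows are affected, to limit lock time and log growth
DELETE FROM user_access_token
WHERE is_deleted = 1
LIMIT 5000;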
Executing Multiple Inserts From the Application Instead of Batching
A common anti-pattern when inserting multiple child rows for a parent record from the application layer is to execute a stored procedure call for each insert instead of batching them. Each insert then needs a database connection from the application layer and its own stored procedure execution, which is inefficient.
To avoid multiple round trips to the database server, a multi-insert statement was built in the application layer, as shown below, and then passed to a database stored procedure that executes it.
INSERT INTO verif_user_attrib_result
(id, application_id, user_attrib_id, doc_type_id, verif_result, verif_notes, verif_date, doc_upload_count)
values(148712,146235,1,NULL,1,NULL,NULL,0),
(148712,146235,2,NULL,1,NULL,NULL,0),
(148712,146235,3,NULL,1,NULL,NULL,0),
(148712,146235,4,NULL,-1,NULL,NULL,0);
CREATE PROCEDURE p_verif_create_user_attrib_results (
IN v_user_attrib_result_multi_insert TEXT,
out v_status int)
BEGIN
-- Default to failure; set to success only after the insert completes
SET v_status = 1;
-- Execute the multi-row INSERT passed in from the application layer
SET @qry_user_attrib_result = v_user_attrib_result_multi_insert;
PREPARE stmt_user_attrib_result from @qry_user_attrib_result;
EXECUTE stmt_user_attrib_result;
DEALLOCATE PREPARE stmt_user_attrib_result;
SET v_status = 0;
END
Recommendations
Connection Labels
In the microservices world, where multiple microservices connect to different schemas (or the same schema) on the database server, it is often difficult to monitor connection requests from the different microservices. If there is an issue with connection pools, connection draining, or database performance, we cannot tell which connection belongs to which service. This is where the program_name connection attribute comes in handy.
jdbc:mysql://db.fintech.com:3306/icwdev?cacheCallableStmts=true
&callableStmtCacheSize=1000&connectionAttributes=program_name:loan-application-api
This attribute labels every connection with the respective program name. The connection identifier helps attribute database issues to a specific service. Beyond diagnosing issues, it enables better monitoring: we can build query performance metrics and error rates per microservice.
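For example, the labeled connections can be inspected through the Performance Schema, which exposes connection attributes in performance_schema.session_connect_attrs:
-- Count current connections per microservice, based on the program_name attribute
SELECT ATTR_VALUE AS program_name, COUNT(*) AS connections
FROM performance_schema.session_connect_attrs
WHERE ATTR_NAME = 'program_name'
GROUP BY ATTR_VALUE;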
Purging Large Tables
For large tables, we purge data based on defined purging criteria; for example, utility tables managing user access tokens are purged periodically. To manage purging effectively, we implement partitioning on these tables. The partitioning is time-based, so old data can be removed by dropping entire partitions rather than running row-by-row deletes.
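A minimal sketch of time-based partitioning, using a hypothetical user_access_token table partitioned by month:
CREATE TABLE user_access_token (
id BIGINT NOT NULL,
user_id BIGINT NOT NULL,
token VARCHAR(512) NOT NULL,
created_at DATETIME NOT NULL,
PRIMARY KEY (id, created_at)
)
PARTITION BY RANGE (TO_DAYS(created_at)) (
PARTITION p2024_05 VALUES LESS THAN (TO_DAYS('2024-06-01')),
PARTITION p2024_06 VALUES LESS THAN (TO_DAYS('2024-07-01')),
PARTITION p_future VALUES LESS THAN MAXVALUE
);
-- Purging a month of data becomes a fast metadata operation
ALTER TABLE user_access_token DROP PARTITION p2024_05;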
Index Columns Used in Join
We ensure that all columns listed in join conditions are indexed. With indexed join columns, the database engine can filter rows efficiently using the index instead of scanning the entire table.
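For example, for the consultation view shown earlier, the foreign-key columns used in its joins would be indexed (index names are illustrative, and some of these may already exist as foreign-key indexes):
CREATE INDEX idx_consultation_pat_id ON consultation (pat_id);
CREATE INDEX idx_application_consultation_id ON application (consultation_id);
CREATE INDEX idx_consultation_notif_consultation_id ON consultation_notif (consultation_id);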
Keep Transactions Smaller
In high-concurrency applications, we must keep transactions small for optimal database performance. Smaller transactions shorten the time locks are held, which reduces resource contention, helps avoid deadlocks, and improves the transaction success rate.
Miscellaneous Common Asks
- We need to ensure that UPDATE statements include a WHERE clause. At the FinTech organization, some complex stored procedures missed the WHERE clause, leading to unintended behaviors in the test environment.
- A strict no to "SELECT *" in queries.
Conclusion
We have covered common pitfalls and their resolutions for fine-tuning the application and the underlying MySQL database performance through this FinTech case study. Here are the key suggestions: avoid using complex views in stored procedures; stay away from leading wildcard searches; monitor queries with full table scans and take corrective action; perform delete operations during off-peak hours through a batch job; avoid multiple round trips to the server for multi-row inserts by batching them; define a purging strategy; index all join columns; and keep the transaction scope small. With these recommendations incorporated, we can deliver optimal database performance and a faster experience for end users. Continuous monitoring and proactive issue resolution help maintain consistent application performance and availability.