Updating InnoDB Table Statistics Manually
In this post, we will discuss how to fix cardinality for InnoDB tables manually.
As a support engineer, I often see situations in which the cardinality of a table is not correct. When InnoDB calculates the cardinality of an index, it does not scan the full table by default. Instead, it looks at random pages, as determined by the options innodb_stats_sample_pages, innodb_stats_transient_sample_pages, and innodb_stats_persistent_sample_pages, or by the CREATE TABLE option STATS_SAMPLE_PAGES. The default value for persistent statistics is 20. This approach works fine when the number of unique values in your secondary key grows in step with the size of the table. But what if you have a column that has a comparatively small number of unique values? This is a common situation: a many-to-many relationship table, for example, or just a table containing a list of sell orders that belong to one of a dozen shops owned by the company. Such tables could grow up to billions of rows with a small number (fewer than 100) of unique shop IDs.
At some point, InnoDB will report the wrong values for such indexes. Really! If 20 pages have 100 unique shop IDs, how many unique shop IDs would 20,000 pages have? 100 times 1,000? This seems logical, and after a certain number of rows, such indexes will have extraordinarily large cardinality values.
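The arithmetic behind that question can be sketched as a toy model. This is not InnoDB's actual estimator; it only illustrates what happens when a distinct count seen in a sample is scaled linearly with the number of pages. The function name and the page counts are illustrative assumptions.

```python
def naive_scaled_cardinality(distinct_in_sample, sample_pages, total_pages):
    """Toy model: scale the distinct count seen in the sample with index size."""
    return distinct_in_sample * total_pages // sample_pages


TRUE_DISTINCT_SHOPS = 100  # the real number of shops does not grow with the table

for total_pages in (20, 2_000, 20_000):
    estimate = naive_scaled_cardinality(TRUE_DISTINCT_SHOPS, 20, total_pages)
    print(f"{total_pages:>6} pages: estimated {estimate:>6}, real {TRUE_DISTINCT_SHOPS}")
```

In this toy model the estimate is exact only while the whole index fits into the sample, and it drifts further from the real value as the table grows.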
ANALYZE TABLE will not help because it uses the same algorithm. Increasing the number of "stats" sample pages would help, but it has its own downside: the more pages you have to examine, the slower ANALYZE TABLE runs. While this command is not blocking, it still creates side effects as described in this blog post. And the longer it runs, the less control you have.
Another issue with InnoDB statistics is that even if they are persistent and STATS_AUTO_RECALC is set to 0, InnoDB still adds values for secondary indexes as shown in lp:1538765. Eventually, after you insert millions of rows, your statistics get corrupted. ANALYZE TABLE can fix it only if you specify a very large number of "stats" sample pages.
Can We Do Anything About It?
InnoDB stores statistics in the "mysql" database in the tables innodb_table_stats and innodb_index_stats. Since they are regular MySQL tables, privileged users can access them. We can update them and modify statistics as we like. And these statistics are used by the Optimizer!
I created a small example showing how to do this trick. I used Percona Server for MySQL version 5.7.19, but the trick will work on any supported MySQL and Percona Server for MySQL version.
First, let's create test tables. The first table holds a few shop profiles, each with a shop ID and a name:
create table shops(
shop_id int not null auto_increment primary key,
name varchar(32)
) engine=innodb;
The second table refers to the "shops" table:
create table goods(
id int not null auto_increment primary key,
shop_id int not null,
name varchar(32),
create_date datetime DEFAULT NULL,
key (shop_id, create_date)
) engine=innodb;
Let's check how many unique shops we have:
mysql> select count(distinct shop_id) from shops;
+-------------------------+
| count(distinct shop_id) |
+-------------------------+
| 100 |
+-------------------------+
1 row in set (0.02 sec)
With 100 distinct shops and a key on (shop_id, create_date), we expect cardinality in table goods to be not much different than this query result:
mysql> select count(distinct id) as `Cardinality for PRIMARY`,
-> count(distinct shop_id) as `Cardinality for shop_id column in index shop_id`,
-> count(distinct shop_id, create_date) as `Cardinality for create_date column in index shop_id`
-> from goods\G
*************************** 1. row ***************************
Cardinality for PRIMARY: 8000000
Cardinality for shop_id column in index shop_id: 100
Cardinality for create_date column in index shop_id: 169861
1 row in set (2 min 8.74 sec)
However, SHOW INDEX returns dramatically different values for the shop_id column:
mysql> show index from goods;
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| goods | 0 | PRIMARY | 1 | id | A | 7289724 | NULL | NULL | | BTREE | | |
| goods | 1 | shop_id | 1 | shop_id | A | 13587 | NULL | NULL | | BTREE | | |
| goods | 1 | shop_id | 2 | create_date | A | 178787 | NULL | NULL | YES | BTREE | | |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
3 rows in set (0.09 sec)
ANALYZE TABLE does not help:
mysql> analyze table goods;
+------------+---------+----------+----------+
| Table | Op | Msg_type | Msg_text |
+------------+---------+----------+----------+
| test.goods | analyze | status | OK |
+------------+---------+----------+----------+
1 row in set (0.88 sec)
mysql> show index from goods;
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| goods | 0 | PRIMARY | 1 | id | A | 7765796 | NULL | NULL | | BTREE | | |
| goods | 1 | shop_id | 1 | shop_id | A | 14523 | NULL | NULL | | BTREE | | |
| goods | 1 | shop_id | 2 | create_date | A | 168168 | NULL | NULL | YES | BTREE | | |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
3 rows in set (0.00 sec)
As a result, if we join the two tables, the Optimizer chooses the wrong JOIN order and query execution plan:
mysql> explain select goods.* from goods join shops using(shop_id) where create_date BETWEEN CONVERT_TZ('2015-11-01 00:00:00', 'MET','GMT') AND CONVERT_TZ('2015-11-07 23:59:59', 'MET','GMT') and goods.shop_id in(4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105,106,107,108,109,110,111,112,113,114,115,116,117,401,402,403,404,405,406,407,408,409,410,411,412,413,414,415,417,418,419,420,421,422,423,424,425,426,427,428,429,430,431,432,433,434,435,436,437,438,439,440,441,442,443,444,445,446,447,448,449,450,451,452,453,454,455,456,457,458,459,460,461,462,463,464,465,466,467,468,470,471,472,473,474,475,476,477,478,479,480,481,482,483,484,485,486);
+----+-------------+-------+------------+-------+---------------+---------+---------+--------------------+------+----------+--------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+-------+---------------+---------+---------+--------------------+------+----------+--------------------------+
| 1 | SIMPLE | shops | NULL | index | PRIMARY | PRIMARY | 4 | NULL | 100 | 100.00 | Using where; Using index |
| 1 | SIMPLE | goods | NULL | ref | shop_id | shop_id | 4 | test.shops.shop_id | 534 | 11.11 | Using index condition |
+----+-------------+-------+------------+-------+---------------+---------+---------+--------------------+------+----------+--------------------------+
2 rows in set, 1 warning (0.13 sec)
mysql> P md5sum
PAGER set to 'md5sum'
mysql> select goods.* from goods join shops using(shop_id) where create_date BETWEEN CONVERT_TZ('2015-11-01 00:00:00', 'MET','GMT') AND CONVERT_TZ('2015-11-07 23:59:59', 'MET','GMT') and goods.shop_id in(4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105,106,107,108,109,110,111,112,113,114,115,116,117,401,402,403,404,405,406,407,408,409,410,411,412,413,414,415,417,418,419,420,421,422,423,424,425,426,427,428,429,430,431,432,433,434,435,436,437,438,439,440,441,442,443,444,445,446,447,448,449,450,451,452,453,454,455,456,457,458,459,460,461,462,463,464,465,466,467,468,470,471,472,473,474,475,476,477,478,479,480,481,482,483,484,485,486);
4a94dabc4bfbfb7dd225bcb50278055b -
31896 rows in set (43.32 sec)
Compare this to the STRAIGHT_JOIN order:
mysql> explain select goods.* from goods straight_join shops on(goods.shop_id = shops.shop_id) where create_date BETWEEN CONVERT_TZ('2015-11-01 00:00:00', 'MET','GMT') AND CONVERT_TZ('2015-11-07 23:59:59', 'MET','GMT') and goods.shop_id in(4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105,106,107,108,109,110,111,112,113,114,115,116,117,401,402,403,404,405,406,407,408,409,410,411,412,413,414,415,417,418,419,420,421,422,423,424,425,426,427,428,429,430,431,432,433,434,435,436,437,438,439,440,441,442,443,444,445,446,447,448,449,450,451,452,453,454,455,456,457,458,459,460,461,462,463,464,465,466,467,468,470,471,472,473,474,475,476,477,478,479,480,481,482,483,484,485,486);
+----+-------------+-------+------------+--------+---------------+---------+---------+--------------------+-------+----------+-----------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+--------+---------------+---------+---------+--------------------+-------+----------+-----------------------+
| 1 | SIMPLE | goods | NULL | range | shop_id | shop_id | 10 | NULL | 31997 | 100.00 | Using index condition |
| 1 | SIMPLE | shops | NULL | eq_ref | PRIMARY | PRIMARY | 4 | test.goods.shop_id | 1 | 100.00 | Using index |
+----+-------------+-------+------------+--------+---------------+---------+---------+--------------------+-------+----------+-----------------------+
2 rows in set, 1 warning (0.14 sec)
mysql> P md5sum
PAGER set to 'md5sum'
mysql> select goods.* from goods straight_join shops on(goods.shop_id = shops.shop_id) where create_date BETWEEN CONVERT_TZ('2015-11-01 00:00:00', 'MET','GMT') AND CONVERT_TZ('2015-11-07 23:59:59', 'MET','GMT') and goods.shop_id in(4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105,106,107,108,109,110,111,112,113,114,115,116,117,401,402,403,404,405,406,407,408,409,410,411,412,413,414,415,417,418,419,420,421,422,423,424,425,426,427,428,429,430,431,432,433,434,435,436,437,438,439,440,441,442,443,444,445,446,447,448,449,450,451,452,453,454,455,456,457,458,459,460,461,462,463,464,465,466,467,468,470,471,472,473,474,475,476,477,478,479,480,481,482,483,484,485,486);
4a94dabc4bfbfb7dd225bcb50278055b -
31896 rows in set (7.94 sec)
The query with the wrong plan is more than five times slower (43.32 versus 7.94 seconds), and this is a small table of 8M rows! For a big table with many columns, the difference would be even larger.
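One way to see where the bad plan comes from is the rows column for goods in the first EXPLAIN. The value 534 is consistent with dividing the estimated row count by the index cardinality that SHOW INDEX reported after ANALYZE TABLE. The sketch below assumes that simple rows-per-key model, which is a simplification of what the Optimizer really does; the helper name is hypothetical.

```python
def rows_per_key(table_rows, cardinality):
    """Average number of rows expected for a single value of the indexed column."""
    return table_rows // cardinality


# Values reported by SHOW INDEX after ANALYZE TABLE
print(rows_per_key(7_765_796, 14_523))  # 534

# Values measured with COUNT(DISTINCT ...)
print(rows_per_key(8_000_000, 100))     # 80000
```

Under this model, a lookup in goods by shop_id looks roughly 150 times cheaper than it really is, which may explain why starting from shops appears attractive to the Optimizer.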
Is STRAIGHT_JOIN the only solution for this case?
No! It's also not a great solution because if the query is complicated and involves more than two tables, it may be affected by bug fixes and improvements in the Optimizer code. Then the forced join order might no longer be optimal for new versions and updates. Therefore, you'll need to test such queries at each upgrade, including minor ones.
So why does ANALYZE TABLE not work? Because the default number of pages it uses to calculate statistics is too small for a table this large with so few distinct values. You can increase the table option STATS_SAMPLE_PAGES until you find a proper one. The drawback is that the greater you set STATS_SAMPLE_PAGES, the longer it takes for ANALYZE TABLE to finish. Also, if you update a large portion of the table, you are often affected by lp:1538765. At some point, the statistics will again be inaccurate.
Now Let's Try Our Manual Statistics Update Trick
InnoDB stores its persistent statistics in the tables mysql.innodb_table_stats and mysql.innodb_index_stats:
mysql> alter table goods stats_persistent=1, stats_auto_recalc=0;
Query OK, 0 rows affected (0.11 sec)
Records: 0 Duplicates: 0 Warnings: 0
mysql> select * from mysql.innodb_table_stats where table_name='goods';
+---------------+------------+---------------------+---------+----------------------+--------------------------+
| database_name | table_name | last_update | n_rows | clustered_index_size | sum_of_other_index_sizes |
+---------------+------------+---------------------+---------+----------------------+--------------------------+
| test | goods | 2017-09-05 00:21:12 | 7765796 | 34624 | 17600 |
+---------------+------------+---------------------+---------+----------------------+--------------------------+
1 row in set (0.00 sec)
mysql> select * from mysql.innodb_index_stats where table_name='goods';
+---------------+------------+------------+---------------------+--------------+------------+-------------+-----------------------------------+
| database_name | table_name | index_name | last_update | stat_name | stat_value | sample_size | stat_description |
+---------------+------------+------------+---------------------+--------------+------------+-------------+-----------------------------------+
| test | goods | PRIMARY | 2017-09-05 00:21:12 | n_diff_pfx01 | 7765796 | 20 | id |
| test | goods | PRIMARY | 2017-09-05 00:21:12 | n_leaf_pages | 34484 | NULL | Number of leaf pages in the index |
| test | goods | PRIMARY | 2017-09-05 00:21:12 | size | 34624 | NULL | Number of pages in the index |
| test | goods | shop_id | 2017-09-05 00:21:12 | n_diff_pfx01 | 14523 | 20 | shop_id |
| test | goods | shop_id | 2017-09-05 00:21:12 | n_diff_pfx02 | 168168 | 20 | shop_id,create_date |
| test | goods | shop_id | 2017-09-05 00:21:12 | n_diff_pfx03 | 8045310 | 20 | shop_id,create_date,id |
| test | goods | shop_id | 2017-09-05 00:21:12 | n_leaf_pages | 15288 | NULL | Number of leaf pages in the index |
| test | goods | shop_id | 2017-09-05 00:21:12 | size | 17600 | NULL | Number of pages in the index |
+---------------+------------+------------+---------------------+--------------+------------+-------------+-----------------------------------+
8 rows in set (0.00 sec)
And we can update these tables directly:
mysql> update mysql.innodb_table_stats set n_rows=8000000 where table_name='goods';
Query OK, 1 row affected (0.18 sec)
Rows matched: 1 Changed: 1 Warnings: 0
mysql> update mysql.innodb_index_stats set stat_value=8000000 where stat_description in('id', 'shop_id,create_date,id') and table_name='goods';
Query OK, 2 rows affected (0.08 sec)
Rows matched: 2 Changed: 2 Warnings: 0
mysql> update mysql.innodb_index_stats set stat_value=100 where stat_description in('shop_id') and table_name='goods';
Query OK, 1 row affected (0.09 sec)
Rows matched: 1 Changed: 1 Warnings: 0
mysql> update mysql.innodb_index_stats set stat_value=169861 where stat_description in('shop_id,create_date') and table_name='goods';
Query OK, 1 row affected (0.08 sec)
Rows matched: 1 Changed: 1 Warnings: 0
I took index values from earlier, as calculated by this query:
select count(distinct id) as `Cardinality for PRIMARY`, count(distinct shop_id) as `Cardinality for shop_id column in index shop_id`, count(distinct shop_id, create_date) as `Cardinality for create_date column in index shop_id` from goods;
mysql> select * from mysql.innodb_table_stats where table_name='goods';
+---------------+------------+---------------------+---------+----------------------+--------------------------+
| database_name | table_name | last_update | n_rows | clustered_index_size | sum_of_other_index_sizes |
+---------------+------------+---------------------+---------+----------------------+--------------------------+
| test | goods | 2017-09-05 00:47:45 | 8000000 | 34624 | 17600 |
+---------------+------------+---------------------+---------+----------------------+--------------------------+
1 row in set (0.00 sec)
mysql> select * from mysql.innodb_index_stats where table_name='goods';
+---------------+------------+------------+---------------------+--------------+------------+-------------+-----------------------------------+
| database_name | table_name | index_name | last_update | stat_name | stat_value | sample_size | stat_description |
+---------------+------------+------------+---------------------+--------------+------------+-------------+-----------------------------------+
| test | goods | PRIMARY | 2017-09-05 00:48:32 | n_diff_pfx01 | 8000000 | 20 | id |
| test | goods | PRIMARY | 2017-09-05 00:21:12 | n_leaf_pages | 34484 | NULL | Number of leaf pages in the index |
| test | goods | PRIMARY | 2017-09-05 00:21:12 | size | 34624 | NULL | Number of pages in the index |
| test | goods | shop_id | 2017-09-05 00:49:13 | n_diff_pfx01 | 100 | 20 | shop_id |
| test | goods | shop_id | 2017-09-05 00:49:26 | n_diff_pfx02 | 169861 | 20 | shop_id,create_date |
| test | goods | shop_id | 2017-09-05 00:48:32 | n_diff_pfx03 | 8000000 | 20 | shop_id,create_date,id |
| test | goods | shop_id | 2017-09-05 00:21:12 | n_leaf_pages | 15288 | NULL | Number of leaf pages in the index |
| test | goods | shop_id | 2017-09-05 00:21:12 | size | 17600 | NULL | Number of pages in the index |
+---------------+------------+------------+---------------------+--------------+------------+-------------+-----------------------------------+
8 rows in set (0.00 sec)
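The UPDATE statements used above can also be generated from the measured values. The following is a minimal sketch, not a tool that ships with MySQL or Percona Server: the function name and its arguments are hypothetical. Unlike the statements in the example, it also filters on database_name (a column visible in the output above), so that a table with the same name in another schema is left alone. Names are interpolated directly into the SQL text, so it should only be fed trusted identifiers.

```python
def build_stats_updates(database, table, n_rows, prefix_cardinality):
    """Return UPDATE statements for the persistent statistics tables.

    prefix_cardinality maps a stat_description value (for example
    'shop_id,create_date') to the measured number of distinct values.
    """
    statements = [
        f"update mysql.innodb_table_stats set n_rows={int(n_rows)} "
        f"where database_name='{database}' and table_name='{table}';"
    ]
    for description, distinct in prefix_cardinality.items():
        statements.append(
            f"update mysql.innodb_index_stats set stat_value={int(distinct)} "
            f"where database_name='{database}' and table_name='{table}' "
            f"and stat_description='{description}';"
        )
    return statements


# Values taken from the COUNT(DISTINCT ...) query earlier in the post
measured = {
    "id": 8_000_000,
    "shop_id": 100,
    "shop_id,create_date": 169_861,
    "shop_id,create_date,id": 8_000_000,
}
for sql in build_stats_updates("test", "goods", 8_000_000, measured):
    print(sql)
```

The script only prints the statements, so they can be reviewed before being run by a privileged user.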
Now the statistics tables are up to date, but the Optimizer does not use the new values yet:
mysql> explain select goods.* from goods join shops using(shop_id) where create_date BETWEEN CONVERT_TZ('2015-11-01 00:00:00', 'MET','GMT') AND CONVERT_TZ('2015-11-07 23:59:59', 'MET','GMT') and goods.shop_id in(4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105,106,107,108,109,110,111,112,113,114,115,116,117,401,402,403,404,405,406,407,408,409,410,411,412,413,414,415,417,418,419,420,421,422,423,424,425,426,427,428,429,430,431,432,433,434,435,436,437,438,439,440,441,442,443,444,445,446,447,448,449,450,451,452,453,454,455,456,457,458,459,460,461,462,463,464,465,466,467,468,470,471,472,473,474,475,476,477,478,479,480,481,482,483,484,485,486);
+----+-------------+-------+------------+-------+---------------+---------+---------+--------------------+------+----------+--------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+-------+---------------+---------+---------+--------------------+------+----------+--------------------------+
| 1 | SIMPLE | shops | NULL | index | PRIMARY | PRIMARY | 4 | NULL | 100 | 100.00 | Using where; Using index |
| 1 | SIMPLE | goods | NULL | ref | shop_id | shop_id | 4 | test.shops.shop_id | 534 | 11.11 | Using index condition |
+----+-------------+-------+------------+-------+---------------+---------+---------+--------------------+------+----------+--------------------------+
2 rows in set, 1 warning (0.04 sec)
To finalize the changes, we need to run FLUSH TABLE goods:
mysql> FLUSH TABLE goods;
Query OK, 0 rows affected (0.00 sec)
mysql> explain select goods.* from goods join shops using(shop_id) where create_date BETWEEN CONVERT_TZ('2015-11-01 00:00:00', 'MET','GMT') AND CONVERT_TZ('2015-11-07 23:59:59', 'MET','GMT') and goods.shop_id in(4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105,106,107,108,109,110,111,112,113,114,115,116,117,401,402,403,404,405,406,407,408,409,410,411,412,413,414,415,417,418,419,420,421,422,423,424,425,426,427,428,429,430,431,432,433,434,435,436,437,438,439,440,441,442,443,444,445,446,447,448,449,450,451,452,453,454,455,456,457,458,459,460,461,462,463,464,465,466,467,468,470,471,472,473,474,475,476,477,478,479,480,481,482,483,484,485,486);
+----+-------------+-------+------------+--------+---------------+---------+---------+--------------------+-------+----------+-----------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+--------+---------------+---------+---------+--------------------+-------+----------+-----------------------+
| 1 | SIMPLE | goods | NULL | range | shop_id | shop_id | 10 | NULL | 31997 | 100.00 | Using index condition |
| 1 | SIMPLE | shops | NULL | eq_ref | PRIMARY | PRIMARY | 4 | test.goods.shop_id | 1 | 100.00 | Using index |
+----+-------------+-------+------------+--------+---------------+---------+---------+--------------------+-------+----------+-----------------------+
2 rows in set, 1 warning (0.28 sec)
mysql> P md5sum
PAGER set to 'md5sum'
mysql> select goods.* from goods join shops using(shop_id) where create_date BETWEEN CONVERT_TZ('2015-11-01 00:00:00', 'MET','GMT') AND CONVERT_TZ('2015-11-07 23:59:59', 'MET','GMT') and goods.shop_id in(4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105,106,107,108,109,110,111,112,113,114,115,116,117,401,402,403,404,405,406,407,408,409,410,411,412,413,414,415,417,418,419,420,421,422,423,424,425,426,427,428,429,430,431,432,433,434,435,436,437,438,439,440,441,442,443,444,445,446,447,448,449,450,451,452,453,454,455,456,457,458,459,460,461,462,463,464,465,466,467,468,470,471,472,473,474,475,476,477,478,479,480,481,482,483,484,485,486);
4a94dabc4bfbfb7dd225bcb50278055b -
31896 rows in set (7.79 sec)
Now everything is good.
But FLUSH TABLE is a blocking operation, right? Won't it block queries and create a worse scenario than described for ANALYZE TABLE in this post?
At first glance, this is true. But we can use the same trick Percona Toolkit uses: set lock_wait_timeout to 1 and call FLUSH in a loop. To demonstrate how it works, I use a scenario similar to the one described in the ANALYZE TABLE blog post.
First, let's reset the statistics to ensure our FLUSH works as expected:
mysql> analyze table goods;
+------------+---------+----------+----------+
| Table | Op | Msg_type | Msg_text |
+------------+---------+----------+----------+
| test.goods | analyze | status | OK |
+------------+---------+----------+----------+
1 row in set (0.38 sec)
mysql> show indexes from goods;
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| goods | 0 | PRIMARY | 1 | id | A | 7765796 | NULL | NULL | | BTREE | | |
| goods | 1 | shop_id | 1 | shop_id | A | 14523 | NULL | NULL | | BTREE | | |
| goods | 1 | shop_id | 2 | create_date | A | 168168 | NULL | NULL | YES | BTREE | | |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
3 rows in set (0.00 sec)
And then update the mysql.innodb_*_stats tables manually. Then check that the Optimizer still sees outdated statistics:
mysql> update mysql.innodb_table_stats set n_rows=8000000 where table_name='goods';
Query OK, 1 row affected (0.09 sec)
Rows matched: 1 Changed: 1 Warnings: 0
mysql> update mysql.innodb_index_stats set stat_value=8000000 where stat_description in('id', 'shop_id,create_date,id') and table_name='goods';
Query OK, 2 rows affected (0.09 sec)
Rows matched: 2 Changed: 2 Warnings: 0
mysql> update mysql.innodb_index_stats set stat_value=100 where stat_description in('shop_id') and table_name='goods';
Query OK, 1 row affected (0.11 sec)
Rows matched: 1 Changed: 1 Warnings: 0
mysql> update mysql.innodb_index_stats set stat_value=169861 where stat_description in('shop_id,create_date') and table_name='goods';
Query OK, 1 row affected (0.10 sec)
Rows matched: 1 Changed: 1 Warnings: 0
mysql> show indexes from goods;
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| goods | 0 | PRIMARY | 1 | id | A | 7765796 | NULL | NULL | | BTREE | | |
| goods | 1 | shop_id | 1 | shop_id | A | 14523 | NULL | NULL | | BTREE | | |
| goods | 1 | shop_id | 2 | create_date | A | 168168 | NULL | NULL | YES | BTREE | | |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
3 rows in set (0.00 sec)
Now let's start a long-running query in one session that blocks our FLUSH TABLE command:
mysql> select sleep(1) from goods limit 1000, 300;
And let's run FLUSH TABLE in a loop:
sveta@Thinkie:~/build/ps-5.7/mysql-test$ until (`mysqlmtr -P13001 -e "set lock_wait_timeout=1; flush table goods;" test`); do sleep 1; done
ERROR 1205 (HY000) at line 1: Lock wait timeout exceeded; try restarting transaction
ERROR 1205 (HY000) at line 1: Lock wait timeout exceeded; try restarting transaction
ERROR 1205 (HY000) at line 1: Lock wait timeout exceeded; try restarting transaction
...
Now let's ensure we can access the table:
mysql> select * from goods order by id limit 10;
^C
We cannot! We cannot even connect to the database where the table is stored:
sveta@Thinkie:~/build/ps-5.7/mysql-test$ mysqlmtr -P13001 test
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A
^C
The reason for this is that while the FLUSH TABLE command was killed due to the metadata lock wait timeout, it also requested a table lock for flushing and blocked other incoming queries.
But we can enclose FLUSH TABLE in LOCK TABLE ... WRITE; ... UNLOCK TABLES; statements. In this case, the LOCK TABLE command gets blocked until all queries release their metadata locks on the table. Then it exclusively locks the table, FLUSH TABLE runs, and then the script immediately unlocks the table. Since closing the session causes an implicit unlock, I used a PHP one-liner to have everything in a single session:
$ php -r '
> $link = new mysqli("127.0.0.1", "root", "", "test", 13001);
> $link->query("set lock_wait_timeout=1");
> while(!$link->query("lock table goods write")) {sleep(1);}
> $link->query("flush table goods");
> $link->query("unlock tables");'
While the script is waiting, we can confirm that a parallel session can access the table:
mysql> select * from goods order by id limit 10;
+----+---------+----------------------------------+---------------------+
| id | shop_id | name | create_date |
+----+---------+----------------------------------+---------------------+
| 1 | 58 | 5K0z2sHTgjWKKdryTaniQdZmjGjA9wls | 2015-09-19 00:00:00 |
| 2 | 17 | xNll02kgUTWAFURj6j5lL1zXAubG0THG | 2013-10-19 00:00:00 |
| 3 | 30 | clHX7uQopKmoTtEFH5LYBgQncsxRtTIB | 2017-08-01 00:00:00 |
| 4 | 93 | bAzoQTN98AmFjPOZs7PGfbiGfaf9Ye4b | 2013-02-24 00:00:00 |
| 5 | 20 | rQuTO5GHjP60kDbN6WoPpE2S8TtMbrVL | 2017-08-05 00:00:00 |
| 6 | 37 | WxqxA5tBHxikaKbuvbIF84H9QuaCnqQ3 | 2013-10-18 00:00:00 |
| 7 | 13 | DoYnFpQZSVV8UswBsWklgGBUc8zW9mVW | 2017-02-06 00:00:00 |
| 8 | 81 | dkNxMQyZNZuTrONEX4gxRLa0DOedatIs | 2015-07-05 00:00:00 |
| 9 | 12 | Z0t2uQ9itexpPf01KUpa7qBWlT5fBmXR | 2014-06-25 00:00:00 |
| 10 | 90 | 6urABBQyaUVVyxljvd11D3kUxbdDRPRV | 2013-10-23 00:00:00 |
+----+---------+----------------------------------+---------------------+
10 rows in set (0.00 sec)
mysql> update goods set name='test' where id=100;
Query OK, 1 row affected (0.08 sec)
Rows matched: 1 Changed: 1 Warnings: 0
After the PHP script finishes its job, statistics are corrected:
mysql> show index from goods;
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| goods | 0 | PRIMARY | 1 | id | A | 8000000 | NULL | NULL | | BTREE | | |
| goods | 1 | shop_id | 1 | shop_id | A | 100 | NULL | NULL | | BTREE | | |
| goods | 1 | shop_id | 2 | create_date | A | 169861 | NULL | NULL | YES | BTREE | | |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
3 rows in set (0.00 sec)
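The LOCK TABLE ... WRITE; FLUSH TABLE; UNLOCK TABLES sequence from the PHP one-liner can be written in any language, as long as all statements go through one session. Below is a minimal Python sketch of the same loop. It makes several assumptions: `execute` is a hypothetical callable that sends one statement over a single open connection and raises an exception on failure, and any exception raised by LOCK TABLE is treated as a lock wait timeout, which a real script should narrow down to error 1205. The demo at the end passes `print` as `execute`, so it only shows the statements and does not talk to a server.

```python
import time


def flush_with_short_lock_wait(execute, table, retry_delay=1.0, max_attempts=None):
    """Run LOCK TABLE ... WRITE, FLUSH TABLE and UNLOCK TABLES in one session.

    Returns the number of LOCK TABLE attempts that were needed.
    """
    execute("set lock_wait_timeout=1")
    attempts = 0
    while True:
        attempts += 1
        try:
            execute(f"lock table {table} write")
            break
        except Exception:
            # Assumption: the failure is a lock wait timeout, so try again.
            if max_attempts is not None and attempts >= max_attempts:
                raise
            time.sleep(retry_delay)
    try:
        execute(f"flush table {table}")
    finally:
        # Release the lock even if FLUSH TABLE fails.
        execute("unlock tables")
    return attempts


# Dry run: print the statements instead of sending them to a server.
flush_with_short_lock_wait(print, "goods")
```

The optional max_attempts argument is an addition to the original loop: it lets the script give up instead of retrying forever when a very long query holds the table.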
Conclusion
We can manually update persistent InnoDB statistics to fix Optimizer plans for our queries, with almost no impact on a live server.