600倍效率提升sql查询优化从几十秒到几十毫秒
背景
运维反馈说线上一个接口越来越慢,大概二三十秒才有返回。
查看接口代码,定位问题出在sql查询效率上。
sql意图是将多个号码最新的数据返回给前端使用。
单表数据量 530万左右,id为主键,phone为普通索引优化过程原sql如下,通过in + 子查询方式查询select * from t_user_track where id in (select max(id) as id from t_user_track where phone in ("xxxxx", "xxxxx") group by phone)
执行时间在30秒左右,
explain查看执行计划
可以看出子查询走上了索引,主查询没有走索引,全表查询,rows=2333216同事写了一段新的sql来查询,如下select * from ( select DISTINCT * from t_user_track where phone in ("xxxxx", "xxxx") order by locate_time desc ) t group by phone;
执行时间在4秒左右
查看执行计划
派生表查询走上了phone的索引,rows=157108,主查询没有走上索引,row=157108改成如下sql,关联表查询select * from t_user_track t1, (select max(id) as id from t_user_track where phone in ("xxxxxx", "xxxxx") group by phone) t2 where t1.id = t2.id;
查询时间为0.04秒
执行计划如下
改成inner join关联表,如下sqlselect * from t_user_track t1 INNER JOIN (select max(id) as id from t_user_track where phone in ("xxxxxx", "xxxxx") group by phone) t2 on t1.id = t2.id
执行时间为0.041秒
执行计划如下
结果
3、4两种关联表查询效率最高原理分析3、4两种连表查询只有一次查询,且通过小表驱动大表,所以查询效率较高第一种in + 子查询的方式,需要两次查询,查询效率较低
留下一个疑问in+子查询的方式,为什么子查询走了索引,主查询却没有走索引mysql官方文档
mysql手册也提到过,具体的原文在mysql文档的这个章节:
I.3. Restrictions on Subqueries
13.2.8. Subquery Syntax
摘抄:
1)关于使用IN的子查询:
Subquery optimization for IN is not as effective as for the = operator or for IN( value_list ) constructs.
A typical case for poor IN subquery performance is when the subquery returns a small number of rows but the outer query returns a large number of rows to be compared to the subquery result.
The problem is that, for a statement that uses an IN subquery, the optimizer rewrites it as a correlated subquery. Consider the following statement that uses an uncorrelated subquery: SELECT ... FROM t1 WHERE t1.a IN (SELECT b FROM t2);
The optimizer rewrites the statement to a correlated subquery: SELECT ... FROM t1 WHERE EXISTS (SELECT 1 FROM t2 WHERE t2.b = t1.a);
If the inner and outer queries return M and N rows, respectively, the execution time becomes on the order of O( M × N ), rather than O( M + N ) as it would be for an uncorrelated subquery.
An implication is that an IN subquery can be much slower than a query written using an IN( value_list ) construct that lists the same values that the subquery would return.
2)关于把子查询转换成join的:
The optimizer is more mature for joins than for subqueries, so in many cases a statement that uses a subquery can be executed more efficiently if you rewrite it as a join.
An exception occurs for the case where an IN subquery can be rewritten as a SELECT DISTINCT join. Example: SELECT col FROM t1 WHERE id_col IN (SELECT id_col2 FROM t2 WHERE condition);
That statement can be rewritten as follows: SELECT DISTINCT col FROM t1, t2 WHERE t1.id_col = t2.id_col AND condition;
But in this case, the join requires an extra DISTINCT operation and is not more efficient than the subquery