MySQL Study Notes

Query Capture and Analysis

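Optimizing MySQL: explain
SQL optimization workflow:
- Observe: let the system run for at least a day and see which slow SQL statements show up in production.
- Enable the slow query log and set a threshold, for example treating anything over 5 seconds as slow SQL, and capture those statements.
- Analyze the slow SQL with explain.
- Use show profile to inspect the execution details and lifecycle of the SQL inside the MySQL server.
- Have the operations manager or DBA tune the parameters of the database server.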

Small table drives the large table

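For example, of the two cases below the first is better: the outer loop, which stands for the driving table, runs 5 times instead of 1000.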

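for (int i = 5; ...)          // outer loop over the small data set
{
    for (int j = 1000; ...)   // inner loop over the large data set
    {
    }
}

for (int i = 1000; ...)       // outer loop over the large data set
{
    for (int j = 5; ...)      // inner loop over the small data set
    {
    }
}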

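When the data set of table B is smaller than that of table A, in is better than exists. When the data set of table A is smaller than that of table B, exists is better than in.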

select * from A where id in (select id from B);
select * from A where exists (select 1 from B where B.id=A.id);

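exists only returns true or false, so whatever follows select in the subquery makes no difference; it is ignored. During execution the optimizer may rewrite the query rather than compare row by row as we imagine, so if efficiency is a concern, measure it to find out whether there really is a problem. An exists subquery can often be replaced by a conditional expression, another subquery, or a join; which one is best has to be decided case by case.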
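
The "small drives large" idea can be sketched with a toy nested-loop join in Python. This is only a simulation, under the assumption that the driven table is probed through an index (modeled here as a set); the function and variable names are illustrative and are not MySQL internals.

```python
def nested_loop_join(driving_ids, driven_ids):
    """Join two id lists; return (matching ids, number of index probes)."""
    driven_index = set(driven_ids)      # stands in for an index on the driven table
    probes = 0
    matches = []
    for row_id in driving_ids:          # one outer iteration per driving row
        probes += 1                     # one index lookup into the driven table
        if row_id in driven_index:
            matches.append(row_id)
    return sorted(matches), probes


small = list(range(1, 6))               # 5 rows
large = list(range(1, 1001))            # 1000 rows

rows_a, probes_a = nested_loop_join(small, large)   # small table drives
rows_b, probes_b = nested_loop_join(large, small)   # large table drives
print(probes_a, probes_b)               # 5 probes versus 1000 probes
```

Both calls return the same rows; only the number of probes differs, which is the cost the rule tries to keep small.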

Optimizing order by

For order by clauses, try to sort by means of an index and avoid sorting with filesort.

mysql> create table tblA(
    -> age int,
    -> birth timestamp not null
    -> );
Query OK, 0 rows affected (0.00 sec)

mysql> insert into tblA(age,birth) values (22,now());
Query OK, 1 row affected (0.01 sec)

mysql> insert into tblA(age,birth) values (23,now());
Query OK, 1 row affected (0.00 sec)

mysql> insert into tblA(age,birth) values (23,now());
Query OK, 1 row affected (0.00 sec)

mysql> insert into tblA(age,birth) values (24,now());
Query OK, 1 row affected (0.00 sec)

mysql> create index idx_A_ageBirth on tblA(age,birth);
Query OK, 0 rows affected (0.01 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> explain select * from tblA where age > 20 order by age;
+----+-------------+-------+------------+-------+----------------+----------------+---------+------+------+----------+--------------------------+
| id | select_type | table | partitions | type  | possible_keys  | key            | key_len | ref  | rows | filtered | Extra                    |
+----+-------------+-------+------------+-------+----------------+----------------+---------+------+------+----------+--------------------------+
|  1 | SIMPLE      | tblA  | NULL       | index | idx_A_ageBirth | idx_A_ageBirth | 9       | NULL |    4 |   100.00 | Using where; Using index |
+----+-------------+-------+------------+-------+----------------+----------------+---------+------+------+----------+--------------------------+
1 row in set, 1 warning (0.00 sec)

mysql> explain select * from tblA where age > 20 order by age,birth;
+----+-------------+-------+------------+-------+----------------+----------------+---------+------+------+----------+--------------------------+
| id | select_type | table | partitions | type  | possible_keys  | key            | key_len | ref  | rows | filtered | Extra                    |
+----+-------------+-------+------------+-------+----------------+----------------+---------+------+------+----------+--------------------------+
|  1 | SIMPLE      | tblA  | NULL       | index | idx_A_ageBirth | idx_A_ageBirth | 9       | NULL |    4 |   100.00 | Using where; Using index |
+----+-------------+-------+------------+-------+----------------+----------------+---------+------+------+----------+--------------------------+
1 row in set, 1 warning (0.00 sec)

mysql> explain select * from tblA where age > 20 order by birth;
+----+-------------+-------+------------+-------+----------------+----------------+---------+------+------+----------+------------------------------------------+
| id | select_type | table | partitions | type  | possible_keys  | key            | key_len | ref  | rows | filtered | Extra                                    |
+----+-------------+-------+------------+-------+----------------+----------------+---------+------+------+----------+------------------------------------------+
|  1 | SIMPLE      | tblA  | NULL       | index | idx_A_ageBirth | idx_A_ageBirth | 9       | NULL |    4 |   100.00 | Using where; Using index; Using filesort |
+----+-------------+-------+------------+-------+----------------+----------------+---------+------+------+----------+------------------------------------------+
1 row in set, 1 warning (0.00 sec)

mysql> explain select * from tblA where age > 20 order by birth,age;
+----+-------------+-------+------------+-------+----------------+----------------+---------+------+------+----------+------------------------------------------+
| id | select_type | table | partitions | type  | possible_keys  | key            | key_len | ref  | rows | filtered | Extra                                    |
+----+-------------+-------+------------+-------+----------------+----------------+---------+------+------+----------+------------------------------------------+
|  1 | SIMPLE      | tblA  | NULL       | index | idx_A_ageBirth | idx_A_ageBirth | 9       | NULL |    4 |   100.00 | Using where; Using index; Using filesort |
+----+-------------+-------+------------+-------+----------------+----------------+---------+------+------+----------+------------------------------------------+
1 row in set, 1 warning (0.00 sec)

mysql> explain select * from tblA order by birth;
+----+-------------+-------+------------+-------+---------------+----------------+---------+------+------+----------+-----------------------------+
| id | select_type | table | partitions | type  | possible_keys | key            | key_len | ref  | rows | filtered | Extra                       |
+----+-------------+-------+------------+-------+---------------+----------------+---------+------+------+----------+-----------------------------+
|  1 | SIMPLE      | tblA  | NULL       | index | NULL          | idx_A_ageBirth | 9       | NULL |    4 |   100.00 | Using index; Using filesort |
+----+-------------+-------+------------+-------+---------------+----------------+---------+------+------+----------+-----------------------------+
1 row in set, 1 warning (0.00 sec)

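MySQL supports two ways of sorting, using index and using filesort, and the former is generally the more efficient one. order by sorts through an index in two cases: the order by columns form a leftmost prefix of the index, or the columns of the where clause combined with the columns of the order by clause form a leftmost prefix of the index.
If the sort cannot be done on index columns, MySQL falls back to filesort, which has two algorithms: two-pass sort and single-pass sort.
Two-pass sort: before MySQL 4.1 the two-pass sort was used. It reads the row pointers and the order by columns, sorts them, then scans the sorted list and reads the corresponding rows from the table again. In other words, it fetches the sort columns from disk, sorts them in the buffer, and then fetches the remaining columns from disk.
From MySQL 4.1 onward a second, improved algorithm is available: the single-pass sort.
Single-pass sort: it reads all the columns the query needs from disk, sorts them in the buffer by the order by columns, and then scans the sorted list to produce the output. This is faster because it avoids reading the data a second time, and it turns random I/O into sequential I/O, but it uses more space because every full row is kept in memory.
Overall the single-pass sort is better than the two-pass sort, but it has a problem:
In sort_buffer the single-pass method takes more space than the two-pass method because it pulls out all the columns, so the total size of the data may exceed the capacity of sort_buffer. In that case only a sort_buffer-sized chunk can be sorted at a time (creating tmp files and doing a multi-way merge), then the next chunk is fetched, and so on, which leads to many I/O operations. The intent was to save one I/O pass, but the result is far more I/O.
Optimization strategy: increase sort_buffer_size; increase max_length_for_sort_data.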
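
The difference between the two filesort algorithms can be sketched in Python. This is a simplified model, not MySQL's implementation: the table is a dict keyed by a row pointer, the sample rows are made up, and table_reads simply counts how often a row is fetched from the "table".

```python
def two_pass_sort(table, sort_col):
    """Sort (sort key, row pointer) pairs, then fetch each full row again."""
    table_reads = 0
    pairs = []
    for row_id, row in table.items():        # pass 1: read sort column + pointer
        table_reads += 1
        pairs.append((row[sort_col], row_id))
    pairs.sort()
    result = []
    for _, row_id in pairs:                  # pass 2: random reads by pointer
        table_reads += 1
        result.append(table[row_id])
    return result, table_reads


def single_pass_sort(table, sort_col):
    """Read every needed column once and sort the full rows in the buffer."""
    table_reads = 0
    buffer = []
    for row_id, row in table.items():
        table_reads += 1
        buffer.append((row[sort_col], row_id, row))   # whole row held in memory
    buffer.sort(key=lambda item: item[:2])
    return [row for _, _, row in buffer], table_reads


# Hypothetical sample rows.
table = {
    1: {"age": 24, "name": "d"},
    2: {"age": 22, "name": "a"},
    3: {"age": 23, "name": "b"},
}
rows_two, reads_two = two_pass_sort(table, "age")
rows_one, reads_one = single_pass_sort(table, "age")
print(reads_two, reads_one)   # the two-pass variant touches the table twice per row
```

The single-pass variant reads less but keeps whole rows in its buffer, which is the memory pressure described above.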

Using indexes for sorting

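MySQL has two ways of sorting: filesort, or scanning an ordered index.
MySQL can use the same index for both sorting and lookup.
Assume key a_b_c (a,b,c);
order by can use a leftmost prefix of the index
- order by a
- order by a,b
- order by a,b,c
- order by a DESC, b DESC, c DESC
If where fixes the leftmost prefix of the index to constants, order by can use the index
- where a=const order by b,c
- where a=const and b = const order by c
- where a=const and b>const order by b,c
Cases where the index cannot be used for sorting
- order by a ASC, b DESC, c DESC (the sort directions are inconsistent)
- where g=const order by b,c (the leading column a is missing)
- where a=const order by c (column b is skipped)
- where a=const order by a,d (d is not part of the index)
- where a in (…) order by b,c (for sorting, multiple equality conditions also count as a range query)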
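
The rules listed above can be condensed into a small Python checker. This is a deliberately simplified model of the leftmost-prefix rule, written for these notes; it does not reproduce the MySQL optimizer, and the function name and argument shapes are assumptions.

```python
def can_sort_with_index(index_cols, const_cols, order_by):
    """Return True if ORDER BY can be served by the index under the rules above.

    index_cols: index columns in order, e.g. ("a", "b", "c")
    const_cols: columns fixed to a constant by an equality in WHERE
    order_by:   list of (column, direction) pairs
    """
    if len({direction for _, direction in order_by}) > 1:
        return False                      # mixed ASC/DESC
    pos = 0
    for col, _ in order_by:
        # Index columns pinned to a constant may be skipped over.
        while (pos < len(index_cols)
               and index_cols[pos] != col
               and index_cols[pos] in const_cols):
            pos += 1
        if pos == len(index_cols) or index_cols[pos] != col:
            return False                  # gap in the prefix, or column not in index
        pos += 1
    return True


idx = ("a", "b", "c")
print(can_sort_with_index(idx, {"a"}, [("b", "ASC"), ("c", "ASC")]))   # True
print(can_sort_with_index(idx, {"a"}, [("c", "ASC")]))                 # False
print(can_sort_with_index(idx, set(), [("a", "ASC"), ("b", "DESC")]))  # False
```

A range condition such as b>const or a in (…) is not an equality to a single constant, so those columns are simply left out of const_cols.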

Optimizing group by

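The principles are the same as for order by, except that group by essentially sorts first and then groups, following the best leftmost prefix of the index. When index columns cannot be used, increase max_length_for_sort_data and sort_buffer_size. where takes precedence over having: any condition that can be written in where should not be left to having.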

Slow query log

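The slow query log is a log provided by MySQL that records statements whose response time exceeds a threshold. Specifically, any SQL whose run time exceeds the value of long_query_time is written to the slow query log. The default value of long_query_time is 10, meaning statements that run for more than ten seconds.
It lets us see which SQL exceeds the maximum time we are willing to tolerate. For example, if we regard a statement that runs for more than five seconds as slow SQL, we collect the statements over 5 seconds and analyze them thoroughly together with explain.
By default MySQL does not enable the slow query log; the parameter has to be set manually. Unless it is needed for tuning, enabling it is generally not recommended, because the slow query log has some performance impact. The slow query log supports writing its records to a file.
To check whether it is enabled: show variables like '%slow_query_log%'; (it is off by default). To enable it: set global slow_query_log=1; This only affects the running server instance and is lost when MySQL restarts.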

mysql> show variables like '%slow_query_log%';
+---------------------+------------------------------------+
| Variable_name       | Value                              |
+---------------------+------------------------------------+
| slow_query_log      | OFF                                |
| slow_query_log_file | /var/lib/mysql/lancibe-PC-slow.log |
+---------------------+------------------------------------+
2 rows in set (0.00 sec)

mysql> set global slow_query_log=1;
Query OK, 0 rows affected (0.00 sec)

mysql> show variables like '%slow_query_log%';
+---------------------+------------------------------------+
| Variable_name       | Value                              |
+---------------------+------------------------------------+
| slow_query_log      | ON                                 |
| slow_query_log_file | /var/lib/mysql/lancibe-PC-slow.log |
+---------------------+------------------------------------+
2 rows in set (0.00 sec)

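To make the setting permanent, edit the configuration file my.cnf: add or change the parameters slow_query_log and slow_query_log_file under [mysqld], then restart the MySQL server. That is, write the following two lines into my.cnf: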

slow_query_log=1
slow_query_log_file=/var/lib/mysql/lancibe-PC-slow.log

What counts as a slow query is controlled by the parameter long_query_time, whose default value is 10 seconds. Command: show variables like 'long_query_time%';

mysql> show variables like 'long_query_time%';
+-----------------+-----------+
| Variable_name   | Value     |
+-----------------+-----------+
| long_query_time | 10.000000 |
+-----------------+-----------+
1 row in set (0.00 sec)

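The value can be changed with a command or in my.cnf.
A statement whose run time is exactly equal to long_query_time is not recorded. In other words, the MySQL source checks for greater than long_query_time, not greater than or equal. After set global long_query_time = 3; the value still shows as 10 when queried again in the same session, so the change seems to have no effect. You need to reconnect or open a new session to see the new value with show variables like 'long_query_time%'; (alternatively, show global variables like 'long_query_time'; shows the global value directly).
Use select sleep(4) to simulate a statement that runs for more than 3 seconds.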

lancibe@lancibe-PC:~$ sudo cat /var/lib/mysql/lancibe-PC-slow.log
/usr/sbin/mysqld, Version: 5.7.33 (MySQL Community Server (GPL)). started with:
Tcp port: 3306  Unix socket: /var/run/mysqld/mysqld.sock
Time                 Id Command    Argument
# Time: 2021-04-27T08:45:27.450446Z
# User@Host: root[root] @ localhost []  Id:     4
# Query_time: 4.000161  Lock_time: 0.000000 Rows_sent: 1  Rows_examined: 0
use db26;
SET timestamp=1619513127;
select sleep(4);
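
As a rough illustration of what the log entry above contains, the following Python sketch pulls the counters out of the "# Query_time:" line and applies the "strictly greater than long_query_time" rule. The regular expression is an assumption based only on the sample line shown here, not a full parser for every slow log format.

```python
import re

HEADER = re.compile(
    r"# Query_time: (?P<query_time>[\d.]+)\s+Lock_time: (?P<lock_time>[\d.]+)"
    r"\s+Rows_sent: (?P<rows_sent>\d+)\s+Rows_examined: (?P<rows_examined>\d+)"
)


def parse_stats(line):
    """Extract the timing and row counters from a '# Query_time:' line."""
    match = HEADER.match(line)
    if match is None:
        return None                       # not a statistics line
    return {
        "query_time": float(match["query_time"]),
        "lock_time": float(match["lock_time"]),
        "rows_sent": int(match["rows_sent"]),
        "rows_examined": int(match["rows_examined"]),
    }


line = "# Query_time: 4.000161  Lock_time: 0.000000 Rows_sent: 1  Rows_examined: 0"
stats = parse_stats(line)
long_query_time = 3.0
print(stats["query_time"] > long_query_time)   # strictly greater, so True here
```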

Log analysis tool: mysqldumpslow

lancibe@lancibe-PC:~$ mysqldumpslow --help
Usage: mysqldumpslow [ OPTS... ] [ LOGS... ]

Parse and summarize the MySQL slow query log. Options are

  --verbose    verbose
  --debug      debug
  --help       write this text to standard output

  -v           verbose
  -d           debug
  -s ORDER     what to sort by (al, at, ar, c, l, r, t), 'at' is default
                al: average lock time
                ar: average rows sent
                at: average query time
                 c: count
                 l: lock time
                 r: rows sent
                 t: query time  
  -r           reverse the sort order (largest last instead of first)
  -t NUM       just show the top n queries
  -a           don't abstract all numbers to N and strings to 'S'
  -n NUM       abstract numbers with at least n digits within names
  -g PATTERN   grep: only consider stmts that include this string
  -h HOSTNAME  hostname of db server for *-slow.log filename (can be wildcard),
               default is '*', i.e. match all
  -i NAME      name of server instance (if using mysql.server startup script)
  -l           don't subtract lock time from total time

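-s: how to sort the output, one of:
  c: number of times the query ran
  l: lock time
  r: rows returned
  t: query time
  al: average lock time
  ar: average rows returned
  at: average query time
-t: return only the top N entries
-g: followed by a regular expression pattern; matching is case-insensitive.

Examples:

Get the 10 SQL statements that return the most rows:
mysqldumpslow -s r -t 10 /var/lib/mysql/lancibe-PC-slow.log
Get the 10 most frequently executed SQL statements:
mysqldumpslow -s c -t 10 /var/lib/mysql/lancibe-PC-slow.log
Get the top 10 statements, sorted by time, that contain a left join:
mysqldumpslow -s t -t 10 -g "left join" /var/lib/mysql/lancibe-PC-slow.log
It is also advisable to pipe these commands into more (| more); otherwise the output may flood the screen.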
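
The help text says that, unless -a is given, numbers are abstracted to N and strings to 'S' before statements are grouped. A minimal Python sketch of that idea follows; it is not the tool's actual code, the regular expressions are simplifications, and the sample statements are made up.

```python
import re
from collections import Counter


def abstract(sql):
    """Replace string literals with 'S' and standalone numbers with N."""
    sql = re.sub(r"'[^']*'", "'S'", sql)
    sql = re.sub(r"\b\d+\b", "N", sql)
    return sql


def top_by_count(statements, limit):
    """Group abstracted statements and return the most frequent ones."""
    counts = Counter(abstract(sql) for sql in statements)
    return counts.most_common(limit)


# Hypothetical statements, as they might appear in a slow log.
statements = [
    "select * from emp where id = 7",
    "select * from emp where id = 42",
    "select * from emp where ename = 'abc'",
    "select sleep(4)",
]
print(top_by_count(statements, 1))
```

Grouping by the abstracted text is what lets the two id lookups count as one query pattern, comparable to sorting with -s c.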

Bulk data script

mysql> create table dept(
    -> id int unsigned primary key auto_increment,
    -> deptno mediumint unsigned not null default 0,
    -> dname varchar(20) not null default "",
    -> loc varchar(13) not null default "" 
    -> )engine=innodb default charset=GBK;
Query OK, 0 rows affected (0.00 sec)

mysql> create table emp( 
    -> id int unsigned primary key auto_increment, 
    -> empno mediumint unsigned not null default 0, 
    -> ename varchar(20) not null default "", 
    -> job varchar(9) not null default "", 
    -> mgr mediumint unsigned not null default 0, 
    -> hiredate date not null, 
    -> sal decimal(7,2) not null, 
    -> comm decimal(7,2) not null, 
    -> deptno mediumint unsigned not null default 0 
    -> )engine=innodb default charset=GBK;
Query OK, 0 rows affected (0.01 sec)

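Set the parameter log_bin_trust_function_creators
This is needed because creating a function may fail with the error: This function has none of DETERMINISTIC… The cause is that bin-log (binary logging) is enabled; in that case a function must declare one of the required characteristics, or this parameter must be set.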

show variables like 'log_bin_trust_function_creators';
set global log_bin_trust_function_creators=1;

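A parameter set this way is lost again when mysqld restarts. To make it permanent, add log_bin_trust_function_creators=1 under [mysqld] in /etc/my.cnf.
Create the functions that make sure every row is different: one generates a random string, the other a random department number.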

# Generates a random string
mysql> delimiter && /* this changes the statement delimiter */
mysql> create function rand_string(n INT) returns varchar(255)
    -> begin
    -> declare chars_str varchar(100) default 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ';
    -> declare return_str varchar(255) default '';
    -> declare i int default 0;
    -> while i<n do
    -> set return_str = concat(return_str,substring(chars_str,floor(1+rand()*52),1));
    -> set i=i+1;
    -> end while;
    -> return return_str;
    -> end &&
Query OK, 0 rows affected (0.00 sec)

mysql> select now() from dual;
    -> &&
+---------------------+
| now()               |
+---------------------+
| 2021-05-06 10:51:44 |
+---------------------+
1 row in set (0.00 sec)

# Generates a random department number
mysql> delimiter $$
mysql> create function rand_num() 
	-> returns int(5) 
	-> begin 
	-> declare i int default 0; 
	-> set i = floor(100+rand()*10); 
	-> return i; 
	-> end $$
Query OK, 0 rows affected (0.00 sec)
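
To get a feel for what the two functions return, here is a Python mirror of their arithmetic. It is a sketch for checking value ranges only; the Python names simply copy the SQL ones, and random.random() stands in for MySQL's rand().

```python
import math
import random
import string

CHARS = string.ascii_lowercase + string.ascii_uppercase   # same 52 letters as chars_str


def rand_string(n):
    """Build an n-character string by picking random letters."""
    result = ""
    for _ in range(n):
        # Equivalent to floor(1 + rand() * 52): a 1-based position from 1 to 52.
        position = 1 + math.floor(random.random() * 52)
        result += CHARS[position - 1]      # Python indexes from 0, SUBSTRING from 1
    return result


def rand_num():
    """Mirror floor(100 + rand() * 10)."""
    return 100 + math.floor(random.random() * 10)


print(rand_string(6), rand_num())
print(sorted({rand_num() for _ in range(10_000)}))
```

Since rand() is always below 1, the department numbers fall in the range 100 to 109.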

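Create the stored procedures: one that inserts data into the emp table and one that inserts data into the dept table.
The difference between a procedure and a function is that a procedure does not need a return value, while a function does.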

mysql> create procedure insert_emp(in start int(10), in max_num int(10))
    -> begin
    -> declare i int default 0;
    -> set autocommit = 0;
    -> repeat
    -> set i = i + 1;
    -> insert into emp(empno,ename,job,mgr,hiredate,sal,comm,deptno)values((start + i),rand_string(6),'SALESMAN',0001,CURDATE(),2000,400,rand_num());
    -> until i = max_num
    -> end repeat;
    -> commit;
    -> end $$
Query OK, 0 rows affected (0.00 sec)

mysql> create procedure insert_dept(in start int(10), in max_num int(10))
    -> begin 
    -> declare i int default 0;
    -> set autocommit=0;
    -> repeat
    -> set i=i+1;
    -> insert into dept(deptno,dname,loc)values((start+i),rand_string(10),rand_string(8));
    -> until i=max_num
    -> end repeat;
    -> commit;
    -> end $$
Query OK, 0 rows affected (0.00 sec)

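Call the stored procedures. Note that the employee table refers to the department table through deptno, so fill the department table first.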

mysql> delimiter ;
mysql> call insert_dept(100,10);
Query OK, 0 rows affected (0.00 sec)

mysql> select * from dept limit 1000;
+----+--------+------------+----------+
| id | deptno | dname      | loc      |
+----+--------+------------+----------+
|  1 |    101 | cKsIougvMw | XwrrMHcn |
|  2 |    102 | hYufpkdOHU | DfSZtVum |
|  3 |    103 | asMIfAkyPB | pUDieYCT |
|  4 |    104 | JSjqexZFcw | cVvquZOV |
|  5 |    105 | koRrgIUBWg | PECYllvw |
|  6 |    106 | UiiqdvThcQ | ZwlSCjnN |
|  7 |    107 | cXGpEczqex | dVwsCHEY |
|  8 |    108 | fFMRZudhAg | daQcOPFJ |
|  9 |    109 | DNhxPIUxGM | POCsiMmX |
| 10 |    110 | bqzBfaKxhu | CAWhXpHs |
+----+--------+------------+----------+
10 rows in set (0.00 sec)

# Run the stored procedure to add 500,000 rows to the emp table
mysql> call insert_emp(100001,500000);
Query OK, 0 rows affected (29.76 sec)/* took nearly 30 seconds */

Show Profile

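show profile is provided by MySQL to analyze the resource consumption of statements executed in the current session. It can be used for SQL tuning and measurement. By default it is turned off, and it keeps the results of the most recent 15 runs.
Analysis steps:
- Check support: see whether the current MySQL version supports it with show variables like 'profiling';
- Enable it: it is off by default and must be turned on before use with set profiling=on;
- Run the SQL
  - select * from emp group by id%10 limit 150000;
  - select * from emp group by id%20 order by 5;
- View the results: show profiles;
- Diagnose the SQL: show profile cpu,block io for query N; where N is the Query_ID that show profiles listed for the problem statement in the previous step.
- Statuses to watch for in day-to-day development
  - converting HEAP to MyISAM: the query result is too large for memory and is being moved to disk.
  - Creating tmp table: a temporary table is being created.
  - Copying to tmp table on disk: an in-memory temporary table is being copied to disk. Dangerous!
  - locked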
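
The watch list above lends itself to a small filter. The Python sketch below assumes the profile has already been fetched as (status, duration) pairs, shaped like the Status and Duration columns of show profile; the sample rows are hypothetical and the helper name is made up for illustration.

```python
WARNING_STAGES = (
    "converting heap to myisam",
    "creating tmp table",
    "copying to tmp table on disk",
    "locked",
)


def flag_stages(profile_rows):
    """Return the (status, duration) rows whose status is on the watch list."""
    return [
        (status, duration)
        for status, duration in profile_rows
        if status.strip().lower() in WARNING_STAGES
    ]


# Hypothetical rows, not taken from a real run.
profile_rows = [
    ("starting", 0.000050),
    ("Creating tmp table", 0.000030),
    ("Sorting result", 0.000010),
    ("Copying to tmp table on disk", 0.250000),
    ("end", 0.000005),
]
print(flag_stages(profile_rows))
```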

General query log

Use it only in a test environment; never enable it in production.
Enabling it through configuration: set the following in my.cnf:

# Enable the log
general_log=1
# Path of the log file
general_log_file=/path/log_file
# Output format
log_output=FILE

Enabling it with commands:

set global general_log = 1;
set global log_output='TABLE';

After that, the SQL statements you run are recorded in the general_log table of the mysql database, which can be viewed with the following command:

select * from mysql.general_log;