处理大事务难题的业务存储过程优化办法
一、问题呈现
1. 问题表象
业务调用存储过程未能成功,经检查发现存在大事务,单独执行时暴露问题。
greatsql> INSERT INTO
`ywdb1`.`t1`(
`TIMEKEY`,
`zbbh`,
`zcbl`,
`zcblms`,
`zjzh`,
`zjzhms`,
`cbzh`,
`ljzjzh`,
`xmbh`,
`xmmc`,
`sfgj`,
`dd`,
`ddsm`,
`cb`,
`cz`,
`bqzje`,
`bnzje`,
`ljzje`,
`zcjz`,
`zcje`,
`bqzzb`,
`bnjzzb`,
`bfqj`
)
SELECT
`TIMEKEY`,
`zbbh`,
`zcbl`,
`zcblms`,
`zjzh`,
`zjzhms`,
`cbzh`,
`ljzjzh`,
`xmbh`,
`xmmc`,
`sfgj`,
`dd`,
`ddsm`,
`cb`,
`cz`,
`bqzje`,
`bnzje`,
`ljzje`,
`zcjz`,
`zcje`,
`bqzzb`,
`bnjzzb`,
`bfqj`
FROM ywdb2.t2
WHERE
TIMEKEY = concat(substr(_sj, 1, 4), '-', substr(_sj, 5, 2));
ERROR 8532 (HY000): Execute backend SQL on node failed with 1197: Multi-statement transaction required more than 'max_binlog_cache_size' bytes of storage; increase this mysqld variable and try again.
错误日志显示:
2025-02-20T00:36:01.969610+08:00 O [Note] [MY-011953] [InnoDB] Page cleaner took 10448ms to flush 2000 pages
2025-02-20T00:38:26.020404+08:00 O [Note] [MY-011953] [InnoDB] Page cleaner took 8427ms to flush 560 pages
2025-02-20T00:38:39.690478+08:00 3478 [ERROR] [MY-011735] [Rep] Plugin group_replication reported: '[GCS] gcs_packet's payload is too big. Only packets smaller than 2113929216 bytes can be compressed. Payload size is 3823615463.'
2025-02-20T00:38:39.690504+08:00 3478 [ERROR] [MY-011735] [Rep] Plugin group_replication reported: '[GCS] Error preparing the message for sending.'
2025-02-20T00:38:40.092629+08:00 3478 [ERROR] [MY-011614] [Rep] Plugin group_replication reported: 'Error while broadcasting the transaction to the group on session 3478'
2025-02-20T00:38:41.047249+08:00 3478 [ERROR] [MY-011207] [Rep] Run function 'before_commit' in plugin 'group_replication' failed
二、问题剖析
1.错误信息解读
ERROR 8532 (HY000)提示,执行的操作涉及多语句事务,且所需存储超出了max_binlog_cache_size的限制。
日志中([Error]):提及group_replication报告的消息,存在GCS_packet's payload is too big的报错。来自group_replication,强调负载过大,仅小于2113929216字节的包可发送。出现before_commit运行错误,致使无法在group_replication中完成事务。
2.参数配置分析
默认情况下,不建议大事务超过2G。查看参数:
greatsql> SHOW variables WHERE variable_name IN ('group_replication_transaction_size_limit','group_replication_compression_threshold');
+------------------------------------------+------------+
| Variable_name | Value |
+------------------------------------------+------------+
| group_replication_compression_threshold | 1000000 |
| group_replication_transaction_size_limit | 2147483647 |
+------------------------------------------+------------+
2 rows in set (0.00 sec)
- group_replication_transaction_size_limit :
- 值:2147483647
- 含义:该参数定义可复制事务的最大大小,单位字节。2147483647为2G。
- group_replication_compression_threshold :
- 值:1000000
- 含义:此参数表示组复制时,事务大小超该阈值(1,000,000字节即约1MB)会考虑压缩。即仅事务大小超1MB时,数据会压缩以减网络传输负担。
查看max_binlog_cache_size的值为4G,该参数限制binlog缓存中存储的最大字节数。
greatsql> SHOW variables LIKE '%max_binlog%';
+----------------------------+------------+
| Variable_name | Value |
+----------------------------+------------+
| max_binlog_cache_size | 4294967296 |
| max_binlog_size | 1073741824 |
| max_binlog_stmt_cache_size | 268435456 |
+----------------------------+------------+
3 rows in set (0.02 sec)
3.存储过程信息查看
查看此存储过程相关信息(INFORMATION_SCHEMA_ROUTINES):
greatsql> SELECT * FROM ROUTINES WHERE ROUTINE_DEFINITION LIKE '%t1%' \G
*************************** 1. row ***************************
SPECIFIC_NAME: sp_t1
ROUTINE_CATALOG: def
ROUTINE_SCHEMA: ywdb1
ROUTINE_NAME: sp_t1
ROUTINE_TYPE: PROCEDURE
DTD_IDENTIFIER: NULL
CHARACTER_MAXIMUM_LENGTH: NULL
CHARACTER_OCTET_LENGTH: NULL
NUMERIC_PRECISION: NULL
NUMERIC_SCALE: NULL
DATETIME_PRECISION: NULL
CHARACTER_SET_NAME: NULL
COLLATION_NAME: NULL
ROUTINE_BODY: SQL
ROUTINE_DEFINITION: BEGIN DECLARE EXIT HANDLER FOR SQLEXCEPTION BEGIN -- 发生错误时回滚事务
ROLLBACK;
-- 这里可以添加错误处理逻辑
END;
START TRANSACTION;
DELETE FROM t1 WHERE TIMEKEY = concat(substr(_sj,1,4),'-',substr(_sj,5,2));
INSERT INTO `ywdb1`.`t1` (
`zbbh`,
`zclb`,
`zcbl`,
`zcblms`,
`zjzh`,
`zjzhms`,
`cbzh`,
`cbzhms`,
`ljzjzh`,
`ljzjzhms`,
`xmbh`,
`xmmc`,
`sfgj`,
`dd`,
`ddsm`,
`cb`,
`cz`,
`bqzje`,
`bnzje`,
`ljzje`,
`zcjz`,
`zcje`,
`bqjzzb`,
`bnjzzb`,
`ljjzzb`,
`bfqj`
)
SELECT
`TIMEKEY`,
`zbbh`,
`zclb`,
`zcbl`,
`zcblms`,
`zjzh`,
`zjzhms`,
`cbzh`,
`cbzhms`,
`ljzjzh`,
`ljzjzhms`,
`xmbh`,
`xmmc`,
`sfgj`,
`dd`,
`ddsm`,
`cb`,
`cz`,
`bqzje`,
`bnzje`,
`ljzje`,
`zcjz`,
`zcje`,
`bqjzzb`,
`bnjzzb`,
`ljjzzb`,
`bfqj`
FROM ywdb2.t2
WHERE TIMEKEY = concat(substr(_sj,1,4),'-',substr(_sj,5,2))
GROUP BY `TIMEKEY`, `zbbh`, `zclb`, `zcbl`, `zcblms`, `zjzh`, `zjzhms`, `cbzh`, `cbzhms`, `ljzjzh`, `ljzjzhms`, `xmbh`, `xmmc`, `sfgj`, `dd`, `ddsm`, `cb`, `cz`, `bqzje`, `bnzje`, `ljzje`, `zcjz`, `zcje`, `bqjzzb`, `bnjzzb`, `ljjzzb`, `bfqj`;
--提交事务
COMMIT;
END
EXTERNAL_NAME: NULL
EXTERNAL_LANGUAGE: SQL
PARAMETER_STYLE: SQL
IS_DETERMINISTIC: NO
SQL_DATA_ACCESS: CONTAINS SQL
SQL_PATH: NULL
SECURITY_TYPE: DEFINER
CREATED: 2025-02-17 15:05:54
LAST_ALTERED: 2025-02-17 15:05:54
SQL_MODE: STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_
ENGINE_SUBSTITUTION
ROUTINE_COMMENT:
DEFINER: root@%
CHARACTER_SET_CLIENT: utf8mb4
COLLATION_CONNECTION: utf8mb4_0900_ai_ci
DATABASE_COLLATION: utf8mb4_0900_bin
业务上,ETL表后续清洗操作确实涉及大事务动作。
4.涉及表大小查看
greatsql> SELECT count(*) FROM ywdb1.t1;
+-------------+
| count(*) |
+-------------+
| 3663246 |
+------------ +
1 rows in set (47.42 sec)
greatsql> SELECT count(*) FROM ywdb2.t2;
+-------------+
| count(*) |
+-------------+
| 71580710 |
+------------ +
1 rows in set (34 min 54.12 sec)
查看ywdb2.t2的ibd文件大小(单表大小161G):
$ du -sh *
161G t2.ibd
可见,delete以及依据ywdb2.t2大表做insert ... select ... 均为大体量大事务,事务行数超500W条以上(单月账期数据量)。
三、存储过程改造途径
1.改造思路
改写应用程序,借助频繁提交等方式,将大事务转为小事务。
2.改造方式
INSERT...SELECT...语句,存储过程内改为循环,并每10000条提交一次。(使用游标)
以存储过程ywdb1.sp_t1的修改为例:
``sqlroot
greatsql> USE test_db1;
greatsql> CREATE TABLE test_t111 LIKE ywdb1.t1;
greatsql> CREATE TABLE test_t222 LIKE ywdb2.t2;
greatsql>
delimiter //
CREATE DEFINER=@%PROCEDUREtest_sp_t1`(_sj varchar(10))
BEGIN
-- 声明变量
DECLARE done int DEFAULT FALSE;
DECLARE v_timekey varchar(50);
DECLARE v_zbbh varchar(100) ;
DECLARE v_zclb varchar(100);
DECLARE v_zclbms varchar(100);
DECLARE v_zjzh varchar(100);
DECLARE v_zjzhms varchar(500);
DECLARE v_cbzh varchar(100);
DECLARE v_cbzhms varchar(100);
DECLARE v_ljzjzh varchar(100);
DECLARE v_ljzjzhms varchar(100);
DECLARE v_xmbh varchar(100);
DECLARE v_xmmc varchar(100);
DECLARE v_sfgj varchar(100);
DECLARE v_dd varchar(100);
DECLARE v_ddsm varchar(100);
DECLARE v_cb decimal(18,4);
DECLARE v_cz decimal(18,4);
DECLARE v_bqzje decimal(18,4);
DECLARE v_bnzje decimal(18,4);
DECLARE v_ljzje decimal(18,4);
DECLARE v_zcjz decimal(18,4);
DECLARE v_zcje decimal(18,4);
DECLARE v_bqjzzb decimal(18,4);
DECLARE v_bnjzzb decimal(18,4);
DECLARE v_ljjzzb decimal(18,4);
DECLARE v_bfqj varchar(20);
DECLARE count_num int DEFAULT 0; -- 计数器,用于分批提交
-- 声明游标,用于从源表中选择数据
DECLARE cur_sel CURSOR FOR SELECT
TIMEKEY,
zbbh,
zclb,
zclbms,
zjzh,
zjzhms,
cbzh,
cbzhms,
ljzjzh,
ljzjzhms,
xmbh,
xmmc,
sfgj,
dd,
ddsm,
cb,
cz,
bqzje,
bnzje,
ljzje,
zcjz,
zcje,
bqjzzb,
bnjzzb,
ljjzzb,
bfqj
FROM test_db1.test_t222
WHERE TIMEKEY = concat(substr(_sj,1,4),'-',substr(_sj,5,2)); -- 根据传入的日期参数过滤数据
-- 声明游标读取结束时的处理程序
DECLARE CONTINUE HANDLER FOR NOT FOUND SET done=TRUE;
-- 删除目标表中符合条件的数据
DELETE FROM test_t111 WHERE TIMEKEY = concat(substr(_sj,1,4),'-',substr(_sj,5,2));
-- 开始事务
START transaction;
-- 打开游标
OPEN cur_sel;
-- 循环读取游标数据
read_loop:LOOP
FETCH cur_sel INTO v_timekey, v_zbbh,v_zclb,v_zclbms,v_zjzh,v_zjzhms,v_cbzh,v_cbzhms,v_ljzjzh,v_ljzjzhms,v_xmbh,v_xmmc,v_sfgj,v_dd,v_ddsm, v_cb,v_cz,v_bqzje,v_bnzje,v_ljzje,v_zcjz,v_zcje,v_bqjzzb, v_bnjzzb,v_ljjzzb,v_bfqj;
-- 如果游标读取结束,退出循环
IF done THEN
leave read_loop;
END IF;
-- 计数器加 1
SET count_num=count_num+1;
-- 将读取的数据插入目标表
INSERT INTO `test_db1`.`test_t111`(
`TIMEKEY`,
`zbbh`,
`zclb`,
`zclbms`,
`zjzh`,
`zjzhms`,
`cbzh`,
`cbzhms`,
`ljzjzh`,
`ljzjzhms`,
`xmbh`,
`xmmc`,
`sfgj`,
`dd`,
`ddsm`,
`cb`,
`cz`,
`bqzje`,
`bnzje`,
`ljzje`,
`zcjz`,
`zcje`,
`bqjzzb
