处理大事务难题的业务存储过程优化办法
一、问题呈现
1. 问题表象
业务调用存储过程未能成功,经检查发现存在大事务,单独执行时暴露问题。
greatsql> INSERT INTO
`ywdb1`.`t1`(
`TIMEKEY`,
`zbbh`,
`zcbl`,
`zcblms`,
`zjzh`,
`zjzhms`,
`cbzh`,
`ljzjzh`,
`xmbh`,
`xmmc`,
`sfgj`,
`dd`,
`ddsm`,
`cb`,
`cz`,
`bqzje`,
`bnzje`,
`ljzje`,
`zcjz`,
`zcje`,
`bqzzb`,
`bnjzzb`,
`bfqj`
)
SELECT
`TIMEKEY`,
`zbbh`,
`zcbl`,
`zcblms`,
`zjzh`,
`zjzhms`,
`cbzh`,
`ljzjzh`,
`xmbh`,
`xmmc`,
`sfgj`,
`dd`,
`ddsm`,
`cb`,
`cz`,
`bqzje`,
`bnzje`,
`ljzje`,
`zcjz`,
`zcje`,
`bqzzb`,
`bnjzzb`,
`bfqj`
FROM ywdb2.t2
WHERE
TIMEKEY = concat(substr(_sj, 1, 4), '-', substr(_sj, 5, 2));
ERROR 8532 (HY000): Execute backend SQL on node failed with 1197: Multi-statement transaction required more than 'max_binlog_cache_size' bytes of storage; increase this mysqld variable and try again.
错误日志显示:
2025-02-20T00:36:01.969610+08:00 O [Note] [MY-011953] [InnoDB] Page cleaner took 10448ms to flush 2000 pages
2025-02-20T00:38:26.020404+08:00 O [Note] [MY-011953] [InnoDB] Page cleaner took 8427ms to flush 560 pages
2025-02-20T00:38:39.690478+08:00 3478 [ERROR] [MY-011735] [Rep] Plugin group_replication reported: '[GCS] gcs_packet's payload is too big. Only packets smaller than 2113929216 bytes can be compressed. Payload size is 3823615463.'
2025-02-20T00:38:39.690504+08:00 3478 [ERROR] [MY-011735] [Rep] Plugin group_replication reported: '[GCS] Error preparing the message for sending.'
2025-02-20T00:38:40.092629+08:00 3478 [ERROR] [MY-011614] [Rep] Plugin group_replication reported: 'Error while broadcasting the transaction to the group on session 3478'
2025-02-20T00:38:41.047249+08:00 3478 [ERROR] [MY-011207] [Rep] Run function 'before_commit' in plugin 'group_replication' failed
二、问题剖析
1.错误信息解读
ERROR 8532 (HY000)提示,执行的操作涉及多语句事务,且所需存储超出了max_binlog_cache_size的限制。
日志中([Error]
):提及group_replication
报告的消息,存在GCS_packet's payload is too big
的报错。来自group_replication
,强调负载过大,仅小于2113929216字节的包可发送。出现before_commit
运行错误,致使无法在group_replication
中完成事务。
2.参数配置分析
默认情况下,不建议大事务超过2G。查看参数:
greatsql> SHOW variables WHERE variable_name IN ('group_replication_transaction_size_limit','group_replication_compression_threshold');
+------------------------------------------+------------+
| Variable_name | Value |
+------------------------------------------+------------+
| group_replication_compression_threshold | 1000000 |
| group_replication_transaction_size_limit | 2147483647 |
+------------------------------------------+------------+
2 rows in set (0.00 sec)
- group_replication_transaction_size_limit :
- 值:2147483647
- 含义:该参数定义可复制事务的最大大小,单位字节。2147483647为2G。
- group_replication_compression_threshold :
- 值:1000000
- 含义:此参数表示组复制时,事务大小超该阈值(1,000,000字节即约1MB)会考虑压缩。即仅事务大小超1MB时,数据会压缩以减网络传输负担。
查看max_binlog_cache_size
的值为4G,该参数限制binlog缓存中存储的最大字节数。
greatsql> SHOW variables LIKE '%max_binlog%';
+----------------------------+------------+
| Variable_name | Value |
+----------------------------+------------+
| max_binlog_cache_size | 4294967296 |
| max_binlog_size | 1073741824 |
| max_binlog_stmt_cache_size | 268435456 |
+----------------------------+------------+
3 rows in set (0.02 sec)
3.存储过程信息查看
查看此存储过程相关信息(INFORMATION_SCHEMA_ROUTINES):
greatsql> SELECT * FROM ROUTINES WHERE ROUTINE_DEFINITION LIKE '%t1%' \G
*************************** 1. row ***************************
SPECIFIC_NAME: sp_t1
ROUTINE_CATALOG: def
ROUTINE_SCHEMA: ywdb1
ROUTINE_NAME: sp_t1
ROUTINE_TYPE: PROCEDURE
DTD_IDENTIFIER: NULL
CHARACTER_MAXIMUM_LENGTH: NULL
CHARACTER_OCTET_LENGTH: NULL
NUMERIC_PRECISION: NULL
NUMERIC_SCALE: NULL
DATETIME_PRECISION: NULL
CHARACTER_SET_NAME: NULL
COLLATION_NAME: NULL
ROUTINE_BODY: SQL
ROUTINE_DEFINITION: BEGIN DECLARE EXIT HANDLER FOR SQLEXCEPTION BEGIN -- 发生错误时回滚事务
ROLLBACK;
-- 这里可以添加错误处理逻辑
END;
START TRANSACTION;
DELETE FROM t1 WHERE TIMEKEY = concat(substr(_sj,1,4),'-',substr(_sj,5,2));
INSERT INTO `ywdb1`.`t1` (
`zbbh`,
`zclb`,
`zcbl`,
`zcblms`,
`zjzh`,
`zjzhms`,
`cbzh`,
`cbzhms`,
`ljzjzh`,
`ljzjzhms`,
`xmbh`,
`xmmc`,
`sfgj`,
`dd`,
`ddsm`,
`cb`,
`cz`,
`bqzje`,
`bnzje`,
`ljzje`,
`zcjz`,
`zcje`,
`bqjzzb`,
`bnjzzb`,
`ljjzzb`,
`bfqj`
)
SELECT
`TIMEKEY`,
`zbbh`,
`zclb`,
`zcbl`,
`zcblms`,
`zjzh`,
`zjzhms`,
`cbzh`,
`cbzhms`,
`ljzjzh`,
`ljzjzhms`,
`xmbh`,
`xmmc`,
`sfgj`,
`dd`,
`ddsm`,
`cb`,
`cz`,
`bqzje`,
`bnzje`,
`ljzje`,
`zcjz`,
`zcje`,
`bqjzzb`,
`bnjzzb`,
`ljjzzb`,
`bfqj`
FROM ywdb2.t2
WHERE TIMEKEY = concat(substr(_sj,1,4),'-',substr(_sj,5,2))
GROUP BY `TIMEKEY`, `zbbh`, `zclb`, `zcbl`, `zcblms`, `zjzh`, `zjzhms`, `cbzh`, `cbzhms`, `ljzjzh`, `ljzjzhms`, `xmbh`, `xmmc`, `sfgj`, `dd`, `ddsm`, `cb`, `cz`, `bqzje`, `bnzje`, `ljzje`, `zcjz`, `zcje`, `bqjzzb`, `bnjzzb`, `ljjzzb`, `bfqj`;
--提交事务
COMMIT;
END
EXTERNAL_NAME: NULL
EXTERNAL_LANGUAGE: SQL
PARAMETER_STYLE: SQL
IS_DETERMINISTIC: NO
SQL_DATA_ACCESS: CONTAINS SQL
SQL_PATH: NULL
SECURITY_TYPE: DEFINER
CREATED: 2025-02-17 15:05:54
LAST_ALTERED: 2025-02-17 15:05:54
SQL_MODE: STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_
ENGINE_SUBSTITUTION
ROUTINE_COMMENT:
DEFINER: root@%
CHARACTER_SET_CLIENT: utf8mb4
COLLATION_CONNECTION: utf8mb4_0900_ai_ci
DATABASE_COLLATION: utf8mb4_0900_bin
业务上,ETL表后续清洗操作确实涉及大事务动作。
4.涉及表大小查看
greatsql> SELECT count(*) FROM ywdb1.t1;
+-------------+
| count(*) |
+-------------+
| 3663246 |
+------------ +
1 rows in set (47.42 sec)
greatsql> SELECT count(*) FROM ywdb2.t2;
+-------------+
| count(*) |
+-------------+
| 71580710 |
+------------ +
1 rows in set (34 min 54.12 sec)
查看ywdb2.t2的ibd文件大小(单表大小161G):
$ du -sh *
161G t2.ibd
可见,delete以及依据ywdb2.t2大表做insert ... select ... 均为大体量大事务,事务行数超500W条以上(单月账期数据量)。
三、存储过程改造途径
1.改造思路
改写应用程序,借助频繁提交等方式,将大事务转为小事务。
2.改造方式
INSERT...SELECT...
语句,存储过程内改为循环,并每10000条提交一次。(使用游标)
以存储过程ywdb1.sp_t1的修改为例:
``sql
root
greatsql> USE test_db1;
greatsql> CREATE TABLE test_t111 LIKE ywdb1.t1;
greatsql> CREATE TABLE test_t222 LIKE ywdb2.t2;
greatsql>
delimiter //
CREATE DEFINER=@
%PROCEDURE
test_sp_t1`(_sj varchar(10))
BEGIN
-- 声明变量
DECLARE done int DEFAULT FALSE;
DECLARE v_timekey varchar(50);
DECLARE v_zbbh varchar(100) ;
DECLARE v_zclb varchar(100);
DECLARE v_zclbms varchar(100);
DECLARE v_zjzh varchar(100);
DECLARE v_zjzhms varchar(500);
DECLARE v_cbzh varchar(100);
DECLARE v_cbzhms varchar(100);
DECLARE v_ljzjzh varchar(100);
DECLARE v_ljzjzhms varchar(100);
DECLARE v_xmbh varchar(100);
DECLARE v_xmmc varchar(100);
DECLARE v_sfgj varchar(100);
DECLARE v_dd varchar(100);
DECLARE v_ddsm varchar(100);
DECLARE v_cb decimal(18,4);
DECLARE v_cz decimal(18,4);
DECLARE v_bqzje decimal(18,4);
DECLARE v_bnzje decimal(18,4);
DECLARE v_ljzje decimal(18,4);
DECLARE v_zcjz decimal(18,4);
DECLARE v_zcje decimal(18,4);
DECLARE v_bqjzzb decimal(18,4);
DECLARE v_bnjzzb decimal(18,4);
DECLARE v_ljjzzb decimal(18,4);
DECLARE v_bfqj varchar(20);
DECLARE count_num int DEFAULT 0; -- 计数器,用于分批提交
-- 声明游标,用于从源表中选择数据
DECLARE cur_sel CURSOR FOR SELECT
TIMEKEY
,
zbbh
,
zclb
,
zclbms
,
zjzh
,
zjzhms
,
cbzh
,
cbzhms
,
ljzjzh
,
ljzjzhms
,
xmbh
,
xmmc
,
sfgj
,
dd
,
ddsm
,
cb
,
cz
,
bqzje
,
bnzje
,
ljzje
,
zcjz
,
zcje
,
bqjzzb
,
bnjzzb
,
ljjzzb
,
bfqj
FROM test_db1.test_t222
WHERE TIMEKEY = concat(substr(_sj,1,4),'-',substr(_sj,5,2)); -- 根据传入的日期参数过滤数据
-- 声明游标读取结束时的处理程序
DECLARE CONTINUE HANDLER FOR NOT FOUND SET done=TRUE;
-- 删除目标表中符合条件的数据
DELETE FROM test_t111 WHERE TIMEKEY = concat(substr(_sj,1,4),'-',substr(_sj,5,2));
-- 开始事务
START transaction;
-- 打开游标
OPEN cur_sel;
-- 循环读取游标数据
read_loop:LOOP
FETCH cur_sel INTO v_timekey, v_zbbh,v_zclb,v_zclbms,v_zjzh,v_zjzhms,v_cbzh,v_cbzhms,v_ljzjzh,v_ljzjzhms,v_xmbh,v_xmmc,v_sfgj,v_dd,v_ddsm, v_cb,v_cz,v_bqzje,v_bnzje,v_ljzje,v_zcjz,v_zcje,v_bqjzzb, v_bnjzzb,v_ljjzzb,v_bfqj;
-- 如果游标读取结束,退出循环
IF done THEN
leave read_loop;
END IF;
-- 计数器加 1
SET count_num=count_num+1;
-- 将读取的数据插入目标表
INSERT INTO `test_db1`.`test_t111`(
`TIMEKEY`,
`zbbh`,
`zclb`,
`zclbms`,
`zjzh`,
`zjzhms`,
`cbzh`,
`cbzhms`,
`ljzjzh`,
`ljzjzhms`,
`xmbh`,
`xmmc`,
`sfgj`,
`dd`,
`ddsm`,
`cb`,
`cz`,
`bqzje`,
`bnzje`,
`ljzje`,
`zcjz`,
`zcje`,
`bqjzzb