应对大事务挑战的业务存储过程优化

处理大事务难题的业务存储过程优化办法

一、问题呈现

1. 问题表象

业务调用存储过程未能成功,经检查发现存在大事务,单独执行时暴露问题。

greatsql> INSERT INTO
  `ywdb1`.`t1`(
  `TIMEKEY`,
  `zbbh`,
  `zcbl`,
  `zcblms`,
  `zjzh`,
  `zjzhms`,
  `cbzh`,
  `ljzjzh`,
  `xmbh`,
  `xmmc`,
  `sfgj`,
  `dd`,
  `ddsm`,
  `cb`,
  `cz`,
  `bqzje`,
  `bnzje`,
  `ljzje`,
  `zcjz`,
  `zcje`,
  `bqzzb`,
  `bnjzzb`,
  `bfqj`
)
SELECT
  `TIMEKEY`,
  `zbbh`,
  `zcbl`,
  `zcblms`,
  `zjzh`,
  `zjzhms`,
  `cbzh`,
  `ljzjzh`,
  `xmbh`,
  `xmmc`,
  `sfgj`,
  `dd`,
  `ddsm`,
  `cb`,
  `cz`,
  `bqzje`,
  `bnzje`,
  `ljzje`,
  `zcjz`,
  `zcje`,
  `bqzzb`,
  `bnjzzb`,
  `bfqj`
FROM ywdb2.t2
WHERE
  TIMEKEY = concat(substr(_sj, 1, 4), '-', substr(_sj, 5, 2));

ERROR 8532 (HY000): Execute backend SQL on node failed with 1197: Multi-statement transaction required more than 'max_binlog_cache_size' bytes of storage; increase this mysqld variable and try again.

错误日志显示:

2025-02-20T00:36:01.969610+08:00 O [Note] [MY-011953] [InnoDB] Page cleaner took 10448ms to flush 2000 pages  
2025-02-20T00:38:26.020404+08:00 O [Note] [MY-011953] [InnoDB] Page cleaner took 8427ms to flush 560 pages  
2025-02-20T00:38:39.690478+08:00 3478 [ERROR] [MY-011735] [Rep] Plugin group_replication reported: '[GCS] gcs_packet's payload is too big. Only packets smaller than 2113929216 bytes can be compressed. Payload size is 3823615463.'  
2025-02-20T00:38:39.690504+08:00 3478 [ERROR] [MY-011735] [Rep] Plugin group_replication reported: '[GCS] Error preparing the message for sending.'  
2025-02-20T00:38:40.092629+08:00 3478 [ERROR] [MY-011614] [Rep] Plugin group_replication reported: 'Error while broadcasting the transaction to the group on session 3478'  
2025-02-20T00:38:41.047249+08:00 3478 [ERROR] [MY-011207] [Rep] Run function 'before_commit' in plugin 'group_replication' failed  

二、问题剖析

1.错误信息解读

ERROR 8532 (HY000)提示,执行的操作涉及多语句事务,且所需存储超出了max_binlog_cache_size的限制。

日志中([Error]):提及group_replication报告的消息,存在GCS_packet's payload is too big的报错。来自group_replication,强调负载过大,仅小于2113929216字节的包可发送。出现before_commit运行错误,致使无法在group_replication中完成事务。

2.参数配置分析

默认情况下,不建议大事务超过2G。查看参数:

greatsql> SHOW variables WHERE variable_name IN ('group_replication_transaction_size_limit','group_replication_compression_threshold');
+------------------------------------------+------------+
| Variable_name                            | Value      |
+------------------------------------------+------------+
| group_replication_compression_threshold  | 1000000    |
| group_replication_transaction_size_limit | 2147483647 |
+------------------------------------------+------------+
2 rows in set (0.00 sec)
  • group_replication_transaction_size_limit
  • 值:2147483647
  • 含义:该参数定义可复制事务的最大大小,单位字节。2147483647为2G。
  • group_replication_compression_threshold
  • 值:1000000
  • 含义:此参数表示组复制时,事务大小超该阈值(1,000,000字节即约1MB)会考虑压缩。即仅事务大小超1MB时,数据会压缩以减网络传输负担。

查看max_binlog_cache_size的值为4G,该参数限制binlog缓存中存储的最大字节数。

greatsql> SHOW variables LIKE '%max_binlog%';
+----------------------------+------------+
| Variable_name              | Value      |
+----------------------------+------------+
| max_binlog_cache_size      | 4294967296 |
| max_binlog_size            | 1073741824 |
| max_binlog_stmt_cache_size | 268435456  |
+----------------------------+------------+
3 rows in set (0.02 sec)
3.存储过程信息查看

查看此存储过程相关信息(INFORMATION_SCHEMA_ROUTINES):

greatsql> SELECT * FROM ROUTINES WHERE ROUTINE_DEFINITION LIKE '%t1%' \G  
*************************** 1. row ***************************  
                SPECIFIC_NAME: sp_t1  
              ROUTINE_CATALOG: def  
               ROUTINE_SCHEMA: ywdb1  
                 ROUTINE_NAME: sp_t1  
                 ROUTINE_TYPE: PROCEDURE  
               DTD_IDENTIFIER: NULL  
     CHARACTER_MAXIMUM_LENGTH: NULL  
       CHARACTER_OCTET_LENGTH: NULL  
            NUMERIC_PRECISION: NULL  
                NUMERIC_SCALE: NULL  
           DATETIME_PRECISION: NULL  
           CHARACTER_SET_NAME: NULL  
               COLLATION_NAME: NULL  
                 ROUTINE_BODY: SQL  
           ROUTINE_DEFINITION: BEGIN DECLARE EXIT HANDLER FOR SQLEXCEPTION BEGIN -- 发生错误时回滚事务  
    ROLLBACK;  
    -- 这里可以添加错误处理逻辑  
    END;  
    START TRANSACTION;
    DELETE FROM t1 WHERE TIMEKEY = concat(substr(_sj,1,4),'-',substr(_sj,5,2));
    INSERT INTO `ywdb1`.`t1` (
      `zbbh`,
      `zclb`,
      `zcbl`,
      `zcblms`,
      `zjzh`,
      `zjzhms`,
      `cbzh`,
      `cbzhms`,
      `ljzjzh`,
      `ljzjzhms`,
      `xmbh`,
      `xmmc`,
      `sfgj`,
      `dd`,
      `ddsm`,
      `cb`,
      `cz`,
      `bqzje`,
      `bnzje`,
      `ljzje`,
      `zcjz`,
      `zcje`,
      `bqjzzb`,
      `bnjzzb`,
      `ljjzzb`,
      `bfqj`
    )
    SELECT
      `TIMEKEY`,
      `zbbh`,
      `zclb`,
      `zcbl`,
      `zcblms`,
      `zjzh`,
      `zjzhms`,
      `cbzh`,
      `cbzhms`,
      `ljzjzh`,
      `ljzjzhms`,
      `xmbh`,
      `xmmc`,
      `sfgj`,
      `dd`,
      `ddsm`,
      `cb`,
      `cz`,
      `bqzje`,
      `bnzje`,
      `ljzje`,
      `zcjz`,
      `zcje`,
      `bqjzzb`,
      `bnjzzb`,
      `ljjzzb`,
      `bfqj`
    FROM ywdb2.t2
    WHERE TIMEKEY = concat(substr(_sj,1,4),'-',substr(_sj,5,2))
    GROUP BY `TIMEKEY`, `zbbh`, `zclb`, `zcbl`, `zcblms`, `zjzh`, `zjzhms`, `cbzh`, `cbzhms`, `ljzjzh`, `ljzjzhms`, `xmbh`, `xmmc`, `sfgj`, `dd`, `ddsm`, `cb`, `cz`, `bqzje`, `bnzje`, `ljzje`, `zcjz`, `zcje`, `bqjzzb`, `bnjzzb`, `ljjzzb`, `bfqj`;
    --提交事务
    COMMIT;
    END
               EXTERNAL_NAME: NULL
           EXTERNAL_LANGUAGE: SQL
             PARAMETER_STYLE: SQL
            IS_DETERMINISTIC: NO
             SQL_DATA_ACCESS: CONTAINS SQL
                    SQL_PATH: NULL
               SECURITY_TYPE: DEFINER
                     CREATED: 2025-02-17 15:05:54
                LAST_ALTERED: 2025-02-17 15:05:54
                    SQL_MODE: STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_
    ENGINE_SUBSTITUTION
             ROUTINE_COMMENT: 
                     DEFINER: root@%
        CHARACTER_SET_CLIENT: utf8mb4
        COLLATION_CONNECTION: utf8mb4_0900_ai_ci
          DATABASE_COLLATION: utf8mb4_0900_bin

业务上,ETL表后续清洗操作确实涉及大事务动作。

4.涉及表大小查看
greatsql> SELECT count(*) FROM ywdb1.t1;
+-------------+
|  count(*)   | 
+-------------+
|  3663246   |
+------------ +
1 rows in set (47.42 sec)
greatsql> SELECT count(*) FROM ywdb2.t2;
+-------------+
|  count(*)   | 
+-------------+
|  71580710   |
+------------ +
1 rows in set (34 min 54.12 sec)

查看ywdb2.t2的ibd文件大小(单表大小161G):

$ du -sh *
161G t2.ibd

可见,delete以及依据ywdb2.t2大表做insert ... select ... 均为大体量大事务,事务行数超500W条以上(单月账期数据量)。

三、存储过程改造途径

1.改造思路

改写应用程序,借助频繁提交等方式,将大事务转为小事务。

2.改造方式

INSERT...SELECT...语句,存储过程内改为循环,并每10000条提交一次。(使用游标)

以存储过程ywdb1.sp_t1的修改为例:

``sql
greatsql> USE test_db1;
greatsql> CREATE TABLE test_t111 LIKE ywdb1.t1;
greatsql> CREATE TABLE test_t222 LIKE ywdb2.t2;
greatsql>
delimiter //
CREATE DEFINER=
root@%PROCEDUREtest_sp_t1`(_sj varchar(10))
BEGIN
-- 声明变量
DECLARE done int DEFAULT FALSE;

DECLARE v_timekey varchar(50);
DECLARE v_zbbh varchar(100) ;
DECLARE v_zclb varchar(100);
DECLARE v_zclbms varchar(100);
DECLARE v_zjzh varchar(100);
DECLARE v_zjzhms varchar(500);
DECLARE v_cbzh varchar(100);
DECLARE v_cbzhms varchar(100);
DECLARE v_ljzjzh varchar(100);
DECLARE v_ljzjzhms varchar(100);
DECLARE v_xmbh varchar(100);
DECLARE v_xmmc varchar(100);
DECLARE v_sfgj varchar(100);
DECLARE v_dd varchar(100);
DECLARE v_ddsm varchar(100);
DECLARE v_cb decimal(18,4);
DECLARE v_cz decimal(18,4);
DECLARE v_bqzje decimal(18,4);
DECLARE v_bnzje decimal(18,4);
DECLARE v_ljzje decimal(18,4);
DECLARE v_zcjz decimal(18,4);
DECLARE v_zcje decimal(18,4);
DECLARE v_bqjzzb decimal(18,4);
DECLARE v_bnjzzb decimal(18,4);
DECLARE v_ljjzzb decimal(18,4);
DECLARE v_bfqj varchar(20);

DECLARE count_num int DEFAULT 0; -- 计数器,用于分批提交
-- 声明游标,用于从源表中选择数据
DECLARE cur_sel CURSOR FOR SELECT
TIMEKEY,
zbbh,
zclb,
zclbms,
zjzh,
zjzhms,
cbzh,
cbzhms,
ljzjzh,
ljzjzhms,
xmbh,
xmmc,
sfgj,
dd,
ddsm,
cb,
cz,
bqzje,
bnzje,
ljzje,
zcjz,
zcje,
bqjzzb,
bnjzzb,
ljjzzb,
bfqj
FROM test_db1.test_t222
WHERE TIMEKEY = concat(substr(_sj,1,4),'-',substr(_sj,5,2)); -- 根据传入的日期参数过滤数据

-- 声明游标读取结束时的处理程序 
DECLARE CONTINUE HANDLER FOR NOT FOUND SET done=TRUE;

-- 删除目标表中符合条件的数据 
DELETE FROM test_t111 WHERE TIMEKEY = concat(substr(_sj,1,4),'-',substr(_sj,5,2));

-- 开始事务 
START transaction;

-- 打开游标
  OPEN cur_sel;

-- 循环读取游标数据
 read_loop:LOOP
   FETCH cur_sel INTO v_timekey, v_zbbh,v_zclb,v_zclbms,v_zjzh,v_zjzhms,v_cbzh,v_cbzhms,v_ljzjzh,v_ljzjzhms,v_xmbh,v_xmmc,v_sfgj,v_dd,v_ddsm, v_cb,v_cz,v_bqzje,v_bnzje,v_ljzje,v_zcjz,v_zcje,v_bqjzzb, v_bnjzzb,v_ljjzzb,v_bfqj;

-- 如果游标读取结束,退出循环 
 IF done THEN
   leave read_loop;
 END IF;

-- 计数器加 1  
SET count_num=count_num+1;

-- 将读取的数据插入目标表  
INSERT INTO `test_db1`.`test_t111`(
  `TIMEKEY`,
  `zbbh`,
  `zclb`,
  `zclbms`,
  `zjzh`,
  `zjzhms`,
  `cbzh`,
  `cbzhms`,
  `ljzjzh`,
  `ljzjzhms`,
  `xmbh`,
  `xmmc`,
  `sfgj`,
  `dd`,
  `ddsm`,
  `cb`,
  `cz`,
  `bqzje`,
  `bnzje`,
  `ljzje`,
  `zcjz`,
  `zcje`,
  `bqjzzb
版权声明:程序员胖胖胖虎阿 发表于 2025年6月19日 下午4:47。
转载请注明:

应对大事务挑战的业务存储过程优化

| 胖虎的工具箱-编程导航

相关文章

暂无评论

暂无评论...