当前位置 博文首页 > 程序员石磊:OGG ora-01403错误的快捷处理方案

    程序员石磊:OGG ora-01403错误的快捷处理方案

    作者:[db:作者] 时间:2021-07-04 15:55

    OGG运维中有一个经典错误-1403。现象是目标端复制update或者delete操作导致复制进程abended,原因是update或delete时找不到目标数据。至于该数据为什么不在目标端有很多可能,比如人为删除、trigger没有禁用导致删除、级联外键删除没有禁用导致删除等等。通常我们的排查手段是确认目标端的trigger、级联外键删除、job是否启动了?如果启动了禁用它。然后再排查源端表是否有主键,主键在trandata中是否生效。上述排查都没有问题的话就开始做表级初始化吧,数据泵导出导入,同步变化…

    但是有时候我们也可以不这么折腾,可以采取“补缺”的方式让复制进程迅速恢复。思路如下:

    1. 通过目标端ggserr日志和replcat.dsc文件来定位丢失的数据
    2. 在源端使用database link执行insert into 目标端 select * from 源表 where=(步骤一确认的条件)的方式来手工补缺。
    3. 启动复制进程,复制进程会重新操作abended之前失败的操作。

    下面通过一个实验来演示上述过程

    1. source插入第一条测试数据
      Insert into FM_TAX_RATE_TEST (TEST_ID,COUNTRY, STATE, TAX_TYPE, TAX_RATE)
      Values (1,‘CN’, ‘68’, ‘WT3’, 0.0015);

    commit;

    1. target确认同步
      select * from fm_tax_rate_test;

    COUNTR STAT TAX_TY TAX_RATE TEST_ID


    CN 68 WT3 .0015 1

    1. target删除复制记录,人为制造1403错误
      delete from fm_tax_rate_test where test_id=1;

    commit;

    1. source对第一条测记录执行update操作会导致target复制进程中断。中断原因是update语句中的where字句定位的数据在target端不存在,因为我刚刚手工删除了这条记录。
      update FM_TAX_RATE_TEST set country=‘US’ where test_id=1;

    commit;

    此时target端已经中断,在source增加数据变化,期待target重启后会应用这些故障后产生的变化。
    Insert into FM_TAX_RATE_TEST (TEST_ID,COUNTRY, STATE, TAX_TYPE, TAX_RATE)
    Values (2,‘TW’, ‘68’, ‘WT3’, 0.0015);

    Insert into FM_TAX_RATE_TEST (TEST_ID,COUNTRY, STATE, TAX_TYPE, TAX_RATE)
    Values (3,‘JP’, ‘68’, ‘WT3’, 0.0015); 2

    commit;

    target复制进程中断
    GGSCI (cdbsym3) 6> info all

    Program Status Group Lag at Chkpt Time Since Chkpt

    MANAGER RUNNING
    REPLICAT RUNNING REPSYM 00:00:00 00:00:02
    REPLICAT ABENDED REPSYM_T 00:10:20 00:00:01

    target端ggserr.log中错误信息片段
    2015-03-31 13:50:26 WARNING OGG-01004 Oracle GoldenGate Delivery for Oracle, repsym_t.prm: Aborted grouped transaction on ‘OGG_TEST.FM_TAX_RATE_TEST’, Database error 1403 (OCI Error ORA-01403: no data found, SQL <UPDATE “OGG_TEST”.“FM_TAX_RATE_TEST” SET “COUNTRY” = :a1,“STATE” = :a2,“TAX_TYPE” = :a3,“TAX_RATE” = :a4,“TEST_ID” = :a5 WHERE “TEST_ID” = :b0>).
    2015-03-31 13:50:26 WARNING OGG-01003 Oracle GoldenGate Delivery for Oracle, repsym_t.prm: Repositioning to rba 170249512 in seqno 12.
    2015-03-31 13:50:26 WARNING OGG-01154 Oracle GoldenGate Delivery for Oracle, repsym_t.prm: SQL error 1403 mapping OGG_TEST.FM_TAX_RATE_TEST to OGG_TEST.FM_TAX_RATE_TEST OCI Error ORA-01403: no data found, SQL .
    2015-03-31 13:50:26 WARNING OGG-01003 Oracle GoldenGate Delivery for Oracle, repsym_t.prm: Repositioning to rba 170249512 in seqno 12.
    2015-03-31 13:50:26 ERROR OGG-01296 Oracle GoldenGate Delivery for Oracle, repsym_t.prm: Error mapping from OGG_TEST.FM_TAX_RATE_TEST to OGG_TEST.FM_TAX_RATE_TEST.
    2015-03-31 13:50:26 ERROR OGG-01668 Oracle GoldenGate Delivery for Oracle, repsym_t.prm: PROCESS ABENDING.

    target端discard文件中记录了test_id=1的数据执行udpate失败

    more repsym_t.dsc
    Oracle GoldenGate Delivery for Oracle process started, group REPSYM_T discard file opened: 2015-03-31 13:50:25

    Current time: 2015-03-31 13:50:26
    Discarded record from action ABEND on error 1403

    OCI Error ORA-01403: no data found, SQL
    RATE" = :a4,“TEST_ID” = :a5 WHERE “TEST_ID” = :b0>
    Aborting transaction on ./dirdat/yt beginning at seqno 12 rba 170249512
    error at seqno 12 rba 170249512
    Problem replicating OGG_TEST.FM_TAX_RATE_TEST to OGG_TEST.FM_TAX_RATE_TEST
    Record not found
    Mapping problem with compressed key update record (target format)…
    *
    TEST_ID = 1
    COUNTRY = US
    STATE = 68
    TAX_TYPE = WT3
    TAX_RATE = .00150000
    TEST_ID = 1

    这时候很多运维人员最常用的就是按照csn一致性导出source表,重新初始化target端数据不一致的表。在使用下面的方式来修改复制进程参数文件,重启复制进程追进度。
    map schema.table, target schema.table, filter (@GETENV (“TRANSACTION”, “CSN”) > 9527);
    如果同步的表比较大,这个过程会很漫长。

    如果只是缺少那么几条数据,别人被认为误删除了造成的,也需要这么大动干戈处理么?其实可以用个简单的方法来处理,在源库创建一个database link,将target端缺少的数据手工insert过去补全这个漏洞,然后启动复制进程。复制进程会再次尝试失败的update语句,where字句锁定刚才手工插入的数据,修改成功。 复制进程继续应用source端数据变化。

    1. 源端创建database link。其中SERVICE_NAME = data为target数据库的SID
      5-1 在tnsnames.ora中添加target端数据库的字符串
      to19 =
      (DESCRIPTION =
      (ADDRESS_LIST =
      (ADDRESS = (PROTOCOL = TCP)(HOST = 10.78.2.19)(PORT = 1553))
      )
      (CONNECT_DATA =
      (SERVICE_NAME = data)
      )
      )

    5-2 创建database link指向target数据库; 其中ogg_test为target数据库的schema。
    create public databbase link to19 connect to ogg_test identified by ogg_test;

    5-3 通过database link手工同步丢失语句。其中select语句是源表的数据,insert into是目标数据库。
    insert into ogg_test.fm_tax_rate_test@to19 select * from ogg_test.fm_tax_rate_test where test_id=1;

    1. target启动复制进程
      GGSCI (cdbsym3) 4> start repsym

    Sending START request to MANAGER …
    REPLICAT REPSYM starting

    GGSCI (cdbsym3) 5> info all

    Program Status Group Lag at Chkpt Time Since Chkpt

    MANAGER RUNNING
    REPLICAT RUNNING REPSYM 00:00:00 00:00:00
    REPLICAT RUNNING REPSYM_T 00:00:00 00:00:01

    数据变化已经被应用到复制端了
    GGSCI (cdbsym3) 8> stats repsym total table dbp.rb_restraints

    Sending STATS request to REPLICAT REPSYM …

    Start of Statistics at 2015-03-31 11:09:14.

    Replicating from SYMBOLS.RB_RESTRAINTS to DBP.RB_RESTRAINTS:

    *** Total statistics since 2015-03-31 11:08:13 ***
    Total inserts 1.00
    Total updates 4.00
    Total deletes 0.00
    Total discards 0.00
    Total operations 5.00

    End of Statistics.

    1. 在数据库中查看复制进程启动后的数据变化
      OGG_TEST@data> select * from ogg_test.fm_tax_rate_test;

    COUNTR STAT TAX_TY TAX_RATE TEST_ID


    US 68 WT3 .0015 1
    TW 68 WT3 .0015 2
    JP 68 WT3 .0015 3

    其中第一条数据就是我们通过手工同步的数据,后面两条数据是故障之后的数据变化。

    注意:如果手工同步之前源表的数据也执行delete操作就无法通过isnert into select 的方式获取并同步到target端了。祈祷吧

    该方案在特定条件下还是很好用的,节省了很多时间。祝同学们好运

    还有一个办法

    复制进程配置文件增加
    HANDLECOLLISIONS
    在这里插入图片描述

    cs