MYSQL

  • how replication handles commit fails

    7 answers - 1090 bytes

    Hi,
    I am looking at the following code in ha_commit_trans():

      if (!trans->no_2pc && trans->nht > 1)
      {
        for (; *ht && !error; ht++)
        {
          int err;
          if ((err= (*(*ht)->prepare)(thd, all)))
          {
            my_error(ER_ERROR_DURING_COMMIT, MYF(0), err);
            error= 1;
          }
          statistic_increment(thd->status_var.ha_prepare_count,&LOCK_status);
        }
        DBUG_EXECUTE_IF("crash_commit_after_prepare", abort(););
        if (error || (is_real_trans && xid &&
                      (error= !(cookie= tc_log->log(thd, xid)))))
        {
          ha_rollback_trans(thd, all);
          error= 1;
          goto end;
        }
        DBUG_EXECUTE_IF("crash_commit_after_log", abort(););
      }
      error=ha_commit_one_phase(thd, all) ? cookie ? 2 : 1 : 0;

    The tricky thing is that the statement
    "(error= !(cookie= tc_log->log(thd, xid)))" commits the replication's
    binlog change before the real commit has started.

    Later, in ha_commit_one_phase(), the code never reverses the binlog changes if
    the transaction fails to commit. How do we handle the failure here?

    Thanks,
    Wei
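
A note on the last line of the excerpt: the conditional operator groups to the right, so the expression reads as "commit failed ? (cookie ? 2 : 1) : 0". The sketch below isolates that mapping. The helper name commit_result and its parameters are hypothetical, and reading 2 as "failed after the transaction was already logged" is an interpretation of the excerpt, not something the source states.

```cpp
// Hypothetical helper mirroring the expression
//   error= ha_commit_one_phase(thd, all) ? cookie ? 2 : 1 : 0;
// one_phase_failed: stands for a non-zero return from ha_commit_one_phase()
// cookie:           stands for the value returned by tc_log->log(); 0 if none
int commit_result(bool one_phase_failed, unsigned long cookie)
{
  // ?: is right-associative, so this parses as a ? (b ? 2 : 1) : 0
  return one_phase_failed ? cookie ? 2 : 1 : 0;
}
```

With these stand-ins, a successful commit always yields 0, regardless of the cookie. A failed commit yields 1 without a cookie and 2 with one.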
  • No.1 | | 1848 bytes | |

    Hi!

    On Jun 18, Wei Li wrote:
    > Later, in ha_commit_one_phase(), the code never reverses the binlog changes if
    > the transaction fails to commit. How do we handle the failure here?

    We don't. The two-phase commit protocol works on the assumption that the
    commit phase cannot fail if the prepare phase succeeded. That is, by
    returning a success code from prepare(), the storage engine promises to be
    able to commit - if there could have been an error, it must have been
    returned from prepare(). And if someone breaks the protocol, succeeding in
    prepare but failing in commit, you'll have inconsistent data. A commit
    cannot be undone, by definition.

    It's intrinsic to two-phase commit and has nothing to do with the
    binlog. Think about two storage engines:

    1st: prepare - ok
    2nd: prepare - ok
    1st: commit - ok
    2nd: commit - failure

    There's nothing we can do at the moment; we cannot revert the commit in
    the 1st storage engine.

    Regards,
    Sergei
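
The four-step sequence Sergei lists can be reproduced with a toy coordinator. This is only a sketch under stated assumptions: Engine, Outcome and two_phase_commit are hypothetical names invented for illustration and have nothing to do with MySQL's real handlerton interface.

```cpp
#include <string>
#include <vector>

// Toy stand-in for a storage engine (hypothetical, not MySQL's handlerton).
struct Engine {
  std::string name;
  bool prepare_ok;   // what prepare() will report
  bool commit_ok;    // what commit() will report
  bool committed;    // set once a commit has succeeded
  bool prepare() { return prepare_ok; }
  bool commit() { committed = commit_ok; return commit_ok; }
};

enum class Outcome { RolledBack, Committed, Inconsistent };

Outcome two_phase_commit(std::vector<Engine>& engines)
{
  // Phase 1: a failed prepare aborts everything; nothing is committed yet,
  // so rolling back is still possible.
  for (Engine& e : engines)
    if (!e.prepare())
      return Outcome::RolledBack;

  // Phase 2: every successful commit is final. If a later engine fails,
  // the earlier commits cannot be reverted.
  bool any_failed = false;
  for (Engine& e : engines)
    if (!e.commit())
      any_failed = true;
  return any_failed ? Outcome::Inconsistent : Outcome::Committed;
}
```

Running it with two engines that both prepare successfully, where only the second fails to commit, returns Outcome::Inconsistent while the first engine stays committed. That is the situation described above.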
  • No.2 | | 3211 bytes | |

    Hi Sergei and Wei,

    This is a good question and something that I have been wondering about.

    Although it is very unlikely, this situation can actually occur.

    You see, even if an engine returns OK from prepare(), it probably
    still relies on the fact that at least one disk write operation must
    work in order to do the commit().

    What this means is that the engine cannot actually guarantee that the
    commit() will work. It can only guarantee that if it does not work,
    then the commit will be completed on recovery.

    So my question is: since recovery is only done on startup, if a
    commit() call fails, doesn't this mean that the data server should
    actually shut down immediately and automatically restart (or some
    equivalent operation)?

    Thanks for your help,

    Paul

    __
    \ \/ _ _/ Paul McCullagh (MSc)
    \ / | | CT
    / \ | | SNAP Innovation GmbH
    / /\ \| | Altonaer Poststr 9a
    22767 Hamburg, Germany
    PrimeBase XT www.primebase.com/xt
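
Paul's point that an unfinished commit "will be completed on recovery" can be pictured as a decision rule applied to every transaction an engine still holds in the prepared state. The sketch below assumes a log-based coordinator that commits a prepared transaction if its xid was recorded in the log and rolls it back otherwise. That rule, and the names recover, Decision and Action, are assumptions made for illustration and do not come from the thread.

```cpp
#include <cstdint>
#include <set>
#include <vector>

enum class Action { Commit, Rollback };

struct Decision {
  std::uint64_t xid;
  Action action;
};

// logged_xids:   xids the coordinator's log recorded before the failure
// prepared_xids: xids an engine reports as still prepared at recovery time
std::vector<Decision> recover(const std::set<std::uint64_t>& logged_xids,
                              const std::vector<std::uint64_t>& prepared_xids)
{
  std::vector<Decision> out;
  for (std::uint64_t xid : prepared_xids) {
    // Assumed rule: a logged xid was decided "commit", so finish it;
    // an unlogged one never reached the decision point, so undo it.
    bool logged = logged_xids.count(xid) != 0;
    out.push_back({xid, logged ? Action::Commit : Action::Rollback});
  }
  return out;
}
```

Under this rule, a commit() that failed after the log write is simply finished the next time recovery runs, which is why the question of when recovery can run matters.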
  • No.3 | | 3217 bytes | |

    On 6/19/06, Paul McCullagh <paul.mccullagh (AT) primebase (DOT) com> wrote:
    > What this means is that the engine can actually not guarantee that the
    > commit() will work. It can only guarantee that if it does not work,
    > then the commit will be completed on recovery.
    >
    > So my question is: since recovery is only done on startup, if a
    > commit() call fails, doesn't this mean that the data server should
    > actually shutdown immediately and automatically restart (or some
    > equivalent operation)?

    Why would the write succeed during the recovery but not during the commit?
    And I think all database transactions depend on a working storage system.
  • No.4 | | 457 bytes | |

    On Mon, 2006-06-19 at 22:36 +0200, van der Spek wrote:
    > Why would the write succeed during the recovery but not during the commit?
    > And I think all database transactions depend on a working storage system.

    For example, if you hit a file system problem that forces the file system
    read-only. But then "recovery" can involve an umount/reboot/fs repair.

    You'd be amazed at how many things don't work when the FS suddenly goes
    read-only :)
  • No.5 | | 786 bytes | |

    On Mon, 2006-06-19 at 16:29 +0200, Paul McCullagh wrote:
    > What this means is that the engine can actually not guarantee that the
    > commit() will work. It can only guarantee that if it does not work,
    > then the commit will be completed on recovery.
    >
    > So my question is: since recovery is only done on startup, if a
    > commit() call fails, doesn't this mean that the data server should
    > actually shutdown immediately and automatically restart (or some
    > equivalent operation)?

    My guess would be that it would be best to leave the choice to restart
    to the administrator and still try to service any requests (e.g. read
    requests could still go on fine, and other tables in particular
    shouldn't be affected).

    Although I'm not familiar with the XA code.
  • No.6 | | 3789 bytes | |

    Hi!

    On Jun 19, Paul McCullagh wrote:
    > Hi Sergei and Wei,
    >
    > > > Later, in ha_commit_one_phase(), the code never reverses the binlog
    > > > changes if the transaction fails to commit. How do we handle the
    > > > failure here?
    > >
    > > We don't. The two-phase commit protocol works on the assumption that
    > > the commit phase cannot fail if the prepare phase succeeded. That is, by
    > > returning a success code from prepare(), the storage engine promises to
    > > be able to commit - if there could have been an error, it must have been
    > > returned from prepare(). And if someone breaks the protocol,
    > > succeeding in prepare but failing in commit, you'll have
    > > inconsistent data. A commit cannot be undone, by definition.
    >
    > This is a good question and something that I have been wondering about.
    >
    > Although it is very unlikely, this situation can actually occur.
    >
    > You see, even if an engine returns OK from prepare(), it probably
    > still relies on the fact that at least one disk write operation must
    > work in order to do the commit().

    You're right. Even better example - a federated storage engine (not
    necessarily _our_ federated, but conceptually similar, with remote
    storage). It may experience network failure anytime, also between
    prepare and commit.

    > What this means is that the engine can actually not guarantee that the
    > commit() will work. It can only guarantee that if it does not work,
    > then the commit will be completed on recovery.

    The XA standard allows both. xa_commit() has many different return values;
    I quote the two most relevant here:

    [XA_RETRY]
    The resource manager is not able to commit the transaction branch at
    this time. This value may be returned when a blocking condition
    exists and TMNOWAIT was set. Note, however, that this value may also
    be returned even when TMNOWAIT is not set (for example, if the
    necessary stable storage is currently unavailable). This value cannot
    be returned if TMONEPHASE is set in flags. All resources held on
    behalf of xid remain in a prepared state until commitment is
    possible. The transaction manager should reissue xa_commit() at a
    later time.

    [XAER_RMERR]
    An error occurred in committing the work performed on behalf of the
    transaction branch and the branch's work has been rolled back. Note
    that returning this error signals a catastrophic event to a
    transaction manager since other resource managers may successfully
    commit their work on behalf of this branch. This error should be
    returned only when a resource manager concludes that it can never
    commit the branch and that it cannot hold the branch's resources in a
    prepared state. Otherwise, [XA_RETRY] should be returned.

    > So my question is: since recovery is only done on startup, if a
    > commit() call fails, doesn't this mean that the data server should
    > actually shutdown immediately and automatically restart (or some
    > equivalent operation)?

    According to the XA standard - no, it only means MySQL should keep
    retrying the commit as long as it is getting XA_RETRY back.
    But I don't think it is a practical solution. There must be some
    timeouts or other limits to prevent MySQL from retrying forever.

    Also, the XA standard does not specify when recovery happens. And if I
    were given a choice whether to crash MySQL when commit fails, or to add
    support for recovery not only at startup - I'd rather fix recovery :)

    Anyway, XA_RETRY is not supported at the moment, though it will be
    straightforward to add when it becomes necessary.

    Regards,
    Sergei
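
The bounded retry Sergei describes, reissuing the commit while XA_RETRY comes back but not forever, might look roughly like the sketch below. CommitStatus, commit_with_retry and the attempt limit are hypothetical. The enum is a local stand-in for the two quoted outcomes plus success, not the constants from xa.h, and nothing here reflects code that exists in MySQL.

```cpp
#include <functional>

// Local stand-ins for success and the two xa_commit() outcomes quoted above.
// Illustrative names only; these are not the xa.h constants.
enum class CommitStatus { Ok, Retry, RmError };

// Calls try_commit until it stops returning Retry or max_attempts is reached.
// attempts_made reports how many calls were actually issued.
CommitStatus commit_with_retry(const std::function<CommitStatus()>& try_commit,
                               int max_attempts, int& attempts_made)
{
  CommitStatus status = CommitStatus::Retry;
  attempts_made = 0;
  while (attempts_made < max_attempts) {
    status = try_commit();
    ++attempts_made;
    if (status != CommitStatus::Retry)
      break;  // Ok, or RmError (the "catastrophic" case): stop retrying
    // A real server would presumably wait and log the failure here
    // before reissuing the commit.
  }
  // Still Retry at this point means the limit was hit while the branch
  // remains in the prepared state.
  return status;
}
```

A caller that gets Retry back after the limit still has a prepared branch on its hands, so some later recovery pass has to pick it up. That is where recovery outside of startup would come in.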
  • No.7 | | 2881 bytes | |

    Hi Sergei,

    On Jun 20, 2006, at 11:00 AM, Sergei Golubchik wrote:

    > According to the XA standard - no, it only means MySQL should keep
    > retrying the commit as long as it is getting XA_RETRY back.
    > But I don't think it is a practical solution. There must be some
    > timeouts or whatever limits to prevent MySQL from retrying forever.
    >
    > Also, XA standard does not specify when recovery happens. And if I'd be
    > given a choice whether to crash MySQL when commit fails, or add a
    > support for recovery not only at startup - I'd rather fix recovery :)

    OK, this makes a lot of sense.

    And as Stewart Smith says, the engine could provide read-only access to
    the affected tables during the retry cycle until the commit succeeds.

    I guess the engine should also print an error to the log each time the
    retry fails so that the operator can fix things if required.

    Thanks!

    Paul

    __
    \ \/ _ _/ Paul McCullagh (MSc)
    \ / | | CT
    / \ | | SNAP Innovation GmbH
    / /\ \| | Altonaer Poststr 9a
    22767 Hamburg, Germany
    PrimeBase XT www.primebase.com/xt
