How replication handles commit failures

Hi,
I am looking at the following code in ha_commit_trans():
  if (!trans->no_2pc && trans->nht > 1)
  {
    for (; *ht && !error; ht++)
    {
      int err;
      if ((err= (*(*ht)->prepare)(thd, all)))
      {
        my_error(ER_ERROR_DURING_COMMIT, MYF(0), err);
        error= 1;
      }
      statistic_increment(thd->status_var.ha_prepare_count,&LOCK_status);
    }
    DBUG_EXECUTE_IF("crash_commit_after_prepare", abort(););
    if (error || (is_real_trans && xid &&
                  (error= !(cookie= tc_log->log(thd, xid)))))
    {
      ha_rollback_trans(thd, all);
      error= 1;
      goto end;
    }
    DBUG_EXECUTE_IF("crash_commit_after_log", abort(););
  }
  error=ha_commit_one_phase(thd, all) ? cookie ? 2 : 1 : 0;
The tricky thing is that the statement
"(error= !(cookie= tc_log->log(thd, xid)))" commits the replication
binlog change before the real commit starts.
Later, in ha_commit_one_phase(), the code never reverses the binlog changes if
the transaction fails to commit. How do we handle the failure here?
Thanks,
Wei
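
The last line of the snippet packs two outcomes into one nested conditional, which is easy to misread. As a reading aid only, here is a minimal sketch of how that expression groups. commit_result() is a hypothetical helper introduced for illustration and is not a function in the server; its two parameters stand in for the return value of ha_commit_one_phase() and for the cookie returned by tc_log->log().

```cpp
#include <cassert>

// Hypothetical helper mirroring the last line of the quoted snippet.
// commit_failed: nonzero if the one-phase commit step reported an error.
// cookie:        nonzero if the transaction was already written to the log.
int commit_result(int commit_failed, unsigned long cookie)
{
  // "a ? b ? 2 : 1 : 0" groups as "a ? (b ? 2 : 1) : 0".
  return commit_failed ? (cookie ? 2 : 1) : 0;
}
```

So the value 2 is produced exactly in the case asked about: the commit step failed after the transaction had already been logged.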
No.1
Hi!
On Jun 18, Wei Li wrote:
> The tricky thing is that the statement
> "(error= !(cookie= tc_log->log(thd, xid)))" commits the replication
> binlog change before the real commit starts.
> Later, in ha_commit_one_phase(), the code never reverses the binlog changes if
> the transaction fails to commit. How do we handle the failure here?
We don't. The two-phase commit protocol works on the assumption that the
commit phase cannot fail if the prepare phase succeeded. That is, by returning
a success code from prepare(), a storage engine promises to be able to
commit - if there could've been an error, it must've been returned from
prepare(). And if someone breaks the protocol, succeeding in prepare
but failing in commit - you'll have inconsistent data. Commit cannot be
undone, by definition.
It's intrinsic to two-phase commit, and has nothing to do with the
binlog. Think about two storage engines:
  1st: prepare - ok
  2nd: prepare - ok
  1st: commit - ok
  2nd: commit - failure
There's nothing we can do at that point: we cannot revert the commit in
the 1st storage engine.
Regards,
Sergei
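
The two-engine example above can be sketched as a tiny coordinator. This is a minimal illustration under stated assumptions, not the server's code: Participant, Outcome and two_phase_commit() are hypothetical names, and each participant stands in for a storage engine.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical participant: stands in for a storage engine.
struct Participant {
  std::function<bool()> prepare;   // true means "I promise I can commit"
  std::function<bool()> commit;    // expected not to fail after prepare
  std::function<void()> rollback;  // only meaningful before commit
};

enum class Outcome { Committed, RolledBack, Inconsistent };

Outcome two_phase_commit(std::vector<Participant>& parts)
{
  // Phase 1: every participant must vote yes.
  for (auto& p : parts) {
    if (!p.prepare()) {
      // Nothing has been committed yet, so undoing is still safe.
      for (auto& q : parts) q.rollback();
      return Outcome::RolledBack;
    }
  }
  // Phase 2: point of no return. A commit that already succeeded in one
  // participant cannot be reverted if a later participant fails.
  bool any_failed = false;
  for (auto& p : parts) {
    if (!p.commit()) any_failed = true;
  }
  return any_failed ? Outcome::Inconsistent : Outcome::Committed;
}
```

Note that there is no rollback call in phase 2: once the first participant has committed, the only possible outcomes are "committed" or "inconsistent".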
No.2
Hi Sergei and Wei,
On Jun 19, 2006, at 1:39 PM, Sergei Golubchik wrote:
> We don't. Two-phase commit protocol works on the assumption that commit
> phase cannot fail if prepare phase succeeded. That is, by returning a
> success code from prepare(), storage engine promises to be able to
> commit - if there could've been an error, it must've been returned from
> prepare(). And if someone breaks the protocol, succeeding in prepare,
> but failing in commit - you'll have inconsistent data. Commit cannot be
> undone, by definition.
> It's intrinsic to the two phase commit, and has nothing to do with
> binlog, think about two storage engines:
>   1st: prepare - ok
>   2nd: prepare - ok
>   1st: commit - ok
>   2nd: commit - failure
> there's nothing we can do at the moment, we cannot revert the commit in
> the 1st storage engine.
> Regards,
> Sergei
This is a good question and something that I have been wondering about.
Although it is very unlikely, this situation can actually occur.
You see, even if an engine returns OK on the prepare(), it probably
still relies on the fact that at least one disk write operation must
work in order to do the commit().
What this means is that the engine can actually not guarantee that the
commit() will work. It can only guarantee that if it does not work,
then the commit will be completed on recovery.
So my question is: since recovery is only done on startup, if a
commit() call fails, doesn't this mean that the data server should
actually shutdown immediately and automatically restart (or some
equivalent operation)?
Thanks for your help,
Paul
__
\ \/ _ _/ Paul McCullagh (MSc)
\ / | | CT
/ \ | | SNAP Innovation GmbH
/ /\ \| | Altonaer Poststr 9a
22767 Hamburg, Germany
PrimeBase XT www.primebase.com/xt
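
The idea that an interrupted commit "will be completed on recovery" can be pictured roughly as follows. This is a minimal sketch, not the server's recovery code: recover(), Decision and Action are hypothetical names. The rule it applies is an assumption stated here for illustration: a prepared transaction is committed if the coordinator had already logged it (the role tc_log->log() plays in the quoted snippet) and rolled back otherwise.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

enum class Action { Commit, Rollback };

struct Decision {
  std::string xid;
  Action action;
};

// prepared: transaction ids the engine reports as prepared but unfinished.
// logged:   transaction ids found in the coordinator's log.
std::vector<Decision> recover(const std::vector<std::string>& prepared,
                              const std::set<std::string>& logged)
{
  std::vector<Decision> out;
  for (const auto& xid : prepared) {
    // Logged: the commit decision was made, so finish the commit.
    // Not logged: the decision was never made, so undo the prepare.
    Action a = logged.count(xid) ? Action::Commit : Action::Rollback;
    out.push_back({xid, a});
  }
  return out;
}
```

The sketch only decides what to do with each prepared transaction; it says nothing about when recovery runs, which is the open question in this thread.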
No.3
On 6/19/06, Paul McCullagh <paul.mccullagh (AT) primebase (DOT) com> wrote:
> This is a good question and something that I have been wondering about.
> Although it is very unlikely, this situation can actually occur.
> You see, even if an engine returns OK on the prepare(), it probably
> still relies on the fact that at least one disk write operation must
> work in order to do the commit().
> What this means is that the engine can actually not guarantee that the
> commit() will work. It can only guarantee that if it does not work,
> then the commit will be completed on recovery.
> So my question is: since recovery is only done on startup, if a
> commit() call fails, doesn't this mean that the data server should
> actually shutdown immediately and automatically restart (or some
> equivalent operation)?
Why would the write succeed during the recovery but not during the commit?
And I think all database transactions depend on a working storage system.
No.4
On Mon, 2006-06-19 at 22:36 +0200, van der Spek wrote:
> Why would the write succeed during the recovery but not during the commit?
> And I think all database transactions depend on a working storage system.
For example, if you hit a file system problem that forces the file system
read-only. But then "recovery" can involve an umount/reboot/fs repair.
You'd be amazed at how many things stop working when the FS suddenly goes
read-only :)
No.5
On Mon, 2006-06-19 at 16:29 +0200, Paul McCullagh wrote:
> What this means is that the engine can actually not guarantee that the
> commit() will work. It can only guarantee that if it does not work,
> then the commit will be completed on recovery.
> So my question is: since recovery is only done on startup, if a
> commit() call fails, doesn't this mean that the data server should
> actually shutdown immediately and automatically restart (or some
> equivalent operation)?
My guess would be that it would be best to leave the choice to restart
to the administrator and still try to service any requests (e.g. read
requests could still go on fine; other tables in particular shouldn't be
affected), although I'm not familiar with the XA code.
No.6
Hi!
On Jun 19, Paul McCullagh wrote:
> Hi Sergei and Wei,
>
>>> Later, in ha_commit_one_phase(), the code never reverse binlog
>>> changes if the transaction failed committing. How do we handle the
>>> failure here?
>>
>> We don't. Two-phase commit protocol works on the assumption that
>> commit phase cannot fail if prepare phase succeeded. That is, by
>> returning a success code from prepare(), storage engine promises to
>> be able to commit - if there could've been an error, it must've been
>> returned from prepare(). And if someone breaks the protocol,
>> succeeding in prepare, but failing in commit - you'll have
>> inconsistent data. Commit cannot be undone, by definition.
>
> This is a good question and something that I have been wondering about.
> Although it is very unlikely, this situation can actually occur.
> You see, even if an engine returns OK on the prepare(), it probably
> still relies on the fact that at least one disk write operation must
> work in order to do the commit().
You're right. Even better example - a federated storage engine (not
necessarily _our_ federated, but conceptually similar, with remote
storage). It may experience network failure anytime, also between
prepare and commit.
> What this means is that the engine can actually not guarantee that the
> commit() will work. It can only guarantee that if it does not work,
> then the commit will be completed on recovery.
The XA standard allows both. xa_commit() has many different return values; I
quote the two most relevant here:
[XA_RETRY]
The resource manager is not able to commit the transaction branch at
this time. This value may be returned when a blocking condition
exists and TMNOWAIT was set. Note, however, that this value may also
be returned even when TMNOWAIT is not set (for example, if the
necessary stable storage is currently unavailable). This value cannot
be returned if TMONEPHASE is set in flags. All resources held on
behalf of xid remain in a prepared state until commitment is
possible. The transaction manager should reissue xa_commit() at a
later time.
[XAER_RMERR]
An error occurred in committing the work performed on behalf of the
transaction branch and the branch's work has been rolled back. Note
that returning this error signals a catastrophic event to a
transaction manager since other resource managers may successfully
commit their work on behalf of this branch. This error should be
returned only when a resource manager concludes that it can never
commit the branch and that it cannot hold the branch's resources in a
prepared state. Otherwise, [XA_RETRY] should be returned.
> So my question is: since recovery is only done on startup, if a
> commit() call fails, doesn't this mean that the data server should
> actually shutdown immediately and automatically restart (or some
> equivalent operation)?
According to the XA standard - no, it only means MySQL should keep
retrying the commit as long as it is getting XA_RETRY back.
But I don't think it is a practical solution. There must be some
timeouts or whatever limits to prevent MySQL from retrying forever.
Also, the XA standard does not specify when recovery happens. And if I were
given a choice between crashing MySQL when a commit fails and adding support
for recovery not only at startup - I'd rather fix recovery :)
Anyway, XA_RETRY is not supported at the moment, though it would be
straightforward to add when it becomes necessary.
Regards,
Sergei
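
The "keep retrying, but not forever" idea could look something like the sketch below. It is purely illustrative, since XA_RETRY is not supported at the moment: CommitResult and commit_with_retry() are hypothetical names, the enum values are only named after the XA return codes quoted above and are not taken from any header, and the attempt limit is an assumed policy.

```cpp
#include <cassert>
#include <functional>

// Hypothetical result codes, named after the XA values quoted above.
enum class CommitResult { Ok, Retry, RmErr };

// Calls try_commit until it stops asking for a retry or the attempt
// budget is used up. Returns the last result; Retry means "gave up,
// the branch is still in the prepared state".
CommitResult commit_with_retry(const std::function<CommitResult()>& try_commit,
                               int max_attempts)
{
  CommitResult r = CommitResult::Retry;
  for (int attempt = 0;
       attempt < max_attempts && r == CommitResult::Retry;
       ++attempt) {
    r = try_commit();
  }
  return r;
}
```

A real implementation would presumably wait between attempts and decide what to do once the budget is exhausted; both are left out here on purpose.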
No.7
Hi Sergei,
On Jun 20, 2006, at 11:00 AM, Sergei Golubchik wrote:
>> So my question is: since recovery is only done on startup, if a
>> commit() call fails, doesn't this mean that the data server should
>> actually shutdown immediately and automatically restart (or some
>> equivalent operation)?
>
> According to the XA standard - no, it only means MySQL should keep
> retrying the commit as long as it is getting XA_RETRY back.
> But I don't think it is a practical solution. There must be some
> timeouts or whatever limits to prevent MySQL from retrying forever.
> Also, XA standard does not specify when recovery happens. And if I'd be
> given a choice whether to crash MySQL when commit fails, or add a
> support for recovery not only at startup - I'd rather fix recovery :)
OK, this makes a lot of sense.
And as Stewart Smith says, the engine could provide read-only access to
the affected tables during the retry cycle until the commit succeeds.
I guess the engine should also print an error to the log each time the
retry fails so that the operator can fix things if required.
Thanks!
Paul
__
\ \/ _ _/ Paul McCullagh (MSc)
\ / | | CT
/ \ | | SNAP Innovation GmbH
/ /\ \| | Altonaer Poststr 9a
22767 Hamburg, Germany
PrimeBase XT www.primebase.com/xt
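
The two suggestions in this last message (read-only access during the retry cycle, and one log entry per failed retry) might be sketched like this. It is a rough illustration only: TableGate and all of its members are hypothetical and do not correspond to any engine or handler API, and the log is kept in memory just to keep the example self-contained.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical per-table gate; not part of any engine API.
class TableGate {
public:
  void begin_commit_retry() { commit_pending_ = true; }
  void commit_succeeded() { commit_pending_ = false; }

  // Record one failed retry so that an operator can see it.
  void retry_failed(const std::string& reason) {
    log_.push_back("commit retry failed: " + reason);
  }

  bool can_read() const { return true; }               // reads stay available
  bool can_write() const { return !commit_pending_; }  // writes wait for the commit
  const std::vector<std::string>& log() const { return log_; }

private:
  bool commit_pending_ = false;
  std::vector<std::string> log_;
};
```

While a commit is pending, can_write() returns false and can_read() stays true; once commit_succeeded() is called the table accepts writes again.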