Vincent,
I am sorry for the long delay in responding to this e-mail. Somehow I
missed the response on the list. I appended the original message at the
bottom.
-----Original Message-----
From: Vincent Jardin [mailto:vincent.jardin (AT) 6wind (DOT) com]
Sent: Tuesday, April 11, 2006 3:17 PM
To: Spagnolo, Phillip A
Cc: quagga-dev (AT) lists (DOT) quagga.net; Kushi, David M; Henderson, Thomas R
Subject: Re: [quagga-dev 4082] Bug in long delay networks
Hi,
>The solution we found is to simply increase 2 in SPF_TIMERN
>(ospf->t_maxage, ospf_maxage_lsa_remover, 2) to a reasonable value.
>Maybe 60 or 600?
There is no recommendation in the RFC for 2, 60 or any other value. My
concern is that the higher the timer, the more entries will need to be
kept until the remover runs.
Since I don't think it is possible to determine the best value
automatically, maybe this value should be configurable from the CLI, with
the default remaining 2. Don't you think?
As for the best default value, I don't know the answer. However, the
same problem does not occur with a Cisco router because it keeps the
LSAs around for at least a couple hundred seconds.
A CLI addition would be fine. We could also just put a comment in the
code and let people change it if needed.
>Attached is a patch with this fix and a couple of minor related changes
>with explanations within the code.
Can you please elaborate on this comment?
"+ /* This does not seem to be necessary. This LSA was already flooded
+    when it entered the maxage list. This flood is redundant // */"
For instance, can you describe a case where it occurs?
This is the sequence of function calls:
-ospf_lsa_flush_area()
  -MAXAGE LSA: set the LSA age to MaxAge
  -ospf_flood_through_area(): the LSA is flooded throughout the area
  -ospf_lsa_maxage(): add the LSA to the maxage list and schedule the
   remover:
     SPF_TIMERN (ospf->t_maxage, ospf_maxage_lsa_remover, 2);
-ospf_maxage_lsa_remover()
  -check whether the LSA can be removed
  -ospf_flood_through(): the LSA was already flooded by
   ospf_lsa_flush_area() above, so there is no need to do it again
Does this make sense? I don't see why an LSA that has already been
maxaged and flooded needs to be reflooded after it has been checked for
neighbor state and retransmission count.
Sincerely,
Phil
Regards,
Vincent
Original message:
All,
We have found a bug in ospfd in Quagga 0.98.5 when it is used in
high-delay networks. I think the problem also exists in 0.99.3, because
the same code is found there.
The bug exists in ospf_lsa.c. It is found in ospf_lsa_maxage() when
SPF_TIMERN (ospf->t_maxage, ospf_maxage_lsa_remover, 2) is called to
schedule removal of the LSA from the database.
Here is an example:
[Diagram: example topology of nodes 1 to 7; node 1 links to nodes 2 and 4.]
Nodes 2,3,4 are connected by a broadcast network.
Nodes 5,6,7 are connected by a PTMP network.
Let the delay of the PTMP network be 8 secs.
If the broadcast network of 2, 3 and 4 is brought down, then nodes 2, 3
and 4 will generate a Network LSA and then maxage it, as all neighbors are
removed from the link. This is correct (RFC 2328 12.4.2, para 4). The
problem is that these MaxAge LSAs reach all nodes in the network and are
purged from the databases while copies are still in transit in the PTMP
network (5, 6, 7). When those copies come out of the PTMP network, they
are reinstalled and flooded again, because the LSAs have already been
purged from the databases. The flooding then repeats the process.
In short, the flooding is sustained for 3600 seconds.
The solution we found is to simply increase 2 in SPF_TIMERN
(ospf->t_maxage, ospf_maxage_lsa_remover, 2) to a reasonable value.
Maybe 60 or 600?
Attached is a patch with this fix and a couple of minor related changes
with explanations within the code.
Is this the correct fix? Are there reasons not to increase this value?
Thanks,
Phil
Phil Spagnolo
Network Technology Engineer
The Boeing Company
Phone: (425) 865-6723
Quagga-dev mailing list
Quagga-dev (AT) lists (DOT) quagga.net