Networking

  • Strange behavior on directly connected interfaces?

    Hi,
    This is my first post to the list and I would like to preface this by
    stating that I doubt this problem is actually related specifically to
    Juniper equipment (perhaps a configuration error involving Juniper
    equipment, however). I'm hoping the issue I'm working on right now
    might ring bells in the heads of others, and in any case I figure this
    is as good a place as any to find yourself beaten by the clue stick.

    I have a directly connected interface facing a large, flat Ethernet
    infrastructure. There are dozens of IPs mapped to the interface in
    question (this is a legacy aspect of the design, but migration to a
    more hierarchical infrastructure is a long process). Periodically,
    when packets are transmitted with an unreachable destination IP
    residing on the directly connected interface, a massive series of
    ICMP TTL exceeded packets is returned by a different host residing on
    a different logical interface. Traceroutes to the unreachable IP
    similarly show a one-node loop (the same IP responds until the TTL=0).
    The node is always the same, but if unmitigated ICMP traffic is
    permitted to and from addresses on the logical interface, sniffing the
    wire shows this behavior occurring to and from a number of nodes. I
    haven't managed to duplicate the multi-node behavior in a
    semi-controlled environment.

    When sniffing the segment in question, the ICMP is clearly visible,
    so for whatever reason it is universally broadcast, even though both
    nodes involved in the ICMP communication are legitimate unicast
    destinations. If a ping is left running, these TTL exceeded messages
    will continue and accelerate ad nauseam until a de facto pseudo
    broadcast storm occurs, crippling access on every switching node where
    the VLAN in question is mapped. Usually (but not always) the
    anomalies halt when the ping is killed. The issue is largely
    mitigated by denying all ICMP to and from addresses mapped to the
    logical interface.

    That's all I'm comfortable asserting about the issue at this time.
    What I'm really digging for here is an explanation as to why when the
    Juniper tries to transmit to an unreachable node, it doesn't discover
    the node is unreachable due to a lack of response from an ARP request
    and return ICMP unreachables on its own. I may have missed something
    obvious here (I'm sort of hoping so) and would appreciate any
    suggestions or experience from others. If I've sent this message to a
    woefully inappropriate list I would greatly appreciate a suggestion as
    to a better place to bring my question(s).
    Thanks,
    -FC
    juniper-nsp mailing list juniper-nsp (AT) puck (DOT) nether.net
  • No.1 | | 3165 bytes | |

    frances,

    looks like you have a forwarding loop in your setup;

    for further troubleshooting attach a packet-sniffer to
    the subnet in question and look for the source MAC address
    that is bouncing back your traffic.

    /hannes

  • No.2 | | 4254 bytes | |

    Hi Hannes,

    Thanks for your response. When I'm sniffing on the segment I see a
    massive stream of ICMP TTL exceeded messages being returned by the
    "mystery node". The topology is definitely loop-free and the
    "loop-ish" behavior that we see only seems to occur when data is
    transmitted to unreachable destinations.

    I assume by forwarding loop you mean an Ethernet loop? I would agree
    that it behaves this way in some respects. Of course, if I had a
    genuine loop the problems would be more serious and would occur
    regardless of routed traffic (the Ethernet topology with a handful of
    hosts would cripple itself).

    Also interesting: the node returning the TTL exceeded "storm" lives
    behind a link with a maximum synchronous capacity of 10M. The "storm"
    itself results in 10M of traffic pushing consistently over all ports
    where the VLAN lives. It thus only cripples other devices with a
    10M maximum synchronous bandwidth.

    Thanks!

    -FC

  • No.3 | | 5204 bytes | |

    frances,

    to mitigate the problem while diagnosing
    you could configure a firewall that discards
    traffic from non-local-subnet sources.

    but let's focus on the loop:
    what is the MAC address of the mystery node?

    /hannes
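
The mitigation suggested above (discard traffic whose source address is not in a locally attached subnet) comes down to a membership test on the source IP. A minimal Python sketch of that test follows. It is not Junos firewall syntax; the prefixes are documentation ranges chosen as placeholders, and `LOCAL_SUBNETS` and `filter_action` are hypothetical names used only for illustration.

```python
import ipaddress

# Assumption: placeholder prefixes standing in for the subnets that are
# actually configured on the LAN-facing logical interface.
LOCAL_SUBNETS = ["192.0.2.0/24", "198.51.100.0/24"]


def is_local_source(src_ip, local_subnets):
    """Return True if src_ip falls inside any locally attached subnet."""
    addr = ipaddress.ip_address(src_ip)
    return any(addr in ipaddress.ip_network(net) for net in local_subnets)


def filter_action(src_ip, local_subnets=LOCAL_SUBNETS):
    """Decide what an ingress filter on the LAN interface would do.

    Traffic arriving from the LAN with a source outside the local
    subnets is treated as spoofed or misrouted and is discarded.
    """
    return "accept" if is_local_source(src_ip, local_subnets) else "discard"


if __name__ == "__main__":
    for ip in ("192.0.2.10", "198.51.100.77", "203.0.113.5"):
        print(ip, filter_action(ip))
```

A filter like this only helps while diagnosing; it does not explain why the traffic is bouncing in the first place.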

  • No.4 | | 5785 bytes | |

    The issue can thus far be mitigated (believe it or not) by filtering
    ICMP to and from the "mystery node", or by filtering ICMP to and from
    every network on interface "A". I'm in possession of the MAC of the
    "mystery node" and I know exactly where it lives on the network, but
    it doesn't seem to correspond to anything odd, and I haven't
    identified anything quirky about the network configuration. What else
    should I be keeping an eye out for?

    -FC

  • No.5 | | 7624 bytes | |

    frances,

    question 1: what is the MAC address of the device that
    generates the 10MBit/s worth of traffic?

    question 2: is your juniper router the only exit for your traffic?

    question 3: could it be that there are hidden backdoor(s)?

    question 4: what traffic is being looped / unicast / broadcast?

    question 5: what is the destination MAC address of the looped traffic
    (broadcast address / unicast address of the router)?

    /hannes

  • No.6 | | 12696 bytes | |

    Hi Hannes,

    Some of these questions are easier to answer than others.

    On 5/17/06, Hannes Gredler <hannes (AT) juniper (DOT) net> wrote:
    >frances,
    >
    >question 1: what is the MAC address of the device that
    >generates the 10MBit/s worth of traffic?

    I'm not sure there's a single device responsible for this; it sort of
    looks as if there are at least two culprits. The two being examined
    at the moment are both consumer grade routing appliances. I've since
    determined more about what is occurring with these culprit devices:

    * These routers transmit bogus traffic to destinations located in
    the same subnet with spoofed source IPs but real source MACs.

    * Unicast is flooded because the ARP timeout exceeds the CAM table
    timeout. The CAM table never learns the MAC of the "target" device
    because that device is discarding all of this traffic and not
    generating any traffic of its own (at the time this occurs -- the
    behavior is not constant).

    * Some of the destination IPs generate no traffic during certain
    periods of the day.

    * The traffic the culprit devices transmit to other devices in the
    broadcast domain will never meet the requirements of a typical
    iptables or equivalent implementation so the traffic is quietly
    dropped.
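
The ARP/CAM timer mismatch in the list above can be made concrete with a small classification function. This is a sketch under simplifying assumptions: both the sender's ARP entry and the switch's CAM entry are taken to have been refreshed at the moment the target host last transmitted, and the timer values in the example are arbitrary illustrations, not vendor defaults. `forwarding_state` is a hypothetical name.

```python
def forwarding_state(idle_s, arp_timeout_s, cam_aging_s):
    """Classify how a frame is handled when the destination host has
    been silent for idle_s seconds.

    "unicast" : the sender knows the MAC and the switch knows the port.
    "flooded" : the sender still has the MAC cached, but the switch has
                aged the entry out, so the frame goes to every port.
    "re-arp"  : the sender's ARP entry has expired; it must ARP again,
                and a reply would let the switch relearn the port.
    """
    arp_valid = idle_s <= arp_timeout_s
    cam_valid = idle_s <= cam_aging_s
    if not arp_valid:
        return "re-arp"
    if cam_valid:
        return "unicast"
    return "flooded"


if __name__ == "__main__":
    # Illustrative values only: ARP cached for 1200 s, CAM aged at 300 s.
    for idle in (100, 600, 1500):
        print(idle, forwarding_state(idle, 1200, 300))
    # A device that never re-ARPs behaves like an infinite ARP timeout.
    print(forwarding_state(10**6, float("inf"), 300))
```

Whenever the ARP timeout is longer than the CAM aging time, there is a window of silence during which every frame to that host is flooded. A device that remembers the MAC indefinitely keeps that window open until the host speaks again.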

    Net result? Bogus traffic is broadcast all over the place because
    the switching infrastructure never has a cause to learn the MAC(s) the
    culprit routers are trying to reach. The culprit routers don't ARP
    for it, they just remember the destination MAC, and the switches
    dutifully flood the unicast frames in hopes of identifying the
    legitimate destination MAC from a hypothetical return stream of
    traffic. This never happens, so these bursts of illegitimate traffic
    occur until someone generates traffic from behind a target device.
    Then the switching infrastructure learns the MAC and voila, the
    unicast traffic stops getting flooded all over the place.
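
The net result described above can be reproduced with a toy learning switch. The following Python sketch assumes a single switch, a fixed aging time, and learning on source MAC only; the class name, port numbers, and MAC labels are hypothetical and stand in for the real infrastructure.

```python
class LearningSwitch:
    """Toy model of MAC learning with aging on one VLAN."""

    def __init__(self, ports, aging_seconds):
        self.ports = set(ports)
        self.aging = aging_seconds
        self.table = {}  # mac -> (port, time last seen as a source)

    def receive(self, now, in_port, src_mac, dst_mac):
        """Process one frame and return the set of egress ports."""
        # Learn (or refresh) the sender's location.
        self.table[src_mac] = (in_port, now)

        entry = self.table.get(dst_mac)
        if entry is not None and now - entry[1] <= self.aging:
            return {entry[0]}  # known destination: forward to one port

        # Unknown or aged-out destination: flood to all other ports.
        if entry is not None:
            del self.table[dst_mac]
        return self.ports - {in_port}


if __name__ == "__main__":
    sw = LearningSwitch(ports=[1, 2, 3, 4], aging_seconds=300)
    print(sw.receive(0, 1, "router", "target"))    # target unknown: flooded
    print(sw.receive(10, 3, "target", "router"))   # target speaks once
    print(sw.receive(20, 1, "router", "target"))   # now a single port
    print(sw.receive(400, 1, "router", "target"))  # entry aged out: flooded
```

As long as the target stays silent, every frame addressed to it reaches every port in the VLAN. One frame from the target is enough to stop the flooding until the entry ages out again.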

    >question 2: is your juniper router the only exit for your traffic?

    Indeed it is.

    >question 3: could it be that there are hidden backdoor(s)?

    As in loop-ish cross-connections "behind" our infrastructure?
    Possible, but unlikely.

    >question 4: what traffic is being looped / unicast / broadcast?

    What's known about that traffic is largely articulated in the answer
    to question 1, though if you've got more questions about that traffic
    specifically I can probably find more answers.

    >question 5: what is the destination MAC address of the looped traffic
    >(broadcast address / unicast address of the router)?

    Also covered largely in the answer to question 1, but to expand on
    this a bit, there are two distinct behaviors. I'll call one
    "weirdness" and the other "high weirdness". In the case of high
    weirdness, here's what happens to the best of my ability to tell:

    - Legitimate ICMP is transmitted from outside source and arrives at router.
    - Router figures packet should egress to directly connected network
    via specific logical interface (makes certain the filter criteria are
    met, etc.).
    - Router finds the destination address in the ARP table and fires
    off a frame into the "Ethernet cloud" with the destination MAC culled
    from the ARP table.
    - The switches haven't heard a frame from the device corresponding
    with the destination MAC for a while and have forgotten the
    destination MAC, so they flood the frames.
    - Naughty routers (two of them) hear the frames and get in on the
    action. They spoof the source IP of the router (!!) and transmit
    massive amounts of ICMP to the node which the router is also trying to
    transmit to.
    - None of this traffic warrants a response from the target node or
    the equipment behind it -- it's a firewall silently discarding
    unwanted traffic. So we still don't know how to get to this MAC
    without flooding.
    - Since these naughty routers are spoofing the IP of the real
    gateway but never ARP'ing for it, lots of routers are receiving
    flooded unicast frames which they believe they shouldn't be receiving
    and which they believe came from the real gateway. They send the
    gateway ICMP redirect host messages (redirecting it to itself).
    - For each ICMP echo that goes in, dozens of ICMP messages with
    different purposes come out.
    - Some of these packets are getting their TTL decremented (the only
    thing that slows the situation down) but others are not. Give it a
    good thirty seconds and you have a storm. If you stop the
    introduction of ICMP to the network, the TTL will decrement on enough
    of these packets to calm the situation.
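
The last two steps above (packets multiplying while TTL slowly drains them) can be illustrated with a deliberately crude model. The Python sketch below assumes every in-flight packet is re-emitted once per round as `fan_out` copies with its TTL reduced by one, and that packets whose TTL would reach zero are dropped. The function name, the round-based timing, and all numbers are assumptions for illustration, not measurements from this network.

```python
def simulate(rounds_injecting, rounds_total, inject_per_round,
             initial_ttl, fan_out=1):
    """Return the number of packets in flight after each round.

    in_flight[t] holds the count of packets whose current TTL is t.
    """
    in_flight = [0] * (initial_ttl + 1)
    history = []
    for r in range(rounds_total):
        nxt = [0] * (initial_ttl + 1)
        # Each surviving packet is re-sent as fan_out copies with TTL - 1.
        # Packets at TTL 1 would reach 0 and are dropped.
        for ttl in range(2, initial_ttl + 1):
            nxt[ttl - 1] = in_flight[ttl] * fan_out
        # New echoes enter the segment only while the ping is running.
        if r < rounds_injecting:
            nxt[initial_ttl] += inject_per_round
        in_flight = nxt
        history.append(sum(in_flight))
    return history


if __name__ == "__main__":
    # One new packet per round for 3 rounds, no amplification.
    print(simulate(3, 8, 1, 4))           # [1, 2, 3, 3, 2, 1, 0, 0]
    # A single packet that doubles each round until its TTL runs out.
    print(simulate(1, 5, 1, 3, fan_out=2))  # [1, 2, 4, 0, 0]
```

In this model the load grows while packets are injected and drains within `initial_ttl` rounds once injection stops, which matches the observation that killing the ping usually calms things down. Packets whose TTL is never decremented are outside the model and would keep circulating.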

    In the case of weirdness, we have a much less severe version of the
    situation outlined above, wherein lots of routers are getting frames
    that don't belong to them because of the ARP/CAM synchronization
    issue, but it doesn't get out of control because the two very naughty
    nodes don't get involved and the TTL decrements as it should.

    The other issue with the TTL exceeded messages coming back on a
    different logical interface is a little bit of a red herring - still
    interesting, but the situation above seems to be the elephant in the
    living room here.

    Let me know if you have thoughts, and thank you for your time and
    consideration.

    -FC

    /hannes

    Frances Albemuth wrote:
    The issue can thus far be mitigated (believe it or not) by filtering
    ICMP to and from the "mystery node", or by filtering ICMP to and from
    every network on interface "A". I'm in possession of the MAC of the
    "mystery node" and I know exactly where it lives on the network, but
    it doesn't seem to correspond oddly with anything and I haven't
    identified anything quirky about the network configuration. What else
    should I be keeping an eye out for?

    -FC

    On 5/17/06, Hannes Gredler <hannes (AT) juniper (DOT) net> wrote:
    >
    >frances,
    >
    >to mitigate the problem while diagnosing
    >you could configure a firewall that discards
    >traffic from non-local-subnet sources.
    >
    >but let's focus on the loop:
    >what is the MAC address of the mystery node?
    >
    >/hannes
    >
    >Frances Albemuth wrote:
    >Hi Hannes,
    >
    >Thanks for your response. When I'm sniffing on the segment I see a
    >massive stream of ICMP TTL exceeded messages being returned by the
    >"mystery node". The topology is definitely loop-free and the
    >"loop-ish" behavior that we see only seems to occur when data is
    >transmitted to unreachable destinations.
    >
    >I assume by forwarding loop you mean an Ethernet loop? I would agree
    >that it behaves this way in some respects. Of course, if I had a
    >genuine loop the problems would be more serious and would occur
    >regardless of routed traffic (the Ethernet topology with a handful of
    >hosts would cripple itself).
    >
    >Also interesting: the node returning the TTL exceeded "storm" lives
    >behind a link with a maximum synchronous capacity of 10M. The "storm"
    >itself results in 10M of traffic pushing consistently over all ports
    >where the VLAN lives. It thus only cripples other devices with a
    >10M maximum synchronous bandwidth.
    >
    >Thanks!
    >
    >-FC
    >
    >On 5/16/06, Hannes Gredler <hannes (AT) juniper (DOT) net> wrote:
    >>
    >>frances,
    >>
    >>looks like you have a forwarding loop in your setup;
    >>
    >>for further troubleshooting attach a packet-sniffer to
    >>the subnet in question and look for the source MAC address
    >>that is bouncing back your traffic.
    >>
    >>/hannes
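Editorial sketch of the mitigation Hannes suggests above (discard traffic whose source is not in a local subnet). This shows only the matching logic in Python, not router configuration. The prefixes are documentation addresses chosen for illustration, and `accept_source` is a hypothetical name, not something from the thread.

```python
import ipaddress

# Hypothetical prefixes assumed to be configured on the router's
# LAN-facing logical interface (documentation ranges, not real data).
LOCAL_SUBNETS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/25"),
]


def accept_source(src_ip, local_subnets=LOCAL_SUBNETS):
    """Return True if a packet arriving from the LAN has a source
    address inside one of the locally attached subnets."""
    addr = ipaddress.ip_address(src_ip)
    return any(addr in net for net in local_subnets)


def filter_packets(source_ips):
    """Split observed source addresses into (accepted, discarded)."""
    accepted = [ip for ip in source_ips if accept_source(ip)]
    discarded = [ip for ip in source_ips if not accept_source(ip)]
    return accepted, discarded
```

Note that a check like this would not catch frames that spoof an address inside the local subnet, so it is a stopgap while diagnosing rather than a fix.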


    juniper-nsp mailing list juniper-nsp (AT) puck (DOT) nether.net
  • No.7 | | 16043 bytes | |

    Input is absolutely welcome, late or not. Thanks for offering your thoughts.

    I don't have administrative control over the "naughty" routers, but
    their behavior does seem to correlate with what little I know about
    proxy ARP. They don't specifically support a proxy ARP feature,
    though -- they are consumer-grade router/firewall devices. I have
    one in a lab now and am able to reproduce the behavior, so I should
    be able to answer questions regarding how they function.

    Static mapping of MAC addresses is a possibility and would certainly
    mitigate unicast flooding, but the task of implementing this approach
    in all appropriate cases would be so arduous as to be prohibitive.
    I've experimented extensively with adjusting CAM table timeouts to
    match ARP expiration, but I have only been partially successful
    thus far.
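Editorial note: the timer tuning described above can be stated as a small check. This is a sketch under a simplifying assumption that is not taken from the thread: the target host transmits only when it answers an ARP request, so the switch relearns its MAC once per ARP lifetime. The function names are hypothetical.

```python
def flooding_window(arp_timeout_s, cam_aging_s):
    """Seconds per ARP lifetime during which the switch has aged out
    the silent host's MAC while the router still holds its ARP entry.
    Frames sent to the host in that window are flooded."""
    return max(0, arp_timeout_s - cam_aging_s)


def timers_aligned(arp_timeout_s, cam_aging_s):
    """True when the CAM entry outlives (or matches) the ARP entry,
    so every ARP refresh re-teaches the switch before it forgets."""
    return flooding_window(arp_timeout_s, cam_aging_s) == 0
```

Under that assumption, raising the CAM aging time to at least the ARP timeout closes the window. Hosts that never answer ARP at all, or senders that never re-ARP, fall outside this model, which may be part of why the tuning was only partially successful.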

    -FC

    On 5/19/06, Harry Reynolds <harry (AT) juniper (DOT) net> wrote:
    Butting in late and too lazy to completely digest this thread now.

    After a quick glance:

    I wonder if you have proxy arp enabled on the "naughty" routers, and if
    so, whether turning it off might help mitigate? You mention they send
    ICMP to the target node, and I think proxy ARP would generate ARP so
    perhaps not.

    Also, any chance of putting a manual/static mac entry in the switches
    that are flooding?

    Regards
    -----Original Message-----
    From: juniper-nsp-bounces (AT) puck (DOT) nether.net
    [mailto:juniper-nsp-bounces (AT) puck (DOT) nether.net] On Behalf Of
    Frances Albemuth
    Sent: Friday, May 19, 2006 9:23 AM
    To: Hannes Gredler
    Cc: juniper-nsp (AT) puck (DOT) nether.net
    Subject: Re: [j-nsp] Strange behavior on directly connected
    interfaces?

    Hi Hannes,

    Some of these questions are easier to answer than others.

    On 5/17/06, Hannes Gredler <hannes (AT) juniper (DOT) net> wrote:
    frances,

    question 1: what is the MAC address of the device that
    generates the 10MBit/s worth of traffic?
    --
    I'm not sure there's a single device responsible for this;
    it sort of looks as if there are at least two culprits. The
    two being examined at the moment are both consumer grade
    routing appliances. I've since determined more about what is
    occurring with these culprit devices:

    * These routers transmit bogus traffic to destinations
    located in the same subnet with spoofed source IPs but real
    source MACs.

    * Unicast is flooded because the ARP timeout exceeds the
    CAM table timeout. The CAM table never learns the MAC of the
    "target" device because that device is discarding all of this
    traffic and not generating any traffic of its own (at the
    time this occurs -- the behavior is not constant).

    * Some of the destination IP's generate no traffic during
    certain periods of the day.

    * The traffic the culprit devices transmit to other devices
    in the broadcast domain will never meet the requirements of a
    typical iptables or equivalent implementation so the traffic
    is quietly dropped.

    Net result? Bogus traffic is broadcast all over the place
    because the switching infrastructure never has a cause to
    learn the MAC(s) the culprit routers are trying to reach.
    The culprit routers don't ARP for it, they just remember the
    destination MAC, and the switches dutifully flood the unicast
    frames in hopes of identifying the legitimate destination MAC
    from a hypothetical return stream of traffic. This never
    happens, so these bursts of illegitimate traffic occur until
    someone generates traffic from behind a target device.
    Then the switching infrastructure learns the MAC and voila,
    the unicast traffic stops getting flooded all over the place.
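Editorial sketch of the mechanism described above: a toy learning switch whose CAM entries age out faster than the router's ARP entry. Everything here is an assumption for illustration, not data from the thread: the class and function names are hypothetical, the timer values are illustrative rather than vendor defaults, and the target is modelled as transmitting only when it answers an ARP request.

```python
from dataclasses import dataclass, field


@dataclass
class LearningSwitch:
    """Toy learning switch: remembers a MAC for cam_aging seconds."""
    cam_aging: float
    cam: dict = field(default_factory=dict)  # MAC -> (port, last seen)

    def _expire(self, now):
        # Drop entries that have not been refreshed within cam_aging.
        self.cam = {mac: (port, seen)
                    for mac, (port, seen) in self.cam.items()
                    if now - seen < self.cam_aging}

    def forward(self, now, src_mac, dst_mac, in_port):
        """Learn the source MAC, then report how the frame is handled."""
        self._expire(now)
        self.cam[src_mac] = (in_port, now)
        return "unicast" if dst_mac in self.cam else "flood"


def simulate(cam_aging, arp_timeout, duration, send_every=10.0):
    """Router sends a frame to a silent target every send_every seconds.
    The target only transmits when the router re-ARPs for it."""
    sw = LearningSwitch(cam_aging)
    sw.forward(0.0, "target", "router", in_port=2)  # initial ARP reply
    last_arp = 0.0
    results = []
    t = send_every
    while t <= duration:
        if t - last_arp >= arp_timeout:
            # ARP entry expired: router re-ARPs, target replies,
            # and the switch relearns the target's MAC.
            sw.forward(t, "target", "router", in_port=2)
            last_arp = t
        results.append((t, sw.forward(t, "router", "target", in_port=1)))
        t += send_every
    return results


if __name__ == "__main__":
    for cam in (300, 1500):
        r = simulate(cam_aging=cam, arp_timeout=1200, duration=1200)
        flooded = sum(1 for _, action in r if action == "flood")
        print(f"cam_aging={cam}s: {flooded} of {len(r)} frames flooded")
```

With a 300-second CAM aging time and a 1200-second ARP timeout, 90 of the 120 frames in this model are flooded; with the CAM aging time raised above the ARP timeout, none are.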

    question 2: is your juniper router the only exit for your traffic?
    --
    Indeed it is.

    question 3: could it be that there are hidden backdoor(s)?
    --
    As in loop-ish cross-connections "behind" our infrastructure?
    Possible, but unlikely.

    question 4: what traffic is being looped / unicast / broadcast?
    --
    What's known about that traffic is largely articulated in
    the answer to question 1, though if you've got more questions
    about that traffic specifically I can probably find more answers.

    question 5: what is the destination MAC address of the looped traffic
    (broadcast address / unicast address of the router)?
    --
    Also covered largely in the answer to question 1, but to
    expand on this a bit, there are two distinct behaviors. I'll
    call one "weirdness" and the other "high weirdness". In the
    case of high weirdness, here's what happens to the best of my
    ability to tell:

    - Legitimate ICMP is transmitted from outside source and
    arrives at router.
    - Router figures packet should egress to directly connected
    network via specific logical interface (makes certain filter
    criteria are met, etc.).
    - Router finds the destination address in the ARP table and
    fires off a frame into the "Ethernet cloud" with the
    destination MAC culled from the ARP table.
    - The switches haven't heard a frame from the device
    corresponding with the destination MAC for a while and have
    forgotten the destination MAC, so they flood the frames.
    - Naughty routers (two of them) hear the frames and get in
    on the action. They spoof the source IP of the router (!!)
    and transmit massive amounts of ICMP to the node which the
    router is also trying to transmit to.
    - None of this traffic warrants a response from the target
    node or the equipment behind it -- it's a firewall silently
    discarding unwanted traffic. So we still don't know how to
    get to this MAC without flooding.
    - Since these naughty routers are spoofing the IP of the
    real gateway but never ARP'ing for it, lots of routers are
    receiving flooded unicast frames which they believe they
    shouldn't be receiving and which they believe came from the
    real gateway. They send the gateway ICMP redirect host
    messages (redirecting it to itself).
    - For each ICMP echo that goes in, dozens of ICMP messages
    with different purposes come out.
    - Some of these packets are getting their TTL decremented
    (the only thing that slows the situation down) but others are
    not. Give it a good thirty seconds and you have a storm.
    If you stop the introduction of ICMP to the network,
    the TTL will decrement on enough of these packets to calm the
    situation.

    In the case of weirdness, we have a much less severe
    version of the situation outlined above, wherein lots of
    routers are getting frames that don't belong to them because
    of the ARP/CAM synchronization issue, but it doesn't get out
    of control because the two very naughty nodes don't get
    involved and the TTL decrements as it should.

    The other issue with the TTL exceeded messages coming back
    on a different logical interface is a little bit of a red
    herring - still interesting, but the situation above seems to
    be the elephant in the living room here.

    Let me know if you have thoughts, and thank you for your
    time and consideration.

    -FC


    juniper-nsp mailing list juniper-nsp (AT) puck (DOT) nether.net
  • No.8 | | 321 bytes | |

    A quick Juniper-specific question related to diagnostic efforts on this topic:

    Is there a means for viewing the age of an ARP via the CLI (and if
    so could someone point me in the right direction)?

    Thanks,

    -FC

    juniper-nsp mailing list juniper-nsp (AT) puck (DOT) nether.net
  • No.9 | | 766 bytes | |

    Frances Albemuth wrote:
    >Is there a means for viewing the age of an ARP via the CLI (and if
    >so could someone point me in the right direction)?

    This exists as a feature request on the JuniperClue wiki; as far as
    I'm aware this enhancement has not been made.

    Regards,
    Rob
    --
    Rob Shakir - <rob (AT) catalyst2 (DOT) net>
    Technical Manager - Catalyst2 Services Ltd.
    PGP Key ID: 0xC07E6DEB / RIPE: RJS-RIPE

    juniper-nsp mailing list juniper-nsp (AT) puck (DOT) nether.net
  • No.10 | | 922 bytes | |

    Thanks for the quick answer! Alas, it was not the answer I was hoping for.

    -FC


    juniper-nsp mailing list juniper-nsp (AT) puck (DOT) nether.net
