On Thursday, 05/17/2012 at 12:17 EDT, "Levy, Alan" <ALevy@doit...> wrote:
> We have a problem with a Linux server: when we reboot it, it takes at least
> 15 minutes before we are able to ssh into it.
>
> It seems, from our network traces, that when the server reboots, it has
> 2 MAC addresses in the ARP tables on the firewall (one from before and one
> from after the reboot).
>
> We cannot ping or ssh into the server after the reboot. Yesterday, we
> rebooted it. The network group ran traces, captured packets, etc., and we
> found that if we pinged the network gateway address from this server,
> everything started working again.
>
> Has anyone seen this before?
Switches (not hubs) have what is called a "filtering database". This
database contains (MAC,port,VLAN) tuples. It is populated by the switch
as it learns which end stations are plugged into which ports. This allows
switches to avoid sending unicast ethernet frames to ports whose end
stations don't care.
How does it learn? It looks at the SOURCE MAC addresses on inbound
frames. If a TARGET MAC address & VLAN is in its database, it will
forward the frame to the correct port. If not, it is supposed to (if not
administratively prevented from doing so) copy the frame to all ports. If
a MAC moves to a different port, the switch will forward to the wrong
port, until a frame comes into the new port. The frame that comes in is
usually the gratuitous ARP broadcast response ("grat-ARP") that IP stacks
send when an interface is activated. It serves two purposes: (1) It
populates/updates each neighbor's ARP cache, and (2) it updates the
switch's filtering database. (Switches can be turned into expensive hubs
by turning off learning mode entirely.)
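To make the learning and forwarding rules concrete, here is a toy model of a filtering database keyed by (MAC, VLAN). It is only a sketch: the class names, port names and MAC values are hypothetical, aging is left out, and real switches do all of this in hardware. It shows the failure mode described above: after a MAC moves, frames keep going to the old port until a frame (here, the grat-ARP broadcast) arrives from the new one.

```python
from dataclasses import dataclass, field

BROADCAST = "ff:ff:ff:ff:ff:ff"


@dataclass
class Frame:
    src: str   # source MAC address
    dst: str   # destination (target) MAC address
    vlan: int


@dataclass
class LearningSwitch:
    ports: list[str]
    # Filtering database: (MAC, VLAN) -> port the MAC was last seen on.
    fdb: dict[tuple[str, int], str] = field(default_factory=dict)

    def receive(self, frame: Frame, in_port: str) -> list[str]:
        """Learn from the frame's source MAC, then return the egress port(s)."""
        # Learning: (re)bind the source MAC to the port the frame arrived on.
        self.fdb[(frame.src, frame.vlan)] = in_port
        out = self.fdb.get((frame.dst, frame.vlan))
        if out is None:
            # Unknown (or broadcast) destination: copy to every other port.
            return [p for p in self.ports if p != in_port]
        if out == in_port:
            # Destination lives behind the ingress port: filter the frame.
            return []
        return [out]


sw = LearningSwitch(ports=["p1", "p2", "p3"])
GW, SERVER = "00:00:5e:00:01:01", "02:00:00:00:00:0a"   # hypothetical MACs
sw.receive(Frame(SERVER, GW, 10), "p1")         # SERVER learned on p1
# SERVER moves to p2 but has not sent anything yet:
print(sw.receive(Frame(GW, SERVER, 10), "p3"))  # ['p1'] (stale port)
# SERVER's grat-ARP broadcast arrives on p2 and updates the database:
sw.receive(Frame(SERVER, BROADCAST, 10), "p2")
print(sw.receive(Frame(GW, SERVER, 10), "p3"))  # ['p2']
```

Because the broadcast address is never a source MAC, it is never learned, so broadcasts always take the flooding path in this model.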
Database entries, like ARP caches, "age out" after a period of time. IEEE
802.1D says 5 minutes (300 seconds) should be the default.
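A minimal sketch of how aging might work in a filtering database, assuming the 300-second default. The class and method names are hypothetical, and the clock is injectable so the example can be run without waiting. Note that only learning (a frame arriving *from* the MAC) refreshes an entry; looking it up does not.

```python
import time

AGEING_TIME_S = 300.0  # IEEE 802.1D recommended default: 300 s (5 minutes)


class AgingFdb:
    def __init__(self, ageing_time_s=AGEING_TIME_S, clock=time.monotonic):
        self.ageing_time_s = ageing_time_s
        self.clock = clock
        self._entries = {}  # (mac, vlan) -> (port, time last seen as a source)

    def learn(self, mac, vlan, port):
        # Called for every inbound frame: refreshes the entry's timestamp.
        self._entries[(mac, vlan)] = (port, self.clock())

    def lookup(self, mac, vlan):
        entry = self._entries.get((mac, vlan))
        if entry is None:
            return None  # unknown: caller floods
        port, last_seen = entry
        if self.clock() - last_seen > self.ageing_time_s:
            del self._entries[(mac, vlan)]
            return None  # aged out: caller floods
        return port


now = [0.0]  # fake clock for the example
fdb = AgingFdb(clock=lambda: now[0])
fdb.learn("02:00:00:00:00:0a", 10, "p1")
now[0] = 299.0
print(fdb.lookup("02:00:00:00:00:0a", 10))  # p1
now[0] = 301.0
print(fdb.lookup("02:00:00:00:00:0a", 10))  # None
```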
On a layer 3 (IP) VSWITCH, CP will issue the "grat ARPs" for each guest if
there is an OSA failover. On a layer 2 (ETHERNET) VSWITCH, CP will
remember the last grat-ARP the guest has sent for IPTIMEOUT minutes.
WARNING WARNING: IPTIMEOUT needs to be at least as large as the
switch's forwarding entry "age out" setting. If the switch value is higher
than IPTIMEOUT, then an OSA failover event will result in frames being
sent to the bad OSA port, since CP won't have any grat-ARP information to
use (unless the port went dark, in which case all of the forwarding
entries related to that port should be purged).
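The timing rule can be checked with a little arithmetic. The sketch below is hypothetical and simplified: it assumes the guest's last outbound frame was a grat-ARP at t = 0 and that the old OSA port stayed up (so the switch did not purge its entries). Under those assumptions a failover at time t is safe while CP still remembers the grat-ARP, misdirected while only the switch still remembers the old port, and flooded (then relearned) once the switch entry ages out.

```python
def failover_outcome(t_s: float, iptimeout_min: float, ageout_s: float) -> str:
    """Classify inbound traffic after an OSA failover at t_s seconds.

    Hypothetical model: the guest's last outbound frame was a grat-ARP at
    t = 0, and the old port stayed up, so no forwarding entries were purged.
    """
    if t_s < iptimeout_min * 60:
        return "ok: CP re-sends the remembered grat-ARP"
    if t_s < ageout_s:
        return "stale: switch forwards to the old OSA port"
    return "flood: entry aged out, switch floods until it relearns"


def stale_window_s(iptimeout_min: float, ageout_s: float) -> float:
    """Length of the interval in which a failover sends frames to the bad port."""
    return max(0.0, ageout_s - iptimeout_min * 60)


# IPTIMEOUT = 2 minutes against a 300 s switch age-out leaves a 180 s gap.
print(stale_window_s(2, 300))
print(failover_outcome(200, 2, 300))
```

With IPTIMEOUT raised to 5 minutes against the same 300-second age-out, the window shrinks to zero, which is the "at least as large" rule above.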
You can see this happening when PING won't work until you "ping out" from
the guest. The switch sees the MAC come in and updates its database. But
if there hasn't been a failover event with mismatched timeout/age-out
values, then I would look at the switch to see what's in its filtering
database. If that looks OK, then I would get VSWITCH data traces. If
those look OK too, then I would suspect the physical switch of misbehaving.
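The "ping out" works because any frame the guest sends carries its source MAC, which is what the switch learns from; the grat-ARP is the frame the IP stack sends for this purpose on its own. The sketch below lays out such a frame byte by byte. It assumes an Ethernet/IPv4 ARP reply (opcode 2) with the target hardware address set to broadcast, which is one of several conventions in use; the function names, MAC and IP (from the 192.0.2.0/24 documentation range) are hypothetical.

```python
import socket
import struct

BROADCAST = b"\xff" * 6


def mac_bytes(mac: str) -> bytes:
    """Convert 'aa:bb:cc:dd:ee:ff' to 6 raw bytes."""
    return bytes(int(part, 16) for part in mac.split(":"))


def gratuitous_arp_reply(mac: str, ip: str) -> bytes:
    """Build an Ethernet frame carrying a gratuitous ARP reply.

    The sender and target protocol addresses are both the announced IP.
    """
    src = mac_bytes(mac)
    ipb = socket.inet_aton(ip)
    # Ethernet header: broadcast destination, our source MAC, EtherType ARP.
    eth = BROADCAST + src + struct.pack("!H", 0x0806)
    # ARP header: htype Ethernet, ptype IPv4, hlen 6, plen 4, opcode 2 (reply).
    arp = struct.pack("!HHBBH", 1, 0x0800, 6, 4, 2)
    # SHA, SPA, THA (broadcast here; implementations vary), TPA.
    arp += src + ipb + BROADCAST + ipb
    return eth + arp


frame = gratuitous_arp_reply("02:00:00:00:00:0a", "192.0.2.10")
print(len(frame), frame.hex())
```

The 42-byte result is below Ethernet's 60-byte minimum (excluding FCS); the NIC or driver normally pads it. On Linux, iputils `arping -U` (or `-A` for the reply form) sends unsolicited ARPs without hand-building frames.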
Alan Altmark
Senior Managing z/VM and Linux Consultant
IBM System Lab Services and Training
ibm.com/systems/services/labservices
office: 607.429.3323
mobile: 607.321.7556
Alan_Altmark@us.i...
IBM Endicott
----------------------------------------------------------------------
For LINUX-390 subscribe / signoff / archive access instructions,
send email to
LISTSERV@VM.M... with the message: INFO LINUX-390 or visit
http://www.marist.edu/htbin/wlvindex?LINUX-390
----------------------------------------------------------------------
For more information on Linux on System z, visit
http://wiki.linuxvm.org/