
Re: [Xen-users] using ipoib with xcp



On Fri, Apr 16, 2010 at 01:14:46PM +0200, Trygve Sanne Hardersen wrote:
>    I've looked a bit into the Open vSwitch source code and it seems to me
>    like MAC addresses can only be 6 bytes, but the IB addresses are 20 bytes.
>    I'm also seeing this in the Open vSwitch log:
>    |00043|bridge|INFO|created port ib0 on bridge brib0
>    |00044|dpif|WARN|dp0: failed to add ib0 as port: Invalid argument
>    |00045|bridge|ERR|failed to add ib0 interface to dp0: Invalid argument
>    |00046|bridge|ERR|ib0 interface not in dp0, dropping
>    |00047|bridge|ERR|ib0 port has no interfaces, dropping
>    I've tried to report this on the Open vSwitch discuss list, but my
>    messages do not seem to get through.
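The log excerpt quoted above already isolates the failure: adding ib0 to the datapath is rejected with "Invalid argument". A minimal sketch for pulling such entries out of a longer log, assuming every line follows the `<prefix>|<sequence>|<module>|<level>|<message>` layout seen in the excerpt (the function name is made up for illustration):

```shell
# ovs_log_problems: read Open vSwitch log lines on stdin and print the
# module and message of every WARN or ERR entry.
# Assumed layout (from the excerpt in this thread):
#   <prefix>|<sequence>|<module>|<level>|<message>
ovs_log_problems() {
    awk -F'|' '$4 == "WARN" || $4 == "ERR" { print $3 ": " $5 }'
}
```

Feed it the vswitchd log on stdin; the log file location depends on the installation and is not given in this thread.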

Did you subscribe to the list? 

-- Pasi

>    Thanks!
>    Trygve
>    On Wed, Apr 14, 2010 at 6:23 PM, Trygve Sanne Hardersen
>    <[1]trygve@xxxxxxxxxxxxx> wrote:
> 
>      I've finally got to spend some time looking further into this.
>      I now believe the underlying problem is that Open vSwitch is unable to
>      connect the brib0 bridge interface to the ib0 physical interface. I
>      suspect the cause of this to be the long MAC address of the Infiniband
>      NICs, but so far I have not found a workaround for the issue.
>      These are the relevant devices for my setup:
>      [root@hypoxcp1 ~]# ifconfig
>      brib0     Link encap:Ethernet  HWaddr 80:00:00:48:FE:80
>                inet addr:10.1.2.2  Bcast:10.1.2.255  Mask:255.255.255.0
>                inet6 addr: fe80::8200:ff:fe48:fe80/64 Scope:Link
>                UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>                RX packets:0 errors:0 dropped:0 overruns:0 frame:0
>                TX packets:12 errors:0 dropped:0 overruns:0 carrier:0
>                collisions:0 txqueuelen:0
>                RX bytes:0 (0.0 b)  TX bytes:720 (720.0 b)
>      eth0      Link encap:Ethernet  HWaddr 00:30:48:CC:5C:A4
>                inet6 addr: fe80::230:48ff:fecc:5ca4/64 Scope:Link
>                UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>                RX packets:31755 errors:0 dropped:0 overruns:0 frame:0
>                TX packets:10544 errors:0 dropped:0 overruns:0 carrier:0
>                collisions:0 txqueuelen:1000
>                RX bytes:4284224 (4.0 MiB)  TX bytes:1433336 (1.3 MiB)
>      ib0       Link encap:InfiniBand  HWaddr 80:00:00:48:FE:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00
>                inet addr:10.1.2.102  Bcast:10.1.2.255  Mask:255.255.255.0
>                UP BROADCAST MULTICAST  MTU:2044  Metric:1
>                RX packets:0 errors:0 dropped:0 overruns:0 frame:0
>                TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
>                collisions:0 txqueuelen:128
>                RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)
>      xenbr0    Link encap:Ethernet  HWaddr 00:30:48:CC:5C:A4
>                inet addr:10.1.1.2  Bcast:10.1.1.255  Mask:255.255.255.0
>                inet6 addr: fe80::230:48ff:fecc:5ca4/64 Scope:Link
>                UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>                RX packets:25892 errors:0 dropped:0 overruns:0 frame:0
>                TX packets:10538 errors:0 dropped:0 overruns:0 carrier:0
>                collisions:0 txqueuelen:0
>                RX bytes:3586527 (3.4 MiB)  TX bytes:1432868 (1.3 MiB)
>      The ifconfig command reports the wrong (or truncated) MAC address for
>      the ib0 device. The real address can be found using other commands:
>      [root@hypoxcp1 ~]# cat /sys/class/net/ib0/address
>      80:00:00:48:fe:80:00:00:00:00:00:00:00:30:48:ff:ff:cc:0b:25
>      [root@hypoxcp1 ~]# ip link show ib0
>      4: ib0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 2044 qdisc pfifo_fast qlen
>      128
>          link/infiniband
>      80:00:00:48:fe:80:00:00:00:00:00:00:00:30:48:ff:ff:cc:0b:25 brd
>      00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff
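The length difference between the two address types can be checked directly. A minimal sketch, assuming addresses are written as colon-separated octets as in the output above (the helper name is hypothetical):

```shell
# hwaddr_octets ADDR: print the number of colon-separated octets in ADDR.
hwaddr_octets() {
    printf '%s\n' "$1" | awk -F: '{ print NF }'
}

# Usage on a live host (interface names taken from this thread):
#   hwaddr_octets "$(cat /sys/class/net/ib0/address)"    # IPoIB address
#   hwaddr_octets "$(cat /sys/class/net/eth0/address)"   # Ethernet address
```

For the addresses shown in this thread it reports 20 octets for ib0 and 6 for eth0.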
>      As mentioned earlier in this thread I've had issues with duplicate MAC
>      addresses in /etc/ovs-vswitchd.conf, but a clean install somehow fixed
>      that issue, so the proper MAC address is now added to the file:
>      [root@hypoxcp1 ~]# cat /etc/ovs-vswitchd.conf
>      bridge.brib0.mac=80:00:00:48:fe:80:00:00:00:00:00:00:00:30:48:ff:ff:cc:0b:25
>      bridge.brib0.port=brib0
>      bridge.brib0.port=ib0
>      bridge.brib0.port=vif1.2
>      bridge.brib0.xs-network-uuids=6455dd7f-4a61-43b8-a49d-656f749c4ac6
>      bridge.xenbr0.mac=00:30:48:cc:5c:a4
>      bridge.xenbr0.port=eth0
>      bridge.xenbr0.port=vif1.1
>      bridge.xenbr0.port=xenbr0
>      bridge.xenbr0.xs-network-uuids=528d85a4-f582-c181-54eb-acf09ac7dcf4
>      bridge.xenbr1.mac=00:30:48:cc:5c:a5
>      bridge.xenbr1.port=eth1
>      bridge.xenbr1.port=vif1.0
>      bridge.xenbr1.port=xenbr1
>      bridge.xenbr1.xs-network-uuids=4f033ff5-5a56-629c-1c27-0765ba7c03bb
>      I'm no expert on XCP and Open vSwitch, but I believe it works something
>      like this:
> 
>       1. XAPI writes /etc/ovs-vswitchd.conf based on the XCP DB
>       2. XAPI starts up Open vSwitch
>       3. Open vSwitch creates the interfaces defined
>          in /etc/ovs-vswitchd.conf
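If the file is what Open vSwitch reads at startup, bridge MAC entries that are not Ethernet-sized can be spotted before the daemon starts. A minimal sketch, assuming the `bridge.<name>.mac=<address>` key format shown in the listing above and that a usable bridge MAC has exactly 6 octets (the function name is hypothetical):

```shell
# check_bridge_macs FILE: report every bridge.<name>.mac entry in FILE
# whose value does not consist of exactly 6 colon-separated octets.
check_bridge_macs() {
    awk -F= '
        $1 ~ /^bridge\.[^.]+\.mac$/ {
            n = split($2, octets, ":")
            if (n != 6) print $1 " has " n " octets"
        }
    ' "$1"
}

# Usage: check_bridge_macs /etc/ovs-vswitchd.conf
```

Run against the configuration quoted above, it would flag only the brib0 entry.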
> 
>      To me it seems like the MAC address for the brib0 interface
>      is truncated, and I believe this causes Open vSwitch to not bind brib0
>      and ib0 together:
>      [root@hypoxcp1 ~]# ovs-ofctl show brib0
>      Apr 14 15:38:31|00001|ofctl|INFO|connecting to unix:/var/run/brib0.mgmt
>      features_reply (xid=0x6bb27f3f): ver:0x97, dpid:32f493d6e290
>      n_tables:2, n_buffers:256
>      features: capabilities:0x17, actions:0x3ff
>       LOCAL(brib0): addr:80:00:00:48:fe:80, config: 0, state:0
>      Apr 14 15:38:31|00002|ofctl|INFO|connecting to unix:/var/run/brib0.mgmt
>      get_config_reply (xid=0x9b99aaf1): miss_send_len=0
>      [root@hypoxcp1 ~]# ovs-ofctl show xenbr0
>      Apr 14 15:38:19|00001|ofctl|INFO|connecting to unix:/var/run/xenbr0.mgmt
>      features_reply (xid=0x836b0867): ver:0x97, dpid:f68bde598f51
>      n_tables:2, n_buffers:256
>      features: capabilities:0x17, actions:0x3ff
>       1(eth0): addr:00:30:48:cc:5c:a4, config: 0, state:0
>           current:    1GB-FD COPPER AUTO_NEG
>           advertised: 10MB-HD 10MB-FD 100MB-HD 100MB-FD 1GB-FD COPPER
>      AUTO_NEG
>           supported:  10MB-HD 10MB-FD 100MB-HD 100MB-FD 1GB-FD COPPER
>      AUTO_NEG
>       LOCAL(xenbr0): addr:00:30:48:cc:5c:a4, config: 0, state:0
>      Apr 14 15:38:19|00002|ofctl|INFO|connecting to unix:/var/run/xenbr0.mgmt
>      get_config_reply (xid=0x2b665ea): miss_send_len=0
>      As you see the binding to ib0 is missing, and the MAC of brib0 is
>      different from that in /etc/ovs-vswitchd.conf.
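The address reported for LOCAL(brib0) looks like the first six octets of the configured 20-octet address. A minimal sketch to confirm that, assuming colon-separated notation (the helper name is hypothetical):

```shell
# hwaddr_prefix ADDR N: print the first N colon-separated octets of ADDR.
hwaddr_prefix() {
    printf '%s\n' "$1" | cut -d: -f"1-$2"
}

# Usage:
#   hwaddr_prefix "$(cat /sys/class/net/ib0/address)" 6
```

Applied to the ib0 address from this thread, the six-octet prefix is 80:00:00:48:fe:80, which matches what ovs-ofctl shows for brib0.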
>      As previously stated I can communicate between XCP hosts on both brib0
>      and ib0 using this setup. The problem is that VIFs on the brib0 network
>      are not reachable. I have the following IB interfaces on a single host:
>      brib0 - [2]10.1.2.2/24
>      ib0 - [3]10.1.2.102/24
>      vif1.2 - [4]10.1.2.202/24
>      From within the VM that uses vif1.2 I try to ping brib0 and ib0 and
>      watch the traffic on the XCP host:
>      [root@hypoxcp1 ~]# tcpdump -i vif1.2
>      tcpdump: WARNING: vif1.2: no IPv4 address assigned
>      tcpdump: verbose output suppressed, use -v or -vv for full protocol
>      decode
>      listening on vif1.2, link-type EN10MB (Ethernet), capture size 96 bytes
>      15:54:59.948660 arp who-has 10.1.2.102 tell 10.1.2.202
>      15:55:00.948643 arp who-has 10.1.2.102 tell 10.1.2.202
>      15:55:01.948645 arp who-has 10.1.2.102 tell 10.1.2.202
>      [root@hypoxcp1 ~]# tcpdump -i brib0
>      tcpdump: verbose output suppressed, use -v or -vv for full protocol
>      decode
>      listening on brib0, link-type EN10MB (Ethernet), capture size 96 bytes
>      15:54:22.612723 arp who-has 10.1.2.102 tell 10.1.2.202
>      15:54:23.612643 arp who-has 10.1.2.102 tell 10.1.2.202
>      15:54:24.612642 arp who-has 10.1.2.102 tell 10.1.2.202
>      [root@hypoxcp1 ~]# tcpdump -i ib0
>      tcpdump: WARNING: arptype 32 not supported by libpcap - falling back to
>      cooked socket
>      tcpdump: verbose output suppressed, use -v or -vv for full protocol
>      decode
>      listening on ib0, link-type LINUX_SLL (Linux cooked), capture size 96
>      bytes
>      The packets never reach ib0.
>      This setup adds the following IP routes to the XCP host:
>      [root@hypoxcp1 ~]# route -n
>      Kernel IP routing table
>      Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
>      10.1.1.0        0.0.0.0         255.255.255.0   U     0      0        0 xenbr0
>      10.1.2.0        0.0.0.0         255.255.255.0   U     0      0        0 ib0
>      10.1.2.0        0.0.0.0         255.255.255.0   U     0      0        0 brib0
>      169.254.0.0     0.0.0.0         255.255.0.0     U     0      0        0 brib0
>      0.0.0.0         10.1.1.1        0.0.0.0         UG    0      0        0 xenbr0
>      If I remove the ib0 route I can talk to brib0 and ib0 from vif1.2, but
>      only on the same physical machine. Inter-host and inter-vm over network
>      communication breaks without that route.
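The routing table above has two entries for 10.1.2.0/24, one via ib0 and one via brib0. A minimal sketch for spotting such overlaps, assuming the column layout of `route -n` shown in this thread (destination in column 1, netmask in column 3, interface in column 8); the function name is hypothetical:

```shell
# duplicate_routes: read `route -n` output on stdin and print every
# destination/netmask pair that is routed via more than one interface.
duplicate_routes() {
    awk '
        $1 ~ /^[0-9]+\./ && NF >= 8 {
            key = $1 "/" $3
            if (key in ifaces)
                ifaces[key] = ifaces[key] "," $8
            else
                ifaces[key] = $8
            count[key]++
        }
        END {
            for (key in count)
                if (count[key] > 1) print key " via " ifaces[key]
        }
    '
}

# Usage: route -n | duplicate_routes
```

It only reports the overlap; deciding which of the two routes should remain is a separate question.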
>      I also tried using "bridge" networking instead of "vswitch", but the
>      system behaves the same way AFAICT, though the configuration is of
>      course different.
>      I'm not sure what to try next. I could use the IB network for the
>      management interface and not run any VMs on it, but please let me know
>      if you have any idea what's wrong.
>      Thanks!
>      Trygve
>      On Thu, Apr 8, 2010 at 11:40 AM, Trygve Sanne Hardersen
>      <[5]trygve@xxxxxxxxxxxxx> wrote:
> 
>        Hi
>        Yes, I believe the packets are lost between brib0 and ib0, so they are
>        never sent across the network but it works on a single host.
>        I'll do some more testing and let you know what I find.
>        Thanks!
>        Trygve
> 
>        On Thu, Apr 8, 2010 at 10:40 AM, Dave Scott
>        <[6]Dave.Scott@xxxxxxxxxxxxx> wrote:
> 
>          Hi,
> 
> 
> 
>          Is it true that you have managed to get VM <-> Host connectivity
>          working but not VM <-> VM (across host) connectivity working?
> 
> 
> 
>          If so then it would be interesting to use something like tcpdump to
>          find out where the packets are going missing. If they're entering
>          the vswitch and then getting lost then it would be worth talking
>          about this on the openvswitch mailing list.
> 
> 
> 
>          Another possibility is to revert to non-vswitch based networking in
>          dom0: try writing "bridge" to /etc/xensource/network.conf and
>          rebooting.
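The switch to bridge networking suggested above could be scripted roughly as follows. This is a sketch only: it assumes the file holds nothing but the backend name, and the function name and optional second argument are made up for illustration.

```shell
# set_network_backend BACKEND [FILE]: write BACKEND (for example "bridge",
# as suggested above) to the network configuration file.
# FILE defaults to the path named in this thread; a reboot is still
# required afterwards for the change to take effect.
set_network_backend() {
    backend="$1"
    conf="${2:-/etc/xensource/network.conf}"
    printf '%s\n' "$backend" > "$conf"
}

# Usage (in dom0, as root):
#   set_network_backend bridge
```

Keeping a copy of the original file before overwriting it makes it easy to switch back.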
> 
> 
> 
>          Cheers,
> 
>          Dave
> 
> 
> 
>          From: [7]xen-users-bounces@xxxxxxxxxxxxxxxxxxx
>          [mailto:[8]xen-users-bounces@xxxxxxxxxxxxxxxxxxx] On Behalf Of
>          Trygve Sanne Hardersen
>          Sent: 07 April 2010 23:02
>          To: Xen
>          Subject: [Xen-users] using ipoib with xcp
> 
> 
> 
>          Hello,
> 
> 
> 
>          I have been playing with the XCP for a while now, and must say I'm
>          very excited about the technology. I had no prior experience with Xen
>          so it has taken me a while to understand the concepts, but now I
>          feel most important issues are solved and I've purchased some
>          hardware to build my (tiny) cloud on.
> 
> 
> 
>          The box is a Supermicro 1026TT-IBXF, so I have 2 x Ethernet and 1 x
>          Infiniband (IB) NICs per node. I want to use the IB NIC to provide
>          fast connectivity between the domUs, while the Ethernet NICs will be
>          used for the XCP management interface and ISP connectivity.
> 
> 
> 
>          I have successfully built OFED 1.5.1 in the XCP DDK VM and
>          installed OFED in the XCP 0.1.1 dom0. From there I can bring up the
>          IB network, but I'm having problems getting this to work properly
>          within XCP virtual machines. This is what happens:
> 
> 
> 
>          Starting out I have 2 nodes in a pool; both are clean with only lo,
>          eth0/xenbr0 and eth1/xenbr1 configured. I run the following commands
>          to add the IB NICs to the pool:
> 
> 
> 
>          xe pif-scan host-uuid=NODE1
> 
>          xe pif-plug uuid=NODE1_IB0
> 
>          xe pif-scan host-uuid=NODE2
> 
>          xe pif-plug uuid=NODE2_IB0
> 
> 
> 
>          As expected this adds ib0/brib0 on both nodes and a single pool-wide
>          network, but there is no connectivity between the hosts after I give
>          brib0 an IP:
> 
> 
> 
>          xe pif-reconfigure-ip uuid=NODE1_IB0 IP=10.1.2.2
>          netmask=255.255.255.0 mode=static
> 
>          xe pif-reconfigure-ip uuid=NODE2_IB0 IP=10.1.2.3
>          netmask=255.255.255.0 mode=static
> 
>          ping 10.1.2.2 --> reply
> 
>          ping 10.1.2.3 --> destination host unavailable
> 
> 
> 
>          However if I also give ib0 an IP and use this as gateway for brib0,
>          connectivity is achieved:
> 
> 
> 
>          ifconfig ib0 10.1.2.22 netmask 255.255.255.0
> 
>          xe pif-reconfigure-ip uuid=NODE1_IB0 IP=10.1.2.2
>          netmask=255.255.255.0 gateway=10.1.2.22 mode=static
> 
>          ifconfig ib0 10.1.2.33 netmask 255.255.255.0
> 
>          xe pif-reconfigure-ip uuid=NODE2_IB0 IP=10.1.2.3
>          netmask=255.255.255.0 gateway=10.1.2.33 mode=static
> 
>          ping 10.1.2.2 --> reply
> 
>          ping 10.1.2.3 --> reply
> 
>          ping 10.1.2.22 --> reply
> 
>          ping 10.1.2.33 --> reply
> 
> 
> 
>          This is very well, but when I add a VIF on the IB network to a VM it
>          is not able to communicate through it:
> 
> 
> 
>          xe vif-create device=2 mac=random network-uuid=IB_NET
>          vm-uuid=NODE1_IBVM
> 
>          ifconfig eth2 10.1.2.122 netmask 255.255.255.0
> 
>          xe vif-create device=2 mac=random network-uuid=IB_NET
>          vm-uuid=NODE2_IBVM
> 
>          ifconfig eth2 10.1.2.133 netmask 255.255.255.0
> 
>          ping 10.1.2.122 --> reply
> 
>          ping 10.1.2.22 --> destination host unavailable
> 
>          ping 10.1.2.2 --> destination host unavailable
> 
>          ping 10.1.2.133 --> destination host unavailable
> 
>          ping 10.1.2.33 --> destination host unavailable
> 
>          ping 10.1.2.3 --> destination host unavailable
> 
> 
> 
>          I believe that the problem lies somewhere in the routing table
>          configuration. This setup gives the following routing table:
> 
> 
> 
>          10.1.2.0        0.0.0.0         255.255.255.0   U     0      0        0 ib0
> 
>          10.1.2.0        0.0.0.0         255.255.255.0   U     0      0        0 brib0
> 
> 
> 
>          If I delete and then add the brib0 route, the route order is
>          changed:
> 
> 
> 
>          10.1.2.0        0.0.0.0         255.255.255.0   U     0      0        0 brib0
> 
>          10.1.2.0        0.0.0.0         255.255.255.0   U     0      0        0 ib0
> 
> 
> 
>          Using this the VM can talk to the host (and vice versa), but not
>          across the network. Connectivity between ib0/brib0 over the network
>          is also broken.
> 
> 
> 
>          I've also noticed that the same MAC is added to
>          /etc/ovs-vswitchd.conf multiple times for brib0:
> 
> 
> 
>          bridge.brib0.mac=80:00:00:48:fe:80:00:00:00:00:00:00:00:30:48:ff:ff:cc:0b:25
> 
>          bridge.brib0.mac=80:00:00:48:fe:80:00:00:00:00:00:00:00:30:48:ff:ff:cc:0b:25
> 
>          bridge.brib0.mac=80:00:00:48:fe:80:00:00:00:00:00:00:00:30:48:ff:ff:cc:0b:25
> 
> 
> 
>          I've tried removing some of these but that does not seem to have any
>          effect. My experience with IP routing and especially vswitch is
>          limited and I'm not sure what to try from here. I've tried various
>          configurations but no luck so far.
> 
> 
> 
>          Note that I'm testing with 2 XCP nodes configured in a pool. I've
>          also checked that the PIFs are in the same order on both nodes (the
>          reference mentions this). The MTU (1500) of brib0 differs from that
>          of ib0 (2044), but changing this does not solve the problem.
> 
> 
> 
>          Any help is much appreciated. Thanks!
> 
> 
> 
>          Trygve
> 
>          --
>          HypoBytes Ltd.
>          Trygve Sanne Hardersen
>          Akersveien 24F
>          0177 Oslo
>          Norway
> 
>          [9]hypobytes.com
>          +47 40 55 30 25
> 
>        --
>        HypoBytes Ltd.
>        Trygve Sanne Hardersen
>        Akersveien 24F
>        0177 Oslo
>        Norway
> 
>        [10]hypobytes.com
>        +47 40 55 30 25
> 
>      --
>      HypoBytes Ltd.
>      Trygve Sanne Hardersen
>      Akersveien 24F
>      0177 Oslo
>      Norway
> 
>      [11]hypobytes.com
>      +47 40 55 30 25
> 
>    --
>    HypoBytes Ltd.
>    Trygve Sanne Hardersen
>    Akersveien 24F
>    0177 Oslo
>    Norway
> 
>    [12]hypobytes.com
>    +47 40 55 30 25
> 
> References
> 
>    Visible links
>    1. mailto:trygve@xxxxxxxxxxxxx
>    2. http://10.1.2.2/24
>    3. http://10.1.2.102/24
>    4. http://10.1.2.202/24
>    5. mailto:trygve@xxxxxxxxxxxxx
>    6. mailto:Dave.Scott@xxxxxxxxxxxxx
>    7. mailto:xen-users-bounces@xxxxxxxxxxxxxxxxxxx
>    8. mailto:xen-users-bounces@xxxxxxxxxxxxxxxxxxx
>    9. http://hypobytes.com/
>   10. http://hypobytes.com/
>   11. http://hypobytes.com/
>   12. http://hypobytes.com/

> _______________________________________________
> Xen-users mailing list
> Xen-users@xxxxxxxxxxxxxxxxxxx
> http://lists.xensource.com/xen-users



 

