
Re: AW: [Xen-users] Millions of errors/collisions on physical interface in routed Xen-setup??



"Rustedt, Florian" <Florian.Rustedt@xxxxxxxxxxx> writes:

> > -----Original Message-----
> > From: Luke S Crawford [mailto:lsc@xxxxxxxxx]
> > Sent: Wednesday, 3 June 2009 10:17
> > To: Rustedt, Florian
> > Cc: Xen-users@xxxxxxxxxxxxxxxxxxx
> > Subject: Re: [Xen-users] Millions of errors/collisions on
> > physical interface in routed Xen-setup??
> 
> > I'd look at netstat -i; my first guess would be that you've
> > got a hardware problem
> 
> Ok, I will fail over to the second Xen host. Statistically it's near impossible
> that I'd see the same errors there too if it is hardware-based, right?

Well, not if you have an "A+" data center tech wiring down your patch panel.
Really, I shouldn't make fun of people for not being able to wire down a
110 patch panel well; it's not easy.  Oftentimes I choose a rat's nest of
(carefully labeled on both ends) patch cables instead of facing the nightmare
of wiring up a 110 patch panel, finding someone to loan me a real
($3000) Ethernet cable tester, and then re-wiring all the connections that are
just a little bit off.

But my point is that, yes, it is possible to have two sets of bad hardware
that are bad in the same way. I would just swap out the Ethernet cable (use
a brand-new pre-made cable, and skip any patch panels, to be sure).
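
If you do swap the cable, it's easy to see whether the counters keep
climbing afterwards.  Something like this works on a Linux dom0 (eth0 is
just a placeholder for whatever your physical interface is called):

  # watch RX/TX errors, drops and collisions; on a healthy link they stay flat
  watch -n 5 'netstat -i'
  # per-interface detail, including error breakdowns
  ip -s link show eth0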

> > [...] 'dedicate a core to the dom0' advice.
> 
> Didn't know about this. Why? I've got eight cores per dom0, so do I have to set
> one aside just for dom0 handling? What would be the advantage?

If all the CPUs are busy when you try to send a packet, or when you
try to receive a packet, that's trouble: on the order of 60ms of lag,
depending on what the weights are. (I could be way off on that 60ms number,
but I seem to remember that guests got 60ms timeslices, and 60ms is a long
time. Either way, each domU does get a timeslice that isn't immediately
interrupted when the dom0 gets a packet.  In fact, the next available
timeslice is handed out to whichever domain currently wants it and has
recently used the least CPU, and that might not be the dom0.  You can fix
that part by setting the dom0 weight really high.)
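
For what it's worth, you can look at how the credit scheduler currently
has things weighted before changing anything (xm syntax here, xl on newer
toolstacks; 'myguest' is just a placeholder domain name):

  xm sched-credit -d 0         # Domain-0: shows weight and cap (default weight is 256)
  xm sched-credit -d myguest   # same thing for a guest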

If you dedicate a core to the dom0 (set cpus='1-7' or so in the domU configs,
and set dom0-cpus 1 in xend-config.sxp), then the dom0 can run and
push packets around at the same time as the DomUs are running.
(Oh, and make sure you set vcpus in the domU configs to 7 or less if you do
this; running a domain with more vcpus than it has physical CPUs to run on
is seriously bad.)
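
Spelled out, the split looks roughly like this (a sketch; exact file
locations and syntax vary a bit between Xen versions):

  # /etc/xen/xend-config.sxp -- restrict the dom0 to a single vcpu
  (dom0-cpus 1)

  # in each domU config file -- keep this guest off CPU 0, leaving it to the dom0
  cpus  = "1-7"    # physical CPUs the guest's vcpus may run on
  vcpus = 1        # never more than the 7 CPUs left to the guests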

Me, I run a whole lot of DomUs per dom0, so I give each DomU only 1 vcpu;
7 of them can run at any one time while the dom0 is busy pushing packets
and disk bits.  It helps a whole lot once you start trying to scale to more
guests than you have physical CPUs.  The thing is, a slightly slower system
that doesn't get a lot slower is usually better than one that is much faster
during the best of times and much slower during the worst.  Only giving
guests access to one CPU at a time slows down the top end, sure, but being
able to run 7 at once sure helps keep the system responsive when several
users decide to benchmark you at the same time.

At the very least you should weight your dom0 higher; that's what I
did on the 4-core boxes I used to have (like the one in my example):

xm sched-credit -d 0 -w 60000

I'd put that in my /etc/rc.local.  Without it, once you get too many heavy
DomU users, disk and network grind to a halt.  With it, well, I still
saw the dropped packets you saw in your example, but otherwise it seemed
to work OK.
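
The rc.local bit is just the one line, plus, if you like, a check that it
stuck (60000 is the value from above; the credit scheduler defaults are
weight=256, cap=0):

  #!/bin/sh
  # /etc/rc.local -- re-apply the dom0 weight at boot
  xm sched-credit -d 0 -w 60000
  xm sched-credit -d 0    # print weight/cap to confirm it took
  exit 0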

_______________________________________________
Xen-users mailing list
Xen-users@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-users


 

