
Re: [Xen-devel] [Hackathon minutes] PV frontends/backends and NUMA machines



[remembering to cc the list this time]

On Tue, May 21, 2013 at 10:24 AM, Wei Liu <wei.liu2@xxxxxxxxxx> wrote:
> On Tue, May 21, 2013 at 10:44:02AM +0200, Roger Pau Monné wrote:
> [...]
>> > 4. Teach the toolstack to pin the netback threads to dom0 vcpus
>> > running on the correct node(s)
>> >
>> > Dario will do #1.  I volunteered to take a stab at #2 and #3.  #4 we
>> > should be able to do independently of 2 and 3 -- it should give a
>> > slight performance improvement due to cache proximity even if dom0
>> > memory is striped across the nodes.
>> >
>> > Does someone want to volunteer to take a look at #4?  I suspect that
>> > the core technical implementation will be simple, but getting a stable
>> > design that everyone is happy with for the future will take a
>> > significant number of iterations.  Learn from my fail w/ USB hot-plug
>> > in 4.3, and start the design process early. :-)
>>
>> #4 is easy to implement from my POV in blkback: you just need to write a
>> node in the xenstore backend directory that tells blkback to pin the
>> created kthread to a specific NUMA node, and make sure that the memory
>> used for that blkback instance is allocated from inside the kthread. My
>> indirect descriptors series already removes any shared structures
>> between different blkback instances, so some part of this work is
>> already done. And I guess that something similar could be implemented
>> for QEMU/Qdisk from the toolstack level (pin the qemu process to a
>> specific NUMA node).
>>
>> I'm already quite familiar with the blkback code, so I can take care of
>> #4 for blkback.
>>
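
Just to make the blkback side concrete, I'd imagine the kernel end of #4
looking roughly like the sketch below.  Completely untested, and the
"numa-node" key name and the helper are made up here purely to show the
shape of it; the real per-instance setup isn't shown.

/*
 * Untested sketch.  The "numa-node" xenstore key and this helper are
 * illustrative only; the real blkback state and setup are not shown.
 */
#include <linux/err.h>
#include <linux/kthread.h>
#include <linux/numa.h>
#include <linux/sched.h>
#include <linux/slab.h>
#include <linux/topology.h>
#include <xen/xenbus.h>

static int example_start_backend_thread(struct xenbus_device *dev,
                                        int (*thread_fn)(void *), void *arg)
{
    struct task_struct *task;
    int node;

    /* The toolstack would write e.g. "numa-node" into the backend dir. */
    if (xenbus_scanf(XBT_NIL, dev->nodename, "numa-node", "%d", &node) != 1)
        node = NUMA_NO_NODE;              /* no hint: leave it unpinned */

    task = kthread_run(thread_fn, arg, "xen-be:%s", dev->nodename);
    if (IS_ERR(task))
        return PTR_ERR(task);

    if (node != NUMA_NO_NODE)
        set_cpus_allowed_ptr(task, cpumask_of_node(node));

    return 0;
}

/*
 * Inside thread_fn the per-instance state is then allocated node-locally,
 * e.g.:  state = kzalloc_node(sizeof(*state), GFP_KERNEL, numa_node_id());
 */
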
>
> So the core thing in netback is almost ready. I trust the Linux scheduler
> now and don't pin the kthreads at all, but the relevant code should be
> easy to add. I just checked my code: all memory allocation is already
> node-aware.
>
> As for the toolstack part, I'm not sure writing the initial node to
> xenstore will be sufficient. Do we do inter-node migration? If so,
> should the frontend / backend also update the xenstore information as
> the guest migrates?

We can of course migrate the vcpus, but migrating the actual memory
from one node to another is pretty tricky, particularly for PV guests.
It won't be something that happens very often; when it does, we will
need to sort out migrating the backend threads.
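
On the toolstack side, the QEMU/Qdisk pinning Roger mentions should also be
mechanically simple; a rough userspace sketch is below (the helper name and
the sysfs parsing are purely illustrative, not an existing toolstack
function), and something like it would also be the natural hook for
re-pinning if we ever do move a domain between nodes.

/*
 * Rough sketch only: pin an already-running process (e.g. a qemu instance)
 * to the CPUs of one NUMA node, using the CPU list Linux exports in sysfs.
 */
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

int pin_pid_to_node(pid_t pid, int node)
{
    char path[64], buf[256];
    cpu_set_t set;
    FILE *f;

    snprintf(path, sizeof(path),
             "/sys/devices/system/node/node%d/cpulist", node);
    f = fopen(path, "r");
    if (!f)
        return -1;
    if (!fgets(buf, sizeof(buf), f)) {
        fclose(f);
        return -1;
    }
    fclose(f);

    CPU_ZERO(&set);
    /* cpulist looks like "8-15" or "0,2,4-6": expand each range. */
    for (char *tok = strtok(buf, ",\n"); tok; tok = strtok(NULL, ",\n")) {
        int lo, hi, n = sscanf(tok, "%d-%d", &lo, &hi);
        if (n < 1)
            continue;
        if (n < 2)
            hi = lo;
        for (int cpu = lo; cpu <= hi; cpu++)
            CPU_SET(cpu, &set);
    }

    return sched_setaffinity(pid, sizeof(set), &set);
}

The toolstack would then call something like pin_pid_to_node(qemu_pid, 3)
right after forking qemu for a guest placed on node 3.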

> IIRC the memory of a guest is striped across nodes; if that is the case,
> how does pinning help? (I might be talking crap as I don't know much
> about NUMA and its current status in Xen)

It's striped across nodes *of its NUMA affinity*.  So if you have a
4-node box, and you set its NUMA affinity to node 3, then the
allocator will try to get all of the memory from node 3.  If its
affinity is set to {2,3}, then the allocator will stripe it across
nodes 2 and 3.
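
To make the {2,3} case concrete, here is a toy model of the effect (not the
real allocator, which is cleverer about falling back when a node runs out of
memory):

/* Toy illustration only, not Xen's heap allocator.  Affinity {3} puts
 * everything on node 3; affinity {2,3} spreads pages over nodes 2 and 3. */
#include <stdio.h>

int main(void)
{
    int affinity[] = { 2, 3 };            /* the domain's node affinity */
    int naff = sizeof(affinity) / sizeof(affinity[0]);

    for (int page = 0; page < 8; page++)
        printf("page %d -> node %d\n", page, affinity[page % naff]);
    return 0;
}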

 -George

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 

