Re: [Xen-devel] [RFC/PATCH v2] XENMEM_claim_pages (subop of existing) hypercall
> Hi Jan --
>
> Thanks for the quick review!
>
> Mostly fixed for v3 but a couple things:
>
>>> + * Claimed memory is considered unavailable unless the request
>>> + * is made by a domain with sufficient unclaimed pages.
>>> + */
>>> + if ( (total_unclaimed_pages + request >
>>> + total_avail_pages + tmem_freeable_pages()) &&
>>> + (d == NULL || d->unclaimed_pages < request) )
>>> + goto not_found;
>>
>> The treatment of d being NULL certainly needs further thought:
>> Is it really better to fulfill the claim and fail some (perhaps
>> important) _xmalloc()?
>
> Ideally, allocation in the presence of existing claims should
> behave as if the claiming domains had actually already allocated
> the unclaimed-amount-of-memory. So I'd argue that enforcing
> the claim should be sacrosanct here.
Well, are we sure that failing an "anonymous" allocation is not going to
trigger a BUG_ON somewhere? Auditing that is a lot of code review. If we get
this wrong, Xen suddenly crashes when allocating domains close to the maximum,
which it doesn't do today, afaict.
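To make the semantics under discussion concrete, here is a minimal, free-standing restatement of the check in the quoted hunk (not the patch itself; the variable names mirror the patch, the standalone struct is mine): every outstanding claim is treated as if it had already been allocated, and only a domain drawing down its own claim is exempt; anonymous (d == NULL) allocations get no exemption.

    /* Illustrative sketch only: restates the quoted check outside the
     * allocator so the d == NULL case is easy to see.  Names mirror the
     * patch; the standalone struct is an assumption for the example. */
    #include <stdbool.h>

    struct domain_claim {
        unsigned long unclaimed_pages;  /* claimed but not yet allocated */
    };

    static bool claim_allows_allocation(unsigned long request,
                                        unsigned long total_avail_pages,
                                        unsigned long tmem_freeable_pages,
                                        unsigned long total_unclaimed_pages,
                                        const struct domain_claim *d)
    {
        /* Claimed memory is unavailable to everyone else, including
         * anonymous (d == NULL) allocations such as _xmalloc(). */
        if ( (total_unclaimed_pages + request >
              total_avail_pages + tmem_freeable_pages) &&
             (d == NULL || d->unclaimed_pages < request) )
            return false;
        return true;
    }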
>
>> Also, I'm missing a mechanism by which the tools could find out
>> how much unclaimed memory is available, in order to determine
>> (if in use) how much memory needs to be ballooned out of Dom0.
>
> OK. I'm not certain if this will be useful on a per-domain
> basis as well but, for completeness, I will also add
> unclaimed_pages into xc_dominfo etc (which causes a bump
> in XEN_DOMCTL_INTERFACE_VERSION).
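For what it's worth, here is roughly the consumer side that would go with that, as a hedged sketch: xc_interface_open()/xc_domain_getinfo() are existing libxc calls, but the unclaimed_pages field is only the addition proposed above and does not exist in xc_dominfo_t today.

    /* Hypothetical toolstack-side sketch: read the proposed per-domain
     * unclaimed-page count via libxc.  The .unclaimed_pages field is the
     * one proposed in this thread, not an existing one. */
    #include <stdio.h>
    #include <xenctrl.h>

    static int print_unclaimed(uint32_t domid)
    {
        xc_interface *xch = xc_interface_open(NULL, NULL, 0);
        xc_dominfo_t info;
        int ret = -1;

        if ( !xch )
            return -1;

        /* Ask for exactly one domain's info, starting at domid. */
        if ( xc_domain_getinfo(xch, domid, 1, &info) == 1 &&
             info.domid == domid )
        {
            printf("dom%u unclaimed pages: %lu\n",
                   domid, (unsigned long)info.unclaimed_pages);
            ret = 0;
        }

        xc_interface_close(xch);
        return ret;
    }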
>
>> Similarly, but perhaps of lower priority, there is no integration
>> with the low-mem handling.
>
> I'd also consider this lower priority as Olaf and Andre
> have argued that the claim mechanism is not needed for
> sharing/paging so the two mechanisms may not
> be used together, at least for the foreseeable future.
> So I plan to skip this, unless you change your mind and
> consider it a showstopper for acceptance.
This is a slippery slope. Let's not work out the interactions with existing
subsystems before adding code to the tree. What could go wrong?
As a data point for everyone, I've found the low-mem virq extremely useful as a
sync interrupt signaling to our toolstack that it needs to get its act together
and start rebalancing memory, if it hasn't yet. I don't see how it would not be
just as useful to any other toolstack.
Andres
>
>> Finally, there still are a number of formatting issues.
>
> Hmmm... I found one I think. Is there an equivalent to
> checkpatch for hypervisor code? If you see any formatting
> issues in v3, please call them out explicitly as I am
> sincerely trying to avoid them.
>
> Thanks,
> Dan
>
>
>
> ------------------------------
>
> Message: 4
> Date: Thu, 15 Nov 2012 18:23:09 +0000
> From: Mats Petersson <mats.petersson@xxxxxxxxxx>
> To: Tim Deegan <tim@xxxxxxx>
> Cc: "xen-devel@xxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxx>
> Subject: Re: [Xen-devel] [PATCH V2] xen: vmx: Use an INT 2 call to
> process real NMI's instead of self_nmi() in VMEXIT handler
> Message-ID: <50A5330D.9020204@xxxxxxxxxx>
> Content-Type: text/plain; charset="ISO-8859-1"; format=flowed
>
> On 15/11/12 17:44, Tim Deegan wrote:
>> At 17:33 +0000 on 15 Nov (1353000782), Mats Petersson wrote:
>>> On 15/11/12 17:15, Tim Deegan wrote:
>>>> At 17:03 +0000 on 15 Nov (1352998993), Mats Petersson wrote:
>>>>>> On an AMD CPU we _don't_ have dedicated stacks for NMI or MCE when we're
>>>>>> running a HVM guest, so the stack issue doesn't apply (but nested NMIs
>>>>>> are still bad).
>>>>>>
>>>>>> On an Intel CPU, we _do_ use dedicated stacks for NMI and MCE in HVM
>>>>>> guests. We don't really have to but it saves time in the context switch
>>>>>> not to update the IDT. Using do_nmi() here means that the first NMI is
>>>>>> handled on the normal stack instead. It's also consistent with the way
>>>>>> we call do_machine_check() for the MCE case. But it needs an explicit
>>>>>> IRET after the call to do_nmi() to make sure that NMIs get re-enabled.
>>>>> Both AMD and Intel have an identical setup with regard to stacks and
>>>>> the general "what happens when we take one of these interrupts".
>>>> My reading of svm_ctxt_switch_{to,from} makes me disagree with this.
>>>> AFAICT, on SVM we're not using dedicated stacks at all.
>>> In SVM, the VMRUN returns to whatever stack you had before the VMRUN.
>>> This is not what I'm talking about, however. The stack used for the NMI
>>> and MCE comes from the interrupt descriptor entry for those respective
>>> vectors.
>> This is the code I was referring to:
>>
>>     /*
>>      * Cannot use ISTs for NMI/#MC/#DF while we are running with the guest TR.
>>      * But this doesn't matter: the IST is only req'd to handle SYSCALL/SYSRET.
>>      */
>>     idt_tables[cpu][TRAP_double_fault].a  &= ~(7UL << 32);
>>     idt_tables[cpu][TRAP_nmi].a           &= ~(7UL << 32);
>>     idt_tables[cpu][TRAP_machine_check].a &= ~(7UL << 32);
>>
>> Am I misreading it?
>
> No, you are reading it perfectly right, I'm wrong...
>
> --
> Mats
>>
>>> So in conclusion, the do_mce_exception() call probably should be a
>>> __asm__ __volatile__("int $X"), where X is the relevant vector.
>> This handles MCEs that were raised in guest context. If we've managed
>> to get this far into the exit handler, the hypervisor stack is probably
>> OK. :)
>>
>> I'd be happy to invoke the MCE handler through the IDT here, just for
>> symmetry with the other cases, but I don't think it makes much
>> difference.
>>
>> Tim.
>>
>>
>
>
>
>
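For readers following along, a minimal sketch of the "go through the IDT" option being discussed (vectors 2 and 18 are the architectural NMI and #MC vectors; whether this is actually preferable to calling do_nmi()/do_machine_check() directly is exactly the question above). Reflecting the event through the IDT picks up whatever stack the gate specifies, and the IRET on the way out re-enables NMI delivery, which a direct call to do_nmi() only gets with an explicit IRET afterwards.

    /* Illustration only, not the patch under discussion: re-inject the
     * event through the IDT from the VMEXIT handler instead of calling
     * the C handler directly. */
    static inline void deliver_nmi_via_idt(void)
    {
        /* Software INT 2 enters the NMI gate; the handler's IRET
         * re-enables NMIs that were blocked by the NMI-caused VMEXIT. */
        __asm__ __volatile__ ( "int $2" );
    }

    static inline void deliver_mce_via_idt(void)
    {
        /* Same idea for #MC, architectural vector 18. */
        __asm__ __volatile__ ( "int $18" );
    }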
> ------------------------------
>
> Message: 5
> Date: Thu, 15 Nov 2012 13:29:28 -0500
> From: Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx>
> To: Ian Campbell <Ian.Campbell@xxxxxxxxxx>
> Cc: "netdev@xxxxxxxxxxxxxxx" <netdev@xxxxxxxxxxxxxxx>, ANNIE LI
> <annie.li@xxxxxxxxxx>, "xen-devel@xxxxxxxxxxxxxxxxxxx"
> <xen-devel@xxxxxxxxxxxxxxxxxxx>, Roger Pau Monne
> <roger.pau@xxxxxxxxxx>
> Subject: Re: [Xen-devel] [PATCH 0/4] Implement persistent grant in
> xen-netfront/netback
> Message-ID: <20121115182928.GB22320@xxxxxxxxxxxxxxxxxxx>
> Content-Type: text/plain; charset=iso-8859-1
>
> On Thu, Nov 15, 2012 at 11:15:06AM +0000, Ian Campbell wrote:
>> On Thu, 2012-11-15 at 10:56 +0000, Roger Pau Monne wrote:
>>> On 15/11/12 09:38, ANNIE LI wrote:
>>>>
>>>>
>>>>> On 2012-11-15 15:40, Pasi Kärkkäinen wrote:
>>>>> Hello,
>>>>>
>>>>> On Thu, Nov 15, 2012 at 03:03:07PM +0800, Annie Li wrote:
>>>>>> This patch implements persistent grants for xen-netfront/netback. The
>>>>>> mechanism maintains page pools in netback/netfront; these pools hold
>>>>>> grant pages that are already mapped, so the CPU time currently wasted
>>>>>> on repeated grant operations is saved.
>>>>>>
>>>>>> Today netback/netfront performs map/unmap grant operations frequently
>>>>>> when transmitting/receiving packets, and those grant operations cost a
>>>>>> lot of CPU cycles. With this patch, netfront/netback maps a grant page
>>>>>> when it is first needed and then saves it in a page pool for future
>>>>>> use. All of these pages are unmapped only when the net device is
>>>>>> removed/released.
>>>>>>
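To illustrate the mechanism being described, here is a toy sketch (not the code from the patch; names and structure are assumptions): keep already-mapped grant pages on a free list so the hot path reuses them instead of issuing map/unmap hypercalls per packet.

    /* Toy sketch of a persistent-grant page pool (illustrative only). */
    #include <linux/list.h>
    #include <linux/spinlock.h>
    #include <xen/grant_table.h>

    struct persistent_gnt {
        struct page      *page;   /* page backing the mapped grant */
        grant_ref_t       gref;   /* grant reference it was mapped from */
        struct list_head  node;
    };

    struct gnt_page_pool {
        struct list_head  free;   /* mapped but currently idle grants */
        spinlock_t        lock;
    };

    /* Hot path: reuse a ready-mapped grant page if one is available; the
     * caller falls back to a real GNTTABOP_map_grant_ref only when the
     * pool is empty. */
    static struct persistent_gnt *pool_get(struct gnt_page_pool *pool)
    {
        struct persistent_gnt *gnt = NULL;

        spin_lock(&pool->lock);
        if (!list_empty(&pool->free)) {
            gnt = list_first_entry(&pool->free, struct persistent_gnt, node);
            list_del(&gnt->node);
        }
        spin_unlock(&pool->lock);
        return gnt;
    }

    /* Return the page to the pool still mapped; it is only unmapped when
     * the net device is removed/released. */
    static void pool_put(struct gnt_page_pool *pool, struct persistent_gnt *gnt)
    {
        spin_lock(&pool->lock);
        list_add(&gnt->node, &pool->free);
        spin_unlock(&pool->lock);
    }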
>>>>> Do you have performance numbers available already? with/without
>>>>> persistent grants?
>>>> I have some simple netperf/netserver test result with/without persistent
>>>> grants,
>>>>
>>>> Following is result of with persistent grant patch,
>>>>
>>>> Guests, Sum, Avg, Min, Max
>>>> 1, 15106.4, 15106.4, 15106.36, 15106.36
>>>> 2, 13052.7, 6526.34, 6261.81, 6790.86
>>>> 3, 12675.1, 6337.53, 6220.24, 6454.83
>>>> 4, 13194, 6596.98, 6274.70, 6919.25
>>>>
>>>>
>>>> Following are result of without persistent patch
>>>>
>>>> Guests, Sum, Avg, Min, Max
>>>> 1, 10864.1, 10864.1, 10864.10, 10864.10
>>>> 2, 10898.5, 5449.24, 4862.08, 6036.40
>>>> 3, 10734.5, 5367.26, 5261.43, 5473.08
>>>> 4, 10924, 5461.99, 5314.84, 5609.14
>>>
>>> In the block case, the performance improvement is seen when using a large
>>> number of guests; could you run the same benchmark with the number of
>>> guests increased to 15?
>>
>> It would also be nice to see some analysis of the numbers which justify
>> why this change is a good one without every reviewer having to evaluate
>> the raw data themselves. In fact this should really be part of the
>> commit message.
>
> You mean like a nice graph, eh?
>
> I will run these patches on my 32GB box and see if I can give you
> a nice PDF/jpg.
>
>>
>> Ian.
>>
>
>
>
> ------------------------------
>
> Message: 6
> Date: Thu, 15 Nov 2012 18:29:13 +0000
> From: George Dunlap <george.dunlap@xxxxxxxxxxxxx>
> To: Michael Palmeter <michael.palmeter@xxxxxxxxxx>
> Cc: Dario Faggioli <raistlin@xxxxxxxx>, "xen-devel@xxxxxxxxxxxxx"
> <xen-devel@xxxxxxxxxxxxx>
> Subject: Re: [Xen-devel] Xen credit scheduler question
> Message-ID: <50A53479.5050901@xxxxxxxxxxxxx>
> Content-Type: text/plain; charset="windows-1252"; Format="flowed"
>
> On 15/11/12 15:43, Michael Palmeter wrote:
>>
>> Hi all (and Mr. Dunlap in particular),
>>
>
> Haha -- please don't call me "Mr"; I prefer "George", but if you want a
> title, use "Dr" (since I have a PhD). :-)
>
>> Example scenario:
>>
>> * Server hardware: 2 sockets, 8-cores per socket, 2 hardware threads
>> per core (total of 32 hardware threads)
>> * Test VM: a single virtual machine with a single vCPU, weight=256
>> and cap=100%
>>
>> In this scenario, from what I understand, I should be able to load the
>> Test VM with traffic to a maximum of approximately 1/32 of the
>> aggregate compute capacity of the server. The total CPU utilization
>> of the server hardware should be approximately 3.4%, plus the overhead
>> of dom0 (say 1-2). The credits available to any vCPU capped at 100%
>> should be equal to 1/32 of the aggregate compute available for the
>> whole server, correct?
>>
>
> I think to really be precise, you should say, "1/32nd of the logical cpu
> time available", where "logical cpu time" simply means, "time processing
> on one logical CPU". At the moment, that is all that either the credit1
> or credit2 schedulers look at.
>
> As I'm sure you're aware, not all "logical cpu time" is equal. If one
> thread of a hyperthread pair is running but the other idle, it will get
> significantly higher performance than if the other thread is busy. How
> much is highly unpredictable, and depends very much on exactly what
> units are shared with the other hyperthread, and the workload running on
> each unit. But even when both threads are busy, it should (in theory)
> be rare for both threads to get a throughput of 50%; the whole idea of
> HT is that threads typically get 70-80% of the full performance of the
> core (so the overall throughput is increased).
>
> But of course, while this is particularly extreme in the case of
> hyperthreads, it's also true on a smaller scale even without that --
> cores share caches, NUMA nodes share memory bandwidth, and so on. No
> attempt is made to compensate VMs for cache misses or extra memory
> latency due to sharing either. :-)
>
>> Put simply, is there a way to constrain a VM with 1 vCPU to consume no
>> more than 0.5 of a physical core (hyper-threaded) on the server
>> hardware mentioned below? Does the cap help in that respect?
>>
>
> You can use "cap" to make the VM in question get 50% of logical vcpu
> time, which on an idle system will give it 0.5 of the capacity of a
> physical core (if we don't consider Intel's Turbo Boost technology).
> But if the system becomes busy, it will get less than 0.5 of the
> processing capacity of a physical core.
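As a concrete (and hedged) example of the cap knob: the equivalent of "xl sched-credit -d <domain> -c 50", done through libxl, would look roughly like the sketch below. The function and field names are how I remember the libxl scheduling API; please double-check them against libxl.h before relying on this.

    /* Sketch, not verified against a build: cap a guest at 50% of one
     * logical CPU using the credit scheduler's per-domain cap. */
    #include <libxl.h>

    static int cap_domain_at_half_a_cpu(uint32_t domid)
    {
        libxl_ctx *ctx = NULL;
        libxl_domain_sched_params params;
        int rc;

        if ( libxl_ctx_alloc(&ctx, LIBXL_VERSION, 0, NULL) )
            return -1;

        libxl_domain_sched_params_init(&params);
        rc = libxl_domain_sched_params_get(ctx, domid, &params);
        if ( !rc )
        {
            params.cap = 50;   /* percent of one logical CPU */
            rc = libxl_domain_sched_params_set(ctx, domid, &params);
        }

        libxl_domain_sched_params_dispose(&params);
        libxl_ctx_free(ctx);
        return rc;
    }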
>
>> I have been struggling to understand how the scheduler can deal with
>> the uncertainty that hyperthreading introduces, however. I know this
>> is an issue that you are tackling in the credit2 scheduler, but I
>> would like to know what your thoughts are on this problem (if you are
>> able to share). Any insight or assistance you could offer would be
>> greatly appreciated.
>>
>
> At the moment it does not attempt to; the only thing it does is try not
> to schedule vcpus onto two hyperthreads of the same core while there is
> an idle core available. But if there are more active vcpus than cores,
> then some will share; and the ones that share a core with another vcpu
> will be charged the same as the ones that have the core all to themselves.
>
> Could you explain why your question is important to you -- i.e., what
> are you trying to accomplish? It sounds a bit like you're more
> concerned with accuracy in reporting, and control of resources, rather
> than fairness, for instance.
>
> -George
>
> ------------------------------
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@xxxxxxxxxxxxx
> http://lists.xen.org/xen-devel
>
>
> End of Xen-devel Digest, Vol 93, Issue 157
> ******************************************
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel