[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH v4 3/3] x86/ioreq server: Add HVMOP to map guest ram with p2m_ioreq_server to an ioreq server.



On 23/06/16 08:37, Yu Zhang wrote:
> On 6/22/2016 7:33 PM, George Dunlap wrote:
>> On 22/06/16 11:07, Yu Zhang wrote:
>>>
>>> On 6/22/2016 5:47 PM, George Dunlap wrote:
>>>> On 22/06/16 10:29, Jan Beulich wrote:
>>>>>>>> On 22.06.16 at 11:16, <george.dunlap@xxxxxxxxxx> wrote:
>>>>>> On 22/06/16 07:39, Jan Beulich wrote:
>>>>>>>>>> On 21.06.16 at 16:38, <george.dunlap@xxxxxxxxxx> wrote:
>>>>>>>> On 21/06/16 10:47, Jan Beulich wrote:
>>>>>>>>>>>>> And then - didn't we mean to disable that part of XenGT during
>>>>>>>>>>>>> migration, i.e. temporarily accept the higher performance
>>>>>>>>>>>>> overhead without the p2m_ioreq_server entries? In which case
>>>>>>>>>>>>> flipping everything back to p2m_ram_rw after (completed or
>>>>>>>>>>>>> canceled) migration would be exactly what we want. The (new
>>>>>>>>>>>>> or previous) ioreq server should attach only afterwards, and
>>>>>>>>>>>>> can then freely re-establish any p2m_ioreq_server entries it
>>>>>>>>>>>>> deems necessary.
>>>>>>>>>>>>>
>>>>>>>>>>>> Well, I agree this part of XenGT should be disabled during
>>>>>>>>>>>> migration.
>>>>>>>>>>>> But in such
>>>>>>>>>>>> case I think it's device model's job to trigger the p2m type
>>>>>>>>>>>> flipping(i.e. by calling
>>>>>>>>>>>> HVMOP_set_mem_type).
>>>>>>>>>>> I agree - this would seem to be the simpler model here, despite
>>>>>>>>>>> (as
>>>>>>>>>>> George validly says) the more consistent model would be for the
>>>>>>>>>>> hypervisor to do the cleanup. Such cleanup would imo be
>>>>>>>>>>> reasonable
>>>>>>>>>>> only if there was an easy way for the hypervisor to enumerate
>>>>>>>>>>> all
>>>>>>>>>>> p2m_ioreq_server pages.
>>>>>>>>>> Well, for me, the "easy way" means we should avoid traversing
>>>>>>>>>> the whole ept
>>>>>>>>>> paging structure all at once, right?
>>>>>>>>> Yes.
>>>>>>>> Does calling p2m_change_entry_type_global() not satisfy this
>>>>>>>> requirement?
>>>>>>> Not really - that addresses the "low overhead" aspect, but not the
>>>>>>> "enumerate all such entries" one.
>>>>>> I'm sorry, I think I'm missing something here.  What do we need the
>>>>>> enumeration for?
>>>>> We'd need that if we were to do the cleanup in the hypervisor (as
>>>>> we can't rely on all p2m entry re-calculation to have happened by
>>>>> the time a new ioreq server registers for the type).
>>>> So you're afraid of this sequence of events?
>>>> 1) Server A de-registered, triggering a ioreq_server -> ram_rw type
>>>> change
>>>> 2) gfn N is marked as misconfigured
>>>> 3) Server B registers and marks gfn N as ioreq_server
>>>> 4) When N is accessed, the misconfiguration is resolved incorrectly to
>>>> ram_rw
>>>>
>>>> But that can't happen, because misconfigured entries are resolved
>>>> before
>>>> setting a p2m entry; so at step 3, gfn N will be first set to
>>>> (non-misconfigured) ram_rw, then changed to (non-misconfigured)
>>>> ioreq_server.
>>>>
>>>> Or is there another sequence of events that I'm missing?
>>> Thanks for your reply, George. :)
>>> If no log dirty is triggered during this process, your sequence is
>>> correct.
>>> However, if log dirty is triggered, we'll met problems. I have described
>>> this
>>> in previous mails :
>>>
>>> http://lists.xenproject.org/archives/html/xen-devel/2016-06/msg02426.html
>>>
>>> on Jun 20
>>>
>>> and
>>>
>>> http://lists.xenproject.org/archives/html/xen-devel/2016-06/msg02575.html
>>>
>>> on Jun 21
>> Right -- sorry, now I see the issue:
>>
>> 1. Server A marks gfn X as ioreq_server
>> 2. Server A deregisters, gfn X misconfigured
>> 3. Server B registers, marks gfn Y as ioreq_server
>> 4. Logdirty mode enabled; gfn Y misconfigured
>> 5. When X or Y are accessed, resolve_misconfigure() has no way of
>> telling whether the entry is from server A (which should be set to
>> logdirty) or from server B (which should be left as ioreq_server).
> 
> Exactly.  :)
> Another simpler scenario would be
> 1. Server A marks gfn X as p2m_ioreq_server;
> 2. Logdirty mode enabled; gfn X misconfigured;
> 3. When X is written, it will not cause ept vioalation, but ept
> misconfig, and the
> resolve_misconfig() would set gfn X back to p2m_ram_rw, thereafter we
> can not
> track access to X;

Right, so this is a reason that simply making misconfigurations always
resolve ioreq_server into ram_rw isn't compatible with logdirty.

> Note: Not resetting the p2m type for p2m_ioreq_server when
> p2m->ioreq_server is
> not NULL is suitable for this simpler scenario, but is not correct if
> take your scenario
> into account.
> 
> The core reason is I could not find a simple solution in
> resolve_misconfig() to handle
> handle both the outdated p2m_ioreq_server entries, the in-use ones and
> to support
> the logdirty feature at the same time.

Indeed; and as I said, the real problem is that
p2m_change_entry_type_global() isn't really properly abstracted; in
order to use it you need to know how it works and be careful not to use
it at the wrong time.

Short-term, thinking through a handful of the scenarios we want to
support should be good enough.  Long-term, making it more robust so that
we don't have to think so hard about is probably better.

>> In a sense this is a deficiency in the change_entry_type_global()
>> interface.  A common OS principle is "make the common case fast, and the
>> uncommon case correct".  The scenario described above seems to me to be
>> an uncommon case which is handled quickly but incorrectly; ideally we
>> should handle it correctly, even if it's not very quick.
>>
>> Synchronously resolving a previous misconfig is probably the most
>> straightforward thing to do.  It could be done at point #3, when an M->N
>> type change is not complete and a new p2m entry of type M is written; it
>> could be at point #4, when an N->O type change is initiated while an
>> M->N type change hasn't completed.  Or it could be when an N->O type
>> change happens while there are unfinished M->N transitions *and*
>> post-type-change M entries.
> 
> Sorry, I did not quite get it.  Could you please elaborate more? Thanks! :)

Well the basic idea is to make change_entry_type_global() *appear* to
all external callers as though the change happened immediately.  And the
basic problem is that at the moment, you can start a second
change_entry_type_global() before the first one has actually finished
changing all the types it was meant to change.  So the improvement is to
make sure that all the types which need to be changed actually get
changed before the second invocation starts.

The absolute simplest thing to do would be to make it actually search
through the p2m table and make the change immediately.  But we'd like to
avoid this if we can because it's so slow.

So the next simplest thing to do would be that when someone calls
change_entry_type_global() a second time, you go through every
misconfigured entry and resolve it, so that the new
change_entry_type_global() starts with a clean slate.  This should be
faster than just sweeping the whole p2m table, since we only need to
check the p2m entries that haven't been touched since we did the type
change, but it may still be a lot of work.

So there are other optimizations we might be able to make to try to
avoid going through and re-syncing things; and those are the examples
that I gave.

>> For the time being though, this will fail at #4, right?  That is,
>> logdirty mode cannot be enabled while server B is registered?
>>
>> That does mean we'd be forced to sort out the situation before we allow
>> logdirty and ioreq_server to be used at the same time, but that doesn't
>> really seem like such a bad idea to me.
> 
> One solution I thought of is to just return failure in
> hap_enable_log_dirty()
> if p2m->ioreq.server is not NULL. But I did not choose such approach,
> because:
> 
> 1> I still want to keep the logdirty feature so that XenGT can use it to
> keep track
> of the dirty rams when we support live migration in the future;

But that's not something you can set in stone -- you can have it return
-EBUSY now, and then at such time as you add dirty vram support for
ioreq_server p2m types (which I don't think should be hard at all,
actually), you can remove that restriction.

The fact is that at he moment, setting logdirty *will not work*; and the
best interface is one in which broken things cannot happen even by
accident.  Regardless of what we end up deciding wrt Xen changing the
entries, I think that Xen should refuse to enable logdirty mode when
there is an ioreq server registered for ioreq_server p2m entries.

> 2> I also agree with Paul's argument: it is device model's duty to do
> the p2m type
> resetting work.

But that's not really the point.  We all agree that people should look
where they're going and it's generally an individual's responsibility
not to fall into pits or run into things in the road or on the sidewalk.
 But that doesn't mean that it's therefore OK to dig a pit with spikes
at the bottom in the middle of where people normally walk or cycle.
Because even though it is an individual person's responsibility not to
walk into holes, occasionally people are distracted or don't see well or
make mistakes; and the consequences for being temporarily distracted
should never be "falling onto a bed of sharp spikes". :-)  If you do
have to dig a hole in the sidewalk, then at very least you need to put a
physical barrier around it, so that the consequences for being
distracted are "runs into a barrier" rather than "falls into a pit".
But the best of all, if you can manage it, is not to dig the hole at all.

Similarly, even if it could be in theory the device model's duty to
reset the p2m entries it changed, it's still the case that programmers
make mistakes.  When those mistakes happen, we at very least want it to
be as easy to figure out what the problem is as possible; and if we can,
we want to make those mistakes completely harmless.

Making it possible for ioreq server A's entries to remain outstanding
after ioreq server B connects is the programming equivalent of leaving a
big open hole in the middle of a sidewalk: it means that when there's a
mistake made, there aren't any obvious immediate failures that tell you,
"Server A forgot to release some entries".  Instead, you will get random
failures, as bits of memory behave strangely or run very slowly for no
apparent reason.

Having it impossible to connect ioreq server B if there are still
outstanding entries from ioreq server A is the equivalent of digging a
hole and then putting up a barrier.  Now you get a failure when you try
it, and you're told exactly what the problem is -- the last guy didn't
release all his entries.  If you don't have the code you're still a bit
stuck, but at least you know it just doesn't work, rather than failing
in mysterious and difficult to detect ways later.

Having Xen reset the entries just makes this entire problem go away --
it's like accessing whatever you needed to access by digging sideways
from the storm drains, rather than digging a hole in the sidewalk.

It's all well and good to say, "It's the device model's responsibility",
but we need to plan on programmers making mistakes.  (And we also need
to plan for administrators making mistakes, which is why I think
returning -EBUSY when you try to enable logdirty with

>> I'm still open to being convinced, but at the moment it really seems to
>> me like improving the situation is the better long-term option.
>>
> 
> Thanks for all your advices, George. I'm also willing to taking other
> advices, if we have
> a more acceptable(for you, Jan and other maintainers) resync approach in
> hypervisor,
> I'd like  to add this. If the code is too complicated, I can submit it
> in a separate new
> patchset. :)

Well I think sometime in early July I should be able to make some time
to take a look at it properly.  Maybe I can start with a "draft" patch,
and you can take it and fix it up and make it work.  Or maybe I'll find
it's actually too complicated, and then agree with you that relying on
the server to clean up after itself is the only option. :-)

 -George


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.