[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH v4 3/3] x86/ioreq server: Add HVMOP to map guest ram with p2m_ioreq_server to an ioreq server.





On 6/23/2016 6:33 PM, George Dunlap wrote:
On 23/06/16 08:37, Yu Zhang wrote:
On 6/22/2016 7:33 PM, George Dunlap wrote:
On 22/06/16 11:07, Yu Zhang wrote:
On 6/22/2016 5:47 PM, George Dunlap wrote:
On 22/06/16 10:29, Jan Beulich wrote:
On 22.06.16 at 11:16, <george.dunlap@xxxxxxxxxx> wrote:
On 22/06/16 07:39, Jan Beulich wrote:
On 21.06.16 at 16:38, <george.dunlap@xxxxxxxxxx> wrote:
On 21/06/16 10:47, Jan Beulich wrote:
And then - didn't we mean to disable that part of XenGT during
migration, i.e. temporarily accept the higher performance
overhead without the p2m_ioreq_server entries? In which case
flipping everything back to p2m_ram_rw after (completed or
canceled) migration would be exactly what we want. The (new
or previous) ioreq server should attach only afterwards, and
can then freely re-establish any p2m_ioreq_server entries it
deems necessary.

Well, I agree this part of XenGT should be disabled during
migration.
But in such
case I think it's device model's job to trigger the p2m type
flipping(i.e. by calling
HVMOP_set_mem_type).
I agree - this would seem to be the simpler model here, despite
(as
George validly says) the more consistent model would be for the
hypervisor to do the cleanup. Such cleanup would imo be
reasonable
only if there was an easy way for the hypervisor to enumerate
all
p2m_ioreq_server pages.
Well, for me, the "easy way" means we should avoid traversing
the whole ept
paging structure all at once, right?
Yes.
Does calling p2m_change_entry_type_global() not satisfy this
requirement?
Not really - that addresses the "low overhead" aspect, but not the
"enumerate all such entries" one.
I'm sorry, I think I'm missing something here.  What do we need the
enumeration for?
We'd need that if we were to do the cleanup in the hypervisor (as
we can't rely on all p2m entry re-calculation to have happened by
the time a new ioreq server registers for the type).
So you're afraid of this sequence of events?
1) Server A de-registered, triggering a ioreq_server -> ram_rw type
change
2) gfn N is marked as misconfigured
3) Server B registers and marks gfn N as ioreq_server
4) When N is accessed, the misconfiguration is resolved incorrectly to
ram_rw

But that can't happen, because misconfigured entries are resolved
before
setting a p2m entry; so at step 3, gfn N will be first set to
(non-misconfigured) ram_rw, then changed to (non-misconfigured)
ioreq_server.

Or is there another sequence of events that I'm missing?
Thanks for your reply, George. :)
If no log dirty is triggered during this process, your sequence is
correct.
However, if log dirty is triggered, we'll met problems. I have described
this
in previous mails :

http://lists.xenproject.org/archives/html/xen-devel/2016-06/msg02426.html

on Jun 20

and

http://lists.xenproject.org/archives/html/xen-devel/2016-06/msg02575.html

on Jun 21
Right -- sorry, now I see the issue:

1. Server A marks gfn X as ioreq_server
2. Server A deregisters, gfn X misconfigured
3. Server B registers, marks gfn Y as ioreq_server
4. Logdirty mode enabled; gfn Y misconfigured
5. When X or Y are accessed, resolve_misconfigure() has no way of
telling whether the entry is from server A (which should be set to
logdirty) or from server B (which should be left as ioreq_server).
Exactly.  :)
Another simpler scenario would be
1. Server A marks gfn X as p2m_ioreq_server;
2. Logdirty mode enabled; gfn X misconfigured;
3. When X is written, it will not cause ept vioalation, but ept
misconfig, and the
resolve_misconfig() would set gfn X back to p2m_ram_rw, thereafter we
can not
track access to X;
Right, so this is a reason that simply making misconfigurations always
resolve ioreq_server into ram_rw isn't compatible with logdirty.

Note: Not resetting the p2m type for p2m_ioreq_server when
p2m->ioreq_server is
not NULL is suitable for this simpler scenario, but is not correct if
take your scenario
into account.

The core reason is I could not find a simple solution in
resolve_misconfig() to handle
handle both the outdated p2m_ioreq_server entries, the in-use ones and
to support
the logdirty feature at the same time.
Indeed; and as I said, the real problem is that
p2m_change_entry_type_global() isn't really properly abstracted; in
order to use it you need to know how it works and be careful not to use
it at the wrong time.

Short-term, thinking through a handful of the scenarios we want to
support should be good enough.  Long-term, making it more robust so that
we don't have to think so hard about is probably better.

In a sense this is a deficiency in the change_entry_type_global()
interface.  A common OS principle is "make the common case fast, and the
uncommon case correct".  The scenario described above seems to me to be
an uncommon case which is handled quickly but incorrectly; ideally we
should handle it correctly, even if it's not very quick.

Synchronously resolving a previous misconfig is probably the most
straightforward thing to do.  It could be done at point #3, when an M->N
type change is not complete and a new p2m entry of type M is written; it
could be at point #4, when an N->O type change is initiated while an
M->N type change hasn't completed.  Or it could be when an N->O type
change happens while there are unfinished M->N transitions *and*
post-type-change M entries.
Sorry, I did not quite get it.  Could you please elaborate more? Thanks! :)
Well the basic idea is to make change_entry_type_global() *appear* to
all external callers as though the change happened immediately.  And the
basic problem is that at the moment, you can start a second
change_entry_type_global() before the first one has actually finished
changing all the types it was meant to change.  So the improvement is to
make sure that all the types which need to be changed actually get
changed before the second invocation starts.

The absolute simplest thing to do would be to make it actually search
through the p2m table and make the change immediately.  But we'd like to
avoid this if we can because it's so slow.

So the next simplest thing to do would be that when someone calls
change_entry_type_global() a second time, you go through every
misconfigured entry and resolve it, so that the new
change_entry_type_global() starts with a clean slate.  This should be
faster than just sweeping the whole p2m table, since we only need to
check the p2m entries that haven't been touched since we did the type
change, but it may still be a lot of work.

So there are other optimizations we might be able to make to try to
avoid going through and re-syncing things; and those are the examples
that I gave.

For the time being though, this will fail at #4, right?  That is,
logdirty mode cannot be enabled while server B is registered?

That does mean we'd be forced to sort out the situation before we allow
logdirty and ioreq_server to be used at the same time, but that doesn't
really seem like such a bad idea to me.
One solution I thought of is to just return failure in
hap_enable_log_dirty()
if p2m->ioreq.server is not NULL. But I did not choose such approach,
because:

1> I still want to keep the logdirty feature so that XenGT can use it to
keep track
of the dirty rams when we support live migration in the future;
But that's not something you can set in stone -- you can have it return
-EBUSY now, and then at such time as you add dirty vram support for
ioreq_server p2m types (which I don't think should be hard at all,
actually), you can remove that restriction.

OK. Returning -EBUSY is fine to me.
In fact, I do not really worry about the tracking of ioreq_server gfns, I was
worrying about the normal dirty ram pages. But if p2m_change_entry_type()
can be enhanced in the future, I can remove that restriction then. :)

The fact is that at he moment, setting logdirty *will not work*; and the
best interface is one in which broken things cannot happen even by
accident.  Regardless of what we end up deciding wrt Xen changing the
entries, I think that Xen should refuse to enable logdirty mode when
there is an ioreq server registered for ioreq_server p2m entries.

2> I also agree with Paul's argument: it is device model's duty to do
the p2m type
resetting work.
But that's not really the point.  We all agree that people should look
where they're going and it's generally an individual's responsibility
not to fall into pits or run into things in the road or on the sidewalk.
  But that doesn't mean that it's therefore OK to dig a pit with spikes
at the bottom in the middle of where people normally walk or cycle.
Because even though it is an individual person's responsibility not to
walk into holes, occasionally people are distracted or don't see well or
make mistakes; and the consequences for being temporarily distracted
should never be "falling onto a bed of sharp spikes". :-)  If you do
have to dig a hole in the sidewalk, then at very least you need to put a
physical barrier around it, so that the consequences for being
distracted are "runs into a barrier" rather than "falls into a pit".
But the best of all, if you can manage it, is not to dig the hole at all.

Similarly, even if it could be in theory the device model's duty to
reset the p2m entries it changed, it's still the case that programmers
make mistakes.  When those mistakes happen, we at very least want it to
be as easy to figure out what the problem is as possible; and if we can,
we want to make those mistakes completely harmless.

Making it possible for ioreq server A's entries to remain outstanding
after ioreq server B connects is the programming equivalent of leaving a
big open hole in the middle of a sidewalk: it means that when there's a
mistake made, there aren't any obvious immediate failures that tell you,
"Server A forgot to release some entries".  Instead, you will get random
failures, as bits of memory behave strangely or run very slowly for no
apparent reason.

Having it impossible to connect ioreq server B if there are still
outstanding entries from ioreq server A is the equivalent of digging a
hole and then putting up a barrier.  Now you get a failure when you try
it, and you're told exactly what the problem is -- the last guy didn't
release all his entries.  If you don't have the code you're still a bit
stuck, but at least you know it just doesn't work, rather than failing
in mysterious and difficult to detect ways later.

Having Xen reset the entries just makes this entire problem go away --
it's like accessing whatever you needed to access by digging sideways
from the storm drains, rather than digging a hole in the sidewalk.

It's all well and good to say, "It's the device model's responsibility",
but we need to plan on programmers making mistakes.  (And we also need
to plan for administrators making mistakes, which is why I think
returning -EBUSY when you try to enable logdirty with

Hah. This is a very good metaphor. I am convinced. :)
Though I have doubts about how to refactor the p2m_change_entry_type_global() interface,
I'm now willing to take your suggestions:
a> still need the p2m resetting when ioreq server is unbounded;
b> disable log dirty feature if one ioreq server is bounded.

Does anyone else has different opinions? Thanks!

I'm still open to being convinced, but at the moment it really seems to
me like improving the situation is the better long-term option.

Thanks for all your advices, George. I'm also willing to taking other
advices, if we have
a more acceptable(for you, Jan and other maintainers) resync approach in
hypervisor,
I'd like  to add this. If the code is too complicated, I can submit it
in a separate new
patchset. :)
Well I think sometime in early July I should be able to make some time
to take a look at it properly.  Maybe I can start with a "draft" patch,
and you can take it and fix it up and make it work.  Or maybe I'll find
it's actually too complicated, and then agree with you that relying on
the server to clean up after itself is the only option. :-)

Thank you, George. I definitely would like to take this work.
And before that, I think disable the log dirty could be OK for me(
after all, making vGPU live migratible requires more features added).

Yu


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.