[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v4] gnttab: defer allocation of status frame tracking array



On 05.05.2021 12:49, Andrew Cooper wrote:
> On 04/05/2021 09:42, Jan Beulich wrote:
>> This array can be large when many grant frames are permitted; avoid
>> allocating it when it's not going to be used anyway, by doing this only
>> in gnttab_populate_status_frames(). While the delaying of the respective
>> memory allocation adds possible reasons for failure of the respective
>> enclosing operations, there are other memory allocations there already,
>> so callers can't expect these operations to always succeed anyway.
>>
>> As to the re-ordering at the end of gnttab_unpopulate_status_frames(),
>> this is merely to represent intended order of actions (shrink array
>> bound, then free higher array entries). If there were racing accesses,
>> suitable barriers would need adding in addition.
>>
>> Signed-off-by: Jan Beulich <jbeulich@xxxxxxxx>
> 
> Nack.
> 
> The argument you make says that the grant status frames is "large
> enough" to care about not allocating.  (I frankly disagree, but that
> isn't relevant to my to my nack).
> 
> The change in logic here moves a failure in DOMCTL_createdomain, to
> GNTTABOP_setversion.  We know, from the last minute screwups in XSA-226,
> that there are versions of Windows and Linux in the field, used by
> customers, which will BUG()/BSOD when GNTTABOP_setversion doesn't succeed.

So after you did re-state the Linux aspect of this on the call yesterday
I went and looked. In the first phase of Linux supporting v2 at all
(3.3 - 3.16) there indeed was a panic() upon certain kinds of failures.
Only up to 3.13 was there an actual attempt to use v2. Also, while there
was some code there to support actual v2 features (sub-page or
transitive grants), all of this was dead. Hence their claim

                /*
                 * If we've already used version 2 features,
                 * but then suddenly discover that they're not
                 * available (e.g. migrating to an older
                 * version of Xen), almost unbounded badness
                 * can happen.
                 */

was bogus at best. If there was no way at all for set_version to fail
prior to the change, here I could probably accept your position. But as
said before by Julien - there are pre-existing memory allocations (of
typically larger size) there, and hence any guest assuming the call
can't fail is flawed already anyway. And no, I don't view this as a
reason for us to try to eliminate all memory allocations from that
hypercall (which would in particular mean populating status frames
irrespective of whether a guest would ever switch to v2). Guest flaws
would better be addressed in the guests.

Upon re-introduction in 4.15 no such fatal behavior was put back in
place. I notice though that even up-to-date Linux has problematic
behavior around GNTTABOP_setup_table - the call is followed (after an
explicit -ENOSYS check) by "BUG_ON(rc || setup.status)". The amount of
memory needed here (including the status table) is potentially far
higher than for set_version. And what's important: setup_table gets
invoked only after set_version, so any table expansion would happen
here, with - if any table growing is needed at all - a far higher risk
for failure.

This ordering of operations (set_version before setup_table) as well
as the "error checking" after setup_table was also in effect up to
3.13. Therefore the same conclusion can be drawn there: The main risk
for crashing the guest due to memory shortage in Xen is there, not
with the version switch. What you've observed with XSA-226 (or rather,
I assume, with a custom extension of yours at the time, to prohibit use
of v2, which was upstreamed only later) was entirely unrelated to memory
shortage, but was instead a result of v2 suddenly becoming unavailable
altogether.

As to Windows, in the pvdrivers/win/ git repos I haven't been able to
find any use of GrantTableSetVersion(). Of course I can't exclude
other versions or other driver variants making use of set_version in an
inappropriate way. But as long as they don't grow the number of grant
table entries before such a call, the main risk of memory allocation
failure would again be with other hypercalls.

Jan



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.