[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [PATCH v4] gnttab: defer allocation of status frame tracking array
On 05.05.2021 12:49, Andrew Cooper wrote: > On 04/05/2021 09:42, Jan Beulich wrote: >> This array can be large when many grant frames are permitted; avoid >> allocating it when it's not going to be used anyway, by doing this only >> in gnttab_populate_status_frames(). While the delaying of the respective >> memory allocation adds possible reasons for failure of the respective >> enclosing operations, there are other memory allocations there already, >> so callers can't expect these operations to always succeed anyway. >> >> As to the re-ordering at the end of gnttab_unpopulate_status_frames(), >> this is merely to represent intended order of actions (shrink array >> bound, then free higher array entries). If there were racing accesses, >> suitable barriers would need adding in addition. >> >> Signed-off-by: Jan Beulich <jbeulich@xxxxxxxx> > > Nack. > > The argument you make says that the grant status frames is "large > enough" to care about not allocating. (I frankly disagree, but that > isn't relevant to my to my nack). > > The change in logic here moves a failure in DOMCTL_createdomain, to > GNTTABOP_setversion. We know, from the last minute screwups in XSA-226, > that there are versions of Windows and Linux in the field, used by > customers, which will BUG()/BSOD when GNTTABOP_setversion doesn't succeed. So after you did re-state the Linux aspect of this on the call yesterday I went and looked. In the first phase of Linux supporting v2 at all (3.3 - 3.16) there indeed was a panic() upon certain kinds of failures. Only up to 3.13 was there an actual attempt to use v2. Also, while there was some code there to support actual v2 features (sub-page or transitive grants), all of this was dead. Hence their claim /* * If we've already used version 2 features, * but then suddenly discover that they're not * available (e.g. migrating to an older * version of Xen), almost unbounded badness * can happen. */ was bogus at best. If there was no way at all for set_version to fail prior to the change, here I could probably accept your position. But as said before by Julien - there are pre-existing memory allocations (of typically larger size) there, and hence any guest assuming the call can't fail is flawed already anyway. And no, I don't view this as a reason for us to try to eliminate all memory allocations from that hypercall (which would in particular mean populating status frames irrespective of whether a guest would ever switch to v2). Guest flaws would better be addressed in the guests. Upon re-introduction in 4.15 no such fatal behavior was put back in place. I notice though that even up-to-date Linux has problematic behavior around GNTTABOP_setup_table - the call is followed (after an explicit -ENOSYS check) by "BUG_ON(rc || setup.status)". The amount of memory needed here (including the status table) is potentially far higher than for set_version. And what's important: setup_table gets invoked only after set_version, so any table expansion would happen here, with - if any table growing is needed at all - a far higher risk for failure. This ordering of operations (set_version before setup_table) as well as the "error checking" after setup_table was also in effect up to 3.13. Therefore the same conclusion can be drawn there: The main risk for crashing the guest due to memory shortage in Xen is there, not with the version switch. What you've observed with XSA-226 (or rather, I assume, with a custom extension of yours at the time, to prohibit use of v2, which was upstreamed only later) was entirely unrelated to memory shortage, but was instead a result of v2 suddenly becoming unavailable altogether. As to Windows, in the pvdrivers/win/ git repos I haven't been able to find any use of GrantTableSetVersion(). Of course I can't exclude other versions or other driver variants making use of set_version in an inappropriate way. But as long as they don't grow the number of grant table entries before such a call, the main risk of memory allocation failure would again be with other hypercalls. Jan
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |