[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [PATCH v3] x86/mm: Short circuit damage from "fishy" ref/typecount failure
On 25.01.2021 18:59, Andrew Cooper wrote: > On 20/01/2021 08:06, Jan Beulich wrote: >> On 19.01.2021 19:09, Andrew Cooper wrote: >>> On 19/01/2021 16:48, Jan Beulich wrote: >>>> On 19.01.2021 14:02, Andrew Cooper wrote: >>>>> This code has been copied in 3 places, but it is problematic. >>>>> >>>>> All cases will hit a BUG() later in domain teardown, when a the missing >>>>> type/count reference is underflowed. >>>> I'm afraid I could use some help with this: Why would there >>>> be a missing reference, when the getting of one failed? >>> Look at the cleanup logic for the associated fields. >>> >>> Either the plain ref fails (impossible without other fatal refcounting >>> errors AFAICT), or the typeref fails (a concern, but impossible AFAICT). >> In principle I would agree, if there wasn't the question of >> count overflows. The type count presently is 56 bits wide, >> while the general refcount has 54 bits. It'll be a long time >> until they overflow, but it's not impossible. The underlying >> problem there that I see is - where do we draw the line >> between "can't possibly overflow in practice" (as we would >> typically assume for 64-bit counters) and "is to be expected >> to overflow (as we would typically assume for 32-bit >> counters)? > > Ok fine - I was treating 54 bits as "not going to happen in practice". > > A PV guest needs 2^43 pages of RAM to turn into pagetables to approach > the general refcount limit. This is more RAM than most people can > accord, and this is way in excess of our security supported limits. > > > Errors in this area are already hit BUGs in loads of cases, because that > is less bad than the alternatives. > > In principle, and as previously discussed, some issues in this area > could be fixed by porting refcount_t from PaX/Linux KSPP which will turn > refcount overflows into memory leaks, which is an even less bad alternative. > >> >> Also, as far as "impossible" here goes - the constructs all >> anyway exist only to deal with what we consider impossible. >> The question therefore really is of almost exclusively >> theoretical nature, and hence something like a counter >> possibly overflowing imo needs to be accounted for as >> theoretically possible, albeit impossible with today's >> computers and realistic timing assumptions. If a counter >> overflow occurred, it definitely wouldn't be because of a >> bug in Xen, but because of abnormal behavior elsewhere. >> Hence I remain unconvinced it is appropriate to deal with >> the situation by BUG(). > > I'm not sure how to be any clearer. > > I am literally not changing the current behaviour. Xen *will* hit a > BUG() if any of these domain_crash() paths are taken. > > If you do not believe me, then please go and actually check what happens > when simulating a ref-acquisition failure. Well, okay, if you don't want to share the knowledge you've gained, I will have to go this route. Will likely take longer though than if you had tried to clarify how a ref would go "missing" and hence allow for an underflow. This isn't a matter of believing you, but one of wanting to understand the situation you say you've run into. Jan > What I am doing is removing complexity (the point of the change) which > gives a false sense of the error being survivable. > > If you want to do something other than BUG() in these cases, then you > need to figure some way for the teardown logic to identify which ref > went missing, but this would be a different, follow-on patch. > >> But yes, if otoh we assume the failure here to be the result >> of a bug elsewhere in Xen (and not an overflow), then BUG() >> may be warranted. Yet afaic these constructs weren't meant >> to deal with bugs elsewhere in Xen, but with the >> "impossible". So if we change our collective mind here, I >> think the conversion to BUG() would then need accompanying >> by respective commentary. > > BUG() is, and has always been, Xen's way of dealing with impossibles, > particularly when it comes to memory handling. > > This isn't a "changing minds" occasion. Removals of BUG()s elsewhere > pertains to logical error based on guest state, which is indeed > inappropriate error handling. > > ~Andrew >
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |