[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH] x86: enable interrupts around dump_execstate()


  • To: Roger Pau Monné <roger.pau@xxxxxxxxxx>
  • From: Jan Beulich <jbeulich@xxxxxxxx>
  • Date: Thu, 15 Sep 2022 10:01:11 +0200
  • Arc-authentication-results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=suse.com; dmarc=pass action=none header.from=suse.com; dkim=pass header.d=suse.com; arc=none
  • Arc-message-signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=v2LgxLd5d+p++ENnAU5rvHkmCcSw+f/z9x7e8kBgz48=; b=FzfiDdrphWqSRgF/x0JHCg+kdJc6Os39zLkcwD+YiWbhOtZ7hZAyH3cNOYGi5Ku9/Bm0CS7LmLPnLwj50FHe11wimTZonsvw4+FPGqzY0xtcVnKCXTf8xbQ2sNYY/S4a4Uy59QoYWiDN6PDTVDNYguHfjY3kHghNlfNNL9hQrn6Ssiyuzgl8GE53s1no//Bh+q5n2hzrk6m4zbIdHa6fUExDo0OQ2lpqIeJDrcLuFn7GquIG/siSa9rE7e2QBays+79tOEATbMpEYfobMarMaejjImtCQRVqS4U5699iCtgin+RydHQGyuPzkKnG83F+neTLORIphF4ElovVhzsrFA==
  • Arc-seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=gD0wsQpRo/iCpELJojLLe92xQcQ8T7MxLuuzpWRQ1wmAOmDZUTeuwKttdATYNRPBBb2DcCr9lCKUknn+QRKGdr2Bh+3grxhTxvxwHFWrYbceUQfQPvSvMW4eOp7rcRiff/kMdQxYCyPt2hLoy+HGn5p97ZQjShBqQm6L2JL1SXh/Ak+CENAKQcXj5TF2fZZ9kCZ1mJ1yodw87a0il247BTi2j3HMUfB/ROrrlnUhniBMbwBvLMAoq8clC+TyMw3TfWWZya0w2z7oQlUjV45OXayGvKbNWa4MHIMRRWxHm02r+rD9Vt+lsGTWyxd2qbiSBx7Rzag0ixsYfe6EyxUo7A==
  • Authentication-results: dkim=none (message not signed) header.d=none;dmarc=none action=none header.from=suse.com;
  • Cc: "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Wei Liu <wl@xxxxxxx>
  • Delivery-date: Thu, 15 Sep 2022 08:01:21 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 14.09.2022 16:23, Roger Pau Monné wrote:
> On Wed, Sep 14, 2022 at 12:13:49PM +0200, Jan Beulich wrote:
>> On 14.09.2022 11:13, Roger Pau Monné wrote:
>>> On Wed, Sep 14, 2022 at 10:31:34AM +0200, Jan Beulich wrote:
>>>> On 14.09.2022 10:14, Jan Beulich wrote:
>>>>> On 13.09.2022 16:50, Roger Pau Monné wrote:
>>>>>> On Mon, Dec 13, 2021 at 04:12:55PM +0100, Jan Beulich wrote:
>>>>>>> show_hvm_stack() requires interrupts to be enabled to avoids triggering
>>>>>>> the consistency check in check_lock() for the p2m lock. To do so in
>>>>>>> spurious_interrupt() requires adding reentrancy protection / handling
>>>>>>> there.
>>>>>>
>>>>>> There's also an ASSERT(!in_irq()) in _percpu_write_lock() that will
>>>>>> trigger when trying to acquire the p2m lock from spurious_interrupt()
>>>>>> context, as p2m_lock() -> mm_write_lock() -> _mm_write_lock ->
>>>>>> percpu_write_lock().
>>>>>
>>>>> s/will/may/ since spurious_interrupt() doesn't itself use irq_enter(),
>>>
>>> do_IRQ() does call irq_enter(), and that's the caller of
>>> spurious_interrupt() AFAICT.
>>
>> Hmm, you're right. I was mislead by smp_call_function_interrupt()
>> explicitly using irq_{enter,exit}(). I guess that should have been
>> removed in b57458c1d02b ("x86: All vectored interrupts go through
>> do_IRQ()"). I guess I need to either open-code the variant of in_irq()
>> I'd need, or (perhaps better for overall state) explicitly irq_exit()
>> before the check and irq_enter() after the call. Thoughts?
> 
> Well, it's ugly but it's likely the easier way to get this working.

Just to clarify - the first of the options I did name is (of course) not
viable: If we open-coded a local_irq_count() == 1 check here, the
assertion you named would still trigger.

>>>>> but yes - we could nest inside a lower priority interrupt. I'll make
>>>>> local_irq_enable() depend on !in_irq().
>>>>
>>>> Upon further thought I guess more precautions are necessary: We might
>>>> have interrupted code holding the P2M lock already, and we might also
>>>> have interrupted code holding another MM lock precluding acquiring of
>>>> the P2M lock. All of this probably plays into Andrew's concerns, yet
>>>> still I don't view it as a viable route to omit the stack dump for HVM
>>>> domains, and in particular for PVH Dom0. Sadly I can't think of any
>>>> better approach ...
>>>
>>> Yes, I also had those concerns.  The mm locks are recursive, but
>>> spurious_interrupt() hitting in the middle of code already holding any
>>> mm lock is likely to end up triggering the mm lock order checker.
>>
>> Guarding against this is possible, while ...
>>
>>> One (likely very risky option ATM) is to introduce a per pCPU flag
>>> that when set will turn all mm locks into noops, and use it here in
>>> order to avoid any locking issues.  This could introduce two issues at
>>> least: first one is how resilient page walking routines are against
>>> page tables changing under their feet.  The second one is that any
>>> page table walker p2m helper should avoid doing modifications to the
>>> p2m, so no P2M_ALLOC or P2M_UNSHARE flags could be used.
>>
>> ... personally I view this as too risky.
> 
> Is the dump of the stack only used for the debug key handler, or there
> are other places this is also used?

It's called from show_execution_state(), which also dumps state for e.g.
crashes or WARN_ON()s.

Jan



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.