Re: [Xen-devel] Xen 4.7 crash



On 01/06/2016 23:18, Julien Grall wrote:
> Hi Andrew,
>
> On 01/06/2016 22:24, Andrew Cooper wrote:
>> On 01/06/2016 21:45, Aaron Cornelius wrote:
>>>>
>>>>> However, since I only have 1 domain active at a time, I'm not sure
>>>>> why I should run out of VM IDs.
>>>>
>>>> Sounds like a VMID resource leak.  Check to see whether it is freed
>>>> properly in domain_destroy().
>>>>
>>>> ~Andrew
>>> That would be my assumption.  But as far as I can tell,
>>> arch_domain_destroy() calls p2m_teardown() which calls
>>> p2m_free_vmid(), and none of the functionality related to freeing a
>>> VM ID appears to have changed in years.
>>
>> The VMID handling looks suspect.  It can be called repeatedly during
>> domain destruction, and it will repeatedly clear the same bit out of the
>> vmid_mask.
>
> Can you explain how p2m_free_vmid can be called multiple times?
>
> We have the following path:
>    arch_domain_destroy -> p2m_teardown -> p2m_free_vmid.
>
> And I can find only 3 calls of arch_domain_destroy, which should only
> be done once per domain.
>
> If arch_domain_destroy is called multiple times, p2m_free_vmid will not
> be the only place where Xen will be in trouble.

You are correct.  I was getting my phases of domain destruction mixed
up.  arch_domain_destroy() is strictly once, after the RCU reference of
the domain has dropped to 0.

>
>> diff --git a/xen/arch/arm/p2m.c b/xen/arch/arm/p2m.c
>> index 838d004..7adb39a 100644
>> --- a/xen/arch/arm/p2m.c
>> +++ b/xen/arch/arm/p2m.c
>> @@ -1393,7 +1393,10 @@ static void p2m_free_vmid(struct domain *d)
>>      struct p2m_domain *p2m = &d->arch.p2m;
>>      spin_lock(&vmid_alloc_lock);
>>      if ( p2m->vmid != INVALID_VMID )
>> -        clear_bit(p2m->vmid, vmid_mask);
>> +    {
>> +        ASSERT(test_and_clear_bit(p2m->vmid, vmid_mask));
>> +        p2m->vmid = INVALID_VMID;
>> +    }
>>
>>      spin_unlock(&vmid_alloc_lock);
>>  }
>>
>> Having said that, I can't explain why that bug would result in the
>> symptoms you are seeing.  It is also possible that your issue is memory
>> corruption from a separate source.
>>
>> Can you see about instrumenting p2m_alloc_vmid()/p2m_free_vmid() (with
>> vmid_alloc_lock held) to see which vmid is being allocated/freed?
>> After the initial boot of the system, you should see the same vmid being
>> allocated and freed for each of your domains.
>
> Looking quickly at the log, the domain is dom1101. However, the
> maximum number of VMIDs supported is 256, so the exhaustion might be a
> race somewhere.
>
> I would be interested to get a reproducer. I wrote a script to cycle a
> domain (create/destroy) in a loop, and I have not seen any issue after
> 1200 cycles (and counting).

Given that my previous thought was wrong, I am going to suggest that
some other form of memory corruption is a more likely cause.
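
For anyone wanting to try the alloc/free instrumentation suggested above
outside of Xen, here is a minimal standalone sketch of a 256-entry VMID
bitmap with logging.  It is a model, not the code in xen/arch/arm/p2m.c:
the names model_alloc_vmid() and model_free_vmid() are hypothetical,
INVALID_VMID being 0 (and therefore never handed out) is an assumption,
and locking is omitted because the model is single-threaded.  The
test-and-clear is deliberately kept outside of any assertion so that the
bit is still cleared in builds where assertions compile to nothing.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define MAX_VMID      256   /* 8-bit VMID space, as noted in the thread */
#define INVALID_VMID  0     /* assumption: VMID 0 is reserved as "invalid" */

/* One bit per VMID; a set bit means "in use". */
static uint8_t vmid_mask[MAX_VMID / 8];

/* Set a bit and return its previous value. */
static bool test_and_set(unsigned int bit)
{
    uint8_t m = (uint8_t)(1u << (bit % 8));
    bool old = (vmid_mask[bit / 8] & m) != 0;

    vmid_mask[bit / 8] |= m;
    return old;
}

/* Clear a bit and return its previous value. */
static bool test_and_clear(unsigned int bit)
{
    uint8_t m = (uint8_t)(1u << (bit % 8));
    bool old = (vmid_mask[bit / 8] & m) != 0;

    vmid_mask[bit / 8] &= (uint8_t)~m;
    return old;
}

/* Hypothetical model: returns the lowest free VMID, or -1 when exhausted. */
int model_alloc_vmid(void)
{
    /* Start at 1 so the reserved INVALID_VMID is never allocated. */
    for (unsigned int v = 1; v < MAX_VMID; v++) {
        if (!test_and_set(v)) {
            fprintf(stderr, "vmid alloc: %u\n", v);
            return (int)v;
        }
    }
    fprintf(stderr, "vmid alloc: exhausted\n");
    return -1;
}

/*
 * Hypothetical model: frees *vmid and resets it to INVALID_VMID.
 * Returns true if the VMID really was allocated, false for a repeated
 * or bogus free, so the caller can decide how loudly to complain.
 */
bool model_free_vmid(int *vmid)
{
    bool was_set;

    if (*vmid <= INVALID_VMID || *vmid >= MAX_VMID)
        return false;

    was_set = test_and_clear((unsigned int)*vmid);
    fprintf(stderr, "vmid free: %d (%s)\n", *vmid,
            was_set ? "ok" : "was not allocated");
    *vmid = INVALID_VMID;
    return was_set;
}
```

With one domain alive at a time, a log from this kind of instrumentation
should show the same VMID being allocated and freed on every cycle.  An
"exhausted" line, or a free of a VMID that was not allocated, would point
at the allocator; a clean log would point elsewhere.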

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel

 

