
Re: [Xen-devel] Xen BUG at page_alloc.c:1738 (Xen 4.5)

On 30/05/2015 23:07, M A Young wrote:
> On Fri, 29 May 2015, Andrew Cooper wrote:
>> On 29/05/15 12:17, M A Young wrote:
>>>>> I did a bit of testing - xen-4.5.1-rc1 built on Fedora 22 (gcc5) doesn't 
>>>>> boot for me, but if I replace xen.gz with one from the same code built on 
>>>>> Fedora 21 (gcc4) then it does boot. There are rpms and build logs 
>>>>> available via 
>>>>> http://copr.fedoraproject.org/coprs/myoung/xentest/build/93366/
>>>>> if anyone else wants to do some testing.
>>>>>   Michael Young
>>>> Do you have easy access to xen-syms from each build?
>>> Yes.
>> Thank you very much.
>> GCC 5 is indeed miscompiling the code. Comparing the fc21 vs fc22 builds:
>> The C snippet from mmio_ro_do_page_fault():
>> struct page_info *page = mfn_to_page(mfn);
>> struct domain *owner = page_get_owner_and_reference(page);
>> if ( owner )
>>     put_page(page);
>> In fc21 is:
>> movabs $0xffff82e000000000,%rbp
>> shr    %cl,%rax
>> or     %rdx,%rax
>> shl    $0x5,%rax
>> add    %rax,%rbp
>> mov    %rbp,%rdi
>> callq  ffff82d080186900 <page_get_owner_and_reference>
>> test   %rax,%rax
>> mov    %rax,%r12
>> je     ffff82d080189c4e <mmio_ro_do_page_fault+0x11e>
>> mov    %rbp,%rdi
>> callq  ffff82d080188ec0 <put_page>
>> and in fc22 is:
>> movabs $0xffff82e000000000,%r8
>> shr    %cl,%rax
>> or     %rdx,%rax
>> shl    $0x5,%rax
>> lea    (%r8,%rax,1),%rdi
>> callq  ffff82d0801874f0 <page_get_owner_and_reference>
>> test   %rax,%rax
>> mov    %rax,%rbp
>> je     ffff82d08018ca14 <mmio_ro_do_page_fault+0x114>
>> mov    %r8,%rdi
>> callq  ffff82d080189a90 <put_page>
>> "lea (%r8,%rax,1),%rdi" in FC22 is slightly shorter than "add %rax,%rbp;
>> mov %rbp,%rdi" in FC21.  In both cases %rdi is now 'page' from the C
>> snippet.
>> In FC21, the result is stored in %rbp, then reloaded from %rbp into %rdi
>> for the call to put_page().
>> However, in FC22, the result of the calculation is only held in %rdi,
>> and clobbered by the call to page_get_owner_and_reference().  When it
>> comes to call put_page(), %r8 is reloaded, which is still a pointer to
>> the base of the frametable, not the page we actually took a reference on.
>> FC22 is miscompiling the C to:
>> struct page_info *page = mfn_to_page(mfn);
>> struct domain *owner = page_get_owner_and_reference(page);
>> if ( owner )
>>     put_page(mfn_to_page(0));
>> which is wrong, and why free_domheap_pages() does legitimately complain
>> about the wonky refcount.
> With a bit of experimentation I have found that compiling with the 
> -fno-caller-saves flag gets this code segment back to the Fedora 21 
> version, thus avoiding the bug.

After sending this email, I wondered whether the optimiser was assuming
that %rdi was preserved.  Indeed, it turns out that the generated code
for page_get_owner_and_reference leaves %rdi unmodified, and safe for
reuse after return.

If the 'mov %r8,%rdi' were simply omitted, the code would work, as %rdi
still contains the correct result of the original calculation.

Therefore, I suspect that the bug is in the -fcaller-saves optimisation.
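
For anyone wanting to see the effect on the refcounts without booting a
hypervisor, here is a minimal userspace sketch.  Everything in it is a
stand-in rather than Xen's real code: the 32-byte struct page_info (sized
to match the "shl $0x5" above), the 16-entry frame_table, the initial
counts, and the helpers frame_table_init(), access_correct() and
access_miscompiled() are all assumptions made for illustration.  It only
models "take a reference on one page, drop it on another".

```c
#include <stdint.h>

/* Hypothetical stand-in for Xen's struct page_info: exactly 32 bytes,
 * so frame_table + mfn corresponds to base + (mfn << 5). */
struct page_info {
    int64_t  count;    /* simplified reference count */
    uint64_t owner;    /* simplified owner id, 0 means "no owner" */
    uint64_t pad[2];
};
_Static_assert(sizeof(struct page_info) == 32, "page_info must be 32 bytes");

#define NR_FRAMES 16   /* assumed size, for the sketch only */
struct page_info frame_table[NR_FRAMES];

/* Give every frame an owner and one existing reference. */
void frame_table_init(void)
{
    for (unsigned long i = 0; i < NR_FRAMES; i++) {
        frame_table[i].count = 1;
        frame_table[i].owner = 1;
    }
}

struct page_info *mfn_to_page(unsigned long mfn)
{
    return &frame_table[mfn];
}

/* Simplified: take a reference and report the owner. */
uint64_t page_get_owner_and_reference(struct page_info *page)
{
    page->count++;
    return page->owner;
}

/* Simplified: drop a reference. */
void put_page(struct page_info *page)
{
    page->count--;
}

/* What the C source asks for: get and put act on the same page. */
void access_correct(unsigned long mfn)
{
    struct page_info *page = mfn_to_page(mfn);
    if (page_get_owner_and_reference(page))
        put_page(page);
}

/* What the fc22 binary effectively does: the put lands on the
 * base of the frametable instead of on 'page'. */
void access_miscompiled(unsigned long mfn)
{
    struct page_info *page = mfn_to_page(mfn);
    if (page_get_owner_and_reference(page))
        put_page(mfn_to_page(0));
}
```

After frame_table_init(), access_correct(5) leaves every count at 1,
whereas access_miscompiled(5) leaves frame 5 with one reference too many
and frame 0 with one too few.  That imbalance is the kind of inconsistency
free_domheap_pages() would later trip over.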

