
Re: [xen-unstable test] 162771: regressions - FAIL


  • To: Juergen Gross <jgross@xxxxxxxx>
  • From: Jan Beulich <jbeulich@xxxxxxxx>
  • Date: Mon, 14 Jun 2021 13:58:37 +0200
  • Cc: Ian Jackson <ian.jackson@xxxxxxxxxxxxx>, Wei Liu <wl@xxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, osstest service owner <osstest-admin@xxxxxxxxxxxxxx>, xen-devel@xxxxxxxxxxxxxxxxxxxx
  • Delivery-date: Mon, 14 Jun 2021 11:58:52 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 14.06.2021 08:41, Juergen Gross wrote:
> On 14.06.21 04:21, osstest service owner wrote:
>> flight 162771 xen-unstable real [real]
>> flight 162783 xen-unstable real-retest [real]
>> http://logs.test-lab.xenproject.org/osstest/logs/162771/
>> http://logs.test-lab.xenproject.org/osstest/logs/162783/
>>
>> Regressions :-(
>>
>> Tests which did not succeed and are blocking,
>> including tests which could not be run:
>>   test-amd64-amd64-xl-qemuu-ovmf-amd64 15 guest-saverestore fail REGR. vs. 162533
>>   test-amd64-i386-xl-qemuu-ovmf-amd64 15 guest-saverestore fail REGR. vs. 162533
>>   test-amd64-amd64-i386-pvgrub 17 guest-localmigrate       fail REGR. vs. 162533
>>   test-amd64-amd64-amd64-pvgrub 17 guest-localmigrate      fail REGR. vs. 162533
> 
> Hmm, this is rather unfortunate.
> 
> Those last 2 tests failed due to commit 7bd8989ab77b6ade3b, but just
> reverting that patch doesn't seem right to me either.
> 
> The Linux kernel has a bug here: it will initially set max_pfn in the
> shared_info page to the size of the p2m_list (so my reasoning for above
> patch was wrong in this case), but when growing the p2m_list (e.g. due
> to ballooning or grant mapping), it will store a real pfn number in
> max_pfn. But even this pfn might be wrong, as only the pfn leading to
> allocation of a new p2m page will be stored in max_pfn, any higher new
> pfn having its p2m entry in the new p2m page won't result in a new
> max_pfn entry.
> 
> As a result I think the only sane handling would be to assume the
> max_pfn value read from the shared_info page is really a pfn.

This would be contrary to the public interface header having

    /*
     * Number of valid entries in the p2m table(s) anchored at
     * pfn_to_mfn_frame_list_list and/or p2m_vaddr.
     */
    unsigned long max_pfn;

IOW the "max" in the name is misleading (it should be "num" or
similar), but there's no room imo to change this.
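
To make the off-by-one between the two readings concrete, here is a
minimal sketch. It is not code from Xen, the tool stack, or Linux; the
helper names are hypothetical and only illustrate how many p2m entries a
consumer would consider valid under each interpretation of the field.

```c
/* Illustrative only: hypothetical helpers, not Xen or Linux code. */

/*
 * Reading 1 (the public header comment): the field is a count of
 * valid entries, so the valid pfns are 0 .. max_pfn - 1.
 */
static unsigned long entries_if_count(unsigned long max_pfn)
{
    return max_pfn;
}

/*
 * Reading 2 (what the name suggests): the field is the highest valid
 * pfn, so the valid pfns are 0 .. max_pfn, i.e. one entry more.
 */
static unsigned long entries_if_highest_pfn(unsigned long max_pfn)
{
    return max_pfn + 1;
}
```

A producer storing a real pfn while the consumer applies reading 1
makes the consumer skip the last entry; the reverse mismatch makes it
look at one entry too many.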

> This
> value should be adjusted to specify the last pfn of the related p2m
> page, and the resulting last p2m page should be tolerated to not be
> valid.
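
As an illustration of the adjustment proposed in the quoted paragraph
above, the rounding to the last pfn covered by the same p2m page could
look like the sketch below. The helper name and the 4 KiB page size are
assumptions made for the example, not taken from Xen or Linux.

```c
/* Illustrative only: hypothetical names, 4 KiB page size assumed. */
#define P2M_PAGE_SIZE        4096UL
#define P2M_ENTRIES_PER_PAGE (P2M_PAGE_SIZE / sizeof(unsigned long))

/*
 * Return the last pfn whose p2m entry lives in the same p2m page as
 * the entry for 'pfn'. This relies on P2M_ENTRIES_PER_PAGE being a
 * power of two, which holds for 4- and 8-byte unsigned long.
 */
static unsigned long last_pfn_of_p2m_page(unsigned long pfn)
{
    return pfn | (P2M_ENTRIES_PER_PAGE - 1);
}
```

With such rounding, any pfn stored by the guest maps to the end of its
p2m page, which is why the last p2m page would then have to be
tolerated as possibly not valid.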
> 
> Another variant would be to just revert above commit and modify the
> semantics of max_pfn in the shared_info page to really mean max_pfn+1.

But that's what it is already, according to the comment. Are you
suggesting there was code prior to the change you've quoted that
already violated this (in Xen or the tool stack, that is, not
the issue you suggest has been present in Linux)?

Jan

> This would result in possible migration failures of ballooned Linux
> systems as today.
> 
> Additionally I'll fix the Linux kernel, of course.
> 
> Any thoughts?
> 
> 
> Juergen
> 




 

