Re: [xen-unstable-smoke test] 173492: regressions - FAIL


  • To: Julien Grall <julien@xxxxxxx>, Henry Wang <Henry.Wang@xxxxxxx>, "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>, Bertrand Marquis <bertrand.marquis@xxxxxxx>, Stefano Stabellini <sstabellini@xxxxxxxxxx>
  • From: Andrew Cooper <Andrew.Cooper3@xxxxxxxxxx>
  • Date: Wed, 12 Oct 2022 11:05:17 +0000
  • Accept-language: en-GB, en-US
  • Cc: osstest service owner <osstest-admin@xxxxxxxxxxxxxx>, Jan Beulich <jbeulich@xxxxxxxx>
  • Delivery-date: Wed, 12 Oct 2022 11:05:35 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>
  • Thread-topic: [xen-unstable-smoke test] 173492: regressions - FAIL

On 12/10/2022 11:49, Julien Grall wrote:
> Hi Andrew,
>
> On 12/10/2022 11:29, Andrew Cooper wrote:
>> On 12/10/2022 11:01, Julien Grall wrote:
>>> (+ Bertrand & Stefano)
>>>
>>> Hi Henry,
>>>
>>> On 12/10/2022 07:39, Henry Wang wrote:
>>>>> -----Original Message-----
>>>>> Subject: Re: [xen-unstable-smoke test] 173492: regressions - FAIL
>>>>>
>>>>> On 11.10.2022 18:29, osstest service owner wrote:
>>>>>> flight 173492 xen-unstable-smoke real [real]
>>>>>> http://logs.test-lab.xenproject.org/osstest/logs/173492/
>>>>>>
>>>>>> Regressions :-(
>>>>>>
>>>>>> Tests which did not succeed and are blocking,
>>>>>> including tests which could not be run:
>>>>>>    test-arm64-arm64-xl-xsm      14 guest-start              fail REGR. vs. 173457
>>>>>
>>>>> Parsing config from /etc/xen/debian.guest.osstest.cfg
>>>>> libxl: debug: libxl_create.c:2079:do_domain_create: ao 0xaaaacaccf680: create: how=(nil) callback=(nil) poller=0xaaaacaccefd0
>>>>> libxl: detail: libxl_create.c:661:libxl__domain_make: passthrough: disabled
>>>>> libxl: debug: libxl_arm.c:148:libxl__arch_domain_prepare_config: Configure the domain
>>>>> libxl: debug: libxl_arm.c:151:libxl__arch_domain_prepare_config:  - Allocate 0 SPIs
>>>>> libxl: error: libxl_create.c:709:libxl__domain_make: domain creation fail: No such file or directory
>>>
>>> So this is -ENOENT, which could be returned by the P2M if it can't
>>> allocate a page table (see p2m_set_entry()).
>>>
>>>>> libxl: error: libxl_create.c:1294:initiate_domain_create: cannot make domain: -3
>>>>>
>>>>> Later flights don't fail here anymore, though.
>>>>>
>>>>>>    test-armhf-armhf-xl          14 guest-start              fail REGR. vs. 173457
>>>>>
>>>>> Similar log contents here, but later flights continue to fail the
>>>>> same way.
>>>>>
>>>>> I'm afraid I can't draw conclusions from this; I haven't been able
>>>>> to spot anything helpful in the hypervisor logs. My best guess right
>>>>> now is the use of some uninitialized memory, which just happened to
>>>>> go fine in the later flights for 64-bit.
>>>
>>> It looks like the smoke flight failed on laxton0 but passed on
>>> rochester{0, 1}. The former is using GICv2 whilst the latter are using
>>> GICv3.
>>>
>>> In the case of GICv2, we will create a P2M mapping when the domain is
>>> created. This is not necessary in the GICv3.
>>>
>>> IIRC the P2M pool is only populated later on (we don't add a few pages
>>> like on x86). So I am guessing this is why we are seeing the failure.
>>>
>>> If that's correct, then this is a complete oversight from me (I
>>> haven't done any GICv2 testing) while reviewing the series.
>>>
>>> The easy way to solve it would be to add a few pages in the pool when
>>> the domain is created. I don't like it, but I think the other
>>> possible solutions would require more work as we would need to delay
>>> the mappings.
>>
>> Honestly, I've considered doing this on x86 too.
>
> AFAICT, this is already the case for HAP (see call to
> hap_set_allocation() in hap_enable()). 256 pages will be pre-allocated.

Right, but it's asymmetric with shadow.  This wants fixing and simplifying.
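
To make the pool discussion above concrete, here is a minimal, self-contained C model of the failure and of the "add a few pages at creation" fix. It is only a sketch and not Xen code: toy_p2m_pool, toy_domain, toy_pool_take(), toy_map() and toy_domain_create() are hypothetical names, and the assumption that one mapping costs exactly one page-table page is a simplification.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Toy model only: these names are hypothetical, not Xen's real structures. */
struct toy_p2m_pool {
    unsigned int free_pages;   /* pages available for page tables */
};

struct toy_domain {
    struct toy_p2m_pool pool;
    unsigned int mappings;     /* number of mappings created so far */
};

/* Take one page from the pool; fail with -ENOENT if the pool is empty. */
static int toy_pool_take(struct toy_p2m_pool *pool)
{
    if (pool->free_pages == 0)
        return -ENOENT;
    pool->free_pages--;
    return 0;
}

/* In this simplified model, one mapping needs one page-table page. */
static int toy_map(struct toy_domain *d)
{
    int rc = toy_pool_take(&d->pool);

    if (rc)
        return rc;
    d->mappings++;
    return 0;
}

/*
 * Domain creation. 'initial_pages' seeds the pool before any mapping is
 * attempted; 'needs_early_mapping' models a platform (GICv2 in the thread)
 * that creates a mapping during domain creation.
 */
static int toy_domain_create(struct toy_domain *d, unsigned int initial_pages,
                             int needs_early_mapping)
{
    d->pool.free_pages = initial_pages;
    d->mappings = 0;
    return needs_early_mapping ? toy_map(d) : 0;
}
```

In this model, toy_domain_create(&d, 0, 1) fails with -ENOENT, toy_domain_create(&d, 0, 0) succeeds because no early mapping is needed, and seeding the pool with a handful of pages lets the early-mapping case succeed as well.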

>
>>
>> There are several things which want allocating in domain_create(), but
>> are deferred to max_vcpus() because they require the P2M having a
>> non-zero allocation.  This in turn means we've got a load of checks in
>> paths where we'd ideally not have them.
>>
>> We already have a calculation of the absolute minimum we will ever
>> permit the p2m pool to be.  IMO we ought to allocate this minimum size
>> in domain_create().
>
> It depends on the number. At the moment domain_create() is not
> preemptible, so we don't want to allocate too many pages (I think even
> 256 pages could be risky on some Arm platforms).
>
> Maybe the solution is to have domain_create() preemptible. But it is
> not something that could be done in the 4.17 time frame.

domain_create() can't be pre-emptible in its current form, because it
depends on "atomically" taking the domid from not existing to existing. 
Specifically, until the hypercall completes, other hypercalls can't find
a struct domain* for the domid.

This is necessary because we guarantee that whenever you can look up a
domain by domid, certain invariants already hold: e.g. the predicates
work on it, d->max_vcpus is nonzero, etc.
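
A rough illustration of that "publish last" ordering, as a single-threaded toy model. None of this is Xen code: pub_domain, pub_table, pub_lookup() and pub_create() are hypothetical names, and the locking a real implementation needs is deliberately left out.

```c
#include <assert.h>
#include <stddef.h>

#define PUB_MAX_DOMID 8        /* arbitrary size for the toy table */

/* Hypothetical stand-in for a domain structure. */
struct pub_domain {
    unsigned int domid;
    unsigned int max_vcpus;
};

/* Toy lookup table indexed by domid; NULL means "does not exist". */
static struct pub_domain *pub_table[PUB_MAX_DOMID];

/* Look up a domain by domid; returns NULL until creation has completed. */
static struct pub_domain *pub_lookup(unsigned int domid)
{
    return domid < PUB_MAX_DOMID ? pub_table[domid] : NULL;
}

/*
 * Create a domain. All validation and setup happens first; the pointer is
 * stored in the table only as the very last step, so a successful lookup
 * never observes a half-built domain. Returns 0 on success, -1 on failure.
 */
static int pub_create(struct pub_domain *d, unsigned int domid,
                      unsigned int max_vcpus)
{
    if (domid >= PUB_MAX_DOMID || max_vcpus == 0 || pub_table[domid] != NULL)
        return -1;

    d->domid = domid;
    d->max_vcpus = max_vcpus;

    pub_table[domid] = d;      /* publish last */
    return 0;
}
```

A failed pub_create() leaves pub_lookup() returning NULL, and any domain that pub_lookup() does return has a nonzero max_vcpus.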

In some future where the error paths have been made idempotent and we
have a clean split between teardown and destroy, we probably can alter
the existing creation path to do a more basic initial setup (which can
be cleaned up by the destroy logic), then insert the domain into the
domain hashtable and automatically continue into a different subop and perform
more long-running setup.
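
One possible shape for such a split, sketched as a toy model under stated assumptions: staged_domain, staged_create(), staged_setup_continue(), STAGED_AGAIN and STAGED_CHUNK are all hypothetical, the "long-running setup" is reduced to counting pages, and the destroy/cleanup side is omitted entirely.

```c
#include <assert.h>
#include <stddef.h>

#define STAGED_AGAIN 1         /* hypothetical "call me again" return value */
#define STAGED_CHUNK 64        /* pages handled per call before yielding */

/* Hypothetical stand-in for a domain structure. */
struct staged_domain {
    int visible;               /* 1 once inserted into the (modelled) table */
    unsigned int pages_done;   /* long-running setup progress */
    unsigned int pages_wanted; /* total work for the long-running setup */
};

/* Step 1: short, bounded setup, after which the domain becomes visible. */
static void staged_create(struct staged_domain *d, unsigned int pages_wanted)
{
    d->pages_done = 0;
    d->pages_wanted = pages_wanted;
    d->visible = 1;
}

/*
 * Step 2: long-running setup, performed in bounded chunks. Returns
 * STAGED_AGAIN while work remains and 0 once everything is done, so the
 * caller can yield between calls instead of running to completion.
 */
static int staged_setup_continue(struct staged_domain *d)
{
    unsigned int left = d->pages_wanted - d->pages_done;
    unsigned int step = left < STAGED_CHUNK ? left : STAGED_CHUNK;

    d->pages_done += step;
    return d->pages_done < d->pages_wanted ? STAGED_AGAIN : 0;
}
```

With pages_wanted set to 150, staged_setup_continue() returns STAGED_AGAIN twice and then 0. The point of the sketch is only that each call does a bounded amount of work after the domain is already visible.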

But yeah - absolutely definitely not 4.17 content.

~Andrew

 

