
Re: [xen-unstable-smoke test] 173492: regressions - FAIL



Hi Andrew,

On 12/10/2022 11:29, Andrew Cooper wrote:
On 12/10/2022 11:01, Julien Grall wrote:
(+ Bertrand & Stefano)

Hi Henry,

On 12/10/2022 07:39, Henry Wang wrote:
-----Original Message-----
Subject: Re: [xen-unstable-smoke test] 173492: regressions - FAIL

On 11.10.2022 18:29, osstest service owner wrote:
flight 173492 xen-unstable-smoke real [real]
http://logs.test-lab.xenproject.org/osstest/logs/173492/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
   test-arm64-arm64-xl-xsm      14 guest-start              fail REGR. vs. 173457

Parsing config from /etc/xen/debian.guest.osstest.cfg
libxl: debug: libxl_create.c:2079:do_domain_create: ao 0xaaaacaccf680: create: how=(nil) callback=(nil) poller=0xaaaacaccefd0
libxl: detail: libxl_create.c:661:libxl__domain_make: passthrough: disabled
libxl: debug: libxl_arm.c:148:libxl__arch_domain_prepare_config: Configure the domain
libxl: debug: libxl_arm.c:151:libxl__arch_domain_prepare_config:  - Allocate 0 SPIs
libxl: error: libxl_create.c:709:libxl__domain_make: domain creation fail: No such file or directory

So this is -ENOENT, which could be returned by the P2M code if it can't
allocate a page table (see p2m_set_entry()).

libxl: error: libxl_create.c:1294:initiate_domain_create: cannot make domain: -3

Later flights don't fail here anymore, though.

   test-armhf-armhf-xl          14 guest-start              fail REGR. vs. 173457

Similar log contents here, but later flights continue to fail the
same way.

I'm afraid I can't draw conclusions from this; I haven't been able to
spot anything helpful in the hypervisor logs. My best guess right now
is the use of some uninitialized memory, which just happened to go
fine in the later flights for 64-bit.

It looks like the smoke flight failed on laxton0 but passed on
rochester{0, 1}. The former is using GICv2 whilst the latter are using
GICv3.

In the case of GICv2, we will create a P2M mapping when the domain is
created. This is not necessary for GICv3.

IIRC the P2M pool is only populated later on (we don't add a few pages
like on x86). So I am guessing this is why we are seeing the failure.

If that's correct, then this is a complete oversight from me (I
haven't done any GICv2 testing) while reviewing the series.

The easy way to solve it would be to add a few pages to the pool when
the domain is created. I don't like it, but I think the other
possible solutions would require more work, as we would need to delay
the mappings.
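
To illustrate the failure mode and the "add a few pages" idea, here is a toy model in C. It is only a sketch: every name in it (toy_pool, toy_domain_create(), TOY_EARLY_TABLES) is hypothetical and none of it is Xen code. It assumes the early GICv2 mapping needs a fixed number of page-table pages and that an empty pool makes the mapping fail with -ENOENT.

```c
#include <assert.h>
#include <errno.h>

/* Toy model only: these names are hypothetical, not Xen's. */
#define TOY_EARLY_TABLES 3u  /* assumed page-table pages needed by the early mapping */

struct toy_pool {
    unsigned int free_pages; /* pages available for page tables */
};

/* Take `tables` page-table pages from the pool, or fail with -ENOENT. */
static int toy_map_region(struct toy_pool *pool, unsigned int tables)
{
    /* Check up front so that a failure leaves the pool untouched. */
    if (pool->free_pages < tables)
        return -ENOENT;

    pool->free_pages -= tables;
    return 0;
}

/*
 * Model of domain creation: seed the pool with `seed_pages`, then create
 * the early mapping if the (modelled) interrupt controller needs one.
 */
static int toy_domain_create(struct toy_pool *pool, unsigned int seed_pages,
                             int needs_early_mapping)
{
    int rc = 0;

    pool->free_pages = seed_pages;

    if (needs_early_mapping)
        rc = toy_map_region(pool, TOY_EARLY_TABLES);

    /* On success the pool can never hold more than it was seeded with. */
    assert(rc != 0 || pool->free_pages <= seed_pages);
    return rc;
}
```

In this model, an unseeded pool with an early mapping (the GICv2-like case) returns -ENOENT, an unseeded pool without one (the GICv3-like case) succeeds, and seeding a handful of pages makes both succeed.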

Honestly, I've considered doing this on x86 too.

AFAICT, this is already the case for HAP (see call to hap_set_allocation() in hap_enable()). 256 pages will be pre-allocated.


There are several things which want allocating in domain_create(), but
are deferred to max_vcpus() because they require the P2M having a
non-zero allocation.  This in turn means we've got a load of checks in
paths where we'd ideally not have them.

We already have a calculation of the absolute minimum we will ever
permit the p2m pool to be.  IMO we ought to allocate this minimum size
in domain_create().

It depends on the number. At the moment domain_create() is not preemptible, so we don't want to allocate too many pages (I think even 256 pages could be risky on some Arm platforms).

Maybe the solution is to make domain_create() preemptible. But it is not something that could be done in the 4.17 time frame.
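
For what a preemptible allocation could look like in principle, here is a toy sketch in C. It is not how domain_create() works today and all names (toy_alloc, toy_alloc_step(), TOY_RESTART) are hypothetical: the allocation is split into steps with a per-call page budget, and the caller re-invokes the step until it reports completion.

```c
#include <assert.h>

/* Toy model only: these names are hypothetical, not Xen's. */
#define TOY_RESTART 1  /* "more work left, call again" */

struct toy_alloc {
    unsigned int target; /* pages wanted in the pool */
    unsigned int done;   /* pages allocated so far */
};

/*
 * Allocate at most `budget` pages in this call. Returns 0 when the target
 * is reached, TOY_RESTART if the budget ran out first.
 */
static int toy_alloc_step(struct toy_alloc *a, unsigned int budget)
{
    while (a->done < a->target) {
        if (budget == 0)
            return TOY_RESTART;
        a->done++;   /* stands in for allocating one page */
        budget--;
    }
    return 0;
}

/*
 * Drive the allocation to completion and return the number of calls made.
 * A zero budget could never make progress, so it is rejected (returns 0).
 */
static unsigned int toy_alloc_run(struct toy_alloc *a, unsigned int budget)
{
    unsigned int calls = 1;

    if (budget == 0)
        return 0;

    while (toy_alloc_step(a, budget) == TOY_RESTART)
        calls++;

    assert(a->done >= a->target);
    return calls;
}
```

With a target of 256 pages and a budget of 64 pages per call, the model needs four calls, so no single call does more than 64 pages of work.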

Cheers,

--
Julien Grall



 

