[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: XTF-on-ARM: Bugs


  • To: Julien Grall <julien@xxxxxxx>, xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxxx>, Bertrand Marquis <Bertrand.Marquis@xxxxxxx>, Michal Orzel <Michal.Orzel@xxxxxxx>, Stefano Stabellini <sstabellini@xxxxxxxxxx>, Volodymyr Babchuk <Volodymyr_Babchuk@xxxxxxxx>, Christopher Clark <christopher.w.clark@xxxxxxxxx>, Daniel Smith <dpsmith@xxxxxxxxxxxxxxxxxxxx>, Roger Pau Monne <roger.pau@xxxxxxxxxx>, George Dunlap <George.Dunlap@xxxxxxxxxx>
  • From: Andrew Cooper <Andrew.Cooper3@xxxxxxxxxx>
  • Date: Tue, 21 Jun 2022 13:30:12 +0000
  • Accept-language: en-GB, en-US
  • Arc-authentication-results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=citrix.com; dmarc=pass action=none header.from=citrix.com; dkim=pass header.d=citrix.com; arc=none
  • Arc-message-signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=pHMzBsE7t20XUGkgBC8VAO2Azuil1gGzLGSC9BTCGpQ=; b=M6sxR0b+GYc3EZnHiO5bjDra1tssy4a1b5H+rRZQEQg1uNXMyodMGIAvi6PlZoxXEj9ZmY8EIGWCYJuTEuQyTDRfYBvdK+O6bz958EPiem/3f9isp98E2JChL654DjeI426VyWli9v/YejLgnUjpf3sj0KkkX1SWJ+wQlAYsmQF7ZR8W32L1OfqaLNhVtXkfFv9AOzSoNQEvSRy1XDve/kpql4BlR8NrIgdFyLxWmIiMPbxpMErYPaS3TlinniFSOKhRf9tvsEfqy8QgL/qGHDv3SIhsgZylXPoOUzHt/rYoDLlSjWKvzp9dk5wOq+H9FvJTgf8ZqbCQlFoiksA5uA==
  • Arc-seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=cuHk1ld4zb0dQ5EgrsI72EzRTpM2YaMPFTWdMenm3y7qXVLmI84AQThAPDm6PPpgHsJt4auy7fA1g50Od+4dvJf4OoZmH/Tc37Z/Qy58cOFiDDRKZGacPiDn9Mhh1foyVFjxemSzLRTbr0D2qR0D4hIf3YI4XhyEKdU7Sgt4Z2lQz6XlI4wzlUVrDIMkDn8UKXdt/S6EuY3X5Ir7PPmZMNR/CH88Sd2uB35HUzZo6Fe4U/n/F/koZBXpdqjgmmgxuHPR5wKapgv6Ua/lCADC0GIeZVWo63ruSDepPxoKEfiZGIPI237wzWH2QxgMDOcm7HTWaWWh2YlzZVa8oIH3hQ==
  • Authentication-results: dkim=none (message not signed) header.d=none;dmarc=none action=none header.from=citrix.com;
  • Delivery-date: Tue, 21 Jun 2022 13:30:44 +0000
  • Ironport-data: A9a23:5/Bbf6qIlbvOex1Ey6pJmfAx+1heBmKVZBIvgKrLsJaIsI4StFCzt garIBmCM/iLM2WnLotzO4i/pE4P68fcyodiQAdo/npnEXxE+ZuZCYyVIHmrMnLJJKUvbq7GA +byyDXkBJppJpMJjk71atANlVEliefQAOCU5NfsYkidfyc9IMsaoU8lyrRRbrJA24DjWVvT4 4qq+qUzBXf+s9JKGjNMg068gEsHUMTa4Fv0aXRnOJinFHeH/5UkJMp3yZOZdhMUcaENdgKOf M7RzanRw4/s10xF5uVJMFrMWhZirrb6ZWBig5fNMkSoqkAqSicais7XOBeAAKv+Zvrgc91Zk b1wWZKMpQgBL/HPyboFAh9jIwJhNvV01PzHCme8iJnGp6HGWyOEL/RGKmgTZNdd1sMpRGZE+ LofNSwHaQ2Fi6Su2rWnR+Jwh8Mlas72IIcYvXImxjbcZRokacmbH+OWupkFjXFp2Zom8fX2P qL1bRJGahjabgIJEVAQEJ8kx8+jh2Xlci0eo1WQzUYyyzeInVUggOO3WDbTUuyQHvl/h1ikn T/P2kfnAgABNNWx7xPQpxpAgceKx0sXQrk6FqC89/NsqE2ewCoUEhJ+fUu2p7y1h1CzX/pbK lcI4Ww+oK4q7kupQ9LhGRqirxaslBMGR8BZFeF8zQiX07fV+C6QHG1CRTlEAPQDtcQ2TDhs8 UWbktfBDCZq9raSTBq17ayIpDm/PSwUK24qZiIeSwYBpd75r+kbsBXLSdpyFb+vuff8Ezrw3 jOioTA3gvMYistj/66751HcnzW0ppXTCBFz7QHeRGGN4QZwZYrjbIutgXDX9e1FLZqZZlCZs WIYhtOF6+QTEZCKkjfLS+IIdIxF/N6AOTzYxFJqQZ8o8m33/2b5JN8KpjZjOE1uL8AIPyfzZ 1Pesh9Q45kVO2a2aahwYMS6DMFCIbXcKOkJn8v8NrJmCqWdvifelM2yTSZ8B1zQrXU=
  • Ironport-hdrordr: A9a23:JZ7tX6OFVOJMQcBcT2P155DYdb4zR+YMi2TDiHoddfUFSKalfp 6V98jzjSWE8Ar4wBkb6J290dq7MAjhHPlOkMUs1NaZLUPbUQ6TQL2KgrGSpwEIdxeeygcZ79 YYT0EcMqy+MbEZt7ec3ODQKb9Jr7e6GeKT9IHjJhxWPGJXgtRbnmJE43GgYy9LrWd9ZaYRJd 653I5qtjCgcXMYYoCQHX8eRdXOoNXNidbPfQMGLwRP0njOsRqYrJrBVzSI1BYXVD1ChZ0493 LergD/7qK/99mm1x7n0XPJ5Zg+oqqg9jIDPr3OtiEmEESotu+aXvUkZ1REhkFznAib0idprD ALmWZnAy080QKJQoj/m2qW5+Cp6kdS15al8y7XvZKrm72HeNpxYfAx+b5xY1/X7VEts8p717 8O12WFt4BPBReFhyjl4cPUPisa33ZcjEBS5tL7tUYvJ7f2qYUh3rA37QdQCtMNDSj64IcoHK 1nC9zd/u9fdRefY2rCtmdizdSwVjBrdy32CXQqq4iQyXxbjXp5x0wXyIgWmWoB7os0T91B6/ 7fOqplmblSRosdbL57Bu0GXcyrY1a9CS7kISaXOxDqBasHM3XCp9r+56g0/vijfNgSwJ47iP 36ISdlXK4JCjfT4OG1rex2G0r2MRuAtBzWu7Fjzok8vKHgT7z2NiDGQEwykqKb0ociPvE=
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>
  • Thread-index: AQHYhWH1+fjNtvukV0adEqGEdTB7sq1ZxCsAgAAXIgA=
  • Thread-topic: XTF-on-ARM: Bugs

On 21/06/2022 13:07, Julien Grall wrote:
>
>
> On 21/06/2022 12:27, Andrew Cooper wrote:
>> Hello,
>
> Hi,
>
>> I tried to have a half hour respite from security and push forward
>> with XTF-on-ARM, but the result was a mess.
>>
>> https://github.com/andyhhp/xtf/commit/bc86e2d271f2107da9b1c9bc55a050dbdf07c6c6
>> is the absolute bare minimum stub VM, which has a zImage{32,64}
>> header, sets up the stack, makes one CONSOLEIO_write hypercall, and
>> then a clean SCHEDOP_shutdown.
>>
>> There are some bugs:
>>
>> 1) kernel_zimage32_probe() rejects relocatable binaries, but if I
>> skip the check it works fine.
>
> Hmmmm... which check are you referring to?

if ( (end - start) > size )
    return -EINVAL;

Although now I think about it, the problem is subtly different.

Section Headers:
  [Nr] Name              Type            Addr     Off    Size   ES Flg
Lk Inf Al
  [ 0]                   NULL            00000000 000000 000000 00     
0   0  0
  [ 1] .text             PROGBITS        40000000 010000 000094 00  AX 
0   0  4
  [ 2] .data             PROGBITS        40001000 011000 000000 00  WA 
0   0  1
  [ 3] .rodata           PROGBITS        40001000 011000 000012 00   A 
0   0  4
  [ 4] .bss              NOBITS          40002000 011012 001000 00  WA 
0   0  4

end is calculated as 0x3000 which includes the bss (inc stack which is
bss page aligned), while the raw binary size is 0x1012 because it stops
at the end of .rodata.

>
>>
>> Furthermore, kernel_zimage64_probe() ignores the header and assumes
>> the binary is relocatable.
>
> Are you referring to bit 3 "Kernel physical placement"?

No. This:

/* Currently there is no length in the header, so just use the size */
start = 0;
end = size;

Which isn't true even for the v0 header.  The field named text_offset in
Xen's code is start, and res1 is end (or size for relocatable).

Kernel placement only pertains to whether the image should be 2M aligned
or not.

>
>> Both probe functions fail to check the endianness marker.
>
> AFAIU the header is little endian. So it is not clear to me why we
> should check the endianess marker?

Not the endieness of the header, the endianness of the image.  Both
headers have a field which should ought to be checked for != LE seeing
as Xen doesn't support big endian domains yet.

>
>>
>> 2) I'm using qemu-system-arm 4.2.1 (Debian 1:4.2-3ubuntu6.21), with
>> some parameters cribbed from the Gitlab CI smoke test, but
>> ctxt_switch_to() exploded with undef on:
>>
>> WRITE_CP32(n->arch.joscr, JOSCR);
>> WRITE_CP32(n->arch.jmcr, JMCR);
>>
>> I'm not sure what these are (beyond Jazelle conf register), but I
>> commented them out and it made further progress.  I have no idea if
>> this is a Xen bug, qemu bug, or user error, but something is clearly
>> wrong here.
>
> I suspect the QEMU version is a bit too old to support 32-bit
> virtualization. Can you try a newer one?

Not easily right now, but (other than these two issues), it appears to
work fine.  The 32bit XTF guest is executing correctly (to a first
approximation) based on the fact that it does make the two expected
hypercalls.

At a guess, it's
https://patchwork.ozlabs.org/project/qemu-devel/patch/20191216110904.30815-25-peter.maydell@xxxxxxxxxx/

>
>>
>> 3) For test-arm64-stub, I get this:
>>
>> (XEN) d0: extended region 1: 0x70000000->0x80000000
>> (XEN) Loading zImage from 0000000048000000 to
>> 0000000050000000-0000000050001012
>> (XEN) Loading d0 DTB to 0x0000000058000000-0x0000000058001c85
>> ...
>> (XEN) *** Serial input to DOM0 (type 'CTRL-a' three times to switch
>> input)
>> (XEN) Freed 324kB init memory.
>> (XEN) *** Got CONSOLEIO_write (18 bytes)
>> Hello from ARM64
>> (XEN) *** CONSOLEIO_write done
>> (XEN) arch/arm/traps.c:2054:d0v0 HSR=0x000000939f0045
>> pc=0x00000050000098 gva=0x80002ffc gpa=0x00000080002ffc
>
> Looking at the log above, GPA belong to neither the kernel, extended
> region or DTB.
>
>> qemu-system-aarch64: terminating on signal 2
>>
>> i.e. the CONSOLEIO_write hypercall completes successfully, but a trap
>> occurs before the SCHEDOP_shutdown completes.  The full (tiny)
>> binaries are attached, but it seems to be faulting on:
>>
>>      40000098:    b81fcc3f     str    wzr, [x1, #-4]!
>>
>> which (I think) is the store of 0 to the stack for the schedop
>> shutdown reason.
>
> AFAICT the stack is meant to be right next after the kernel. However,
> the fault above suggest that the value is not even close.
>
>>
>> 4) For test-arm32-stub under either the 32bit or 64bit Xen, I get:
>>
>> (XEN) Freed 348kB init memory.
>> (XEN) *** Got CONSOLEIO_write (18 bytes)
>> (XEN) *** got fault
>> (XEN) *** Got SCHEDOP_shutdown, 0
>
> Where are those messages coming from?

My debugging, so I can see what's going on.  The "got fault" was
copy_from_user() EFAULT path.

>
>> (XEN) Hardware Dom0 halted: halting machine
>>
>> which is weird.  The CONSOLEIO_write fails to read the passed
>> pointer, despite appearing to have a ip-relative load to find the
>> string, while the SCHEDOP_shutdown passes its parameter fine (it's a
>> stack relative load).
>
> From a brief look, your code is still running with MMU off and Cache
> "off" (on armv8, it is more a bypass "cache" rather than off).
>
> This means that you ought to be a lot more careful when
> reading/writing value to avoid reading any stall data.

There are no relocation/etc so everything has well defined behaviour
even when the caches are off.

This is the point of the minimum viable binary.  It's to have something
trivial we can develop the build system around.

>
>> Other observations:
>>
>> * There is no documented vCPU starting state.
>
> See
> https://github.com/torvalds/linux/blob/master/Documentation/arm64/booting.rst.

What's it got to do with Xen's vCPU starting state?  Also, that's
clearly not relevant for arm32 even if the implication is "Xen only
speaks the Linux ABI".

It needs to be in docs/ (or public at a stretch) and not in the heads of
the maintainers.

>
>> * Qemu is infinitely easier to to use (i.e. no messing with dtb/etc)
>> as -kernel xen -initrd test-$foo with a oneliner change to the dtb
>> parsing to treat ramdisk and no kernel as the dom0 kernel.  Maybe a
>> better change would be to modify qemu to understand multiple -kernel's.
>> * Xen can't load ELFs.
>
> The support was dropped in 2018 because it was bogus and not used:
>
> https://lists.xenproject.org/archives/html/xen-devel/2018-06/msg00242.html
>
>
> Personally, I think that zImage/Image is simple enough that
> re-introducing ELF is not worth it. But I would be OK to consider
> patches if you feel like writing them.

There is a massive usability improvement from being able to point normal
toolchain tools at the same binary you're trying to load.

~Andrew

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.