
Re: Xen on AWS EC2 Graviton 2 metal instances (c6g.metal)



Hi Dan,

Thanks for the report.

On 26/09/2023 20:41, Driscoll, Dan wrote:
        First off - sorry for the very long email, but there are a lot of 
details related to this topic, and I figured more detail might be better than 
less (though I could be wrong here)....

        Within Siemens Embedded, we have been doing some prototyping using Xen 
for some upcoming customer related work - this email thread attempts to explain 
what has been done here and our analysis of the problems we are having.

        We have done some initial prototyping to get Xen running on an AWS Graviton 2 
instance using an EC2 Arm64 "metal" instance (c6g.metal - no AWS hypervisor) 
and ran into some problems during this prototyping.

        Since the Edge Workload Abstraction and Orchestration Layer (EWAOL) 
that is part of SOAFEE already has some enablement of Xen in various 
environments (including an Arm64 server environment), we used this as a 
starting point.

        We were able to successfully bring up Xen with a Yocto dom0 and multiple 
domU Yocto guests on an Arm AVA server (AVA Developer Platform - 32-core 
Neoverse N1 server), following the documented steps with some minimal configuration 
changes (we simply extended the configuration to include 3 Linux guests): 
https://ewaol.docs.arm.com/en/kirkstone-dev/manual/build_system.html#build-system

        So, this specific EWAOL support has all the proper bitbake layers to 
generate images for both bare-metal (Linux running natively) and a 
virtualization build (using Xen) for AVA and also a Neoverse N1 System 
Development Platform (N1SDP), but we only verified this on AVA.
        AWS also has support for EWAOL on Graviton 2, but the only supported 
configuration is a bare-metal configuration (Linux running natively) and the 
virtualization build hasn't been implemented in the bitbake layers in their 
repo - here is the URL for information / instructions on this support: 
https://github.com/aws4embeddedlinux/meta-aws-ewaol
        As part of our effort to bring this up, we made a VERY minimal patch to 
the repo used for the AWS EWAOL to generate a virtualization build (attached 
meta-aws-ewaol.patch).  With this patch applied, the AWS EWAOL build does 
produce Xen as well as a dom0 Yocto kernel, but there is definitely missing 
support to properly build everything for this virtualization layer.  Following 
the instructions for meta-aws-ewaol, we generated an AMI and started an EC2 
instance with this AMI (c6g.metal type).  The resultant image does boot, but it 
boots into the dom0 Linux kernel with problems recorded in the boot log related 
to Xen (see dom0-linux-boot.txt).

        Looking more closely at the EFI partition, it was clear that systemd-boot was 
being used and it was set up to boot the dom0 Linux kernel directly rather than Xen - the 
Xen EFI image was not present in the EFI partition and, obviously, no launch entries 
existed for Xen.  To rectify this, the Xen EFI image built as part of the AWS 
EWAOL build mentioned above was placed in the EFI partition, along with a Xen config 
file that provided the dom0 Linux kernel image details.  A new entry was added into the 
EFI image for Xen and the launch conf file was updated to boot Xen instead of dom0 Linux. 
 This resulted in the EC2 instance becoming "bricked" and no longer accessible.
        Details on the EFI-related content and the changes we made are captured in the 
attached meta-aws-ewaol-efi-boot-changes.txt file.

        The next step was comparing against the working AVA Xen output, and we noticed 
a few differences - the AVA build enabled the ACPI and UNSUPPORTED kconfig settings 
whereas the AWS Xen build did not.  So, we brought up another EC2 metal instance using 
the same AMI as before, this time using the AVA Xen EFI image with the same Xen config 
file.  The result was the same - a "bricked" instance.

        We will likely try the entire AVA flow on AWS Graviton next, as it uses GRUB 2 
instead of systemd-boot, and we hope to extend or enable some of the debug output 
during boot.  The AWS EC2 instances have a "serial console", but we have yet to see any 
output on this console prior to the Linux boot logs - no success in getting EC2 serial 
output during EFI boot.
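        For reference, the kind of boot setup described above would typically look 
something like the following (a sketch only - the titles, paths, dom0 command line, 
and root device below are illustrative assumptions, not the exact contents of our AMI):

```
# /boot/loader/entries/xen.conf - systemd-boot entry chain-loading the Xen EFI image
title   Xen 4.16.1
efi     /xen-4.16.1.efi

# /boot/xen-4.16.1.cfg - Xen EFI config placed alongside xen-4.16.1.efi
[global]
default=ewaol

[ewaol]
options=dom0_mem=2048M
kernel=Image console=hvc0 root=/dev/nvme0n1p2 rw
```

        With systemd-boot, the entry to boot is then selected via the `default` line 
in /boot/loader/loader.conf, which is the "launch conf file" change mentioned above.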

That's interesting. The AWS documentation [1] suggests that the boot logs should be visible. They even have a page for troubleshooting using GRUB [2].

I just launched a c6g.metal and I could access the serial console, but then it didn't work across reboots.

I have tried a c6g.medium and the serial was working across reboots (I could see some logs). So I wonder whether there is some missing serial console configuration for the baremetal instances?
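To rule out an account-level problem, it may be worth double-checking serial console access with the AWS CLI before digging into Xen. A sketch (the region, instance ID, and key path are placeholders):

```shell
# Serial console access is an account-wide setting; check it and enable if needed.
aws ec2 get-serial-console-access-status --region us-east-1
aws ec2 enable-serial-console-access --region us-east-1

# Push a one-time SSH public key for the serial console (valid for ~60 seconds),
# then connect to serial port 0 of the instance via the EC2 Instance Connect endpoint.
aws ec2-instance-connect send-serial-console-ssh-public-key \
    --region us-east-1 \
    --instance-id i-0123456789abcdef0 \
    --serial-port 0 \
    --ssh-public-key file://~/.ssh/id_rsa.pub
ssh i-0123456789abcdef0.port0@serial-console.ec2-instance-connect.us-east-1.aws
```

Note this only confirms the console path works; whether firmware/EFI output is forwarded on metal instances is a separate question.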

We have had a call and some email exchanges with AWS on this topic (Luke Harvey, Jeremy Dahan, Robert DeOliveira, and Azim Siddique). They said there have been multiple virtualization solutions successfully booted on Graviton 2 metal instances, so they felt that Xen should be usable once we figure out the configuration / boot details. They provided some guidance on how we might go about further exploration here, but nothing really specific to supporting Xen.

To be honest, without a properly working serial console, it is going to be very difficult to debug any issue in Xen.

Right now, it is unclear whether Xen has output anything. If we can confirm the serial console works as intended and there are still no logs, then I would suggest enabling earlyprintk in Xen. For your Graviton 2, I think the following lines in xen/.config should do the trick:

CONFIG_DEBUG=y
CONFIG_EARLY_UART_CHOICE_PL011=y
CONFIG_EARLY_UART_PL011=y
CONFIG_EARLY_PRINTK=y
CONFIG_EARLY_UART_BASE_ADDRESS=0x83e00000
CONFIG_EARLY_UART_PL011_BAUD_RATE=115200
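
One way to fold these into an existing build (a sketch, assuming a standard Xen source checkout; adjust paths and cross-compile variables to your setup):

```shell
# From the top of the Xen source tree, append the earlyprintk fragment
# to the existing hypervisor config and refresh dependent options.
cd xen
cat >> .config <<'EOF'
CONFIG_DEBUG=y
CONFIG_EARLY_PRINTK=y
CONFIG_EARLY_UART_CHOICE_PL011=y
CONFIG_EARLY_UART_PL011=y
CONFIG_EARLY_UART_BASE_ADDRESS=0x83e00000
CONFIG_EARLY_UART_PL011_BAUD_RATE=115200
EOF
make olddefconfig    # resolve any options the fragment newly enables
make -j"$(nproc)"    # rebuilds the hypervisor, including xen.efi on Arm64
```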

I have attached the following files for reference:
        * meta-aws-ewaol.patch - patch to the AWS EWAOL repo found at 
https://github.com/aws4embeddedlinux/meta-aws-ewaol
        * meta-aws-ewaol-efi-boot-changes.txt - description of EFI-related 
changes made to the AWS EWAOL EFI partition in the attempt to boot Xen
        * ava.xen.config - config file for the Xen build for AVA using the EWAOL 
virtualization build
        * aws.xen.config - config file for the Xen build for AWS using the EWAOL 
virtualization build
        * xen-4.16.1.cfg - Xen config file placed in the root of the EFI boot 
partition alongside the xen-4.16.1.efi image

May I ask why you are using 4.16.1 rather than 4.17? In general, I would recommend using the latest stable version, or even staging (the ongoing development branch), for bring-up, because we don't always backport everything to the stable branches. So a bug may already have been fixed in a newer revision.

That said, skimming through the logs, I couldn't spot any patches that may help on Graviton 2.

Best regards,

[1] https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-serial-console.html
[2] https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/grub.html


Dan Driscoll
Distinguished Engineer
Siemens DISW - Embedded Platform Solutions

--
Julien Grall


