Re: [Xen-devel] [OSSTEST PATCH] README.hardware-acquisition [and 1 more messages]
Ian Jackson writes ("Re: [OSSTEST PATCH] README.hardware-acquisition [and 1 more messages]"):
> So overall, for the reasons I explain, I'm going to commit this
> document (subject to the other comments etc.) *with* the requirement
> that hardware must be supported by Debian (at least, in -backports).

This didn't happen. There was considerable further discussion.

Because various kinds of uncertainty meant this document didn't get
committed, we are now blocked from giving the go-ahead for some new
hardware acquisition: ie, I can't answer the question "should we
accept hardware XYZ" without reference to at least an implied
checklist like this.

Having written it down, I ought to use the one I've written down,
because doing otherwise would simply invite mistakes for no reason.
And if I'm to use a written-down checklist, it should be one which is
actually official.

Accordingly, I intend to commit this to osstest now.

Juergen, this is just a document: can I have your release ack for it?

I will then reply separately about the specific new hardware, using
the checklist as a guide.

Obviously a checklist is only ever a set of guidelines: if we find
that a point is best answered in a different way than the checklist
expects, or that the checklist ought to be changed, then changes to
the checklist are a reasonable outcome of such a process. They would
take the form of further patches to this document in osstest.

Ian.

From fae48bd584a0b58934a2df97b6db1d06eacf1724 Mon Sep 17 00:00:00 2001
From: Ian Jackson <ian.jackson@xxxxxxxxxxxxx>
Date: Tue, 30 Oct 2018 16:12:27 +0000
Subject: [OSSTEST PATCH] README.hardware-acquisition

New document-cum-checklist, for helping with hardware procurement.
Signed-off-by: Ian Jackson <ian.jackson@xxxxxxxxxxxxx>
CC: infra@xxxxxxxxxxxxxx
CC: George Dunlap <dunlapg@xxxxxxxxx>
CC: Stefano Stabellini <sstabellini@xxxxxxxxxx>
CC: Julien Grall <julien.grall@xxxxxxx>
--
v2: Add caveats about the Xen ARM Linux branch
    Say something, albeit rather vague, about device trees
---
 README.hardware-acquisition | 317 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 317 insertions(+)
 create mode 100644 README.hardware-acquisition

diff --git a/README.hardware-acquisition b/README.hardware-acquisition
new file mode 100644
index 00000000..0a429db3
--- /dev/null
+++ b/README.hardware-acquisition
@@ -0,0 +1,317 @@
+====================================
+# HARDWARE ACQUISITION FOR OSSTEST #
+====================================
+
+This document can be used as a checklist when procuring hardware for
+an osstest instance. A few of the points have details specific to the
+Xen Project test lab in Massachusetts, but most of it will be relevant
+to all osstest installations.
+
+
+POWER
+=====
+
+osstest needs to turn each host on and off under program control.
+
+When a host is power cycled, all state in it must be reset. This
+includes onboard control and management software (eg IPMI), since such
+systems can be buggy and bugs in them can be provoked by bugs in
+system software (ie, buggy versions of Xen can break the LOM, even if
+the LOM, unusually, is not simply flaky).
+
+However, it is often necessary to use the LOM (Lights Out Management)
+as part of the poweron/poweroff sequence as otherwise some machines
+draw enough current to wear out our mains PDU contacts too quickly.
+
+(I use the English word `mains' for the single phase 110V/220V-240V AC
+electrical power supply prevalent in datacentres.)
+
+Requirements for typical server hardware
+----------------------------------------
+
+ * If the system has a LOM it should be driveable with Free Software,
+   eg via the IPMI protocol.
+
+ * Redundant PSUs are not required.
+
+ * Provisioning: One PDU port is required per host.
+
+Requirements for embedded or devboard hardware
+----------------------------------------------
+
+ * There must be arrangements to control the actual power supply
+   to each board (node). Options include:
+
+     (i) Each node has a separate mains power supply, each of which
+         we will plug into a PDU port.
+
+     (ii) A separate management or PDU board or backplane, which
+          has one single mains power input and which has relays
+          or similar to control power to individual nodes.
+          The management system must have its own separate network
+          connection and not be at risk of corruption from
+          bad software on nodes.
+
+ * Provisioning:
+
+   The number of PDU ports required depends on the approach taken.
+
+   With a separate PDU controller, a switch port is required.
+
+
+SERIAL
+======
+
+We always use hardware serial for console output. This is essential
+to capture kernel and hypervisor crash messages, including from early
+boot, as well as bootloader output, and so on. We use our own serial
+concentrator hardware, separate from the systems under test. Built-in
+console-over-LAN systems (eg IPMI serial over LAN) are not reliable
+enough for our purposes.
+
+Requirements for typical server hardware
+----------------------------------------
+
+ * At least one conventional RS232 UART, accessible to system
+   software in the conventional way.
+
+ * For ARM, supported as console by both Xen[1] and Linux[2].
+
+ * Presented on a standard 9-pin D connector. (RJ45 is acceptable
+   if we know the pinout.)
+
+ * Provisioning: one serial concentrator port required per host.
+
+Requirements for embedded or devboard hardware
+----------------------------------------------
+
+ * At least one suitable UART
+
+ * Supported in software by both Xen[1] and Linux[2]
+
+ * With suitable physical presentation:
+    (i)
+       + Proper RS232 (full voltage, not TTL or 3.3V)
+       + presented on a 9-pin D or RJ45 connector
+       + with known pinout;
+    or
+    (ii)
+       + Connected somehow to a USB-to-serial adapter
+       + Adapter supported by Linux[2]
+       + Multiple adapters, giving one physical USB port
+         for all nodes (ie built-in hub) preferred
+    or
+    (iii) Some other suitable arrangement to be discussed.
+
+ * Provisioning: Requires serial concentrator port(s) and/or spare USB
+   port(s) on appropriate infrastructure host(s).
+
+
+PHYSICAL PRESENTATION
+=====================
+
+ * All equipment should be mounted inside one or more 19" rack
+   mount cases.
+
+ * In as few U as possible: usually 1U (or, exceptionally, maybe 2U)
+   for a single server-type host.
+
+ * Forbidden: External power adapters (laptop-style mains power supply
+   bricks); external USB hubs; any equipment not physically
+   restrained. There is no shelf in the rack.
+
+ * Pair principle: Every host or node must be part of a set of several
+   identical hosts. This allows us to distinguish hardware faults
+   from software bugs. (In the case of a chassis with backplane, one
+   backplane is OK.) Conversely, we want diversity to find the most
+   host-specific bugs, so usually around two of each type is best.
+
+ * Provisioning: Enough rack space must be available.
+
+
+MASS STORAGE
+============
+
+Each host needs some locally attached mass storage of its own.
+
+Requirements for typical server hardware
+----------------------------------------
+
+ * SATA controller supported by Linux[2]
+
+ * If the SATA controller has multiple modes (eg, AHCI vs RAID)
+   it is sufficient for it to be supported in one mode.
+
+ * Storage redundancy is not required: one disk will do.
+
+ * SSD is not required: rotating rust is cheaper and will do.
+
+Requirements for embedded or devboard hardware
+----------------------------------------------
+
+ * Some mass storage supported by Linux[2]. Best is an onboard SATA
+   controller, connected to a SATA HDD in the same enclosure.
+   High-endurance flash drives are another possibility.
+
+ * If the hardware always starts by booting from a mass storage device,
+   that boot device must be physically read-only and separate from the
+   primary mass storage. See BOOT ARRANGEMENTS.
+
+
+REMOTE FIRMWARE ACCESS VIA SERIAL
+=================================
+
+Configuration of the primary system firmware must be possible remotely
+using only the power and serial accesses just described.
+This means interacting with the firmware via the serial port.
+
+Requirements for typical server hardware with UEFI or BIOS
+----------------------------------------------------------
+
+ * `BIOS' configuration (including the UEFI equivalent) accessible and
+   usable via BIOS `serial console redirection'.
+
+ * UEFI shell (if provided) also available via serial.
+
+ * Specifically, boot order configuration available via serial.
+
+
+Requirements for embedded or devboard hardware
+----------------------------------------------
+
+ * See BOOT ARRANGEMENTS.
+
+
+BOOT ARRANGEMENTS, NETBOOT
+==========================
+
+Every host must netboot as its first boot source. The netboot
+configuration must be able to `chain' to the local writeable mass
+storage. This ensures that a host can be completely wiped, even if
+bad software has corrupted the mass storage.
+
+Requirements for typical server hardware with UEFI or BIOS
+----------------------------------------------------------
+
+ * PXE and/or UEFI netboot.
+
+Requirements for embedded or devboard hardware
+----------------------------------------------
+
+ * Some firmware must be available and provided which is capable of
+   netbooting Xen[1] and Linux[2], under control from the netboot
+   server. A suitable version of u-boot can meet this need.
+
+ * The firmware which performs the netbooting must be on a read-only
+   storage device (flagged as such in hardware, not software) so that
+   it cannot be corrupted by system software. So it must be on a
+   separate physical storage device from the primary mass storage (see
+   MASS STORAGE, above).
+
+ * This firmware will not usually be updated.
+
+
+NETWORKING
+==========
+
+Requirements
+------------
+
+ * Each host must have at least one RJ45 ethernet port compatible
+   with ordinary 100Mbit ethernet. xxx
+
+ * The primary ethernet port must be compatible with Linux[2].
+
+ * In the case of a chassis with backplane, it is acceptable if the
+   chassis contains an ethernet switch, provided that it is a normal
+   and reliable ethernet switch (not a proprietary interconnect).
+
+ * In the case of a system with IPMI or similar LOM, it is best if the
+   LOM has its own physical ethernet port.
+
+
+CPU, CHIPSET, MOTHERBOARD, ETC.
+===============================
+
+General advice and preferences
+------------------------------
+
+ * We prefer multicore, multisocket and NUMA systems because they
+   expose a greater variety of exciting bugs. But we don't care much
+   about performance and we want a wide variety of different hosts.
+   We want a mixture of systems with different CPU variants and
+   feature support.
+
+ * Memory requirements are modest. 8G or 16G per host is fine. xxx
+
+Compatibility with Xen and Linux - requirements
+-----------------------------------------------
+
+(Normally these issues are not a problem for x86, except perhaps for
+the network and storage controllers - see MASS STORAGE and NETWORKING,
+above.)
+
+ * [1] Xen: The CPU and other hardware must be supported by current
+   versions of xen-unstable, at the very least.
+
+ * [2] Linux: The CPU and other hardware must be supported by existing
+   widely available versions of Linux. There are two principal
+   requirements:
+
+   + Baremetal boot from Debian stable or stable-backports:
+
+     A suitable Linux kernel binary which can boot baremetal on the
+     proposed hardware must be available from Debian (at least
+     `stable', or, if that is not possible, `stable-backports'). It is
+     not OK to require a patched version of Linux, or a version of
+     Linux built from a particular git branch, or some such. If the
+     required kernel is not available in Debian, the vendor should
+     first work with the Debian project to ensure and validate that
+     the Debian stable-backports kernel binaries boot on the proposed
+     hardware.
+
+   + Boot under Xen with Linux kernel built from source code.
+
+     For x86, recent Linux LTS or mainline kernel source code must be
+     able to boot under Xen, on the proposed hardware.
+
+     For ARM, there is a special Xen ARM kernel branch. The proposed
+     hardware must be able to boot that version of Linux under Xen.
+
+     If the Xen ARM Linux branch does not support the proposed
+     hardware yet, the hardware should not be accepted until that is
+     remedied. Where this involves adding kernel patches to that
+     branch, this is subject to the approval of its maintainers,
+     considering the need to keep it very close to upstream.
+
+ * Board-specific Linux and Xen versions are not acceptable.
+
+ * A hardware vendor offering a "board support package" is a red flag.
+   We will not be using a "board support package". If we are offered
+   one we will need explicit confirmation, and perhaps verification,
+   of the points above.
+
+ * For ARM systems using Device Tree: check what DT is expected to be
+   used, and where and how we are expecting osstest to get it from.
+
+
+RELIABILITY
+===========
+
+ * osstest stresses systems in unusual ways. The need to completely
+   wipe the machine for each test means test hosts are power cycled
+   more often than usual.
+
+ * Random failures due to unreliable hardware are not tolerable. Some
+   hosts do not boot reliably. Even a very small probability of a
+   random boot failure, per boot, is intolerable in this CI
+   environment: hosts are rebooted many times a day, and a random boot
+   failure looks just like a `hypervisor could not boot' bug. (The
+   same fault would not be noticeable in a server farm where hosts are
+   almost never rebooted.)
+
+
+NON-REQUIREMENTS
+================
+
+ * No VGA console needed.
+ * Redundant PSUs are not needed (see POWER, above).
+ * RAID is not needed (or wanted) (see MASS STORAGE, above).
-- 
2.11.0

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel
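The RELIABILITY point in the patch can be made concrete with a little arithmetic. If each boot fails independently with probability p, the chance of at least one spurious failure over n boots is 1 - (1 - p)^n, which approaches 1 quickly as n grows. The Python sketch below is only illustrative. The function name `prob_any_failure`, the 1-in-1000 failure rate, and the reboot counts are hypothetical figures chosen for the example, not measurements from the Xen Project test lab. The sketch also assumes that boot failures are independent.

```python
def prob_any_failure(p_per_boot: float, boots: int) -> float:
    """Probability that at least one of `boots` boots fails.

    Assumes (hypothetically) that each boot fails independently
    with probability `p_per_boot`.
    """
    if not 0.0 <= p_per_boot <= 1.0:
        raise ValueError("p_per_boot must be in [0, 1]")
    if boots < 0:
        raise ValueError("boots must be non-negative")
    # Complement of "every boot succeeds".
    return 1.0 - (1.0 - p_per_boot) ** boots


if __name__ == "__main__":
    # Illustrative assumptions only: 1-in-1000 flaky boots,
    # 20 reboots per host per day, 10 hosts of the same type.
    p = 0.001
    boots_per_day = 20 * 10
    for days in (1, 7, 30):
        q = prob_any_failure(p, boots_per_day * days)
        print(f"{days:2d} day(s): P(at least one spurious boot failure) = {q:.3f}")
```

With those assumed figures, 200 boots a day gives roughly an 18% chance of at least one spurious `hypervisor could not boot' failure every day. A hardware flaw too rare to notice on a server that is almost never rebooted therefore shows up regularly as a false test failure.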