[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Sketch of an idea for handling the "mixed workload" problem



On Mon, Jan 22, 2024 at 12:50 PM Marek Marczykowski-Górecki
<marmarek@xxxxxxxxxxxxxxxxxxxxxx> wrote:
>
> On Mon, Jan 22, 2024 at 12:25:58PM +0000, George Dunlap wrote:
> > On Mon, Jan 22, 2024 at 12:17 PM Marek Marczykowski-Górecki
> > <marmarek@xxxxxxxxxxxxxxxxxxxxxx> wrote:
> > >
> > > On Mon, Jan 22, 2024 at 11:54:14AM +0000, George Dunlap wrote:
> > > > The other issue I have with this (and essentially where I got stuck
> > > > developing credit2 in the first place) is testing: how do you ensure
> > > > that it has the properties that you expect?
> > >
> > > Audio is actually quite nice use case at this, since it's quite
> > > sensitive for scheduling jitter. I think even a simple "PCI passthrough a
> > > sound card and play/record something" should show results. Especially
> > > you can measure how hard you can push the system (for example artificial
> > > load in other domains) until it breaks.
> >
> > Are we going have a gitlab runner which says, "Marek sits in front of
> > his test machine and listens to audio for pops"? :-)
>
> Kinda ;)
> We have already audio tests in qubes CI. They do more or less the above,
> but using our audio virtualization. Play something, record in another
> domain, and compare. Running the very same thing in gitlab-ci may be too
> complicated (require bringing in some qubes infrastructure to make PV
> audio work), but maybe similar test can be done based on qemu-emulated
> audio or other pv audio solution?
>
> > > > How do you develop a
> > > > "regression test" to make sure that server-based workloads don't have
> > > > issues in this sort of case?
> > >
> > > For this I believe there are several benchmarking methods already,
> > > starting with old trusty "Linux kernel build time".
> >
> > First of all, AFAICT "Linux kernel bulid time" is not representative
> > of almost any actual server workload; and the end-to-end throughput
> > completely misses what most server workloads will actually care about,
> > like latency.
> >
> > Secondly, what you're testing isn't the performance of a single
> > workload on an empty system; you're testing how workloads *interact*.
> > If you want ideal throughput for a single workload on an empty system,
> > use the null scheduler; more complex schedulers are only necessary
> > when multiple different workloads interact.
>
> I should have clarified I meant `make -jNN`. But still, that's the same
> workload on multiple vCPUs.

See, you're still not getting it. :-)

What you need is not multiple vcpus across a single VM, but multiple
instances of different workloads across different VMs.  For example:

1. One VM running kernbench
2. two VMs running kernbench, but not competing for vcpu
3. four VMs running kernbench, competing for vcpus
4. three VMs running kernbench, and one playing audio
5. four VMs running kernbench, one of which is *also* playing audio

And then you have to collect several metrics:

1. Total kernbench throughput of entire system
2. Kernbench performance of each VM, compared with expected "fair share"
3. Some measure of latency for the audio VM

And figure out how to compare trade-offs -- how much total throughput
hit should we tolerate to increase fairness?  How much fairness hit
should we take to decrease latency?

And as I said, kernbench isn't really a great server workload; you
should do something request-based, measuring both throughput and
latency.

 -George



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.