
RE: [Xen-devel] expose MWAIT to dom0



> From: Jan Beulich [mailto:JBeulich@xxxxxxxxxx]
> Sent: Thursday, August 25, 2011 8:37 PM
> 
> >>> On 21.08.11 at 07:26, "Tian, Kevin" <kevin.tian@xxxxxxxxx> wrote:
> >>  From: Jan Beulich [mailto:JBeulich@xxxxxxxxxx]
> >> Sent: Friday, August 19, 2011 11:02 PM
> >> > >> Yet another idea - why don't we simply pass the buffer passed to
> >> > >> arch_acpi_set_pdc_bits() down to Xen, rather than fiddling with the
> >> > >> bits in Dom0? That would at once allow to not set ACPI_PDC_T_FFH
> >> > >> (which I don't think Xen really supports at present).
> >> > >>
> >> > >> Or really, depending on who controls what, the P, C, and T bits should
> >> > >> be set by either Dom0 or Xen (so e.g. let Dom0 do what it currently
> >> > >> does, and then let Xen override the bits it ought to control).
> >> > >
> >> > > _PDC is encoded in AML and requires an ACPI parser, which
> >> > > is one thing we avoid in Xen. If Xen wants to override those bits, then
> >> > > the whole ACPI component needs to move down to Xen too.
> >> >
> >> > No, I'm not saying the evaluation should be happening there. Below is
> >> > a draft hypervisor patch (only compile tested so far).
> >>
> >> Attached a patch that actually works (with a minimal Dom0 addition).
> >>
> >
> > yes, this change looks more straightforward. :-)
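A minimal sketch of the "let Dom0 set the bits, then let Xen override the ones it controls" idea from the quoted exchange, assuming the _PDC buffer layout Linux hands to arch_acpi_set_pdc_bits() (buf[0] = revision, buf[1] = number of capability DWORDs, buf[2] = capability bits). The function xen_override_pdc_caps() and the choice of which bits Xen owns are hypothetical; the bit values are those in Linux's include/acpi/pdc_intel.h.

```c
#include <assert.h>
#include <stdint.h>

/* Subset of _PDC capability bits, values as in Linux include/acpi/pdc_intel.h. */
#define ACPI_PDC_P_FFH          0x0001u
#define ACPI_PDC_C_C1_HALT      0x0002u
#define ACPI_PDC_T_FFH          0x0004u
#define ACPI_PDC_C_C1_FFH       0x0100u
#define ACPI_PDC_C_C2C3_FFH     0x0200u

/* Indices into the 3-DWORD _PDC buffer built by the Linux ACPI processor code. */
enum { PDC_REV = 0, PDC_COUNT = 1, PDC_CAPS = 2 };

/*
 * Hypothetical hypervisor-side merge: keep Dom0's value for every bit
 * outside xen_mask, and force Xen's own value for bits inside xen_mask.
 */
static void xen_override_pdc_caps(uint32_t buf[3], uint32_t xen_mask,
                                  uint32_t xen_bits)
{
    assert(buf[PDC_COUNT] >= 1);   /* at least one capability DWORD present */
    buf[PDC_CAPS] = (buf[PDC_CAPS] & ~xen_mask) | (xen_bits & xen_mask);
}
```

For example, xen_mask = ACPI_PDC_T_FFH | ACPI_PDC_C_C1_FFH | ACPI_PDC_C_C2C3_FFH with xen_bits = ACPI_PDC_C_C1_FFH | ACPI_PDC_C_C2C3_FFH drops T_FFH (which Xen may not support) while leaving Dom0's P-state and C1 halt bits untouched.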
> 
> With that in, we still have more deficiencies compared to native Linux.

Definitely there will be even more than what has been revealed today, because
the dom0 ACPI processor driver is so tightly bound. There are lots of
factors in dom0 itself which may affect the verification/filtering of
the Cx entries provided by the BIOS, and some of them should be avoided from
Xen's point of view, such as the 2nd example you just found. More severe is
that working around those factors adds intrusive Xen awareness to the
generic ACPI processor driver, e.g.

@@ -780,7 +780,7 @@ static int acpi_processor_get_power_info
                          current_count));
 
        /* Validate number of power states discovered */
-       if (current_count < 2)
+       if (current_count < 1 + !processor_pm_external())
                status = -EFAULT;
 
       end:

The more changes like the above are added, the less likely the Xen PM
changes are to be accepted upstream. Also, such specific changes made
against one dom0 version may quickly become invalid in a newer version;
the above change is one example which no longer holds true in newer
kernels.

When working with Konrad on rebasing the Xen PM patches to the latest
Linux 3.0.0, we tried hard to avoid intrusive changes in the generic
ACPI processor driver, by invoking existing interfaces at as high a
level as possible. The end result is that we skip handling corner
cases like the above example for now, while at least making Xen PM
work on the majority of boxes. Later, once Xen PM is accepted upstream
and the Linux ACPI people are more Xen-aware, handling of those corner
cases may be improved gradually.
 
Another option Yang is currently working on is to port the native
intel_idle driver to Xen, which should avoid the nasty dependency on
dom0 ACPI bits and be immune to various BIOS bugs.

> 
> For one, we don't use mwait when ACPI doesn't tell us to, while Linux
> does (in the intel_idle driver for deeper C-states, and for C1 also via
> mwait_idle()). This is likely a bit more work, but it should be possible to
> construct C-state information from CPUID leaf 5 (and, if valid, ignore
> information passed down from Dom0), which would match intel_idle's
> taking precedence over acpi_idle in Linux.

Yes. This would be a desirable feature in Xen, with some limitations:
        - it does not work with CPU hotplug
        - it does not work on boxes older than Nehalem
        - it does not handle Px/Cx state changes (_PPC, _CST, e.g. from Node Manager)

So this would be a supplementary option to the existing acpi_idle, and should
work in most cases where the above 3 factors are not a concern.
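A minimal sketch of the first step Jan describes, constructing C-state information from CPUID leaf 5, assuming the Intel SDM layout: EDX nibble k gives the number of MWAIT sub-states of C-state k (nibble 0 is C0), and an MWAIT hint carries the target C-state minus one in EAX[7:4] and the sub-state in EAX[3:0]. The struct, the function build_mwait_cstates(), the choice of sub-state 0, and the requirement that both ECX[0] and ECX[1] be set before EDX is trusted are all assumptions for illustration (intel_idle performs a similar validity check). The result is a static table, which is consistent with the limitations listed above; mapping these MWAIT C-states onto ACPI-style C1/C3/C6 names is model specific, which is why intel_idle keeps per-model tables.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* CPUID.05H:ECX flags (Intel SDM). */
#define CPUID5_ECX_EXTENSIONS   0x1u  /* bit 0: EDX sub-state enumeration is valid */
#define CPUID5_ECX_INT_BREAK    0x2u  /* bit 1: interrupts break MWAIT even when masked */

/* Hypothetical descriptor for one MWAIT C-state derived from leaf 5. */
struct mwait_cstate {
    unsigned int cstate;     /* MWAIT C-state number (1 = C1, 2 = C2, ...) */
    unsigned int substates;  /* number of sub-states reported in EDX */
    uint32_t hint;           /* MWAIT EAX hint selecting sub-state 0 */
};

/*
 * Decode CPUID leaf 5 ECX/EDX into a list of MWAIT C-states.
 * Returns the number of entries written to out, or -1 when the
 * leaf-5 information should not be relied on.
 */
static int build_mwait_cstates(uint32_t ecx, uint32_t edx,
                               struct mwait_cstate *out, size_t max)
{
    int n = 0;

    assert(out != NULL || max == 0);

    /* Assumption: require both flags before trusting EDX. */
    if (!(ecx & CPUID5_ECX_EXTENSIONS) || !(ecx & CPUID5_ECX_INT_BREAK))
        return -1;

    /* Skip nibble 0 (C0); nibble c holds the sub-state count of C-state c. */
    for (unsigned int c = 1; c < 8 && (size_t)n < max; c++) {
        unsigned int subs = (edx >> (4 * c)) & 0xfu;

        if (subs == 0)
            continue;            /* C-state not enumerated via MWAIT */
        out[n].cstate = c;
        out[n].substates = subs;
        /* Hint EAX[7:4] = target C-state - 1, EAX[3:0] = sub-state (0 here). */
        out[n].hint = (uint32_t)(c - 1) << 4;
        n++;
    }
    return n;
}
```

On GCC or Clang the ECX/EDX inputs could be read with __get_cpuid(5, &eax, &ebx, &ecx, &edx) from <cpuid.h>; keeping the decoder a pure function makes it easy to test against known register values.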

> 
> Second, if only C1 gets announced by ACPI, we end up not using it
> because Dom0 simply neglects to let the hypervisor know. This is
> because acpi_processor_get_power_info_cst() (back to at least
> 2.6.16) returns -EFAULT if less than two C-states were found. Simply
> prefixing the check with "!processor_pm_external() && " fixes this
> (but I don't know whether something similar could be done in Jeremy's
> tree).

This is a short-lived problem, which disappears in subsequent kernel
versions. But for 2.6.18-xen alone, it's the right fix.
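The two ways of relaxing the "at least two C-states" check discussed in this thread, the threshold form in the diff quoted earlier and Jan's "!processor_pm_external() && " prefix, can be compared with a small sketch. Here the bool parameter pm_external stands in for processor_pm_external(), and both function names are hypothetical; each returns true where the driver would set status = -EFAULT.

```c
#include <assert.h>
#include <stdbool.h>

/* Threshold form from the diff: require 1 state if PM is external, else 2. */
static bool cst_count_invalid_threshold(int current_count, bool pm_external)
{
    return current_count < 1 + !pm_external;
}

/* Prefixed form: skip the original "< 2" check entirely if PM is external. */
static bool cst_count_invalid_prefixed(int current_count, bool pm_external)
{
    return !pm_external && current_count < 2;
}
```

The two forms agree everywhere except when PM is external and no C-states were found at all: the threshold form still treats that as an error, while the prefixed form accepts it.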

Thanks
Kevin

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel
