[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

RE: [PATCH, RFC, resend] Re: [Xen-devel] granting access to MSI-X table and pending bit array



>>> On 26.08.10 at 10:41, "Jiang, Yunhong" <yunhong.jiang@xxxxxxxxx> wrote:
>>From: Jan Beulich [mailto:JBeulich@xxxxxxxxxx] 
>>>>> On 26.08.10 at 08:24, "Jiang, Yunhong" <yunhong.jiang@xxxxxxxxx> wrote:
>>>>From: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx 
>>>>[mailto:xen-devel-bounces@xxxxxxxxxxxxxxxxxxx] On Behalf Of Jan Beulich
>>>>An alternative would be to determine and insert the address ranges
>>>>earlier into mmio_ro_ranges, but that would require a hook in the
>>>>PCI config space writes, which is particularly problematic in case
>>>>MMCONFIG accesses are being used.
>>>
>>> I noticed you stated in your previous mail that this should be done in
>>> hypervisor, not tools. Is it because tools is not trusted by xen hypervisor?
>>> If tools can be trusted, is it possible to achieve this in tools: Tools tell
>>> xen hypervisor the MMIO range that is read-only to this guest after the 
>>> guest
>>> is created, but before the domain is unpaused.
>>
>>Doing this from the tools would have the unfortunate side effect that
>>Dom0 (or the associated stubdom) would continue to need special
>>casing, which really it shouldn't for this purpose (and all the special
>>casing in the patch is really just because qemu wants to map writably
>>the ranges in question, and this can only be removed after the
>>hypervisor handling changed).
> 
> Agree that doing this from tools can't remove dom0's write access.
> Maybe we can combine this together. Tools for normal guest, while your 
> change to msix_capability_init() is for dom0, the flow is followed:
> 1) When xen boot, xen will scan all PCI hierarchy, and get the MSI-X address. 
> Mark that range to dom0's mmio_ro_range. (Maybe a seperetaed list is needed 
> to track which MMIO range to which device).

That won't work for any devices that BIOS didn't assign resources to.

> 2) When a MSI-x interrupt is started, we check the corresponding BAR to see 
> if the range is changed by dom0. If yes, update dom0's mmio_ro_range.  We can 
> also check if the assigned domain's mmio_ro_range cover this.
> 3) When tools create domain, tools will remove the mmio range for the guest.

This (and therefore 2 above) won't work: You must not disallow the
guest access to this space altogether - it may validly want to read the
PBA. Remember that the code to remove pv guests' access to these
two ranges altogether got reverted due to causing problems with
real world devices/drivers.

Also, if the check would be done only when the interrupt is being
started, we would still have the problem of potentially needing to
change existing mappings.

>>least one MSI-X interrupt got enabled. Plus (while not the case with
>>the current Linux implementation) a (non-Linux or future Linux)
>>version may choose to (re-)assign device resources only when the
>>device gets enabled, which would be after the guest was already
>>launched.
> 
> I'm a bit confused. Are you talking about guest re-assign device resources or 
> dom0?

Dom0.

> If you are talking about guest, I think that's easy to handle , and we 
> should anyway do that. Especially it should not impact physical resource.
> If you are talking aboud dom0, I can't think out while guest enable device 
> will cause re-assignment in dom0.

Because pciback does the actual enabling on behalf of the guest.
Any resource adjustments done when memory decoding gets
enabled won't be known at the time the guest starts.

>>Shadows and p2m table need updating in this case, but MSI information
>>doesn't afaict (it gets proagated only when the first interrupt is being
>>set up).
> 
> Maybe I didn't state my point clearly. What I mean is, since xen hypervisor 
> knows about PCI hierarchy, they can setup the mmio_ro_range when xen boot. 
> The only concerns for this method is situation a and situation b, which will 
> update the PCI device's resource assignment, while xen hypervisor have no 
> idea and can't update mmio_ro_range.

Again - knowing about the PCI hierarchy doesn't mean knowing about
all resources. One option clearly is to require a (new) callout from Dom0
when any resources get adjusted. What I don't like about this is that
all existing Dom0 kernels would need fixing, i.e. I'd prefer a solution
that is contained to hypervisor+tools.

>>> I'm not sure if we can add one more p2m type like p2m_mmio_ro? And expand
>>> the p2m_is_readonly to cover this also? PAE xen may have trouble for it, but
>>> at least it works for x86_64, and some wrap function with #ifdef X86_64 can
>>> handle the difference.
>>
>>That would be a possible alternative, but I'm afraid I wouldn't dare to
>>make such a change.
> 
> Which change? Would following code works without much issue?

I don't know. I would just be afraid that there are other places in
the code checking explicitly (or, worse, implicitly) for p2m_mmio
would need updating. And not knowing well both shadow and p2m
code, that's not something I would want to do on my own.

> I agree that dom0's writing access should also be avoided . The concern for 
> global mmio_ro_ranges is, the ranges maybe a bit big, if several SR-IOV card 
> populated, each support several VF. But I have no data how bit impact would 
> it be.

A potential later optimization for this would be to make Dom0 try
co-locate all these regions.

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.