
Re: [Xen-devel] [RFC PATCH v2 3/3] tools, libxl: handle the iomem parameter with the memory_mapping hcall

On Fri, 2014-03-14 at 13:15 +0100, Dario Faggioli wrote:
> > > In Arianna's case, I think it would be more than fine to implement it
> > > that way, and call it from within the OS, isn't this the case, Arianna?
> > 
> > It's certainly an option, and it would make a lot of the toolstack side
> > issues moot but I'm not at all sure it is the right answer. In
> > particular although it might be easy to bodge a mapping into many OSes I
> > can imagine getting such a thing into something generic like Linux might
> > be more tricky -- in which case perhaps the toolstack should be taking
> > care of it, and that does have a certain appeal from the simplicity of
> > the guest interface side of things.
> > 
> If the toolstack needs to support iomem, yes.
> A way to see it could be that, as of now, we have embedded OSes asking
> for it already, and, at the same time, it's unlikely that more generic
> systems such as Linux would want something similar, for one because
> they'll have full DT/ACPI support for this.
> The fact that, if what you say below is true, "iomem" does not work at
> all, and no one complained from the Linux world so far, seems to me to
> support that view.

iomem *does* work just fine for x86 PV guests, which is what it was
added for. Apparently it was never extended to HVM guests.

> > > One thing I don't see right now is, in the in-kernel case, what we
> > > should do when finding the "iomem=[]" option in a config file.
> > 
> > Even for an x86 HVM guest with iomem there is no call to
> > xc_domain_memory_mapping (even from qemu) it is called only for PCI
> > passthrough. I've no idea if/how iomem=[] works for x86 HVM guests.
> > Based on a cursory glance it looks to me like it wouldn't, and if it did
> > work it would have the same problems wrt where to map it as we have
> > today with the ARM guests, except perhaps on a PC the sorts of things
> > you would pass through with this can be done 1:1 because they are legacy
> > PC peripherals etc.
> > 
> Aha, interesting! As said to Julien, I tried to look for how these iomem
> regions get to xc_domain_memory_mapping in QEMU and I also found nothing
> (except for PCI passthrough stuff)... I was assuming I was missing
> something... apparently, I wasn't. :-)
> I can try (even just out of curiosity!) to dig in the tree and see what
> happened with this feature...
> BTW, doesn't this make this discussion even more relevant? I mean, if
> it's there but it's not working, shouldn't we make it work (for some
> definition of "work"), if we decide it's useful, or kill it, if not?

Apparently no one is using this with x86 HVM guests, so apparently no
one cares. If there are users then we should fix it. Maybe we will fix
this as a side effect of making it work on ARM, I don't know.

> > I think we just don't know what the right answer is yet and I'm unlikely
> > to be able to say directly what the right answer is. I'm hoping that
> > people who want to use this low level functionality can provide a
> > consistent story for how it should work (a "design" if you like) to
> > which I can say "yes, that seems sensible" or "hmm, that seems odd
> > because of X". At the moment X is "the 1:1 mapping seems undesirable to
> > me". There have been some suggestions for how to fix that, someone with
> > a horse in the race should have a think about it and provide an
> > iteration on the design until we are happy with it.
> > 
> Again, I'll look more into the history of this feature.
> In the meantime, the man page entry says:
> "Allow guest to access specific hardware I/O memory pages.
> B<IOMEM_START> is a physical page number. B<NUM_PAGES> is the number of
> pages beginning with B<START_PAGE> to allow access. Both values must be
> given in hexadecimal.
> It is recommended to use this option only for trusted VMs under
> administrator control."
> For one, the "Allow guest to access" there leaves a lot of room for
> interpretation, I think.

Not if you think about it in terms of being a PV guest option, where the
mapping just happens when the guest makes a PTE to map it.

Probably this option is currently misplaced in the man page, it should
be PV specific.
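For reference, the current (PV) usage being described would look
something like this in an xl config file; the MFN value here is made up
for illustration:

```
# Hypothetical example: allow a trusted PV guest to access one machine
# page starting at MFN 0xabcd0; both values are hexadecimal, per the
# man page. The guest still has to create the PTE to map it itself.
iomem = [ "abcd0,1" ]
```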

>  It doesn't say anything about mapping, so one
> can interpret it as 'this only grant you mapping permission, up to you
> to map'. However, it says "access", so one can interpret it as 'if I can
> access it, it's mapped already'.
> Wherever this discussion will land, I think that, if we keep this
> option, we should make the documentation less ambiguous.
> That being said, allow me to play, in the rest of this e-mail, the role
> of one which expects the mapping to be actually instated by just
> specifying "iomem=[]" in the config file, which is (correct me guys if
> I'm wrong) what Eric, Viktor and Arianna thought when reading the entry
> above.
> So:
>  1. it says nothing about where the mapping ends up in guest's memory,

it currently says nothing because it is a PV feature.

>  2. it looks quite clear (to me?) that this is a raw/low level feature,
>     to be used under controlled conditions (which an embedded product
>     often is)


> To me, a legitimate use case is this: I want to run version X of my non
> DT capable OS on version Z of Xen, on release K of board B. In such
> configuration, the GPIO controller is at MFN 0x000abcd, and I want only
> VM V to have direct access to it (board B does not have an IOMMU).
> I would also assume that one is in full control of the guest address

If "one" here is the user then I don't think so.

A given version of Xen will provide a particular memory layout to the
guest. If you want to run non-single image OSes (i.e. things without
device tree) on Xen then you will need to build a specific kernel binary
for that version of Xen hardcoding the particular layout of that version
of Xen. If you upgrade Xen then you will need to rebuild your guest to
use the correct address layout.

If the user doesn't want that, i.e. they want a single binary to run on
multiple versions of Xen, then they had better implement device tree
support in their kernel.

As a second aspect of this it means that the user will need to change
their config file to update their iomem mappings to new safe values when
they change the version of Xen. This is possibly even true if they use
device tree (in which case maybe we want a way for them to add
descriptions of these things to dt).

> space, so it would be ok to hardcode the addresses of registers somewhere.
> Of course, that may be an issue, when putting Xen underneath, as Xen's
> mangling of that address space can cause trouble.
> Arianna, can you confirm that this is pretty much the case of Erika, or
> amend what I did get wrong?
> I certainly don't claim to have the right answer but, in the described
> scenario, either:
>  1) the combination of "iomem=[ MFN,NR@PFN ]", defaulting to 1:1 if
>     "@PFN" is missing, and e820_host
>  2) calling (the equivalent of) XEN_DOMCTL_memory_map from the guest 
>     kernel
> would be good solutions, to the point that I think we could even support
> both. The main point being that, I think, even in the worst case, any
> erroneous usage of either would "just" destroy the guest, and that's
> acceptable.

I don't think we want both and I'm leaning towards #1 right now, but
with the e820_host thing being unnecessary in the first instance.
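To make #1 concrete, a config entry under the "@PFN" extension proposed
in this thread (not implemented anywhere yet) might look like this,
with hypothetical values:

```
# Proposed syntax sketch: map 2 pages starting at MFN 0xfff00 into the
# guest at PFN 0x3000; if "@3000" were omitted, the mapping would
# default to 1:1 (i.e. PFN == MFN).
iomem = [ "fff00,2@3000" ]
```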

> Actually, if going for 1), I think (when both the pieces are there)
> things should be pretty safe. Furthermore, implementing 1) seems to me
> the only way of having the "iomem=[]" parameter causing both permissions
> granting and mapping. Downside is that the (libxl side of the)
> implementation indeed looks cumbersome.
> If going for _only_ 2), then "iomem=[]" would just be there to ensure
> that the future mapping operation will be successful, i.e., for
> granting mapping rights, as it's doing right now. It would be up to
> the guest kernel to make sure the MFNs it is trying to map are
> consistent with what was specified in "iomem=[]". Given the use case
> we're talking about, I don't think this is an unreasonable request, as
> long as we make the iomem man entry state this more clearly.

My worry with this one is that it might make it harder to DTRT in
the future, e.g. by adding device tree nodes to represent things mapped
with iomem=[], by committing us to a world where the guest makes these
mappings and not the tools.
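As a rough illustration of the toolstack-side work option 1 would need,
here is a sketch of how a "MFN,NR[@PFN]" entry could be parsed; the
struct and function names are made up for this example and are not an
existing libxl interface:

```c
/* Sketch only: parse a proposed "MFN,NR[@PFN]" iomem entry, defaulting
 * to a 1:1 mapping when "@PFN" is omitted.  Names are hypothetical. */
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

struct iomem_entry {
    uint64_t start_mfn;   /* first machine page number (hex in config) */
    uint64_t nr_pages;    /* number of pages to map */
    uint64_t start_pfn;   /* guest page number; == start_mfn if "@PFN" absent */
};

static int parse_iomem(const char *s, struct iomem_entry *e)
{
    char *end;

    e->start_mfn = strtoull(s, &end, 16);
    if (end == s || *end != ',')
        return -1;
    s = end + 1;
    e->nr_pages = strtoull(s, &end, 16);
    if (end == s || e->nr_pages == 0)
        return -1;
    if (*end == '@') {
        s = end + 1;
        e->start_pfn = strtoull(s, &end, 16);
        if (end == s || *end != '\0')
            return -1;
    } else if (*end == '\0') {
        e->start_pfn = e->start_mfn;  /* no "@PFN": default to 1:1 */
    } else {
        return -1;
    }
    return 0;
}
```

The mapping itself would then presumably be instated with something
like xc_domain_memory_mapping(), fed the parsed start_pfn, start_mfn
and nr_pages.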

> Again, Arianna, do you confirm both solution are possible for you?
> I certainly agree that the thread could benefit from some opinion from
> people actually wanting to use this. In addition to Arianna, I have
> pinged Eric and Viktor repeatedly... let's see if they have time to let
> us all know something more about their own needs and requirements wrt
> this.
> > > Also, just trying to recap, for Arianna's sake, moving the
> > > implementation of the DOMCTL in common code (and implementing the
> > > missing bits to make it work properly, of course) is still something we
> > > want, right?
> > 
> > *If* we go the route of having the kernel make the mapping then there is
> > no need, is there?
> > 
> Yeah, well, me not being sure is the reason I asked... And Julien said
> he thinks we still want it... :-P

The answer to this question will necessarily depend on the design
decisions made above. It will either be needed or not.

> As I was saying above, I think there is room for both, but I don't mind
> picking up one. However, if we want to fix iomem=[] and go as far as
> having it doing the mapping, then I think we all agree we need the
> DOMCTL.
> So, looks like the discussion resolves to something like:
>  - do we need the DOMCTL for other purposes than iomem=[] ?
>  - if no, what do we want to do with iomem=[] ?

Please come up with a design which answers this, I've given my opinions
above but if you think some other design is better then argue for it.

> Sorry to have brought you all deep down into this can of worms! :-/
> Regards,
> Dario

Xen-devel mailing list


