
Re: [Xen-devel] [RFC PATCH 21/24] ARM: vITS: handle INVALL command



On Fri, 11 Nov 2016, Stefano Stabellini wrote:
> On Fri, 11 Nov 2016, Julien Grall wrote:
> > Hi Stefano,
> > 
> > On 10/11/16 20:42, Stefano Stabellini wrote:
> > > On Thu, 10 Nov 2016, Julien Grall wrote:
> > > > On 10/11/16 00:21, Stefano Stabellini wrote:
> > > > > On Fri, 4 Nov 2016, Andre Przywara wrote:
> > > > > > On 24/10/16 16:32, Vijay Kilari wrote:
> > > > > > > On Wed, Sep 28, 2016 at 11:54 PM, Andre Przywara wrote:
> > > > > > >         AFAIK, the initial design is to use a tasklet to update the
> > > > > > > property table, as it takes a lot of time to update the table.
> > > > > > 
> > > > > > This is a possible, but premature, optimization.
> > > > > > Linux (at the moment, at least) only calls INVALL _once_, just after
> > > > > > initialising the collections. And at this point no LPI is mapped, so
> > > > > > the whole routine does basically nothing - and does so quite fast.
> > > > > > We can add any kind of fancy algorithm later if there is a need for it.
> > > > > 
> > > > > I understand, but as-is it's so expensive that it could be a DoS vector.
> > > > > Also, other OSes could issue INVALL much more often than Linux.
> > > > > 
> > > > > Considering that we might support device assignment with ITS soon, I
> > > > > think it might be best to parse per-domain virtual tables rather than
> > > > > the full list of physical LPIs, which theoretically could be much
> > > > > larger. Or alternatively we need to think about adding another field to
> > > > > lpi_data, to link together all LPIs assigned to the same domain, but
> > > > > that would cost even more memory. Or we could rate-limit the INVALL
> > > > > calls to one every few seconds or something. Or all of the above :-)
> > > > 
> > > > It is not necessary for an ITS implementation to wait until an INVALL/INV
> > > > command is issued to take into account changes to the LPI configuration
> > > > table (aka the property table in this thread).
> > > > 
> > > > So how about trapping the property table? We would still have to go through
> > > > the property table the first time (i.e. when writing into GICR_PROPBASER),
> > > > but INVALL would be a nop.
> > > > 
> > > > The idea would be to unmap the region when GICR_PROPBASER is written, so
> > > > any read/write access would be trapped. For a write access, Xen would
> > > > update its internal LPI data structures and then write the value to the
> > > > unmapped guest page.
> > > > If we don't want the overhead on read accesses, we could just
> > > > write-protect the page in the stage-2 page table, so only write accesses
> > > > would be trapped.
> > > > 
> > > > Going further: for the ITS, Xen uses guest memory to store the ITS
> > > > information, which means Xen has to validate that information at every
> > > > access. So how about restricting access in the stage-2 page table? That
> > > > would remove the overhead of validating the data.
> > > > 
> > > > Any thoughts?
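
To make the write-protect variant concrete, here is a rough sketch of what it
could look like. None of this is existing Xen code: stage2_write_protect(),
vlpi_set_enabled() and vlpi_set_priority() are hypothetical stand-ins for
whatever p2m/vGIC interfaces we would actually use.

/*
 * Minimal sketch of the write-protect idea (not actual Xen code; every
 * name below is a hypothetical stand-in for the real p2m/vGIC interfaces).
 * The property table has one byte per LPI, so a trapped store identifies
 * the virtual LPI directly from its offset into the table.
 */
#include <stdint.h>
#include <stddef.h>

#define LPI_PROP_ENABLE     0x01u   /* bit 0: LPI enabled   */
#define LPI_PROP_PRIO_MASK  0xfcu   /* bits [7:2]: priority */
#define LPI_BASE_INTID      8192u   /* first LPI INTID      */

/* Stubs standing in for stage-2 permission changes and per-domain vLPI state. */
static void stage2_write_protect(uint64_t gpa, size_t size) { (void)gpa; (void)size; }
static void vlpi_set_enabled(uint32_t vlpi, int on) { (void)vlpi; (void)on; }
static void vlpi_set_priority(uint32_t vlpi, uint8_t prio) { (void)vlpi; (void)prio; }

/*
 * Guest wrote GICR_PROPBASER: after the one-off walk of the table (omitted
 * here), write-protect it so that every later store traps and INVALL can
 * become a nop.
 */
static void vlpi_propbaser_written(uint64_t table_gpa, size_t table_size)
{
    stage2_write_protect(table_gpa, table_size);
}

/* Stage-2 permission-fault handler for a store to the property table. */
static void vlpi_prop_write_trapped(uint64_t table_gpa, uint64_t fault_gpa,
                                    uint8_t new_byte, uint8_t *shadow)
{
    uint32_t vlpi = LPI_BASE_INTID + (uint32_t)(fault_gpa - table_gpa);

    vlpi_set_enabled(vlpi, new_byte & LPI_PROP_ENABLE);
    vlpi_set_priority(vlpi, new_byte & LPI_PROP_PRIO_MASK);

    /* Keep the guest-visible table contents coherent. */
    shadow[fault_gpa - table_gpa] = new_byte;
}
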
> > > 
> > > It is a promising idea. Let me expand on this.
> > > 
> > > I agree that, if we need to do anything on INVALL, we should go through
> > > the virtual property table rather than the full list of host LPIs.
> > 
> > I agree on that.
> > 
> > > 
> > > Once we agree on that, the two options we have are:
> > 
> > I believe we had a similar discussion when Vijay worked on the vITS (see
> > [1]). I would have hoped that this new proposal would take into account
> > the constraints mentioned back then.
> > 
> > > 
> > > 1) We let the guest write anything to the table, then we do a full
> > > validation of the table on INVALL. We also do a validation of the table
> > > entries used as parameters for any other commands.
> > > 
> > > 2) We map the table read-only, then do a validation of every guest
> > > write. INVALL becomes a NOP, and parameter validation for many commands
> > > could be removed or at least reduced.
> > > 
> > > Conceptually the two options should both lead to exactly the same
> > > result. Therefore I think the decision should be made purely on
> > > performance: which one is faster? If it is true that INVALL is typically
> > > called only once, I suspect that 1) is faster, but I would like to see
> > > some simple benchmarks, such as the time it takes to configure the ITS
> > > from scratch with the two approaches.
> > 
> > The problem is not which one is faster but which one will not take down the
> > hypervisor.
> > 
> > The guest is allowed to create a command queue as big as 1MB, and a command
> > uses 32 bytes, so the command queue can hold up to 32768 commands.
> > 
> > Now imagine a malicious guest filling up the command queue with INVALL and
> > then notifying Xen (via GITS_CWRITER). Based on patch #5, all those commands
> > will be handled in one go, so you have to multiply the time of one command
> > by up to 32768.
> > 
> > Given that the hypervisor is not preemptible, this likely means a DoS.
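
Just to spell out the arithmetic behind that bound (1MB maximum command
queue, 32-byte commands):

/* Upper bound on commands a guest can queue before writing GITS_CWRITER. */
#define ITS_CMDQ_MAX_BYTES  (1U << 20)                            /* GITS_CBASER allows up to 1MB */
#define ITS_CMD_BYTES       32U                                   /* each ITS command is 32 bytes */
#define ITS_CMDQ_MAX_CMDS   (ITS_CMDQ_MAX_BYTES / ITS_CMD_BYTES)  /* = 32768 */
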
> 
> I think it can be made to work safely using a rate-limiting technique,
> such as: Xen emulates an INVALL for a given domain at most once every
> one or two seconds, and never more often than that. x86 has something
> like that under xen/arch/x86/irq.c; see irq_ratelimit.
> 
> But to be clear, I am not saying that this is necessarily the best way
> to do it. I would like to see some benchmarks first.
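
As a strawman, the per-domain rate limit could look something like the
sketch below. This is plain C for illustration only: the one-second
threshold is arbitrary, none of the names exist in Xen, and a real version
would hook into the vITS INVALL handler and use Xen's own time interfaces
rather than clock_gettime().

/*
 * Strawman sketch of rate-limiting INVALL emulation per domain.
 * Nothing here is existing Xen code.
 */
#include <stdint.h>
#include <stdbool.h>
#include <time.h>

#define INVALL_MIN_INTERVAL_NS  1000000000ULL  /* at most one full table walk per second */

struct vits_domain_state {
    uint64_t last_invall_ns;   /* when we last walked the virtual property table */
};

static uint64_t monotonic_ns(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
}

/*
 * Returns true if the expensive property-table walk should run now;
 * otherwise this INVALL is coalesced with a recent one.
 */
static bool invall_should_run(struct vits_domain_state *s)
{
    uint64_t now = monotonic_ns();

    if ( now - s->last_invall_ns < INVALL_MIN_INTERVAL_NS )
        return false;

    s->last_invall_ns = now;
    return true;
}
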
> 
> 
> > A similar problem would happen if a vITS command is translated to an ITS
> > command (see the implementation of INVALL). Multiple malicious guests could
> > slow down the other guests by filling up the host command queue. Worse, a
> > command from a normal guest could be discarded because the host ITS command
> > queue is full (see its_send_command in gic-its.c).
>  
> Looking at the patches, nothing checks for discarded physical ITS
> commands. Not good.
> 
> 
> > That's why the approach we had in the previous series was "host ITS commands
> > should be limited when emulating guest ITS commands". From what I recall, in
> > that series the host and guest LPIs were fully separated (enabling a guest
> > LPI did not enable the host LPI).
> 
> I am interested in reading what Ian suggested to do when the physical
> ITS queue is full, but I cannot find anything specific about it in the
> doc.
> 
> Do you have a suggestion for this? 
> 
> The only things that come to mind right now are:
> 
> 1) check if the ITS queue is full and busy-loop until it is not
> (spin_lock style), roughly as in the sketch below
> 2) check if the ITS queue is full and sleep until it is not (mutex style)
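
For option 1, a rough sketch: the host command queue is modelled here as a
plain software ring, and none of the names below match the real gic-its.c
code; it is only meant to show that a translated command is never silently
discarded.

/*
 * Rough sketch of option 1: never discard a translated command, busy-wait
 * for space in the host ITS command queue instead.
 */
#include <stdint.h>
#include <stdbool.h>

#define ITS_CMD_SIZE   32U
#define CMDQ_NR_SLOTS  256U   /* one 4KB page of commands, for the model */

struct its_cmd { uint8_t raw[ITS_CMD_SIZE]; };

struct its_cmdq {
    struct its_cmd slots[CMDQ_NR_SLOTS];
    unsigned int readr;    /* models GITS_CREADR (hardware consumer) */
    unsigned int writer;   /* models GITS_CWRITER (software producer) */
};

static bool cmdq_full(const struct its_cmdq *q)
{
    return ((q->writer + 1) % CMDQ_NR_SLOTS) == q->readr;
}

/*
 * Option 1: spin until the ITS has consumed enough commands to make room
 * (option 2 would put the vCPU to sleep here instead of spinning).
 */
static void its_send_command_sync(struct its_cmdq *q, const struct its_cmd *cmd)
{
    while ( cmdq_full(q) )
        ;   /* busy-wait; real code would add cpu_relax() and a timeout */

    q->slots[q->writer] = *cmd;
    q->writer = (q->writer + 1) % CMDQ_NR_SLOTS;
}
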

Another, probably better, idea is to map all pLPIs of a device when the
device is assigned to a guest (including Dom0). This is what was written
in Ian's design doc. The advantage of this approach is that Xen doesn't
need to take any action on the physical ITS command queue when the
guest issues virtual ITS commands, thereby solving this problem at the
root. (Although I am not sure about enable/disable commands: could we
avoid issuing enable/disable on pLPIs?) It also helps with the potential
INVALL DoS issue, because it significantly reduces the computation needed
when the guest issues an INVALL.

On the other hand, this approach could consume much more memory if we
mapped all the pLPIs that a device could theoretically use. Of course
that is not good either. But fortunately, for PCI devices we know how
many events a device can generate, and we should be able to get that
information from the device tree for other devices. So I suggest that,
when the device is assigned to the guest, Xen maps only as many pLPIs as
the number of events the device can generate. This way there would be no
wasted memory.
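
A rough sketch of what I mean, with made-up helper names (nothing below is
existing Xen code); the point is only that the allocation is bounded by the
device's event count, not by the theoretical LPI space:

/*
 * Hypothetical sketch: map host LPIs for a device at assignment time,
 * bounded by the number of events (e.g. MSI-X vectors, or a device tree
 * property) the device can actually generate.  All names are made up.
 */
#include <stdint.h>
#include <stdlib.h>

struct assigned_device {
    uint32_t deviceid;    /* ITS DeviceID */
    uint32_t nr_events;   /* from the PCI MSI/MSI-X capability or device tree */
    uint32_t *plpi;       /* plpi[event] = host LPI backing that event */
};

/* Stub allocator and host-ITS mapping hook, for illustration only. */
static uint32_t next_plpi = 8192;
static uint32_t plpi_alloc(void) { return next_plpi++; }
static void host_its_map_event(uint32_t deviceid, uint32_t event, uint32_t plpi)
{
    (void)deviceid; (void)event; (void)plpi;   /* would issue MAPD/MAPTI on the host ITS */
}

/* Called once when the device is assigned to a domain (including Dom0). */
static int map_device_plpis(struct assigned_device *dev)
{
    uint32_t event;

    dev->plpi = calloc(dev->nr_events, sizeof(*dev->plpi));
    if ( !dev->plpi )
        return -1;

    for ( event = 0; event < dev->nr_events; event++ )
    {
        dev->plpi[event] = plpi_alloc();
        host_its_map_event(dev->deviceid, event, dev->plpi[event]);
    }

    return 0;
}
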

Does it make sense? Do you think it could work?

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel

 

