
Re: [PATCH v1 0/7] Implement support for external IPT monitoring



----- On 17 Jun 2020 at 18:19, Andrew Cooper andrew.cooper3@xxxxxxxxxx wrote:

> On 17/06/2020 04:02, Tamas K Lengyel wrote:
>> On Tue, Jun 16, 2020 at 2:17 PM Andrew Cooper <andrew.cooper3@xxxxxxxxxx> 
>> wrote:
>>> On 16/06/2020 19:47, Michał Leszczyński wrote:
>>>> ----- On 16 Jun 2020 at 20:17, Andrew Cooper andrew.cooper3@xxxxxxxxxx
>>>> wrote:
>>>>
>>>>> Are there any restrictions on EPT being enabled in the first place?  I'm
>>>>> not aware of any, and in principle we could use this functionality for
>>>>> PV guests as well (using the CPL filter).  Therefore, I think it would
>>>>> be helpful to not tie the functionality to HVM guests, even if that is
>>>>> the only option enabled to start with.
>>>> I think at the moment it's not required to have EPT. This patch series
>>>> doesn't use any translation feature flags, so the output address is
>>>> always a machine physical address, regardless of context. I will check
>>>> if it could be easily used with PV.
>>> If it's trivial to add PV support then please do.  If it's not, then don't
>>> feel obliged, but please do at least consider how PV support might look
>>> in the eventual feature.
>>>
>>> (Generally speaking, considering "how would I make this work in other
>>> modes where it is possible" leads to a better design.)
>>>
>>>>> The buffer mapping and creation logic is fairly problematic.  Instead of
>>>>> fighting with another opencoded example, take a look at the IOREQ
>>>>> server's use of "acquire resource" which is a mapping interface which
>>>>> supports allocating memory on behalf of the guest, outside of the guest
>>>>> memory, for use by control tools.
>>>>>
>>>>> I think what this wants is a bit somewhere in domain_create to indicate
>>>>> that external tracing is used for this domain (and allocate whatever
>>>>> structures/buffers are necessary), acquire resource to map the buffers
>>>>> themselves, and a domctl for any necessary runtime controls.
>>>>>
>>>> I will check this out; this sounds like a good option, as it would
>>>> remove lots of complexity from the existing ipt_enable domctl.
>>> Xen has traditionally opted for an "and turn this extra thing on
>>> dynamically" model, but this has caused no end of security issues and
>>> broken corner cases.
>>>
>>> You can see this still existing in the difference between
>>> XEN_DOMCTL_createdomain and XEN_DOMCTL_max_vcpus (the latter being
>>> required to choose the number of vcpus for the domain), and we're making
>>> good progress undoing this particular wart (before 4.13, it was
>>> concerningly easy to get Xen to fall over a NULL d->vcpu[] pointer by
>>> issuing other hypercalls between these two).
>>>
>>> There are a lot of settings which should be immutable for the lifetime of
>>> the domain, and external monitoring looks like another one of these.
>>> Specifying it at createdomain time allows for far better runtime
>>> behaviour (you are no longer in a situation where the first time you try
>>> to turn tracing on, you end up with -ENOMEM because another VM booted in
>>> the meantime and used the remaining memory), and it makes for rather
>>> more simple code in Xen itself (at runtime, you can rely on it having
>>> been set up properly, because a failure setting up will have killed the
>>> domain already).
>> I'm not in favor of this being a flag that gets set during domain
>> creation time. It could certainly be the case that some users would
>> want this to be on from start to finish, but in other cases you
>> may want to enable it intermittently, only for some time in between
>> particular events. If it's an on/off flag during domain creation you
>> pretty much force that choice on the users, and while the overhead of
>> PT is better than, say, MTF, it's certainly not nothing. In case there is
>> an OOM situation when enabling IPT dynamically, the user can always just
>> pause the VM and wait till memory becomes available.
> 
> There is nothing wrong with having "turn tracing on/off at runtime"
> hypercalls.  It is specifically what I suggested two posts up in this
> thread, but it should be limited to the TraceEn bit in RTIT_CTL.
> 
> What isn't ok is trying to allocate the buffers, write the TOPA, etc. on
> first-enable or first-map, because the runtime complexity of logic like
> this is large, and far too easy to get wrong in security-relevant ways.
> 
> The domain create flag would mean "I wish to use tracing with this
> domain", and not "I want tracing enabled from the getgo".
> 


I'm trying to implement this suggestion right now. I've already switched to
acquire_resource, and now I want to have the buffers statically allocated at
domain creation.
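
For reference, this is the rough interface shape I'm aiming at, following the
suggestion above. All names, values and fields below are provisional
placeholders, not something that exists in the tree yet:

/* New resource type for XENMEM_acquire_resource (value is a placeholder). */
#define XENMEM_resource_vmtrace_buf   3

/*
 * Runtime control, deliberately limited to toggling tracing on/off
 * (i.e. RTIT_CTL.TraceEn) and reading the current state -- no allocation
 * or mapping happens here, as the buffers already exist from domain
 * creation and are mapped via acquire_resource.
 */
struct xen_domctl_vmtrace_op {
    uint32_t cmd;    /* placeholder: 1 = enable, 2 = disable, 3 = get offset */
    uint32_t vcpu;   /* vCPU to operate on */
    uint64_t value;  /* out: e.g. current write offset for the "get" sub-op */
};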

I think it would be reasonable to add an option like "vmtrace_ipt_size" to
xl.cfg. By default it would be 0 (meaning "disabled"), and when the user sets
it to a non-zero value, IPT buffers of the given size will be allocated for
each vCPU upon domain creation.
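
On the hypervisor side, the per-vCPU allocation could then look roughly like
the sketch below. The ipt_buf/ipt_buf_size fields are placeholders, and the
exact call site on the domain/vCPU creation path is still to be decided:

static int ipt_alloc_buffer(struct vcpu *v, uint64_t ipt_size)
{
    struct page_info *pg;
    unsigned int order;

    if ( !ipt_size )
        return 0; /* tracing was not requested for this domain */

    order = get_order_from_bytes(ipt_size);
    pg = alloc_domheap_pages(v->domain, order, 0);
    if ( !pg )
        return -ENOMEM; /* creation fails cleanly, as suggested above */

    /* Placeholder fields -- they don't exist yet. */
    v->arch.hvm.vmx.ipt_buf = pg;
    v->arch.hvm.vmx.ipt_buf_size = ipt_size;

    return 0;
}

This would make -ENOMEM a creation-time failure rather than something the
tooling has to cope with at first-enable.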

Could you give some hints on how to correctly add a new option to xl.cfg in
such a way that it's accessible on the hypervisor side? The configuration
parsing/processing path is the part I don't understand yet.


>>>>> What semantics do you want for the buffer becoming full?  Given that
>>>>> debugging/tracing is the goal, I presume "pause vcpu on full" is the
>>>>> preferred behaviour, rather than drop packets on full?
>>>>>
>>>> Right now this is a ring-style buffer, and when it becomes full it
>>>> simply wraps and overwrites the old data.
>>> How does the consumer spot that the data has wrapped?  What happens if
>>> data starts getting logged, but no one is listening?  What happens if the
>>> consumer exits/crashes/etc. and stops listening as a consequence?
>>>
>>> It's fine to simply state what will happen, and possibly even "don't do
>>> that then", but the corner cases do at least need thinking about.
>> AFAIU the current use-case is predominantly to use this in conjunction
>> with VMI events, where you want to be able to see the trace leading up
>> to a particular vmexit. So if the buffer wraps between events and data
>> is lost, that's not really a concern.
> 
> That's all fine.  I imagine the output here is voluminous, and needs
> help being cut down as much as possible.
> 
> On a tangent, I presume you'd like to include VM-fork eventually, which
> ought to include copying the trace buffer on fork?
> 
> ~Andrew



 

