Xen project Mailing List

Re: [Xen-devel] [PATCH v3 8/9] viridian: add implementation of synthetic timers

To: Paul Durrant <Paul.Durrant@xxxxxxxxxx>, 'Jan Beulich' <JBeulich@xxxxxxxx>

From: Paul Durrant <Paul.Durrant@xxxxxxxxxx>

Date: Wed, 6 Mar 2019 11:47:52 +0000

Accept-language: en-GB, en-US

Cc: Stefano Stabellini <sstabellini@xxxxxxxxxx>, Wei Liu <wei.liu2@xxxxxxxxxx>, Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx>, Andrew Cooper <Andrew.Cooper3@xxxxxxxxxx>, "Tim \(Xen.org\)" <tim@xxxxxxx>, George Dunlap <George.Dunlap@xxxxxxxxxx>, Julien Grall <julien.grall@xxxxxxx>, Ian Jackson <Ian.Jackson@xxxxxxxxxx>, xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxxx>, Roger Pau Monne <roger.pau@xxxxxxxxxx>

Delivery-date: Wed, 06 Mar 2019 11:48:00 +0000

List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

Thread-index: AQHUuVJk0O6OPQtTdE6eEsyqpqJvF6XwsIcAgA3yV/CAAA3cIA==

Thread-topic: [PATCH v3 8/9] viridian: add implementation of synthetic timers

> -----Original Message----- > From: Xen-devel [mailto:xen-devel-bounces@xxxxxxxxxxxxxxxxxxxx] On Behalf Of > Paul Durrant > Sent: 06 March 2019 11:24 > To: 'Jan Beulich' <JBeulich@xxxxxxxx> > Cc: Stefano Stabellini <sstabellini@xxxxxxxxxx>; Wei Liu > <wei.liu2@xxxxxxxxxx>; Konrad Rzeszutek Wilk > <konrad.wilk@xxxxxxxxxx>; Andrew Cooper <Andrew.Cooper3@xxxxxxxxxx>; Tim > (Xen.org) <tim@xxxxxxx>; > George Dunlap <George.Dunlap@xxxxxxxxxx>; Julien Grall > <julien.grall@xxxxxxx>; xen-devel <xen- > devel@xxxxxxxxxxxxxxxxxxxx>; Ian Jackson <Ian.Jackson@xxxxxxxxxx>; Roger Pau > Monne > <roger.pau@xxxxxxxxxx> > Subject: Re: [Xen-devel] [PATCH v3 8/9] viridian: add implementation of > synthetic timers > > > -----Original Message----- > > From: Jan Beulich [mailto:JBeulich@xxxxxxxx] > > Sent: 25 February 2019 14:54 > > To: Paul Durrant <Paul.Durrant@xxxxxxxxxx> > > Cc: Julien Grall <julien.grall@xxxxxxx>; Andrew Cooper > > <Andrew.Cooper3@xxxxxxxxxx>; Roger Pau Monne > > <roger.pau@xxxxxxxxxx>; Wei Liu <wei.liu2@xxxxxxxxxx>; George Dunlap > > <George.Dunlap@xxxxxxxxxx>; Ian > > Jackson <Ian.Jackson@xxxxxxxxxx>; Stefano Stabellini > > <sstabellini@xxxxxxxxxx>; xen-devel <xen- > > devel@xxxxxxxxxxxxxxxxxxxx>; Konrad Rzeszutek Wilk > > <konrad.wilk@xxxxxxxxxx>; Tim (Xen.org) > > <tim@xxxxxxx> > > Subject: Re: [PATCH v3 8/9] viridian: add implementation of synthetic timers > > > > >>> On 31.01.19 at 11:47, <paul.durrant@xxxxxxxxxx> wrote: > > > --- a/xen/arch/x86/hvm/viridian/synic.c > > > +++ b/xen/arch/x86/hvm/viridian/synic.c > > > @@ -329,7 +329,53 @@ void viridian_synic_domain_deinit(struct domain *d) > > > > > > void viridian_synic_poll_messages(struct vcpu *v) > > > { > > > - /* There are currently no message sources */ > > > + viridian_time_poll_timers(v); > > > +} > > > + > > > +bool viridian_synic_deliver_timer_msg(struct vcpu *v, unsigned int sintx, > > > + unsigned int index, > > > + int64_t expiration, int64_t > > > delivery) > > > +{ > > > + const union viridian_sint_msr *vs = > > > &v->arch.hvm.viridian->sint[sintx]; > > > + HV_MESSAGE *msg = v->arch.hvm.viridian->simp.ptr; > > > + struct { > > > + uint32_t TimerIndex; > > > + uint32_t Reserved; > > > + uint64_t ExpirationTime; > > > + uint64_t DeliveryTime; > > > + } payload = { > > > + .TimerIndex = index, > > > + .ExpirationTime = expiration, > > > + .DeliveryTime = delivery, > > > + }; > > > + > > > + if ( test_bit(sintx, &v->arch.hvm.viridian->msg_pending) ) > > > + return false; > > > + > > > + BUILD_BUG_ON(sizeof(*msg) != HV_MESSAGE_SIZE); > > > + msg += sintx; > > > + > > > + /* > > > + * To avoid using an atomic test-and-set this function must be called > > > + * in context of the vcpu receiving the message. > > > + */ > > > + ASSERT(v == current); > > > + if ( msg->Header.MessageType != HvMessageTypeNone ) > > > + { > > > + msg->Header.MessageFlags.MessagePending = 1; > > > + set_bit(sintx, &v->arch.hvm.viridian->msg_pending); > > > > As per the comment above this is always in context of the subject > > vCPU. It looks to me as if this was also the case for the two > > clear_bit() on the field in the prior patch. If so, all three could be > > the non-atomic variants instead. > > The only slight subtlety I think is the one in the wrmsr function, which can > be called in context of a > domain restore. I think it's still ok for it to be non-atomic in this case > but I'll assert (v = > current || !v->running), which I think should cover it. > > > > > > + return false; > > > + } > > > + > > > + msg->Header.MessageType = HvMessageTimerExpired; > > > + msg->Header.MessageFlags.MessagePending = 0; > > > + msg->Header.PayloadSize = sizeof(payload); > > > + memcpy(msg->Payload, &payload, sizeof(payload)); > > > + > > > + if ( !vs->fields.mask ) > > > + vlapic_set_irq(vcpu_vlapic(v), vs->fields.vector, 0); > > > > If this wasn't with v == current, I think you'd also need a barrier > > here. Could you extend the comment above to also mention this > > aspect? > > Ok. > > > > > > @@ -118,14 +119,237 @@ static int64_t time_ref_count(struct domain *d) > > > return raw_trc_val(d) + trc->off; > > > } > > > > > > +static int64_t time_now(struct domain *d) > > > > Why would this return a signed value? And can't the function > > parameter be const? > > The function parameter can be const, but I think the result needs to be > signed for the missed ticks > logic in start_timer() to work correctly. > > > > > > +{ > > > + const struct viridian_page *rt = > > > &d->arch.hvm.viridian->reference_tsc; > > > + HV_REFERENCE_TSC_PAGE *p = rt->ptr; > > > + uint32_t start, end; > > > + __int128_t tsc; > > > + __int128_t scale; > > > > I don't think you need both of them be 128 bits wide. I also don't > > see why either would want to be of a signed type. > > The spec says (as in the comment below): > > "The partition reference time is computed by the following formula: > > ReferenceTime = ((VirtualTsc * TscScale) >> 64) + TscOffset > > The multiplication is a 64 bit multiplication, which results in a 128 bit > number which is then shifted > 64 times to the right to obtain the high 64 bits.TscScale" > > Again, I'm using signed arithmetic as I think it's necessary for the missed > ticks logic to work > correctly in the event of an overflow. FAOD the code that I am concerned about is: /* * The timer is already started, so we're re-scheduling. * Hence advance the timer expiration by one tick. */ next = vs->expiration + vs->count; /* Now check to see if any expirations have been missed */ if ( now - next > 0 ) missed = (now - next) / vs->count; If now and next were unsigned then next may overflow such that (now - next) ends up being very large, rather than negative, so I'd end up calculating a completely bogus value for missed. Paul > > > > > > + int64_t offset; > > > + > > > + /* > > > + * If the reference TSC page is not enabled, or has been invalidated > > > + * fall back to the partition reference counter. > > > + */ > > > + if ( !p || !p->TscSequence ) > > > + return time_ref_count(d); > > > + > > > + /* > > > + * The following sampling algorithm for tsc, scale and offset is > > > + * documented in the specifiction. > > > + */ > > > + start = p->TscSequence; > > > + > > > + do { > > > + tsc = rdtsc(); > > > + scale = p->TscScale; > > > + offset = p->TscOffset; > > > + > > > + smp_mb(); > > > + end = p->TscSequence; > > > > Why is this a full barrier, rather than just a read one? And don't you need > > to add a counterpart in update_reference_tsc()? > > Yes, a read barrier is enough with the counterpart write barrier added. > > > > > > + } while (end != start); > > > > update_reference_tsc() increments TscSequence. If end doesn't match > > start at this point, you're entering a near infinite loop here as long as > > you don't update start inside the loop. I also think that there's a > > second read barrier needed between this initial reading of the sequence > > number and the reading of the actual values. > > Yes, the start value should be inside the loop of course. > > > > > > + /* > > > + * The specification says: "The partition reference time is computed > > > + * by the following formula: > > > + * > > > + * ReferenceTime = ((VirtualTsc * TscScale) >> 64) + TscOffset > > > + * > > > + * The multiplication is a 64 bit multiplication, which results in a > > > + * 128 bit number which is then shifted 64 times to the right to > > > obtain > > > + * the high 64 bits." > > > + */ > > > + return ((tsc * scale) >> 64) + offset; > > > +} > > > + > > > +static void stop_stimer(struct viridian_stimer *vs) > > > +{ > > > + struct vcpu *v = vs->v; > > > > const? > > Ok. > > > > > > + unsigned int stimerx = vs - &v->arch.hvm.viridian->stimer[0]; > > > + > > > + if ( !vs->started ) > > > + return; > > > + > > > + stop_timer(&vs->timer); > > > + clear_bit(stimerx, &v->arch.hvm.viridian->stimer_pending); > > > + vs->started = false; > > > +} > > > + > > > +static void stimer_expire(void *data) > > > +{ > > > + struct viridian_stimer *vs = data; > > > > const? > > Ok. > > > > > > + struct vcpu *v = vs->v; > > > + unsigned int stimerx = vs - &v->arch.hvm.viridian->stimer[0]; > > > + > > > + if ( !vs->config.fields.enabled ) > > > + return; > > > + > > > + set_bit(stimerx, &v->arch.hvm.viridian->stimer_pending); > > > + vcpu_kick(v); > > > +} > > > + > > > +static void start_stimer(struct viridian_stimer *vs) > > > +{ > > > + struct vcpu *v = vs->v; > > > + unsigned int stimerx = vs - &v->arch.hvm.viridian->stimer[0]; > > > + int64_t now = time_now(v->domain); > > > + s_time_t timeout; > > > + > > > + if ( !test_and_set_bit(stimerx, > > > &v->arch.hvm.viridian->stimer_enabled) ) > > > + printk(XENLOG_G_INFO "%pv: VIRIDIAN STIMER%u: enabled\n", v, > > > + stimerx); > > > + > > > + if ( vs->config.fields.periodic ) > > > + { > > > + unsigned int missed = 0; > > > + int64_t next; > > > + > > > + /* > > > + * The specification says that if the timer is lazy then we > > > + * skip over any missed expirations so we can treat this case > > > + * as the same as if the timer is currently stopped, i.e. we > > > + * just schedule expiration to be 'count' ticks from now. > > > + */ > > > + if ( !vs->started || vs->config.fields.lazy ) > > > + { > > > + next = now + vs->count; > > > + } > > > > Unnecessary braces. > > Yes. > > > > > > @@ -149,6 +373,57 @@ int viridian_time_wrmsr(struct vcpu *v, uint32_t > > > idx, uint64_t val) > > > } > > > break; > > > > > > + case HV_X64_MSR_TIME_REF_COUNT: > > > + return X86EMUL_EXCEPTION; > > > > Isn't this an unrelated change? > > It is. I'll call it out in the comment comment. > > > > > > + case HV_X64_MSR_STIMER0_CONFIG: > > > + case HV_X64_MSR_STIMER1_CONFIG: > > > + case HV_X64_MSR_STIMER2_CONFIG: > > > + case HV_X64_MSR_STIMER3_CONFIG: > > > + { > > > + unsigned int stimerx = (idx - HV_X64_MSR_STIMER0_CONFIG) / 2; > > > + struct viridian_stimer *vs = > > > &v->arch.hvm.viridian->stimer[stimerx]; > > > + > > > + if ( !(viridian_feature_mask(d) & HVMPV_stimer) ) > > > + return X86EMUL_EXCEPTION; > > > + > > > + stop_stimer(vs); > > > + > > > + vs->config.raw = val; > > > + > > > + if ( !vs->config.fields.sintx ) > > > + vs->config.fields.enabled = 0; > > > + > > > + if ( vs->config.fields.enabled ) > > > + start_stimer(vs); > > > + > > > + break; > > > + } > > > + case HV_X64_MSR_STIMER0_COUNT: > > > > Missing blank line again (and also further down here as well as in the > > rdmsr code). > > > > Ok. TBH I've always thought the normal style was to omit the blank line if > the case statement has > braces. > > > > + case HV_X64_MSR_STIMER1_COUNT: > > > + case HV_X64_MSR_STIMER2_COUNT: > > > + case HV_X64_MSR_STIMER3_COUNT: > > > + { > > > + unsigned int stimerx = (idx - HV_X64_MSR_STIMER0_COUNT) / 2; > > > + struct viridian_stimer *vs = > > > &v->arch.hvm.viridian->stimer[stimerx]; > > > + > > > + if ( !(viridian_feature_mask(d) & HVMPV_stimer) ) > > > + return X86EMUL_EXCEPTION; > > > + > > > + stop_stimer(vs); > > > + > > > + vs->count = val; > > > + > > > + if ( !vs->count ) > > > > Any reason you don't use val here (which the compiler likely will do > > anyway)? > > Not particularly, I just think it reads better and is more consistent with > other code. > > > > > > @@ -201,6 +476,32 @@ int viridian_time_rdmsr(const struct vcpu *v, > > > uint32_t idx, uint64_t *val) > > > break; > > > } > > > > > > + case HV_X64_MSR_STIMER0_CONFIG: > > > + case HV_X64_MSR_STIMER1_CONFIG: > > > + case HV_X64_MSR_STIMER2_CONFIG: > > > + case HV_X64_MSR_STIMER3_CONFIG: > > > + { > > > + unsigned int stimerx = (idx - HV_X64_MSR_STIMER0_CONFIG) / 2; > > > + > > > + if ( !(viridian_feature_mask(d) & HVMPV_stimer) ) > > > + return X86EMUL_EXCEPTION; > > > + > > > + *val = v->arch.hvm.viridian->stimer[stimerx].config.raw; > > > > While more noticeable here and ... > > > > > + break; > > > + } > > > + case HV_X64_MSR_STIMER0_COUNT: > > > + case HV_X64_MSR_STIMER1_COUNT: > > > + case HV_X64_MSR_STIMER2_COUNT: > > > + case HV_X64_MSR_STIMER3_COUNT: > > > + { > > > + unsigned int stimerx = (idx - HV_X64_MSR_STIMER0_COUNT) / 2; > > > + > > > + if ( !(viridian_feature_mask(d) & HVMPV_stimer) ) > > > + return X86EMUL_EXCEPTION; > > > + > > > + *val = v->arch.hvm.viridian->stimer[stimerx].count; > > > > ... here, array_access_nospec() are probably needed not just here, > > but also in the wrmsr logic. > > Really? stimerx is calculated based on hitting the case statement in the > first place. > > > > > > --- a/xen/include/asm-x86/hvm/viridian.h > > > +++ b/xen/include/asm-x86/hvm/viridian.h > > > @@ -40,6 +40,33 @@ union viridian_sint_msr > > > } fields; > > > }; > > > > > > +union viridian_stimer_config_msr > > > +{ > > > + uint64_t raw; > > > + struct > > > + { > > > + uint64_t enabled:1; > > > + uint64_t periodic:1; > > > + uint64_t lazy:1; > > > + uint64_t auto_enable:1; > > > + uint64_t vector:8; > > > + uint64_t direct_mode:1; > > > + uint64_t reserved_zero1:3; > > > + uint64_t sintx:4; > > > + uint64_t reserved_zero2:44; > > > + } fields; > > > +}; > > > + > > > +struct viridian_stimer { > > > + struct vcpu *v; > > > > Isn't a full 8-byte pointer a little too much overhead here? You could > > instead store the timer index ... > > I think I need it in stimer_expire() which can be called in any vcpu context > IIUC. > > > > > > + struct timer timer; > > > + union viridian_stimer_config_msr config; > > > + uint64_t count; > > > + int64_t expiration; > > > + s_time_t timeout; > > > + bool started; > > > > ... in a field using the 7-byte padding here, and use container_of() > > to get at the outer structure. > > That would get me as far as viridian_vcpu, but there's no pointer to struct > vcpu in there, and I need > one to call vcpu_kick(). > > Paul > > > > > Jan > > > _______________________________________________ > Xen-devel mailing list > Xen-devel@xxxxxxxxxxxxxxxxxxxx > https://lists.xenproject.org/mailman/listinfo/xen-devel _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxxxxxxxxx https://lists.xenproject.org/mailman/listinfo/xen-devel

©2013 Xen Project, A Linux Foundation Collaborative Project. All Rights Reserved.
Linux Foundation is a registered trademark of The Linux Foundation.
Xen Project is a trademark of The Linux Foundation.