[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH for-4.12 6/8] xen/arm: Implement workaround for Cortex-A76 erratum 1165522

To: Stefano Stabellini <sstabellini@xxxxxxxxxx>
From: Julien Grall <julien.grall@xxxxxxx>
Date: Thu, 24 Jan 2019 13:17:24 +0000
Cc: xen-devel@xxxxxxxxxxxxxxxxxxxx, James Morse <james.morse@xxxxxxx>, andre.przywara@xxxxxxx
Delivery-date: Thu, 24 Jan 2019 13:17:40 +0000
List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

(+ James)

Hi Stefano,

@James, please correct me if I am wrong below :).

On 24/01/2019 00:52, Stefano Stabellini wrote:

On Wed, 28 Nov 2018, Julien Grall wrote:

  void p2m_restore_state(struct vcpu *n)
@@ -111,10 +130,17 @@ void p2m_restore_state(struct vcpu *n)
      if ( is_idle_vcpu(n) )
          return;

- WRITE_SYSREG64(p2m->vttbr, VTTBR_EL2);

      WRITE_SYSREG(n->arch.sctlr, SCTLR_EL1);
      WRITE_SYSREG(n->arch.hcr_el2, HCR_EL2);

+ /*

+     * ARM64_WORKAROUND_AT_SPECULATE: VTTBR_EL2 should be restored after all
+     * registers associated to EL1/EL0 translations regime have been
+     * synchronized.
+     */
+    asm volatile(ALTERNATIVE("nop", "isb", ARM64_WORKAROUND_AT_SPECULATE));


Obviously you have done a lot more thinking about this than me, but
I don't fully understand the need for this barrier: this is not about
ARM64_WORKAROUND_AT_SPECULATE per se, right?


This particular isb() is about ARM64_WORKAROUND_AT_SPECULATE.

Shouldn't the CPU be able
to figure out the right execution speculation path given that the
instructions ordering is correct?

What instructions ordering? Writing a system registers can be re-ordered and youneed an isb() to ensure the full synchronization before executing at ATinstruction or flushing the TLBs.

In hardware without the erratum, the registers associated with EL1 translationare out-of-context and the AT cannot speculate. The isb() added by patch #5ensure the context is synchronized so an AT afterwards would use a consistentcontext. Now...

I guess it depends on the nature of
the hardware bug.

... in the context of the errata, you have to imagine what can happen if an ATinstruction is inserted (via speculation) between each instruction and whathappen if the system registers are re-ordered.

The key of the erratum is VTTBR_EL2. This is what will stop a speculated ATinstruction to allocate a TLBs entry because you are not allowed to cache atranslation that will fault. Without the isb() here, the VTTBR_EL2 may besynchronized before the rest of the context, so a speculated AT instruction mayuse an inconsistent state and allocate a TLB entry with an unexpectedtranslation against the guest.

So here, we want to ensure the rest of the context is synchronized beforewriting to VTTBR_EL2, hence the isb().

+    WRITE_SYSREG64(p2m->vttbr, VTTBR_EL2);
+
      last_vcpu_ran = &p2m->last_vcpu_ran[smp_processor_id()];

/*

@@ -157,8 +183,23 @@ static void p2m_force_tlb_flush_sync(struct p2m_domain 
*p2m)
      ovttbr = READ_SYSREG64(VTTBR_EL2);
      if ( ovttbr != p2m->vttbr )
      {
+        uint64_t vttbr;
+
          local_irq_save(flags);
-        WRITE_SYSREG64(p2m->vttbr, VTTBR_EL2);
+
+        /*
+         * ARM64_WORKAROUND_AT_SPECULATE: We need to stop AT to allocate
+         * TLBs entries because the context is partially modified. We
+         * only need the VMID for flushing the TLBs, so we can generate
+         * a new VTTBR with the VMID to flush and the empty root table.
+         */
+        if ( !cpus_have_const_cap(ARM64_WORKAROUND_AT_SPECULATE) )
+            vttbr = p2m->vttbr;
+        else
+            vttbr = generate_vttbr(p2m->vmid, empty_root_mfn);
+
+        WRITE_SYSREG64(vttbr, VTTBR_EL2);


Good idea, any reasons not to use generate_vttbr(p2m->vmid,
empty_root_mfn) in the general case? There should be no downsides,
right?

An empty root means you need to have the root page-tables allocated. In theconfiguration we currently support, the could be either 4K or 8K.

Even if this seems small, this is a downside for platforms that don't requiresuch trick.

          /* Ensure VTTBR_EL2 is synchronized before flushing the TLBs */
          isb();
      }
@@ -1504,6 +1545,23 @@ static uint32_t __read_mostly vtcr;
  static void setup_virt_paging_one(void *data)
  {
      WRITE_SYSREG32(vtcr, VTCR_EL2);
+
+    /*
+     * ARM64_WORKAROUND_AT_SPECULATE: We want to keep the TLBs free from
+     * entries related to EL1/EL0 translation regime until a guest vCPU
+     * is running. For that, we need to set-up VTTBR to point to an empty
+     * page-table and turn on stage-2 translation.


I don't understand why this is needed: isn't the lack of HCR_VM (due to
your previous patch) supposed to be sufficient? How can there be
speculation without HCR_VM?

HCR_EL2.VM unsets means the stage-2 will not be used for the EL1/EL0 translationregime. In the context of the erratum, the AT can still speculate except it willnot take into account the stage-2. The dependencies on VMID stills applies whenHCR_EL2.VM is unset, so from my understanding, the entry could get cached towhatever is VTTBR_EL2.VMID.


Even if speculation happens without HCR_EL2, why do we need to set it
now? Isn't setting empty_root_mfn enough?

The main goal here is to have the TLBs in a known state after the CPU has beeninitialized. After the sequence below, we are sure that the TLBs don't containentries associated to the EL1/EL0 regime and and a speculated AT instructionwill not be able to allocate more.

Without HCR_EL2.VM set, the stage-2 page-table will not get used. So aspeculated AT instruction could still allocate an entry in TLB. It is not amajor issue as it would be against INVALID_VMID, yet it is not a very sanesituation for the hypervisor.


Cheers,

--
Julien Grall

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel

Follow-Ups:
- Re: [Xen-devel] [PATCH for-4.12 6/8] xen/arm: Implement workaround for Cortex-A76 erratum 1165522
  - From: Stefano Stabellini

References:
- Re: [Xen-devel] [PATCH for-4.12 6/8] xen/arm: Implement workaround for Cortex-A76 erratum 1165522
  - From: Stefano Stabellini

Prev by Date: Re: [Xen-devel] [PATCH 1/3] treewide: Lift switch variables out of switches
Next by Date: Re: [Xen-devel] [PATCH] xen/sched: Introduce domain_vcpu() helper
Previous by thread: Re: [Xen-devel] [PATCH for-4.12 6/8] xen/arm: Implement workaround for Cortex-A76 erratum 1165522
Next by thread: Re: [Xen-devel] [PATCH for-4.12 6/8] xen/arm: Implement workaround for Cortex-A76 erratum 1165522
Index(es):
- Date
- Thread

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.