[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [PATCH] xen/arm: smmuv1: remove iommu group when deassign a device
Hi , > On 28 Jun 2022, at 6:23 pm, Mykyta Poturai <mykyta.poturai@xxxxxxxxx> wrote: > >> Hi Mykyta, >> >>> On 21 Jun 2022, at 10:38 am, Mykyta Poturai <mykyta.poturai@xxxxxxxxx> >>> wrote: >>> >>>> Thanks for testing the patch. >>>>> But not fixed the "Unexpected Global fault" that occasionally happens >>>>> when destroying >>>>> the domain with an actively working GPU. Although, I am not sure if this >>>>> issue >>>>> is relevant here. >>>> >>>> Can you please if possible share the more details and logs so that I can >>>> look if this issue is relevant here ? >>> >>> So in my setup I have a board with IMX8 chip and 2 core Vivante GPU. GPU is >>> split between domains. >>> One core goes to Dom0 and one to DomU. >>> >>> Steps to trigger this issue: >>> 1. Start DomU >>> 2. Start wayland and glmark2-es2-wayland inside DomU >>> 3. Destroy DomU >>> >>> Sometimes it destroys fine but roughly 1 of 8 times I get logs like this: >>> >>> root@dom0:~# xl dest DomU >>> [12725.412940] xenbr0: port 1(vif8.0) entered disabled state >>> [12725.671033] xenbr0: port 1(vif8.0) entered disabled state >>> [12725.689923] device vif8.0 left promiscuous mode >>> [12725.696736] xenbr0: port 1(vif8.0) entered disabled state >>> [12725.696989] audit: type=1700 audit(1616594240.068:39): dev=vif8.0 prom=0 >>> old_prom=256 auid=4294967295 uid=0 gid=0 ses=4294967295 >>> (XEN) smmu: /iommu@51400000: Unexpected global fault, this could be serious >>> (XEN) smmu: /iommu@51400000: GFSR 0x00000001, GFSYNR0 0x00000004, >>> GFSYNR1 0x00001055, GFSYNR2 0x00000000 >>> >>> My guess is that this happens because GPU continues to access memory when >>> the context is already invalidated, >>> and therefore triggers the "Invalid context fault". >> >> Yes you are right in this case GPU trying to do DMA operation after Xen >> destroyed the guest and configures >> the S2CR type value to fault. Solution to this issue is the patch that I >> shared earlier. >> >> You can try this patch and confirm.This patch will solve both the issues. >> >> diff --git a/xen/drivers/passthrough/arm/smmu.c >> b/xen/drivers/passthrough/arm/smmu.c >> index 5cacb2dd99..ff1b73d3d8 100644 >> --- a/xen/drivers/passthrough/arm/smmu.c >> +++ b/xen/drivers/passthrough/arm/smmu.c >> @@ -1680,6 +1680,10 @@ static int arm_smmu_attach_dev(struct iommu_domain >> *domain, struct device *dev) >> if (!cfg) >> return -ENODEV; >> >> + ret = arm_smmu_master_alloc_smes(dev); >> + if (ret) >> + return ret; >> + >> return arm_smmu_domain_add_master(smmu_domain, cfg); >> } >> >> @@ -2075,7 +2079,7 @@ static int arm_smmu_add_device(struct device *dev) >> iommu_group_add_device(group, dev); >> iommu_group_put(group); >> >> - return arm_smmu_master_alloc_smes(dev); >> + return 0; >> } >> >> >> Regards, >> Rahul > > Hi Rahul, > > With this patch I get the same results, here is the error message: > > (XEN) smmu: /iommu@51400000: Unexpected global fault, this could be serious > (XEN) smmu: /iommu@51400000: GFSR 0x00000001, GFSYNR0 0x00000004, GFSYNR1 > 0x00001055, GFSYNR2 0x00000000 > As you mentioned earlier, this fault is observed because GPU continues to access memory when the context is already invalidated, and therefore triggers the "Invalid context fault". This is a different issue and is not related to this patch. @Julien are you okay if I will send the below patch for official review as this issue pending for a long time? In this patch we don’t need to call arm_smmu_master_free_smes() in arm_smmu_detach_dev() but we will implement new function arm_smmu_domain_remove_master() that will configure the s2cr value to type fault in detach function. arm_smmu_domain_remove_master() will revert the changes done by arm_smmu_domain_add_master() in attach function. diff --git a/xen/drivers/passthrough/arm/smmu.c b/xen/drivers/passthrough/arm/smmu.c index 69511683b4..da3adf8e7f 100644 --- a/xen/drivers/passthrough/arm/smmu.c +++ b/xen/drivers/passthrough/arm/smmu.c @@ -1598,21 +1598,6 @@ out_err: return ret; } -static void arm_smmu_master_free_smes(struct arm_smmu_master_cfg *cfg) -{ - struct arm_smmu_device *smmu = cfg->smmu; - int i, idx; - struct iommu_fwspec *fwspec = arm_smmu_get_fwspec(cfg); - - spin_lock(&smmu->stream_map_lock); - for_each_cfg_sme(cfg, i, idx, fwspec->num_ids) { - if (arm_smmu_free_sme(smmu, idx)) - arm_smmu_write_sme(smmu, idx); - cfg->smendx[i] = INVALID_SMENDX; - } - spin_unlock(&smmu->stream_map_lock); -} - static int arm_smmu_domain_add_master(struct arm_smmu_domain *smmu_domain, struct arm_smmu_master_cfg *cfg) { @@ -1635,6 +1620,20 @@ static int arm_smmu_domain_add_master(struct arm_smmu_domain *smmu_domain, return 0; } +static void arm_smmu_domain_remove_master(struct arm_smmu_domain *smmu_domain, + struct arm_smmu_master_cfg *cfg) +{ + struct arm_smmu_device *smmu = smmu_domain->smmu; + struct arm_smmu_s2cr *s2cr = smmu->s2crs; + struct iommu_fwspec *fwspec = arm_smmu_get_fwspec(cfg); + int i, idx; + + for_each_cfg_sme(cfg, i, idx, fwspec->num_ids) { + s2cr[idx] = s2cr_init_val; + arm_smmu_write_s2cr(smmu, idx); + } +} + static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev) { int ret; @@ -1684,10 +1683,11 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev) static void arm_smmu_detach_dev(struct iommu_domain *domain, struct device *dev) { + struct arm_smmu_domain *smmu_domain = domain->priv; struct arm_smmu_master_cfg *cfg = find_smmu_master_cfg(dev); if (cfg) - arm_smmu_master_free_smes(cfg); + return arm_smmu_domain_remove_master(smmu_domain, cfg); } Regards, Rahul
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |