Re: Possible bug? DOM-U network stopped working after fatal error reported in DOM0


  • To: G.R. <firemeteor@xxxxxxxxxxxxxxxxxxxxx>
  • From: Roger Pau Monné <roger.pau@xxxxxxxxxx>
  • Date: Wed, 29 Dec 2021 09:32:38 +0100
  • Cc: <xen-devel@xxxxxxxxxxxxxxxxxxxx>
  • Delivery-date: Wed, 29 Dec 2021 08:33:31 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

Adding xen-devel back.

On Wed, Dec 29, 2021 at 01:44:18AM +0800, G.R. wrote:
> On Tue, Dec 28, 2021 at 3:05 AM Roger Pau Monné <roger.pau@xxxxxxxxxx> wrote:
> >
> > On Sun, Dec 26, 2021 at 02:06:55AM +0800, G.R. wrote:
> > > > > Thanks. I've raised this on freebsd-net for advice [0]. IMO netfront
> > > > > shouldn't receive an mbuf that crosses a page boundary, but if that's
> > > > > indeed a legit mbuf I will figure out the best way to handle it.
> > > > >
> > > > > I have a clumsy patch (below) that might solve this, if you want to
> > > > > give it a try.
> > > >
> > > > Applied the patch and it worked like a charm!
> > > > Thank you so much for your quick help!
> > > > Wish you a wonderful holiday!
> > >
> > > I may have spoken too soon...
> > > With the patch I can attach the iscsi disk and neither the dom0 nor
> > > the NAS domU complains this time.
> > > But when I attempt to mount the attached disk it reports I/O errors 
> > > randomly.
> > > By randomly I mean different disks behave differently...
> > > I don't see any error logs from kernels this time.
> > > (most of the iscsi disks are NTFS FS and mounted through the user mode
> > > fuse library)
> > > But since I have a local backup copy of the image, I can confirm that
> > > mounting that backup image does not result in any I/O error.
> > > Looks like something is still broken here...
> >
> > Indeed. That patch was likely too simple, and didn't properly handle
> > the split of mbuf data buffers.
> >
> > I have another version based on using sglist, which I think is also
> > a worthwhile change for netfront. Can you please give it a try? I've
> > done a very simple test and it seems fine, but you certainly have more
> > interesting cases.
> >
> > You will have to apply it on top of a clean tree, without any of the
> > other patches applied.
> 
> Unfortunately this new version is even worse.
> It not only fails to fix the known issue on iSCSI, but also introduces
> a regression on NFS.
> The regression on NFS is somewhat random: it takes a
> non-deterministic amount of time to show up.
> Here is a stack trace for reference:
> db:0:kdb.enter.default>  bt
> Tracing pid 1696 tid 100622 td 0xfffff800883d5740
> kdb_enter() at kdb_enter+0x37/frame 0xfffffe009f80d900
> vpanic() at vpanic+0x197/frame 0xfffffe009f80d950
> panic() at panic+0x43/frame 0xfffffe009f80d9b0
> xn_txq_mq_start_locked() at xn_txq_mq_start_locked+0x5bc/frame 0xfffffe009f80da50

I think this is hitting a KASSERT. Could you paste the text printed as
part of the panic (not just the backtrace)?

Sorry this is taking a bit of time to solve.

Thanks!
