> In our case, the foreign pages actually originate from blkback, are
> passed to iSCSI for processing, and are abused by the ref manipulation
> in the dom0 network stack.  On return to blkback, the page refs are
> off.  What we haven't been able to do yet is identify the exact
> circumstances that trigger the issue.  We have a fairly elaborate
> reproducer involving running a pool of domains and continuously
> rebooting them.  Eventually, one domain will hang on exit with a stuck
> page with elevated ref counts.
> In our case, the stuck page is always a blkback I/O page.
> Running the same test on a FC SAN or local SCSI backend device doesn't
> hang.

I'd be inclined to investigate this by hacking the start_xmit function
of the NIC driver to randomly corrupt 1 in 100 packets. That's usually a
good way of exercising some of the darker corners of the networking
stack. (Better than creating a netfilter DROP rule.)
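
As a rough illustration of the 1-in-100 corruption idea, here is a
userspace C model of the decision logic only. It is a sketch under
assumptions, not driver code: the names maybe_corrupt_frame,
CORRUPT_ONE_IN and xorshift32 are made up for the example, and the
toy PRNG stands in for whatever random helper the kernel in question
provides. In a real driver the equivalent check would sit at the top
of the transmit handler and touch the skb's linear data (skb->data,
for skb_headlen(skb) bytes) before the frame is handed to hardware.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical knob: corrupt roughly one frame in CORRUPT_ONE_IN. */
#define CORRUPT_ONE_IN 100u

/*
 * Small xorshift32 PRNG so the sketch is deterministic and
 * self-contained. The state must be seeded with a nonzero value.
 * A real driver would use the kernel's own random helpers instead.
 */
static uint32_t xorshift32(uint32_t *state)
{
    uint32_t x = *state;

    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *state = x;
    return x;
}

/*
 * Models the hook at the top of a driver's transmit routine.
 * About once per CORRUPT_ONE_IN calls, flips a single bit at a
 * random offset in the frame. Returns 1 if the frame was
 * corrupted, 0 if it was left untouched.
 */
static int maybe_corrupt_frame(uint8_t *data, size_t len, uint32_t *rng_state)
{
    size_t off;

    assert(data != NULL && rng_state != NULL);

    if (len == 0)
        return 0;

    /* Leave the large majority of frames alone. */
    if (xorshift32(rng_state) % CORRUPT_ONE_IN != 0)
        return 0;

    off = xorshift32(rng_state) % len;
    data[off] ^= 0x01;  /* flip the low bit of one byte */
    return 1;
}
```

Unlike a DROP rule, the damaged frame still goes through the rest of
the transmit path, so the stack has to cope with it the hard way.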


