Xen project Mailing List

Re: ocplib+endian improvement

To: Dave Scott <Dave.Scott@xxxxxxxxxxxxx>

From: Anil Madhavapeddy <anil@xxxxxxxxxx>

Date: Tue, 8 Jan 2013 22:33:18 +0000

Cc: Steven Smith <sos22@xxxxxxxxxxxxx>, "cl-mirage@xxxxxxxxxxxxxxx" <cl-mirage@xxxxxxxxxxxxxxx>

List-id: MirageOS development <cl-mirage.lists.cam.ac.uk>

Fixed; it was a minor problem in the C start_info stubs that was truncating the size of the bigarray mapping in OCaml. This only showed up once bounds checking was activated, hence you didn't notice it before. https://github.com/mirage/mirage-platform/commit/b34dd44c5e20f9534d74cef71a8d875005e3cc7e I'm cutting stable releases of all these changes now... On 8 Jan 2013, at 18:54, Dave Scott <Dave.Scott@xxxxxxxxxxxxx> wrote: > That's interesting-- line 306 in ring.ml is an attempt to read the ring input > (from the frontend PoV) producer pointer, in a function (suspiciously) named > "unsafe_read". That ought to be at offset 2048+12 in the page. Perhaps this > buffer isnt actually the xenstore ring? > > Is this easy to repro? > > -- > Dave Scott > > On Jan 8, 2013, at 5:42 PM, "Anil Madhavapeddy" <anil@xxxxxxxxxx> wrote: > >> This turns out to due to a lack of bounds checking, causing a buffer >> overflow. I've re-added the bounds checks into the new cstruct, and now >> get a nice exception instead of a crash from the netfront end. The fact >> that the dom0 crashes is a good case for a Mirage fuzz tester in the >> future ;-) >> >> Dave, it seems like the real problem is in xenstore-client somewhere, >> according to the backtrace: >> >> x86_mm.c: Demand map pfns at 8001000-2008001000. >> gnttab_init: 2047 >> gnttab_stubs.c: gnttab_table mapped at 0x8001000 >> Caught: Invalid_argument("index out of bounds") >> Called from file "lib/ring.ml", line 306, characters 27-51 >> Called from file "core/xs_protocol.ml", line 274, characters 6-31 >> Called from file "client_lwt/xs_client.ml", line 110, characters 30-42 >> Called from file "src/core/lwt.ml", line 679, characters 20-24 >> >> Is this something to do with multipage rings? It seems to read beyond the >> actual xenstore page, which is odd... >> >> -anil >> >>> On Mon, Jan 07, 2013 at 03:08:51PM +0000, Steven Smith wrote: >>>> Right. So I'm going to stop looking at this now. I'd like to say >>>> that's because I've fixed it and everything's fine, but, actually, >>>> it's because the BMC on my test box has failed and I can't find any >>>> other machines which are in a usable state. Pretty much the only >>>> concrete thing which I've found is that the ring indices in the shared >>>> page are corrupted, with what looks like ASCII; not sure if that helps >>>> you at all. >>>> >>>>> [ 39.830497] <0>In xen_netbk_tx_build_gops; done 95000 iterations so far >>>>> [ 39.837138] <0>nr_pending_reqs 0 >>>>> [ 39.840485] <0>Ring status: rsp_prod_pvt 24cda, req_cons 238da >>>>> [ 39.846437] <0>Shared: req_prod 75422121, req_event 69646c69, rsp_prod >>>>> 24cda, rsp_event 6c6d6143 >>>> >>>> Decode indices as ascii: >>>> >>>> 75422121 -> uB!! !!Bu >>>> 69646c69 -> idli ildi >>>> 6c6d6143 -> lmaC Caml >>>> >>>> Which could plausibly be fragments of strings which might be present >>>> in an ocaml program, but it's not really terribly convincing. >>>> >>>> I do have a patch which might at least stop the bug from taking down >>>> dom0, which I've attached, but I've not been able to test it all. >>>> >>>> So that's not desperately helpful, really. Sorry about that. >>>> >>>> Steven. >>>> >>>> >>>> >>>>> http://www.recoil.org/~avsm/www-crashes-pvops.xen is the offending >>>>> kernel, which always gives me the traceback below when booted using xm >>>>> (on Xen 4.1, either dom0 kernel of 3.2 or 3.7)... can you repro on your >>>>> setup? You have to provide one VIF. >>>>> >>>>> -anil >>>>> >>>>> Begin forwarded message: >>>>> >>>>>> From: Anil Madhavapeddy <anil@xxxxxxxxxx> >>>>>> Subject: Re: ocplib+endian improvement >>>>>> Date: 6 January 2013 21:08:18 GMT >>>>>> To: "<cl-mirage@xxxxxxxxxxxxxxx> List" <cl-mirage@xxxxxxxxxxxxxxx> >>>>>> Cc: Steven Smith <steven.smith@xxxxxxxxxxxx> >>>>>> >>>>>> Hm, but trying out TCP seems to trigger a softlockup in Xen netback >>>>>> (both kernels 3.2 and latest 3.7). Will need to do some more debugging >>>>>> tomorrow... >>>>>> >>>>>> Steven (Smith): any luck with the grant-free netback modification? I >>>>>> could try it out at the same time as debugging this particular issue. >>>>>> >>>>>> -anil >>>>>> >>>>>> [ 277.249069] Code: 00 89 c1 7c c8 41 59 c3 90 90 90 65 c6 04 25 41 b1 >>>>>> 00 00 00 65 f6 04 25 40 b1 00 00 ff 74 05 e8 47 00 00 00 c3 66 0f 1f 44 >>>>>> 00 00 <65> c6 04 25 41 b1 00 00 01 c3 66 0f 1f 44 00 00 65 f6 04 25 41 >>>>>> [ 305.248782] BUG: soft lockup - CPU#0 stuck for 23s! [netback/0:3772] >>>>>> [ 305.248883] Modules linked in: xt_physdev iptable_filter ip_tables >>>>>> x_tables xen_netback xen_gntdev xen_evtchn xenfs xen_privcmd nfsd >>>>>> auth_rpcgss nfs_acl nfs lockd dns_resolver fscache sunrpc bridge stp llc >>>>>> loop crc32c_intel ghash_clmulni_intel aesni_intel aes_x86_64 ablk_helper >>>>>> cryptd xts lrw gf128mul snd_pcm sp5100_tco snd_page_alloc snd_timer snd >>>>>> soundcore tpm_tis tpm amd64_edac_mod edac_mce_amd i2c_piix4 i2c_core >>>>>> dcdbas evdev pcspkr tpm_bios edac_core microcode psmouse fam15h_power >>>>>> k10temp serio_raw acpi_power_meter button processor thermal_sys ext4 >>>>>> crc16 jbd2 mbcache dm_mod sg sd_mod crc_t10dif ata_generic ohci_hcd >>>>>> pata_atiixp ixgbe ahci ptp libahci pps_core libata ehci_hcd dca mdio >>>>>> scsi_mod bnx2 usbcore usb_common >>>>>> [ 305.248932] CPU 0 >>>>>> [ 305.248934] Pid: 3772, comm: netback/0 Tainted: G W >>>>>> 3.7-trunk-amd64 #1 Debian 3.7.1-1~experimental.2 Dell Inc. PowerEdge >>>>>> R415/08WNM9 >>>>>> [ 305.248936] RIP: e030:[<ffffffffa021c153>] [<ffffffffa021c153>] >>>>>> xen_netbk_tx_build_gops+0x19d/0x7ad [xen_netback] >>>>>> [ 305.248941] RSP: e02b:ffff88020eba3ca8 EFLAGS: 00000217 >>>>>> [ 305.248943] RAX: 0000000073202626 RBX: ffffc90007ab5000 RCX: >>>>>> 000000001ce71e08 >>>>>> [ 305.248945] RDX: 000000001ce71e08 RSI: ffffc90007ab02a8 RDI: >>>>>> ffff880653403800 >>>>>> [ 305.248946] RBP: ffffc90007ab70c0 R08: ffff8806534038d8 R09: >>>>>> ffff88020eba3c74 >>>>>> [ 305.248947] R10: ffffc90007ab0208 R11: ffffc90007ab0208 R12: >>>>>> ffff880653403800 >>>>>> [ 305.248949] R13: 0000000000007320 R14: 0000000000000000 R15: >>>>>> 000000000104210e >>>>>> [ 305.248953] FS: 00007f59123a4700(0000) GS:ffff8807ff800000(0000) >>>>>> knlGS:0000000000000000 >>>>>> [ 305.248955] CS: e033 DS: 0000 ES: 0000 CR0: 000000008005003b >>>>>> [ 305.248956] CR2: 00007fd7c822b070 CR3: 00000007f2177000 CR4: >>>>>> 0000000000000660 >>>>>> [ 305.248958] DR0: 0000000000000000 DR1: 0000000000000000 DR2: >>>>>> 0000000000000000 >>>>>> [ 305.248960] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: >>>>>> 0000000000000400 >>>>>> [ 305.248961] Process netback/0 (pid: 3772, threadinfo >>>>>> ffff88020eba2000, task ffff8807ee921180) >>>>>> [ 305.248963] Stack: >>>>>> [ 305.248964] ffffffff81004067 ffffffff81004202 1ce71e0800000003 >>>>>> ffff8807ee921180 >>>>>> [ 305.248968] ffffc90007ab70c0 ffff8807ff811740 0000000000000000 >>>>>> 2074706fffff2e2f >>>>>> [ 305.248972] ffff4f2073202626 2074706f20646c72 ffffffff810037f7 >>>>>> ffff8807ee921180 >>>>>> [ 305.248975] Call Trace: >>>>>> [ 305.248978] [<ffffffff81004067>] ? arch_local_irq_restore+0x7/0x8 >>>>>> [ 305.248982] [<ffffffff81004202>] ? xen_mc_flush+0x11d/0x160 >>>>>> [ 305.248985] [<ffffffff810037f7>] ? >>>>>> xen_mc_issue.constprop.22+0x10/0x4d >>>>>> [ 305.248988] [<ffffffff8100d02f>] ? load_TLS+0x7/0xa >>>>>> [ 305.248991] [<ffffffff8100d60c>] ? __switch_to+0x195/0x3f8 >>>>>> [ 305.248994] [<ffffffff8105fadb>] ? mmdrop+0xd/0x1c >>>>>> [ 305.248996] [<ffffffff81061390>] ? finish_task_switch+0x83/0xb4 >>>>>> [ 305.249000] [<ffffffff813778e9>] ? __schedule+0x4b2/0x4e0 >>>>>> [ 305.249003] [<ffffffff8107f0d3>] ? arch_local_irq_disable+0x7/0x8 >>>>>> [ 305.249006] [<ffffffff81378465>] ? _raw_spin_lock_irqsave+0x14/0x35 >>>>>> [ 305.249009] [<ffffffff8107f0d3>] ? arch_local_irq_disable+0x7/0x8 >>>>>> [ 305.249012] [<ffffffff81378465>] ? _raw_spin_lock_irqsave+0x14/0x35 >>>>>> [ 305.249016] [<ffffffffa021c897>] ? xen_netbk_kthread+0x134/0x78d >>>>>> [xen_netback] >>>>>> [ 305.249019] [<ffffffff8105d78f>] ? arch_local_irq_enable+0x7/0x8 >>>>>> [ 305.249022] [<ffffffff81061357>] ? finish_task_switch+0x4a/0xb4 >>>>>> [ 305.249025] [<ffffffff81057987>] ? abort_exclusive_wait+0x79/0x79 >>>>>> [ 305.249029] [<ffffffffa021c763>] ? >>>>>> xen_netbk_tx_build_gops+0x7ad/0x7ad [xen_netback] >>>>>> [ 305.249032] [<ffffffffa021c763>] ? >>>>>> xen_netbk_tx_build_gops+0x7ad/0x7ad [xen_netback] >>>>>> [ 305.249035] [<ffffffff810570ac>] ? kthread+0x81/0x89 >>>>>> [ 305.249038] [<ffffffff810037f7>] ? >>>>>> xen_mc_issue.constprop.22+0x10/0x4d >>>>>> [ 305.249041] [<ffffffff8105702b>] ? __kthread_parkme+0x5c/0x5c >>>>>> [ 305.249043] [<ffffffff8137d6bc>] ? ret_from_fork+0x7c/0xb0 >>>>>> [ 305.249046] [<ffffffff8105702b>] ? __kthread_parkme+0x5c/0x5c >>>>>> [ 305.249048] Code: bc 24 80 00 00 00 4d 89 a4 24 a8 00 00 00 49 c7 84 >>>>>> 24 a0 00 00 00 58 bf 21 a0 4c 89 f6 e8 86 d9 e2 e0 e9 c3 05 00 00 8b 54 >>>>>> 24 14 <66> 8b 74 24 3e 41 8d 4f ff 0f b7 44 24 42 48 c7 44 24 30 00 00 >>>>>> avsm@gabriel:~/src/git/mirage/mirage-www$ >>>>>> Message from syslogd@gabriel at Jan 6 21:06:38 ... >>>>>> >>>>>> On 6 Jan 2013, at 18:44, Anil Madhavapeddy <anil@xxxxxxxxxx> wrote: >>>>>> >>>>>>> I've been porting the network stack to take advantage of the cstruct >>>>>>> turbo-boost that Pierre and Thomas worked on. This optimisation adds >>>>>>> compiler built-ins (in 4.01.0+) which let the code generator optimise >>>>>>> away many of the temporary values required for low-level optimisation. >>>>>>> >>>>>>> Here's a (very quick) before/after for a ping flood (which is a good >>>>>>> stress test of the low-level shared ring, network driver and protocol >>>>>>> stack). >>>>>>> >>>>>>> For a ping flood With 4.00.1 without the optimisation: >>>>>>> 73755 packets transmitted, 73702 received, +49 duplicates, 0% packet >>>>>>> loss, time 6283ms >>>>>>> rtt min/avg/max/mdev = 0.031/0.228/1209.178/9.887 ms, pipe 14850, >>>>>>> ipg/ewma 0.085/0.036 ms >>>>>>> >>>>>>> and with the optimisation: >>>>>>> 41791 packets transmitted, 41764 received, +25 duplicates, 0% packet >>>>>>> loss, time 3539msrtt min/avg/max/mdev = 0.030/0.188/1261.042/8.459 ms, >>>>>>> pipe 14742, ipg/ewma 0.084/0.039 ms >>>>>>> >>>>>>> So our average latency drops quite significantly (0.228 -> 0.188), as >>>>>>> does CPU load (not shown above). >>>>>>> >>>>>>> I've not committed these changes to the mainstream yet until I test out >>>>>>> TCP more, but it's getting there! >>>>>>> >>>>>>> -anil >>>>>> >>>>>> >>>>> >>> >>>> diff --git a/drivers/net/xen-netback/netback.c >>>> b/drivers/net/xen-netback/netback.c >>>> index 3f27707..1a6f816 100644 >>>> --- a/drivers/net/xen-netback/netback.c >>>> +++ b/drivers/net/xen-netback/netback.c >>>> @@ -776,8 +776,7 @@ static struct xenvif *poll_net_schedule_list(struct >>>> xen_netbk *netbk) >>>> >>>> vif = list_first_entry(&netbk->net_schedule_list, >>>> struct xenvif, schedule_list); >>>> - if (!vif) >>>> - goto out; >>>> + BUG_ON(!vif); >>>> >>>> xenvif_get(vif); >>>> >>>> @@ -857,7 +856,9 @@ static void tx_credit_callback(unsigned long data) >>>> } >>>> >>>> static void netbk_tx_err(struct xenvif *vif, >>>> - struct xen_netif_tx_request *txp, RING_IDX end) >>>> + struct xen_netif_tx_request *txp, >>>> + RING_IDX end, >>>> + int fatal) >>>> { >>>> RING_IDX cons = vif->tx.req_cons; >>>> >>>> @@ -868,8 +869,15 @@ static void netbk_tx_err(struct xenvif *vif, >>>> txp = RING_GET_REQUEST(&vif->tx, cons++); >>>> } while (1); >>>> vif->tx.req_cons = cons; >>>> - xen_netbk_check_rx_xenvif(vif); >>>> + if (!fatal) { >>>> + /* If this is a fatal error then we don't need to poll >>>> + the device again (and doing so might lead to some >>>> + quite bad behaviour if the ring structure has >>>> + become corrupted in some way). */ >>>> + xen_netbk_check_rx_xenvif(vif); >>>> + } >>>> xenvif_put(vif); >>>> + >>>> } >>>> >>>> static int netbk_count_requests(struct xenvif *vif, >>>> @@ -1232,7 +1240,9 @@ static unsigned xen_netbk_tx_build_gops(struct >>>> xen_netbk *netbk) >>>> struct gnttab_copy *gop = netbk->tx_copy_ops, *request_gop; >>>> struct sk_buff *skb; >>>> int ret; >>>> + int cntr; >>>> >>>> + cntr = 0; >>>> while (((nr_pending_reqs(netbk) + MAX_SKB_FRAGS) < MAX_PENDING_REQS) && >>>> !list_empty(&netbk->net_schedule_list)) { >>>> struct xenvif *vif; >>>> @@ -1245,15 +1255,43 @@ static unsigned xen_netbk_tx_build_gops(struct >>>> xen_netbk *netbk) >>>> int work_to_do; >>>> unsigned int data_len; >>>> pending_ring_idx_t index; >>>> + int loud; >>>> + >>>> + cntr++; >>>> + loud = 0; >>>> + if (cntr % 1000 == 0) { >>>> + printk("<0>In xen_netbk_tx_build_gops; done %d >>>> iterations so far\n", >>>> + cntr); >>>> + printk("<0>nr_pending_reqs %d\n", >>>> + nr_pending_reqs(netbk)); >>>> + loud = 1; >>>> + } >>>> >>>> /* Get a netif from the list with work to do. */ >>>> vif = poll_net_schedule_list(netbk); >>>> - if (!vif) >>>> + if (!vif) { >>>> + /* This can sometimes happen because the test >>>> + of list_empty(net_schedule_list) at the top >>>> + of the loop is unlocked. Just go back and >>>> + have another look. */ >>>> + printk("<0>No vif, keep going\n"); >>>> continue; >>>> + } >>>> + >>>> + if (loud) { >>>> + printk("<0>Ring status: rsp_prod_pvt %x, req_cons >>>> %x\n", >>>> + vif->tx.rsp_prod_pvt, vif->tx.req_cons); >>>> + printk("<0>Shared: req_prod %x, req_event %x, rsp_prod >>>> %x, rsp_event %x\n", >>>> + vif->tx.sring->req_prod, >>>> + vif->tx.sring->req_event, >>>> + vif->tx.sring->rsp_prod, >>>> + vif->tx.sring->rsp_event); >>>> + } >>>> >>>> RING_FINAL_CHECK_FOR_REQUESTS(&vif->tx, work_to_do); >>>> if (!work_to_do) { >>>> xenvif_put(vif); >>>> + printk("<0>No work on %p\n", vif); >>>> continue; >>>> } >>>> >>>> @@ -1264,6 +1302,7 @@ static unsigned xen_netbk_tx_build_gops(struct >>>> xen_netbk *netbk) >>>> /* Credit-based scheduling. */ >>>> if (txreq.size > vif->remaining_credit && >>>> tx_credit_exceeded(vif, txreq.size)) { >>>> + printk("<0>Stopped by credit scheduler\n"); >>>> xenvif_put(vif); >>>> continue; >>>> } >>>> @@ -1279,14 +1318,14 @@ static unsigned xen_netbk_tx_build_gops(struct >>>> xen_netbk *netbk) >>>> work_to_do); >>>> idx = vif->tx.req_cons; >>>> if (unlikely(work_to_do < 0)) { >>>> - netbk_tx_err(vif, &txreq, idx); >>>> + netbk_tx_err(vif, &txreq, idx, 1); >>>> continue; >>>> } >>>> } >>>> >>>> ret = netbk_count_requests(vif, &txreq, txfrags, work_to_do); >>>> if (unlikely(ret < 0)) { >>>> - netbk_tx_err(vif, &txreq, idx - ret); >>>> + netbk_tx_err(vif, &txreq, idx - ret, 1); >>>> continue; >>>> } >>>> idx += ret; >>>> @@ -1294,7 +1333,7 @@ static unsigned xen_netbk_tx_build_gops(struct >>>> xen_netbk *netbk) >>>> if (unlikely(txreq.size < ETH_HLEN)) { >>>> netdev_dbg(vif->dev, >>>> "Bad packet size: %d\n", txreq.size); >>>> - netbk_tx_err(vif, &txreq, idx); >>>> + netbk_tx_err(vif, &txreq, idx, 1); >>>> continue; >>>> } >>>> >>>> @@ -1304,7 +1343,7 @@ static unsigned xen_netbk_tx_build_gops(struct >>>> xen_netbk *netbk) >>>> "txreq.offset: %x, size: %u, end: %lu\n", >>>> txreq.offset, txreq.size, >>>> (txreq.offset&~PAGE_MASK) + txreq.size); >>>> - netbk_tx_err(vif, &txreq, idx); >>>> + netbk_tx_err(vif, &txreq, idx, 1); >>>> continue; >>>> } >>>> >>>> @@ -1320,7 +1359,7 @@ static unsigned xen_netbk_tx_build_gops(struct >>>> xen_netbk *netbk) >>>> if (unlikely(skb == NULL)) { >>>> netdev_dbg(vif->dev, >>>> "Can't allocate a skb in start_xmit.\n"); >>>> - netbk_tx_err(vif, &txreq, idx); >>>> + netbk_tx_err(vif, &txreq, idx, 0); >>>> break; >>>> } >>>> >>>> @@ -1333,7 +1372,8 @@ static unsigned xen_netbk_tx_build_gops(struct >>>> xen_netbk *netbk) >>>> >>>> if (netbk_set_skb_gso(vif, skb, gso)) { >>>> kfree_skb(skb); >>>> - netbk_tx_err(vif, &txreq, idx); >>>> + netbk_tx_err(vif, &txreq, idx, 1); >>>> + printk("<0>Killed by netbk_set_skb_gso\n"); >>>> continue; >>>> } >>>> } >>>> @@ -1342,7 +1382,8 @@ static unsigned xen_netbk_tx_build_gops(struct >>>> xen_netbk *netbk) >>>> page = xen_netbk_alloc_page(netbk, skb, pending_idx); >>>> if (!page) { >>>> kfree_skb(skb); >>>> - netbk_tx_err(vif, &txreq, idx); >>>> + netbk_tx_err(vif, &txreq, idx, 0); >>>> + printk("<0>No pages for payload?\n"); >>>> continue; >>>> } >>>> >>>> @@ -1382,7 +1423,7 @@ static unsigned xen_netbk_tx_build_gops(struct >>>> xen_netbk *netbk) >>>> skb, txfrags, gop); >>>> if (request_gop == NULL) { >>>> kfree_skb(skb); >>>> - netbk_tx_err(vif, &txreq, idx); >>>> + netbk_tx_err(vif, &txreq, idx, 0); >>>> continue; >>>> } >>>> gop = request_gop; >>> >>> >>> -- >>> Anil Madhavapeddy http://anil.recoil.org >>> >> >> -- >> Anil Madhavapeddy http://anil.recoil.org >> >

©2013 Xen Project, A Linux Foundation Collaborative Project. All Rights Reserved.
Linux Foundation is a registered trademark of The Linux Foundation.
Xen Project is a trademark of The Linux Foundation.