Re: [MirageOS-devel] Systematic crash on create_bounce_frame when hitting specific data allocation threshold
On 14 December 2016 at 11:35, Vittorio Cozzolino <vittorio.cozzolino@xxxxxxxxx> wrote:
> Hi,
> I'm running a unikernel on Xen that basically accesses a remote DB, fetches
> and computes some data, and sends out the result. Apparently, if I try to fetch
> and parse a JSON response greater than an empirically found threshold
> (details at the bottom of the email), the PV Xen unikernel just crashes, and
> this is what I see when running sudo xl dmesg:
>
> (XEN) Pagetable walk from 00000000002c9ff8:
> (XEN)  L4[0x000] = 00000010b5f67067 0000000000000567
> (XEN)  L3[0x000] = 00000010b5f68067 0000000000000568
> (XEN)  L2[0x001] = 00000010b5f6a067 000000000000056a
> (XEN)  L1[0x0c9] = 00100010b1ac9025 00000000000002c9
> (XEN) domain_crash_sync called from entry.S: fault at ffff82d0802261be create_bounce_frame+0x66/0x13a
> (XEN) Domain 23 (vcpu#0) crashed on cpu#17:
> (XEN) ----[ Xen-4.6.0  x86_64  debug=n  Not tainted ]----
> (XEN) CPU:    17
> (XEN) RIP:    e033:[<0000000000258cf4>]
> (XEN) RFLAGS: 0000000000010206   EM: 1   CONTEXT: pv guest (d23v0)
> (XEN) rax: 0000000000258cf0   rbx: 0000000000000000   rcx: 0000000000000073
> (XEN) rdx: 0000000000442528   rsi: 0000000000000000   rdi: 00000000002ca018
> (XEN) rbp: 00000000002ca1e8   rsp: 00000000002ca000   r8:  0000000000000002
> (XEN) r9:  0000000000000007   r10: 0000000000000007   r11: 0000000000000000
> (XEN) r12: 00000000002ca118   r13: 0000000000000000   r14: 00000011238fa000
> (XEN) r15: 0000000000000074   cr0: 0000000080050033   cr4: 00000000001526e0
> (XEN) cr3: 00000010b5f66000   cr2: 00000000002c9ff8
> (XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: e02b   cs: e033
> (XEN) Guest stack trace from rsp=00000000002ca000:
> (XEN)   00000000002ca118 0000000000000000 000000000025933f 0000000000000074
> (XEN)   00000011238fa000 0000000000000000 00000000002ca118 00000000002ca1e8
> (XEN)   0000000000000000 0000000000000000 0000000000000007 0000000000000007
> (XEN)   0000000000000002 ffff800000000000 0000000000000073 0000000000442528
> (XEN)   00000000002ca118 0000000000000000 ffffffffffffffff 0000000000256708
> (XEN)   000000010000e030 0000000000010006 00000000002ca0c8 000000000000e02b
> (XEN)   0000000000000ffc 3736353433323130 4645444342413938 4e4d4c4b4a494847
> (XEN)   00000000002ca18b 00000000002ca1e8 00000000002ca18a 0000000000000074
> (XEN)   00000000002566a0 00000000002ca118 00000000002561bc 7561662065676150
> (XEN)   696c20746120746c 646461207261656e 3062642073736572 706972202c306433
> (XEN)   2c38303736353220 3030207367657220 3030303030303030 202c383333616332
> (XEN)   6533616332207073 735f72756f202c38 3030303030302070 3261633230303030
> (XEN)   65646f63202c3866 ffffffff0a0d3020 0000000000000bfc 61665f686374614d
> (XEN)   0200006572756c69 0000000000000073 0000000000000000 ffffffffffffffef
> (XEN)   0000000000000000 00000000002ca2e8 0000000000000000 00000011238fa000
> (XEN)   0000000000000074 00000000002ca338 000000000025630a 636f6c625f737953
> (XEN)   0000003000000030 00000000002ca2e0 00000000002ca218 ffffffffffffffeb
> (XEN)   0000000000db03d0 0000000000256708 00000000002ca338 00000000002ca3e8
> (XEN)   00000000002ca2f8 ffffffffffffffe9 00000000000013fc 656e696665646e55
> (XEN)   7372756365725f64 75646f6d5f657669 050000000000656c 00000000003df368
>
> I've tried to destroy and re-create the same unikernel multiple times, and I always
> receive the same error. When running on Unix I don't bump into this issue,
> even when fetching and parsing multiple MB of data.
>
> By filling my code with logs, I figured out where exactly the unikernel
> stops.
> Specifically, it stops during the JSON response parsing (I'm using the Yojson
> library):
>
>   (* filter_member, flatten and index come from Yojson.Basic.Util *)
>   let directExtraction rawJson =
>     Log.info (fun f -> f "Initializing direct extraction");
>     let json = Yojson.Basic.from_string rawJson in
>     let result =
>       [json] |> filter_member "results" |> flatten
>              |> filter_member "series"  |> flatten
>              |> filter_member "values"  |> flatten
>     in
>     List.map (fun item ->
>         match item |> index 1 with
>         | `String a -> a
>         | `Float f  -> string_of_float f
>         | `Int i    -> string_of_float (float_of_int i)
>         | `Bool b   -> string_of_bool b
>       ) result
>     |> computeAverage >>= fun aver ->
>     log_lwt ~inject:(fun f -> f "Result %f" aver)
>
> I know that my code is probably not really optimized or clean, but I'm quite
> shocked to see that my unikernel crashes when it has to extract roughly 3500
> datapoints (that's more or less the threshold at which it crashes). The
> function computeAverage is not even called. If I run the same code on Unix I
> can parse and process up to 1M datapoints in less than a second. I've also
> tried to increase the number of vcpus and memory, but nothing changed (16
> vcpus and 4 GB of memory).
>
> I would like to add that this threshold changes depending on the host
> machine:
>
> - Machine A (Ubuntu 14.04, Xen 4.6.0, 32 cores, 128 GB RAM, 10 Gb network
>   interface) -> threshold is around 107 KB
> - Machine B (Debian 8.5, Xen 4.4.1, 4 cores, 8 GB RAM, 1 Gb network
>   interface) -> threshold is around 33 KB

Can you simplify the case? For example, instead of fetching the JSON, what if
you inline the raw data in your code and parse that?

Does adding a `Gc.full_major ()` just before the crash help? That might
indicate we're running out of memory and failing to run the GC for some
reason.

You could also use `objdump -d` or similar on the unikernel image and see
what the addresses in the stack trace correspond to.

--
talex5 (GitHub/Twitter)        http://roscidus.com/blog/
GPG: 5DD5 8D70 899C 454A 966D 6A51 7513 3C8F 94F6 E0CC
GPG: DA98 25AE CAD0 8975 7CDA BD8E 0713 3F96 CA74 D8BA

_______________________________________________
MirageOS-devel mailing list
MirageOS-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/cgi-bin/mailman/listinfo/mirageos-devel
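For reference, a minimal sketch of the reduced test case suggested above,
assuming the standard Yojson.Basic / Yojson.Basic.Util API and the stdlib Gc
module; raw_json here is only a tiny stand-in for the ~100 KB response that
actually triggers the crash:

  (* Sketch of the suggested reduction: inline the raw data instead of
     fetching it over the network, force a full major GC collection, then
     parse.  raw_json is a placeholder for the real (~100 KB) response. *)
  let raw_json =
    {|{"results":[{"series":[{"values":[["2016-12-14T11:35:00Z", 42.0]]}]}]}|}

  let () =
    Gc.full_major ();   (* run the GC just before the suspect allocation *)
    let json = Yojson.Basic.from_string raw_json in
    let open Yojson.Basic.Util in
    let values =
      [json] |> filter_member "results" |> flatten
             |> filter_member "series"  |> flatten
             |> filter_member "values"  |> flatten
    in
    Printf.printf "parsed %d datapoints\n" (List.length values)

In a MirageOS unikernel the equivalent would live in the unikernel's start
function rather than a standalone let (), but run as a plain OCaml program it
can at least confirm whether parsing alone reproduces the threshold.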