
Re: [MirageOS-devel] Systematic crash on create_bounce_frame when hitting specific data allocation threshold



On 14 December 2016 at 11:35, Vittorio Cozzolino
<vittorio.cozzolino@xxxxxxxxx> wrote:
> Hi,
> I'm running a unikernel on Xen that basically accesses a remote DB, fetches
> and computes some data, and sends out the result. If I try to fetch
> and parse a JSON response larger than an empirically found threshold
> (details at the bottom of the email), the PV Xen unikernel crashes, and
> this is what I see when running sudo xl dmesg:
>
> (XEN) Pagetable walk from 00000000002c9ff8:
> (XEN)  L4[0x000] = 00000010b5f67067 0000000000000567
> (XEN)  L3[0x000] = 00000010b5f68067 0000000000000568
> (XEN)  L2[0x001] = 00000010b5f6a067 000000000000056a
> (XEN)  L1[0x0c9] = 00100010b1ac9025 00000000000002c9
> (XEN) domain_crash_sync called from entry.S: fault at ffff82d0802261be
> create_bounce_frame+0x66/0x13a
> (XEN) Domain 23 (vcpu#0) crashed on cpu#17:
> (XEN) ----[ Xen-4.6.0  x86_64  debug=n  Not tainted ]----
> (XEN) CPU:    17
> (XEN) RIP:    e033:[<0000000000258cf4>]
> (XEN) RFLAGS: 0000000000010206   EM: 1   CONTEXT: pv guest (d23v0)
> (XEN) rax: 0000000000258cf0   rbx: 0000000000000000   rcx: 0000000000000073
> (XEN) rdx: 0000000000442528   rsi: 0000000000000000   rdi: 00000000002ca018
> (XEN) rbp: 00000000002ca1e8   rsp: 00000000002ca000   r8:  0000000000000002
> (XEN) r9:  0000000000000007   r10: 0000000000000007   r11: 0000000000000000
> (XEN) r12: 00000000002ca118   r13: 0000000000000000   r14: 00000011238fa000
> (XEN) r15: 0000000000000074   cr0: 0000000080050033   cr4: 00000000001526e0
> (XEN) cr3: 00000010b5f66000   cr2: 00000000002c9ff8
> (XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: e02b   cs: e033
> (XEN) Guest stack trace from rsp=00000000002ca000:
> (XEN)    00000000002ca118 0000000000000000 000000000025933f 0000000000000074
> (XEN)    00000011238fa000 0000000000000000 00000000002ca118 00000000002ca1e8
> (XEN)    0000000000000000 0000000000000000 0000000000000007 0000000000000007
> (XEN)    0000000000000002 ffff800000000000 0000000000000073 0000000000442528
> (XEN)    00000000002ca118 0000000000000000 ffffffffffffffff 0000000000256708
> (XEN)    000000010000e030 0000000000010006 00000000002ca0c8 000000000000e02b
> (XEN)    0000000000000ffc 3736353433323130 4645444342413938 4e4d4c4b4a494847
> (XEN)    00000000002ca18b 00000000002ca1e8 00000000002ca18a 0000000000000074
> (XEN)    00000000002566a0 00000000002ca118 00000000002561bc 7561662065676150
> (XEN)    696c20746120746c 646461207261656e 3062642073736572 706972202c306433
> (XEN)    2c38303736353220 3030207367657220 3030303030303030 202c383333616332
> (XEN)    6533616332207073 735f72756f202c38 3030303030302070 3261633230303030
> (XEN)    65646f63202c3866 ffffffff0a0d3020 0000000000000bfc 61665f686374614d
> (XEN)    0200006572756c69 0000000000000073 0000000000000000 ffffffffffffffef
> (XEN)    0000000000000000 00000000002ca2e8 0000000000000000 00000011238fa000
> (XEN)    0000000000000074 00000000002ca338 000000000025630a 636f6c625f737953
> (XEN)    0000003000000030 00000000002ca2e0 00000000002ca218 ffffffffffffffeb
> (XEN)    0000000000db03d0 0000000000256708 00000000002ca338 00000000002ca3e8
> (XEN)    00000000002ca2f8 ffffffffffffffe9 00000000000013fc 656e696665646e55
> (XEN)    7372756365725f64 75646f6d5f657669 050000000000656c 00000000003df368
>
> I've tried destroying and recreating the same unikernel multiple times, and
> I always get the same error. When running on Unix I don't hit this issue,
> even when fetching and parsing several MB of data.
>
> By filling my code with logs, I figured out exactly where the unikernel
> stops: during the parsing of the JSON response (I'm using the Yojson
> library):
>
> let directExtractionn rawJson =
>   Log.info (fun f -> f "Initializing direct extraction");
>   let json = Yojson.Basic.from_string rawJson in
>   let result =
>     [json] |> filter_member "results" |> flatten
>            |> filter_member "series" |> flatten
>            |> filter_member "values" |> flatten
>   in
>   List.map (fun item ->
>       match item |> index 1 with
>       | `String a -> a
>       | `Float f -> string_of_float f
>       | `Int i -> string_of_float (float_of_int i)
>       | `Bool b -> string_of_bool b)
>     result
>   |> computeAverage >>= fun aver ->
>   log_lwt ~inject:(fun f -> f "Result %f" aver)
>
> I know my code probably isn't well optimized or clean, but I'm quite
> shocked to see that my unikernel crashes when it has to extract roughly 3500
> datapoints (that's more or less the threshold at which it crashes). The
> function computeAverage is never even called. If I run the same code on Unix
> I can parse and process up to 1M datapoints in less than a second. I've also
> tried increasing the number of vcpus and the amount of memory (up to 16
> vcpus and 4 GB of memory), but nothing changed.
>
> I would like to add that this threshold changes depending on the host
> machine:
>
> - Machine A (Ubuntu 14.04, Xen 4.6.0, 32 cores, 128 GB RAM, 10 Gb network
> interface) -> threshold is around 107 KB
> - Machine B (Debian 8.5, Xen 4.4.1, 4 cores, 8 GB RAM, 1 Gb network
> interface) -> threshold is around 33 KB

Can you simplify the case? For example, instead of fetching the JSON,
what if you in-line the raw data in your code and parse that?
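
A minimal sketch of what that could look like (hypothetical: it assumes the
InfluxDB-style `"results"`/`"series"`/`"values"` shape your code filters on,
and builds a payload of roughly the size that crashes):

```ocaml
(* Build an in-memory JSON payload of ~3500 datapoints, so the parse can be
   tested without any network I/O. The shape mirrors the one the code above
   filters on; adjust it to match the real response. *)
let big_json =
  let buf = Buffer.create 200_000 in
  Buffer.add_string buf "{\"results\":[{\"series\":[{\"values\":[";
  for i = 0 to 3500 do
    if i > 0 then Buffer.add_char buf ',';
    Buffer.add_string buf (Printf.sprintf "[%d, %d.0]" i i)
  done;
  Buffer.add_string buf "]}]}]}";
  Buffer.contents buf
```

If parsing `big_json` still crashes the unikernel, that rules out the
network stack and points at parsing/allocation.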

Does adding a `Gc.full_major ()` just before the crash help? That
might indicate we're running out of memory and failing to run the GC
for some reason.
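
One way to test that (a sketch using only the stdlib `Gc` module;
`with_gc_check` is a made-up helper name):

```ocaml
(* Force a full major collection and print heap stats just before running
   the suspect function, to see whether the crash correlates with memory
   pressure. *)
let with_gc_check f x =
  Gc.full_major ();
  let st = Gc.stat () in
  Printf.printf "live words: %d, heap words: %d\n%!"
    st.Gc.live_words st.Gc.heap_words;
  f x
```

Wrapping the parse as `with_gc_check directExtractionn rawJson` would also
show how close the heap is to the domain's memory limit at that point.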

You could also use `objdump -d` or similar on the unikernel image and
see what the addresses in the stack trace correspond to.


-- 
talex5 (GitHub/Twitter)        http://roscidus.com/blog/
GPG: 5DD5 8D70 899C 454A 966D  6A51 7513 3C8F 94F6 E0CC
GPG: DA98 25AE CAD0 8975 7CDA  BD8E 0713 3F96 CA74 D8BA

_______________________________________________
MirageOS-devel mailing list
MirageOS-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/cgi-bin/mailman/listinfo/mirageos-devel

 

