Re: [MirageOS-devel] Systematic crash on create_bounce_frame when hitting specific data allocation threshold


  • To: Vittorio Cozzolino <vittorio.cozzolino@xxxxxxxxx>
  • From: Anil Madhavapeddy <anil@xxxxxxxxxx>
  • Date: Wed, 14 Dec 2016 11:46:08 +0000
  • Cc: "mirageos-devel@xxxxxxxxxxxxxxxxxxxx" <mirageos-devel@xxxxxxxxxxxxxxxxxxxx>
  • Delivery-date: Wed, 14 Dec 2016 11:46:17 +0000
  • List-id: Developer list for MirageOS <mirageos-devel.lists.xenproject.org>

> On 14 Dec 2016, at 11:35, Vittorio Cozzolino <vittorio.cozzolino@xxxxxxxxx> 
> wrote:
> 
> Hi,
> I'm running a unikernel on Xen that accesses a remote DB, fetches and 
> computes some data, and sends out the result. If I try to fetch and parse 
> a JSON response larger than an empirically found threshold (details at the 
> bottom of the email), the PVM Xen unikernel crashes, and this is what I 
> see when running sudo xl dmesg:
> 
> (XEN) Pagetable walk from 00000000002c9ff8:
> (XEN)  L4[0x000] = 00000010b5f67067 0000000000000567
> (XEN)  L3[0x000] = 00000010b5f68067 0000000000000568
> (XEN)  L2[0x001] = 00000010b5f6a067 000000000000056a
> (XEN)  L1[0x0c9] = 00100010b1ac9025 00000000000002c9
> (XEN) domain_crash_sync called from entry.S: fault at ffff82d0802261be 
> create_bounce_frame+0x66/0x13a
> (XEN) Domain 23 (vcpu#0) crashed on cpu#17:
> (XEN) ----[ Xen-4.6.0  x86_64  debug=n  Not tainted ]----
> (XEN) CPU:    17
> (XEN) RIP:    e033:[<0000000000258cf4>]
> (XEN) RFLAGS: 0000000000010206   EM: 1   CONTEXT: pv guest (d23v0)
> (XEN) rax: 0000000000258cf0   rbx: 0000000000000000   rcx: 0000000000000073
> (XEN) rdx: 0000000000442528   rsi: 0000000000000000   rdi: 00000000002ca018
> (XEN) rbp: 00000000002ca1e8   rsp: 00000000002ca000   r8:  0000000000000002
> (XEN) r9:  0000000000000007   r10: 0000000000000007   r11: 0000000000000000
> (XEN) r12: 00000000002ca118   r13: 0000000000000000   r14: 00000011238fa000
> (XEN) r15: 0000000000000074   cr0: 0000000080050033   cr4: 00000000001526e0
> (XEN) cr3: 00000010b5f66000   cr2: 00000000002c9ff8
> (XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: e02b   cs: e033
> (XEN) Guest stack trace from rsp=00000000002ca000:
> (XEN)    00000000002ca118 0000000000000000 000000000025933f 0000000000000074
> (XEN)    00000011238fa000 0000000000000000 00000000002ca118 00000000002ca1e8
> (XEN)    0000000000000000 0000000000000000 0000000000000007 0000000000000007
> (XEN)    0000000000000002 ffff800000000000 0000000000000073 0000000000442528
> (XEN)    00000000002ca118 0000000000000000 ffffffffffffffff 0000000000256708
> (XEN)    000000010000e030 0000000000010006 00000000002ca0c8 000000000000e02b
> (XEN)    0000000000000ffc 3736353433323130 4645444342413938 4e4d4c4b4a494847
> (XEN)    00000000002ca18b 00000000002ca1e8 00000000002ca18a 0000000000000074
> (XEN)    00000000002566a0 00000000002ca118 00000000002561bc 7561662065676150
> (XEN)    696c20746120746c 646461207261656e 3062642073736572 706972202c306433
> (XEN)    2c38303736353220 3030207367657220 3030303030303030 202c383333616332
> (XEN)    6533616332207073 735f72756f202c38 3030303030302070 3261633230303030
> (XEN)    65646f63202c3866 ffffffff0a0d3020 0000000000000bfc 61665f686374614d
> (XEN)    0200006572756c69 0000000000000073 0000000000000000 ffffffffffffffef
> (XEN)    0000000000000000 00000000002ca2e8 0000000000000000 00000011238fa000
> (XEN)    0000000000000074 00000000002ca338 000000000025630a 636f6c625f737953
> (XEN)    0000003000000030 00000000002ca2e0 00000000002ca218 ffffffffffffffeb
> (XEN)    0000000000db03d0 0000000000256708 00000000002ca338 00000000002ca3e8
> (XEN)    00000000002ca2f8 ffffffffffffffe9 00000000000013fc 656e696665646e55
> (XEN)    7372756365725f64 75646f6d5f657669 050000000000656c 00000000003df368
> 
> I've destroyed and recreated the same unikernel multiple times and I always 
> get the same error. When running on Unix I don't hit this issue, even when 
> fetching and parsing multiple MB of data.
> 
> By adding log statements throughout my code, I figured out exactly where 
> the unikernel stops: during the JSON response parsing (I'm using the Yojson 
> library):
> 
> let directExtractionn rawJson =
>   Log.info (fun f -> f "Initializing direct extraction");
>   let json = Yojson.Basic.from_string rawJson in
>   let result =
>     [json]
>     |> filter_member "results" |> flatten
>     |> filter_member "series" |> flatten
>     |> filter_member "values" |> flatten
>   in
>   List.map
>     (fun item ->
>       let datapoint =
>         match item |> index 1 with
>         | `String a -> a
>         | `Float f -> string_of_float f
>         | `Int i -> string_of_float (float_of_int i)
>         | `Bool b -> string_of_bool b
>       in
>       datapoint)
>     result
>   |> computeAverage
>   >>= fun aver -> log_lwt ~inject:(fun f -> f "Result %f" aver)
> 
> I know my code is probably not well optimized or clean, but I'm quite 
> shocked to see that my unikernel crashes when it has to extract roughly 3500 
> datapoints (that is more or less the threshold at which it crashes). The 
> function computeAverage is not even called. If I run the same code on Unix I 
> can parse and process up to 1M datapoints in less than a second. I've also 
> tried increasing the number of vcpus and the memory (16 vcpus and 4GB of 
> memory), but nothing changed.
> 
> I would like to add that this threshold changes depending on the host machine:
> 
> - Machine A (Ubuntu 14.04, Xen 4.6.0, 32 Cores, 128 GB RAM, 10 GB Network 
> Interface) -> Threshold is around 107Kb
> - Machine B (Debian 8.5, Xen 4.4.1, 4 cores, 8 GB RAM, 1GB Network Interface) 
> -> Threshold is around 33Kb

Hi Vittorio,

This is clearly a bug :-)  To help us diagnose it, which version of Mirage were 
you building with?  The released Mirage 2.9.1, or the development branch on 
https://github.com/mirage/mirage-dev?  Also which version of OCaml was in use?

regards,
Anil
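
A side note on the quoted extraction code: in the dump, cr2 (the faulting address, 0x2c9ff8) lies 8 bytes below rsp (0x2ca000). That would be consistent with the guest running out of stack, although nothing in this thread confirms it. Independently of the cause, the quoted match has no case for other JSON values, and it converts every datapoint to a string before averaging. The sketch below is only an illustration of an exhaustive conversion plus a single tail-recursive pass. It uses the standard library only; `float_of_datapoint` and `average` are hypothetical names, and `average` is not the poster's `computeAverage`, whose definition is not shown.

```ocaml
(* Hypothetical helper: turn one datapoint into a float.
   The polymorphic variant tags match those used in the quoted code,
   and the wildcard case makes the match exhaustive. *)
let float_of_datapoint = function
  | `Float f -> Some f
  | `Int i -> Some (float_of_int i)
  | `String s -> float_of_string_opt s
  | _ -> None

(* Hypothetical replacement for the averaging step: one pass with
   List.fold_left, which is tail-recursive, so stack use does not grow
   with the length of the list. Non-numeric values are skipped.
   Returns None when no numeric value was found. *)
let average points =
  let sum, count =
    List.fold_left
      (fun (sum, count) p ->
        match float_of_datapoint p with
        | Some x -> (sum +. x, count + 1)
        | None -> (sum, count))
      (0.0, 0) points
  in
  if count = 0 then None else Some (sum /. float_of_int count)
```

For example, `average [`Int 1; `Float 2.0; `String "3"; `Bool true]` evaluates to `Some 2.0`, since the boolean is skipped. This does not address the parsing step itself (`Yojson.Basic.from_string`), which is where the poster reports the unikernel stopping.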