Xen project Mailing List

Re: [MirageOS-devel] Systematic crash on create_bounce_frame when hitting specific data allocation threshold

To: Vittorio Cozzolino <vittorio.cozzolino@xxxxxxxxx>

From: Thomas Leonard <talex5@xxxxxxxxx>

Date: Wed, 14 Dec 2016 15:36:18 +0000

Cc: "mirageos-devel@xxxxxxxxxxxxxxxxxxxx" <mirageos-devel@xxxxxxxxxxxxxxxxxxxx>

Delivery-date: Wed, 14 Dec 2016 15:36:25 +0000

List-id: Developer list for MirageOS <mirageos-devel.lists.xenproject.org>

On 14 December 2016 at 15:12, Vittorio Cozzolino
<vittorio.cozzolino@xxxxxxxxx> wrote:
> Hi Thomas,
>
> I've tried a few things:
>
> - `Gc.full_major()` unfortunately doesn't help.
> - Looking at the address pointed to by the RIP at the moment of the
> exception, I can see this instruction:
>
> 25605f: e8 7c ad ff ff callq 250de0 <memcpy>
>
> I don't know how useful it can be, considering that I can trigger the same
> crash by actually changing the code and, in this case, the referenced
> instruction would be something totally different (like a movel, push). Maybe
> the instruction type is not much related to the crash itself? I feel like it
> doesn't make much sense..

It would be more interesting to know the caller of this function, etc.
It's possible that it branched to an invalid address and started
executing random code at some point, so the actual location of the
crash might not help but things further up the stack might be useful.

> - Regarding in-lining the raw data in the code, I'm still working on it.
> Actually I don't fully understand what you mean, are you suggesting
> de-structuring the JSON format and inserting a list/array of values
> directly into my code? Or copying the JSON output directly inside my code
> as a static variable? I've tried the latter and the error persists. I will
> build the list of static values and see what happens.

Yes, I mean putting the json in your code, as

  let raw_json = "..."

If it still crashes with this, you can remove the database call. If it
still crashes, you can remove networking completely from your
unikernel. You can eliminate a lot of code quickly this way.

If you can get a unikernel that just parses a JSON string and crashes,
other people can try it too and it should be easy to find the cause.

> Anyway, whatever I do with the retrieved JSON (even List.iter with an empty
> function body), the unikernel crashes. I have the impression that as soon as
> I try to access the variable containing the JSON value the system crash is
> triggered.
>
> Best regards,
> Vittorio
>
>
> On 14/12/2016 13:45, Thomas Leonard wrote:
>>
>> On 14 December 2016 at 11:35, Vittorio Cozzolino
>> <vittorio.cozzolino@xxxxxxxxx> wrote:
>>>
>>> Hi,
>>> I'm running a unikernel on XEN that basically accesses a remote DB,
>>> fetches and computes some data, and sends out the result. Apparently,
>>> if I try to fetch and parse a JSON response greater than an empirically
>>> found threshold (details at the bottom of the email), the PVM XEN
>>> unikernel just crashes and this is what I see when running
>>> sudo xl dmesg:
>>>
>>> (XEN) Pagetable walk from 00000000002c9ff8:
>>> (XEN) L4[0x000] = 00000010b5f67067 0000000000000567
>>> (XEN) L3[0x000] = 00000010b5f68067 0000000000000568
>>> (XEN) L2[0x001] = 00000010b5f6a067 000000000000056a
>>> (XEN) L1[0x0c9] = 00100010b1ac9025 00000000000002c9
>>> (XEN) domain_crash_sync called from entry.S: fault at ffff82d0802261be create_bounce_frame+0x66/0x13a
>>> (XEN) Domain 23 (vcpu#0) crashed on cpu#17:
>>> (XEN) ----[ Xen-4.6.0 x86_64 debug=n Not tainted ]----
>>> (XEN) CPU: 17
>>> (XEN) RIP: e033:[<0000000000258cf4>]
>>> (XEN) RFLAGS: 0000000000010206 EM: 1 CONTEXT: pv guest (d23v0)
>>> (XEN) rax: 0000000000258cf0 rbx: 0000000000000000 rcx: 0000000000000073
>>> (XEN) rdx: 0000000000442528 rsi: 0000000000000000 rdi: 00000000002ca018
>>> (XEN) rbp: 00000000002ca1e8 rsp: 00000000002ca000 r8: 0000000000000002
>>> (XEN) r9: 0000000000000007 r10: 0000000000000007 r11: 0000000000000000
>>> (XEN) r12: 00000000002ca118 r13: 0000000000000000 r14: 00000011238fa000
>>> (XEN) r15: 0000000000000074 cr0: 0000000080050033 cr4: 00000000001526e0
>>> (XEN) cr3: 00000010b5f66000 cr2: 00000000002c9ff8
>>> (XEN) ds: 0000 es: 0000 fs: 0000 gs: 0000 ss: e02b cs: e033
>>> (XEN) Guest stack trace from rsp=00000000002ca000:
>>> (XEN) 00000000002ca118 0000000000000000 000000000025933f 0000000000000074
>>> (XEN) 00000011238fa000 0000000000000000 00000000002ca118 00000000002ca1e8
>>> (XEN) 0000000000000000 0000000000000000 0000000000000007 0000000000000007
>>> (XEN) 0000000000000002 ffff800000000000 0000000000000073 0000000000442528
>>> (XEN) 00000000002ca118 0000000000000000 ffffffffffffffff 0000000000256708
>>> (XEN) 000000010000e030 0000000000010006 00000000002ca0c8 000000000000e02b
>>> (XEN) 0000000000000ffc 3736353433323130 4645444342413938 4e4d4c4b4a494847
>>> (XEN) 00000000002ca18b 00000000002ca1e8 00000000002ca18a 0000000000000074
>>> (XEN) 00000000002566a0 00000000002ca118 00000000002561bc 7561662065676150
>>> (XEN) 696c20746120746c 646461207261656e 3062642073736572 706972202c306433
>>> (XEN) 2c38303736353220 3030207367657220 3030303030303030 202c383333616332
>>> (XEN) 6533616332207073 735f72756f202c38 3030303030302070 3261633230303030
>>> (XEN) 65646f63202c3866 ffffffff0a0d3020 0000000000000bfc 61665f686374614d
>>> (XEN) 0200006572756c69 0000000000000073 0000000000000000 ffffffffffffffef
>>> (XEN) 0000000000000000 00000000002ca2e8 0000000000000000 00000011238fa000
>>> (XEN) 0000000000000074 00000000002ca338 000000000025630a 636f6c625f737953
>>> (XEN) 0000003000000030 00000000002ca2e0 00000000002ca218 ffffffffffffffeb
>>> (XEN) 0000000000db03d0 0000000000256708 00000000002ca338 00000000002ca3e8
>>> (XEN) 00000000002ca2f8 ffffffffffffffe9 00000000000013fc 656e696665646e55
>>> (XEN) 7372756365725f64 75646f6d5f657669 050000000000656c 00000000003df368
>>>
>>> I've tried to destroy and re-create the same unikernel multiple times
>>> and I always receive the same error. When running on Unix I don't bump
>>> into this issue, even when fetching and parsing multiple MB of data.
>>>
>>> By filling my code with logs, I figured out where exactly the unikernel
>>> stops: during the JSON response parsing (I'm using the Yojson library):
>>>
>>> let directExtractionn rawJson =
>>>   Log.info (fun f -> f "Initializing direct extraction");
>>>   let json = Yojson.Basic.from_string rawJson in
>>>   let result = [json] |> filter_member "results" |> flatten |>
>>>     filter_member "series"
>>>     |> flatten |> filter_member "values" |> flatten in
>>>   List.map (
>>>     fun item ->
>>>       let datapoint = match item |> index 1 with
>>>         | `String a -> a
>>>         | `Float f -> string_of_float f
>>>         | `Int i -> string_of_float (float_of_int i)
>>>         | `Bool b -> string_of_bool b
>>>       in
>>>       datapoint
>>>   ) result |> computeAverage >>= fun aver ->
>>>   log_lwt ~inject:(fun f -> f "Result %f" aver)
>>>
>>> I know that my code is probably not really optimized and clean, but I'm
>>> quite shocked to see that my unikernel crashes when it has to extract
>>> roughly 3500 datapoints (it's more or less the threshold at which it
>>> crashes). The function computeAverage is not even called. If I run the
>>> same code on Unix I can parse and process up to 1M datapoints in less
>>> than a second. I've also tried to increase the number of vcpus and
>>> memory, but nothing changed (16 vcpus and 4GB of memory).
>>>
>>> I would like to add that this threshold changes depending on the host
>>> machine:
>>>
>>> - Machine A (Ubuntu 14.04, Xen 4.6.0, 32 Cores, 128 GB RAM, 10 GB Network
>>> Interface) -> Threshold is around 107Kb
>>> - Machine B (Debian 8.5, Xen 4.4.1, 4 cores, 8 GB RAM, 1GB Network
>>> Interface) -> Threshold is around 33Kb
>>
>> Can you simplify the case? For example, instead of fetching the JSON,
>> what if you in-line the raw data in your code and parse that?
>>
>> Does adding a `Gc.full_major ()` just before the crash help? That
>> might indicate we're running out of memory and failing to run the GC
>> for some reason.
>>
>> You could also use `objdump -d` or similar on the unikernel image and
>> see what the addresses in the stack trace correspond to.
>>
>>
>
> --
> Vittorio Cozzolino, M.Eng.
> Technische Universität München - Institut für Informatik
> Office 01.05.041
> Boltzmannstr. 3, 85748 Garching, Germany
> Tel: +49 89 289-17356
> http://www.cm.in.tum.de/en/research-group/vittorio-cozzolino
>
>

--
talex5 (GitHub/Twitter) http://roscidus.com/blog/
GPG: 5DD5 8D70 899C 454A 966D 6A51 7513 3C8F 94F6 E0CC
GPG: DA98 25AE CAD0 8975 7CDA BD8E 0713 3F96 CA74 D8BA

_______________________________________________
MirageOS-devel mailing list
MirageOS-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/cgi-bin/mailman/listinfo/mirageos-devel
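
The reduction suggested in the reply (put the JSON in the code as `let raw_json = "..."`, then drop the database call and the networking) might look roughly like the sketch below. It is only an illustration under stated assumptions: the payload is a made-up placeholder whose shape is guessed from the `filter_member` chain in the quoted code, and `extract` and `datapoint_to_string` are hypothetical names, not taken from the original unikernel. It uses only `Yojson.Basic.from_string` and the `Yojson.Basic.Util` helpers already used in the thread.

```ocaml
(* Minimal reproduction sketch: no database, no networking, no Lwt.
   Requires the yojson library. *)
open Yojson.Basic.Util

(* Placeholder payload (assumption). Replace it with the real response,
   grown past the size at which the unikernel crashes. *)
let raw_json =
  {|{"results":[{"series":[{"values":[["t0",1.5],["t1",2],["t2","3.5"]]}]}]}|}

(* Same conversion as in the quoted code, plus a catch-all case so that
   unexpected values raise a clear error instead of Match_failure. *)
let datapoint_to_string = function
  | `String s -> s
  | `Float f -> string_of_float f
  | `Int i -> string_of_float (float_of_int i)
  | `Bool b -> string_of_bool b
  | _ -> invalid_arg "datapoint_to_string: unexpected JSON value"

(* Parse the string and pull out the second element of every entry
   under results -> series -> values. *)
let extract raw =
  [ Yojson.Basic.from_string raw ]
  |> filter_member "results" |> flatten
  |> filter_member "series" |> flatten
  |> filter_member "values" |> flatten
  |> List.map (fun item -> datapoint_to_string (index 1 item))

let () = List.iter print_endline (extract raw_json)
```

If this runs cleanly as a plain Unix executable, the same `extract raw_json` call can be moved into the unikernel's start function to check whether the crash still occurs on Xen once everything else has been removed.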
