[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [Xen-devel] [PATCH 1/1] xentrace: Add TRC_HW_VCHIP
On 03/28/14 07:45, Jan Beulich wrote: On 28.03.14 at 12:25, <dslutz@xxxxxxxxxxx> wrote:This add a set of trace events that track the setup of various virtual chips related to timers in domU. This set is hpet, pit (i8253, i8254), rtc (MC146818), apic (lapic), and pic (i8259). The pmtimer is not traced since it does not have a changeable rate.But you're not saying anything about why this would be useful (considering that it wasn't needed before), and hence don't provide a reason for taking this change. Thank you for asking. I am assuming from this that some patches (like this one) should have this. This is an area that I am very weak on. No simple statement comes to mind. So here is the story of how this patch came about. Months ago, 1 server for about 2 to 3 days would after 1st boot have the 1 domU hang (1 out of 10 times) with the 1st interesting message on the domU console of: ..MP-BIOS bug: 8254 timer not connected to IO-APIC Since I know that this message was added to deal with certain bad motherboards and that xen does not have this issue, I started looking into this. I considered add code like:HVM_DBG_LOG(DBG_LEVEL_VLAPIC_TIMER, "value[0x%016"PRIx64"]", value); but was not sure this would not change the timer enough to stop the bug from happening. So I added this patch. I then spent a lot of time trying to reproduce this issue. I was not able to, nor was the person and server that reported it was able to. This was under various configurations: 1) No change. 2) debug=y xen build 3) debug=n + patch 4) debug=y + patch Using a few of the trace files and the source code of the domU's kernel, I was able to determine that the hpet.c code was involved. Using this knowledge, I made a patch to xen to simulate various values of "diff" (tn_cmp - cur_tick). With this debug code I was able to generate the hang on demand. This work is what caused me to post the patch: hpet: Act more like real hardware http://lists.xen.org/archives/html/xen-devel/2014-02/msg02408.html Which I now know to not be complete. More testing after that time has shown that 'diff > 0' will also cause this report if diff is large enough. Armed with this I went back to a few saved traces that I had an was able to determine that the first interval in the calls to create_periodic_time() (i.e. diff) had a very high variance. I no longer have the actual data, but my memory was that the hpet_tick_to_ns(h, diff) values ranged from 23,696ns to 955,456ns. More looking into linux in this area and learning about hpet hardware and specification leads me to fact that this should not be happening. I am still working on the set of changes to the hpet.c code to fix the set of bugs that I think are there. So I only know that this patch did provide very useful data to me. I would think that it would be a help to developers in the future. I think would could write a complex analysis program of the current trace data and infer what some of the trace data was. This to me is a lot harder. Since this is a new independent selectable trace a developer can deside to include of exclude these. This is long (and not a good reason for taking this change) but I hope it helps. -Don Slutz Jan _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxx http://lists.xen.org/xen-devel
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |