Re: [Xen-devel] [PATCH v3 2/2] x86/Intel: virtualize support for cpuid faulting
On Mon, Oct 24, 2016 at 8:05 AM, Boris Ostrovsky <boris.ostrovsky@xxxxxxxxxx> wrote:
> On 10/24/2016 12:18 AM, Kyle Huey wrote:
>>
>> The anomalies we see appear to be related to, or at least triggerable
>> by, the performance monitoring interrupt. The following program runs
>> a loop of roughly 2^25 conditional branches. It takes one argument,
>> the number of conditional branches to program the PMI to trigger on.
>> The default is 50,000, and if you run the program with that it'll
>> produce the same value every time. If you drop it to 5000 or so
>> you'll probably see occasional off-by-one discrepancies. If you drop
>> it to 500 the performance counter values fluctuate wildly.
>
> Yes, it does change, but I also see the difference on bare metal (although
> not as big as it is in an HVM guest):
>
> ostr@workbase> ./pmu 500
> Period is 500
> Counted 5950003 conditional branches
> ostr@workbase> ./pmu 500
> Period is 500
> Counted 5850003 conditional branches
> ostr@workbase> ./pmu 500
> Period is 500
> Counted 7530107 conditional branches
> ostr@workbase>

Yeah, you're right. I simplified the testcase too far; I have included a better one below. This testcase is stable on bare metal (down to an interrupt every 10 branches; I didn't try below that) and more accurately represents what our software actually does. rr acts as a ptrace supervisor to the process being recorded, and it seems that context switching between the supervisor and tracee processes somehow stabilizes the performance counter values.

>> I'm not yet sure if this is specifically related to the PMI, or if it
>> can be caused by any interrupt and it's only how frequently the
>> interrupts occur that matters.
>
> I have never used the file interface to performance counters, but what are
> we reporting here (in read_counter()) --- the total number of events or
> the number of events since the last sample?
> It is also curious to me that the
> counter is non-zero after PERF_EVENT_IOC_RESET (but again, I don't have
> any experience with these interfaces).

It should be the number of events since the last time the counter was reset (or overflowed, I guess). On my machine the counter value is zero both before and after the PERF_EVENT_IOC_RESET ioctl.

> Also, exclude_guest doesn't appear to make any difference. I don't know
> if there are any bits in Intel counters that allow you to distinguish
> guest from host (unlike AMD, where there is a bit for that).

exclude_guest is a Linux-specific thing for excluding KVM guests. There is no hardware support involved; it's handled entirely in the perf events infrastructure in the kernel.

- Kyle

#define _GNU_SOURCE 1

#include <assert.h>
#include <fcntl.h>
#include <inttypes.h>
#include <linux/perf_event.h>
#include <signal.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/ptrace.h>
#include <sys/syscall.h>
#include <sys/wait.h>
#include <unistd.h>

static struct perf_event_attr rcb_attr;
static uint64_t period;
static int fd;

void counter_on(uint64_t ticks)
{
    int ret = ioctl(fd, PERF_EVENT_IOC_RESET, 0);
    assert(!ret);
    ret = ioctl(fd, PERF_EVENT_IOC_PERIOD, &ticks);
    assert(!ret);
    ret = ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
    assert(!ret);
}

void counter_off(void)
{
    int ret = ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);
    assert(!ret);
}

int64_t read_counter(void)
{
    int64_t val;
    ssize_t nread = read(fd, &val, sizeof(val));
    assert(nread == sizeof(val));
    return val;
}

void do_test(void)
{
    int i;
    /* volatile so the compiler can't optimize the loop (and its
       conditional branches) away */
    volatile int dummy = 0;
    for (i = 0; i < (1 << 25); i++) {
        dummy += i % (1 << 10);
        dummy += i % (79 * (1 << 10));
    }
}

int main(int argc, const char* argv[])
{
    int pid;

    memset(&rcb_attr, 0, sizeof(rcb_attr));
    rcb_attr.size = sizeof(rcb_attr);
    rcb_attr.type = PERF_TYPE_RAW;
    /* Intel retired conditional branches counter, ring 3 only */
    rcb_attr.config = 0x5101c4;
    rcb_attr.exclude_kernel = 1;
    rcb_attr.exclude_guest = 1;
    /* We'll change this later */
    rcb_attr.sample_period = 0xffffffff;

    signal(SIGALRM, SIG_IGN);

    pid = fork();
    if (pid == 0) {
        /* Wait for the parent */
        kill(getpid(), SIGSTOP);
        do_test();
        return 0;
    }

    /* start the counter */
    fd = syscall(__NR_perf_event_open, &rcb_attr, pid, -1, -1, 0);
    if (fd < 0) {
        printf("Failed to initialize counter\n");
        return -1;
    }
    counter_off();

    struct f_owner_ex own;
    own.type = F_OWNER_PID;
    own.pid = pid;
    if (fcntl(fd, F_SETOWN_EX, &own) ||
        fcntl(fd, F_SETFL, O_ASYNC) ||
        fcntl(fd, F_SETSIG, SIGALRM)) {
        printf("Failed to make counter async\n");
        return -1;
    }

    period = 50000;
    if (argc > 1) {
        sscanf(argv[1], "%" SCNu64, &period);
    }
    printf("Period is %" PRIu64 "\n", period);
    counter_on(period);

    ptrace(PTRACE_SEIZE, pid, NULL, 0);
    ptrace(PTRACE_CONT, pid, NULL, SIGCONT);

    int status = 0;
    while (1) {
        waitpid(pid, &status, 0);
        if (WIFEXITED(status)) {
            break;
        }
        if (WIFSIGNALED(status)) {
            assert(0);
        }
        if (WIFSTOPPED(status)) {
            if (WSTOPSIG(status) == SIGALRM || WSTOPSIG(status) == SIGSTOP) {
                ptrace(PTRACE_CONT, pid, NULL, WSTOPSIG(status));
                continue;
            }
        }
        assert(0 && "unhandled ptrace event!");
    }

    counter_off();
    int64_t counts = read_counter();
    printf("Counted %" PRId64 " conditional branches\n", counts);
    return 0;
}

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel