Re: [Xen-devel] [PATCH v3 2/2] x86/Intel: virtualize support for cpuid faulting
On Mon, Oct 24, 2016 at 8:05 AM, Boris Ostrovsky
<boris.ostrovsky@xxxxxxxxxx> wrote:
> On 10/24/2016 12:18 AM, Kyle Huey wrote:
>>
>> The anomalies we see appear to be related to, or at least triggerable
>> by, the performance monitoring interrupt. The following program runs
>> a loop of roughly 2^25 conditional branches. It takes one argument,
>> the number of conditional branches to program the PMI to trigger on.
>> The default is 50,000, and at that period the program produces the
>> same value every run. If you drop it to 5,000 or so you'll probably
>> see occasional off-by-one discrepancies. If you drop it to 500, the
>> performance counter values fluctuate wildly.
>
> Yes, it does change but I also see the difference on baremetal (although
> not as big as it is in an HVM guest):
> ostr@workbase> ./pmu 500
> Period is 500
> Counted 5950003 conditional branches
> ostr@workbase> ./pmu 500
> Period is 500
> Counted 5850003 conditional branches
> ostr@workbase> ./pmu 500
> Period is 500
> Counted 7530107 conditional branches
> ostr@workbase>
Yeah, you're right. I simplified the testcase too far. I've
included a better one below. This testcase is stable on bare metal
(down to an interrupt every 10 branches; I didn't try below that) and
more accurately represents what our software actually does: rr acts as
a ptrace supervisor of the process being recorded, and it seems that
context switching between the supervisor and tracee processes somehow
stabilizes the performance counter values.
>> I'm not yet sure if this is specifically related to the PMI, or if it
>> can be caused by any interrupt and it's only how frequently the
>> interrupts occur that matters.
>
> I have never used the file interface to performance counters, but what are
> we reporting here (in read_counter()) --- the total number of events or
> the number of events since the last sample? It is also curious to me that the
> counter is non-zero after PERF_EVENT_IOC_RESET (but again, I don't have
> any experience with these interfaces).
It should be number of events since the last time the counter was
reset (or overflowed, I guess). On my machine the counter value is
zero both before and after the PERF_EVENT_IOC_RESET ioctl.
> Also, exclude_guest doesn't appear to make any difference, I don't know
> if there are any bits in Intel counters that allow you to distinguish
> guest from host (unlike AMD, where there is a bit for that).
exclude_guest is a Linux-specific flag for excluding KVM guest
execution from the count. There is no hardware support involved; it's
handled entirely in the kernel's perf events infrastructure.
- Kyle
#define _GNU_SOURCE 1
#include <assert.h>
#include <fcntl.h>
#include <linux/perf_event.h>
#include <signal.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/ptrace.h>
#include <sys/syscall.h>
#include <sys/wait.h>
#include <unistd.h>
static struct perf_event_attr rcb_attr;
static uint64_t period;
static int fd;

void counter_on(uint64_t ticks)
{
    int ret = ioctl(fd, PERF_EVENT_IOC_RESET, 0);
    assert(!ret);
    ret = ioctl(fd, PERF_EVENT_IOC_PERIOD, &ticks);
    assert(!ret);
    ret = ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
    assert(!ret);
}

void counter_off(void)
{
    int ret = ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);
    assert(!ret);
}

int64_t read_counter(void)
{
    int64_t val;
    ssize_t nread = read(fd, &val, sizeof(val));
    assert(nread == sizeof(val));
    return val;
}

void do_test(void)
{
    /* volatile so the compiler can't optimize the branchy loop away */
    volatile int dummy = 0;
    int i;
    for (i = 0; i < (1 << 25); i++) {
        dummy += i % (1 << 10);
        dummy += i % (79 * (1 << 10));
    }
}
int main(int argc, const char* argv[])
{
    int pid;

    memset(&rcb_attr, 0, sizeof(rcb_attr));
    rcb_attr.size = sizeof(rcb_attr);
    rcb_attr.type = PERF_TYPE_RAW;
    /* Intel retired conditional branches counter, ring 3 only */
    rcb_attr.config = 0x5101c4;
    rcb_attr.exclude_kernel = 1;
    rcb_attr.exclude_guest = 1;
    /* We'll change this later */
    rcb_attr.sample_period = 0xffffffff;

    signal(SIGALRM, SIG_IGN);

    pid = fork();
    if (pid == 0) {
        /* Wait for the parent */
        kill(getpid(), SIGSTOP);
        do_test();
        return 0;
    }

    /* start the counter */
    fd = syscall(__NR_perf_event_open, &rcb_attr, pid, -1, -1, 0);
    if (fd < 0) {
        printf("Failed to initialize counter\n");
        return -1;
    }
    counter_off();

    struct f_owner_ex own;
    own.type = F_OWNER_PID;
    own.pid = pid;
    if (fcntl(fd, F_SETOWN_EX, &own) ||
        fcntl(fd, F_SETFL, O_ASYNC) ||
        fcntl(fd, F_SETSIG, SIGALRM)) {
        printf("Failed to make counter async\n");
        return -1;
    }

    period = 50000;
    if (argc > 1) {
        period = strtoull(argv[1], NULL, 0);
    }
    printf("Period is %llu\n", (unsigned long long)period);
    counter_on(period);

    ptrace(PTRACE_SEIZE, pid, NULL, 0);
    ptrace(PTRACE_CONT, pid, NULL, SIGCONT);

    int status = 0;
    while (1) {
        waitpid(pid, &status, 0);
        if (WIFEXITED(status)) {
            break;
        }
        if (WIFSIGNALED(status)) {
            assert(0);
            continue;
        }
        if (WIFSTOPPED(status)) {
            if (WSTOPSIG(status) == SIGALRM ||
                WSTOPSIG(status) == SIGSTOP) {
                ptrace(PTRACE_CONT, pid, NULL, WSTOPSIG(status));
                continue;
            }
        }
        assert(0 && "unhandled ptrace event!");
    }
    counter_off();

    int64_t counts = read_counter();
    printf("Counted %lld conditional branches\n", (long long)counts);
    return 0;
}
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel