Re: [Xen-devel] [RFC 1/9] schedule: Introduce per-pcpu time accounting
- To: Andrii Anisov <andrii.anisov@xxxxxxxxx>, xen-devel@xxxxxxxxxxxxxxxxxxxx
- From: Julien Grall <julien.grall@xxxxxxx>
- Date: Mon, 28 Oct 2019 14:28:42 +0000
- Cc: Tim Deegan <tim@xxxxxxx>, Stefano Stabellini <sstabellini@xxxxxxxxxx>, Andrii Anisov <andrii_anisov@xxxxxxxx>, Wei Liu <wl@xxxxxxx>, Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx>, George Dunlap <george.dunlap@xxxxxxxxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Ian Jackson <ian.jackson@xxxxxxxxxxxxx>, Dario Faggioli <dfaggioli@xxxxxxxx>, Jan Beulich <jbeulich@xxxxxxxx>
- Delivery-date: Mon, 28 Oct 2019 14:29:04 +0000
- List-id: Xen developer discussion <xen-devel.lists.xenproject.org>
Hi Andrii,
Sorry for the late answer. It would be good to get a review from the scheduler
maintainers (Dario, George) to make sure they are happy with the suggested
states here.
Please see my comments below.
On 11/09/2019 11:32, Andrii Anisov wrote:
From: Andrii Anisov <andrii_anisov@xxxxxxxx>
Introduce per-pcpu time accounting what includes the following states:
I think we need a very detailed description of each state. Otherwise it will be
hard to know how to categorize a given piece of code.
TACC_HYP - the pcpu executes hypervisor code like softirq processing
(including scheduling), tasklets and context switches
IMHO, "like" is too weak here. What exactly do you plan to include?
For instance, on Arm, you consider leave_hypervisor_tail() to be part of
TACC_HYP. This function includes some handling for synchronous traps.
TACC_GUEST - the pcpu executes guests code
Looking at the arm64 code, you are executing some hypervisor code here. I agree
it is impossible to avoid running any hypervisor code in TACC_GUEST, but I think
this should be clarified in the documentation.
TACC_IDLE - the low-power state of the pcpu
Did you mean that the "idle vCPU" is in use?
TACC_IRQ - the pcpu performs interrupts processing, without separation to
guest or hypervisor interrupts
TACC_GSYNC - the pcpu executes hypervisor code to process synchronous trap
from the guest. E.g. hypercall processing or io emulation.
Currently, the only reenterant state is TACC_IRQ. It is assumed, no changes
to state other than TACC_IRQ could happen until we return from nested
interrupts. IRQ time is accounted in a distinct way comparing to other states.
s/comparing/compared/
It is acumulated between other states transition moments, and is substracted
s/acumulated/accumulated/ s/substracted/subtracted/
from the old state on states transion calculation.
s/transion/transition/
Signed-off-by: Andrii Anisov <andrii_anisov@xxxxxxxx>
---
xen/common/schedule.c | 81 +++++++++++++++++++++++++++++++++++++++++++++++++
xen/include/xen/sched.h | 27 +++++++++++++++++
2 files changed, 108 insertions(+)
diff --git a/xen/common/schedule.c b/xen/common/schedule.c
index 7b71581..6dd6603 100644
--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
@@ -1539,6 +1539,87 @@ static void schedule(void)
context_switch(prev, next);
}
+DEFINE_PER_CPU(struct tacc, tacc);
+
+static void tacc_state_change(enum TACC_STATES new_state)
This should never be called with TACC_IRQ, right?
+{
+ s_time_t now, delta;
+ struct tacc* tacc = &this_cpu(tacc);
+ unsigned long flags;
+
+ local_irq_save(flags);
+
+ now = NOW();
+ delta = now - tacc->state_entry_time;
+
+ /* We do not expect states reenterability (at least through this function) */
+ ASSERT(new_state != tacc->state);
+
+ tacc->state_time[tacc->state] += delta - tacc->irq_time;
+ tacc->state_time[TACC_IRQ] += tacc->irq_time;
+ tacc->irq_time = 0;
+ tacc->state = new_state;
+ tacc->state_entry_time = now;
+
+ local_irq_restore(flags);
+}
+
+void tacc_hyp(int place)
'place' is never used except in your commented-out printk. So what is its purpose?
Also, is it really necessary to provide a helper for each state? Couldn't we just
introduce a single function that handles all the states?
+{
+// printk("\ttacc_hyp %u, place %d\n", smp_processor_id(), place);
+ tacc_state_change(TACC_HYP);
+}
+
+void tacc_guest(int place)
+{
+// printk("\ttacc_guest %u, place %d\n", smp_processor_id(), place);
+ tacc_state_change(TACC_GUEST);
+}
+
+void tacc_idle(int place)
+{
+// printk("\tidle cpu %u, place %d\n", smp_processor_id(), place);
+ tacc_state_change(TACC_IDLE);
+}
+
+void tacc_gsync(int place)
+{
+// printk("\ttacc_gsync %u, place %d\n", smp_processor_id(), place);
+ tacc_state_change(TACC_GSYNC);
+}
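
As a rough sketch of the single-function idea raised above, the per-state helpers could collapse into one entry point that takes the target state. This is only an illustration under stated assumptions: tacc_set_state(), demo_tacc and demo_now are hypothetical names, s_time_t is stubbed as int64_t, the standard assert() stands in for Xen's ASSERT(), and the local_irq_save()/local_irq_restore() pair from the patch is left out so the snippet builds outside Xen.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for Xen's s_time_t so the sketch compiles on its own. */
typedef int64_t s_time_t;

enum tacc_state {
    TACC_HYP,
    TACC_GUEST,
    TACC_IDLE,
    TACC_IRQ,
    TACC_GSYNC,
    TACC_STATES_MAX
};

struct tacc {
    s_time_t state_time[TACC_STATES_MAX];
    s_time_t state_entry_time;
    enum tacc_state state;
    s_time_t irq_time;
};

/* Hypothetical: one instance and a fake clock replace this_cpu(tacc) and NOW(). */
struct tacc demo_tacc;
s_time_t demo_now;

/*
 * Hypothetical single entry point replacing tacc_hyp(), tacc_guest(),
 * tacc_idle() and tacc_gsync(). IRQ masking is omitted in this sketch.
 */
void tacc_set_state(enum tacc_state new_state)
{
    struct tacc *tacc = &demo_tacc;
    s_time_t delta = demo_now - tacc->state_entry_time;

    /* TACC_IRQ is only reached through tacc_irq_enter()/tacc_irq_exit(). */
    assert(new_state != TACC_IRQ && new_state < TACC_STATES_MAX);
    /* Same check as in the patch: no re-entering the current state. */
    assert(new_state != tacc->state);

    /* Charge the old state, minus the IRQ time seen while in it. */
    tacc->state_time[tacc->state] += delta - tacc->irq_time;
    tacc->state_time[TACC_IRQ] += tacc->irq_time;
    tacc->irq_time = 0;
    tacc->state = new_state;
    tacc->state_entry_time = demo_now;
}
```

With this shape a call site would read tacc_set_state(TACC_GUEST) instead of tacc_guest(place), and the extra assertion makes the "never called with TACC_IRQ" expectation explicit.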
+
+void tacc_irq_enter(int place)
+{
+ struct tacc* tacc = &this_cpu(tacc);
+
+// printk("\ttacc_irq_enter %u, place %d, cnt %d\n", smp_processor_id(), place, this_cpu(tacc).irq_cnt);
+ ASSERT(!local_irq_is_enabled());
+ ASSERT(tacc->irq_cnt >= 0);
+
+ if ( tacc->irq_cnt == 0 )
+ {
+ tacc->irq_enter_time = NOW();
+ }
+
+ tacc->irq_cnt++;
+}
+
+void tacc_irq_exit(int place)
+{
+ struct tacc* tacc = &this_cpu(tacc);
+
+// printk("\ttacc_irq_exit %u, place %d, cnt %d\n", smp_processor_id(), place, tacc->irq_cnt);
+ ASSERT(!local_irq_is_enabled());
+ ASSERT(tacc->irq_cnt > 0);
+ if ( tacc->irq_cnt == 1 )
+ {
+ tacc->irq_time = NOW() - tacc->irq_enter_time;
If I understand correctly, you use irq_time to update TACC_IRQ in
tacc_state_change(). However, another interrupt may be received before the
state changes (e.g. HYP -> GUEST). Because irq_time is assigned here rather
than accumulated, only the time of the last IRQ received would be accounted.
+ tacc->irq_enter_time = 0;
+ }
+
+ tacc->irq_cnt--;
+}
+
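
To illustrate the concern about tacc_irq_exit() overwriting irq_time, here is a minimal sketch of one possible fix: accumulate the interrupt time until the next state change consumes it. This is not necessarily what the final patch should look like. The names demo_irq and demo_now are hypothetical stand-ins for this_cpu(tacc) and NOW(), s_time_t is stubbed as int64_t, and the unused 'place' argument and the local_irq_is_enabled() assertions are dropped so the snippet builds outside Xen.

```c
#include <assert.h>
#include <stdint.h>

typedef int64_t s_time_t;   /* stand-in for Xen's s_time_t */

struct tacc_irq {
    s_time_t irq_enter_time; /* entry time of the outermost interrupt */
    s_time_t irq_time;       /* IRQ time since the last state change */
    unsigned int irq_cnt;    /* interrupt nesting depth */
};

struct tacc_irq demo_irq;   /* hypothetical replacement for this_cpu(tacc) */
s_time_t demo_now;          /* hypothetical fake clock replacing NOW() */

void tacc_irq_enter(void)
{
    /* Only the outermost interrupt records the entry timestamp. */
    if ( demo_irq.irq_cnt == 0 )
        demo_irq.irq_enter_time = demo_now;

    demo_irq.irq_cnt++;
}

void tacc_irq_exit(void)
{
    assert(demo_irq.irq_cnt > 0);

    if ( demo_irq.irq_cnt == 1 )
    {
        /* '+=' instead of '=': keep the time of earlier interrupts too. */
        demo_irq.irq_time += demo_now - demo_irq.irq_enter_time;
        demo_irq.irq_enter_time = 0;
    }

    demo_irq.irq_cnt--;
}
```

Since tacc_state_change() already resets irq_time to 0 after moving it into state_time[TACC_IRQ], accumulating here would not double count across state transitions.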
void context_saved(struct vcpu *prev)
{
/* Clear running flag /after/ writing context to memory. */
diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
index e3601c1..04a8724 100644
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -1002,6 +1002,33 @@ extern void dump_runq(unsigned char key);
void arch_do_physinfo(struct xen_sysctl_physinfo *pi);
+enum TACC_STATES {
We don't tend to use all-uppercase names for enums.
+ TACC_HYP = 0,
Enumerators begin at 0 and increment by one each time, so there is no need to
hardcode the numbers.
Also, looking at the code, I think you rely on the first state being TACC_HYP.
Am I correct?
+ TACC_GUEST = 1,
+ TACC_IDLE = 2,
+ TACC_IRQ = 3,
+ TACC_GSYNC = 4,
+ TACC_STATES_MAX
+};
It would be good to document all the states in the header as well.
+
+struct tacc
Please document the structure.
+{
+ s_time_t state_time[TACC_STATES_MAX];
+ s_time_t state_entry_time;
+ int state;
This should use the enum you defined above.
+
+ s_time_t guest_time;
This is not used.
+
+ s_time_t irq_enter_time;
+ s_time_t irq_time;
+ int irq_cnt;
Why do you need this to be signed?
+};
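
Putting the header comments above together, the declarations could look roughly like the sketch below. The state descriptions are copied from the commit message; the per-field comments are only inferred from how the patch uses the fields and would need confirming. The name tacc_state is a hypothetical lowercase replacement for TACC_STATES, s_time_t is stubbed as int64_t so the snippet builds outside Xen, and the unused guest_time field is dropped.

```c
#include <stdint.h>

typedef int64_t s_time_t;   /* stand-in for Xen's s_time_t */

/*
 * Per-pCPU time accounting states. A zero-initialised struct tacc starts
 * in TACC_HYP, so that state must stay first.
 */
enum tacc_state {
    TACC_HYP,    /* hypervisor code: softirqs (incl. scheduling), tasklets, context switches */
    TACC_GUEST,  /* guest code */
    TACC_IDLE,   /* low-power state of the pCPU */
    TACC_IRQ,    /* interrupt processing, guest and hypervisor interrupts alike */
    TACC_GSYNC,  /* synchronous guest traps, e.g. hypercalls or I/O emulation */
    TACC_STATES_MAX
};

/* Per-pCPU time accounting data. */
struct tacc {
    s_time_t state_time[TACC_STATES_MAX]; /* time accumulated in each state */
    s_time_t state_entry_time;            /* when the current state was entered */
    enum tacc_state state;                /* current state */

    s_time_t irq_enter_time;              /* entry time of the outermost IRQ */
    s_time_t irq_time;                    /* IRQ time not yet moved to state_time[] */
    unsigned int irq_cnt;                 /* IRQ nesting depth */
};
```

An unsigned irq_cnt makes the ASSERT(tacc->irq_cnt >= 0) in tacc_irq_enter() redundant, while the ASSERT(tacc->irq_cnt > 0) in tacc_irq_exit() still guards against underflow.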
+
+DECLARE_PER_CPU(struct tacc, tacc);
+
+void tacc_hyp(int place);
+void tacc_idle(int place);
+
#endif /* __SCHED_H__ */
/*
Cheers,
--
Julien Grall