[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH v11 9/9] tools: CMDs and APIs for Platform QoS Monitoring



> -----Original Message-----
> From: Ian Jackson [mailto:Ian.Jackson@xxxxxxxxxxxxx]
> Sent: Monday, June 23, 2014 11:23 PM
> To: Xu, Dongxiao
> Cc: xen-devel@xxxxxxxxxxxxx; keir@xxxxxxx; JBeulich@xxxxxxxx;
> Ian.Campbell@xxxxxxxxxx; stefano.stabellini@xxxxxxxxxxxxx;
> andrew.cooper3@xxxxxxxxxx; konrad.wilk@xxxxxxxxxx; dgdegra@xxxxxxxxxxxxx;
> George.Dunlap@xxxxxxxxxxxxx
> Subject: Re: [PATCH v11 9/9] tools: CMDs and APIs for Platform QoS Monitoring
> 
> Dongxiao Xu writes ("[PATCH v11 9/9] tools: CMDs and APIs for Platform QoS
> Monitoring"):
> > Introduced some new xl commands to handle the platform QoS monitoring
> > related services.
> >
> > The following two commands is to attach/detach the QoS monitoring
> > service to/from a certain domain.
> 
> Thanks for this.  I'm no expert on the underlying layers.  But:

Ian, thanks for your review, and following are some feedback.

> 
> > diff --git a/docs/man/xl.pod.1 b/docs/man/xl.pod.1
> > index 30bd4bf..de7f0a8 100644
> > --- a/docs/man/xl.pod.1
> > +++ b/docs/man/xl.pod.1
> > @@ -1366,6 +1366,27 @@ Load FLASK policy from the given policy file. The
> initial policy is provided to
> >  the hypervisor as a multiboot module; this command allows runtime updates
> to the
> >  policy. Loading new security policy will reset runtime changes to device 
> > labels.
> >
> > +=head1 PLATFORM QOS MONITORING
> > +
> > +New Intel processor may offer monitoring capability in each logical
> > +processor to measure specific quality-of-service metric.
> 
> This should not refer to a specific CPU vendor.

Okay, will remove it the specific CPU vendor.

> 
> > +=over 4
> > +
> > +=item B<pqos-monitor-attach> [I<domain-id>]
> > +
> > +attach: Attach the platform QoS monitoring service to a domain.
> 
> The document should explain what effect this has and why it has to be
> done separately.

Okay, will state the reason in the patch.

> 
> > +=item B<pqos-monitor-detach> [I<domain-id>]
> > +
> > +detach: Detach the platform QoS monitoring service from a domain.
> > +
> > +=item B<pqos-monitor-show> [I<qos-monitor-type>] [I<domain-id>]
> > +
> > +Show monitoring data for a certain domain or all domains. Current supported
> > +QoS monitor types are:
> > + - "cqm": cache QoS monitoring, showing the L3 cache occupancy.
> 
> The documentation here is a bit sparse.  What is the output format ?

I checked its output after installed the xl, and the format looks okay, see 
below:

PLATFORM QOS MONITORING
       New Intel processor may offer monitoring capability in each logical 
processor to measure specific quality-of-service metric.

       pqos-monitor-attach [domain-id]
           Attach the platform QoS monitoring service to a domain.

       pqos-monitor-detach [domain-id]
           Detach the platform QoS monitoring service from a domain.

       pqos-monitor-show [qos-monitor-type] [domain-id]
           Show monitoring data for a certain domain or all domains. Current 
supported QoS monitor types are:
            - "cqm": cache QoS monitoring, showing the L3 cache occupancy.

> Does this program run continuously or is it one shot ?

The command is one-shot.

> 
> I think "cqm" could reasonably be replaced by "cache", since the Q
> (from QoS) and M (from Monitoring) are already in the rest of the
> command.

Would "cache_occupancy" be better?

> 
> > +int xc_pqos_monitor_attach(xc_interface *xch, uint32_t domid);
> > +int xc_pqos_monitor_detach(xc_interface *xch, uint32_t domid);
> > +int xc_pqos_monitor_get_domain_rmid(xc_interface *xch, uint32_t domid,
> > +    uint32_t *rmid);
> > +int xc_pqos_monitor_get_total_rmid(xc_interface *xch, uint32_t
> *total_rmid);
> > +int xc_pqos_monitor_get_l3_upscaling_factor(xc_interface *xch,
> > +    uint32_t *upscaling_factor);
> > +int xc_pqos_monitor_get_l3_cache_size(xc_interface *xch,
> > +    uint32_t *l3_cache_size);
> > +int xc_pqos_monitor_select_socket_cpu(xc_interface *xch, uint32_t socket);
> > +int xc_pqos_monitor_get_data(xc_interface *xch, uint32_t rmid,
> > +    uint32_t cpu, uint32_t pqos_monitor_type, uint64_t *monitor_data);
> > +int xc_pqos_monitor_cqm_enabled(xc_interface *xch);
> 
> I'm not the expert on libxc, but there seems be a lot of very
> formulaic functions here which don't provide a great deal of (or, any)
> functionality.  Perhaps it would be better to have libxl make the
> sysctl directly ?

There is always a libxc function to wrapper the hypercalls in current code.
This piece of code follow the current way...

> 
> > +#if defined(__i386__) || defined(__x86_64__)
> > +int libxl_pqos_monitor_attach(libxl_ctx *ctx, uint32_t domid);
> > +int libxl_pqos_monitor_detach(libxl_ctx *ctx, uint32_t domid);
> > +int libxl_pqos_monitor_domain_attached(libxl_ctx *ctx, uint32_t domid);
> > +int libxl_pqos_monitor_cqm_enabled(libxl_ctx *ctx);
> 
> It's not usual in libxl to arch-#ifdef these kind of functions.

Will remove the ifdef.

> 
> > +int libxl_pqos_monitor_get_total_rmid(libxl_ctx *ctx, uint32_t 
> > *total_rmid);
> > +int libxl_pqos_monitor_get_l3_cache_size(libxl_ctx *ctx,
> > +    uint32_t *l3_cache_size);
> > +int libxl_pqos_monitor_get_domain_cqm(libxl_ctx *ctx, uint32_t domid,
> > +    uint32_t socketid, uint64_t *l3_cache_occupancy);
> 
> Why not one function to return the whole lot in a struct in the IDL ?

libxl_pqos_monitor_get_total_rmid() and libxl_pqos_monitor_get_l3_cache_size() 
may be called only once, while libxl_pqos_monitor_get_domain_cqm() will be 
called periodically.
This is why I split them...
> 
> > +static const char * const msg[] = {
> > +    [ENOSYS] = "unsupported operation",
> > +    [ENODEV] = "Platform QoS Monitoring is not supported in this system",
> > +    [EEXIST] = "Platform QoS Monitoring is already attached to this 
> > domain",
> > +    [ENOENT] = "Platform QoS Monitoring is not attached to this domain",
> > +    [EUSERS] = "there is no free RMID available",
> > +    [ESRCH]  = "is this Domain ID valid?",
> > +    [EFAULT] = "cannot find a valid CPU belonging to this socket",
> 
> I'm afraid this error handling approach is entirely wrong.
> 
> libxl functions should return libxl error codes.  You shouldn't be
> using errno.
> 
> You should invent new libxl error codes as you need them.

The errno is just to check the execution of inner libxc functions, and it will 
not act as the return value for libxl functions.
Currently the libxl_pqos_monitor_xxx functions return 0 on success, and " 
ERROR_FAIL " on failure.

> 
> I don't understand what ENOSYS might mean here.  If the operation is
> not supported then it is unsupported.  What is the distinction between
> that and ENODEV ?  One (or both) of them should probably be ERROR_NI.

Here the ENOSYS means platform QoS monitoring is enabled, however the tool 
stack is trying to get some value that system is not supported. You can refer 
to the 0006 patch.
Such errnos are just to enlight the error logging information in libxl 
functions, not the final return value. :)

Thanks,
Dongxiao

> 
> For the others I think you may need to add new error codes to libxl in
> libxl_types.idl.
> 
> Thanks,
> Ian.

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.