[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH v11 9/9] tools: CMDs and APIs for Platform QoS Monitoring



Dongxiao Xu writes ("[PATCH v11 9/9] tools: CMDs and APIs for Platform QoS 
Monitoring"):
> Introduced some new xl commands to handle the platform QoS monitoring
> related services.
> 
> The following two commands is to attach/detach the QoS monitoring
> service to/from a certain domain.

Thanks for this.  I'm no expert on the underlying layers.  But:

> diff --git a/docs/man/xl.pod.1 b/docs/man/xl.pod.1
> index 30bd4bf..de7f0a8 100644
> --- a/docs/man/xl.pod.1
> +++ b/docs/man/xl.pod.1
> @@ -1366,6 +1366,27 @@ Load FLASK policy from the given policy file. The 
> initial policy is provided to
>  the hypervisor as a multiboot module; this command allows runtime updates to 
> the
>  policy. Loading new security policy will reset runtime changes to device 
> labels.
>  
> +=head1 PLATFORM QOS MONITORING
> +
> +New Intel processor may offer monitoring capability in each logical
> +processor to measure specific quality-of-service metric.

This should not refer to a specific CPU vendor.

> +=over 4
> +
> +=item B<pqos-monitor-attach> [I<domain-id>]
> +
> +attach: Attach the platform QoS monitoring service to a domain.

The document should explain what effect this has and why it has to be
done separately.

> +=item B<pqos-monitor-detach> [I<domain-id>]
> +
> +detach: Detach the platform QoS monitoring service from a domain.
> +
> +=item B<pqos-monitor-show> [I<qos-monitor-type>] [I<domain-id>]
> +
> +Show monitoring data for a certain domain or all domains. Current supported
> +QoS monitor types are:
> + - "cqm": cache QoS monitoring, showing the L3 cache occupancy.

The documentation here is a bit sparse.  What is the output format ?
Does this program run continuously or is it one shot ?

I think "cqm" could reasonably be replaced by "cache", since the Q
(from QoS) and M (from Monitoring) are already in the rest of the
command.

> +int xc_pqos_monitor_attach(xc_interface *xch, uint32_t domid);
> +int xc_pqos_monitor_detach(xc_interface *xch, uint32_t domid);
> +int xc_pqos_monitor_get_domain_rmid(xc_interface *xch, uint32_t domid,
> +    uint32_t *rmid);
> +int xc_pqos_monitor_get_total_rmid(xc_interface *xch, uint32_t *total_rmid);
> +int xc_pqos_monitor_get_l3_upscaling_factor(xc_interface *xch,
> +    uint32_t *upscaling_factor);
> +int xc_pqos_monitor_get_l3_cache_size(xc_interface *xch,
> +    uint32_t *l3_cache_size);
> +int xc_pqos_monitor_select_socket_cpu(xc_interface *xch, uint32_t socket);
> +int xc_pqos_monitor_get_data(xc_interface *xch, uint32_t rmid,
> +    uint32_t cpu, uint32_t pqos_monitor_type, uint64_t *monitor_data);
> +int xc_pqos_monitor_cqm_enabled(xc_interface *xch);

I'm not the expert on libxc, but there seems be a lot of very
formulaic functions here which don't provide a great deal of (or, any)
functionality.  Perhaps it would be better to have libxl make the
sysctl directly ?

> +#if defined(__i386__) || defined(__x86_64__)
> +int libxl_pqos_monitor_attach(libxl_ctx *ctx, uint32_t domid);
> +int libxl_pqos_monitor_detach(libxl_ctx *ctx, uint32_t domid);
> +int libxl_pqos_monitor_domain_attached(libxl_ctx *ctx, uint32_t domid);
> +int libxl_pqos_monitor_cqm_enabled(libxl_ctx *ctx);

It's not usual in libxl to arch-#ifdef these kind of functions.

> +int libxl_pqos_monitor_get_total_rmid(libxl_ctx *ctx, uint32_t *total_rmid);
> +int libxl_pqos_monitor_get_l3_cache_size(libxl_ctx *ctx,
> +    uint32_t *l3_cache_size);
> +int libxl_pqos_monitor_get_domain_cqm(libxl_ctx *ctx, uint32_t domid,
> +    uint32_t socketid, uint64_t *l3_cache_occupancy);

Why not one function to return the whole lot in a struct in the IDL ?

> +static const char * const msg[] = {
> +    [ENOSYS] = "unsupported operation",
> +    [ENODEV] = "Platform QoS Monitoring is not supported in this system",
> +    [EEXIST] = "Platform QoS Monitoring is already attached to this domain",
> +    [ENOENT] = "Platform QoS Monitoring is not attached to this domain",
> +    [EUSERS] = "there is no free RMID available",
> +    [ESRCH]  = "is this Domain ID valid?",
> +    [EFAULT] = "cannot find a valid CPU belonging to this socket",

I'm afraid this error handling approach is entirely wrong.

libxl functions should return libxl error codes.  You shouldn't be
using errno.

You should invent new libxl error codes as you need them.

I don't understand what ENOSYS might mean here.  If the operation is
not supported then it is unsupported.  What is the distinction between
that and ENODEV ?  One (or both) of them should probably be ERROR_NI.

For the others I think you may need to add new error codes to libxl in
libxl_types.idl.

Thanks,
Ian.

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.