
Re: [RFC PATCH 0/2] Add a new acquire resource to query vcpu statistics



Hello George, and thanks for the review! You will find my comments below.

On Fri, Jun 17, 2022 at 07:54:32PM +0000, George Dunlap wrote:
> 
> 
> > On 17 May 2022, at 15:33, Matias Ezequiel Vara Larsen 
> > <matiasevara@xxxxxxxxx> wrote:
> > 
> > Hello all,
> > 
> > The purpose of this RFC is to get feedback about a new acquire resource
> > that exposes vcpu statistics for a given domain. The current mechanism to
> > get those statistics is to query the hypervisor. That mechanism relies on
> > a hypercall and holds the domctl spinlock during its execution. When a pv
> > tool like xcp-rrdd periodically samples these counters, it ends up
> > affecting other paths that share that spinlock. With an acquire resource,
> > the pv tool only needs a few hypercalls to set up the shared memory
> > region, and samples are then taken without issuing any further hypercall.
> > The original idea was suggested by Andrew Cooper, with whom I have been
> > discussing how to implement the current PoC. You can find the RFC patch
> > series at [1]. The series is rebased on top of stable-4.15.
> > 
> > I am currently a bit blocked on 1) what to expose and 2) how to expose
> > it. For 1), I decided to expose what xcp-rrdd is querying, e.g.,
> > XEN_DOMCTL_getvcpuinfo. More precisely, xcp-rrdd gets
> > runstate.time[RUNSTATE_running]. This is a uint64_t counter. However, the
> > time spent in other states may be interesting too. Regarding 2), I am not
> > sure whether simply using an array of uint64_t is enough or whether a
> > different interface should be exposed. The remaining question is when to
> > update the values. For the moment, I am updating this counter during
> > vcpu_runstate_change().
> > 
> > The current series includes a simple pv tool that shows how this new
> > interface is used. This tool maps the counter and periodically samples
> > it.
> > 
> > Any feedback/help would be appreciated.
> 
> Hey Matias,
> 
> Sorry it’s taken so long to get back to you.  My day-to-day job has shifted 
> away from technical things to community management; this has been on my radar 
> but I never made time to dig into it.
> 
> There are some missing details I’ve had to try to piece together about the 
> situation, so let me sketch things out a bit further and see if I understand 
> the situation:
> 
> * xcp-rrd currently wants (at minimum) to record 
> runstate.time[RUNSTATE_running] for each vcpu.  Currently that means calling 
> XEN_DOMCTL_getvcpuinfo, which has to hold a single global domctl_lock (!) for 
> the entire hypercall; and of course must be iterated over every vcpu in the 
> system for every update.
> 

For example, in xcp-ng, xcp-rrdd samples all the vcpus in the system every 5
seconds. xcp-rrdd also issues other queries, e.g., XEN_DOMCTL_getdomaininfo.

Out of curiosity, do you know of any benchmark to measure the impact of this
querying? My guess is that the time of domctl-based operations increases with
the number of vcpus. In such a scenario, the time to query all vcpus grows
with the number of vcpus, so the domctl_lock is held for longer. However,
this would only be observable on a large enough system.
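In the absence of a dedicated benchmark, a rough measure could be a small
libxenctrl program that simply times a sweep of XEN_DOMCTL_getvcpuinfo over
all vcpus of a domain. A minimal sketch (not part of the series, error
handling omitted):

#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <xenctrl.h>

int main(int argc, char **argv)
{
    uint32_t domid;
    unsigned int nr_vcpus, v;
    xc_interface *xch;
    xc_vcpuinfo_t info;
    struct timespec t0, t1;

    if ( argc < 3 )
        return 1;

    domid = (uint32_t)atoi(argv[1]);
    nr_vcpus = (unsigned int)atoi(argv[2]);
    xch = xc_interface_open(NULL, NULL, 0);

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for ( v = 0; v < nr_vcpus; v++ )
        xc_vcpu_getinfo(xch, domid, v, &info); /* XEN_DOMCTL_getvcpuinfo */
    clock_gettime(CLOCK_MONOTONIC, &t1);

    printf("queried %u vcpus in %ld ns\n", nr_vcpus,
           (t1.tv_sec - t0.tv_sec) * 1000000000L +
           (t1.tv_nsec - t0.tv_nsec));

    xc_interface_close(xch);
    return 0;
}

Running it against a large domain, in parallel with other domctl users (e.g.,
xl list in a loop), should show whether the longer hold time of domctl_lock
is measurable.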

> * VCPUOP_get_runstate_info copies out a vcpu_runstate_info struct, which 
> contains information on the other runstates.  Additionally, 
> VCPUOP_register_runstate_memory_area already does something similar to what 
> you want: it passes a virtual address to Xen, which Xen maps, and copies 
> information about the various vcpus into (in update_runstate_area()).
> 
> * However, the above assumes a domain of “current->domain”: that is, a domain 
> can call VCPUOP_get_runstate_info on one of its own vcpus, but dom0 cannot 
> call it to get information about the vcpus of other domains.
> 
> * Additionally, VCPUOP_register_runstate_memory_area registers by *virtual 
> address*; this is actually problematic even for guest kernels looking at 
> their own vcpus; but would be completely inappropriate for a dom0 userspace 
> application, which is what you’re looking at.
> 

I only learned about VCPUOP_register_runstate_memory_area a few days ago, and
I did not know that it only works for current->domain. Thanks for pointing it
out.

> Your solution is to expose things via the xenforeignmemory interface instead, 
> modelled after the vmtrace_buf functionality.
> 
> Does that all sound right?
> 

That's correct. I used the vmtrace_buf functionality as inspiration.
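To make the comparison concrete, the tool in patch 2/2 maps the new resource
with the standard xenforeignmemory API, roughly as sketched below (error
handling omitted; the resource type name follows the stats_table naming from
patch 1/2):

#include <sys/mman.h>
#include <xenforeignmemory.h>
#include <xen/memory.h>

static void *map_stats_table(xenforeignmemory_handle *fmem, domid_t domid,
                             unsigned long nr_frames,
                             xenforeignmemory_resource_handle **fres)
{
    void *addr = NULL;

    /* id 0 selects the whole table; the consumer only reads, hence PROT_READ. */
    *fres = xenforeignmemory_map_resource(fmem, domid,
                                          XENMEM_resource_stats_table,
                                          0 /* id */, 0 /* frame */, nr_frames,
                                          &addr, PROT_READ, 0);

    return *fres ? addr : NULL;
}

The unmap path is then just xenforeignmemory_unmap_resource(fmem, *fres),
exactly as for the vmtrace buffer.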

> I think at a high level that’s probably the right way to go.
> 
> As you say, my default would be to expose similar information as 
> VCPUOP_get_runstate_info.  I’d even consider just using vcpu_runstate_info_t.
> 
> The other option would be to try to make the page a more general “foreign 
> vcpu info” page, which we could expand with more information as we find it 
> useful.
> 
> In this patch, you’re allocating 4k *per vcpu on the entire system* to hold a 
> single 64-bit value; even if you decide to use vcpu_runstate_info_t, there’s 
> still quite a large wastage.  Would it make sense rather to have this pass 
> back an array of MFNs designed to be mapped contiguously, with the vcpus 
> listed as an array? This seems to be what XENMEM_resource_ioreq_server does.
> 
> The advantage of making the structure extensible is that we wouldn’t need to 
> add another interface, and potentially another full page, if we wanted to add 
> more functionality that we wanted to export.  On the other hand, every new 
> functionality that we add may require adding code to copy things into it; 
> making it so that such code is added bit by bit as it’s requested might be 
> better.
> 

The current PoC indeed wastes memory in two ways:
1) the data structures are not well chosen, and
2) memory is allocated unconditionally.

For 1), you propose an extensible structure on top of an array of MFNs. I
checked xen.git/xen/include/public/hvm/ioreq.h, which defines the following
structure:

struct shared_iopage {
    struct ioreq vcpu_ioreq[1];
};

A vcpu's slot is then accessed as:

p->vcpu_ioreq[v->vcpu_id];

I could use similar structures; let me sketch them here and then write down a
design document. The extensible structures could look like this:

struct vcpu_stats {
    uint64_t runstate_running_time;
    /* potentially other runstate-time counters */
};

struct shared_statspages {
    /* potentially other counters, e.g., per-domain info */
    struct vcpu_stats vcpu_info[1];
};

The shared_statspages structure would be mapped on top of an array of
contiguous MFNs, with the vcpus listed as an array. I think this structure
could be extended to record per-domain counters by defining them just before
the vcpu_info[] array.
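On the hypervisor side, the update hooked into vcpu_runstate_change() would
then reduce to something like the following sketch (stats_table is a
hypothetical field of struct domain pointing at the shared_statspages area):

static void stats_account_running(struct vcpu *v, uint64_t delta)
{
    struct shared_statspages *s = v->domain->stats_table;

    /* Nothing to do if the buffers were not enabled for this domain. */
    if ( !s )
        return;

    s->vcpu_info[v->vcpu_id].runstate_running_time += delta;
}

The pv tool would read the very same slot from its mapping, i.e.
s->vcpu_info[vcpu_id].runstate_running_time.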

What do you think?

For 2), you propose a domctl flag on domain creation to enable/disable the
allocation and use of these buffers. I think that is the right way to go for the
moment.
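For illustration, the gating at domain creation could look roughly like this
(XEN_DOMCTL_CDF_stats and stats_alloc_table() are hypothetical names, just to
show where the check would live):

    /* In domain_create(): only allocate the stats frames when the
     * toolstack opted in. */
    if ( config->flags & XEN_DOMCTL_CDF_stats )
    {
        rc = stats_alloc_table(d);
        if ( rc )
            goto fail;
    }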

There is also a third point, 3), regarding what Jan suggested about how to
ensure that the consumed data is consistent. I do not have a complete answer
for that yet and will keep thinking about it.
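One direction I could explore is a seqlock-style counter per vcpu entry, so
the consumer can detect an in-flight update and retry. A minimal sketch,
extending the vcpu_stats sketch above with a sequence field (all names
hypothetical; smp_wmb() is the Xen-internal barrier, xen_rmb() the one
available to the tool via xenctrl.h):

struct vcpu_stats {
    uint32_t seq;        /* odd while an update is in progress */
    uint32_t _pad;
    uint64_t runstate_running_time;
};

/* Producer (hypervisor) side. */
static void stats_update(struct vcpu_stats *s, uint64_t delta)
{
    s->seq++;                            /* now odd: update in progress */
    smp_wmb();
    s->runstate_running_time += delta;
    smp_wmb();
    s->seq++;                            /* even again: update complete */
}

/* Consumer (pv tool) side: retry until a stable, even sequence is seen. */
static uint64_t stats_read(const volatile struct vcpu_stats *s)
{
    uint32_t seq;
    uint64_t val;

    do {
        seq = s->seq;
        xen_rmb();
        val = s->runstate_running_time;
        xen_rmb();
    } while ( (seq & 1) || seq != s->seq );

    return val;
}

Whether the counter should live per vcpu entry or per page is something to
settle in the design document.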

I will address these points and submit v1 in the next few weeks.   

Thanks, Matias.

> I have some more comments I’ll give on the 1/2 patch.
> 
>  -George
> 
> 
> 
> 
> 
> 
> > 
> > Thanks, Matias.
> > 
> > [1] https://github.com/MatiasVara/xen/tree/feature_stats
> > 
> > Matias Ezequiel Vara Larsen (2):
> >  xen/memory : Add stats_table resource type
> >  tools/misc: Add xen-stats tool
> > 
> > tools/misc/Makefile         |  5 +++
> > tools/misc/xen-stats.c      | 83 +++++++++++++++++++++++++++++++++++++
> > xen/common/domain.c         | 42 +++++++++++++++++++
> > xen/common/memory.c         | 29 +++++++++++++
> > xen/common/sched/core.c     |  5 +++
> > xen/include/public/memory.h |  1 +
> > xen/include/xen/sched.h     |  5 +++
> > 7 files changed, 170 insertions(+)
> > create mode 100644 tools/misc/xen-stats.c
> > 
> > --
> > 2.25.1
> > 
> 





 

