
Re: [RFC PATCH 0/2] Add a new acquire resource to query vcpu statistics



Hello George, and thanks for the review! You will find my comments below.

On Fri, Jun 17, 2022 at 07:54:32PM +0000, George Dunlap wrote:
> 
> 
> > On 17 May 2022, at 15:33, Matias Ezequiel Vara Larsen 
> > <matiasevara@xxxxxxxxx> wrote:
> > 
> > Hello all,
> > 
> > The purpose of this RFC is to get feedback about a new acquire resource
> > that exposes vcpu statistics for a given domain. The current mechanism to
> > get those statistics is to query the hypervisor. That mechanism relies on
> > a hypercall and holds the domctl spinlock during its execution. When a pv
> > tool like xcp-rrdd periodically samples these counters, it ends up
> > affecting other paths that share that spinlock. With an acquire resource,
> > the pv tool only needs a few hypercalls to set up the shared memory
> > region, and samples are then taken without issuing any further hypercall.
> > The original idea was suggested by Andrew Cooper, with whom I have been
> > discussing how to implement the current PoC. You can find the RFC patch
> > series at [1]. The series is rebased on top of stable-4.15.
> > 
> > I am currently a bit blocked on 1) what to expose and 2) how to expose
> > it. For 1), I decided to expose what xcp-rrdd is querying, e.g.,
> > XEN_DOMCTL_getvcpuinfo. More precisely, xcp-rrdd gets
> > runstate.time[RUNSTATE_running]. This is a uint64_t counter. However, the
> > time spent in other states may be interesting too. Regarding 2), I am not
> > sure whether simply using an array of uint64_t is enough or whether a
> > different interface should be exposed. The remaining question is when to
> > update the values. For the moment, I am updating this counter during
> > vcpu_runstate_change().
> > 
> > The current series includes a simple pv tool that shows how this new
> > interface is used. This tool maps the counter and periodically samples
> > it.
> > 
> > Any feedback/help would be appreciated.
> 
> Hey Matias,
> 
> Sorry it’s taken so long to get back to you.  My day-to-day job has shifted 
> away from technical things to community management; this has been on my radar 
> but I never made time to dig into it.
> 
> There are some missing details I’ve had to try to piece together about the 
> situation, so let me sketch things out a bit further and see if I understand 
> the situation:
> 
> * xcp-rrd currently wants (at minimum) to record 
> runstate.time[RUNSTATE_running] for each vcpu.  Currently that means calling 
> XEN_DOMCTL_getvcpuinfo, which has to hold a single global domctl_lock (!) for 
> the entire hypercall; and of course must be iterated over every vcpu in the 
> system for every update.
> 

For example, in xcp-ng, xcp-rrdd samples all the vcpus in the system every 5
seconds. xcp-rrdd also issues other queries, e.g., XEN_DOMCTL_getdomaininfo.

Out of curiosity, do you know of any benchmark to measure the impact of this
querying? My guess is that the time of domctl-based operations increases with
the number of vcpus. In such a scenario, the time to query all vcpus grows
with the number of vcpus, so the domctl_lock is held for longer. However,
this would only be observable on a large enough system.
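In the absence of a dedicated benchmark, a rough measure could be a small
libxenctrl program that simply times a sweep of XEN_DOMCTL_getvcpuinfo over
all vcpus of a domain. A minimal sketch (not part of the series, error
handling omitted):

#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <xenctrl.h>

int main(int argc, char **argv)
{
    uint32_t domid;
    unsigned int nr_vcpus, v;
    xc_interface *xch;
    xc_vcpuinfo_t info;
    struct timespec t0, t1;

    if ( argc < 3 )
        return 1;

    domid = (uint32_t)atoi(argv[1]);
    nr_vcpus = (unsigned int)atoi(argv[2]);
    xch = xc_interface_open(NULL, NULL, 0);

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for ( v = 0; v < nr_vcpus; v++ )
        xc_vcpu_getinfo(xch, domid, v, &info); /* XEN_DOMCTL_getvcpuinfo */
    clock_gettime(CLOCK_MONOTONIC, &t1);

    printf("queried %u vcpus in %ld ns\n", nr_vcpus,
           (t1.tv_sec - t0.tv_sec) * 1000000000L +
           (t1.tv_nsec - t0.tv_nsec));

    xc_interface_close(xch);
    return 0;
}

Running it against a large domain, in parallel with other domctl users (e.g.,
xl list in a loop), should show whether the longer hold time of domctl_lock
is measurable.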

> * VCPUOP_get_runstate_info copies out a vcpu_runstate_info struct, which 
> contains information on the other runstates.  Additionally, 
> VCPUOP_register_runstate_memory_area already does something similar to what 
> you want: it passes a virtual address to Xen, which Xen maps, and copies 
> information about the various vcpus into (in update_runstate_area()).
> 
> * However, the above assumes a domain of “current->domain”: that is, a domain 
> can call VCPUOP_get_runstate_info on one of its own vcpus, but dom0 cannot 
> call it to get information about the vcpus of other domains.
> 
> * Additionally, VCPUOP_register_runstate_memory_area registers by *virtual 
> address*; this is actually problematic even for guest kernels looking at 
> their own vcpus; but would be completely inappropriate for a dom0 userspace 
> application, which is what you’re looking at.
> 

I only learned about VCPUOP_register_runstate_memory_area a few days ago, and
I did not know that it only works for current->domain. Thanks for pointing it
out.

> Your solution is to expose things via the xenforeignmemory interface instead, 
> modelled after the vmtrace_buf functionality.
> 
> Does that all sound right?
> 

That's correct. I used the vmtrace_buf functionality as inspiration.
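To make the comparison concrete, the tool in patch 2/2 maps the new resource
with the standard xenforeignmemory API, roughly as sketched below (error
handling omitted; the resource type name follows the stats_table naming from
patch 1/2):

#include <sys/mman.h>
#include <xenforeignmemory.h>
#include <xen/memory.h>

static void *map_stats_table(xenforeignmemory_handle *fmem, domid_t domid,
                             unsigned long nr_frames,
                             xenforeignmemory_resource_handle **fres)
{
    void *addr = NULL;

    /* id 0 selects the whole table; the consumer only reads, hence PROT_READ. */
    *fres = xenforeignmemory_map_resource(fmem, domid,
                                          XENMEM_resource_stats_table,
                                          0 /* id */, 0 /* frame */, nr_frames,
                                          &addr, PROT_READ, 0);

    return *fres ? addr : NULL;
}

The unmap path is then just xenforeignmemory_unmap_resource(fmem, *fres),
exactly as for the vmtrace buffer.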

> I think at a high level that’s probably the right way to go.
> 
> As you say, my default would be to expose similar information as 
> VCPUOP_get_runstate_info.  I’d even consider just using vcpu_runstate_info_t.
> 
> The other option would be to try to make the page a more general “foreign 
> vcpu info” page, which we could expand with more information as we find it 
> useful.
> 
> In this patch, you’re allocating 4k *per vcpu on the entire system* to hold a 
> single 64-bit value; even if you decide to use vcpu_runstate_info_t, there’s 
> still quite a large wastage.  Would it make sense rather to have this pass 
> back an array of MFNs designed to be mapped contiguously, with the vcpus 
> listed as an array? This seems to be what XENMEM_resource_ioreq_server does.
> 
> The advantage of making the structure extensible is that we wouldn’t need to 
> add another interface, and potentially another full page, if we wanted to add 
> more functionality that we wanted to export.  On the other hand, every new 
> functionality that we add may require adding code to copy things into it; 
> making it so that such code is added bit by bit as it’s requested might be 
> better.
> 

The current PoC indeed wastes memory in two ways:
1) the data structures are not well chosen, and
2) memory is allocated unconditionally.

For 1), you propose an extensible structure on top of an array of MFNs. I
checked xen.git/xen/include/public/hvm/ioreq.h, which defines the following
structure:

struct shared_iopage {
    struct ioreq vcpu_ioreq[1];
};

A vcpu's slot is then accessed as:

p->vcpu_ioreq[v->vcpu_id];

I could use similar structures; let me sketch them here and then write down a
design document. The extensible structures could look like this:

struct vcpu_stats {
    uint64_t runstate_running_time;
    /* potentially other runstate-time counters */
};

struct shared_statspages {
    /* potentially other counters, e.g., per-domain info */
    struct vcpu_stats vcpu_info[1];
};

The shared_statspages structure would be mapped on top of an array of
contiguous MFNs, with the vcpus listed as an array. I think this structure
could be extended to record per-domain counters by defining them just before
the vcpu_info[] array.
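On the hypervisor side, the update hooked into vcpu_runstate_change() would
then reduce to something like the following sketch (stats_table is a
hypothetical field of struct domain pointing at the shared_statspages area):

static void stats_account_running(struct vcpu *v, uint64_t delta)
{
    struct shared_statspages *s = v->domain->stats_table;

    /* Nothing to do if the buffers were not enabled for this domain. */
    if ( !s )
        return;

    s->vcpu_info[v->vcpu_id].runstate_running_time += delta;
}

The pv tool would read the very same slot from its mapping, i.e.
s->vcpu_info[vcpu_id].runstate_running_time.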

What do you think?

For 2), you propose a domctl flag on domain creation to enable/disable the
allocation and use of these buffers. I think that is the right way to go for the
moment.
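For illustration, the gating at domain creation could look roughly like this
(XEN_DOMCTL_CDF_stats and stats_alloc_table() are hypothetical names, just to
show where the check would live):

    /* In domain_create(): only allocate the stats frames when the
     * toolstack opted in. */
    if ( config->flags & XEN_DOMCTL_CDF_stats )
    {
        rc = stats_alloc_table(d);
        if ( rc )
            goto fail;
    }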

There is also a third point, 3), regarding what Jan suggested about how to
ensure that the consumed data is consistent. I do not have a complete answer
for that yet and will keep thinking about it.
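One direction I could explore is a seqlock-style counter per vcpu entry, so
the consumer can detect an in-flight update and retry. A minimal sketch,
extending the vcpu_stats sketch above with a sequence field (all names
hypothetical; smp_wmb() is the Xen-internal barrier, xen_rmb() the one
available to the tool via xenctrl.h):

struct vcpu_stats {
    uint32_t seq;        /* odd while an update is in progress */
    uint32_t _pad;
    uint64_t runstate_running_time;
};

/* Producer (hypervisor) side. */
static void stats_update(struct vcpu_stats *s, uint64_t delta)
{
    s->seq++;                            /* now odd: update in progress */
    smp_wmb();
    s->runstate_running_time += delta;
    smp_wmb();
    s->seq++;                            /* even again: update complete */
}

/* Consumer (pv tool) side: retry until a stable, even sequence is seen. */
static uint64_t stats_read(const volatile struct vcpu_stats *s)
{
    uint32_t seq;
    uint64_t val;

    do {
        seq = s->seq;
        xen_rmb();
        val = s->runstate_running_time;
        xen_rmb();
    } while ( (seq & 1) || seq != s->seq );

    return val;
}

Whether the counter should live per vcpu entry or per page is something to
settle in the design document.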

I will address these points and submit v1 in the next few weeks.   

Thanks, Matias.

> I have some more comments I’ll give on the 1/2 patch.
> 
>  -George
> 
> 
> 
> 
> 
> 
> > 
> > Thanks, Matias.
> > 
> > [1] https://github.com/MatiasVara/xen/tree/feature_stats
> > 
> > Matias Ezequiel Vara Larsen (2):
> >  xen/memory : Add stats_table resource type
> >  tools/misc: Add xen-stats tool
> > 
> > tools/misc/Makefile         |  5 +++
> > tools/misc/xen-stats.c      | 83 +++++++++++++++++++++++++++++++++++++
> > xen/common/domain.c         | 42 +++++++++++++++++++
> > xen/common/memory.c         | 29 +++++++++++++
> > xen/common/sched/core.c     |  5 +++
> > xen/include/public/memory.h |  1 +
> > xen/include/xen/sched.h     |  5 +++
> > 7 files changed, 170 insertions(+)
> > create mode 100644 tools/misc/xen-stats.c
> > 
> > --
> > 2.25.1
> > 
> 





 

