
Re: [RFC PATCH 00/10] Preemption in hypervisor (ARM only)


  • To: Volodymyr Babchuk <volodymyr_babchuk@xxxxxxxx>
  • From: George Dunlap <George.Dunlap@xxxxxxxxxx>
  • Date: Mon, 1 Mar 2021 14:39:26 +0000
  • Accept-language: en-US
  • Cc: Andrew Cooper <Andrew.Cooper3@xxxxxxxxxx>, "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>, "Dario Faggioli" <dfaggioli@xxxxxxxx>, Meng Xu <mengxu@xxxxxxxxxxxxx>, Ian Jackson <iwj@xxxxxxxxxxxxxx>, Jan Beulich <jbeulich@xxxxxxxx>, Julien Grall <julien@xxxxxxx>, Stefano Stabellini <sstabellini@xxxxxxxxxx>, Wei Liu <wl@xxxxxxx>
  • Delivery-date: Mon, 01 Mar 2021 14:39:33 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>
  • Thread-topic: [RFC PATCH 00/10] Preemption in hypervisor (ARM only)


> On Feb 24, 2021, at 11:37 PM, Volodymyr Babchuk <volodymyr_babchuk@xxxxxxxx> 
> wrote:
> 
> 
>> Hypervisor/virt properties are different to both a kernel-only RTOS and
>> regular userspace.  This was why I gave you some specific extra scenarios
>> to do latency testing with, so you could make a fair comparison of
>> "extra overhead caused by Xen" separate from "overhead due to
>> fundamental design constraints of using virt".
> 
> I can't see any fundamental constraints there. I see how the
> virtualization architecture can influence context switch time: how many
> actions are needed to switch from one vCPU to another. I have low-level
> things in mind here: reprogramming the MMU to use another set of tables,
> reprogramming the interrupt controller, the timer, etc. Of course, you
> can't get latency lower than the context switch time. This is the only
> fundamental constraint I can see.

Well, suppose you have two domains, A and B, both of which control hardware 
with hard real-time requirements.

And suppose that A has just started handling a latency-sensitive interrupt 
when a latency-sensitive interrupt comes in for B.  You might well preempt A 
and let B run for a full timeslice, causing A’s interrupt handler to be 
delayed by a significant amount.
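
(Purely illustrative numbers: with a 30 ms timeslice and a handler deadline 
for A measured in tens of microseconds, A could overshoot its deadline by 
roughly three orders of magnitude while B runs out its slice.)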

Preventing that sort of thing would be a much more tricky issue to get right.

>> If you want timely interrupt handling, you either need to partition your
>> workloads by the long-running-ness of their hypercalls, or not have
>> long-running hypercalls.
> 
> ... or do long-running tasks asynchronously. I believe that for most
> domctls and sysctls there is no need to hold the calling vCPU in
> hypervisor mode at all.
> 
>> I remain unconvinced that preemption is a sensible fix to the problem
>> you're trying to solve.
> 
> Well, this is the purpose of this little experiment. I want to discuss
> different approaches and to estimate the amount of effort required. By the
> way, from the x86 point of view, how hard is it to switch vCPU context
> while it is running in hypervisor mode?

I’m not necessarily opposed to introducing preemption, but the more we ask 
about things, the more complex things begin to look.  The idea of introducing 
an async framework to deal with long-running hypercalls is a huge engineering 
and design effort, not just for Xen, but for all future callers of the 
interface.

The claim in the cover letter was that “[s]ome hypercalls can not be preempted 
at all”.  I looked at the reference, and it looks like you’re referring to this:

"I brooded over ways to make [alloc_domheap_pages()] preemptible. But it is a) 
located deep in call chain and b) used not only by hypercalls. So I can't see 
an easy way to make it preemptible."

Let’s assume for the sake of argument that preventing delays due to 
alloc_domheap_pages() would require significant rearchitecting of the code.  
And let’s even assume that there are 2-3 other such knotty issues making for 
unacceptably long hypercalls.  Will identifying and tracking down those issues 
really be more effort than introducing preemption, introducing async 
operations, and all the other things we’ve been talking about?

One thing that might be interesting is to add some sort of metrics (disabled in 
Kconfig by default); e.g.:

1. On entry to a hypercall, take a timestamp

2. On every hypercall_preempt() call, take another timestamp, see how much 
time has passed without a preempt, and reset the timestamp; also do the same 
check on exit from the hypercall (a rough sketch of this bookkeeping follows)
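
A rough sketch of what that bookkeeping might look like (purely 
illustrative: struct hypercall_metrics, hypercall_metrics_enter() and 
hypercall_metrics_checkpoint() are made-up names, not existing Xen code; the 
real hook points would be the hypercall entry/exit paths and the 
hypercall_preempt() sites):

#include <stdint.h>

/* Illustrative sketch only; none of these names exist in Xen today. */
struct hypercall_metrics {
    uint64_t last_stamp;     /* time of entry or of the last preempt check */
    uint64_t max_nopreempt;  /* longest span seen without a preempt */
};

/* (1) On hypercall entry: start the clock. */
static inline void hypercall_metrics_enter(struct hypercall_metrics *m,
                                           uint64_t now)
{
    m->last_stamp = now;
}

/* (2) On every preempt check, and again on exit: record how long we ran
 * without offering a preemption point, then reset the stamp. */
static inline void hypercall_metrics_checkpoint(struct hypercall_metrics *m,
                                                uint64_t now)
{
    uint64_t span = now - m->last_stamp;

    if ( span > m->max_nopreempt )
        m->max_nopreempt = span;

    m->last_stamp = now;
}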

We could start by trying to do stats and figuring out which hypercalls go the 
longest without preemption, as a way to guide the optimization efforts.  Then 
as we get that number down, we could add ASSERT()s that the time is never 
longer than a certain amount, and add runs like that to osstest to make sure 
there are no regressions introduced.

I agree that hypercall continuations are complex; and you’re right that the 
fact that the hypercall continuation may never be called limits where 
preemption can happen.  But making the entire hypervisor preemption-friendly is 
also quite complex in its own way; it’s not immediately obvious to me from this 
thread that hypervisor preemption is less complex.
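
For readers who haven't seen it, the continuation pattern under discussion 
looks roughly like this (a simplified sketch: hypercall_preempt_check() and 
hypercall_create_continuation() are the real Xen hooks; do_example_op, 
__HYPERVISOR_example_op, nr_items and process_item() are invented for 
illustration):

static long do_example_op(unsigned int cmd, unsigned long start)
{
    unsigned long i;

    for ( i = start; i < nr_items; i++ )
    {
        process_item(i);

        /* Periodically check whether something more urgent needs the CPU. */
        if ( hypercall_preempt_check() )
            /*
             * Arrange for the guest to re-issue the hypercall from where
             * we stopped.  This is the "may never be called" part: the
             * continuation only runs if and when the vCPU re-enters Xen.
             */
            return hypercall_create_continuation(__HYPERVISOR_example_op,
                                                 "il", cmd, i);
    }

    return 0;
}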

 -George

 

