
Re: [PATCH v4 07/21] IOMMU/x86: support freeing of pagetables


  • To: Jan Beulich <jbeulich@xxxxxxxx>
  • From: Roger Pau Monné <roger.pau@xxxxxxxxxx>
  • Date: Tue, 3 May 2022 18:20:43 +0200
  • Cc: "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Paul Durrant <paul@xxxxxxx>, Wei Liu <wl@xxxxxxx>
  • Delivery-date: Tue, 03 May 2022 16:21:19 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On Mon, Apr 25, 2022 at 10:35:45AM +0200, Jan Beulich wrote:
> For vendor specific code to support superpages we need to be able to
> deal with a superpage mapping replacing an intermediate page table (or
> hierarchy thereof). Consequently an iommu_alloc_pgtable() counterpart is
> needed to free individual page tables while a domain is still alive.
> Since the freeing needs to be deferred until after a suitable IOTLB
> flush was performed, released page tables get queued for processing by a
> tasklet.
> 
> Signed-off-by: Jan Beulich <jbeulich@xxxxxxxx>
> ---
> I was considering whether to use a softirq-tasklet instead. This would
> have the benefit of avoiding extra scheduling operations, but come with
> the risk of the freeing happening prematurely because of a
> process_pending_softirqs() somewhere.

Apologies again if I have already raised this; I can't seem to find a
reference.

What about doing the freeing before resuming the guest execution in
guest vCPU context?

We already have a hook like this on HVM in hvm_do_resume() calling
vpci_process_pending().  I wonder whether we could have a similar hook
for PV and keep the pages to be freed in the vCPU instead of the pCPU.
This would have the benefit of allowing the vCPU to be context
switched out if the operation takes too long.

Not that the current approach is wrong, but by doing it in the guest
resume path we could likely prevent guests performing heavy p2m
modifications from hogging CPU time.
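A minimal standalone model of the suggestion above, for illustration only: the per-vCPU FIFO, `struct vcpu_model` and `vcpu_process_pending_frees()` are hypothetical names, not Xen APIs, and plain `malloc()`/`free()` stand in for the domheap allocator. The hook frees at most `budget` pages per call and reports whether work remains, so a caller on the resume path could yield and retry instead of draining the whole list in one go.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a page awaiting release. */
struct page {
    struct page *next;
};

/* Hypothetical per-vCPU state: only the owning vCPU touches its list. */
struct vcpu_model {
    struct page *free_head;   /* oldest queued page */
    struct page *free_tail;   /* newest queued page */
    unsigned long freed;      /* pages released so far */
};

/* Queue a page at the tail of the vCPU's deferred-free list. */
static void vcpu_queue_free(struct vcpu_model *v, struct page *pg)
{
    pg->next = NULL;
    if ( v->free_tail )
        v->free_tail->next = pg;
    else
        v->free_head = pg;
    v->free_tail = pg;
}

/*
 * Hypothetical resume-path hook, loosely modelled on the idea behind
 * vpci_process_pending(): free at most 'budget' pages and return true if
 * work remains, so the caller can let the scheduler run and retry later.
 */
static bool vcpu_process_pending_frees(struct vcpu_model *v,
                                       unsigned int budget)
{
    while ( v->free_head && budget-- )
    {
        struct page *pg = v->free_head;

        v->free_head = pg->next;
        if ( !v->free_head )
            v->free_tail = NULL;
        free(pg);
        v->freed++;
    }

    return v->free_head != NULL;
}
```

With five queued pages and a budget of 2, three calls are needed: the first two return true and the last returns false.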

> ---
> v4: Change type of iommu_queue_free_pgtable()'s 1st parameter. Re-base.
> v3: Call process_pending_softirqs() from free_queued_pgtables().
> 
> --- a/xen/arch/x86/include/asm/iommu.h
> +++ b/xen/arch/x86/include/asm/iommu.h
> @@ -147,6 +147,7 @@ void iommu_free_domid(domid_t domid, uns
>  int __must_check iommu_free_pgtables(struct domain *d);
>  struct domain_iommu;
>  struct page_info *__must_check iommu_alloc_pgtable(struct domain_iommu *hd);
> +void iommu_queue_free_pgtable(struct domain_iommu *hd, struct page_info *pg);
>  
>  #endif /* !__ARCH_X86_IOMMU_H__ */
>  /*
> --- a/xen/drivers/passthrough/x86/iommu.c
> +++ b/xen/drivers/passthrough/x86/iommu.c
> @@ -12,6 +12,7 @@
>   * this program; If not, see <http://www.gnu.org/licenses/>.
>   */
>  
> +#include <xen/cpu.h>
>  #include <xen/sched.h>
>  #include <xen/iommu.h>
>  #include <xen/paging.h>
> @@ -550,6 +551,91 @@ struct page_info *iommu_alloc_pgtable(st
>      return pg;
>  }
>  
> +/*
> + * Intermediate page tables which get replaced by large pages may only be
> + * freed after a suitable IOTLB flush. Hence such pages get queued on a
> + * per-CPU list, with a per-CPU tasklet processing the list on the assumption
> + * that the necessary IOTLB flush will have occurred by the time tasklets get
> + * to run. (List and tasklet being per-CPU has the benefit of accesses not
> + * requiring any locking.)
> + */
> +static DEFINE_PER_CPU(struct page_list_head, free_pgt_list);
> +static DEFINE_PER_CPU(struct tasklet, free_pgt_tasklet);
> +
> +static void free_queued_pgtables(void *arg)
> +{
> +    struct page_list_head *list = arg;
> +    struct page_info *pg;
> +    unsigned int done = 0;
> +

With the current logic, I think it might be helpful to assert that the
list is not empty when we get here.

Given that the operation requires a context switch, we would like to
avoid it unless there is indeed pending work to do.
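A sketch of the suggested sanity check, using a simplified singly linked list instead of Xen's `page_list_*` helpers; `free_queued_model()` and `fake_process_pending_softirqs()` are hypothetical stand-ins. It also shows what the `0x1ff` mask in the quoted loop implies: the softirq check runs once every 512 freed pages.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a queued page table page. */
struct page {
    struct page *next;
};

/* Hypothetical stand-in for process_pending_softirqs(); counts calls. */
static unsigned int softirq_checks;

static void fake_process_pending_softirqs(void)
{
    softirq_checks++;
}

/*
 * Model of the tasklet body with the suggested assertion: the tasklet is
 * expected to be scheduled only after a page was queued, so an empty list
 * on entry would mean a wasted context switch.
 */
static unsigned int free_queued_model(struct page **head)
{
    unsigned int done = 0;
    struct page *pg;

    assert(*head != NULL);

    while ( (pg = *head) != NULL )
    {
        *head = pg->next;
        free(pg);

        /* Low 9 bits all clear => once every 512 frees. */
        if ( !(++done & 0x1ff) )
            fake_process_pending_softirqs();
    }

    return done;
}
```

Whether the assertion holds in the real code depends on the tasklet never being scheduled without a page having been queued first (including on the CPU_DEAD path), so that is an assumption to verify rather than a given.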

> +    while ( (pg = page_list_remove_head(list)) )
> +    {
> +        free_domheap_page(pg);
> +
> +        /* Granularity of checking somewhat arbitrary. */
> +        if ( !(++done & 0x1ff) )
> +            process_pending_softirqs();
> +    }
> +}
> +
> +void iommu_queue_free_pgtable(struct domain_iommu *hd, struct page_info *pg)
> +{
> +    unsigned int cpu = smp_processor_id();
> +
> +    spin_lock(&hd->arch.pgtables.lock);
> +    page_list_del(pg, &hd->arch.pgtables.list);
> +    spin_unlock(&hd->arch.pgtables.lock);
> +
> +    page_list_add_tail(pg, &per_cpu(free_pgt_list, cpu));
> +
> +    tasklet_schedule(&per_cpu(free_pgt_tasklet, cpu));
> +}
> +
> +static int cf_check cpu_callback(
> +    struct notifier_block *nfb, unsigned long action, void *hcpu)
> +{
> +    unsigned int cpu = (unsigned long)hcpu;
> +    struct page_list_head *list = &per_cpu(free_pgt_list, cpu);
> +    struct tasklet *tasklet = &per_cpu(free_pgt_tasklet, cpu);
> +
> +    switch ( action )
> +    {
> +    case CPU_DOWN_PREPARE:
> +        tasklet_kill(tasklet);
> +        break;
> +
> +    case CPU_DEAD:
> +        page_list_splice(list, &this_cpu(free_pgt_list));

I think you could check whether the list is empty before queuing it.
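A rough model of the CPU_DEAD handling with the suggested emptiness check. The quoted hunk is cut off after the splice, so this assumes the path goes on to schedule the local tasklet; `on_cpu_dead()`, `struct page_list` and the counter standing in for `tasklet_schedule()` are hypothetical, and the append-at-tail splice is a simplification rather than a claim about `page_list_splice()`'s ordering.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a queued page table page. */
struct page {
    struct page *next;
};

/* Minimal singly linked list with O(1) append and splice. */
struct page_list {
    struct page *head, *tail;
};

/* Counts calls that stand in for tasklet_schedule(). */
static unsigned int tasklet_schedules;

static bool list_empty(const struct page_list *l)
{
    return l->head == NULL;
}

/* Move all entries of 'from' to the tail of 'to', leaving 'from' empty. */
static void list_splice_tail(struct page_list *from, struct page_list *to)
{
    if ( list_empty(from) )
        return;
    if ( to->tail )
        to->tail->next = from->head;
    else
        to->head = from->head;
    to->tail = from->tail;
    from->head = from->tail = NULL;
}

/*
 * CPU_DEAD handling with the suggested check: adopt the dead CPU's list,
 * and kick the local tasklet, only if there is anything to free.
 */
static void on_cpu_dead(struct page_list *dead, struct page_list *local)
{
    if ( list_empty(dead) )
        return;

    list_splice_tail(dead, local);
    tasklet_schedules++;
}
```

An offlined CPU with nothing pending then costs neither a splice nor an extra tasklet run on the CPU handling the notification.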

Thanks, Roger.



 

