[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v3 1/2] IOMMU/x86: disallow device assignment to PoD guests


  • To: Roger Pau Monné <roger.pau@xxxxxxxxxx>
  • From: Jan Beulich <jbeulich@xxxxxxxx>
  • Date: Thu, 3 Feb 2022 10:21:54 +0100
  • Arc-authentication-results: i=1; mx.microsoft.com 1; spf=none; dmarc=none; dkim=none; arc=none
  • Arc-message-signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=ybZ7fVeG4BCzRYN4EowpOl+XVaC7V3n0qFG0YeMTT00=; b=HrZW8/u5TgSgqMQjOBf9ElGQlSWrX/v1b7b+U1kirOToMM2R6foOkkYg8lv4LZqBrLgtxl4U8K6MyXUVYzeor7vGt4ASKdFWDnlwpFHUZMJZuDUK19UO5SsQRfZvXteA/DkS/HivjqsBWCar9cVyQdUreNVq973CPIlpU3Ke1qbxbyVea6CEvKBy6+31l/gkR81UF5PSvWqmUsuuxQGPSVb8sitdo9mX7gFzSnpqQlwBiOD1SCgiaeCfwcw1VdbRsh3Va7OdnNmuqNbq+ZL4JQxkE9ioaYh7dXACZ4eIm8KZAlPcmk71ycG6E1dp47m2gpfwjpa6rBKotJ+FG2BlSg==
  • Arc-seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=YNdlCed5w5Vt6cvtP/3lWewuD6pwZCuKISJ2CFfq6+6KEUnqjS2M+vTo6JKvhjwA63OIGO22bpr0q45g0mJrmMGYWC1wF4rIBJ9fgJL9gcjnUl9blIaPTPzDXfivzeXcMzefj3nXICNWB1zGizk4SRhd01+7LlpuIHfmamnemrKAoM2w2uft+43SUx+DZeMmHS5DbjXCe8uLF0hx5E0IjMDhkqB4zOGSBr8Qh3MUEEcEx3k9IWpyqZuMvIN6G2LiSVcvVaIBCpf8HoSf/ny8QeYmb/iUpmXO2bjrIEyTRcG5sZZEKbP9xX1Uc694DWhF6WSx4+fc+ebl6ZmuDd1bew==
  • Authentication-results: dkim=none (message not signed) header.d=none;dmarc=none action=none header.from=suse.com;
  • Cc: "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Wei Liu <wl@xxxxxxx>, George Dunlap <george.dunlap@xxxxxxxxxx>, Paul Durrant <paul@xxxxxxx>, Tamas K Lengyel <tamas@xxxxxxxxxxxxx>, Petre Pircalabu <ppircalabu@xxxxxxxxxxxxxxx>, Alexandru Isaila <aisaila@xxxxxxxxxxxxxxx>
  • Delivery-date: Thu, 03 Feb 2022 09:22:10 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 03.02.2022 10:04, Roger Pau Monné wrote:
> On Thu, Feb 03, 2022 at 09:31:03AM +0100, Jan Beulich wrote:
>> On 02.02.2022 17:13, Roger Pau Monné wrote:
>>> On Tue, Jan 04, 2022 at 10:41:32AM +0100, Jan Beulich wrote:
>>>> @@ -359,7 +360,10 @@ p2m_pod_set_mem_target(struct domain *d,
>>>>  
>>>>      ASSERT( pod_target >= p2m->pod.count );
>>>>  
>>>> -    ret = p2m_pod_set_cache_target(p2m, pod_target, 1/*preemptible*/);
>>>> +    if ( has_arch_pdevs(d) || cache_flush_permitted(d) )
>>>
>>> Is it possible to have cache flush allowed without any PCI device
>>> assigned? AFAICT the iomem/ioport_caps would only get setup when there
>>> are device passed through?
>>
>> One can assign MMIO or ports to a guest the raw way. That's not
>> secure, but functionally explicitly permitted.
>>
>>> TBH I would be fine if we just say that PoD cannot be used in
>>> conjunction with an IOMMU, and just check for is_iommu_enable(d) here.
>>>
>>> I understand it's technically possible for PoD to be used together
>>> with a domain that will later get a device passed through once PoD is
>>> no longer in use, but I doubt there's much value in supporting that
>>> use case, and I fear we might be introducing corner cases that could
>>> create issues in the future. Overall I think it would be safer to just
>>> disable PoD in conjunction with an IOMMU.
>>
>> I consider it wrong to put in place such a restriction, but I could
>> perhaps accept you and Andrew thinking this way if this was the only
>> aspect playing into here. However, this would then want an equivalent
>> tools side check, and while hunting down where to make the change as
>> done here, I wasn't able to figure out where that alternative
>> adjustment would need doing. Hence I would possibly(!) buy into this
>> only if someone else took care of doing so properly in the tool stack
>> (including the emission of a sensible error message).
> 
> What about the (completely untested) chunk below:
> 
> diff --git a/tools/libs/light/libxl_create.c b/tools/libs/light/libxl_create.c
> index d7a40d7550..e585ef4c5c 100644
> --- a/tools/libs/light/libxl_create.c
> +++ b/tools/libs/light/libxl_create.c
> @@ -1160,17 +1160,16 @@ int libxl__domain_config_setdefault(libxl__gc *gc,
>      pod_enabled = (d_config->c_info.type != LIBXL_DOMAIN_TYPE_PV) &&
>          (d_config->b_info.target_memkb < d_config->b_info.max_memkb);
>  
> -    /* We cannot have PoD and PCI device assignment at the same time
> +    /* We cannot have PoD and an active IOMMU at the same time
>       * for HVM guest. It was reported that IOMMU cannot work with PoD
>       * enabled because it needs to populated entire page table for
> -     * guest. To stay on the safe side, we disable PCI device
> -     * assignment when PoD is enabled.
> +     * guest.
>       */
>      if (d_config->c_info.type != LIBXL_DOMAIN_TYPE_PV &&
> -        d_config->num_pcidevs && pod_enabled) {
> +        d_config->c_info.passthrough != LIBXL_PASSTHROUGH_DISABLED &&
> +        pod_enabled) {
>          ret = ERROR_INVAL;
> -        LOGD(ERROR, domid,
> -             "PCI device assignment for HVM guest failed due to PoD 
> enabled");
> +        LOGD(ERROR, domid, "IOMMU cannot be enabled together with PoD");
>          goto error_out;
>      }

Perhaps. Seeing this I actually recall coming across this check during
my investigation. Not changing it along the lines of what you do was
then really more because of me not being convinced of the extra
restriction; I clearly misremembered when writing the earlier reply.
If we were to do what you suggest, I'd like to ask that the comment be
changed differently, though: "We cannot ..." then isn't really true
anymore. We choose not to permit this mode; "cannot" only applies to
actual device assignment (and of course only as long as there aren't
restartable IOMMU faults).

>> Finally this still leaves out the "raw MMIO / ports" case mentioned
>> above.
> 
> But the raw MMIO 'mode' doesn't care much about PoD, because if
> there's no PCI device assigned there's no IOMMU setup, and thus such
> raw MMIO regions (could?) belong to a device that's not constrained by
> the guest p2m anyway?

Hmm, yes, true.

>>>> --- a/xen/common/vm_event.c
>>>> +++ b/xen/common/vm_event.c
>>>> @@ -639,7 +639,7 @@ int vm_event_domctl(struct domain *d, st
>>>>  
>>>>              rc = -EXDEV;
>>>>              /* Disallow paging in a PoD guest */
>>>> -            if ( p2m_pod_entry_count(p2m_get_hostp2m(d)) )
>>>> +            if ( p2m_pod_active(d) )
>>>
>>> Isn't it fine to just check for entry_count like you suggest in the
>>> change to libxl?
>>
>> I didn't think it would be, but I'm not entirely sure: If paging was
>> enabled before a guest actually starts, it wouldn't have any entries
>> but still be a PoD guest if it has a non-empty cache. The VM event
>> folks may be able to clarify this either way. But ...
>>
>>> This is what p2m_pod_entry_count actually does (rather than entry_count | 
>>> count).
>>
>> ... you really mean "did" here, as I'm removing p2m_pod_entry_count()
>> in this patch. Of course locking could be added to it instead (or
>> p2m_pod_get_mem_target() be used in its place), but I'd prefer if we
>> could go with just the check which precisely matches what the comment
>> says (IOW otherwise I'd need to additionally know what exactly the
>> comment is to say).
> 
> Could you briefly mention this in the commit message? Ie: VM event
> code is also adjusted to make sure PoD is not in use and cannot be
> used during the guest lifetime?

I've added

"Nor was the use of that function in line with the immediately preceding
 comment: A PoD guest isn't just one with a non-zero entry count, but
 also one with a non-empty cache (e.g. prior to actually launching the
 guest)."

to the already existing paragraph about the removal of that function.

Jan




 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.