
Re: [PATCH 5/9] vpci/header: Implement guest BAR register handlers


  • To: Oleksandr Andrushchenko <Oleksandr_Andrushchenko@xxxxxxxx>, Oleksandr Andrushchenko <andr2000@xxxxxxxxx>
  • From: Jan Beulich <jbeulich@xxxxxxxx>
  • Date: Tue, 7 Sep 2021 18:30:51 +0200
  • Cc: "julien@xxxxxxx" <julien@xxxxxxx>, "sstabellini@xxxxxxxxxx" <sstabellini@xxxxxxxxxx>, Oleksandr Tyshchenko <Oleksandr_Tyshchenko@xxxxxxxx>, Volodymyr Babchuk <Volodymyr_Babchuk@xxxxxxxx>, Artem Mygaiev <Artem_Mygaiev@xxxxxxxx>, "roger.pau@xxxxxxxxxx" <roger.pau@xxxxxxxxxx>, Bertrand Marquis <bertrand.marquis@xxxxxxx>, Rahul Singh <rahul.singh@xxxxxxx>, "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>
  • Delivery-date: Tue, 07 Sep 2021 16:31:12 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 07.09.2021 15:33, Oleksandr Andrushchenko wrote:
> On 06.09.21 17:31, Jan Beulich wrote:
>> On 03.09.2021 12:08, Oleksandr Andrushchenko wrote:
>>> --- a/xen/drivers/vpci/header.c
>>> +++ b/xen/drivers/vpci/header.c
>>> @@ -400,12 +400,72 @@ static void bar_write(const struct pci_dev *pdev, unsigned int reg,
>>>   static void guest_bar_write(const struct pci_dev *pdev, unsigned int reg,
>>>                               uint32_t val, void *data)
>>>   {
>>> +    struct vpci_bar *bar = data;
>>> +    bool hi = false;
>>> +
>>> +    if ( bar->type == VPCI_BAR_MEM64_HI )
>>> +    {
>>> +        ASSERT(reg > PCI_BASE_ADDRESS_0);
>>> +        bar--;
>>> +        hi = true;
>>> +    }
>>> +    else
>>> +        val &= PCI_BASE_ADDRESS_MEM_MASK;
>>> +    bar->guest_addr &= ~(0xffffffffull << (hi ? 32 : 0));
>>> +    bar->guest_addr |= (uint64_t)val << (hi ? 32 : 0);
>> What you store here is not the address that's going to be used,
> 
> bar->guest_addr is never reported to the guest as-is.

And this is what I question, as an approach. Your model _may_ work,
but it's needlessly deviating from how people like me would expect
this to work. And if it's intended to be this way, how would I
have known?

> It is always used as an initial value which is then modified to reflect
> the lower bits, e.g. the BAR type and whether it is prefetchable, so I
> think it is perfectly fine to have it this way.

And it is also perfectly fine to store the value to be handed
back to guests on the next read. That keeps the read path simple,
and the read path can be assumed to be taken more frequently than
the write one. Plus the stored values then reflect reality.

Plus - if what you say was really the case, why do you mask off
PCI_BASE_ADDRESS_MEM_MASK here? You should then simply store
the written value and do _all_ the processing in the read path.
No point having partial logic in two places.
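
Roughly what I'm thinking of (a sketch only, to illustrate the model;
guest_bar_read() and the "low_bits" field are made-up names here, and
the 64-bit-hi half would want handling analogously):

    static void guest_bar_write(const struct pci_dev *pdev, unsigned int reg,
                                uint32_t val, void *data)
    {
        struct vpci_bar *bar = data;

        /*
         * Bits below the BAR's size boundary are r/o zero in hardware.
         * Memory BAR sizes are powers of two of at least 16 bytes, so
         * this also clears the r/o type/prefetch bits.
         */
        val &= ~(uint32_t)(bar->size - 1);
        bar->guest_addr = val;
    }

    static uint32_t guest_bar_read(const struct pci_dev *pdev, unsigned int reg,
                                   void *data)
    {
        const struct vpci_bar *bar = data;

        /* Simply hand back what was stored, plus the r/o low bits. */
        return bar->guest_addr | bar->low_bits;
    }

That way the read path needs no knowledge of the sizing protocol at all.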

>>   as
>> you don't mask off the low bits (to account for the BAR's size).
>> When a BAR gets written with all ones, all writable bits get these
>> ones stored. The address of the BAR, aiui, really changes to
>> (typically) close below 4Gb (in the case of a 32-bit BAR), which
>> is why memory / I/O decoding should be off while sizing BARs.
>> Therefore you shouldn't look for the specific "all writable bits
>> are ones" pattern (or worse, as you presently do, the "all bits
>> outside of the type specifier are ones" one) on the read path.
>> Instead mask the value appropriately here, and simply return back
>> the stored value from the read path.
> "PCI LOCAL BUS SPECIFICATION, REV. 3.0", "IMPLEMENTATION NOTE
> 
> Sizing a 32-bit Base Address Register Example" says, that
> 
> "Software saves the original value of the Base Address register, writes
> 0 FFFF FFFFh to the register, then reads it back."
> 
> The same applies for 64-bit BARs. So what's wrong if I try to catch such
> a write when a guest tries to size the BAR? The only difference is that
> I compare as
>          if ( (val & PCI_BASE_ADDRESS_MEM_MASK_32) == 
> PCI_BASE_ADDRESS_MEM_MASK_32 )
> which is because val in the question has lower bits cleared.

Because while this matches what the spec says, it's not enough to
match how hardware behaves. Yet you want to mimic hardware behavior
as closely as possible here. There is (iirc) at least one other
place in the source tree where we had to adjust a similar initial
implementation to be closer to what guests expect, no matter
that they may not be following the spec to the letter. Don't
forget that there may be bugs in kernels which don't surface until
the kernel gets run on an overly simplistic emulation.
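
To put numbers on it: take a (hypothetical) 4KiB 32-bit memory BAR,
with writes masked by the BAR's size as suggested above:

    uint32_t size = 0x1000;                     /* 4KiB BAR */
    /* Guest writes 0xffffffff; only bits above the size boundary stick. */
    uint32_t stored = 0xffffffff & ~(size - 1); /* 0xfffff000 */
    /*
     * The guest reads back 0xfffff000 (plus the r/o low bits), masks off
     * the low bits, and computes ~0xfffff000 + 1 == 0x1000, just as on
     * real hardware - without the emulator having to pattern-match an
     * "all ones" write.
     */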

>>> @@ -522,6 +582,13 @@ static int add_bar_handlers(struct pci_dev *pdev, bool is_hwdom)
>>>               if ( rc )
>>>                   return rc;
>>>           }
>>> +        /*
>>> +         * It is neither safe nor secure to initialize guest's view of the BARs
>>> +         * with real values which are used by the hardware domain, so assign
>>> +         * all zeros to guest's view of the BARs, so the guest can perform
>>> +         * proper PCI device enumeration and assign BARs on its own.
>>> +         */
>>> +        bars[i].guest_addr = 0;
>> I'm afraid I don't understand the comment: Without memory decoding
>> enabled, the BARs are simple registers (with a few r/o bits).
> 
> My first implementation was that bar->guest_addr was initialized with
> the value of bar->addr (the physical BAR value), but when talking on IRC
> with Roger he suggested that giving the guest a hint about the physical
> values might be a security issue, so I changed the assignment to 0.

Well, yes, that's certainly correct. It's just not obvious to me
why one would want to use the host value here in the first place. It
simply has no meaning here.

>>> --- a/xen/include/xen/pci_regs.h
>>> +++ b/xen/include/xen/pci_regs.h
>>> @@ -103,6 +103,7 @@
>>>   #define  PCI_BASE_ADDRESS_MEM_TYPE_64     0x04    /* 64 bit address */
>>>   #define  PCI_BASE_ADDRESS_MEM_PREFETCH    0x08    /* prefetchable? */
>>>   #define  PCI_BASE_ADDRESS_MEM_MASK        (~0x0fUL)
>>> +#define  PCI_BASE_ADDRESS_MEM_MASK_32      (~0x0fU)
>> Please don't introduce an identical constant that's merely of
>> different type. (uint32_t)PCI_BASE_ADDRESS_MEM_MASK at the use
>> site (if actually still needed as per the comment above) would
>> seem more clear to me.
> Ok, I thought type casting is a bigger evil here

Often it is, but imo here it is not. I hope you realize that ~0x0fU
is not necessarily 0xfffffff0? We make certain assumptions on type
widths: for unsigned int we assume it to be at least 32 bits wide,
but we should avoid assuming it is exactly 32 bits wide. Just
like we (more obviously) cannot assume the width of unsigned long.
(Which tells us that for 32-bit arches PCI_BASE_ADDRESS_MEM_MASK is
actually of the wrong type. This constant should be the same no
matter the bitness.)
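
IOW at the use site (if, per the above, the check survives at all) just

    if ( (val & (uint32_t)PCI_BASE_ADDRESS_MEM_MASK) ==
         (uint32_t)PCI_BASE_ADDRESS_MEM_MASK )

without any new constant.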

Jan