
Re: [Xen-devel] [PATCH v3 5/9] xen/vpci: add handlers to map the BARs



>>> On 22.06.17 at 19:13, <roger.pau@xxxxxxxxxx> wrote:
> On Fri, May 19, 2017 at 09:21:56AM -0600, Jan Beulich wrote:
>> >>> On 27.04.17 at 16:35, <roger.pau@xxxxxxxxxx> wrote:
>> > +static int vpci_modify_bars(struct pci_dev *pdev, const bool map)
>> > +{
>> > +    struct vpci_header *header = &pdev->vpci->header;
>> > +    unsigned int i;
>> > +    int rc = 0;
>> > +
>> > +    for ( i = 0; i < ARRAY_SIZE(header->bars); i++ )
>> > +    {
>> > +        paddr_t gaddr = map ? header->bars[i].gaddr
>> > +                            : header->bars[i].mapped_addr;
>> > +        paddr_t paddr = header->bars[i].paddr;
>> > +
>> > +        if ( header->bars[i].type != VPCI_BAR_MEM &&
>> > +             header->bars[i].type != VPCI_BAR_MEM64_LO )
>> > +            continue;
>> > +
>> > +        rc = modify_mmio(pdev->domain, _gfn(PFN_DOWN(gaddr)),
>> > +                         _mfn(PFN_DOWN(paddr)), PFN_UP(header->bars[i].size),
>> 
>> The PFN_UP() indicates a problem: For sub-page BARs you can't
>> blindly map/unmap them without taking into consideration other
>> devices sharing the same page.
> 
> I'm not sure I follow, the start address of BARs is always aligned to
> a 4KB boundary, so there's no chance of the same page being used by
> two different BARs at the same time.

I'm not sure where you're taking this from. Modern BIOSes may
aim at doing so, but I'm sure I've seen smaller alignment quite
often on older machines, and even my most modern AMD one has
these three devices, for example:

00:11.0 SATA controller: Advanced Micro Devices [AMD] nee ATI SB7x0/SB8x0/SB9x0 SATA Controller [AHCI mode] (prog-if 01 [AHCI 1.0])
        Subsystem: Advanced Micro Devices [AMD] nee ATI SB7x0/SB8x0/SB9x0 SATA Controller [AHCI mode]
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
        Status: Cap+ 66MHz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 64
        Interrupt: pin A routed to IRQ 22
        Region 0: I/O ports at 2430 [size=8]
        Region 1: I/O ports at 2424 [size=4]
        Region 2: I/O ports at 2428 [size=8]
        Region 3: I/O ports at 2420 [size=4]
        Region 4: I/O ports at 2400 [size=16]
        Region 5: Memory at c8014000 (32-bit, non-prefetchable) [size=1K]
        Capabilities: [60] Power Management version 2
                Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
                Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [70] SATA HBA v1.0 InCfgSpace
        Kernel driver in use: ahci
        Kernel modules: ahci

00:12.2 USB controller: Advanced Micro Devices [AMD] nee ATI SB7x0/SB8x0/SB9x0 USB EHCI Controller (prog-if 20 [EHCI])
        Subsystem: Advanced Micro Devices [AMD] nee ATI SB7x0/SB8x0/SB9x0 USB EHCI Controller
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
        Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 64, Cache Line Size: 32 bytes
        Interrupt: pin B routed to IRQ 17
        Region 0: Memory at c8014400 (32-bit, non-prefetchable) [size=256]
        Capabilities: [c0] Power Management version 2
                Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0+,D1+,D2+,D3hot+,D3cold-)
                Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
                Bridge: PM- B3+
        Capabilities: [e4] Debug port: BAR=1 offset=00e0
        Kernel driver in use: ehci_hcd
        Kernel modules: ehci-hcd

00:13.2 USB controller: Advanced Micro Devices [AMD] nee ATI SB7x0/SB8x0/SB9x0 USB EHCI Controller (prog-if 20 [EHCI])
        Subsystem: Advanced Micro Devices [AMD] nee ATI SB7x0/SB8x0/SB9x0 USB EHCI Controller
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
        Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 64, Cache Line Size: 32 bytes
        Interrupt: pin B routed to IRQ 19
        Region 0: Memory at c8014800 (32-bit, non-prefetchable) [size=256]
        Capabilities: [c0] Power Management version 2
                Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0+,D1+,D2+,D3hot+,D3cold-)
                Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
                Bridge: PM- B3+
        Capabilities: [e4] Debug port: BAR=1 offset=00e0
        Kernel driver in use: ehci_hcd
        Kernel modules: ehci-hcd

> The size is indeed not aligned to 4KB, but I don't see how this can
> cause collisions with other BARs unless the domain is actively trying
> to make the BARs overlap, in which case there's not much Xen can do.

The above is not what Dom0 did, but how the system boots up.
And this "there's not much Xen can do" is what I've been trying
to get at with my comment: A solution is needed here for your
approach to vPCI handling to be viable.
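
Just to illustrate the kind of check I have in mind, here's a rough
sketch (the helper name is made up, the walk over the domain's device
list is simplified, and the field names are taken from your patch):

static bool vpci_bar_page_shared(const struct pci_dev *pdev,
                                 const struct vpci_bar *bar)
{
    const struct pci_dev *other;

    /* Walk all other devices owned by the same domain. */
    list_for_each_entry ( other, &pdev->domain->arch.pdev_list, domain_list )
    {
        unsigned int i;

        if ( other == pdev || !other->vpci )
            continue;

        for ( i = 0; i < ARRAY_SIZE(other->vpci->header.bars); i++ )
        {
            const struct vpci_bar *o = &other->vpci->header.bars[i];

            if ( o->type != VPCI_BAR_MEM && o->type != VPCI_BAR_MEM64_LO )
                continue;

            /* Do the two BARs' machine page ranges intersect? */
            if ( PFN_DOWN(o->paddr) <= PFN_DOWN(bar->paddr + bar->size - 1) &&
                 PFN_DOWN(bar->paddr) <= PFN_DOWN(o->paddr + o->size - 1) )
                return true;
        }
    }

    return false;
}

vpci_modify_bars() would then need to special case BARs for which this
returns true, rather than blindly invoking modify_mmio() on whole pages.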

>> > +static int vpci_cmd_write(struct pci_dev *pdev, unsigned int reg,
>> > +                          union vpci_val val, void *data)
>> > +{
>> > +    struct vpci_header *header = data;
>> > +    uint16_t new_cmd, saved_cmd;
>> > +    uint8_t seg = pdev->seg, bus = pdev->bus;
>> > +    uint8_t slot = PCI_SLOT(pdev->devfn), func = PCI_FUNC(pdev->devfn);
>> > +    int rc;
>> > +
>> > +    new_cmd = val.word;
>> > +    saved_cmd = header->command;
>> > +
>> > +    if ( !((new_cmd ^ saved_cmd) & PCI_COMMAND_MEMORY) )
>> > +        goto out;
>> > +
>> > +    /* Memory space access change. */
>> > +    rc = vpci_modify_bars(pdev, new_cmd & PCI_COMMAND_MEMORY);
>> > +    if ( rc )
>> > +    {
>> > +        dprintk(XENLOG_ERR,
>> > +                "%04x:%02x:%02x.%u:unable to %smap BARs: %d\n",
>> > +                seg, bus, slot, func,
>> > +                new_cmd & PCI_COMMAND_MEMORY ? "" : "un", rc);
>> > +        return rc;
>> 
>> I guess you can guess the question already: What is the bare
>> hardware equivalent of this failure return?
> 
> Yes, this is already fixed since write handlers simply return void.
> The hw equivalent would be to ignore the write AFAICT (ie: memory
> decoding will not be enabled).
> 
> Are you fine with the dprintk or would you also like me to remove
> that? (IMHO it's helpful for debugging).

I think it can stay there for the initial phase. Later (before
declaring PVHv2 fully supported) we may want to reconsider
which of these messages are worth keeping.
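
For completeness, the shape I'd then expect the void variant of the
handler to take (a sketch only, reusing the names from your patch;
propagating the write to the physical device is left out):

static void vpci_cmd_write(struct pci_dev *pdev, unsigned int reg,
                           union vpci_val val, void *data)
{
    struct vpci_header *header = data;
    uint16_t new_cmd = val.word;

    if ( (new_cmd ^ header->command) & PCI_COMMAND_MEMORY &&
         vpci_modify_bars(pdev, new_cmd & PCI_COMMAND_MEMORY) )
    {
        /* Failure: drop the write, i.e. memory decoding stays as it was. */
        dprintk(XENLOG_ERR, "%04x:%02x:%02x.%u: unable to %smap BARs\n",
                pdev->seg, pdev->bus, PCI_SLOT(pdev->devfn),
                PCI_FUNC(pdev->devfn),
                new_cmd & PCI_COMMAND_MEMORY ? "" : "un");
        return;
    }

    header->command = new_cmd;
    /* Forwarding of the new value to the physical device elided. */
}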

>> > +static int vpci_bar_write(struct pci_dev *pdev, unsigned int reg,
>> > +                          union vpci_val val, void *data)
>> > +{
>> > +    struct vpci_bar *bar = data;
>> > +    uint32_t wdata = val.double_word;
>> > +    bool hi = false, unset = false;
>> > +
>> > +    ASSERT(bar->type == VPCI_BAR_MEM || bar->type == VPCI_BAR_MEM64_LO ||
>> > +           bar->type == VPCI_BAR_MEM64_HI);
>> > +
>> > +    if ( wdata == GENMASK(31, 0) )
>> 
>> I'm afraid this again doesn't match real hardware behavior: As the
>> low bits are r/o, writes with them having any value, but all other
>> bits being 1 should have the same effect. I notice that while I had
>> fixed this for the ROM BAR in Linux'es pciback, I should have also
>> fixed this for ordinary ones.
> 
> I've changed this to:
> 
>     switch ( bar->type )
>     {
>     case VPCI_BAR_MEM:
>         size_mask = GENMASK(31, 12);

Relating to the comment further up - where's this 12 coming from?

>         break;
>     case VPCI_BAR_MEM64_LO:
>         size_mask = GENMASK(31, 26);

And this 26?

>         break;
>     case VPCI_BAR_MEM64_HI:
>         size_mask = GENMASK(31, 0);
>         break;
>     default:
>         ASSERT_UNREACHABLE();
>         break;

You want to return here.
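
I.e. something along these lines (a sketch only; the particular mask
values still want the explanation asked for above):

    switch ( bar->type )
    {
    case VPCI_BAR_MEM:
        size_mask = GENMASK(31, 12);
        break;
    case VPCI_BAR_MEM64_LO:
        size_mask = GENMASK(31, 26);
        break;
    case VPCI_BAR_MEM64_HI:
        size_mask = GENMASK(31, 0);
        break;
    default:
        ASSERT_UNREACHABLE();
        /* Bail rather than continuing with size_mask uninitialized. */
        return;
    }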

>> > +    }
>> > +
>> > +    ASSERT(IS_ALIGNED(bar->gaddr, PAGE_SIZE));
>> 
>> Urgh.
> 
> Removed.

Given your comment further up, you should have refused to do so
(i.e. I'm getting the impression you're not really sure about that
supposed 4k alignment).

>> > +        if ( (bars[i].type == VPCI_BAR_MEM && addr == GENMASK(31, 12)) ||
>> > +             addr == GENMASK(63, 26) )
>> 
>> Where is this 26 coming from?
>> 
>> Perhaps
>> 
>>     if ( addr == GENMASK(bars[i].type == VPCI_BAR_MEM ? 31 : 63, 12) )
> 
> I'm checking the memory decode bit here instead in order to figure out
> if the BAR is not positioned.
> 
>> ? Albeit I'm unconvinced GENMASK() is useful to be used here anyway
>> (see also below).
> 
> Right, regardless of the specific usage above, what would you
> recommend regarding the usage of GENMASK?
> 
> Julien suggested introducing GENMASK_ULL. Should I go that route, or
> introduce something locally for vPCI?

Back when GENMASK() was introduced to our code base I already
indicated that I'm not really in favor of it. I don't think it
really helps readability all that much: to me, plain hex numbers
are easier to grok, albeit I admit ones extending beyond 8 or 10
digits are less easy to digest. (Sadly the language extension once
proposed [by Intel, I think, in the early ia64 days] to permit _
separators in numbers doesn't appear to have made it anywhere.)
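
That said, if GENMASK_ULL() is what you end up introducing, a plain
definition modeled on Linux's ought to do (a sketch only, not lifted
from any particular header):

/* 64-bit variant of GENMASK(); h and l are inclusive bit positions. */
#define GENMASK_ULL(h, l) \
    (((~0ULL) << (l)) & (~0ULL >> (63 - (h))))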

>> > +        } bars[6];
>> 
>> What about the ROM and SR-IOV ones?
> 
> I've implemented support for the expansion ROM BAR (which I still need
> to figure out how to test),

There should be hardly any graphics card without a ROM. Most NICs
also come with a ROM for remote boot purposes, albeit many BIOSes
allow turning it off. Most SCSI cards I've seen have a
(configuration) ROM too.

> but I would like to defer SR-IOV for later
> because it involves a non-trivial amount of work, and with this series
> one can already boot a PVH Dom0 (minus SR-IOV of course).

That's likely okay as long as there's a suitable, much beloved
"fixme" comment somewhere.

Jan
