[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH v2 4/4] xen: Fix map/unmap of pcidev to ioreq server



On 06/09/15 10:03, Paul Durrant wrote:
>> -----Original Message-----
>> From: Don Slutz [mailto:dslutz@xxxxxxxxxxx]
>> Sent: 09 June 2015 14:51
>> To: Paul Durrant; Slutz, Donald Christopher; qemu-devel@xxxxxxxxxx; xen-
>> devel@xxxxxxxxxxxxx
>> Cc: Michael S. Tsirkin; Stefano Stabellini; Don Slutz
>> Subject: Re: [PATCH v2 4/4] xen: Fix map/unmap of pcidev to ioreq server
>>
>> On 06/09/15 05:05, Paul Durrant wrote:
>>>> -----Original Message-----
>>>> From: Don Slutz [mailto:dslutz@xxxxxxxxxxx]
>>>> Sent: 08 June 2015 22:19
>>>> To: qemu-devel@xxxxxxxxxx; xen-devel@xxxxxxxxxxxxx
>>>> Cc: Michael S. Tsirkin; Paul Durrant; Stefano Stabellini; Don Slutz; Don 
>>>> Slutz
>>>> Subject: [PATCH v2 4/4] xen: Fix map/unmap of pcidev to ioreq server
>>>>
>>>> The commit 707ff80021ccd7a68f4b3d2c44eebf87efbb41c4 added usage of
>>>> xc_hvm_map_pcidev_from_ioreq_server() and
>>>> xc_hvm_unmap_pcidev_from_ioreq_server().  These routines only
>>>> correctly work if the PCI BDF of a PCI device is unique.  The 3
>>>> parts (bus, device, and function) of a PCI BDF are not required to
>>>> be unique.
>>>>
>>>> The PCI BDF is accessed in QEMU:
>>>>   bus        pci_bus_num()
>>>>   device     PCI_SLOT()
>>>>   function   PCI_FUNC()
>>>>
>>>> Add a hash table to track the current set of PCI BDFs and only call
>>>> on xc_hvm_map_pcidev_from_ioreq_server() and
>>>> xc_hvm_unmap_pcidev_from_ioreq_server() when needed.
>>>>
>>>
>>> This seems to imply that the devices are being realized multiple times
>> without being unrealized?
>>
>> Not directly.  The devices are not being "realized" in QEMU terms.  The
>> PCI to PCI brige is changing state, and the BDF for all PCI devices on
>> the secondary bus changes when the PCI config named
>> PCI_SECONDARY_BUS on
>> the PCI to PCI bridge is changed.  This is what patch #2 is all about.
>>
>>> Is that really the case?
>>>
>>
>> That is what it looks like to Xen.  But the device listener callbacks
>> are not called.
> 
> Then what's calling xc_hvm_map_pcidev_from_ioreq_server()?



Currently (without these patches) the calls on
xc_hvm_map_pcidev_from_ioreq_server() are the ones you added.  These are
from QEMU's device realize.  However since on PCI reset (aka start up)
the seconary bus number is zero.  So the call on
pci_bus_num(pci_dev->bus) will return zero in that case.  So using the
subset trace output (copied from below) and adjusted to be the old way:

xen_map_pcidev id: 1 bdf: 00.01.00
xen_map_pcidev id: 1 bdf: 00.01.00
xen_map_pcidev id: 1 bdf: 00.01.00

Is part of what you see.  It may help to have the Xen config that matters:

device_model_args_hvm = [
 "-monitor",
 "pty",
 "-boot",
 "menu=on",
 "-device",
 "pci-bridge,chassis_nr=1,msi=on,id=pciBridge1.0,addr=0x11.0",
 "-device",
 "pci-bridge,chassis_nr=2,msi=on,id=pciBridge5.0,multifunction=on,addr=0x15.0",
 "-device",
 "pci-bridge,chassis_nr=3,msi=on,id=pciBridge6.0,multifunction=on,addr=0x16.0",
 "-device",
 
"pci-bridge,chassis_nr=4,msi=on,id=pciBridge7.0,multifunction=on,bus=pciBridge1.0,addr=0x17.0",
 "-device",
 "pci-bridge,chassis_nr=5,msi=on,id=pciBridge8.0,multifunction=on,addr=0x18.0",
 "-device",
 "pvscsi,id=scsi0,bus=pciBridge5.0,addr=0x1.0x0",
 "-drive",
 "if=none,id=disk0-0,format=raw,file=/dev/etherd/e500.1",
 "-device",
 "scsi-disk,vendor=VMware,ver=1.0,product=Virtual
disk,bus=scsi0.0,scsi-id=0,drive=disk0-0",
 "-device",
 "pvscsi,id=sas1,bus=pciBridge7.0,addr=0x1.0x0",
 "-drive",
 "if=none,id=disk1-1,format=raw,file=/dev/etherd/e500.2",
 "-device",
 "scsi-disk,vendor=VMware,ver=1.0,product=Virtual
disk,bus=sas1.0,scsi-id=1,drive=disk1-1",
 "-device",
 "pvscsi,id=scsi2,bus=pciBridge1.0,addr=0x3.0x0",
 "-drive",
 "if=none,id=disk2-2,format=raw,file=/dev/etherd/e500.3",
 "-device",
 "scsi-disk,vendor=VMware,ver=1.0,product=Virtual
disk,bus=scsi2.0,scsi-id=2,drive=disk2-2",
 "-device",
 
"e1000,id=nic2,netdev=net2,mac=00:0c:29:86:44:b4,bus=pciBridge5.0,addr=0x3.0x0",
 "-netdev",
 "type=tap,id=net2,ifname=vif.2-emu,script=/etc/qemu-ifup,downscript=no",
 "-device",
 
"vmxnet3,id=nic3,netdev=net3,mac=00:0c:29:86:44:be,bus=pciBridge5.0,addr=0x4.0x0",
 "-netdev",
 "type=tap,id=net3,ifname=vif.3-emu,script=/etc/qemu-ifup,downscript=no",
]


If you want to try this, you would also need a bootable disk listed in
disk=

> I don't understand why you are having to introduce this to avoid
> duplicate calls?
>

That is because in general a PCI device's BDF is not fixed.  The only
fixed ones are the ones on the root (host bus).  It might help to look
at the code for pci_bus_num():

int pci_bus_num(PCIBus *s)
{
    if (pci_bus_is_root(s))
        return 0;       /* pci host bridge */
    return s->parent_dev->config[PCI_SECONDARY_BUS];
}

So the bus number this routine returns depends on what the guest has set
in the PCI config space of the PCI to PCI bridge.

The starting (and reset) value is 0.  So for the config above at QEMU
device realized time there are 3 PCI devices (PIIX4, 2 pvscsi disk
controllers).

   -Don Slutz

>   Paul
>

>>
>>    -Don Slutz
>>
>>>   Paul
>>>
>>>> Also fix the PCI BDF default stderr trace output: bus is seperated
>>>> by ':', and function is only 1 digit.
>>>>
>>>> Here is an example of stderr trace output:
>>>>
>>>> xen_map_pcidev id: 1 bdf: 00:00.0
>>>> xen_map_pcidev id: 1 bdf: 00:01.0
>>>> xen_map_pcidev id: 1 bdf: 00:01.1
>>>> xen_map_pcidev id: 1 bdf: 00:01.3
>>>> xen_map_pcidev id: 1 bdf: 00:02.0
>>>> xen_map_pcidev id: 1 bdf: 00:03.0
>>>> xen_map_pcidev id: 1 bdf: 00:04.0
>>>> xen_map_pcidev id: 1 bdf: 00:05.0
>>>> xen_map_pcidev id: 1 bdf: 00:11.0
>>>> xen_map_pcidev id: 1 bdf: 00:15.0
>>>> xen_map_pcidev id: 1 bdf: 00:16.0
>>>> xen_map_pcidev id: 1 bdf: 00:17.0
>>>> xen_map_pcidev id: 1 bdf: 00:18.0
>>>> xen_map_pcidev_dup id: 1 bdf: 00:01.0 dup: 2
>>>> xen_map_pcidev_dup id: 1 bdf: 00:01.0 dup: 3
>>>> xen_map_pcidev_dup id: 1 bdf: 00:03.0 dup: 2
>>>> xen_map_pcidev_dup id: 1 bdf: 00:03.0 dup: 3
>>>> xen_map_pcidev_dup id: 1 bdf: 00:04.0 dup: 2
>>>> xen_unmap_pcidev_dup id: 1 bdf: 00:03.0 dup: 2
>>>> xen_map_pcidev id: 1 bdf: ff:03.0
>>>> xen_unmap_pcidev id: 1 bdf: 00:17.0
>>>> xen_map_pcidev id: 1 bdf: ff:17.0
>>>> xen_unmap_pcidev_dup id: 1 bdf: 00:01.0 dup: 2
>>>> xen_map_pcidev id: 1 bdf: ff:01.0
>>>> xen_unmap_pcidev_dup id: 1 bdf: 00:03.0 dup: 1
>>>> xen_map_pcidev_dup id: 1 bdf: ff:03.0 dup: 2
>>>> xen_unmap_pcidev_dup id: 1 bdf: 00:04.0 dup: 1
>>>> xen_map_pcidev id: 1 bdf: ff:04.0
>>>> xen_unmap_pcidev_dup id: 1 bdf: ff:03.0 dup: 1
>>>> xen_map_pcidev id: 1 bdf: 01:03.0
>>>> xen_unmap_pcidev id: 1 bdf: ff:17.0
>>>> xen_map_pcidev id: 1 bdf: 01:17.0
>>>> xen_unmap_pcidev_dup id: 1 bdf: 00:01.0 dup: 1
>>>> xen_map_pcidev_dup id: 1 bdf: ff:01.0 dup: 2
>>>> xen_unmap_pcidev_dup id: 1 bdf: ff:01.0 dup: 1
>>>> xen_map_pcidev id: 1 bdf: 02:01.0
>>>> xen_unmap_pcidev id: 1 bdf: ff:01.0
>>>> xen_map_pcidev id: 1 bdf: 03:01.0
>>>> xen_unmap_pcidev id: 1 bdf: ff:03.0
>>>> xen_map_pcidev id: 1 bdf: 03:03.0
>>>> xen_unmap_pcidev id: 1 bdf: ff:04.0
>>>> xen_map_pcidev id: 1 bdf: 03:04.0
>>>> xen_unmap_pcidev id: 1 bdf: 00:04.0
>>>> xen_unmap_pcidev id: 1 bdf: 00:05.0
>>>> xen_unmap_pcidev id: 1 bdf: 01:03.0
>>>> xen_map_pcidev_dup id: 1 bdf: 00:03.0 dup: 2
>>>> xen_unmap_pcidev id: 1 bdf: 01:17.0
>>>> xen_map_pcidev id: 1 bdf: 00:17.0
>>>> xen_unmap_pcidev id: 1 bdf: 03:01.0
>>>> xen_map_pcidev_dup id: 1 bdf: 00:01.0 dup: 2
>>>> xen_unmap_pcidev id: 1 bdf: 03:03.0
>>>> xen_map_pcidev_dup id: 1 bdf: 00:03.0 dup: 3
>>>> xen_unmap_pcidev id: 1 bdf: 03:04.0
>>>> xen_map_pcidev id: 1 bdf: 00:04.0
>>>> xen_unmap_pcidev_dup id: 1 bdf: 00:03.0 dup: 2
>>>> xen_map_pcidev id: 1 bdf: 01:03.0
>>>> xen_unmap_pcidev id: 1 bdf: 00:17.0
>>>> xen_map_pcidev id: 1 bdf: 01:17.0
>>>> xen_unmap_pcidev id: 1 bdf: 02:01.0
>>>> xen_map_pcidev_dup id: 1 bdf: 00:01.0 dup: 3
>>>> xen_unmap_pcidev_dup id: 1 bdf: 00:01.0 dup: 2
>>>> xen_map_pcidev id: 1 bdf: 02:01.0
>>>> xen_unmap_pcidev_dup id: 1 bdf: 00:01.0 dup: 1
>>>> xen_map_pcidev id: 1 bdf: 03:01.0
>>>> xen_unmap_pcidev_dup id: 1 bdf: 00:03.0 dup: 1
>>>> xen_map_pcidev id: 1 bdf: 03:03.0
>>>> xen_unmap_pcidev id: 1 bdf: 00:04.0
>>>> xen_map_pcidev id: 1 bdf: 03:04.0
>>>>
>>>> Signed-off-by: Don Slutz <dslutz@xxxxxxxxxxx>
>>>> CC: Don Slutz <don.slutz@xxxxxxxxx>
>>>> ---
>>>>  include/hw/xen/xen_common.h | 53
>>>> +++++++++++++++++++++++++++++++++++----------
>>>>  trace-events                |  6 +++--
>>>>  xen-hvm.c                   | 15 ++++++++-----
>>>>  3 files changed, 55 insertions(+), 19 deletions(-)
>>>>
>>>> diff --git a/include/hw/xen/xen_common.h
>>>> b/include/hw/xen/xen_common.h
>>>> index 6579b78..260ee58 100644
>>>> --- a/include/hw/xen/xen_common.h
>>>> +++ b/include/hw/xen/xen_common.h
>>>> @@ -223,12 +223,14 @@ static inline void xen_unmap_io_section(XenXC
>> xc,
>>>> domid_t dom,
>>>>
>>>>  static inline void xen_map_pcidev(XenXC xc, domid_t dom,
>>>>                                    ioservid_t ioservid,
>>>> +                                  GHashTable *pci_bdf,
>>>>                                    PCIDevice *pci_dev)
>>>>  {
>>>>  }
>>>>
>>>>  static inline void xen_unmap_pcidev(XenXC xc, domid_t dom,
>>>>                                      ioservid_t ioservid,
>>>> +                                    GHashTable *pci_bdf,
>>>>                                      PCIDevice *pci_dev,
>>>>                                      uint8_t oldbus)
>>>>  {
>>>> @@ -346,27 +348,54 @@ static inline void xen_unmap_io_section(XenXC
>> xc,
>>>> domid_t dom,
>>>>
>>>>  static inline void xen_map_pcidev(XenXC xc, domid_t dom,
>>>>                                    ioservid_t ioservid,
>>>> +                                  GHashTable *realizedpci_bdf,
>>>>                                    PCIDevice *pci_dev)
>>>>  {
>>>> -    trace_xen_map_pcidev(ioservid, pci_bus_num(pci_dev->bus),
>>>> -                         PCI_SLOT(pci_dev->devfn), 
>>>> PCI_FUNC(pci_dev->devfn));
>>>> -    xc_hvm_map_pcidev_to_ioreq_server(xc, dom, ioservid,
>>>> -                                      0, pci_bus_num(pci_dev->bus),
>>>> -                                      PCI_SLOT(pci_dev->devfn),
>>>> -                                      PCI_FUNC(pci_dev->devfn));
>>>> +    gpointer bdf_key = GINT_TO_POINTER((pci_bus_num(pci_dev->bus)
>> <<
>>>> 8) |
>>>> +                                       pci_dev->devfn);
>>>> +    gint bdf_cnt = GPOINTER_TO_INT(g_hash_table_lookup(pci_bdf,
>>>> bdf_key));
>>>> +
>>>> +    if (!bdf_cnt) {
>>>> +        trace_xen_map_pcidev(ioservid, pci_bus_num(pci_dev->bus),
>>>> +                             PCI_SLOT(pci_dev->devfn),
>>>> +                             PCI_FUNC(pci_dev->devfn));
>>>> +        g_hash_table_insert(pci_bdf, bdf_key, GINT_TO_POINTER(1));
>>>> +        xc_hvm_map_pcidev_to_ioreq_server(xc, dom, ioservid,
>>>> +                                          0, pci_bus_num(pci_dev->bus),
>>>> +                                          PCI_SLOT(pci_dev->devfn),
>>>> +                                          PCI_FUNC(pci_dev->devfn));
>>>> +    } else {
>>>> +        trace_xen_map_pcidev_dup(ioservid, pci_bus_num(pci_dev->bus),
>>>> +                                 PCI_SLOT(pci_dev->devfn),
>>>> +                                 PCI_FUNC(pci_dev->devfn),
>>>> +                                 bdf_cnt + 1);
>>>> +        g_hash_table_replace(pci_bdf, bdf_key,
>> GINT_TO_POINTER(bdf_cnt +
>>>> 1));
>>>> +    }
>>>>  }
>>>>
>>>>  static inline void xen_unmap_pcidev(XenXC xc, domid_t dom,
>>>>                                      ioservid_t ioservid,
>>>> +                                    GHashTable *pci_bdf,
>>>>                                      PCIDevice *pci_dev,
>>>>                                      uint8_t oldbus)
>>>>  {
>>>> -    trace_xen_unmap_pcidev(ioservid, oldbus,
>>>> -                           PCI_SLOT(pci_dev->devfn), 
>>>> PCI_FUNC(pci_dev->devfn));
>>>> -    xc_hvm_unmap_pcidev_from_ioreq_server(xc, dom, ioservid,
>>>> -                                          0, oldbus,
>>>> -                                          PCI_SLOT(pci_dev->devfn),
>>>> -                                          PCI_FUNC(pci_dev->devfn));
>>>> +    gpointer bdf_key = GINT_TO_POINTER((oldbus << 8) | pci_dev-
>>> devfn);
>>>> +    gint bdf_cnt = GPOINTER_TO_INT(g_hash_table_lookup(pci_bdf,
>>>> bdf_key));
>>>> +
>>>> +    if (bdf_cnt == 1) {
>>>> +        trace_xen_unmap_pcidev(ioservid, oldbus,
>>>> +                               PCI_SLOT(pci_dev->devfn),
>>>> +                               PCI_FUNC(pci_dev->devfn));
>>>> +        g_hash_table_remove(pci_bdf, bdf_key);
>>>> +        xc_hvm_unmap_pcidev_from_ioreq_server(xc, dom, ioservid,
>>>> +                                              0, oldbus,
>>>> +                                              PCI_SLOT(pci_dev->devfn),
>>>> +                                              PCI_FUNC(pci_dev->devfn));
>>>> +    } else {
>>>> +        trace_xen_unmap_pcidev_dup(ioservid, oldbus, PCI_SLOT(pci_dev-
>>>>> devfn),
>>>> +                                   PCI_FUNC(pci_dev->devfn), bdf_cnt - 1);
>>>> +        g_hash_table_replace(pci_bdf, bdf_key,
>> GINT_TO_POINTER(bdf_cnt -
>>>> 1));
>>>> +    }
>>>>  }
>>>>
>>>>  static inline int xen_create_ioreq_server(XenXC xc, domid_t dom,
>>>> diff --git a/trace-events b/trace-events
>>>> index a589650..58deeaf 100644
>>>> --- a/trace-events
>>>> +++ b/trace-events
>>>> @@ -930,8 +930,10 @@ xen_map_mmio_range(uint32_t id, uint64_t
>>>> start_addr, uint64_t end_addr) "id: %u
>>>>  xen_unmap_mmio_range(uint32_t id, uint64_t start_addr, uint64_t
>>>> end_addr) "id: %u start: %#"PRIx64" end: %#"PRIx64
>>>>  xen_map_portio_range(uint32_t id, uint64_t start_addr, uint64_t
>> end_addr)
>>>> "id: %u start: %#"PRIx64" end: %#"PRIx64
>>>>  xen_unmap_portio_range(uint32_t id, uint64_t start_addr, uint64_t
>>>> end_addr) "id: %u start: %#"PRIx64" end: %#"PRIx64
>>>> -xen_map_pcidev(uint32_t id, uint8_t bus, uint8_t dev, uint8_t func) "id:
>> %u
>>>> bdf: %02x.%02x.%02x"
>>>> -xen_unmap_pcidev(uint32_t id, uint8_t bus, uint8_t dev, uint8_t func)
>> "id:
>>>> %u bdf: %02x.%02x.%02x"
>>>> +xen_map_pcidev(uint32_t id, uint8_t bus, uint8_t dev, uint8_t func) "id:
>> %u
>>>> bdf: %02x:%02x.%x"
>>>> +xen_map_pcidev_dup(uint32_t id, uint8_t bus, uint8_t dev, uint8_t
>> func,
>>>> int dup) "id: %u bdf: %02x:%02x.%x dup: %d"
>>>> +xen_unmap_pcidev(uint32_t id, uint8_t bus, uint8_t dev, uint8_t func)
>> "id:
>>>> %u bdf: %02x:%02x.%x"
>>>> +xen_unmap_pcidev_dup(uint32_t id, uint8_t bus, uint8_t dev, uint8_t
>> func,
>>>> int dup) "id: %u bdf: %02x:%02x.%x dup: %d"
>>>>
>>>>  # xen-mapcache.c
>>>>  xen_map_cache(uint64_t phys_addr) "want %#"PRIx64
>>>> diff --git a/xen-hvm.c b/xen-hvm.c
>>>> index 7b6d8f1..f041909 100644
>>>> --- a/xen-hvm.c
>>>> +++ b/xen-hvm.c
>>>> @@ -123,6 +123,7 @@ typedef struct XenIOState {
>>>>      MemoryListener memory_listener;
>>>>      MemoryListener io_listener;
>>>>      DeviceListener device_listener;
>>>> +    GHashTable *pci_bdf;
>>>>      QLIST_HEAD(, XenPhysmap) physmap;
>>>>      hwaddr free_phys_offset;
>>>>      const XenPhysmap *log_for_dirtybit;
>>>> @@ -578,7 +579,8 @@ static void xen_device_realize(DeviceListener
>>>> *listener,
>>>>      if (object_dynamic_cast(OBJECT(dev), TYPE_PCI_DEVICE)) {
>>>>          PCIDevice *pci_dev = PCI_DEVICE(dev);
>>>>
>>>> -        xen_map_pcidev(xen_xc, xen_domid, state->ioservid, pci_dev);
>>>> +        xen_map_pcidev(xen_xc, xen_domid, state->ioservid, state-
>>> pci_bdf,
>>>> +                       pci_dev);
>>>>      }
>>>>  }
>>>>
>>>> @@ -590,8 +592,8 @@ static void xen_device_unrealize(DeviceListener
>>>> *listener,
>>>>      if (object_dynamic_cast(OBJECT(dev), TYPE_PCI_DEVICE)) {
>>>>          PCIDevice *pci_dev = PCI_DEVICE(dev);
>>>>
>>>> -        xen_unmap_pcidev(xen_xc, xen_domid, state->ioservid, pci_dev,
>>>> -                         pci_bus_num(pci_dev->bus));
>>>> +        xen_unmap_pcidev(xen_xc, xen_domid, state->ioservid, state-
>>>>> pci_bdf,
>>>> +                         pci_dev, pci_bus_num(pci_dev->bus));
>>>>      }
>>>>  }
>>>>
>>>> @@ -600,8 +602,10 @@ static void
>>>> xen_device_change_pci_bus_num(DeviceListener *listener,
>>>>  {
>>>>      XenIOState *state = container_of(listener, XenIOState,
>> device_listener);
>>>>
>>>> -    xen_unmap_pcidev(xen_xc, xen_domid, state->ioservid, pci_dev,
>>>> oldbus);
>>>> -    xen_map_pcidev(xen_xc, xen_domid, state->ioservid, pci_dev);
>>>> +    xen_unmap_pcidev(xen_xc, xen_domid, state->ioservid, state-
>>> pci_bdf,
>>>> +                     pci_dev, oldbus);
>>>> +    xen_map_pcidev(xen_xc, xen_domid, state->ioservid, state-
>>> pci_bdf,
>>>> +                   pci_dev);
>>>>  }
>>>>
>>>>  static void xen_sync_dirty_bitmap(XenIOState *state,
>>>> @@ -1318,6 +1322,7 @@ int xen_hvm_init(ram_addr_t
>>>> *below_4g_mem_size, ram_addr_t *above_4g_mem_size,
>>>>      memory_listener_register(&state->io_listener, &address_space_io);
>>>>
>>>>      state->device_listener = xen_device_listener;
>>>> +    state->pci_bdf = g_hash_table_new(NULL, NULL);
>>>>      device_listener_register(&state->device_listener);
>>>>
>>>>      /* Initialize backend core & drivers */
>>>> --
>>>> 1.8.4
>>>

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.