WARNING - OLD ARCHIVES

This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/
   
 
 
Xen 
 
Home Products Support Community News
 
   
 

xen-devel

Re: [Xen-devel] [PATCH V3 08/10] Introduce Xen PCI Passthrough, PCI conf

To: Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx>
Subject: Re: [Xen-devel] [PATCH V3 08/10] Introduce Xen PCI Passthrough, PCI config space helpers (2/3)
From: Anthony PERARD <anthony.perard@xxxxxxxxxx>
Date: Fri, 11 Nov 2011 17:40:27 +0000
Cc: Guy Zana <guy@xxxxxxxxxxxx>, Xen Devel <xen-devel@xxxxxxxxxxxxxxxxxxx>, Allen Kay <allen.m.kay@xxxxxxxxx>, QEMU-devel <qemu-devel@xxxxxxxxxx>, Stefano Stabellini <Stefano.Stabellini@xxxxxxxxxxxxx>
Delivery-date: Fri, 11 Nov 2011 09:41:39 -0800
Envelope-to: www-data@xxxxxxxxxxxxxxxxxxx
In-reply-to: <20111110215353.GA23837@xxxxxxxxxxxxxxxxxxx>
List-help: <mailto:xen-devel-request@lists.xensource.com?subject=help>
List-id: Xen developer discussion <xen-devel.lists.xensource.com>
List-post: <mailto:xen-devel@lists.xensource.com>
List-subscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=subscribe>
List-unsubscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=unsubscribe>
References: <1319814456-8158-1-git-send-email-anthony.perard@xxxxxxxxxx> <1319814456-8158-9-git-send-email-anthony.perard@xxxxxxxxxx> <20111110215353.GA23837@xxxxxxxxxxxxxxxxxxx>
Sender: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
User-agent: Alpine 2.00 (DEB 1167 2008-08-23)
On Thu, 10 Nov 2011, Konrad Rzeszutek Wilk wrote:

> On Fri, Oct 28, 2011 at 04:07:34PM +0100, Anthony PERARD wrote:
> > From: Allen Kay <allen.m.kay@xxxxxxxxx>
> >
> > Signed-off-by: Allen Kay <allen.m.kay@xxxxxxxxx>
> > Signed-off-by: Guy Zana <guy@xxxxxxxxxxxx>
> > Signed-off-by: Anthony PERARD <anthony.perard@xxxxxxxxxx>
> > ---
> >  Makefile.target                      |    1 +
> >  hw/xen_pci_passthrough.h             |    2 +
> >  hw/xen_pci_passthrough_config_init.c | 2068 
> > ++++++++++++++++++++++++++++++++++
> >  3 files changed, 2071 insertions(+), 0 deletions(-)
> >  create mode 100644 hw/xen_pci_passthrough_config_init.c
> >

[...]

> > +/* A return value of 1 means the capability should NOT be exposed to 
> > guest. */
> > +static int pt_hide_dev_cap(const HostPCIDevice *d, uint8_t grp_id)
> > +{
> > +    switch (grp_id) {
> > +    case PCI_CAP_ID_EXP:
> > +        /* The PCI Express Capability Structure of the VF of Intel 82599 
> > 10GbE
> > +         * Controller looks trivial, e.g., the PCI Express Capabilities
> > +         * Register is 0. We should not try to expose it to guest.
>
> Why not?

Because (an old commit):

passthrough: support the assignment of the VF of Intel 82599 10GbE Controller

The datasheet is available at
http://download.intel.com/design/network/datashts/82599_datasheet.pdf

See 'Table 9.7. VF PCIe Configuration Space' of the datasheet, the PCI
Express Capability Structure of the VF of Intel 82599 10GbE Controller looks
trivial, e.g., the PCI Express Capabilities Register is 0, so the Capability
Version is 0 and pt_pcie_size_init() would fail.

We should not try to expose the PCIe cap of the device to guest.

a patch from Dexuan Cui <dexuan.cui@xxxxxxxxx>

> > +         */
> > +        if (d->vendor_id == PCI_VENDOR_ID_INTEL &&
> > +                d->device_id == PCI_DEVICE_ID_INTEL_82599_VF) {
> > +            return 1;
> > +        }
> > +        break;
> > +    }
> > +    return 0;
> > +}
> > +
> > +/*   find emulate register group entry */
> > +XenPTRegGroup *pt_find_reg_grp(XenPCIPassthroughState *s, uint32_t address)
> > +{
> > +    XenPTRegGroup *entry = NULL;
> > +
> > +    /* find register group entry */
> > +    QLIST_FOREACH(entry, &s->reg_grp_tbl, entries) {
> > +        /* check address */
> > +        if ((entry->base_offset <= address)
> > +            && ((entry->base_offset + entry->size) > address)) {
> > +            return entry;
> > +        }
> > +    }
> > +
> > +    /* group entry not found */
> > +    return NULL;
> > +}
> > +
> > +/* find emulate register entry */
> > +XenPTReg *pt_find_reg(XenPTRegGroup *reg_grp, uint32_t address)
> > +{
> > +    XenPTReg *reg_entry = NULL;
> > +    XenPTRegInfo *reg = NULL;
> > +    uint32_t real_offset = 0;
> > +
> > +    /* find register entry */
> > +    QLIST_FOREACH(reg_entry, &reg_grp->reg_tbl_list, entries) {
> > +        reg = reg_entry->reg;
> > +        real_offset = reg_grp->base_offset + reg->offset;
> > +        /* check address */
> > +        if ((real_offset <= address)
> > +            && ((real_offset + reg->size) > address)) {
> > +            return reg_entry;
> > +        }
> > +    }
> > +
> > +    return NULL;
> > +}
> > +
> > +/* parse BAR */
> > +static PTBarFlag pt_bar_reg_parse(XenPCIPassthroughState *s, XenPTRegInfo 
> > *reg)
> > +{
> > +    PCIDevice *d = &s->dev;
> > +    XenPTRegion *region = NULL;
> > +    PCIIORegion *r;
> > +    int index = 0;
> > +
> > +    /* check 64bit BAR */
> > +    index = pt_bar_offset_to_index(reg->offset);
> > +    if ((0 < index) && (index < PCI_ROM_SLOT)) {
>
> This is  a bit confusing. Can you make the index be on the same
> side, like
>
> if ((0 < index) && (PCI_ROM_SLOT > index)
>
> or better:
>
> if ((index < 0) && (index < PCI_ROM_SLOT))
>
> um, which looks wrong. Should it be 'index > 0' ?

Every other form is a bit confusing to me. I'd like to write
0 < index < ROM_SLOT, so I know that index is between 0 and ROM_SLOT.
But, it's C and not math, so I wrote the closest way I can.

> > +        int flags = s->real_device->io_regions[index - 1].flags;
>
> Do we want to check the index - 1 to make sure it is not negative?

We have:
  0 < index < ROM_SLOT
so (index - 1) give us:
  0 <= index - 1 < ROM_SLOT - 1

So (index - 1) can be 0, but under 0.
;)


[...]

> > +/********************
> > + * Header Type0
> > + */
> > +
> > +static uint32_t pt_vendor_reg_init(XenPCIPassthroughState *s,
> > +                                   XenPTRegInfo *reg, uint32_t real_offset)
> > +{
> > +    return s->real_device->vendor_id;
> > +}
> > +static uint32_t pt_device_reg_init(XenPCIPassthroughState *s,
> > +                                   XenPTRegInfo *reg, uint32_t real_offset)
> > +{
> > +    return s->real_device->device_id;
> > +}
> > +static uint32_t pt_status_reg_init(XenPCIPassthroughState *s,
> > +                                   XenPTRegInfo *reg, uint32_t real_offset)
> > +{
> > +    XenPTRegGroup *reg_grp_entry = NULL;
> > +    XenPTReg *reg_entry = NULL;
> > +    int reg_field = 0;
> > +
> > +    /* find Header register group */
> > +    reg_grp_entry = pt_find_reg_grp(s, PCI_CAPABILITY_LIST);
> > +    if (reg_grp_entry) {
> > +        /* find Capabilities Pointer register */
> > +        reg_entry = pt_find_reg(reg_grp_entry, PCI_CAPABILITY_LIST);
> > +        if (reg_entry) {
> > +            /* check Capabilities Pointer register */
> > +            if (reg_entry->data) {
> > +                reg_field |= PCI_STATUS_CAP_LIST;
> > +            } else {
> > +                reg_field &= ~PCI_STATUS_CAP_LIST;
> > +            }
> > +        } else {
> > +            hw_error("Internal error: Couldn't find pt_reg_tbl for "
> > +                     "Capabilities Pointer register. I/O emulator 
> > exit.\n");
>
> Yikes. abort here? Um, can we just return a fault code instead?

This should probably not happend, I suppose.

> > +        }
> > +    } else {
> > +        hw_error("Internal error: Couldn't find pt_reg_grp_tbl for Header. 
> > "
> > +                 "I/O emulator exit.\n");
> > +    }
> > +
> > +    return reg_field;
> > +}

[...]

> > +static uint32_t pt_bar_reg_init(XenPCIPassthroughState *s, XenPTRegInfo 
> > *reg,
> > +                                uint32_t real_offset)
> > +{
> > +    int reg_field = 0;
> > +    int index;
> > +
> > +    /* get BAR index */
> > +    index = pt_bar_offset_to_index(reg->offset);
> > +    if (index < 0) {
> > +        hw_error("Internal error: Invalid BAR index[%d]. "
> > +                 "I/O emulator exit.\n", index);
> > +    }
> > +
> > +    /* set initial guest physical base address to -1 */
> > +    s->bases[index].e_physbase = -1;
>
> Um, use that define PCI_.. something macro.

:(, e_physbase is uint32, and PCI_BAR_UNMAPPED is uint64.

> > +
> > +    /* set BAR flag */
> > +    s->bases[index].bar_flag = pt_bar_reg_parse(s, reg);
> > +    if (s->bases[index].bar_flag == PT_BAR_FLAG_UNUSED) {
> > +        reg_field = PT_INVALID_REG;
> > +    }
> > +
> > +    return reg_field;
> > +}

[...]

> > +
> > +
> > +/*****************************
> > + * PCI Express Capability
> > + */
> > +
> > +/* initialize Link Control register */
> > +static uint32_t pt_linkctrl_reg_init(XenPCIPassthroughState *s,
> > +                                     XenPTRegInfo *reg, uint32_t 
> > real_offset)
> > +{
> > +    uint8_t cap_ver = 0;
> > +    uint8_t dev_type = 0;
> > +
> > +    /* TODO maybe better to use fonction from hw/pcie.c */
>
> function
> > +    cap_ver = pci_get_byte(s->dev.config + real_offset - reg->offset
> > +                           + PCI_EXP_FLAGS)
> > +        & PCI_EXP_FLAGS_VERS;
> > +    dev_type = (pci_get_byte(s->dev.config + real_offset - reg->offset
> > +                             + PCI_EXP_FLAGS)
> > +                & PCI_EXP_FLAGS_TYPE) >> 4;
> > +
> > +    /* no need to initialize in case of Root Complex Integrated Endpoint
> > +     * with cap_ver 1.x
>
> Why?

Who knows? I don't. And `git log` does not give me more information.

> > +     */
> > +    if ((dev_type == PCI_EXP_TYPE_RC_END) && (cap_ver == 1)) {
> > +        return PT_INVALID_REG;
> > +    }
> > +
> > +    return reg->init_val;
> > +}
> > +/* initialize Device Control 2 register */
> > +static uint32_t pt_devctrl2_reg_init(XenPCIPassthroughState *s,
> > +                                     XenPTRegInfo *reg, uint32_t 
> > real_offset)
> > +{
> > +    uint8_t cap_ver = 0;
> > +
> > +    cap_ver = pci_get_byte(s->dev.config + real_offset - reg->offset
> > +                           + PCI_EXP_FLAGS)
> > +        & PCI_EXP_FLAGS_VERS;
> > +
> > +    /* no need to initialize in case of cap_ver 1.x */
> > +    if (cap_ver == 1) {
> > +        return PT_INVALID_REG;
> > +    }
> > +
> > +    return reg->init_val;
> > +}
> > +/* initialize Link Control 2 register */
> > +static uint32_t pt_linkctrl2_reg_init(XenPCIPassthroughState *s,
> > +                                      XenPTRegInfo *reg, uint32_t 
> > real_offset)
> > +{
> > +    int reg_field = 0;
> > +    uint8_t cap_ver = 0;
> > +
> > +    cap_ver = pci_get_byte(s->dev.config + real_offset - reg->offset
> > +                           + PCI_EXP_FLAGS)
> > +        & PCI_EXP_FLAGS_VERS;
>
> This looks like a weird tab issue, but it might be just my mailer.

Nop, there is no tab.

Maybe writing it like that:
> cap_ver = pci_get_byte(s->dev.config + real_offset - reg->offset
>                        + PCI_EXP_FLAGS);
> cap_ver &= PCI_EXP_FLAGS_VERS;
would be better.

> > +
> > +    /* no need to initialize in case of cap_ver 1.x */
> > +    if (cap_ver == 1) {
> > +        return PT_INVALID_REG;
> > +    }
> > +
> > +    /* set Supported Link Speed */
> > +    reg_field |= PCI_EXP_LNKCAP_SLS &
> > +        pci_get_byte(s->dev.config + real_offset - reg->offset
> > +                     + PCI_EXP_LNKCAP);
> > +
> > +    return reg_field;
> > +}
> > +

[...]

> > +/*********************************
> > + * Power Management Capability
> > + */
> > +
> > +/* initialize Power Management Capabilities register */
> > +static uint32_t pt_pmc_reg_init(XenPCIPassthroughState *s,
> > +                                XenPTRegInfo *reg, uint32_t real_offset)
> > +{
> > +    PCIDevice *d = &s->dev;
> > +
> > +    if (!s->power_mgmt) {
> > +        return reg->init_val;
> > +    }
> > +
> > +    /* set Power Management Capabilities register */
> > +    s->pm_state->pmc_field = pci_get_word(d->config + real_offset);
> > +
> > +    return reg->init_val;
> > +}
> > +/* initialize PCI Power Management Control/Status register */
> > +static uint32_t pt_pmcsr_reg_init(XenPCIPassthroughState *s,
> > +                                  XenPTRegInfo *reg, uint32_t real_offset)
> > +{
> > +    PCIDevice *d = &s->dev;
> > +    uint16_t cap_ver  = 0;
> > +
> > +    if (!s->power_mgmt) {
> > +        return reg->init_val;
> > +    }
> > +
> > +    /* check PCI Power Management support version */
> > +    cap_ver = s->pm_state->pmc_field & PCI_PM_CAP_VER_MASK;
> > +
> > +    if (cap_ver > 2) {
> > +        /* set No Soft Reset */
> > +        s->pm_state->no_soft_reset =
> > +            pci_get_byte(d->config + real_offset) & 
> > PCI_PM_CTRL_NO_SOFT_RESET;
> > +    }
> > +
> > +    /* wake up real physical device */
> > +    switch (host_pci_get_word(s->real_device, real_offset)
> > +            & PCI_PM_CTRL_STATE_MASK) {
> > +    case 0:
> > +        break;
> > +    case 1:
> > +        PT_LOG("Power state transition D1 -> D0active\n");
> > +        host_pci_set_word(s->real_device, real_offset, 0);
> > +        break;
> > +    case 2:
> > +        PT_LOG("Power state transition D2 -> D0active\n");
> > +        host_pci_set_word(s->real_device, real_offset, 0);
> > +        usleep(200);
>
> Heheh..

I don't know if I can remove it safely, or not.

> > +        break;
> > +    case 3:
> > +        PT_LOG("Power state transition D3hot -> D0active\n");
> > +        host_pci_set_word(s->real_device, real_offset, 0);
> > +        usleep(10 * 1000);

Same for this one.

> > +        pt_init_pci_config(s);
> > +        break;
> > +    }
> > +
> > +    return reg->init_val;
> > +}

-- 
Anthony PERARD

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel