
Re: [Xen-devel] [rfc 00/18] ioemu: use devfn instead of slots as the unit for passthrough



On Mon, Mar 02, 2009 at 10:08:29AM +0000, Keir Fraser wrote:
> On 02/03/2009 09:53, "Simon Horman" <horms@xxxxxxxxxxxx> wrote:
> 
> >> Not really what I had in mind. Xend can do the GSI->slot mapping, to ensure
> >> non-conflicting GSIs. I don't think any hypervisor changes are required,
> >> let alone substantial ones.
> > 
> > Is the idea that xend would allocate a gsi to a device and
> > then pass that gsi along as part of the device configuration
> > to the device model?
> > 
> > If so, I think something similar to what I wrote, but moved
> > into xend could work quite well. But I sense that wasn't what
> > you had in mind either.
> 
> I mean that xend can pick a virtual devfn for the device that it knows has a
> non-conflicting GSI. This avoids any need for dynamic mapping between devfn
> and GSI (which would be more of a pain in the neck -- for example, your
> patch doesn't work because certain parts of BIOS info tables need to be
> dynamically generated, as currently they hardcode the devfn-GSI
> relationship).

Hi Keir,

I tried coding up this allocation idea in python with a view to
plugging it into xend. A description of that code and the code itself
is below. However, I think that I must still be misunderstanding what
you have in mind.

In order to allocate devfn for pass-through devices in xend it is
necessary to know about all devfn that are in use (or going to be used)
by qemu-dm. This includes ioemu devices (unless you don't care about
them sharing GSIs with pass-through devices). But as far as I can see
xend doesn't know anything about some ioemu devices, beyond, say,
requesting a NIC.

Adding some simple debugging code to pci_register_device(), I see
that it allocates the following devfn on boot.

pci_register_device: name=i440FX devfn=0
pci_register_device: name=PIIX3 devfn=16
pci_register_device: name=Cirrus VGA devfn=8
pci_register_device: name=xen-platform devfn=24
pci_register_device: name=RTL8139 devfn=32
pci_register_device: name=PIIX3 IDE devfn=17
pci_register_device: name=PIIX4 ACPI devfn=19
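Splitting those values into slot and function makes the layout easier to read: a devfn packs a 5-bit slot number above a 3-bit function number, so PIIX3, PIIX3 IDE and PIIX4 ACPI turn out to be functions 0, 1 and 3 of slot 2. The sketch below is illustrative only; `split_devfn()` and `group_by_slot()` are hypothetical helpers, not part of qemu-dm or xend.

```python
from collections import defaultdict

# devfn values printed by pci_register_device() in the log above
BOOT_DEVFNS = [
    ("i440FX", 0), ("PIIX3", 16), ("Cirrus VGA", 8), ("xen-platform", 24),
    ("RTL8139", 32), ("PIIX3 IDE", 17), ("PIIX4 ACPI", 19),
]


def split_devfn(devfn):
    """Return (slot, function): 5-bit slot above a 3-bit function number."""
    return devfn >> 3, devfn & 0x7


def group_by_slot(devfns):
    """Map each slot to a sorted list of (function, name) pairs."""
    slots = defaultdict(list)
    for name, devfn in devfns:
        dev, fn = split_devfn(devfn)
        slots[dev].append((fn, name))
    return {dev: sorted(fns) for dev, fns in sorted(slots.items())}


for dev, fns in group_by_slot(BOOT_DEVFNS).items():
    print("slot %d: %s" % (dev, ", ".join("fn%d=%s" % f for f in fns)))
```

This shows qemu-dm occupying slots 0 to 4, with slot 2 already multi-function.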

I am guessing that your idea was not for xend to allocate all
of those devfn and pass them to qemu-dm on the command line.
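For comparison, a minimal sketch of what a xend-side picker might look like, assuming xend were handed the complete list of devfn qemu-dm will use (here the boot-time list above) and conservatively assuming each occupied slot may use all four pins. `pin_gsis()` and `pick_slot()` are hypothetical helpers, not existing xend functions; the GSI formula is the `dev_intx_to_pcigsi()` swizzle from the code further down, without the offset for the 16 ISA IRQs.

```python
NR_PCI_DEV = 32
NR_PCI_INTX = 4
NR_PCIGSI = 32  # PCI GSIs above the 16 ISA IRQs


def pin_gsis(dev, pins=NR_PCI_INTX):
    """PCI GSI offsets used by the first `pins` interrupt pins of slot dev."""
    return {((dev << 2) + (dev >> 3) + intx) & (NR_PCIGSI - 1)
            for intx in range(pins)}


def pick_slot(in_use_devfns, pins=1):
    """Lowest free slot whose first `pins` GSIs avoid every GSI of an
    occupied slot, or -1 if there is none."""
    used_slots = {devfn >> 3 for devfn in in_use_devfns}
    used_gsis = set()
    for dev in used_slots:
        used_gsis |= pin_gsis(dev)  # assume all four pins may be in use
    for dev in range(NR_PCI_DEV):
        if dev not in used_slots and pin_gsis(dev, pins).isdisjoint(used_gsis):
            return dev
    return -1


print(pick_slot([0, 16, 8, 24, 32, 17, 19]))   # 5
print(pick_slot([d << 3 for d in range(8)]))   # -1
```

The second call shows how quickly a pessimistic view runs out of room: if slots 0 to 7 are each assumed to use four pins, all 32 PCI GSIs are taken, so the picker is only useful if xend knows precisely what qemu-dm registers.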

My instinct is that it would really be easier to allocate devfn,
using something like the algorithm I describe below, in qemu-dm
rather than xend.


Code description
----------------

It tries to handle the allocation
of (single-function) ioemu, single-function pass-through and
multi-function pass-through devices.

It tries to allocate devices such that there are as few
GSI clashes as possible (no clashes when there are few devices).
After that it allows ioemu devices to share a GSI with other devices,
but it never allows two pass-through devices to share a GSI.
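The sharing policy can be modelled in a few lines. This is a hypothetical, simplified sketch (`ToyGsiTable` is not part of the code below): it keeps only the rule that two pass-through devices never share a GSI while ioemu devices may share with anything, and picks the least-loaded allowed GSI with ties going to the lowest number, much as `find_empty()` does. Multi-function reservation is left out.

```python
from collections import defaultdict

NR_PCIGSI = 32  # PCI GSIs available above the 16 ISA IRQs


class ToyGsiTable:
    """Simplified model: "pt" devices never share a GSI with each other;
    "ioemu" devices may share with anything."""

    def __init__(self):
        self.users = defaultdict(list)  # pcigsi -> [(kind, name), ...]

    def _has_pt(self, pcigsi):
        return any(kind == "pt" for kind, _ in self.users[pcigsi])

    def place(self, name, kind):
        """Put the device on the least-loaded allowed GSI (lowest number on
        ties) and return that GSI, or -1 if no GSI is allowed."""
        best, best_load = -1, None
        for pcigsi in range(NR_PCIGSI):
            if kind == "pt" and self._has_pt(pcigsi):
                continue  # a second pass-through device would clash
            load = len(self.users[pcigsi])
            if best_load is None or load < best_load:
                best, best_load = pcigsi, load
        if best >= 0:
            self.users[best].append((kind, name))
        return best


table = ToyGsiTable()
print(table.place("nic", "ioemu"))  # 0
print(table.place("hba", "pt"))     # 1
```

With 32 pass-through devices every GSI is taken and a 33rd fails, yet an ioemu device can still be placed by sharing.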

It handles multi-function devices by reserving 4 consecutive GSIs
(one each for INTA-INTD) when function 0 of a multi-function device
is assigned.
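Why four consecutive GSIs are enough can be checked against the swizzle the code below uses: the INTA..INTD pins of one virtual slot land on four consecutive PCI GSIs (wrapping modulo 32), and each maps back to the same slot. A small standalone sketch reusing the two mapping formulas from the script (with `//` for integer division); `slot_window()` is a hypothetical helper added for illustration.

```python
NR_PCI_INTX = 4  # INTA..INTD
NR_PCIGSI = 32   # PCI GSIs above the 16 ISA IRQs


def dev_intx_to_pcigsi(dev, intx):
    # Same swizzle as gsi.dev_intx_to_pcigsi() in the script below
    return ((dev << 2) + (dev >> 3) + intx) & (NR_PCIGSI - 1)


def pcigsi_intx_to_dev(pcigsi, intx):
    # Same inverse as gsi.pcigsi_intx_to_dev() in the script below
    row = (pcigsi + NR_PCI_INTX - intx) % NR_PCI_INTX
    column = ((pcigsi + NR_PCIGSI - intx) % NR_PCIGSI) // NR_PCI_INTX
    return column + 8 * row


def slot_window(dev):
    """The four GSIs (INTA..INTD) of one virtual slot, in pin order."""
    return [dev_intx_to_pcigsi(dev, intx) for intx in range(NR_PCI_INTX)]


print(slot_window(0))   # [0, 1, 2, 3]
print(slot_window(23))  # [30, 31, 0, 1], wrapping past GSI 31

# Every (slot, pin) pair maps to a GSI that maps back to the same slot
for dev in range(32):
    for intx in range(NR_PCI_INTX):
        assert pcigsi_intx_to_dev(dev_intx_to_pcigsi(dev, intx), intx) == dev
```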

Unassignment is supported only for pass-through devices,
as ioemu devices don't support hot-plug.
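Besides never removing ioemu devices, the code below also refuses to remove function 0 of a device passed through as multi-function while other functions of it are still attached (see the comments in `unassign_pt_multi_fn0()`). A hypothetical, simplified model of just that ordering rule, ignoring GSIs entirely; `MultiFnTracker` is not part of the code below.

```python
class MultiFnTracker:
    """Function 0 of a multi-function pass-through device owns the reserved
    GSIs, so it may only be removed after all other functions."""

    def __init__(self):
        self.attached = {}  # phys_dev -> set of attached function numbers

    def attach(self, phys_dev, fn):
        self.attached.setdefault(phys_dev, set()).add(fn)

    def detach(self, phys_dev, fn):
        """Return True if the function was removed, False if refused."""
        fns = self.attached.get(phys_dev)
        if fns is None or fn not in fns:
            return False  # never passed through (e.g. an ioemu device)
        if fn == 0 and len(fns) > 1:
            return False  # other functions still rely on fn 0's GSIs
        fns.remove(fn)
        if not fns:
            del self.attached[phys_dev]
        return True


t = MultiFnTracker()
for fn in (0, 1, 5):
    t.attach(2, fn)
print(t.detach(2, 0))                                  # False
print(t.detach(2, 1), t.detach(2, 5), t.detach(2, 0))  # True True True
```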

The code I have so far is below.

-- 
Simon Horman
  VA Linux Systems Japan K.K., Sydney, Australia Satellite Office
  H: www.vergenet.net/~horms/             W: www.valinux.co.jp/en


#============================================================================
# This library is free software; you can redistribute it and/or
# modify it under the terms of version 2.1 of the GNU Lesser General Public
# License as published by the Free Software Foundation.
#
# This library is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
# Lesser General Public License for more details.
#
# You should have received a copy of the GNU Lesser General Public
# License along with this library; if not, write to the Free Software
# Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
#============================================================================
# Copyright (C) 2009 Simon Horman <horms@xxxxxxxxxxxx>
#============================================================================

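# Inside xend these imports would be used instead of the stdlib
# logging module, which is imported below for standalone testing
#from xen.xend.XendError import VmError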
#from xen.xend.XendLogging import log

import logging as log
log.basicConfig(level=log.DEBUG)


NR_PCI_DEV = 32
NR_PCI_INTX = 4
NR_PCI_DEVINTX = NR_PCI_DEV * NR_PCI_INTX

VIOAPIC_NUM_PINS = 48
NR_ISAIRQS = 16
NR_PCIGSI = VIOAPIC_NUM_PINS - NR_ISAIRQS

# No flags: IOEMU device
# DEVITNX_F_PT only: Passed-through as a single-function device
# DEVITNX_F_PT and DEVITNX_F_MULTI: Passed-through as a multi-function device
# DEVITNX_F_MULTI only: Place holder.
#                       Function 0 of a multi-function device
#                       has been passed through as a multi-function device
#                       and the subsequent 3 gsi are reserved for
#                       use by functions of the same physical device.
#                       If other functions are assigned they
#                       can use this place-holder if it hasn't
#                       already been used by another function of
#                       the same device.

DEVITNX_F_PT    = 0x1
DEVITNX_F_MULTI = 0x2
DEVITNX_F_HP    = 0x4

class gsi:
    """@assign devfn as per available gsi"""

    def __init__(self):
        # One list of [phys, virt, intx, flag] entries per PCI GSI
        self.pcigsi_devintx = [[] for pcigsi in range(NR_PCIGSI)]

    def entry_to_phys(self, entry):
        return entry[0]

    def entry_to_virt(self, entry):
        return entry[1]

    def entry_to_intx(self, entry):
        return entry[2]

    def entry_to_flag(self, entry):
        return entry[3]

    def entry_update(self, entry, phys, v_devfn, intx, flag):
        entry[0] = phys
        entry[1] = v_devfn
        entry[2] = intx
        entry[3] = flag
        return v_devfn

    def entry(self, phys, virt, intx, flag):
        return [phys, virt, intx, flag]

    def devfn_to_dev(self, devfn):
        return devfn >> 3

    def devfn_to_fn(self, devfn):
        return devfn & 0x7

    def devfn(self, dev, fn):
        return (dev << 3) | fn

    def pcigsi_intx_to_dev(self, pcigsi, intx):
        # Inverse of dev_intx_to_pcigsi(), see also:
        # http://lists.xensource.com/archives/html/xen-devel/2009-02/pngmIO1Sm0VEX.png
        row = (pcigsi + NR_PCI_INTX - intx) % NR_PCI_INTX
        column = ((pcigsi + NR_PCIGSI - intx) % NR_PCIGSI) // NR_PCI_INTX
        return column + (8 * row)

    def dev_intx_to_pcigsi(self, dev, intx):
        return (((dev)<<2) + ((dev)>>3) + (intx)) & (NR_PCIGSI - 1)

    def pcigsi_wrap(self, pcigsi):
        return (pcigsi) % NR_PCIGSI

    def density(self, pcigsi, width):
        density = 0
        for i in range(width):
            idx = self.pcigsi_wrap(pcigsi + i)
            density += len(self.pcigsi_devintx[idx])
        return density

    def collision(self, pcigsi, width, phys_devfn, flag):
        # ioemu devices don't care about collisions
        if not flag & (DEVITNX_F_PT|DEVITNX_F_MULTI):
            return False
        phys_dev = self.devfn_to_dev(phys_devfn)
        for i in range(width):
            idx = self.pcigsi_wrap(pcigsi + i)
            for j in range(len(self.pcigsi_devintx[idx])):
                entry = self.pcigsi_devintx[idx][j]
                e_flag = self.entry_to_flag(entry)
                e_dev = self.devfn_to_dev(self.entry_to_phys(entry))
                if e_flag & DEVITNX_F_PT or                             \
                   (not flag & DEVITNX_F_MULTI and e_flag & DEVITNX_F_MULTI):
                    return True
        return False

    def find_empty(self, intx, width, phys_devfn, flag):
        min_density = NR_PCI_DEVINTX
        min_pcigsi = 0
        # Don't do this, GSIs need to be used densely to maximise the
        # possibility of space being available for multi-function devices,
        # which need 4 contiguous GSIs
        # # Loop over devices and then convert them to pcigsi rather than
        # # looping over pcigsi directly to prefer lower numbered and
        # # contiguous virtual device assignment
        # for v_devfn in range(NR_PCI_DEV):
        #     pcigsi = self.dev_intx_to_pcigsi(v_devfn, intx)
        for pcigsi in range(NR_PCIGSI):
            if self.collision(pcigsi, width, phys_devfn, flag):
                continue
            density = self.density(pcigsi, width)
            if min_density > density:
                min_density = density
                min_pcigsi = pcigsi
        if min_density == NR_PCI_DEVINTX:
            return -1
        return min_pcigsi

    def assign(self, phys, intx, pcigsi, v_fn, flag):
        v_dev = self.pcigsi_intx_to_dev(pcigsi, intx)
        v_devfn = self.devfn(v_dev, v_fn)
        entry = self.entry(phys, v_devfn, intx, flag)
        self.pcigsi_devintx[pcigsi].append(entry)
        return v_devfn

    def pv_multi_fn0_pcigsi(self, phys_dev):
        for pcigsi in range(NR_PCIGSI):
            for i in range(len(self.pcigsi_devintx[pcigsi])):
                entry = self.pcigsi_devintx[pcigsi][i]
                flag = self.entry_to_flag(entry)
                devfn = self.entry_to_phys(entry)
                dev = self.devfn_to_dev(devfn)
                fn = self.devfn_to_fn(devfn)
                if flag & DEVITNX_F_PT and flag & DEVITNX_F_MULTI and  \
                   dev == phys_dev and fn == 0:
                    return pcigsi
        return -1

    def assign_pt_multi_fn0(self, phys_dev):
        # Function 0 of a multi-function device reserves 4 gsi
        # As the device may use INTB, C or D as well as INTA,
        # which is used by function 0
        phys_devfn = self.devfn(phys_dev, 0)
        pcigsi = self.find_empty(0, 4, phys_devfn,                      \
                                 DEVITNX_F_PT|DEVITNX_F_MULTI)
        if pcigsi < 0:
            log.debug("Can't find 4 contiguous free GSIs")
            return -1
        for i in range(1, 4):
            self.assign(phys_devfn, i, self.pcigsi_wrap(pcigsi + i), 0, \
                                  DEVITNX_F_MULTI)
        return self.assign(phys_devfn, 0, pcigsi, 0,                    \
                           DEVITNX_F_PT|DEVITNX_F_MULTI)

    def assign_pt_multi_fnx(self, phys_dev, phys_fn, phys_intx, fn0_pcigsi):
        # For a non-zero function of a multi-function device
        # when function 0 has already been inserted
        pcigsi = self.pcigsi_wrap(fn0_pcigsi + phys_intx)
        phys_devfn = self.devfn(phys_dev, phys_fn)
        for i in range(len(self.pcigsi_devintx[pcigsi])):
            entry = self.pcigsi_devintx[pcigsi][i]
            e_flag = self.entry_to_flag(entry)
            e_virt = self.entry_to_virt(entry)
            e_devfn = self.entry_to_phys(entry)
            e_dev = self.devfn_to_dev(e_devfn)
            if not e_flag & DEVITNX_F_PT and                           \
               e_flag & DEVITNX_F_MULTI and e_dev == phys_dev:
                # Reuse the place holder: keep its virtual slot but use
                # this function's number, and return the virtual devfn
                # like the other assign paths do
                v_devfn = self.devfn(self.devfn_to_dev(e_virt), phys_fn)
                return self.entry_update(entry, phys_devfn, v_devfn,    \
                                         phys_intx,                     \
                                         DEVITNX_F_PT|DEVITNX_F_MULTI)
        return self.assign(phys_devfn, phys_intx, pcigsi, phys_fn,      \
                           DEVITNX_F_PT|DEVITNX_F_MULTI)

    def unassign(self, pcigsi, idx):
        self.pcigsi_devintx[pcigsi].pop(idx)

    def pv_multi_dev_in_use_by_non_zero(self, phys_dev):
        ret = []
        for pcigsi in range(NR_PCIGSI):
            for i in range(len(self.pcigsi_devintx[pcigsi])):
                entry = self.pcigsi_devintx[pcigsi][i]
                e_flag = self.entry_to_flag(entry)
                e_devfn = self.entry_to_phys(entry)
                e_dev = self.devfn_to_dev(e_devfn)
                e_fn = self.devfn_to_fn(e_devfn)
                if e_flag & DEVITNX_F_PT and e_dev == phys_dev and e_fn != 0:
                    ret.append(entry)
        return ret

    def unassign_pt_multi_fn0(self, phys_dev, phys_fn):
        # Function 0 of a device that has been passed through
        # as multi-function can't be removed as long as other
        # functions are passed through as part of the same
        # virtual device
        error = self.pv_multi_dev_in_use_by_non_zero(phys_dev)
        if len(error):
            s = "Can't unassign dev=0x%02x,func=0x%x because "          \
                "it is still in use by:" % (phys_dev, phys_fn)
            for i in range(len(error)):
                entry = error[i]
                e_devfn = self.entry_to_phys(entry)
                e_dev = self.devfn_to_dev(e_devfn)
                e_fn = self.devfn_to_fn(e_devfn)
                s += " dev=0x%02x,func=0x%x" % (e_dev, e_fn)
            log.error(s)
            return -1
        # Remove the device and any place holders
        for pcigsi in range(NR_PCIGSI):
            remove = []
            for i in range(len(self.pcigsi_devintx[pcigsi])):
                entry = self.pcigsi_devintx[pcigsi][i]
                e_devfn = self.entry_to_phys(entry)
                e_dev = self.devfn_to_dev(e_devfn)
                e_flag = self.entry_to_flag(entry)
                if e_flag & (DEVITNX_F_PT|DEVITNX_F_MULTI) and          \
                   e_dev == phys_dev:
                    remove.append(i)
            removed = 0
            for i in remove:
                self.unassign(pcigsi, i - removed)
                removed += 1
        return 0

    def unassign_pt_multi_fnx(self, phys_dev, phys_fn, pcigsi, idx):
        # If this is the last function for this device
        # attached to this gsi then convert it to a place holder.
        # Otherwise remove it.
        density = 0
        for i in range(len(self.pcigsi_devintx[pcigsi])):
            entry = self.pcigsi_devintx[pcigsi][i]
            e_devfn = self.entry_to_phys(entry)
            e_dev = self.devfn_to_dev(e_devfn)
            e_flag = self.entry_to_flag(entry)
            if e_flag & (DEVITNX_F_PT|DEVITNX_F_MULTI) and             \
               e_dev == phys_dev:
                density += 1
        if density > 1:
            self.unassign(pcigsi, idx)
            return
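        # Keep the virtual slot and INTx so that the place holder can be
        # reused by assign_pt_multi_fnx() later
        entry = self.pcigsi_devintx[pcigsi][idx]
        e_devfn = self.entry_to_phys(entry)
        e_dev = self.devfn_to_dev(e_devfn)
        v_dev = self.devfn_to_dev(self.entry_to_virt(entry))
        self.entry_update(entry, self.devfn(e_dev, 0), self.devfn(v_dev, 0), \
                          self.entry_to_intx(entry), DEVITNX_F_MULTI)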

    # Functions below this line are intended to be public

    # idx is just a key for debugging.
    # It could be removed, or it could be used to search for devices
    # that have already been inserted.
    # Removal doesn't occur for ioemu devices, but if it did,
    # then idx could be used for that too.
    def assign_ioemu(self, idx):
        # IOEMU devices always use function 0 and thus INTA
        pcigsi = self.find_empty(0, 1, -1, 0)
        if pcigsi < 0:
            return -1
        return self.assign(idx, 0, pcigsi, 0, 0)

    # assign_pt_single and assign_pt_multi should probably
    # be a single function.

    def assign_pt_single(self, phys_dev, phys_fn):
        phys_devfn = self.devfn(phys_dev, phys_fn)
        pcigsi = self.find_empty(0, 1, phys_devfn, DEVITNX_F_PT)
        if pcigsi < 0:
            return -1
        return self.assign(phys_devfn, 0, pcigsi, 0, DEVITNX_F_PT)

    def assign_pt_multi(self, phys_dev, phys_fn, phys_intx):
        if phys_fn == 0:
            return self.assign_pt_multi_fn0(phys_dev)
        pcigsi = self.pv_multi_fn0_pcigsi(phys_dev)
        if pcigsi >= 0:
            return self.assign_pt_multi_fnx(phys_dev, phys_fn,          \
                                            phys_intx, pcigsi)
        # This does not get the DEVITNX_F_MULTI flag set because
        # it is not assigned to the same virtual device as function 0
        # and thus is passed-through as a single-function
        return self.assign_pt_single(phys_dev, phys_fn)

    def unassign_pt(self, phys_dev, phys_fn):
        phys_devfn = self.devfn(phys_dev, phys_fn)
        for pcigsi in range(NR_PCIGSI):
            for i in range(len(self.pcigsi_devintx[pcigsi])):
                entry = self.pcigsi_devintx[pcigsi][i]
                e_devfn = self.entry_to_phys(entry)
                e_flag = self.entry_to_flag(entry)
                if phys_devfn != e_devfn or not e_flag & DEVITNX_F_PT:
                    continue
                # If it isn't marked multi-function
                # just remove it as it doesn't affect any other entries
                if not e_flag & DEVITNX_F_MULTI:
                    self.unassign(pcigsi, i)
                    return 0
                # If it is a non-zero function, it can be removed as it
                # doesn't affect any other entries, but be sure to leave a
                # place holder as necessary
                e_fn = self.devfn_to_fn(e_devfn)
                if e_fn != 0:
                    self.unassign_pt_multi_fnx(phys_dev, phys_fn, pcigsi, i)
                    return 0
                # Function 0 of a device that has been passed through
                # as multi-function can't be removed as long as other
                # functions are passed through as part of the same
                # virtual device
                return self.unassign_pt_multi_fn0(phys_dev, phys_fn)
        log.error("Can't unassign device dev=0x%02x,func=0x%x, "        \
                  "it hasn't been assigned" % (phys_dev, phys_fn))
        return -1

    def dump(self):
        log.debug("-------")
        for pcigsi in range(NR_PCIGSI):
            if not self.density(pcigsi, 1):
                continue
            log.debug("pcigsi=%02x:" % pcigsi)
            for i in range(len(self.pcigsi_devintx[pcigsi])):
                s = "\t"
                entry = self.pcigsi_devintx[pcigsi][i]
                if self.entry_to_flag(entry) & (DEVITNX_F_PT|DEVITNX_F_MULTI):
                    s += "phys=0x%02x,0x%x " %                          \
                        (self.devfn_to_dev(self.entry_to_phys(entry)),  \
                         self.devfn_to_fn(self.entry_to_phys(entry)))
                else:
                    s += "phys=0x%02x     " % (self.entry_to_phys(entry))
                s += "virt=0x%02x,0x%x intx=0x%x flag=0x%x" %           \
                         (self.devfn_to_dev(self.entry_to_virt(entry)), \
                          self.devfn_to_fn(self.entry_to_virt(entry)),  \
                          self.entry_to_intx(entry),
                          self.entry_to_flag(entry))
                if self.entry_to_flag(entry) == DEVITNX_F_PT:
                    s += "(pv)"
                elif self.entry_to_flag(entry) ==                       \
                         DEVITNX_F_PT|DEVITNX_F_MULTI:
                    s += "(pv,multi)"
                elif self.entry_to_flag(entry) == DEVITNX_F_MULTI:
                    s += "(multi)"
                else:
                    s += "(ioemu)"
                log.debug("%s", s)

gsi = gsi()
#for i in range(128):

for dev in range(12):
    if dev % 4 == 0:
        v_dev = gsi.assign_ioemu(dev)
        if v_dev < 0:
            log.error("Can\'t assign ioemu device idx=0x%02x" % dev)
            gsi.dump()
        #gsi.dump()
    if dev % 4 == 1:
        for fn in range(8):
            v_dev = gsi.assign_pt_single(dev, fn)
            if v_dev < 0:
                log.error("Can\'t assign pt device dev=0x%02x,func=0x%x" %\
                              (dev, fn))
                gsi.dump()
        #gsi.dump()
        #for fn in range(8):
        #    v_dev = gsi.unassign_pt(dev, fn)
        #    if v_dev < 0:
        #        log.error("Can\'t unassign pv device dev=0x%02x,func=0x%x" %\
        #                      (dev, fn))
        #        gsi.dump()
        #gsi.dump()
    if dev % 4 == 2:
        for fn in range(8):
            intx = fn % 4
            v_dev = gsi.assign_pt_multi(dev, fn, intx)
            if v_dev < 0:
                log.error("Can\'t assign pv multi-function device "     \
                          "dev=0x%02x,func=0x%x,intx=0x%x" % (dev, fn, intx))
                gsi.dump()
        #gsi.dump()
        #for fn in range(8):
        #    intx = fn % 4
        #    v_dev = gsi.unassign_pt(dev, fn)
        #    if v_dev < 0:
        #        log.error("Can\'t unassign pv multi-function device "     \
        #                  "dev=0x%02x,func=0x%x" % (dev, fn))
        #        gsi.dump()
        #gsi.dump()
    if dev % 4 == 3:
        for fn in range(8):
            intx = fn % 4
            v_dev = gsi.assign_pt_multi(dev, fn, intx)
            if v_dev < 0:
                log.error("Can\'t assign pv multi-function device "     \
                          "dev=0x%02x,func=0x%x,intx=0x%x" % (dev, fn, intx))
                gsi.dump()
            v_dev = gsi.unassign_pt(dev, fn)
            if v_dev < 0:
                log.error("Can\'t unassign pv multi-function device "     \
                          "dev=0x%02x,func=0x%x" % (dev, fn))
                gsi.dump()
            v_dev = gsi.assign_pt_multi(dev, fn, intx)
            if v_dev < 0:
                log.error("Can\'t assign pv multi-function device "     \
                          "dev=0x%02x,func=0x%x,intx=0x%x" % (dev, fn, intx))
                gsi.dump()
        #gsi.dump()
        #for fn in [7, 6, 5, 4, 3, 2, 1, 0]:
        #    intx = fn % 4
        #    v_dev = gsi.unassign_pt(dev, fn)
        #    if v_dev < 0:
        #        log.error("Can\'t unassign pv multi-function device "     \
        #                  "dev=0x%02x,func=0x%x" % (dev, fn))
        #        gsi.dump()
        #    v_dev = gsi.assign_pt_multi(dev, fn, intx)
        #    if v_dev < 0:
        #        log.error("Can\'t assign pv multi-function device "     \
        #                  "dev=0x%02x,func=0x%x,intx=0x%x" % (dev, fn, intx))
        #        gsi.dump()
        #    v_dev = gsi.unassign_pt(dev, fn)
        #    if v_dev < 0:
        #        log.error("Can\'t unassign pv multi-function device "     \
        #                  "dev=0x%02x,func=0x%x" % (dev, fn))
        #        gsi.dump()
        #    gsi.dump()

gsi.dump()


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel


 

