
Re: attaching device to PV guest broken by your rework of libxl's PCI handling?



On 09/12/2021 04:17, Jan Beulich wrote:
> Paul,
>
> in 0fdb48ffe7a1 ("libxl: Make sure devices added by pci-attach are
> reflected in the config") you've moved down the invocation of
> libxl__create_pci_backend() from libxl__device_pci_add_xenstore().
> In the PV case, soon after the original invocation place there is
>
>      if (!starting && domtype == LIBXL_DOMAIN_TYPE_PV) {
>          if (libxl__wait_for_backend(gc, be_path,
>                                      GCSPRINTF("%d", XenbusStateConnected)) < 0)
>              return ERROR_FAIL;
>      }
>
> Afaict the only way this wait could succeed is if the backend was
> created up front. The lack thereof does, I think, explain a report
> we've had:
>
> vh015:~ # xl -vvv pci-attach sles-15-sp4-64-pv-def-net 63:11.4
> libxl: debug: libxl_pci.c:1561:libxl_device_pci_add: Domain 18:ao 0x55a517704170: create: how=(nil) callback=(nil) poller=0x55a517704210
> libxl: debug: libxl_qmp.c:1921:libxl__ev_qmp_dispose:  ev 0x55a5177047e8
> libxl: error: libxl_device.c:1393:libxl__wait_for_backend: Backend /local/domain/0/backend/pci/18/0 does not exist
> libxl: error: libxl_pci.c:1779:device_pci_add_done: Domain 18:libxl__device_pci_add failed for PCI device 0:63:11.4 (rc -3)
> libxl: error: libxl_device.c:1420:device_addrm_aocomplete: unable to add device


Wow. It must be a year since those patches went in... Most of the context has disappeared from my mind.

> Since I don't fully understand what that commit as a whole is
> doing, and since the specific change in the sequence of operations
> also isn't explained in the description (or at least not in a way
> for me to recognize the connection), I'm afraid I can't see what a
> possible solution to this could look like. The best guess I could
> come up with so far is that the code quoted above may also need
> moving down, but I can't tell at all whether doing this after the
> various other intermediate steps wouldn't be too late. Your input
> (or even better a patch) would be highly appreciated.

The commit comment explains the problem that it is trying to fix but I agree that it does not call out the new sequence. The issue IIRC was in what happened before the call to device_add_domain_config() and what happened afterwards. In fixing that I guess I missed this immediate use of xenstore.

I *think* the correct fix would be to move the wait into the end of libxl__create_pci_backend(), which is where the frontend and backend state nodes are now set.
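To make the ordering problem concrete, here is a minimal toy model in C. It assumes nothing about libxl internals beyond what the thread states: the PV hotplug path waits for the backend to reach the connected state, and that wait fails at once if the backend node does not exist yet. Every name here (`toy_store`, `toy_create_pci_backend`, `toy_wait_for_backend`, and the two `toy_add_pv_device_*` functions) is hypothetical and is not libxl code. `TOY_STATE_CONNECTED` simply mirrors the value of `XenbusStateConnected`. The toy backend connects instantly, so no real polling or timeout is modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Toy model only: these are NOT the real libxl functions. */

/* Mirrors the value of XenbusStateConnected in Xen's xenbus state enum. */
#define TOY_STATE_CONNECTED 4

/* A one-device stand-in for the backend directory in xenstore. */
struct toy_store {
    bool backend_exists; /* has the backend/pci/<domid>/0 node been written? */
    int backend_state;   /* contents of the backend "state" node */
};

/* Stand-in for libxl__create_pci_backend(): writes the backend nodes.
 * For simplicity the toy backend becomes "connected" immediately. */
static void toy_create_pci_backend(struct toy_store *xs)
{
    assert(xs != NULL);
    xs->backend_exists = true;
    xs->backend_state = TOY_STATE_CONNECTED;
}

/* Stand-in for libxl__wait_for_backend(): fails at once if the backend
 * node is missing, otherwise succeeds only if it is in want_state. */
static int toy_wait_for_backend(const struct toy_store *xs, int want_state)
{
    assert(xs != NULL);
    if (!xs->backend_exists) {
        fprintf(stderr, "toy: backend does not exist\n");
        return -1;
    }
    return xs->backend_state == want_state ? 0 : -1;
}

/* Order after the rework: the PV hotplug wait runs before the backend
 * has been created, so for a hotplugged PV device it always fails. */
static int toy_add_pv_device_old_order(struct toy_store *xs,
                                       bool starting, bool is_pv)
{
    if (!starting && is_pv) {
        if (toy_wait_for_backend(xs, TOY_STATE_CONNECTED) < 0)
            return -1;
    }
    toy_create_pci_backend(xs);
    return 0;
}

/* Suggested order: wait at the end of backend creation, i.e. only after
 * the frontend/backend state nodes have been written. */
static int toy_add_pv_device_new_order(struct toy_store *xs,
                                       bool starting, bool is_pv)
{
    toy_create_pci_backend(xs);
    if (!starting && is_pv) {
        if (toy_wait_for_backend(xs, TOY_STATE_CONNECTED) < 0)
            return -1;
    }
    return 0;
}
```

In this model a PV hotplug (`starting == false`, `is_pv == true`) fails with the old order and succeeds with the new one. During domain creation (`starting == true`) both orders succeed, because the wait is skipped. That matches the report being about `xl pci-attach` rather than devices listed in the guest config.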

  Paul



> Thanks, Jan





 

