
Re: [Xen-devel] migration regression in xen-4.11 and qemu-2.11 and qcow2


  • To: xen-devel@xxxxxxxxxxxxx
  • From: Olaf Hering <olaf@xxxxxxxxx>
  • Date: Tue, 8 May 2018 13:31:43 +0200
  • Delivery-date: Tue, 08 May 2018 11:32:02 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On Mon, 7 May 2018 17:19:46 +0200,
Olaf Hering <olaf@xxxxxxxxx> wrote:

> What I gathered during debugging so far is that somehow qemu on the receiving 
> side locks a region twice:

After further debugging with many wild printfs:
On the receiving side, blockdev_init sets BDRV_O_INACTIVE because
RUN_STATE_INMIGRATE is true.
BDRV_O_INACTIVE causes bdrv_is_writable to return false, so
bdrv_format_default_perms does not set BLK_PERM_WRITE in perms.
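That chain can be sketched roughly like this (a paraphrase of the behaviour
described above, not QEMU's actual code; the flag values and function bodies
are illustrative only):

```python
# Hedged paraphrase of the permission chain; flag values are made up.
BDRV_O_INACTIVE = 1 << 0   # hypothetical bit value
BLK_PERM_READ   = 1 << 1   # hypothetical bit value
BLK_PERM_WRITE  = 1 << 2   # hypothetical bit value

def bdrv_is_writable(open_flags: int, read_only: bool) -> bool:
    # A node opened inactive (incoming migration) is never writable.
    return not read_only and not (open_flags & BDRV_O_INACTIVE)

def bdrv_format_default_perms(open_flags: int, read_only: bool) -> int:
    perm = BLK_PERM_READ
    if bdrv_is_writable(open_flags, read_only):
        perm |= BLK_PERM_WRITE
    return perm

# RUN_STATE_INMIGRATE on the receiving side => BDRV_O_INACTIVE is set,
# so BLK_PERM_WRITE never makes it into the requested permissions.
print(bdrv_format_default_perms(BDRV_O_INACTIVE, False) & BLK_PERM_WRITE)
```

So the receiver intentionally asks for read-only permissions until the
incoming migration activates the images.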

On the sending side, offset 0xc9 is unlocked on the other fd, which allows
the F_WRLCK probe to succeed:
2018-05-08T11:20:54.491168Z qemu-system-i386: qemu_lock_fcntl: 28 c9 1 F_RDLCK>F_RDLCK 0 Success
2018-05-08T11:20:54.492162Z qemu-system-i386: qemu_lock_fd_test: 28 c9 1 F_WRLCK>F_UNLCK 0 Success
2018-05-08T11:20:54.494752Z qemu-system-i386: qemu_lock_fcntl: 28 c9 1 F_RDLCK>F_RDLCK 0 Success
2018-05-08T11:21:05.189455Z qemu-system-i386: qemu_lock_fcntl: 28 c9 1 F_RDLCK>F_RDLCK 0 Success
2018-05-08T11:21:05.190460Z qemu-system-i386: qemu_lock_fd_test: 28 c9 1 F_WRLCK>F_UNLCK 0 Success
2018-05-08T11:21:05.192726Z qemu-system-i386: qemu_lock_fcntl: 28 c9 1 F_RDLCK>F_RDLCK 0 Success
2018-05-08T11:21:05.194298Z qemu-system-i386: qemu_lock_fcntl: 28 c9 1 F_RDLCK>F_RDLCK 0 Success
2018-05-08T11:21:05.195079Z qemu-system-i386: qemu_lock_fd_test: 28 c9 1 F_WRLCK>F_UNLCK 0 Success
2018-05-08T11:21:05.197123Z qemu-system-i386: qemu_lock_fcntl: 28 c9 1 F_RDLCK>F_RDLCK 0 Success
2018-05-08T11:21:05.199378Z qemu-system-i386: qemu_lock_fcntl: 28 c9 1 F_RDLCK>F_RDLCK 0 Success
2018-05-08T11:21:05.201108Z qemu-system-i386: qemu_lock_fcntl: 28 c9 1 F_UNLCK>F_UNLCK 0 Success
2018-05-08T11:21:05.344335Z qemu-system-i386: qemu_lock_fcntl: 27 c9 1 F_UNLCK>F_UNLCK 0 Success
2018-05-08T11:21:05.345969Z qemu-system-i386: qemu_lock_fcntl: 27 c9 1 F_RDLCK>F_RDLCK 0 Success
2018-05-08T11:21:05.346836Z qemu-system-i386: qemu_lock_fd_test: 27 c9 1 F_WRLCK>F_UNLCK 0 Success
2018-05-08T11:21:05.348937Z qemu-system-i386: qemu_lock_fcntl: 27 c9 1 F_RDLCK>F_RDLCK 0 Success
2018-05-08T11:21:05.359691Z qemu-system-i386: qemu_lock_fcntl: 27 c9 1 F_RDLCK>F_RDLCK 0 Success
2018-05-08T11:21:05.360632Z qemu-system-i386: qemu_lock_fd_test: 27 c9 1 F_WRLCK>F_UNLCK 0 Success
2018-05-08T11:21:05.363221Z qemu-system-i386: qemu_lock_fcntl: 27 c9 1 F_RDLCK>F_RDLCK 0 Success
2018-05-08T11:21:05.364781Z qemu-system-i386: qemu_lock_fcntl: 27 c9 1 F_RDLCK>F_RDLCK 0 Success
2018-05-08T11:21:05.365607Z qemu-system-i386: qemu_lock_fd_test: 27 c9 1 F_WRLCK>F_UNLCK 0 Success
2018-05-08T11:21:05.367794Z qemu-system-i386: qemu_lock_fcntl: 27 c9 1 F_RDLCK>F_RDLCK 0 Success

It seems that on the receiving side some code forgets to unlock offset 0xc9,
which causes the F_WRLCK probe to fail:
2018-05-08T11:21:52.108809Z qemu-system-i386: qemu_lock_fcntl: 27 c9 1 F_UNLCK>F_UNLCK 0 Success
2018-05-08T11:21:52.112193Z qemu-system-i386: qemu_lock_fcntl: 27 c9 1 F_RDLCK>F_RDLCK 0 Success
2018-05-08T11:21:52.113028Z qemu-system-i386: qemu_lock_fd_test: 27 c9 1 F_WRLCK>F_UNLCK 0 Success
2018-05-08T11:21:52.115401Z qemu-system-i386: qemu_lock_fcntl: 27 c9 1 F_RDLCK>F_RDLCK 0 Success
2018-05-08T11:21:52.122037Z qemu-system-i386: qemu_lock_fcntl: 27 c9 1 F_RDLCK>F_RDLCK 0 Success
2018-05-08T11:21:52.122886Z qemu-system-i386: qemu_lock_fd_test: 27 c9 1 F_WRLCK>F_UNLCK 0 Success
2018-05-08T11:21:52.125189Z qemu-system-i386: qemu_lock_fcntl: 27 c9 1 F_RDLCK>F_RDLCK 0 Success
2018-05-08T11:21:52.126969Z qemu-system-i386: qemu_lock_fcntl: 27 c9 1 F_RDLCK>F_RDLCK 0 Success
2018-05-08T11:21:52.127801Z qemu-system-i386: qemu_lock_fd_test: 27 c9 1 F_WRLCK>F_UNLCK 0 Success
2018-05-08T11:21:52.130109Z qemu-system-i386: qemu_lock_fcntl: 27 c9 1 F_RDLCK>F_RDLCK 0 Success
2018-05-08T11:21:52.859199Z qemu-system-i386: qemu_lock_fcntl: 39 c9 1 F_UNLCK>F_UNLCK 0 Success
2018-05-08T11:21:52.862010Z qemu-system-i386: qemu_lock_fcntl: 39 c9 1 F_RDLCK>F_RDLCK 0 Success
2018-05-08T11:21:52.862673Z qemu-system-i386: qemu_lock_fd_test: 39 c9 1 F_WRLCK>F_RDLCK 0 Success
2018-05-08T11:21:53.112935Z qemu-system-i386: qemu_lock_fd_test: 39 c9 1 F_WRLCK>F_RDLCK 0 Success
2018-05-08T11:21:53.363246Z qemu-system-i386: qemu_lock_fd_test: 39 c9 1 F_WRLCK>F_RDLCK 0 Success
2018-05-08T11:21:53.615668Z qemu-system-i386: qemu_lock_fcntl: 39 c9 1 F_UNLCK>F_UNLCK 0 Success
2018-05-08T11:21:53.616426Z qemu-system-i386: qemu_lock_fcntl: 39 c9 1 F_UNLCK>F_UNLCK 0 Success
2018-05-08T11:21:53.616816Z qemu-system-i386: qemu_lock_fcntl: 39 c9 1 F_UNLCK>F_UNLCK 0 Success
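The fcntl semantics behind these traces can be reproduced in isolation. The
following is my own standalone test program (not QEMU code); the 0xc9 offset
is taken from the traces above. While a second process holds an F_RDLCK on
that byte, an exclusive-lock probe fails, matching the "F_WRLCK>F_RDLCK"
lines; once the read lock is dropped, the same probe succeeds:

```python
import fcntl
import os
import tempfile

OFFSET = 0xC9  # the byte offset that stays locked in the failing trace

path = tempfile.mkstemp()[1]
with open(path, "wb") as img:
    img.write(b"\0" * 0x100)

ready_r, ready_w = os.pipe()   # child -> parent: "lock is held"
go_r, go_w = os.pipe()         # parent -> child: "you may exit"

pid = os.fork()
if pid == 0:
    # Child plays the peer that holds a shared (read) lock on the byte.
    cfd = os.open(path, os.O_RDWR)
    fcntl.lockf(cfd, fcntl.LOCK_SH, 1, OFFSET)
    os.write(ready_w, b"x")
    os.read(go_r, 1)           # keep the lock until the parent is done
    os._exit(0)                # exiting releases the lock

os.read(ready_r, 1)            # wait until the child's F_RDLCK is in place
fd = os.open(path, os.O_RDWR)
try:
    # Non-blocking exclusive probe, analogous to qemu_lock_fd_test asking
    # for F_WRLCK: it fails while the foreign read lock exists.
    fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB, 1, OFFSET)
    blocked = False
except OSError:
    blocked = True

os.write(go_w, b"x")           # let the child exit and drop its lock
os.waitpid(pid, 0)

# After the read lock is gone, the same probe succeeds.
fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB, 1, OFFSET)
acquired_after_unlock = True

print("blocked while F_RDLCK held:", blocked)
print("acquired after unlock:", acquired_after_unlock)
os.unlink(path)
```

So the failing trace is consistent with some receiver-side code still
holding its F_RDLCK on 0xc9 when the write-lock upgrade is attempted.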


It is unclear why this was never noticed in xen-4.10; qemu-2.9 did not have
that bug.
Also, whether a KVM or a Xen guest is being migrated should make zero
difference to the qcow2 driver...


Olaf


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel

 

