

[Xen-API] [PATCH 0 of 4] Fix a deadlock found by stress testing

To: xen-api@xxxxxxxxxxxxxxxxxxx
Subject: [Xen-API] [PATCH 0 of 4] Fix a deadlock found by stress testing
From: David Scott <dave.scott@xxxxxxxxxxxxx>
Date: Thu, 10 Dec 2009 23:04:54 +0000
Delivery-date: Thu, 10 Dec 2009 15:07:51 -0800
Sender: xen-api-bounces@xxxxxxxxxxxxxxxxxxx

CA-33707: fix a nasty queueing deadlock found by stress testing.

In the API call paths for {clean,hard}_{shutdown,reboot} we used to hold the
per-VM mutex and then block in the vm_lifecycle_op queue. This could deadlock
if an event (e.g. DevThread) was generated first and its handler was queued
ahead of the API call: once the handler reached the head of the queue it
blocked trying to acquire the same per-VM mutex, which the API call still held
while waiting behind it in the queue.
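
For illustration only, the wait cycle described above can be modelled in a few
lines of OCaml. This is a toy model, not xapi code: the type 'task' and the
function 'deadlocked' are hypothetical names, and the model assumes that
whichever task is at the head of the queue needs the per-VM mutex to make
progress.

```ocaml
(* Toy model of the wait cycle; all names here are hypothetical. *)
type task = Api_call | Event_handler

(* [queue] is in FIFO order, head first.
   [lock_holder] is the task currently holding the per-VM mutex, if any. *)
let deadlocked ~queue ~lock_holder =
  match queue, lock_holder with
  | head :: rest, Some holder ->
    (* The head of the queue needs the mutex, but the task holding it is
       stuck further back in the same queue: neither can make progress. *)
    head <> holder && List.mem holder rest
  | _ -> false

(* Old behaviour: the API call takes the mutex, then queues behind the handler. *)
let old_behaviour =
  deadlocked ~queue:[Event_handler; Api_call] ~lock_holder:(Some Api_call)

(* New behaviour: nothing in the queue holds the mutex while it waits. *)
let new_behaviour =
  deadlocked ~queue:[Event_handler; Api_call] ~lock_holder:None
```

Under these assumptions 'old_behaviour' is true and 'new_behaviour' is false.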

Instead we now hold the per-VM lock only while doing the actual domain destroy
and recreate operations. We ask domains to shut down without any lock held:
this means we may interleave (e.g.) an API call VM.clean_shutdown with an
internal guest reboot. Conflicts are resolved in favour of the API calls.

We should never block with the per-VM mutex held.
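
As a rough sketch of this rule (not the code in the patch), the per-VM lock
can be modelled as a flag and every blocking step made to fail loudly if the
flag is set. The record 'vm' and the functions 'with_vm_lock', 'blocking',
'clean_shutdown' and 'old_clean_shutdown' are all made up for this example;
the real code uses a real mutex and a real queue.

```ocaml
(* Hypothetical model: none of these names are taken from xapi. *)
type vm = { mutable locked : bool; mutable log : string list }

let record vm step = vm.log <- step :: vm.log

(* Run [f] with the (modelled) per-VM lock held, releasing it even if [f] raises. *)
let with_vm_lock vm f =
  assert (not vm.locked);
  vm.locked <- true;
  Fun.protect ~finally:(fun () -> vm.locked <- false) f

(* A step that may block; it refuses to run while the lock is held. *)
let blocking vm step =
  if vm.locked then failwith ("blocked with per-VM lock held: " ^ step);
  record vm step

(* New shape: ask and wait without the lock, lock only around the destroy. *)
let clean_shutdown vm =
  blocking vm "request_shutdown";
  blocking vm "wait_in_queue";
  with_vm_lock vm (fun () -> record vm "destroy_domain")

(* Old shape: queueing happens inside the locked region, which the model rejects. *)
let old_clean_shutdown vm =
  with_vm_lock vm (fun () ->
    blocking vm "wait_in_queue";
    record vm "destroy_domain")
```

In this model 'clean_shutdown' completes normally, while 'old_clean_shutdown'
raises Failure as soon as it tries to wait with the lock held.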

Note that the event thread and the main API call path now both use the same
queue, 'domU_internal_shutdown'.

We also add a set of unit tests to quicktest to check that the result is as
expected for every combination of relevant API call, relevant domain shutdown
and codepath (synchronous API or asynchronous event thread).
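
To give an idea of the shape of such a test matrix, here is a minimal sketch.
It is not the contents of quicktest_lifecycle.ml: the types, the assumed set
of guest-initiated actions and the 'expected' function are illustrative
guesses, with the expectation derived only from the rule that conflicts are
resolved in favour of the API call.

```ocaml
(* Illustrative only: these types and names are assumptions, not quicktest code. *)
type api_call = Clean_shutdown | Hard_shutdown | Clean_reboot | Hard_reboot
type guest_action = Guest_shutdown | Guest_reboot   (* assumed set of domain shutdowns *)
type codepath = Synchronous_api | Event_thread
type power_state = Halted | Running

let api_calls = [Clean_shutdown; Hard_shutdown; Clean_reboot; Hard_reboot]
let guest_actions = [Guest_shutdown; Guest_reboot]
let codepaths = [Synchronous_api; Event_thread]

(* Every (API call, guest action, codepath) combination. *)
let all_cases =
  List.concat
    (List.map
       (fun api ->
          List.concat
            (List.map
               (fun guest -> List.map (fun path -> (api, guest, path)) codepaths)
               guest_actions))
       api_calls)

(* Assumed expectation: the API call wins any conflict with the guest. *)
let expected (api, _guest, _path) =
  match api with
  | Clean_shutdown | Hard_shutdown -> Halted
  | Clean_reboot | Hard_reboot -> Running
```

A test driver would then iterate over 'all_cases', provoke each combination
and compare the observed power state with 'expected'.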

Signed-off-by: David Scott <dave.scott@xxxxxxxxxxxxx>

15 files changed, 540 insertions(+), 170 deletions(-)
ocaml/idl/api_errors.ml           |    3 
ocaml/idl/datamodel.ml            |    7 
ocaml/xapi/OMakefile              |    2 
ocaml/xapi/events.ml              |   23 ++-
ocaml/xapi/quicktest.ml           |    1 
ocaml/xapi/quicktest_lifecycle.ml |  194 ++++++++++++++++++++++++++
ocaml/xapi/vmops.ml               |   91 ++++++------
ocaml/xapi/xapi_fist.ml           |   17 ++
ocaml/xapi/xapi_vm.ml             |  273 ++++++++++++++++++++++++++-----------
ocaml/xapi/xapi_vm.mli            |    5 
ocaml/xapi/xapi_vm_migrate.ml     |   15 +-
ocaml/xenops/domain.ml            |   62 ++++----
ocaml/xenops/domain.mli           |    5 
ocaml/xenops/watch.ml             |    5 
ocaml/xenops/xenops.ml            |    7 
