[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH 3/3] piix4acpi, xen, hotplug: Fix race with ACPI AML code and hotplug.



On Mon, May 13, 2013 at 02:20:00PM -0400, Konrad Rzeszutek Wilk wrote:
> On Mon, May 13, 2013 at 06:17:50PM +0100, Stefano Stabellini wrote:
> > On Mon, 13 May 2013, Konrad Rzeszutek Wilk wrote:
> > > This is a race so the amount varies but on a 4PCPU box
> > > I seem to get only ~14 out of 16 vCPUs I want to online.
> > > 
> > > The issue at hand is that QEMU xenstore.c hotplug code changes
> > > the vCPU array and triggers an ACPI SCI for each vCPU
> > > online/offline change. That means we modify the array of vCPUs
> > > as the guests ACPI AML code is reading it - resulting in
> > > the guest reading the data only once and not changing the
> > > CPU states appropiately.
> > > 
> > > The fix is to seperate the vCPU array changes from the ACPI SCI
> > > notification. The code now will enumerate all of the vCPUs
> > > and change the vCPU array if there is a need for a change.
> > > If a change did occur then only _one_ ACPI SCI pulse is sent
> > > to the guest. The vCPU array at that point has the online/offline
> > > modified to what the user wanted to have.
> > > 
> > > Specifically, if a user provided this command:
> > >  xl vcpu-set latest 16
> > > 
> > > (guest config has vcpus=1, maxvcpus=32) QEMU and the guest
> > > (in this case Linux) would do:
> > > 
> > > QEMU:                                           Guest OS:
> > > -xenstore_process_vcpu_set_event
> > >  -> Gets an XenBus notification for CPU1
> > >  -> Updates the gpe_state.cpus_state bitfield.
> > >         -> Pulses the ACPI SCI
> > >                                                 - ACPI SCI kicks in
> > > 
> > >  -> Gets an XenBus notification for CPU2
> > >  -> Updates the gpe_state.cpus_state bitfield.
> > >         -> Pulses the ACPI SCI
> > > 
> > >  -> Gets an XenBus notification for CPU3
> > >  -> Updates the gpe_state.cpus_state bitfield.
> > >         -> Pulses the ACPI SCI
> > >    ...
> > >                                                  - Method(PRST) invoked
> > > 
> > >  -> Gets an XenBus notification for CPU12
> > >  -> Updates the gpe_state.cpus_state bitfield.
> > >         -> Pulses the ACPI SCI
> > >                                                   - reads AF00 for CPU 
> > > state
> > >                                                     [gets 0xff]
> > >                                                   - reads AF02 [gets 0x7f]
> > > 
> > >  -> Gets an XenBus notification for CPU13
> > >  -> Updates the gpe_state.cpus_state bitfield.
> > >         -> Pulses the ACPI SCI
> > > 
> > >         .. until VCPU 16
> > >                                                  - Method PRST updates
> > >                                                    PR01 through 13 FLG
> > >                                                    entry.
> > >                                                  - PR01->PR13 _MAD
> > >                                                    invoked.
> > > 
> > >                                                  - Brings up 13 CPUs.
> > > 
> > > While QEMU updates the rest of the cpus_state bitfields the ACPI AML
> > > only does the CPU hotplug on those it had read.
> > > 
> > > Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx>
> > > ---
> > >  xenstore.c | 68 
> > > ++++++++++++++++++++++++++++++++++++++++++++++++++++++++------
> > >  1 file changed, 62 insertions(+), 6 deletions(-)
> > > 
> > > diff --git a/xenstore.c b/xenstore.c
> > > index c861d36..31dbbe5 100644
> > > --- a/xenstore.c
> > > +++ b/xenstore.c
> > > @@ -999,25 +999,24 @@ void xenstore_record_dm_state(const char *state)
> > >  {
> > >      xenstore_record_dm("state", state);
> > >  }
> > > -
> > > -static void xenstore_process_vcpu_set_event(char **vec)
> > > +static int xenstore_process_one_vcpu_set_event(char *node)
> > >  {
> > >      char *act = NULL;
> > > -    char *vcpustr, *node = vec[XS_WATCH_PATH];
> > > +    char *vcpustr;
> > >      unsigned int vcpu, len;
> > >      int changed = -EINVAL;
> > >  
> > >      vcpustr = strstr(node, "cpu/");
> > >      if (!vcpustr) {
> > >          fprintf(stderr, "vcpu-set: watch node error.\n");
> > > -        return;
> > > +        return changed;
> > >      }
> > >      sscanf(vcpustr, "cpu/%u", &vcpu);
> > >  
> > >      act = xs_read(xsh, XBT_NULL, node, &len);
> > >      if (!act) {
> > >          fprintf(stderr, "vcpu-set: no command yet.\n");
> > > -        return;
> > > +        return changed;
> > >      }
> > >  
> > >      if (!strncmp(act, "online", len))
> > > @@ -1028,8 +1027,65 @@ static void xenstore_process_vcpu_set_event(char 
> > > **vec)
> > >          fprintf(stderr, "vcpu-set: command error.\n");
> > >  
> > >      free(act);
> > > -    if (changed > 0)
> > > +    return changed;
> > > +}
> > > +static void xenstore_process_vcpu_set_event(char **vec)
> > > +{
> > > +    int changed = 0, rc, i, num = 0;
> > > +    char *vcpu, **dir;
> > > +    char *path = vec[XS_WATCH_PATH];
> > > +
> > > +    /*
> > > +     * Process the event right away in case the loop below fails
> > > +     * to enumerate the correct vCPU.
> > > +     */
> > > +    rc = xenstore_process_one_vcpu_set_event(path);
> > > +    if (rc > 0)
> > > +        changed = 1;
> > 
> > I am not sure I follow you here: why the loop below would fail?
> 
> If there is an error (for example the xs_read fails), then the loop
> below would stop as rc == -EINVAL.
> 
> > 
> > > +    /*
> > > +     * We get: /local/domain/<domid>/cpu/<vcpu>/availability or
> > > +     * (at init) /local/domain/<domid>/cpu [ignore it] and need to
> > > +     * iterate over /local/domain/<domid>/cpu/ directory.
> > > +     */
> > > +    vcpu = strstr(path, "cpu/");
> > > +    if (!vcpu) {
> > > +        fprintf(stderr,"[%s]: %s has no CPU!\n", __func__, path);
> > > +        return;
> > > +    }
> > > +    /* Eliminate '/availability' */
> > > +    vcpu[3] = '\0';
> > > +    dir = xs_directory(xsh, XBT_NULL, path, &num);
> > > +
> > > +    if (!dir) {
> > > +        fprintf(stderr, "[%s]: directory %s has no dirs!\n", __func__, 
> > > path);
> > > +        return;
> > > +    }
> > > +    if (num != vcpus)
> > > +        fprintf(stderr, "[%s]: %d (number of vcpu entries) != %d 
> > > (maxvcpus)! "\
> > > +                "Continuing on..\n", __func__, num, vcpus);
> > > +
> > > +    for (i = 0; i < num; i++) {
> > > +        char *attr;
> > > +        ssize_t len = strlen(path) + strlen(dir[i]) + 
> > > strlen("availability") +
> > > +                      3 /* account for two '/' and '\0'*/;
> > > +        attr = malloc(len);
> > 
> > just use XEN_BUFSIZE and put in on the stack
> 
> OK.
> > 
> > 
> > > +
> > > +        /* Construct "/local/domain/<domid>/cpu" (path) with <vcpu> 
> > > (attr),
> > > +         * and "availability" with '/' sprinkled around. */
> > > +        snprintf(attr, len, "%s/%s/%s", path, dir[i],  "availability");
> > > +        rc = xenstore_process_one_vcpu_set_event(attr);
> > > +
> > > +        free(attr);
> > > +        if (rc > 0)
> > > +            changed = 1;
> > > +        if (rc < 0)
> > > +            break;
> > 
> > Given that the user could provide a mask of vcpus to enable/disable,
> > shouldn't we just process them all anyway?
> 
> rc < 0 is if an error occurs. 0 is if there are no changes.
> 
> And xenstore_process_one_vcpu_set_event could fail (say xs_read). The top
> of the function would process the vCPU for which the event occured. Please
> keep in mind that this function is called for _each_ vCPU that has changed.
> 
> > > +    free (dir);
> > > +    if (changed > 0) {
> > > +        fprintf(stderr, "Notifying OS about CPU hotplug changes.\n");
> > >          qemu_cpu_notify();
> > > +    }
> > >      return;
> > >  }
> > >  
> > > -- 

New patch [I am still unsure whether you want me to just continue with the loop
and ignore errors - and still at the start of the function process the event or
restructure it so that the errors are ignored and the loop iterates over all of
the vCPUs?]


>From 950424a5e6a481dd1b2b4fb2645e7a852c199b7f Mon Sep 17 00:00:00 2001
From: Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx>
Date: Mon, 13 May 2013 11:55:42 -0400
Subject: [PATCH] piix4acpi, xen, hotplug: Fix race with ACPI AML code and
 hotplug.

This is a race so the amount varies but on a 4PCPU box
I seem to get only ~14 out of 16 vCPUs I want to online.

The issue at hand is that QEMU xenstore.c hotplug code changes
the vCPU array and triggers an ACPI SCI for each vCPU
online/offline change. That means we modify the array of vCPUs
as the guests ACPI AML code is reading it - resulting in
the guest reading the data only once and not changing the
CPU states appropiately.

The fix is to seperate the vCPU array changes from the ACPI SCI
notification. The code now will enumerate all of the vCPUs
and change the vCPU array if there is a need for a change.
If a change did occur then only _one_ ACPI SCI pulse is sent
to the guest. The vCPU array at that point has the online/offline
modified to what the user wanted to have.

Specifically, if a user provided this command:
 xl vcpu-set latest 16

(guest config has vcpus=1, maxvcpus=32) QEMU and the guest
(in this case Linux) would do:

QEMU:                                           Guest OS:
-xenstore_process_vcpu_set_event
 -> Gets an XenBus notification for CPU1
 -> Updates the gpe_state.cpus_state bitfield.
        -> Pulses the ACPI SCI
                                                - ACPI SCI kicks in

 -> Gets an XenBus notification for CPU2
 -> Updates the gpe_state.cpus_state bitfield.
        -> Pulses the ACPI SCI

 -> Gets an XenBus notification for CPU3
 -> Updates the gpe_state.cpus_state bitfield.
        -> Pulses the ACPI SCI
   ...
                                                 - Method(PRST) invoked

 -> Gets an XenBus notification for CPU12
 -> Updates the gpe_state.cpus_state bitfield.
        -> Pulses the ACPI SCI
                                                  - reads AF00 for CPU state
                                                    [gets 0xff]
                                                  - reads AF02 [gets 0x7f]

 -> Gets an XenBus notification for CPU13
 -> Updates the gpe_state.cpus_state bitfield.
        -> Pulses the ACPI SCI

        .. until VCPU 16
                                                 - Method PRST updates
                                                   PR01 through 13 FLG
                                                   entry.
                                                 - PR01->PR13 _MAD
                                                   invoked.

                                                 - Brings up 13 CPUs.

While QEMU updates the rest of the cpus_state bitfields the ACPI AML
only does the CPU hotplug on those it had read.

[v1: Use stack for the 'attr' instead of malloc/free]
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx>
---
 xenstore.c |   65 ++++++++++++++++++++++++++++++++++++++++++++++++++++++-----
 1 files changed, 59 insertions(+), 6 deletions(-)

diff --git a/xenstore.c b/xenstore.c
index c861d36..b0d6f77 100644
--- a/xenstore.c
+++ b/xenstore.c
@@ -22,6 +22,7 @@
 #include "pci.h"
 #include "qemu-timer.h"
 #include "qemu-xen.h"
+#include "xen_backend.h"
 
 struct xs_handle *xsh = NULL;
 static char *media_filename[MAX_DRIVES+1];
@@ -999,25 +1000,24 @@ void xenstore_record_dm_state(const char *state)
 {
     xenstore_record_dm("state", state);
 }
-
-static void xenstore_process_vcpu_set_event(char **vec)
+static int xenstore_process_one_vcpu_set_event(char *node)
 {
     char *act = NULL;
-    char *vcpustr, *node = vec[XS_WATCH_PATH];
+    char *vcpustr;
     unsigned int vcpu, len;
     int changed = -EINVAL;
 
     vcpustr = strstr(node, "cpu/");
     if (!vcpustr) {
         fprintf(stderr, "vcpu-set: watch node error.\n");
-        return;
+        return changed;
     }
     sscanf(vcpustr, "cpu/%u", &vcpu);
 
     act = xs_read(xsh, XBT_NULL, node, &len);
     if (!act) {
         fprintf(stderr, "vcpu-set: no command yet.\n");
-        return;
+        return changed;
     }
 
     if (!strncmp(act, "online", len))
@@ -1028,8 +1028,61 @@ static void xenstore_process_vcpu_set_event(char **vec)
         fprintf(stderr, "vcpu-set: command error.\n");
 
     free(act);
-    if (changed > 0)
+    return changed;
+}
+static void xenstore_process_vcpu_set_event(char **vec)
+{
+    int changed = 0, rc, i, num = 0;
+    char *vcpu, **dir;
+    char *path = vec[XS_WATCH_PATH];
+
+    /*
+     * Process the event right away in case the loop below fails
+     * to get to vCPU that is in the event.
+     */
+    rc = xenstore_process_one_vcpu_set_event(path);
+    if (rc > 0)
+        changed = 1;
+    /*
+     * We get: /local/domain/<domid>/cpu/<vcpu>/availability or
+     * (at init) /local/domain/<domid>/cpu [ignore it] and need to
+     * iterate over /local/domain/<domid>/cpu/ directory.
+     */
+    vcpu = strstr(path, "cpu/");
+    if (!vcpu) {
+        fprintf(stderr,"[%s]: %s has no CPU!\n", __func__, path);
+        return;
+    }
+    /* Eliminate '/availability' */
+    vcpu[3] = '\0';
+    dir = xs_directory(xsh, XBT_NULL, path, &num);
+
+    if (!dir) {
+        fprintf(stderr, "[%s]: directory %s has no dirs!\n", __func__, path);
+        return;
+    }
+    if (num != vcpus)
+        fprintf(stderr, "[%s]: %d (number of vcpu entries) != %d (maxvcpus)! "\
+                "Continuing on..\n", __func__, num, vcpus);
+
+    for (i = 0; i < num; i++) {
+        char attr[XEN_BUFSIZE];
+
+        /* Construct "/local/domain/<domid>/cpu" (path) with <vcpu> (attr),
+         * and "availability" with '/' sprinkled around. */
+        snprintf(attr, XEN_BUFSIZE, "%s/%s/%s", path, dir[i],  "availability");
+        rc = xenstore_process_one_vcpu_set_event(attr);
+
+        if (rc > 0)
+            changed = 1;
+        if (rc < 0) /* say xs_read failed */
+            break;
+    }
+    free (dir);
+    if (changed > 0) {
+        fprintf(stderr, "Notifying OS about CPU hotplug changes.\n");
         qemu_cpu_notify();
+    }
     return;
 }
 
-- 
1.7.7.6


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.