Re: [Xen-devel] [PATCH v9 11/20] x86/VPMU: Interface for setting PMU mode and flags



On 08/12/2014 11:35 AM, Jan Beulich wrote:
> On 12.08.14 at 17:12, <boris.ostrovsky@xxxxxxxxxx> wrote:
>> On 08/12/2014 06:37 AM, Jan Beulich wrote:
>>> On 08.08.14 at 18:55, <boris.ostrovsky@xxxxxxxxxx> wrote:
>>>> --- a/xen/arch/x86/domain.c
>>>> +++ b/xen/arch/x86/domain.c
>>>> @@ -1482,7 +1482,7 @@ void context_switch(struct vcpu *prev, struct vcpu *next)
>>>>      if ( is_hvm_vcpu(prev) )
>>>>      {
>>>> -        if (prev != next)
>>>> +        if ( (prev != next) && (vpmu_mode & XENPMU_MODE_SELF) )
>>>>              vpmu_save(prev);
>>>>
>>>>          if ( !list_empty(&prev->arch.hvm_vcpu.tm_list) )
>>>> @@ -1526,7 +1526,7 @@ void context_switch(struct vcpu *prev, struct vcpu *next)
>>>>                                !is_hardware_domain(next->domain));
>>>>      }
>>>>
>>>> -    if (is_hvm_vcpu(next) && (prev != next) )
>>>> +    if ( is_hvm_vcpu(next) && (prev != next) && (vpmu_mode & XENPMU_MODE_SELF) )
>>>>          /* Must be done with interrupts enabled */
>>>>          vpmu_load(next);
>>> Wouldn't such vPMU internals be better hidden in the functions
>>> themselves? I realize you can save the calls this way, but if the
>>> condition changes again later, we'll again have to adjust this
>>> core function rather than just the vPMU code. It's bad enough that
>>> the vpmu_mode variable is visible to non-vPMU code.
>> How about an inline function?
> Yeah, that would perhaps do too. Albeit I'd still prefer all vPMU
> logic to be handled in the called vPMU functions.

I was thinking about an inline in vpmu.h. Something like

static inline void vpmu_next(struct vcpu *prev, struct vcpu *next)
{
    if ( is_hvm_vcpu(next) && (prev != next) && (vpmu_mode & XENPMU_MODE_SELF) )
        /* Must be done with interrupts enabled */
        vpmu_load(next);
}

(and similar for vpmu_save())
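A sketch of that save-side twin, mirroring the check from the first hunk
above (vpmu_prev is just a made-up name here):

static inline void vpmu_prev(struct vcpu *prev, struct vcpu *next)
{
    if ( is_hvm_vcpu(prev) && (prev != next) && (vpmu_mode & XENPMU_MODE_SELF) )
        vpmu_save(prev);
}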


>> I would prefer the whole series to get in in one go (with patch 2
>> possibly being the only exception). All other patches (including patch
>> 1) are not particularly useful on their own.
> Okay. In that case in patch 1, please consider swapping struct
> xenpf_symdata's address and name fields (I had put a respective
> note on the patch for when committing it), shrinking the structure
> for compat mode guests.
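
(Illustration only, with made-up struct names and a simplified field
set, not the actual xenpf_symdata definition; it assumes the 64-bit
field keeps 8-byte alignment in the 32-bit compat ABI, as
uint64_aligned_t does. 'name' stands in for a 4-byte compat guest
handle:)

struct symdata_name_first {     /* compat (32-bit guest) view        */
    uint32_t namelen;           /* offset  0                         */
    uint32_t symnum;            /* offset  4                         */
    uint32_t name;              /* offset  8                         */
                                /* offsets 12-15: alignment padding  */
    uint64_t address;           /* offset 16                         */
    char     type;              /* offset 24, tail-padded            */
};                              /* sizeof == 32                      */

struct symdata_address_first {  /* fields swapped                    */
    uint32_t namelen;           /* offset  0                         */
    uint32_t symnum;            /* offset  4                         */
    uint64_t address;           /* offset  8, aligned: no padding    */
    uint32_t name;              /* offset 16                         */
    char     type;              /* offset 20, tail-padded            */
};                              /* sizeof == 24                      */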

>>>> +        goto cont_wait;
>>>> +
>>>> +    cpumask_andnot(&allbutself, &cpu_online_map,
>>>> +                   cpumask_of(smp_processor_id()));
>>>> +
>>>> +    sync_task = xmalloc_array(struct tasklet, allbutself_num);
>>>> +    if ( !sync_task )
>>>> +    {
>>>> +        printk("vpmu_force_context_switch: out of memory\n");
>>>> +        return -ENOMEM;
>>>> +    }
>>>> +
>>>> +    for ( i = 0; i < allbutself_num; i++ )
>>>> +        tasklet_init(&sync_task[i], vpmu_sched_checkin, 0);
>>>> +
>>>> +    atomic_set(&vpmu_sched_counter, 0);
>>>> +
>>>> +    j = 0;
>>>> +    for_each_cpu ( i, &allbutself )
>>> This looks to be the only use for the (on-stack) allbutself variable,
>>> but you could easily avoid this by using for_each_online_cpu() and
>>> skipping the local one. I'd also recommend that you count
>>> allbutself_num here rather than up front, since that will much more
>>> obviously prove that you wait for exactly as many CPUs as you
>>> scheduled. The array allocation above is bogus anyway, as on a
>>> huge system this can easily be more than a page in size.
>> I don't understand the last sentence. What does page-sized allocation
>> have to do with this?
> We dislike greater-than-page-size allocations at runtime.


I can allocate an array of pointers and then allocate the tasklets in the for(i) loop above if that makes it more palatable, but I still need the allbutself_num value earlier than where you suggested.
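
Roughly (a sketch of that alternative only, folding in your
for_each_online_cpu() suggestion; error unwinding is omitted for
brevity):

    /* Allocate each tasklet separately so no single allocation exceeds
     * a page, and count the remote CPUs while scheduling them. */
    struct tasklet **sync_task;
    unsigned int cpu, allbutself_num = 0;

    sync_task = xzalloc_array(struct tasklet *, num_online_cpus());
    if ( !sync_task )
        return -ENOMEM;

    for_each_online_cpu ( cpu )
    {
        if ( cpu == smp_processor_id() )
            continue;                /* skip the local CPU */

        sync_task[allbutself_num] = xmalloc(struct tasklet);
        if ( !sync_task[allbutself_num] )
            return -ENOMEM;          /* real code would free what was allocated */

        tasklet_init(sync_task[allbutself_num], vpmu_sched_checkin, 0);
        tasklet_schedule_on_cpu(sync_task[allbutself_num], cpu);
        allbutself_num++;
    }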




>>>> +long do_xenpmu_op(int op, XEN_GUEST_HANDLE_PARAM(xen_pmu_params_t) arg)
>>>> +{
>>>> +    int ret = -EINVAL;
>>>> +    xen_pmu_params_t pmu_params;
>>>> +
>>>> +    switch ( op )
>>>> +    {
>>>> +    case XENPMU_mode_set:
>>>> +    {
>>>> +        static DEFINE_SPINLOCK(xenpmu_mode_lock);
>>>> +        uint32_t current_mode;
>>>> +
>>>> +        if ( !is_control_domain(current->domain) )
>>>> +            return -EPERM;
>>>> +
>>>> +        if ( copy_from_guest(&pmu_params, arg, 1) )
>>>> +            return -EFAULT;
>>>> +
>>>> +        if ( pmu_params.val & ~XENPMU_MODE_SELF )
>>>> +            return -EINVAL;
>>>> +
>>>> +        /*
>>>> +         * Return an error if someone else is in the middle of changing
>>>> +         * the mode; this is most likely an indication of two system
>>>> +         * administrators working against each other.
>>>> +         */
>>>> +        if ( !spin_trylock(&xenpmu_mode_lock) )
>>>> +            return -EAGAIN;
>>>> +
>>>> +        current_mode = vpmu_mode;
>>>> +        vpmu_mode = pmu_params.val;
>>>> +
>>>> +        if ( vpmu_mode == XENPMU_MODE_OFF )
>>>> +        {
>>>> +            /*
>>>> +             * Make sure all (non-dom0) VCPUs have unloaded their VPMUs. This
>>>> +             * can be achieved by having all physical processors go through
>>>> +             * context_switch().
>>>> +             */
>>>> +            ret = vpmu_force_context_switch(arg);
>>>> +            if ( ret )
>>>> +                vpmu_mode = current_mode;
>>>> +        }
>>>> +        else
>>>> +            ret = 0;
>>>> +
>>>> +        spin_unlock(&xenpmu_mode_lock);
>>>> +        break;
>>> This still isn't safe: there's nothing preventing another vCPU from
>>> issuing another XENPMU_mode_set operation while the one turning the
>>> vPMU off is still in the process of waiting, but having exited the
>>> lock-protected region in order to allow other processing to occur.
>> I think that's OK: one of these two competing requests will time out in
>> the while loop of vpmu_force_context_switch(). It will, in fact, likely
>> be the first caller, because the second one will get right into the
>> while loop without setting up the waiting data. This is a little
>> counter-intuitive, but trying to set the mode simultaneously from two
>> different places is asking for trouble anyway.
> Sure it is, but we nevertheless want the hypervisor to be safe. So I
> consider what you currently have acceptable only if you can prove
> that nothing bad can happen no matter in what order multiple
> requests get issued - from looking over your code I wasn't able to
> convince myself of this.


Would a comment with what I said above (possibly a bit massaged) be sufficient?
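
For instance, massaging the above into a comment sketch:

    /*
     * Concurrent XENPMU_mode_set requests are not supported: if a second
     * caller arrives while the first is still waiting in
     * vpmu_force_context_switch(), one of the two competing requests
     * (most likely the first, since the second enters the wait loop
     * without setting up the waiting data) will time out in the while
     * loop and fail. Setting the mode simultaneously from two different
     * places is asking for trouble anyway.
     */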

-boris

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 

