
Re: [Xen-devel] [BUG] mistakenly wake in Xen's credit scheduler



On Tue, Oct 27, 2015 at 2:11 PM, suokun <suokunstar@xxxxxxxxx> wrote:
> On Tue, Oct 27, 2015 at 3:44 AM, George Dunlap <dunlapg@xxxxxxxxx> wrote:
>> On Tue, Oct 27, 2015 at 5:59 AM, suokun <suokunstar@xxxxxxxxx> wrote:
>>> Hi all,
>>>
>>> The BOOST mechanism in Xen's credit scheduler is designed to give
>>> priority to VMs running I/O-intensive applications, so that they can
>>> handle I/O requests in time. However, this does not always work as
>>> expected.
>>
>> Thanks for the exploration, and the analysis.
>>
>> The BOOST mechanism is part of the reason I began to write the credit2
scheduler, which we are hoping (any day now) to make the default
>> scheduler.  It was designed specifically with the workload you mention
>> in mind.  Would you care to try your test again and see how it fares?
>>
>
> Hi, George,
>
> Thank you for your reply. I tested credit2 this morning. The I/O
> performance is correct; however, the CPU accounting seems incorrect.
> Here is my experiment on credit2:
>
> VM-IO:       1 vCPU pinned to a pCPU, running netperf
> VM-CPU:      1 vCPU pinned to the same pCPU, running a while(1) loop
> The throughput of netperf is the same (941 Mbps) as when VM-IO runs alone.
>
> However, when I use xl top to show the VM CPU utilization, VM-IO takes
> 73% of the CPU time and VM-CPU takes 99%. Their sum is more than 100%.
> I suspect this is due to the CPU utilization accounting in the credit2
> scheduler.
>
>
>> Also, do you have a patch to fix it in credit1? :-)
>>
>
> As for a patch for my problem in credit1, I have two ideas:
>
> 1) If the vCPU cannot migrate (e.g. it is pinned, constrained by hard
> affinity, or there is only one physical CPU), do not set the
> _VPF_migrating flag at all.
>
> 2) Let BOOST-priority vCPUs preempt each other.
>
> I have tested both separately and they both work, but personally I
> prefer the first option because it solves the problem at its source.
>
> Best
> Tony

Here is my patch:

+++ xen/common/sched_credit.c

if ( new_idlers_empty && new->pri > cur->pri )
{
    SCHED_STAT_CRANK(tickle_idlers_none);
    SCHED_VCPU_STAT_CRANK(cur, kicked_away);
    SCHED_VCPU_STAT_CRANK(cur, migrate_r);
    SCHED_STAT_CRANK(migrate_kicked_away);

+   /* Migration only makes sense when there is more than one online
+    * pCPU and the vCPU's hard affinity is wider than a single pCPU;
+    * otherwise do not request a migration at all. */
+   if ( num_online_cpus() > 1 &&
+        cpumask_weight(cur->vcpu->cpu_hard_affinity) > 1 )
+   {
        set_bit(_VPF_migrating, &cur->vcpu->pause_flags);
+   }
    __cpumask_set_cpu(cpu, &mask);
}
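
For reference, the second idea (letting BOOST vCPUs preempt each other)
would roughly amount to relaxing the priority test in the same
__runq_tickle() hunk. The following is only a rough sketch of what that
could look like; the exact condition would need more thought:

    /* Sketch of option (2): also tickle the pCPU when both the waking
     * vCPU and the currently running vCPU are BOOST, so a genuinely
     * I/O-bound BOOST vCPU is not stuck behind a mistakenly boosted
     * CPU-bound one until the latter is scheduled out. */
    if ( new_idlers_empty &&
         ( new->pri > cur->pri ||
           ( new->pri == CSCHED_PRI_TS_BOOST &&
             cur->pri == CSCHED_PRI_TS_BOOST ) ) )
    {
        SCHED_STAT_CRANK(tickle_idlers_none);
        __cpumask_set_cpu(cpu, &mask);
    }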

Best
Tony


>
>>  -George
>>
>>>
>>>
>>> (1) Problem description
>>> --------------------------------
>>> Suppose two VMs (call them VM-I/O and VM-CPU) each have one virtual
>>> CPU and both vCPUs are pinned to the same physical CPU. An
>>> I/O-intensive application (e.g. netperf) runs in VM-I/O and a
>>> CPU-intensive application (e.g. a busy loop) runs in VM-CPU. When a
>>> client sends I/O requests to VM-I/O, its vCPU cannot benefit from the
>>> BOOST state and obtains very few CPU cycles (less than 1% in Xen 4.6).
>>> Both the throughput and the latency are terrible.
>>>
>>>
>>>
>>> (2) Problem analysis
>>> --------------------------------
>>> This problem is caused by the wake mechanism in Xen: the CPU-intensive
>>> workload can be woken and boosted by mistake.
>>>
>>> Suppose the vCPU of VM-CPU is running when an I/O request arrives; the
>>> currently running vCPU (the vCPU of VM-CPU) will be marked
>>> _VPF_migrating:
>>>
>>> static inline void __runq_tickle(unsigned int cpu, struct csched_vcpu *new)
>>> {
>>> ...
>>>            if ( new_idlers_empty && new->pri > cur->pri )
>>>            {
>>>                SCHED_STAT_CRANK(tickle_idlers_none);
>>>                SCHED_VCPU_STAT_CRANK(cur, kicked_away);
>>>                SCHED_VCPU_STAT_CRANK(cur, migrate_r);
>>>                SCHED_STAT_CRANK(migrate_kicked_away);
>>>                set_bit(_VPF_migrating, &cur->vcpu->pause_flags);
>>>                __cpumask_set_cpu(cpu, &mask);
>>>            }
>>> }
>>>
>>>
>>> The next time a context switch happens and prev is the vCPU of
>>> VM-CPU, context_saved(prev) is executed. Because the vCPU has been
>>> marked _VPF_migrating, it will then be woken up:
>>>
>>> void context_saved(struct vcpu *prev)
>>> {
>>>     ...
>>>
>>>     if ( unlikely(test_bit(_VPF_migrating, &prev->pause_flags)) )
>>>         vcpu_migrate(prev);
>>> }
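>>>
>>> (vcpu_migrate() ends up waking the vCPU again; heavily trimmed and
>>> paraphrased from xen/common/schedule.c, so details may differ:)
>>>
>>> static void vcpu_migrate(struct vcpu *v)
>>> {
>>>     ...
>>>     /* Wake on the (possibly new) CPU. This wake path is where the
>>>        mistaken BOOST promotion described below happens. */
>>>     vcpu_wake(v);
>>> }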
>>>
>>> If the vCPU of VM-CPU is in the UNDER state when it is woken, it is
>>> promoted to the BOOST state, which was originally designed for
>>> I/O-intensive vCPUs. Once this happens, even when the vCPU of VM-I/O
>>> becomes BOOST it cannot get the physical CPU immediately; it has to
>>> wait until the vCPU of VM-CPU is scheduled out. That harms the I/O
>>> performance significantly.
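>>>
>>> (For reference, the promotion happens on the wake path; simplified and
>>> paraphrased from csched_vcpu_wake() in sched_credit.c, the exact guards
>>> may differ:)
>>>
>>> static void
>>> csched_vcpu_wake(const struct scheduler *ops, struct vcpu *vc)
>>> {
>>>     struct csched_vcpu * const svc = CSCHED_VCPU(vc);
>>>     unsigned int cpu = vc->processor;
>>>     ...
>>>     /* A waking vCPU in the UNDER state (and not parked) is promoted
>>>        to BOOST; the CPU-bound vCPU woken via vcpu_migrate() takes
>>>        exactly this path. */
>>>     if ( svc->pri == CSCHED_PRI_TS_UNDER &&
>>>          !test_bit(CSCHED_FLAG_VCPU_PARKED, &svc->flags) )
>>>         svc->pri = CSCHED_PRI_TS_BOOST;
>>>
>>>     /* Put the vCPU on the runqueue and tickle pCPUs (__runq_tickle). */
>>>     __runq_insert(cpu, svc);
>>>     __runq_tickle(cpu, svc);
>>> }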
>>>
>>>
>>>
>>> (3) Our Test results
>>> --------------------------------
>>> Hypervisor: Xen 4.6
>>> Dom 0 & Dom U: Linux 3.18
>>> Client: Linux 3.18
>>> Network: 1 Gigabit Ethernet
>>>
>>> Throughput:
>>> VM-I/O alone: 941 Mbps
>>> VM-I/O co-running with VM-CPU: 32 Mbps
>>>
>>> Latency:
>>> VM-I/O alone: 78 usec
>>> VM-I/O co-running with VM-CPU: 109093 usec
>>>
>>>
>>>
>>> This bug has existed since Xen 4.2 and is still present in the latest
>>> Xen 4.6.
>>> Thanks.
>>> Reported by Tony Suo and Yong Zhao from UCCS
>>>



-- 

**********************************
> Kun SUO
> Email: suokunstar@xxxxxxxxx   |   ksuo@xxxxxxxx
> University of Colorado at Colorado Springs
> 1420 Austin Bluffs Pkwy, Colorado Springs, CO 80918
**********************************

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 

