Thanks for your reply. Please see embedded comments.
Petersson, Mats wrote on 6 December 2006 22:14:
>> -----Original Message-----
>> From: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
>> [mailto:xen-devel-bounces@xxxxxxxxxxxxxxxxxxx] On Behalf Of Emmanuel Ackaouy
>> Sent: 06 December 2006 14:02
>> To: Xu, Anthony
>> Cc: xen-devel@xxxxxxxxxxxxxxxxxxx; xen-ia64-devel
>> Subject: Re: [Xen-devel] unnecessary VCPU migration happens again
>> Hi Anthony.
>> Could you send xentrace output for scheduling operations
>> in your setup?
I'm not sure whether xentrace works on the IPF side. I'm trying it now.
>> Perhaps we're being a little too aggressive spreading
>> work across sockets. We do this on vcpu_wake right now.
I think the logic below also spreads work.
1. In csched_load_balance, the code segment below sets the _VCPUF_migrating
flag on peer_vcpu. As the comment says:
* If we failed to find any remotely queued VCPUs to move here,
* see if it would be more efficient to move any of the running
* remote VCPUs over here.
/* Signal the first candidate only. */
if ( !is_idle_vcpu(peer_vcpu) &&
__csched_running_vcpu_is_stealable(cpu, peer_vcpu) )
2. When this peer_vcpu is scheduled out, the migration takes place:
void context_saved(struct vcpu *prev)
if ( unlikely(test_bit(_VCPUF_migrating, &prev->vcpu_flags)) )
From this logic, migration happens frequently if the number of VCPUs
is less than the number of logical CPUs.
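To make the two steps above concrete, here is a toy model in C. It is not
Xen code: every toy_* name is hypothetical, the migrating field only stands in
for _VCPUF_migrating, and the stealability check is assumed to always succeed.
It only illustrates why a single running VCPU keeps moving when another
logical CPU is always idle.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model, NOT Xen code: every name here is hypothetical. */
struct toy_vcpu {
    int  cpu;        /* logical CPU the VCPU currently runs on */
    bool is_idle;    /* true for a per-CPU idle VCPU */
    bool migrating;  /* stands in for _VCPUF_migrating */
    int  migrations; /* how many times this VCPU has been moved */
};

/*
 * Step 1: an idle CPU found nothing queued remotely, so it flags the
 * first running remote VCPU. Assumption: the candidate is always
 * considered stealable.
 */
static bool toy_load_balance(int idle_cpu, struct toy_vcpu *peers, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        struct toy_vcpu *v = &peers[i];
        if (!v->is_idle && v->cpu != idle_cpu && !v->migrating) {
            v->migrating = true;   /* signal the first candidate only */
            return true;
        }
    }
    return false;
}

/* Step 2: when the flagged VCPU is scheduled out, it is moved. */
static void toy_context_saved(struct toy_vcpu *prev, int target_cpu)
{
    if (prev->migrating) {
        prev->migrating = false;
        prev->cpu = target_cpu;
        prev->migrations++;
    }
}

/* One VCPU on two logical CPUs: count the moves over `rounds` passes. */
static int toy_count_migrations(int rounds)
{
    struct toy_vcpu v = { .cpu = 0, .is_idle = false,
                          .migrating = false, .migrations = 0 };
    for (int r = 0; r < rounds; r++) {
        int idle_cpu = 1 - v.cpu;  /* the other CPU is always idle */
        if (toy_load_balance(idle_cpu, &v, 1))
            toy_context_saved(&v, idle_cpu);
    }
    return v.migrations;
}
```

In this model toy_count_migrations(5) moves the VCPU on every pass, because
there is always an idle CPU that wants to pull it over.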
>> I'm not sure I understand why HVM VCPUs would block
>> and wake more often than PV VCPUs though. Can you
> Whilst I don't know any of the facts of the original poster, I can
> tell you why HVM and PV guests have differing number of scheduling
> Every time you get a IOIO/MMIO vmexit that leads to a qemu-dm
> interaction, you'll get a context switch. So for an average IDE block
> read/write (for example) on x86, you get 4-5 IOIO intercepts to send
> the command to qemu, then an interrupt is sent to the guest to
> indicate that the operation is finished, followed by a 256 x 16-bit
> IO read/write of the sector content (which is normally just one IOIO
> intercept unless the driver is "stupid"). This means around a dozen
> or so schedule operations to do one disk IO operation.
> The same operation in PV (or using PV driver in HVM guest of course)
> would require a single transaction from DomU to Dom0 and back, so only
> two schedule operations.
> The same "problem" occurs of course for other hardware devices such as
> network, keyboard, mouse, where a transaction consists of more than a
> single read or write to a single register.
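The arithmetic in Mats's description can be written down as a small C sketch.
The function names are hypothetical, and the factor of two (one schedule
operation to switch to qemu-dm and one to switch back) is my assumption, chosen
because it matches the "around a dozen" figure above.

```c
/*
 * Rough count of schedule operations for one emulated disk IO in an
 * HVM guest. Assumption: every intercept handled by qemu-dm costs two
 * schedule operations (switch to qemu-dm, switch back to the guest).
 */
static int toy_hvm_schedule_ops(int command_intercepts, int data_intercepts)
{
    return 2 * (command_intercepts + data_intercepts);
}

/* PV case as described above: one transaction to Dom0 and back. */
static int toy_pv_schedule_ops(void)
{
    return 2;
}
```

With 4 to 5 command intercepts and 1 data intercept this gives 10 to 12
schedule operations for HVM, against 2 for PV.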
What I want to highlight is this:
when an HVM VCPU executes an IO operation,
the VCPU is blocked by the hypervisor until the IO operation
has been emulated by Qemu. The hypervisor then wakes the HVM VCPU up again.
A PV VCPU, by contrast, is not blocked by the PV driver.
I can give the scenario below.
There are two sockets, with two cores per socket.
Assume dom0 is running on socket1 core1,
vti1 is running on socket1 core2,
vti2 is running on socket2 core1,
and socket2 core2 is idle.
If vti2 is blocked by an IO operation, then socket2 core1 becomes idle.
That means both cores in socket2 are idle
while dom0 and vti1 are running on the two cores of socket1.
The scheduler will then try to spread dom0 and vti1 across the two sockets,
so a migration happens. This is not necessary, because vti2 is woken up
again as soon as Qemu has emulated the IO operation.
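The scenario can be checked with a small C sketch. This is not the credit
scheduler's real policy: toy_wants_spread is a hypothetical heuristic that I
assume for illustration (spread whenever one socket is completely idle while
another has more than one busy core).

```c
#include <stdbool.h>

#define SOCKETS 2
#define CORES_PER_SOCKET 2

/* Number of cores on `socket` that currently run a VCPU. */
static int toy_busy_cores(bool busy[SOCKETS][CORES_PER_SOCKET], int socket)
{
    int n = 0;
    for (int c = 0; c < CORES_PER_SOCKET; c++)
        if (busy[socket][c])
            n++;
    return n;
}

/*
 * Hypothetical heuristic, NOT Xen's code: ask for spreading when some
 * socket is fully idle while another socket has more than one busy core.
 */
static bool toy_wants_spread(bool busy[SOCKETS][CORES_PER_SOCKET])
{
    for (int s = 0; s < SOCKETS; s++) {
        if (toy_busy_cores(busy, s) != 0)
            continue;
        for (int t = 0; t < SOCKETS; t++)
            if (t != s && toy_busy_cores(busy, t) > 1)
                return true;
    }
    return false;
}

/* The layout from the scenario: dom0 and vti1 on socket1, vti2 on socket2. */
static bool toy_scenario_wants_spread(bool vti2_blocked)
{
    bool busy[SOCKETS][CORES_PER_SOCKET] = {
        { true, true },            /* socket1: dom0, vti1 */
        { !vti2_blocked, false },  /* socket2: vti2 unless blocked, idle core */
    };
    return toy_wants_spread(busy);
}
```

While vti2 is running, the heuristic sees no reason to move anything. As soon
as vti2 blocks on IO, socket2 looks empty and the heuristic asks for a
migration, even though vti2 is about to run there again.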
>> If you could gather some scheduler traces and send
>> results, it will give us a good idea of what's going
>> on and why. The multi-core support is new and not
>> widely tested so it's possible that it is being
>> overly aggressive or perhaps even buggy.
>> Xen-devel mailing list