
Re: [Xen-devel] null scheduler bug



Hi Dario,

On 09/25/2018 10:02 AM, Dario Faggioli wrote:
On Mon, 2018-09-24 at 22:46 +0100, Julien Grall wrote:
On 09/21/2018 05:20 PM, Dario Faggioli wrote:

What I'm after is how long it takes, after domain_destroy(), for
complete_domain_destroy() to be called, and whether/how that relates to
the grace period idle timer we've added in the RCU code.

The NULL scheduler and vwfi=native will inevitably introduce latency when
destroying a domain. vwfi=native means the guest will not trap when it
has nothing to do, so the pCPU never switches to the idle vCPU. In such a
configuration, the pCPU is extremely unlikely to execute the idle_loop, or
even to enter the hypervisor, unless there is an interrupt on that pCPU.
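To make the vwfi=native point concrete, here is a toy model in C. It is a sketch under stated assumptions, not Xen code: `struct wfi_pcpu_model`, `guest_wfi()` and `physical_irq()` are hypothetical, and the model simplifies by treating any entry into Xen as a chance to report an RCU quiescent state. With WFI trapped, an idle guest hands control back to Xen; with vwfi=native, only an interrupt does.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical model, not Xen code: one pCPU dedicated to one guest vCPU,
 * as with the null scheduler. Simplification: any entry into Xen gives the
 * pCPU a chance to report an RCU quiescent state.
 */
struct wfi_pcpu_model {
    bool trap_wfi;      /* false models vwfi=native */
    bool in_xen;        /* true once the pCPU is executing Xen code */
    bool qs_reported;   /* quiescent state reported for this grace period */
};

static void enter_xen(struct wfi_pcpu_model *p)
{
    p->in_xen = true;
    p->qs_reported = true;
}

/* The guest has nothing to do and executes WFI. */
static void guest_wfi(struct wfi_pcpu_model *p)
{
    assert(!p->in_xen);
    if (p->trap_wfi)
        enter_xen(p);   /* trap: Xen blocks the vCPU and runs the idle vCPU */
    /* vwfi=native: WFI completes inside the guest and Xen never runs. */
}

/* A physical interrupt targeted at this pCPU always enters Xen. */
static void physical_irq(struct wfi_pcpu_model *p)
{
    enter_xen(p);
}
```

In the native case the quiescent state only shows up after `physical_irq()`, which matches the "some random interrupt happens" behaviour discussed in this thread.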

Ah! I'm not familiar with vwfi=native --and in fact I was completely
ignoring it-- but this analysis makes sense to me.

Per my understanding of call_rcu, the calls will be queued until the
number of queued callbacks reaches a threshold. We don't have many places
where call_rcu is called, so reaching the threshold may just never happen.
But nothing will tell that vCPU to enter Xen and say "I am done with
RCU". Did I miss anything?
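A rough sketch of the queueing behaviour described above, assuming the kind of high-water-mark scheme found in classic Linux RCU. All names (`model_call_rcu()`, `MODEL_QHIMARK`, the structs) are hypothetical, and the threshold value of 10 is purely illustrative. The point is that a handful of call_rcu() calls never crosses the threshold, so the enqueuer never forces anything.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative high-water mark; the real threshold in Xen may differ. */
#define MODEL_QHIMARK 10

struct rcu_head_model {
    struct rcu_head_model *next;
    void (*func)(struct rcu_head_model *head);
};

/* Per-CPU callback queue (hypothetical model, not Xen code). */
struct rcu_queue_model {
    struct rcu_head_model *head;
    struct rcu_head_model **tail;
    int qlen;
    int forced;   /* times the enqueuer forced a quiescent state */
};

static void rcu_queue_init(struct rcu_queue_model *q)
{
    q->head = NULL;
    q->tail = &q->head;
    q->qlen = 0;
    q->forced = 0;
}

/*
 * Append a callback. Only once the queue grows past the threshold does the
 * caller push the grace period forward; below it, progress depends entirely
 * on every other CPU reporting a quiescent state on its own.
 */
static void model_call_rcu(struct rcu_queue_model *q, struct rcu_head_model *h,
                           void (*func)(struct rcu_head_model *head))
{
    assert(h != NULL);
    h->func = func;
    h->next = NULL;
    *q->tail = h;
    q->tail = &h->next;
    if (++q->qlen > MODEL_QHIMARK)
        q->forced++;   /* stands in for forcing a quiescent state */
}
```

Destroying one domain queues only a few callbacks, so in this model `forced` stays at zero and the grace period waits on the other CPUs indefinitely.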

Yeah, and in fact we added the timer _but_, in this case, it does not
look like the timer is firing. It looks much more like "some random
interrupt happens", as you're suggesting. OTOH, in the case where there
are no printk()s, it might be that the timer does fire, but the vcpu
has not gone through Xen, so the grace period has, as far as we know,
not expired yet (which is also in accordance with Julien's analysis, as
far as I understood it).

The timer is only armed when sched_tick_suspend() is called. With vwfi=native, you will never reach idle_loop() and therefore never set up the timer.

Milan confirmed that the guest can be destroyed once vwfi=native is removed, which confirms my thinking: trapping WFI ends up switching to the idle vCPU and triggering the grace period.

I am not entirely sure you will be able to reproduce it on x86, but I don't think it is specific to Xen on Arm.

Looking at the code, I don't see a grace period being triggered in any context other than idle_loop. Rather than adding another grace period, I would just force quiescence on every call_rcu.

This should not have a big performance impact, as we don't use call_rcu much, and it would allow domains to be fully destroyed in a timely manner.
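One possible reading of the proposal above to force quiescence on every call_rcu, sketched as a model rather than a patch: each call immediately interrupts every other pCPU (an IPI in a real hypervisor) so that it enters Xen and reports a quiescent state. `struct rcu_force_model`, `send_ipi()` and `model_call_rcu_forced()` are hypothetical, and treating an interrupt as an immediate quiescent state is a simplification.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_PCPUS 4

/*
 * Hypothetical model of the proposal, not Xen code: every call_rcu()
 * immediately pokes all other pCPUs so that each enters Xen and reports
 * a quiescent state, instead of waiting for a queue threshold or for the
 * pCPU to go idle.
 */
struct rcu_force_model {
    bool waiting_on[NR_PCPUS];
    int ipis_sent;
    int callbacks_run;
};

static void send_ipi(struct rcu_force_model *m, int cpu)
{
    m->ipis_sent++;
    /* The interrupt enters Xen on the target, where quiescence is noted. */
    m->waiting_on[cpu] = false;
}

static bool gp_complete(const struct rcu_force_model *m)
{
    for (int c = 0; c < NR_PCPUS; c++)
        if (m->waiting_on[c])
            return false;
    return true;
}

static void model_call_rcu_forced(struct rcu_force_model *m, int self)
{
    assert(self >= 0 && self < NR_PCPUS);
    /* The calling pCPU is already in Xen; every other pCPU must check in. */
    for (int c = 0; c < NR_PCPUS; c++)
        m->waiting_on[c] = (c != self);
    for (int c = 0; c < NR_PCPUS; c++)
        if (c != self)
            send_ipi(m, c);
    if (gp_complete(m))
        m->callbacks_run++;   /* stands in for e.g. complete_domain_destroy() */
}
```

The cost in this model is one IPI per remote pCPU per call, which is why the rarity of call_rcu matters for the performance argument.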

Cheers,

--
Julien Grall

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel
