[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: sched=null vwfi=native and call_rcu()



On Fri, 14 Jan 2022, Dario Faggioli wrote:
> On Thu, 2022-01-06 at 17:52 -0800, Stefano Stabellini wrote:
> > On Thu, 6 Jan 2022, Julien Grall wrote:
> > > 
> > > This issue and solution were discussed numerous time on the ML. In
> > > short, we
> > > want to tell the RCU that CPU running in guest context are always
> > > quiesced.
> > > For more details, you can read the previous thread (which also
> > > contains a link
> > > to the one before):
> > > 
> > > https://lore.kernel.org/xen-devel/fe3dd9f0-b035-01fe-3e01-ddf065f182ab@xxxxxxxxx/
> > 
> > Thanks Julien for the pointer!
> > 
> > Dario, I forward-ported your three patches to staging:
> > https://gitlab.com/xen-project/people/sstabellini/xen/-/tree/rcu-quiet
> > 
> Hi Stefano!
> 
> I definitely would like to see the end of this issue, so thanks a lot
> for your interest and your help with the patches.
> 
> > I can confirm that they fix the bug. Note that I had to add a small
> > change on top to remove the ASSERT at the beginning of
> > rcu_quiet_enter:
> > https://gitlab.com/xen-project/people/sstabellini/xen/-/commit/6fc02b90814d3fe630715e353d16f397a5b280f9
> > 
> Yeah, that should be fine.
> 
> > Would you be up for submitting them for upstreaming? I would prefer
> > if
> > you send out the patches because I cannot claim to understand them
> > completely (except for the one doing renaming :-P )
> > 
> Haha! So, I am up for properly submitting, but there's one problem. As
> you've probably got, the idea here is to use transitions toward the
> guest and inside the hypervisor as RCU quiescence and "activation"
> points.
> 
> Now, on ARM, that just meant calling rcu_quiet_exit() in
> enter_hypervisor_from_guest() and calling rcu_quiet_enter() in
> leave_hypervisor_to_guest(). Nice and easy, and even myself --and I'm
> definitely not an ARM person-- cloud understand it (although with some
> help from Julien) and put the patches together.
> 
> However, the problem is really arch independent and, despite not
> surfacing equally frequently, it affects x86 as well. And for x86 the
> situation is by far not equally nice, when it comes to identifying all
> the places from where to call rcu_quiet_{enter,exit}().
> 
> And finding out where to put them, among the various functions that we
> have in the various entry.S variants is where I stopped. The plan was
> to get back to it, but as shamefully as it sounds, I could not do that
> yet.
> 
> So, if anyone wants to help with this, handing over suggestions for
> potential good spots, that would help a lot.

Unfortunately I cannot volunteer due to time and also because I wouldn't
know where to look and I don't have a reproducer or a test environment
on x86. I would be flying blind.


> Alternatively, we can submit the series as ARM-only... But I fear that
> the x86 side of things would then be easily forgotten. :-(

I agree with you on this, but at the same time we are having problems
with customers in the field -- it is not like we can wait to solve the
problem on ARM any longer. And the issue is certainly far less likely to
happen on x86 (there is no vwfi=native, right?) In other words, I think
it is better to have half of the solution now to solve the worst part of
the problem than to wait more months for a full solution.



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.