[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH for-4.12 V3] x86/altp2m: fix HVMOP_altp2m_set_domain_state race

On 2/12/19 9:31 AM, Jan Beulich wrote:
On 11.02.19 at 18:21, <rcojocaru@xxxxxxxxxxxxxxx> wrote:
On 2/11/19 6:59 PM, Jan Beulich wrote:
Thanks for noticing, actually this appears to invalidate the whole
purpose of the patch (I should have tested this more before sumbitting
V3, sorry).

The whole point of the new boolean is to have p2m assigned an altp2m
regardless of altp2m_active() (hence the change) - which now no longer
happens. I got carried away with this change.

The fact that this is so easy to get wrong is the reason why I've
preferred the domain_pause() solution. There appears to be no clean way
to fix this otherwise, and if this is so easy to misunderstand it'll
break just as easily with further changes.

I suppose I could just pass the bool along to p2m_get_altp2m() (and
indeed remove the if())...

I think the best that can be done here is to check if altp2m_active()
early in p2m_switch_vcpu_altp2m_by_id() and
p2m_switch_domain_altp2m_by_id(), then bail if it's not active. Since
these are only called by HVMOP_altp2m_* (and thus serialized by the
domain lock), it should be enough WRT HVMOP_altp2m_set_domain_state.

I'm confused: Where do you see the domain lock used there?
Plus I can't see p2m_switch_vcpu_altp2m_by_id() called for
any HVMOP_altp2m_* at all. One of the actual callers is guarded
by altp2m_active(), but the other isn't.

do_altp2m_op() does d = rcu_lock_domain_by_any_id(a.domain);, and
unlocks it before it exits.

But that's not the "domain lock". This is just making sure the domain
won't disappear behind your back.

But you're right, p2m_switch_vcpu_altp2m_by_id() is not called for any
HVMOP_altp2m_*, I've misread that. Hence, I believe both callers
ofp2m_switch_vcpu_altp2m_by_id() may race with HVMOP_altp2m_*.

Would you like me to add the altp2m_active() check in both
p2m_switch_domain_altp2m_by_id() and p2m_switch_vcpu_altp2m_by_id(), and
make it harder to race (it still won't be impossible, since the bool may
become false between the check and the actual function logic for
p2m_switch_vcpu_altp2m_by_id(), as you've noticed)?

Well, altp2m being Tech Preview, any partial fix _could_ do in
principle. But personally I dislike such half hearted approaches,
and hence I'd recommend to address the issue in full, i.e.
eliminate race potential altogether. In the end you're going to
be bitten more than me by not getting this into proper shape.

I'll attempt a V4 doing my best to check altp2m_active() then, and see if that's at least satisfactory for sane use-cases.


Xen-devel mailing list



Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.