[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [PATCH] x86/string: correct memmove()'s forwarding to memcpy()
On 03/02/2021 15:31, Jan Beulich wrote: > On 03.02.2021 16:01, Andrew Cooper wrote: >> On 03/02/2021 14:22, Jan Beulich wrote: >>> With memcpy() expanding to the compiler builtin, we may not hand it >>> overlapping source and destination. We strictly mean to forward to our >>> own implementation (a few lines up in the same source file). >>> >>> Fixes: 78825e1c60fa ("x86/string: Clean up x86/string.h") >>> Signed-off-by: Jan Beulich <jbeulich@xxxxxxxx> >> I agree that the current logic is buggy, but I'm not sure this is an >> improvement. >> >> You've switched from relying on GCC's builtin to operate forwards, to >> relying on Xen's implementation operating forwards. > Is there such a guarantee for the compiler builtin? If so - no > need for this patch indeed. But I couldn't find any doc saying > so. I've never seen it emit anything which isn't a forwards operation (i.e. I think the compiled result tended to be safe in practice), but C's flexibility does explicitly permit a backwards implementation. > >> At the very least, can we get a code comment stating something like >> "depends on Xen's implementation operating forwards" ? > No problem at all. In which case Acked-by: Andrew Cooper <andrew.cooper3@xxxxxxxxxx> to avoid a round trip. > >>> --- >>> An alternative would be to "#undef memcpy" near the top of the file. But >>> I think the way it's done now is more explicit to the reader. An #undef >>> would be the only way if the macro was an object-like one. >> I chose not to use undef's to avoid impacting the optimisation of other >> functions in this file. I can't remember if this made a difference in >> practice. >> >>> At least with gcc10 this does alter generated code: The builtin gets >>> expanded into a tail call, while after this change our memcpy() gets >>> inlined into memmove(). This would change again once we separate the 3 >>> functions here into their own CUs for placing them in an archive. >> As (perhaps) a tangent, how do we plan to provide x86-optimised versions >> in combination with the library work? > By specifying the per-arch lib.a first. Ok - good to hear. >> We're long overdue needing to >> refresh our fast-strings support to include fast rep-mov/stosb. > That's pretty much orthogonal to the code movement though. Yes, but it does need doing. We're perpetually playing catchup, and there are meaningful improvements to be had for logic such as clear_page() which is fairly poor, optimisation wise atm. ~Andrew
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |