Re: [RFC PATCH 00/10] Preemption in hypervisor (ARM only)
Julien Grall writes:

> On Wed, 24 Feb 2021 at 20:58, Volodymyr Babchuk
> <Volodymyr_Babchuk@xxxxxxxx> wrote:
>>
>> Hi Julien,
>>
>> Julien Grall writes:
>>
>>> On 23/02/2021 12:06, Volodymyr Babchuk wrote:
>>>> Hi Julien,
>>>
>>> Hi Volodymyr,
>>>
>>>> Julien Grall writes:
>>>>> On 23/02/2021 02:34, Volodymyr Babchuk wrote:
>>>>> ... just rescheduling the vCPU. It will also give the opportunity
>>>>> for the guest to handle interrupts.
>>>>>
>>>>> If you don't return to the guest, then you risk getting an RCU
>>>>> sched stall on that vCPU (some hypercalls can take really, really
>>>>> long).
>>>>
>>>> Ah yes, you are right. I only wish that the hypervisor saved the
>>>> context of the hypercall on its side...
>>>>
>>>> I have the example of OP-TEE before my eyes. They have a special
>>>> return code, "task was interrupted", and they use a separate call,
>>>> "continue execution of interrupted task", which takes an opaque
>>>> context handle as a parameter. With this approach the state of an
>>>> interrupted call never leaks to the rest of the system.
>>>
>>> Feel free to suggest a new approach for the hypercalls.
>>
>> I believe I suggested it right above. There are some corner cases
>> that should be addressed, of course.
>
> If we wanted a clean break, then possibly yes. But I meant one that
> doesn't break all the existing users and doesn't put Xen at risk.
>
> I don't believe your approach fulfills that.

Of course, we can't touch any hypercalls that are part of the stable
ABI. But if I got this right, domctls and sysctls are not stable, so
one can change their behavior quite drastically between major releases.

>>>>>> This approach itself has obvious problems: code that executes a
>>>>>> hypercall is responsible for preemption, preemption checks are
>>>>>> infrequent (because they are costly by themselves), hypercall
>>>>>> execution state is stored in a guest-controlled area, and we rely
>>>>>> on the guest's good will to continue the hypercall.
>>>>>
>>>>> Why is it a problem to rely on the guest's good will? The
>>>>> hypercalls should be preempted at a boundary that is safe to
>>>>> continue.
>>>>
>>>> Yes, and it imposes restrictions on how to write a hypercall
>>>> handler. In other words, there are many more places in hypercall
>>>> handler code where it can be preempted than where a hypercall
>>>> continuation can be used. For example, you can preempt a hypercall
>>>> that holds a mutex, but of course you can't create a continuation
>>>> point in such a place.
>>>
>>> I disagree, you can create a continuation point in such a place,
>>> although it will be more complex because you have to make sure you
>>> break the work at a restartable place.
>>
>> Maybe there is some misunderstanding. You can't create a hypercall
>> continuation point in a place where you are holding a mutex, because
>> there is absolutely no guarantee that the guest will restart the
>> hypercall.
>
> I don't think we are disagreeing here. My point is you should rarely
> need to hold a mutex for a long period, so you could break your work
> into smaller chunks. In which case, you can use a hypercall
> continuation.

Let's put it this way: generally you can hold a mutex much longer than
you can hold a spinlock, and nothing catastrophic will happen if you
are preempted while holding a mutex. Better to avoid it, of course.

>> But you can preempt a vCPU while it is holding a mutex, because Xen
>> owns the scheduler and it can guarantee that the vCPU will eventually
>> be scheduled again to continue the work and release the mutex.
>
> The problem is the "eventually". If you are accounting the time spent
> in the hypervisor to vCPU A, then there is a possibility that it has
> exhausted its time slice. In which case, your vCPU A may be sleeping
> for a while with a mutex held.
>
> If another vCPU B needs the mutex, it will either have to wait,
> potentially for a long time, or we need to force vCPU A to run on
> borrowed time.

Yes, of course.

>>> I would also like to point out that preemption also has some
>>> drawbacks. With RT in mind, you have to deal with priority inversion
>>> (e.g. a lower priority vCPU holds a mutex that is required by a
>>> higher priority one).
>>
>> Of course. This is not as simple as "just call the scheduler when we
>> want to".
>
> Your e-mail made it sound like it was easy to add preemption in
> Xen. ;)

I'm sorry for that :) Actually, there is lots of work to do. It appears
to me that "current" needs to be reworked, preempt_enable/disable needs
to be reworked, and per-cpu variables should also be reworked. And this
is just to ensure consistency of the already existing code. And I am
not even mentioning x86 support...

>>> Outside of RT, you have to be careful where mutexes are held. In
>>> your earlier answer, you suggested holding a mutex for the memory
>>> allocation. If you do that, then it means a domain A can block
>>> allocation for domain B as it holds the mutex.
>>
>> As long as we do not exit to EL1 with a mutex held, domain A can't
>> block anything. Of course, we have to deal with priority inversion,
>> but a malicious domain can't cause a DoS.
>
> It is not really a priority inversion problem outside of RT because
> all the tasks will have the same priority. It is more a time
> accounting problem because each vCPU may have a different number of
> credits.

Speaking of that, RTDS does not use the concept of priority. And
neither, of course, does ARINC.

>>>>> I am really surprised that these are the only changes necessary in
>>>>> Xen. For a first approach, we may want to be conservative about
>>>>> when the preemption is happening, as I am not convinced that all
>>>>> the places are safe to preempt.
>>>>
>>>> Well, I can't say that I ran extensive tests, but I played with
>>>> this for some time and it seemed quite stable. Of course, I had
>>>> some problems with RTDS...
>>>>
>>>> As I see it, Xen already supports SMP, so all places where races
>>>> are possible should already be covered by spinlocks or taken into
>>>> account by some other means.
>>>
>>> That's correct for shared resources. I am more worried about any
>>> hypercalls that are expected to run more or less continuously
>>> (e.g. not taking interrupts into account) on the same pCPU.
>>
>> Are there many such hypercalls? They can disable preemption if they
>> really need to run on the same pCPU. As I understand it, they should
>> be relatively fast, because they can't create continuations anyway.
>
> Well, I never tried to make Xen preemptible... My comment is based on
> the fact that the use of preempt_{enable, disable}() was mostly done
> on a best-effort basis.
>
> The usual suspects are anything using this_cpu() or interacting with
> the per-CPU HW registers.
>
> From a quick look, here are a few things (I only looked at Arm):
>  * map_domain_page(), in particular on arm32, because this is using
>    per-CPU page-tables
>  * guest_atomics_*, as this uses this_cpu()
>  * virt_to_mfn(), in particular the failure path
>  * Incorrect use of (or missing) RCU locking (hopefully Juergen's
>    recent work on RCU mitigates the risk)
>
> I can provide guidance, but you will have to go through the code and
> check what's happening.

Thank you for the list. Of course, I need to go through all of the
code. I already had a bunch of problems with per_cpu variables...

-- 
Volodymyr Babchuk at EPAM
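To make the "usual suspects" in the list above more concrete, here is a
minimal sketch of the this_cpu() hazard and of the preempt_disable() /
preempt_enable() pattern mentioned earlier in the thread. The struct,
function and variable names are hypothetical illustrations only;
DEFINE_PER_CPU(), this_cpu(), preempt_disable() and preempt_enable()
are used the way I understand Xen's existing helpers to work, not as
verified code from the tree.

#include <xen/percpu.h>   /* DEFINE_PER_CPU(), this_cpu() (assumed header) */
#include <xen/preempt.h>  /* preempt_disable()/preempt_enable() (assumed)  */

/*
 * Hypothetical per-CPU statistics. Without preemption control, a vCPU
 * preempted between the two this_cpu() accesses below could resume on
 * a different pCPU and update another CPU's copy of the variable.
 */
struct op_stats {
    unsigned long count;
    unsigned long last_op;
};

static DEFINE_PER_CPU(struct op_stats, op_stats);

static void record_op(unsigned long op)
{
    /*
     * Assumed mitigation: keep the whole read-modify-write on one pCPU
     * by disabling preemption around the this_cpu() uses.
     */
    preempt_disable();

    this_cpu(op_stats).last_op = op;
    this_cpu(op_stats).count++;   /* still on the same pCPU as above */

    preempt_enable();
}

The same reasoning applies to per-CPU HW registers and per-CPU
page-tables: any sequence that assumes "the pCPU does not change under
my feet" becomes a preemption-free region once the hypervisor itself
can be preempted.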
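And, going back to the top of the thread, here is a rough sketch of the
OP-TEE-style flow described there, where the hypervisor keeps the state
of an interrupted call on its side and hands the guest only an opaque
handle. Every identifier below is hypothetical; it mirrors the shape of
the described interface (a "task was interrupted" return code plus a
"continue interrupted task" call), not any existing Xen or OP-TEE ABI.

#include <stdint.h>

/* Hypothetical return codes, mirroring the description above. */
#define HV_RET_OK           0L
#define HV_RET_INTERRUPTED  1L   /* "task was interrupted" */

/* Opaque handle: the interrupted-call state stays in the hypervisor. */
typedef uint64_t hv_task_handle_t;

/* Hypothetical guest-side wrappers around the two calls. */
long hv_do_long_op(const void *args, hv_task_handle_t *handle);
long hv_continue_task(hv_task_handle_t handle);

/* Guest-side loop: keep resuming until the hypervisor reports completion. */
static long run_long_op(const void *args)
{
    hv_task_handle_t handle = 0;
    long ret = hv_do_long_op(args, &handle);

    while ( ret == HV_RET_INTERRUPTED )
        ret = hv_continue_task(handle);

    return ret;
}

The contrast with today's hypercall continuations is that nothing about
the partially completed work is encoded in guest-visible arguments,
which is exactly the property pointed out in the thread: the state of
the interrupted call never leaks to the rest of the system.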