Re: [RFC PATCH 00/10] Preemption in hypervisor (ARM only)

On Wed, 24 Feb 2021 at 20:58, Volodymyr Babchuk <Volodymyr_Babchuk@xxxxxxxx> wrote:
>
> Hi Julien,
>
> Julien Grall writes:
>
> > On 23/02/2021 12:06, Volodymyr Babchuk wrote:
> >> Hi Julien,
> >
> > Hi Volodymyr,
> >
> >> Julien Grall writes:
> >>> On 23/02/2021 02:34, Volodymyr Babchuk wrote:
> >>> ... just rescheduling the vCPU. It will also give the opportunity for
> >>> the guest to handle interrupts.
> >>>
> >>> If you don't return to the guest, then you risk getting an RCU sched
> >>> stall on that vCPU (some hypercalls can take really, really long).
> >>
> >> Ah yes, you are right. I'd only wish that the hypervisor saved the
> >> context of the hypercall on its side...
> >>
> >> I have the example of OP-TEE before my eyes. They have a special return
> >> code "task was interrupted" and they use a separate call "continue
> >> execution of interrupted task", which takes an opaque context handle as
> >> a parameter. With this approach the state of an interrupted call never
> >> leaks to the rest of the system.
> >
> > Feel free to suggest a new approach for the hypercalls.
>
> I believe I suggested it right above. There are some corner cases that
> should be addressed, of course.

If we wanted a clean break, then possibly yes. But I meant one that
doesn't break all the existing users and doesn't put Xen at risk. I don't
believe your approach fulfills it.

> >>>> This approach itself has obvious
> >>>> problems: code that executes a hypercall is responsible for
> >>>> preemption, preemption checks are infrequent (because they are costly
> >>>> by themselves), hypercall execution state is stored in a
> >>>> guest-controlled area, and we rely on the guest's good will to
> >>>> continue the hypercall.
> >>>
> >>> Why is it a problem to rely on the guest's good will? The hypercalls
> >>> should be preempted at a boundary that is safe to continue.
> >>
> >> Yes, and it imposes restrictions on how to write a hypercall handler.
> >> In other words, there are many more places in hypercall handler code
> >> where it can be preempted than where a hypercall continuation can be
> >> used. For example, you can preempt a hypercall that holds a mutex, but
> >> of course you can't create a continuation point in such a place.
> >
> > I disagree, you can create a continuation point in such a place.
> > Although it will be more complex, because you have to make sure you
> > break the work in a restartable place.
>
> Maybe there is some misunderstanding. You can't create a hypercall
> continuation point in a place where you are holding a mutex, because
> there is absolutely no guarantee that the guest will restart the
> hypercall.

I don't think we are disagreeing here. My point is that you should rarely
need to hold a mutex for a long period, so you could break your work into
smaller chunks. In which case, you can use a hypercall continuation.
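To make the existing pattern concrete, here is a minimal sketch of a
preemptible handler built on continuations. Only hypercall_preempt_check()
and hypercall_create_continuation() are real Xen interfaces; the "foo"
hypercall, its arguments and process_one_item() are made up for
illustration:

    #include <xen/guest_access.h> /* XEN_GUEST_HANDLE_PARAM() */
    #include <xen/hypercall.h>    /* hypercall_create_continuation() */
    #include <xen/sched.h>        /* hypercall_preempt_check() */

    /* Hypothetical handler: processes 'nr' items starting at 'start' and
     * voluntarily yields via a continuation when preemption is pending. */
    static long do_foo_op(XEN_GUEST_HANDLE_PARAM(void) arg,
                          unsigned int start, unsigned int nr)
    {
        unsigned int i;

        for ( i = start; i < nr; i++ )
        {
            process_one_item(arg, i); /* hypothetical unit of work */

            /* Check periodically whether we should give the CPU back.
             * No lock is held across this point, so it is safe if the
             * guest re-issues the call later. */
            if ( i + 1 < nr && hypercall_preempt_check() )
                return hypercall_create_continuation(
                    __HYPERVISOR_foo_op, "hii", arg, i + 1, nr);
        }

        return 0;
    }

The guest then re-executes the hypercall with the updated 'start', so no
per-call state has to live in the hypervisor across the preemption point.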
> But you can preempt a vCPU while holding a mutex, because Xen owns the
> scheduler and it can guarantee that the vCPU will eventually be scheduled
> again to continue the work and release the mutex.

The problem is the "eventually". If you are accounting the time spent in
the hypervisor to vCPU A, then there is a possibility that it has
exhausted its time slice. In which case, your vCPU A may be sleeping for a
while with a mutex held. If another vCPU B needs the mutex, it will either
have to wait, potentially for a long time, or we need to force vCPU A to
run on borrowed time.

> > I would also like to point out that preemption also has some drawbacks.
> > With RT in mind, you have to deal with priority inversion (e.g. a
> > lower-priority vCPU holds a mutex that is required by a higher-priority
> > one).
>
> Of course. This is not as simple as "just call the scheduler when we want
> to".

Your e-mail made it sound like it was easy to add preemption in Xen. ;)

> > Outside of RT, you have to be careful where mutexes are held. In your
> > earlier answer, you suggested holding a mutex for the memory
> > allocation. If you do that, then it means a domain A can block
> > allocation for domain B as it holds the mutex.
>
> As long as we do not exit to EL1 with the mutex held, domain A can't
> block anything. Of course, we have to deal with priority inversion, but a
> malicious domain can't cause a DoS.

It is not really a priority inversion problem outside of RT, because all
the tasks will have the same priority. It is more a time accounting
problem, because each vCPU may have a different number of credits.

> >>> I am really surprised that these are the only changes necessary in
> >>> Xen. For a first approach, we may want to be conservative about when
> >>> the preemption happens, as I am not convinced that all the places are
> >>> safe to preempt.
> >>
> >> Well, I can't say that I ran extensive tests, but I played with this
> >> for some time and it seemed quite stable. Of course, I had some
> >> problems with RTDS...
> >>
> >> As I see it, Xen already supports SMP, so all places where races are
> >> possible should already be covered by spinlocks or taken into account
> >> by some other means.
> >
> > That's correct for shared resources. I am more worried about any
> > hypercalls that are expected to run more or less continuously (e.g. not
> > taking into account interrupts) on the same pCPU.
>
> Are there many such hypercalls? They can disable preemption if they
> really need to run on the same pCPU. As I understand it, they should be
> relatively fast, because they can't create continuations anyway.

Well, I never tried to make Xen preemptible... My comment is based on the
fact that the use of preempt_{enable,disable}() was mostly done on a
best-effort basis.

The usual suspects are anything using this_cpu() or interacting with the
per-CPU HW registers.

From a quick look, here are a few things (only looked at Arm):
 * map_domain_page(), in particular on arm32, because it uses per-CPU
   page-tables
 * guest_atomics_*, as this uses this_cpu()
 * virt_to_mfn(), in particular the failure path
 * Incorrect use of (or missing) RCU locking. (Hopefully Juergen's recent
   work on RCU mitigates the risk.)

I can provide guidance, but you will have to go through the code and check
what's happening.
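As a rough, hypothetical illustration of why the this_cpu() users above
are the usual suspects: DEFINE_PER_CPU(), this_cpu() and
preempt_{disable,enable}() are the real primitives, while the counter and
helper below are made up.

    #include <xen/percpu.h>  /* DEFINE_PER_CPU(), this_cpu() */
    #include <xen/preempt.h> /* preempt_disable()/preempt_enable() */

    /* Hypothetical per-CPU counter. */
    static DEFINE_PER_CPU(unsigned long, foo_counter);

    /* this_cpu() resolves to the copy belonging to the pCPU we are
     * running on right now. If the vCPU could be preempted and migrated
     * between the read and the write of the counter, the update would
     * land on another pCPU's copy. Disabling preemption around the
     * access is the conservative fix. */
    static void foo_account(unsigned long delta)
    {
        preempt_disable();
        this_cpu(foo_counter) += delta;
        preempt_enable();
    }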
Cheers,