
Re: [PATCH V3 15/23] xen/arm: Stick around in leave_hypervisor_to_guest until I/O has completed



On Wed, 9 Dec 2020, Julien Grall wrote:
> On 09/12/2020 23:35, Stefano Stabellini wrote:
> > On Wed, 9 Dec 2020, Stefano Stabellini wrote:
> > > On Mon, 30 Nov 2020, Oleksandr Tyshchenko wrote:
> > > > From: Oleksandr Tyshchenko <oleksandr_tyshchenko@xxxxxxxx>
> > > > 
> > > > This patch adds proper handling of return value of
> > > > vcpu_ioreq_handle_completion() which involves using a loop
> > > > in leave_hypervisor_to_guest().
> > > > 
> > > > The reason for using an unbounded loop here is that a vCPU
> > > > shouldn't continue until its I/O has completed. In Xen's case, if an
> > > > I/O never completes then it most likely means that something went
> > > > horribly wrong with the Device Emulator, and it is most likely not
> > > > safe to continue. So letting the vCPU spin forever if the I/O never
> > > > completes is safer than letting it continue and leaving the guest in
> > > > an unclear state, and is the best we can do for now.
> > > > 
> > > > This wouldn't be an issue for Xen itself, as do_softirq() would be
> > > > called on every iteration of the loop. In case of failure, the guest
> > > > will crash and the vCPU will be unscheduled.
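For reference, the shape of the loop being described is roughly the following
(a sketch only; check_for_vcpu_work() is the helper mentioned further down,
and the rest of the exit path is elided):

    void leave_hypervisor_to_guest(void)
    {
        /*
         * Keep retrying while the vCPU still has outstanding work, e.g.
         * an ioreq whose completion vcpu_ioreq_handle_completion() has
         * not yet observed.  do_softirq() runs on every iteration, so
         * Xen can still deschedule the vCPU or tear the domain down if
         * the Device Emulator never answers.
         */
        while ( check_for_vcpu_work() )
            do_softirq();

        /* ... remainder of the usual exit-to-guest path ... */
    }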
> > > 
> > > Imagine that we have two guests: one that requires an ioreq server and
> > > one that doesn't. If I am not mistaken, this loop could potentially spin
> > > forever on a pCPU, thus preventing any other guest from being scheduled,
> > > even if the other guest doesn't need any ioreq servers.
> > > 
> > > 
> > > My other concern is that we are busy-looping. Could we call something
> > > like wfi() or do_idle() instead? The ioreq server's completion event
> > > notification should wake us up, shouldn't it?
> > > 
> > > Following this line of thinking, I am wondering if, instead of the
> > > busy-loop, we should call vcpu_block_unless_event_pending(current) in
> > > try_handle_mmio when it returns IO_RETRY. Then, when the emulation is
> > > done, QEMU (or equivalent) calls xenevtchn_notify, which ends up waking
> > > up the domU vcpu. Would that work?
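Spelled out, that suggestion would amount to something like the following
(purely illustrative; the call shape of try_handle_mmio() is approximate):

    enum io_state rc = try_handle_mmio(regs, hsr);

    if ( rc == IO_RETRY )
        /*
         * Block the vCPU right away; the emulator's completion
         * notification (xenevtchn_notify() from QEMU or equivalent)
         * makes an event pending and wakes the vCPU back up.
         */
        vcpu_block_unless_event_pending(current);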
> > 
> > I have now read Julien's reply: we are already doing something similar to
> > what I suggested with the following call chain:
> > 
> > check_for_vcpu_work -> vcpu_ioreq_handle_completion -> wait_for_io ->
> > wait_on_xen_event_channel
> > 
> > So the busy-loop here is only a safety-belt in case of a spurious
> > wake-up, in which case we are going to call check_for_vcpu_work again,
> > potentially causing a guest reschedule.
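Condensed, the wait_for_io() step of that chain does roughly the following (a
hypothetical, stripped-down version; the real code also tracks state
transitions and uses memory barriers):

    /*
     * Hypothetical condensation of wait_for_io(): 'p' is the ioreq slot
     * in the page shared with the emulator, 'port' its event channel.
     */
    static bool io_completed(ioreq_t *p, evtchn_port_t port)
    {
        switch ( p->state )
        {
        case STATE_IORESP_READY:            /* the emulator has answered */
            p->state = STATE_IOREQ_NONE;
            return true;                    /* vCPU may resume */

        case STATE_IOREQ_READY:
        case STATE_IOREQ_INPROCESS:
            /*
             * Mark the vCPU blocked until the emulator notifies the
             * event channel.  The actual deschedule happens once softirqs
             * are processed, i.e. back in the loop in
             * leave_hypervisor_to_guest().
             */
            wait_on_xen_event_channel(port, p->state == STATE_IORESP_READY);
            return false;                   /* still pending, check again */

        default:
            return false;                   /* unexpected state */
        }
    }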
> > 
> > Then, this is fine and addresses both my concerns. Maybe let's add a note
> > in the commit message about it.
> 
> Damn, I hit the "send" button just a second before seeing your reply. :/ Oh
> well. I suggested the same because I have seen the same question come up
> multiple times.

:-)

 
> > I am also wondering if there is any benefit in calling wait_for_io()
> > earlier, maybe from try_handle_mmio if IO_RETRY?
> 
> wait_for_io() may end up descheduling the vCPU. I would like to avoid this
> happening in the middle of the I/O emulation because it needs to happen
> with no locks held at all.
> 
> I don't think there are locks involved today, but the deeper in the call
> stack the scheduling happens, the higher the chance that we screw something
> up in the future.
> 
> However...
> 
> > leave_hypervisor_to_guest is very late for that.
> 
> ... I am not sure what the problem with that is. The IOREQ server will be
> notified of the pending I/O as soon as try_handle_mmio() puts the I/O in
> the shared page.
> 
> If the IOREQ server is running on a different pCPU, then it is possible
> that the I/O has already completed before we reach
> leave_hypervisor_to_guest(). In that case, we would not have to wait for
> the I/O.

Yeah, I was thinking about that too. Actually, it could be faster this way
if we end up being lucky.
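Indeed, by the time we reach leave_hypervisor_to_guest() the request has long
been visible to the emulator; the forwarding side looks roughly like this
(get_ioreq() and the gpa/is_write/port values are stand-ins for the real
helpers and for values computed earlier in the data-abort path):

    /* Sketch of forwarding an MMIO access to the ioreq server. */
    ioreq_t *p = get_ioreq(current);        /* slot in the shared page */

    p->addr = gpa;
    p->dir  = is_write ? IOREQ_WRITE : IOREQ_READ;
    /* ... size, data, etc. ... */
    smp_wmb();                              /* payload visible before state */
    p->state = STATE_IOREQ_READY;

    notify_via_xen_event_channel(current->domain, port);

    return IO_RETRY;    /* completion picked up later, possibly already done */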

The reason for moving it earlier would be that, by the time
leave_hypervisor_to_guest is called, "Xen" has already decided to
continue running this particular vcpu. If we called wait_for_io()
earlier, we would give important information to the scheduler before any
decision is made. This is more "philosophical" than practical, though.
Let's leave it as is.



 

