Re: [RFC PATCH V1 01/12] hvm/ioreq: Make x86's IOREQ feature common



On Wed, 5 Aug 2020, Jan Beulich wrote:
> On 04.08.2020 21:11, Stefano Stabellini wrote:
> >> The point of the check isn't to determine whether to wait, but
> >> what to do after having waited. Reads need a retry round through
> >> the emulator (to store the result in the designated place),
> >> while writes don't have such a requirement (and hence guest
> >> execution can continue immediately in the general case).
> > 
> > The x86 code looks like this:
> > 
> >             rc = hvm_send_ioreq(s, &p, 0);
> >             if ( rc != X86EMUL_RETRY || currd->is_shutting_down )
> >                 vio->io_req.state = STATE_IOREQ_NONE;
> >             else if ( !hvm_ioreq_needs_completion(&vio->io_req) )
> >                 rc = X86EMUL_OKAY;
> > 
> > Basically hvm_send_ioreq is expected to return RETRY.
> > Then, only if it is a PIO write operation, it is turned into OKAY
> > right away. Otherwise, rc stays as RETRY.
> > 
> > So, normally, hvmemul_do_io is expected to return RETRY, because the
> > emulator is not done yet. Am I understanding the code correctly?
> 
> "The emulator" unfortunately is ambiguous here: Do you mean qemu
> (or whichever else ioreq server) or the x86 emulator inside Xen?

I meant QEMU. I'll use "QEMU" instead of "emulator" in this thread going
forward for clarity.


> There are various conditions leading to RETRY. As far as
> hvm_send_ioreq() goes, it is expected to return RETRY whenever
> some sort of response is to be expected (the most notable
> exception being the hvm_send_buffered_ioreq() path), or when
> submitting the request isn't possible in the first place.
> 
> > If so, who is handling RETRY on x86? I tried to follow the call chain
> > but ended up in the x86 emulator and got lost :-)
> 
> Not sure I understand the question correctly, but I'll try an
> answer nevertheless: hvm_send_ioreq() arranges for the vCPU to be
> put to sleep (prepare_wait_on_xen_event_channel()). Once the event
> channel got signaled (and vCPU unblocked), hvm_do_resume() ->
> handle_hvm_io_completion() -> hvm_wait_for_io() then check whether
> the wait reason has been satisfied (wait_on_xen_event_channel()),
> and ...
> 
> > At some point later, after the emulator (QEMU) has completed the
> > request, handle_hvm_io_completion gets called which ends up calling
> > handle_mmio() finishing the job on the Xen side too.
> 
> ..., as you say, handle_hvm_io_completion() invokes the retry of
> the original operation (handle_mmio() or handle_pio() in
> particular) if need be.

OK, thanks for the details. My interpretation seems to be correct.
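
To double-check that reading, here is a tiny stand-alone model of the
tail of hvmemul_do_io() (all names below are made up for illustration;
only the logic mirrors my reading of the real code): writes need no
retry round through QEMU, so RETRY is turned into OKAY, while reads
keep RETRY until QEMU has stored the result.

    /* Toy model of the rc handling at the end of hvmemul_do_io().
     * Hypothetical names; compile with any C compiler to try it. */
    #include <stdbool.h>
    #include <stdio.h>

    enum rc { RC_OKAY, RC_RETRY };

    /* Reads need a retry round through QEMU to store the result in
     * the designated place; writes do not, so the guest can continue
     * immediately. */
    static bool needs_completion(bool is_read)
    {
        return is_read;
    }

    static enum rc send_ioreq_tail(bool is_read)
    {
        enum rc rc = RC_RETRY;            /* hvm_send_ioreq() returned RETRY */

        if ( !needs_completion(is_read) ) /* cf. hvm_ioreq_needs_completion() */
            rc = RC_OKAY;                 /* e.g. a PIO write */

        return rc;
    }

    int main(void)
    {
        printf("write -> %s\n",
               send_ioreq_tail(false) == RC_OKAY ? "OKAY" : "RETRY");
        printf("read  -> %s\n",
               send_ioreq_tail(true) == RC_OKAY ? "OKAY" : "RETRY");
        return 0;
    }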

In which case, it looks like xen/arch/arm/io.c:try_fwd_ioserv should
return IO_RETRY. Then, xen/arch/arm/traps.c:do_trap_stage2_abort_guest
also needs to handle try_handle_mmio returning IO_RETRY the first time
around, and IO_HANDLED after QEMU does its job.

What should do_trap_stage2_abort_guest do on IO_RETRY? Simply return
early and let the scheduler do its job? Something like:

            enum io_state state = try_handle_mmio(regs, hsr, gpa);

            switch ( state )
            {
            case IO_ABORT:
                goto inject_abt;
            case IO_HANDLED:
                advance_pc(regs, hsr);
                return;
            case IO_RETRY:
                /*
                 * The access was forwarded to the ioreq server; return
                 * early and let the scheduler run other vCPUs until
                 * QEMU completes the request.
                 */
                return;
            case IO_UNHANDLED:
                /* IO unhandled, try another way to handle it. */
                break;
            default:
                ASSERT_UNREACHABLE();
            }

Then, xen/arch/arm/ioreq.c:handle_mmio() gets called by
handle_hvm_io_completion() after QEMU completes the emulation. Today,
handle_mmio just sets the user register with the read value.

But it would be better if it called the original function,
do_trap_stage2_abort_guest, again to actually retry the original
operation. This time do_trap_stage2_abort_guest would call
try_handle_mmio() and get IO_HANDLED instead of IO_RETRY; thus, it
would advance the program counter (advance_pc), completing the
handling of this instruction.

The user register could be set with the read value by try_handle_mmio
if try_fwd_ioserv returned IO_HANDLED instead of IO_RETRY.

Is that how the state machine is expected to work?
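
To make the question concrete, here is a minimal stand-alone model of
the two passes I have in mind (all names below are hypothetical, not
the actual Xen code): the first pass forwards the access to QEMU and
returns IO_RETRY, and after QEMU completes the request the retry pass
gets IO_HANDLED and advances the PC.

    /* Toy model of the proposed Arm flow; hypothetical names only. */
    #include <stdbool.h>
    #include <stdio.h>

    enum io_state { IO_UNHANDLED, IO_HANDLED, IO_RETRY, IO_ABORT };

    static bool qemu_done;   /* stands in for the ioreq completion state */
    static unsigned long pc; /* stands in for the guest program counter */

    static enum io_state try_handle_mmio(void)
    {
        if ( !qemu_done )
            return IO_RETRY;  /* request posted to QEMU, vCPU paused */
        return IO_HANDLED;    /* QEMU stored the result; fill the register */
    }

    static void do_trap_stage2_abort_guest(void)
    {
        switch ( try_handle_mmio() )
        {
        case IO_HANDLED:
            pc += 4;          /* advance_pc() */
            break;
        case IO_RETRY:
            break;            /* return early; the scheduler takes over */
        default:
            break;
        }
    }

    int main(void)
    {
        do_trap_stage2_abort_guest();   /* first pass: IO_RETRY */
        printf("after first pass: pc=%lu\n", pc);

        qemu_done = true;               /* QEMU completes the ioreq */
        do_trap_stage2_abort_guest();   /* retry pass: IO_HANDLED */
        printf("after retry pass: pc=%lu\n", pc);

        return 0;
    }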


> What's potentially confusing is that there's a second form of
> retry, invoked by the x86 insn emulator itself when it needs to
> split complex insns (the repeated string insns being the most
> important example). This results in actually exiting back to guest
> context without having advanced rIP, but after having updated
> other register state suitably (to express the progress made so
> far).

Ah! And it seems to be exactly the same return value: X86EMUL_RETRY.
It would be a good idea to differentiate between the two.
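
For instance (purely hypothetical names, not existing Xen definitions),
splitting the single value would make the two cases explicit at the
call sites:

    /* Hypothetical sketch: distinct values for the two kinds of retry.
     * Not actual Xen code. */
    enum emul_rc {
        X86EMUL_OKAY,
        X86EMUL_RETRY_AWAIT_IOREQ, /* waiting for QEMU to complete an ioreq */
        X86EMUL_RETRY_SPLIT_INSN,  /* insn emulator split a complex insn */
    };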
