[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [RFC PATCH] Added the logic to decode 32 bit ldr/str post-indexing instructions



On Fri, 26 Nov 2021 15:28:06 +0000
Ayan Kumar Halder <ayan.kumar.halder@xxxxxxxxxx> wrote:

Hi Ayan,

> Many thanks for your inputs.
> Apologies if I sound dumb, but I need a few clarifications.
> 
> On 26/11/2021 13:14, Andre Przywara wrote:
> > On Fri, 19 Nov 2021 16:52:02 +0000
> > Ayan Kumar Halder <ayan.kumar.halder@xxxxxxxxxx> wrote:
> > 
> > Hi,
> >   
> >> At present, post indexing instructions are not emulated by Xen.
> >> When Xen gets the exception, EL2_ESR.ISV bit not set. Thus as a
> >> result, data abort is triggered.
> >>
> >> Added the logic to decode ldr/str post indexing instructions.
> >> With this, Xen can decode instructions like these:-
> >> ldr w2, [x1], #4
> >> Thus, domU can read ioreg with post indexing instructions.  
> > 
> > Where do those instructions come from? A (C) compiler? (Some mail in
> > another thread from Stefano suggests so)
> > If yes, I would argue that is broken:
> > IIUC C compilers assume normal memory attributes for every pointer they
> > handle, so they are free to use unaligned accesses, load/store exclusives,
> > split accesses (two halfword reads) and what not when generating code.
> > The GIC needs to be mapped as device memory, which explicitly forbids
> > unaligned accesses and exclusives (as in: always traps), so you cannot let
> > compiler-generated code access the GIC (or most other MMIO devices, for
> > that matter).
> > I know, this somewhat works(TM) in practise, because a uint32_t assignment
> > is very likely to end up in an ldr/str, but please let me know which car
> > this code ends up in, so that can I avoid this brand ;-)
> > 
> > You can tell the compiler to avoid unaligned accesses with -mstrict-align
> > (and should definitely do so when you are running C code with the MMU
> > off), but that still leaves exclusives and split accesses at the
> > compiler's discretion. A variation on the topic of split access is merged
> > writes, where the compiler uses NEON or SVE instructions, for instance, to
> > cover multiple words at once, possibly via some memset()/memcpy() routine.  
> 
> I understand that we should be using inline assembly instructions to 
> access any MMIO region. This is to prevent the compiler doing any tricks.
> 
> But is there a restriction that post indexing instructions can never be 
> used to access MMIO region ?

No, this is a pure virtualisation restriction, see below. On real
hardware/bare-metal, ldr/str with post or pre-indexing works and is fine
to use for MMIO.
But we need to have the right access width, matching the MMIO device's
expectation. So ldp/stp would probably be problematic, for instance.

> > On top there is this architectural restriction of the ARMv7/v8
> > virtualisation extension to not decode many "advanced" load/store
> > instructions in ESR_EL2.  
> Where do I find this restriction ?

That's described in the ESR_ELx syndrome description in the ARMv8 ARM (DDI
0487G.b), section "ISS encoding for an exception from a Data Abort" (page
D13-3219 in my Issue G.b copy):
"For other faults reported in ESR_EL2, ISV is 0 except for the following stage 
2 aborts: ...."

> Are you telling me that load/store with post indexing is an "advanced" 
> instruction and ArmV8 does not allow decoding of these instructions in 
> ESR_EL2 ?

Yes, it is in the group of instructions for which the hardware does not
provide syndrome information in ESR_EL2: " .... but excluding Load
Exclusive or Store Exclusive and excluding those with writeback)."

> Isn't that a very strong limitation ?

I don't know about that, it's what it is and what it was for years. Linux
deliberately chose ldr/str only for readl/writel to be able to trap and
handle MMIO aborts in hypervisors.

If you do MMIO accesses the right way, using (inline) assembly only, then
you don't have the problem, and also avoid many others, see my previous
mail.

If you think of it from an architectural and implementation point of view
(and keep the RISC idea in mind): it should happen rarely, but would
require many gates for something that you can do in software as well.

> Also what is your opinion on Xen decoding these instructions ?

I would be careful, we deliberately avoid this in KVM. This bubbles up
from time to time, though, so we now allow delegating this case to
userland, so the VMM can do the decoding there.
In Xen you have less issues with walking the guest's page tables,
though (a major problem in KVM), but it still adds complexity to a
hypervisor which aims to be lean by design.
Another argument would be that just post/pre does not cover everything, and
the cases start to pile up quickly: what about the immediate versions,
ldxr, stp, NEON/SVE load/stores, etc. Since many of those are not safe for
MMIO anyway, you add a lot of code for little use (and which gets little
testing!).

Cheers,
Andre

> > Linux deliberately coded readl/writel using inline assembly, to only
> > use instructions that provide syndrome information, plus guarantee
> > device-memory compatible semantics.
> > Check out https://lwn.net/Articles/698014/ for a comprehensive
> > discussion of this whole MMIO topic.
> > 
> > So I think you should do the same in your guest/bare metal code: define
> > {read,write}{b,h,l,q} as inline assembly functions, using ldr?/str?
> > only. See xen/include/asm-arm/arm64/io.h for an example that uses
> > static inline functions in a header file, to generate most optimal
> > code. Then always do MMIO only via those accessors. That prevents any
> > future compiler surprises, plus makes you perfectly virtualisable.
> > 
> > Cheers,
> > Andre.
> >   
> >> Signed-off-by: Ayan Kumar Halder <ayankuma@xxxxxxxxxx>
> >> ---
> >> Note to reviewer:-
> >> This patch is based on an issue discussed in
> >> https://lists.xenproject.org/archives/html/xen-devel/2021-11/msg00969.html
> >> "Xen/ARM - Query about a data abort seen while reading GICD registers"
> >>
> >>
> >>   xen/arch/arm/decode.c | 77
> >> +++++++++++++++++++++++++++++++++++++++++++ xen/arch/arm/io.c     |
> >> 14 ++++++-- 2 files changed, 88 insertions(+), 3 deletions(-)
> >>
> >> diff --git a/xen/arch/arm/decode.c b/xen/arch/arm/decode.c
> >> index 792c2e92a7..7b60bedbc5 100644
> >> --- a/xen/arch/arm/decode.c
> >> +++ b/xen/arch/arm/decode.c
> >> @@ -84,6 +84,80 @@ bad_thumb2:
> >>       return 1;
> >>   }
> >>   
> >> +static inline int32_t extract32(uint32_t value, int start, int
> >> length) +{
> >> +    int32_t ret;
> >> +
> >> +    if ( !(start >= 0 && length > 0 && length <= 32 - start) )
> >> +        return -EINVAL;
> >> +
> >> +    ret = (value >> start) & (~0U >> (32 - length));
> >> +
> >> +    return ret;
> >> +}
> >> +
> >> +static int decode_64bit_loadstore_postindexing(register_t pc, struct
> >> hsr_dabt *dabt) +{
> >> +    uint32_t instr;
> >> +    int size;
> >> +    int v;
> >> +    int opc;
> >> +    int rt;
> >> +    int imm9;
> >> +
> >> +    /* For details on decoding, refer to Armv8 Architecture
> >> reference manual
> >> +     * Section - "Load/store register (immediate post-indexed)", Pg
> >> 318
> >> +    */
> >> +    if ( raw_copy_from_guest(&instr, (void * __user)pc, sizeof
> >> (instr)) )
> >> +        return -EFAULT;
> >> +
> >> +    /* First, let's check for the fixed values */
> >> +
> >> +    /*  As per the "Encoding table for the Loads and Stores group",
> >> Pg 299
> >> +     * op4 = 1 - Load/store register (immediate post-indexed)
> >> +     */
> >> +    if ( extract32(instr, 10, 2) != 1 )
> >> +        goto bad_64bit_loadstore;
> >> +
> >> +    /* For the following, refer to "Load/store register (immediate
> >> post-indexed)"
> >> +     * to get the fixed values at various bit positions.
> >> +     */
> >> +    if ( extract32(instr, 21, 1) != 0 )
> >> +        goto bad_64bit_loadstore;
> >> +
> >> +    if ( extract32(instr, 24, 2) != 0 )
> >> +        goto bad_64bit_loadstore;
> >> +
> >> +    if ( extract32(instr, 27, 3) != 7 )
> >> +        goto bad_64bit_loadstore;
> >> +
> >> +    size = extract32(instr, 30, 2);
> >> +    v = extract32(instr, 26, 1);
> >> +    opc = extract32(instr, 22, 1);
> >> +
> >> +    /* At the moment, we support STR(immediate) - 32 bit variant and
> >> +     * LDR(immediate) - 32 bit variant only.
> >> +     */
> >> +    if (!((size==2) && (v==0) && ((opc==0) || (opc==1))))
> >> +        goto bad_64bit_loadstore;
> >> +
> >> +    rt = extract32(instr, 0, 5);
> >> +    imm9 = extract32(instr, 12, 9);
> >> +
> >> +    if ( imm9 < 0 )
> >> +        update_dabt(dabt, rt, size, true);
> >> +    else
> >> +        update_dabt(dabt, rt, size, false);
> >> +
> >> +    dabt->valid = 1;
> >> +
> >> +
> >> +    return 0;
> >> +bad_64bit_loadstore:
> >> +    gprintk(XENLOG_ERR, "unhandled 64bit instruction 0x%x\n", instr);
> >> +    return 1;
> >> +}
> >> +
> >>   static int decode_thumb(register_t pc, struct hsr_dabt *dabt)
> >>   {
> >>       uint16_t instr;
> >> @@ -155,6 +229,9 @@ int decode_instruction(const struct cpu_user_regs
> >> *regs, struct hsr_dabt *dabt) if ( is_32bit_domain(current->domain)
> >> && regs->cpsr & PSR_THUMB ) return decode_thumb(regs->pc, dabt);
> >>   
> >> +    if ( is_64bit_domain(current->domain) )
> >> +        return decode_64bit_loadstore_postindexing(regs->pc, dabt);
> >> +
> >>       /* TODO: Handle ARM instruction */
> >>       gprintk(XENLOG_ERR, "unhandled ARM instruction\n");
> >>   
> >> diff --git a/xen/arch/arm/io.c b/xen/arch/arm/io.c
> >> index 729287e37c..49e80358c0 100644
> >> --- a/xen/arch/arm/io.c
> >> +++ b/xen/arch/arm/io.c
> >> @@ -106,14 +106,13 @@ enum io_state try_handle_mmio(struct
> >> cpu_user_regs *regs, .gpa = gpa,
> >>           .dabt = dabt
> >>       };
> >> +    int rc;
> >>   
> >>       ASSERT(hsr.ec == HSR_EC_DATA_ABORT_LOWER_EL);
> >>   
> >>       handler = find_mmio_handler(v->domain, info.gpa);
> >>       if ( !handler )
> >>       {
> >> -        int rc;
> >> -
> >>           rc = try_fwd_ioserv(regs, v, &info);
> >>           if ( rc == IO_HANDLED )
> >>               return handle_ioserv(regs, v);
> >> @@ -123,7 +122,16 @@ enum io_state try_handle_mmio(struct
> >> cpu_user_regs *regs, 
> >>       /* All the instructions used on emulated MMIO region should be
> >> valid */ if ( !dabt.valid )
> >> -        return IO_ABORT;
> >> +    {
> >> +        /*
> >> +         * Post indexing ldr/str instructions are not emulated by
> >> Xen. So, the
> >> +         * ISS is invalid. In such a scenario, we try to manually
> >> decode the
> >> +         * instruction from the program counter.
> >> +         */
> >> +        rc = decode_instruction(regs, &info.dabt);
> >> +        if ( rc )
> >> +            return IO_ABORT;
> >> +    }
> >>   
> >>       /*
> >>        * Erratum 766422: Thumb store translation fault to Hypervisor
> >> may  
> >   




 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.