
[Xen-devel] [PATCH] xen: arm: force guest memory accesses to cacheable when MMU is disabled



On ARM, guest OSes are started with the MMU and caches disabled (as they are
on native hardware); however, caching is enabled in the domain running the
builder, and therefore we must ensure cache consistency.

The existing solution to this problem (a0035ecc0d82 "tools: libxc: flush data
cache after loading images into guest memory") is to flush the caches after
loading the various blobs into guest RAM. However, this approach has two
shortcomings:

 - The cache flush primitives available to userspace on arm32 are not
   sufficient for our needs.
 - There is a race between the cache flush and the unmap of the guest page
   where the processor might speculatively dirty the cache line again.

(Of these, the second is the more fundamental.)

This patch makes use of the hardware functionality to force all accesses made
from guest mode to be cached (the HCR.DC, "default cacheable", bit). This
means we don't need to worry about the domain builder's writes being cached,
because the guest's "uncached" accesses will in fact be cached.

Unfortunately the use of HCR.DC is incompatible with the guest enabling its
MMU (the SCTLR.M bit). Therefore we must trap accesses to the SCTLR so that we
can detect when this happens and disable HCR.DC. This is done with the HCR.TVM
(trap virtual memory controls) bit, which also causes various other registers
to be trapped, all of which can be passed straight through to the underlying
register. Once the guest has enabled its MMU we no longer need to trap, so
there is no ongoing overhead. In my tests Linux makes about half a dozen
accesses to these registers before the MMU is enabled; I would expect other
OSes to behave similarly (the sequence of writes needed to set up the MMU is
pretty obvious).

Apart from the unfortunate need to trap these accesses, this approach is
incompatible with guests which attempt DMA operations with their MMU
disabled. In practice this means guests with device passthrough, which we do
not yet support. Since a typical guest (including dom0) does not access
devices which require DMA until it is fully up and running with paging
enabled, the main risk is to in-guest firmware which does DMA, e.g. EFI
running in a guest with a disk passed through and booting from that disk.
Since we know that dom0 is not using any such firmware, and we do not support
device passthrough to guests yet, we can live with this restriction. Once
passthrough is implemented this will need to be revisited.

The patch includes a couple of seemingly unrelated but necessary changes:

 - HSR_SYSREG_CRN_MASK was incorrectly defined, which happened to be benign
   with the existing set of system registers we handled, but broke with the
   new ones introduced here.
 - The defines used to decode the HSR system register fields were named the
   same as the register. This breaks the accessor macros. This had gone
   unnoticed because the handling of the existing trapped registers did not
   require accessing the underlying hardware register. Rename those constants
   with an HSR_SYSREG prefix (in line with HSR_CP32/64 for 32-bit registers).

This patch has survived thousands of boot loops on a Midway system.

Signed-off-by: Ian Campbell <ian.campbell@xxxxxxxxxx>
---
My preferred solution here would be for the tools to use an uncached mapping
of guest memory when building the guest, which requires adding a new privcmd
ioctl (a relatively straightforward patch) and plumbing a "cached" flag
through the libxc foreign mapping interfaces (a twisty maze of passages, all
alike). IMO the libxc side of that patch was not looking suitable for a 4.4
freeze exception, since it was quite large (because we have four or more
mapping interfaces in libxc, some of which call back into others).

So I propose this version for 4.4. The uncached mapping solution should be
revisited for a future release.

At the risk of appearing to be going mad:

<speaking hat="submitter">

This bug results in memory corruption in the guest, which mostly manifests as
a failure to boot the guest (subsequent bad behaviour is possible but, I
think, unlikely). The frequency of failure is perhaps 1 in 10 boots. This
would not constitute an awesome release.

Although the patch is large, most of it is repetitive and mechanical (made
explicit through the use of macros in many cases). The biggest risk is that
one of the registers is not passed through correctly (e.g. the wrong size or
target register). The ones which Linux uses have been tested and appear to
function OK. The others might be buggy, but this is mitigated through the use
of the same set of macros.

I think the chance of the patch having a bug w.r.t. my understanding of the
hardware behaviour is pretty low. As for bugs in my understanding of the
hardware documentation, I would say middling to low, but I have discussed it
with some folks at ARM and they didn't call me an idiot (in fact pretty much
the same thing has been proposed for KVM).

Overall I think the benefits outweigh the risks.

One thing I'm not sure about is reverting the previous fix in a0035ecc0d82.
It's reasonably recent, so reverting it takes us back to a pretty well
understood state in the libraries. The functionality is harmless, if
incomplete. Given the first argument, I would lean towards reverting.

</speaking>

<speaking hat="stand in release manager">

OK.

</speaking>

Obviously, if you think I'm being too easy on myself, please say so.

Actually, if you think my judgement is right I'd appreciate being told so too.
---
 xen/arch/arm/domain.c           |    5 ++
 xen/arch/arm/domain_build.c     |    1 +
 xen/arch/arm/traps.c            |  153 ++++++++++++++++++++++++++++++++++++++-
 xen/arch/arm/vtimer.c           |    6 +-
 xen/include/asm-arm/cpregs.h    |    4 +
 xen/include/asm-arm/processor.h |    2 +-
 xen/include/asm-arm/sysregs.h   |   19 ++++-
 7 files changed, 182 insertions(+), 8 deletions(-)

diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c
index 124cccf..104d228 100644
--- a/xen/arch/arm/domain.c
+++ b/xen/arch/arm/domain.c
@@ -219,6 +219,11 @@ static void ctxt_switch_to(struct vcpu *n)
     else
         hcr |= HCR_RW;
 
+    if ( n->arch.sctlr & SCTLR_M )
+        hcr &= ~(HCR_TVM|HCR_DC);
+    else
+        hcr |= (HCR_TVM|HCR_DC);
+
     WRITE_SYSREG(hcr, HCR_EL2);
     isb();
 
diff --git a/xen/arch/arm/domain_build.c b/xen/arch/arm/domain_build.c
index 47b781b..bb31db8 100644
--- a/xen/arch/arm/domain_build.c
+++ b/xen/arch/arm/domain_build.c
@@ -1026,6 +1026,7 @@ int construct_dom0(struct domain *d)
     else
         WRITE_SYSREG(READ_SYSREG(HCR_EL2) | HCR_RW, HCR_EL2);
 #endif
+    WRITE_SYSREG(READ_SYSREG(HCR_EL2) | HCR_DC | HCR_TVM, HCR_EL2);
 
     /*
      * kernel_load will determine the placement of the initrd & fdt in
diff --git a/xen/arch/arm/traps.c b/xen/arch/arm/traps.c
index 7c5ab19..d00bba3 100644
--- a/xen/arch/arm/traps.c
+++ b/xen/arch/arm/traps.c
@@ -1279,6 +1279,23 @@ static void advance_pc(struct cpu_user_regs *regs, union hsr hsr)
     regs->pc += hsr.len ? 4 : 2;
 }
 
+static void update_sctlr(uint32_t v)
+{
+    /*
+     * If MMU (SCTLR_M) is now enabled then we must disable HCR.DC
+     * because they are incompatible.
+     *
+     * Once HCR.DC is disabled then we do not need HCR_TVM either,
+     * since its only purpose was to catch the MMU being enabled.
+     *
+     * Both are set appropriately on context switch but we need to
+     * clear them now since we may not context switch on return to
+     * guest.
+     */
+    if ( v & SCTLR_M )
+        WRITE_SYSREG(READ_SYSREG(HCR_EL2) & ~(HCR_DC|HCR_TVM), HCR_EL2);
+}
+
 static void do_cp15_32(struct cpu_user_regs *regs,
                        union hsr hsr)
 {
@@ -1338,6 +1355,89 @@ static void do_cp15_32(struct cpu_user_regs *regs,
         if ( cp32.read )
            *r = v->arch.actlr;
         break;
+
+/* Passthru a 32-bit AArch32 register which is also 32-bit under AArch64 */
+#define CP32_PASSTHRU32(R...) do {              \
+    if ( cp32.read )                            \
+        *r = READ_SYSREG32(R);                  \
+    else                                        \
+        WRITE_SYSREG32(*r, R);                  \
+} while(0)
+
+/*
+ * Passthru a 32-bit AArch32 register which is 64-bit under AArch64.
+ * Updates the lower 32-bits and clears the upper bits.
+ */
+#define CP32_PASSTHRU64(R...) do {              \
+    if ( cp32.read )                            \
+        *r = (uint32_t)READ_SYSREG64(R);        \
+    else                                        \
+        WRITE_SYSREG64((uint64_t)*r, R);        \
+} while(0)
+
+/*
+ * Passthru a 32-bit AArch32 register which is 64-bit under AArch64.
+ * Updates either the HI ([63:32]) or LO ([31:0]) 32-bits preserving
+ * the other half.
+ */
+#ifdef CONFIG_ARM_64
+#define CP32_PASSTHRU64_HI(R...) do {                   \
+    if ( cp32.read )                                    \
+        *r = (uint32_t)(READ_SYSREG64(R) >> 32);        \
+    else                                                \
+    {                                                   \
+        uint64_t t = READ_SYSREG64(R) & 0xffffffffUL;   \
+        t |= ((uint64_t)(*r)) << 32;                    \
+        WRITE_SYSREG64(t, R);                           \
+    }                                                   \
+} while(0)
+#define CP32_PASSTHRU64_LO(R...) do {                           \
+    if ( cp32.read )                                            \
+        *r = (uint32_t)(READ_SYSREG64(R) & 0xffffffff);         \
+    else                                                        \
+    {                                                           \
+        uint64_t t = READ_SYSREG64(R) & 0xffffffff00000000UL;   \
+        t |= *r;                                                \
+        WRITE_SYSREG64(t, R);                                   \
+    }                                                           \
+} while(0)
+#endif
+
+    /* HCR.TVM */
+    case HSR_CPREG32(SCTLR):
+        CP32_PASSTHRU32(SCTLR_EL1);
+        update_sctlr(*r);
+        break;
+    case HSR_CPREG32(TTBR0_32):   CP32_PASSTHRU64(TTBR0_EL1);      break;
+    case HSR_CPREG32(TTBR1_32):   CP32_PASSTHRU64(TTBR1_EL1);      break;
+    case HSR_CPREG32(TTBCR):      CP32_PASSTHRU32(TCR_EL1);        break;
+    case HSR_CPREG32(DACR):       CP32_PASSTHRU32(DACR32_EL2);     break;
+    case HSR_CPREG32(DFSR):       CP32_PASSTHRU32(ESR_EL1);        break;
+    case HSR_CPREG32(IFSR):       CP32_PASSTHRU32(IFSR32_EL2);     break;
+    case HSR_CPREG32(ADFSR):      CP32_PASSTHRU32(AFSR0_EL1);      break;
+    case HSR_CPREG32(AIFSR):      CP32_PASSTHRU32(AFSR1_EL1);      break;
+    case HSR_CPREG32(CONTEXTIDR): CP32_PASSTHRU32(CONTEXTIDR_EL1); break;
+
+#ifdef CONFIG_ARM_64
+    case HSR_CPREG32(DFAR):       CP32_PASSTHRU64_LO(FAR_EL1);     break;
+    case HSR_CPREG32(IFAR):       CP32_PASSTHRU64_HI(FAR_EL1);     break;
+    case HSR_CPREG32(MAIR0):      CP32_PASSTHRU64_LO(MAIR_EL1);    break;
+    case HSR_CPREG32(MAIR1):      CP32_PASSTHRU64_HI(MAIR_EL1);    break;
+    case HSR_CPREG32(AMAIR0):     CP32_PASSTHRU64_LO(AMAIR_EL1);   break;
+    case HSR_CPREG32(AMAIR1):     CP32_PASSTHRU64_HI(AMAIR_EL1);   break;
+#else
+    case HSR_CPREG32(DFAR):       CP32_PASSTHRU32(DFAR);           break;
+    case HSR_CPREG32(IFAR):       CP32_PASSTHRU32(IFAR);           break;
+    case HSR_CPREG32(MAIR0):      CP32_PASSTHRU32(MAIR0);          break;
+    case HSR_CPREG32(MAIR1):      CP32_PASSTHRU32(MAIR1);          break;
+    case HSR_CPREG32(AMAIR0):     CP32_PASSTHRU32(AMAIR0);         break;
+    case HSR_CPREG32(AMAIR1):     CP32_PASSTHRU32(AMAIR1);         break;
+#endif
+
+#undef CP32_PASSTHRU32
+#undef CP32_PASSTHRU64
+#undef CP32_PASSTHRU64_LO
+#undef CP32_PASSTHRU64_HI
     default:
         printk("%s p15, %d, r%d, cr%d, cr%d, %d @ 0x%"PRIregister"\n",
                cp32.read ? "mrc" : "mcr",
@@ -1351,6 +1451,9 @@ static void do_cp15_64(struct cpu_user_regs *regs,
                        union hsr hsr)
 {
     struct hsr_cp64 cp64 = hsr.cp64;
+    uint32_t *r1 = (uint32_t *)select_user_reg(regs, cp64.reg1);
+    uint32_t *r2 = (uint32_t *)select_user_reg(regs, cp64.reg2);
+    uint64_t r;
 
     if ( !check_conditional_instr(regs, hsr) )
     {
@@ -1368,6 +1471,26 @@ static void do_cp15_64(struct cpu_user_regs *regs,
             domain_crash_synchronous();
         }
         break;
+
+#define CP64_PASSTHRU(R...) do {                                  \
+    if ( cp64.read )                                            \
+    {                                                           \
+        r = READ_SYSREG64(R);                                   \
+        *r1 = r & 0xffffffffUL;                                 \
+        *r2 = r >> 32;                                          \
+    }                                                           \
+    else                                                        \
+    {                                                           \
+        r = (*r1) | (((uint64_t)(*r2))<<32);                    \
+        WRITE_SYSREG64(r, R);                                   \
+    }                                                           \
+} while(0)
+
+    case HSR_CPREG64(TTBR0): CP64_PASSTHRU(TTBR0_EL1); break;
+    case HSR_CPREG64(TTBR1): CP64_PASSTHRU(TTBR1_EL1); break;
+
+#undef CP64_PASSTHRU
+
     default:
         printk("%s p15, %d, r%d, r%d, cr%d @ 0x%"PRIregister"\n",
                cp64.read ? "mrrc" : "mcrr",
@@ -1382,11 +1505,12 @@ static void do_sysreg(struct cpu_user_regs *regs,
                       union hsr hsr)
 {
     struct hsr_sysreg sysreg = hsr.sysreg;
+    register_t *x = select_user_reg(regs, sysreg.reg);
 
     switch ( hsr.bits & HSR_SYSREG_REGS_MASK )
     {
-    case CNTP_CTL_EL0:
-    case CNTP_TVAL_EL0:
+    case HSR_SYSREG_CNTP_CTL_EL0:
+    case HSR_SYSREG_CNTP_TVAL_EL0:
         if ( !vtimer_emulate(regs, hsr) )
         {
             dprintk(XENLOG_ERR,
@@ -1394,6 +1518,31 @@ static void do_sysreg(struct cpu_user_regs *regs,
             domain_crash_synchronous();
         }
         break;
+
+#define SYSREG_PASSTHRU(R...) do {              \
+    if ( sysreg.read )                          \
+        *x = READ_SYSREG(R);                    \
+    else                                        \
+        WRITE_SYSREG(*x, R);                    \
+} while(0)
+
+    case HSR_SYSREG_SCTLR_EL1:
+        SYSREG_PASSTHRU(SCTLR_EL1);
+        update_sctlr(*x);
+        break;
+    case HSR_SYSREG_TTBR0_EL1:      SYSREG_PASSTHRU(TTBR0_EL1);      break;
+    case HSR_SYSREG_TTBR1_EL1:      SYSREG_PASSTHRU(TTBR1_EL1);      break;
+    case HSR_SYSREG_TCR_EL1:        SYSREG_PASSTHRU(TCR_EL1);        break;
+    case HSR_SYSREG_ESR_EL1:        SYSREG_PASSTHRU(ESR_EL1);        break;
+    case HSR_SYSREG_FAR_EL1:        SYSREG_PASSTHRU(FAR_EL1);        break;
+    case HSR_SYSREG_AFSR0_EL1:      SYSREG_PASSTHRU(AFSR0_EL1);      break;
+    case HSR_SYSREG_AFSR1_EL1:      SYSREG_PASSTHRU(AFSR1_EL1);      break;
+    case HSR_SYSREG_MAIR_EL1:       SYSREG_PASSTHRU(MAIR_EL1);       break;
+    case HSR_SYSREG_AMAIR_EL1:      SYSREG_PASSTHRU(AMAIR_EL1);      break;
+    case HSR_SYSREG_CONTEXTIDR_EL1: SYSREG_PASSTHRU(CONTEXTIDR_EL1); break;
+
+#undef SYSREG_PASSTHRU
+
     default:
         printk("%s %d, %d, c%d, c%d, %d %s x%d @ 0x%"PRIregister"\n",
                sysreg.read ? "mrs" : "msr",
diff --git a/xen/arch/arm/vtimer.c b/xen/arch/arm/vtimer.c
index 433ad55..e325f78 100644
--- a/xen/arch/arm/vtimer.c
+++ b/xen/arch/arm/vtimer.c
@@ -240,18 +240,18 @@ static int vtimer_emulate_sysreg(struct cpu_user_regs *regs, union hsr hsr)
 
     switch ( hsr.bits & HSR_SYSREG_REGS_MASK )
     {
-    case CNTP_CTL_EL0:
+    case HSR_SYSREG_CNTP_CTL_EL0:
         vtimer_cntp_ctl(regs, &r, sysreg.read);
         if ( sysreg.read )
             *x = r;
         return 1;
-    case CNTP_TVAL_EL0:
+    case HSR_SYSREG_CNTP_TVAL_EL0:
         vtimer_cntp_tval(regs, &r, sysreg.read);
         if ( sysreg.read )
             *x = r;
         return 1;
 
-    case HSR_CPREG64(CNTPCT):
+    case HSR_SYSREG_CNTPCT_EL0:
         return vtimer_cntpct(regs, x, sysreg.read);
 
     default:
diff --git a/xen/include/asm-arm/cpregs.h b/xen/include/asm-arm/cpregs.h
index f0f1d53..508467a 100644
--- a/xen/include/asm-arm/cpregs.h
+++ b/xen/include/asm-arm/cpregs.h
@@ -121,6 +121,8 @@
 #define TTBR0           p15,0,c2        /* Translation Table Base Reg. 0 */
 #define TTBR1           p15,1,c2        /* Translation Table Base Reg. 1 */
 #define HTTBR           p15,4,c2        /* Hyp. Translation Table Base Register */
+#define TTBR0_32        p15,0,c2,c0,0   /* 32-bit access to TTBR0 */
+#define TTBR1_32        p15,0,c2,c0,1   /* 32-bit access to TTBR1 */
 #define HTCR            p15,4,c2,c0,2   /* Hyp. Translation Control Register */
 #define VTCR            p15,4,c2,c1,2   /* Virtualization Translation Control Register */
 #define VTTBR           p15,6,c2        /* Virtualization Translation Table Base Register */
@@ -260,7 +262,9 @@
 #define CPACR_EL1               CPACR
 #define CSSELR_EL1              CSSELR
 #define DACR32_EL2              DACR
+#define ESR_EL1                 DFSR
 #define ESR_EL2                 HSR
+#define FAR_EL1                 HIFAR
 #define FAR_EL2                 HIFAR
 #define HCR_EL2                 HCR
 #define HPFAR_EL2               HPFAR
diff --git a/xen/include/asm-arm/processor.h b/xen/include/asm-arm/processor.h
index dfe807d..06e638f 100644
--- a/xen/include/asm-arm/processor.h
+++ b/xen/include/asm-arm/processor.h
@@ -342,7 +342,7 @@ union hsr {
 #define HSR_SYSREG_OP0_SHIFT (20)
 #define HSR_SYSREG_OP1_MASK (0x0001c000)
 #define HSR_SYSREG_OP1_SHIFT (14)
-#define HSR_SYSREG_CRN_MASK (0x00003800)
+#define HSR_SYSREG_CRN_MASK (0x00003c00)
 #define HSR_SYSREG_CRN_SHIFT (10)
 #define HSR_SYSREG_CRM_MASK (0x0000001e)
 #define HSR_SYSREG_CRM_SHIFT (1)
diff --git a/xen/include/asm-arm/sysregs.h b/xen/include/asm-arm/sysregs.h
index 48ad07e..0cee0e9 100644
--- a/xen/include/asm-arm/sysregs.h
+++ b/xen/include/asm-arm/sysregs.h
@@ -40,8 +40,23 @@
     ((__HSR_SYSREG_##crm) << HSR_SYSREG_CRM_SHIFT) | \
     ((__HSR_SYSREG_##op2) << HSR_SYSREG_OP2_SHIFT)
 
-#define CNTP_CTL_EL0  HSR_SYSREG(3,3,c14,c2,1)
-#define CNTP_TVAL_EL0 HSR_SYSREG(3,3,c14,c2,0)
+#define HSR_SYSREG_SCTLR_EL1      HSR_SYSREG(3,0,c1, c0,0)
+#define HSR_SYSREG_TTBR0_EL1      HSR_SYSREG(3,0,c2, c0,0)
+#define HSR_SYSREG_TTBR1_EL1      HSR_SYSREG(3,0,c2, c0,1)
+#define HSR_SYSREG_TCR_EL1        HSR_SYSREG(3,0,c2, c0,2)
+#define HSR_SYSREG_AFSR0_EL1      HSR_SYSREG(3,0,c5, c1,0)
+#define HSR_SYSREG_AFSR1_EL1      HSR_SYSREG(3,0,c5, c1,1)
+#define HSR_SYSREG_ESR_EL1        HSR_SYSREG(3,0,c5, c2,0)
+#define HSR_SYSREG_FAR_EL1        HSR_SYSREG(3,0,c6, c0,0)
+#define HSR_SYSREG_MAIR_EL1       HSR_SYSREG(3,0,c10,c2,0)
+#define HSR_SYSREG_AMAIR_EL1      HSR_SYSREG(3,0,c10,c3,0)
+#define HSR_SYSREG_CONTEXTIDR_EL1 HSR_SYSREG(3,0,c13,c0,1)
+
+#define HSR_SYSREG_CNTPCT_EL0     HSR_SYSREG(3,3,c14,c0,0)
+#define HSR_SYSREG_CNTP_CTL_EL0   HSR_SYSREG(3,3,c14,c2,1)
+#define HSR_SYSREG_CNTP_TVAL_EL0  HSR_SYSREG(3,3,c14,c2,0)
+
+
 #endif
 
 #endif
-- 
1.7.10.4

