Re: [Xen-devel] [RFC PATCH 4/4] arm/mem_access: Add software guest-page-table walk
Hi Julien,
On 05/02/2017 05:17 PM, Julien Grall wrote:
> Hi Sergej,
>
> On 30/04/17 20:48, Sergej Proskurin wrote:
>> The function p2m_mem_access_check_and_get_page in mem_access.c
>> translates a gva to an ipa by means of the hardware functionality
>> implemented in the function gva_to_ipa. If mem_access is active,
>> hardware-based gva to ipa translation might fail, as gva_to_ipa uses the
>> guest's translation tables, access to which might be restricted by the
>> active VTTBR. To address this issue, we perform the gva to ipa
>> translation in software.
>>
>> Signed-off-by: Sergej Proskurin <proskurin@xxxxxxxxxxxxx>
>> ---
>> Cc: Razvan Cojocaru <rcojocaru@xxxxxxxxxxxxxxx>
>> Cc: Tamas K Lengyel <tamas@xxxxxxxxxxxxx>
>> Cc: Stefano Stabellini <sstabellini@xxxxxxxxxx>
>> Cc: Julien Grall <julien.grall@xxxxxxx>
>> ---
>> xen/arch/arm/mem_access.c | 140 +++++++++++++++++++++++++++++++++++++++++++++-
>> 1 file changed, 139 insertions(+), 1 deletion(-)
>>
>> diff --git a/xen/arch/arm/mem_access.c b/xen/arch/arm/mem_access.c
>> index 04b1506b00..352eb6073f 100644
>> --- a/xen/arch/arm/mem_access.c
>> +++ b/xen/arch/arm/mem_access.c
>> @@ -20,6 +20,7 @@
>> #include <xen/monitor.h>
>> #include <xen/sched.h>
>> #include <xen/vm_event.h>
>> +#include <xen/domain_page.h>
>> #include <public/vm_event.h>
>> #include <asm/event.h>
>>
>> @@ -90,6 +91,129 @@ static int __p2m_get_mem_access(struct domain *d, gfn_t gfn,
>> return 0;
>> }
>>
>> +static int
>> +p2m_gva_to_ipa(struct p2m_domain *p2m, vaddr_t gva,
>> + paddr_t *ipa, unsigned int flags)
>
> I don't think this function belongs to mem_access.c. It would be
> better in p2m.c. Also, the name is a bit confusing, a user would not
> know the difference between p2m_gva_to_ipa and gva_to_ipa.
Fair enough, thank you.
>
>> +{
>> + int level=0, t0_sz, t1_sz;
>
> Coding style level = 0.
>
> Also please use unsigned int for level.
>
> t0_sz and t1_sz should be stay signed.
>
>> + unsigned long t0_max, t1_min;
>> + lpae_t pte, *table;
>> + mfn_t root_mfn;
>> + uint64_t ttbr;
>> + uint32_t sctlr = READ_SYSREG(SCTLR_EL1);
>> + register_t ttbcr = READ_SYSREG(TCR_EL1);
>
> So you are assuming that the current vCPU is running, right? If so,
> this should be clarified in the commit message and in a comment above
> the function.
I will state that in a comment, thank you.
>
> Also s/ttbcr/tcr/
>
>> + struct domain *d = p2m->domain;
>> +
>> + const unsigned int offsets[4] = {
>> +#ifdef CONFIG_ARM_64
>> + zeroeth_table_offset(gva),
>> +#endif
>> + first_table_offset(gva),
>> + second_table_offset(gva),
>> + third_table_offset(gva)
>> + };
>
> Offsets are based on the granularity of Xen (currently 4KB). However
> the guests can support 4KB, 16KB, 64KB. Please handle everything
> correctly.
True, ARM is quite flexible. Yet, I have not seen an OS implementation
supported by Xen that uses page sizes other than 4KB and their supersets
(2/4MB, 1GB) (please let me know if you know of one), which is why I
doubt we need this. Why do you think we need that kind of flexibility in
software?
If you nevertheless insist on that functionality, I will include it in
the next patch and try not to blow up the code too much.
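To illustrate what I have in mind for the next version, here is a rough,
standalone sketch of how the per-level table offsets could be derived from
the guest's translation granule (which would have to be read from
TCR_EL1.TG0/TG1) instead of relying on Xen's fixed 4KB layout. The helper
name and the test values are purely illustrative, not Xen code:

#include <stdint.h>
#include <stdio.h>

/*
 * Illustrative only: derive the table offset for a given VA, level and
 * translation granule (4KB, 16KB or 64KB). In the real walk the grain
 * shift would come from TCR_EL1; here it is a parameter.
 */
static unsigned int table_offset(uint64_t va, unsigned int level,
                                 unsigned int grain_shift)
{
    unsigned int bits_per_level = grain_shift - 3;   /* 8-byte entries */
    unsigned int shift = grain_shift + (3 - level) * bits_per_level;

    return (va >> shift) & ((1U << bits_per_level) - 1);
}

int main(void)
{
    uint64_t va = 0xffff000012345678ULL;
    unsigned int level;

    /* 4KB granule: 9 bits per level, like Xen's current *_table_offset(). */
    for ( level = 0; level < 4; level++ )
        printf("4KB  L%u offset: %u\n", level, table_offset(va, level, 12));

    /* 64KB granule: 13 bits per level, levels 1-3 only. */
    for ( level = 1; level < 4; level++ )
        printf("64KB L%u offset: %u\n", level, table_offset(va, level, 16));

    return 0;
}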
>
>> +
>> + const paddr_t masks[4] = {
>> +#ifdef CONFIG_ARM_64
>> + ZEROETH_SIZE - 1,
>> +#endif
>> + FIRST_SIZE - 1,
>> + SECOND_SIZE - 1,
>> + THIRD_SIZE - 1
>> + };
>> +
>> + /* If the MMU is disabled, there is no need to translate the gva. */
>> + if ( !(sctlr & SCTLR_M) )
>> + {
>> + *ipa = gva;
>> +
>> + return 0;
>> + }
>> +
>> + if ( is_32bit_domain(d) )
>> + {
>> + /*
>> + * XXX: We do not support 32-bit domain translation table walks for
>> + * domains using the short-descriptor translation table format, yet.
>> + */
>
> Debian ARM 32-bit is using the short-descriptor translation table. So are
> you suggesting that mem_access will not work correctly with a Debian guest?
>
Yes, as stated in the comment, the current implementation does not
support the short-descriptor translation table format. As this is an RFC
patch, I wanted to see your opinion on that functionality in general
before implementing all corner cases of the ARM architecture.
As mentioned in my previous reply to patch 2/4, I would prefer to
separate the long-descriptor from the short-descriptor translation in
the future implementation.
>> + if ( !(ttbcr & TTBCR_EAE) )
>
> See my comment on patch #2 about the naming convention.
>
Done, thanks.
>> + return -EFAULT;
>> +
>> +#ifdef CONFIG_ARM_64
>> + level = 1;
>
> This sounds wrong to me. The first lookup level is determined by the
> value of T0SZ and T1SZ (see B3-1352 in ARM DDI 0406C.c).
>
> For the AArch64 version see D4.2.5 in ARM DDI 0487A.k_iss10775.
>
Thank you for the hint.
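Just to check my understanding of D4.2.5: for a 4KB granule, the level at
which the walk starts would be derived from the input-address size
(64 - TnSZ) rather than hard-coded. A small standalone sketch of that
computation (my own illustration, not the ARM ARM pseudo-code):

#include <stdio.h>

/*
 * Illustrative only (AArch64, 4KB granule): the initial lookup level
 * depends on the configured input-address size, not on a fixed constant.
 * Rounded up, since a partially filled top level still needs its own
 * lookup.
 */
static unsigned int start_level(unsigned int tnsz)
{
    unsigned int ia_bits = 64 - tnsz;
    unsigned int bits_per_level = 9;      /* 4KB granule: 512 entries */
    unsigned int grain_shift = 12;
    unsigned int levels = (ia_bits - grain_shift + bits_per_level - 1) /
                          bits_per_level;

    return 4 - levels;
}

int main(void)
{
    printf("T0SZ=16 -> level %u\n", start_level(16)); /* 48-bit IA: level 0 */
    printf("T0SZ=25 -> level %u\n", start_level(25)); /* 39-bit IA: level 1 */
    printf("T0SZ=34 -> level %u\n", start_level(34)); /* 30-bit IA: level 2 */
    return 0;
}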
>> +#endif
>> + }
>> +
>> +#ifdef CONFIG_ARM_64
>> + if ( is_64bit_domain(d) )
>> + {
>> + /* Get the max GVA that can be translated by TTBR0. */
>> + t0_sz = (ttbcr >> TCR_T0SZ_SHIFT) & TCR_SZ_MASK;
>> + t0_max = (1UL << (64 - t0_sz)) - 1;
>> +
>> + /* Get the min GVA that can be translated by TTBR1. */
>> + t1_sz = (ttbcr >> TCR_T1SZ_SHIFT) & TCR_SZ_MASK;
>> + t1_min = ~0UL - (1UL << (64 - t1_sz)) + 1;
>> + }
>> + else
>> +#endif
>> + {
>> + /* Get the max GVA that can be translated by TTBR0. */
>> + t0_sz = (ttbcr >> TCR_T0SZ_SHIFT) & TTBCR_SZ_MASK;
>
> See my comment on patch #2 for the naming convention.
>
>> + t0_max = (1U << (32 - t0_sz)) - 1;
>> +
>> + /* Get the min GVA that can be translated by TTBR1. */
>> + t1_sz = (ttbcr >> TCR_T1SZ_SHIFT) & TTBCR_SZ_MASK;
>> + t1_min = ~0U - (1U << (32 - t1_sz)) + 1;
>
> 1U << (32 - t1_sz) will not fit in an unsigned long if t1_sz = 0.
Using an unsigned long long should fix that in the 32-bit case (though
the analogous shift in the 64-bit path can still overflow when t1_sz is
0), thanks.
>
> Also, this code looks wrong to me. Looking at B3.6.4 in DDI 0406C.c,
> you don't handle properly t0_max and t1_min when T0SZ or T1SZ is 0.
>
> I think it would be worth for you to have a look to the pseudo-code in
> the ARM ARM for TranslationTableWalk (see B3.19.3 in ARM DDI 0406C.c
> and J1-5295 in ARM DDI 0487A.k_iss10775) which gives you all the
> details for doing properly translation table walk.
I will have a deeper look and adopt parts of the given pseudocode in my
implementation. You are right: there might be an off-by-one issue in the
computation above.
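For the 32-bit long-descriptor case, I am currently thinking of computing
the bounds with 64-bit intermediates along the following lines, which also
avoids the undefined shift when TnSZ is 0. This is only a sketch against
my reading of B3.6.4 (in particular the T1SZ == 0 special case), so please
double-check it against the spec:

#include <stdint.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * Illustrative only: select the TTBR for an AArch32 guest using the
 * long-descriptor format. 64-bit intermediates avoid the undefined
 * 1U << 32 when TnSZ is 0.
 */
static bool select_ttbr(uint32_t va, unsigned int t0_sz, unsigned int t1_sz,
                        bool *use_ttbr1)
{
    uint64_t t0_end = 1ULL << (32 - t0_sz);        /* exclusive bound */
    uint64_t t1_start;

    if ( t1_sz == 0 )
        /*
         * My reading of B3.6.4: with T1SZ == 0 the TTBR1 region starts
         * right above the TTBR0 region (and does not exist at all if
         * T0SZ is also 0, since TTBR0 then covers everything).
         */
        t1_start = t0_end;
    else
        t1_start = (1ULL << 32) - (1ULL << (32 - t1_sz));

    if ( va < t0_end )
        *use_ttbr1 = false;
    else if ( va >= t1_start )
        *use_ttbr1 = true;
    else
        return false;                              /* translation fault */

    return true;
}

int main(void)
{
    bool ttbr1;

    /* T0SZ = T1SZ = 0: every VA is translated through TTBR0. */
    printf("%d\n", select_ttbr(0xffffffff, 0, 0, &ttbr1) && !ttbr1);

    /* T0SZ = 1, T1SZ = 1: 0x80000000 falls in the TTBR1 region. */
    printf("%d\n", select_ttbr(0x80000000, 1, 1, &ttbr1) && ttbr1);

    return 0;
}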
>
>> + }
>> +
>> + if ( t0_max >= gva )
>> + /* Use TTBR0 for GVA to IPA translation. */
>> + ttbr = READ_SYSREG64(TTBR0_EL1);
>> + else if ( t1_min <= gva )
>> + /* Use TTBR1 for GVA to IPA translation. */
>> + ttbr = READ_SYSREG64(TTBR1_EL1);
>> + else
>> + /* GVA out of bounds of TTBR(0|1). */
>> + return -EFAULT;
>> +
>> + /* Bits [63..48] might be used by an ASID. */
>> + root_mfn = p2m_lookup(d, _gfn(paddr_to_pfn(ttbr & ((1ULL<<48)-1))), NULL);
>
> Please don't hardcode the mask.
>
>> +
>> + /* Check, whether TTBR holds a valid address. */
>> + if ( mfn_eq(root_mfn, INVALID_MFN) )
>> + return -EFAULT;
>> +
>> + table = map_domain_page(root_mfn);
>
> Nothing prevents the guest from giving back the page table whilst you
> are browsing it. You may want to have a look at the patch "ARM: introduce
> vgic_access_guest_memory()" [1] to generalize that function and avoid
> re-inventing the wheel.
>
I will have a look at this commit, thanks.
>> +
>> + for ( ; ; level++ )
>> + {
>> + pte = table[offsets[level]];
>
> You likely want to use ACCESS_ONCE here to prevent the compiler from
> reading the pte twice from memory.
I tried that, but the compiler did not seem to like it. I will try
again and let you know if I get it to work.
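For reference, my understanding is that ACCESS_ONCE refuses non-scalar
types such as the lpae_t union, which is probably what the compiler
complained about. Reading the raw 64-bit member once and assigning the
result should work. A standalone illustration, with a toy ACCESS_ONCE and
pte union standing in for the Xen definitions (not copied verbatim):

#include <stdint.h>
#include <stdio.h>

/* Stand-in for Xen's ACCESS_ONCE(): force a single volatile load. */
#define ACCESS_ONCE(x) (*(volatile typeof(x) *)&(x))

/* Toy stand-in for lpae_t: a raw 64-bit view plus a bit-field view. */
typedef union {
    uint64_t bits;
    struct {
        uint64_t valid:1;
        uint64_t table:1;
        uint64_t pad:10;
        uint64_t base:36;
        uint64_t rest:16;
    } walk;
} pte_t;

int main(void)
{
    pte_t table[512] = { { 0 } };
    pte_t pte;

    table[5].bits = 0x0000000040000003ULL;

    /*
     * ACCESS_ONCE(table[5]) would not compile (non-scalar type); reading
     * the raw 64-bit member once and assigning it works and still
     * guarantees the entry is loaded from memory exactly once.
     */
    pte.bits = ACCESS_ONCE(table[5].bits);

    printf("valid=%u table=%u base=0x%llx\n",
           (unsigned int)pte.walk.valid, (unsigned int)pte.walk.table,
           (unsigned long long)pte.walk.base);

    return 0;
}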
>
>> +
>> + if ( level == 3 || !pte.walk.valid || !pte.walk.table )
>> + break;
>> +
>> + unmap_domain_page(table);
>> +
>> + root_mfn = p2m_lookup(d, _gfn(pte.walk.base), NULL);
>
> No check on root_mfn?
Done, thanks.
>
>> + table = map_domain_page(root_mfn);
>> + }
>> +
>> + unmap_domain_page(table);
>> +
>> + if ( !pte.walk.valid )
>> + return -EFAULT;
>> +
>> + /* Make sure the entry holds the requested access attributes. */
>> + if ( ((flags & GV2M_WRITE) == GV2M_WRITE) && pte.pt.ro )
>> + return -EFAULT;
>
> IMHO, this function should return the access attributes of the page and
> let the caller decide what to do. This would make this function more
> generic.
>
Makes sense, this would indeed make the function more generic. Since
there are already multiple guest-page-table-walk stubs throughout the
codebase, we could consolidate them into a single one.
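Something along the following lines is what I have in mind: the walk
returns the translated address together with the permissions, and
p2m_mem_access_check_and_get_page (or any other caller) decides whether
the requested access is allowed. The structure and flag names below are
invented for illustration only:

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical result of a generic guest-page-table walk (names invented). */
struct gpt_walk_result {
    uint64_t ipa;        /* translated address */
    bool readable;
    bool writable;
    bool executable;
};

#define GV2M_WRITE  (1u << 0)   /* stand-in for Xen's flag */

/* Caller-side check: the walk reports permissions, the caller decides. */
static bool access_permitted(const struct gpt_walk_result *res,
                             unsigned int flags)
{
    if ( (flags & GV2M_WRITE) && !res->writable )
        return false;

    return res->readable;
}

int main(void)
{
    struct gpt_walk_result res = { .ipa = 0x40001000, .readable = true,
                                   .writable = false };

    printf("write access allowed: %d\n", access_permitted(&res, GV2M_WRITE));
    printf("read access allowed: %d\n", access_permitted(&res, 0));

    return 0;
}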
>> +
>> + *ipa = pfn_to_paddr(pte.walk.base) | (gva & masks[level]);
>> +
>> + return 0;
>> +}
>> +
>> +
>> /*
>> * If mem_access is in use it might have been the reason why
>> get_page_from_gva
>> * failed to fetch the page, as it uses the MMU for the permission
>> checking.
>> @@ -109,9 +233,23 @@ p2m_mem_access_check_and_get_page(vaddr_t gva, unsigned long flag,
>> struct page_info *page = NULL;
>> struct p2m_domain *p2m = &v->domain->arch.p2m;
>>
>> + ASSERT(p2m->mem_access_enabled);
>> +
>
> Why has this ASSERT been added?
The function p2m_mem_access_check_and_get_page is called only if
get_page_from_gva fails and mem_access is active. I can add a comment
and move this part into an individual commit.
>
>> rc = gva_to_ipa(gva, &ipa, flag);
>> +
>> + /*
>> + * In case mem_access is active, hardware-based gva_to_ipa translation
>> + * might fail. Since gva_to_ipa uses the guest's translation tables, access
>> + * to which might be restricted by the active VTTBR, we perform a gva to
>> + * ipa translation in software.
>> + */
>> if ( rc < 0 )
>> - goto err;
>> + if ( p2m_gva_to_ipa(p2m, gva, &ipa, flag) < 0 )
>> + /*
>> + * The software gva to ipa translation can still fail, if the
>> + * gva is not mapped or does not hold the requested access rights.
>> + */
>> + goto err;
>
> Rather than falling back, why don't we do a software page table walk
> every time?
A software page table walk would (presumably) take more time to
translate the gva. Do we want that? Also, I am not sure what you meant
by "falling back" at this point. Thank you.
>
>>
>> gfn = _gfn(paddr_to_pfn(ipa));
>>
>>
>
> Cheers,
>
> [1] https://patchwork.kernel.org/patch/9676291/
>
Cheers,
~Sergej