
Re: [Xen-devel] [PATCH v4 0/7] unsafe big.LITTLE support



On Thu, Mar 08, 2018 at 03:13:50PM +0000, Julien Grall wrote:
>Hi,
>
>On 08/03/18 12:43, Peng Fan wrote:
>>>>>>I am not sure whether this issue causes DomU big/LITTLE to not work.
>>>>>
>>>>>Well, I would recommend speaking with NXP about whether this erratum
>>>>>affects TLB flushes for the hypervisor page-tables or the stage-2 page-tables.
>>>>
>>>>I tried the following, but it did not help. I am not sure my patch is
>>>>correct. I think it affects the stage-2 TLB.
>>>>
>>>>--- a/xen/include/asm-arm/arm64/flushtlb.h
>>>>+++ b/xen/include/asm-arm/arm64/flushtlb.h
>>>>@@ -6,7 +6,7 @@ static inline void flush_tlb_local(void)
>>>>   {
>>>>       asm volatile(
>>>>           "dsb sy;"
>>>>-        "tlbi vmalls12e1;"
>>>>+        "tlbi alle1;"
>>>>           "dsb sy;"
>>>>           "isb;"
>>>>           : : : "memory");
>>>>@@ -17,7 +17,7 @@ static inline void flush_tlb(void)
>>>>   {
>>>>       asm volatile(
>>>>           "dsb sy;"
>>>>-        "tlbi vmalls12e1is;"
>>>>+        "tlbi alle1;"
>>>
>>>I am not sure why you dropped the inner-shareable variant here?
>>I just wanted to invalidate all the TLB entries; the inner-shareable variant
>>could be kept. This is not a formal patch, just my attempt to narrow down the issue.
>
>alle1 will only flush the TLBs of the local processor. The flush will
>not get propagated to the other CPUs of the system. So you definitely
>want this to be inner-shareable, to avoid the other processors holding
>stale TLB entries.
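
Understood. For the record, the broadcast version of my debugging hack
would have looked like this (a sketch only; the function name is made up
and this is not a proposed patch):

    /* Sketch: invalidate all stage-1/stage-2 TLB entries on every CPU
     * in the inner-shareable domain, not just on the local core. */
    static inline void flush_tlb_all_is_debug(void)
    {
        asm volatile(
            "dsb sy;"
            "tlbi alle1is;"   /* the "is" suffix broadcasts over DVM */
            "dsb sy;"
            "isb;"
            : : : "memory");
    }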
>
>>>
>>>>           "dsb sy;"
>>>>           "isb;"
>>>>           : : : "memory");
>>>>@@ -39,7 +39,7 @@ static inline void flush_tlb_all(void)
>>>>   {
>>>>       asm volatile(
>>>>           "dsb sy;"
>>>>-        "tlbi alle1is;"
>>>>+        "tlbi alle1;"
>>>
>>>Ditto.
>>>
>>>>           "dsb sy;"
>>>>           "isb;"
>>>>           : : : "memory");
>>>>--- a/xen/include/asm-arm/arm64/page.h
>>>>+++ b/xen/include/asm-arm/arm64/page.h
>>>>@@ -74,14 +74,16 @@ static inline void flush_xen_data_tlb_local(void)
>>>>   /* Flush TLB of local processor for address va. */
>>>>   static inline void  __flush_xen_data_tlb_one_local(vaddr_t va)
>>>>   {
>>>>-    asm volatile("tlbi vae2, %0;" : : "r" (va>>PAGE_SHIFT) : "memory");
>>>>+       flush_xen_data_tlb_local();
>>>>+    //asm volatile("tlbi vae2, %0;" : : "r" (va>>PAGE_SHIFT) : "memory");
>>>>   }
>>>>
>>>>   /* Flush TLB of all processors in the inner-shareable domain for
>>>>    * address va. */
>>>>   static inline void __flush_xen_data_tlb_one(vaddr_t va)
>>>>   {
>>>>-    asm volatile("tlbi vae2is, %0;" : : "r" (va>>PAGE_SHIFT) : "memory");
>>>>+       flush_xen_data_tlb_local();
>>>
>>>Why do you replace an inner-shareable call with a local call? Is it part of
>>>the errata?
>>
>>No, just my attempt to narrow things down.
>
>Then you should keep the inner-shareable variant. See above.
>
>>>
>>>>+    //asm volatile("tlbi vae2is, %0;" : : "r" (va>>PAGE_SHIFT) : "memory");
>>>>   }
>>>>
>>>>>
>>>>>>So I wonder, has this patchset been tested on big/LITTLE hardware?
>>>>>
>>>>>This series only adds the facility to report the correct MIDR to the guest.
>>>>>If your platform requires more, then it would be necessary to send a patch
>>>>>for Xen.
>>>>
>>>>Do you have any suggestions? Besides MIDR/ACTLR/cache-line, is there
>>>>more needed?
>>>
>>>Having a bit more detail from your side would be helpful. At the moment,
>>>I have no clue what's going on.
>>
>>Quoting the Linux kernel commit:
>>     On i.MX8QM TO1.0, there is an issue: the bus width between A53-CCI-A72
>>     is limited to 36 bits. For TLB maintenance through DVM messages over
>>     the AR channel, some bits will be forced (truncated) to zero as
>>     follows:
>>
>>     ASID[15:12] is forced to 0
>>     VA[48:45] is forced to 0
>>     VA[44:41] is forced to 0
>>     VA[39:36] is forced to 0
>>
>>     This issue will result in the TLB maintenance across the clusters not
>>     working as expected, because some VA and ASID bits get truncated and
>>     forced to zero.
>>
>>     The SW workaround is: use vmalle1is if the VA is larger than 36 bits
>>     or ASID[15:12] is not zero; otherwise, use the original TLB
>>     maintenance path.
>>
>>When doing TLB maintenance through DVM from the A53 to the A72, some bits
>>are forced to 0; this means the TLB may not really be invalidated from the
>>A72's perspective.
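
For clarity, the condition the workaround checks (as described in the
commit message above) boils down to something like the sketch below; the
helper name is made up for illustration and is not the actual kernel code:

    #include <stdbool.h>
    #include <stdint.h>

    /* Illustrative only: would a per-VA/per-ASID TLB operation lose
     * bits over the 36-bit A53-CCI-A72 link?  If yes, the workaround
     * falls back to a full "tlbi vmalle1is" instead. */
    static inline bool imx8qm_tlbi_would_truncate(uint64_t va, uint16_t asid)
    {
        return (va >> 36) != 0 || (asid & 0xf000U) != 0;
    }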
>>
>>Currently I am trying a DomU with big/LITTLE capability, but not allowing
>>big/LITTLE vCPU migration.
>>
>>I am not sure whether this hardware issue impacts DomU or not, or whether
>>it is a software issue. As you can see, Dom0 has 6 vCPUs; I did a stress
>>test and found no issue on Dom0.
>
>There is a major difference between Dom0 and DomU in your setup.
>Each Dom0 vCPU is pinned to a specific pCPU, so they can't move around.
>For DomU, each vCPU is pinned to a set of pCPUs, so they can move
>around.
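
Right. Roughly, the difference looks like this (illustrative values, not
my exact setup):

    # Dom0: vCPUs pinned 1:1, e.g. via "dom0_vcpus_pin" on the Xen
    # command line, so a vCPU never crosses clusters.
    #
    # DomU cfg: vCPUs are only restricted to a set of pCPUs, so they
    # can move within that set, including across clusters:
    vcpus = 4
    cpus = "2-5"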
>
>But did you check that the DomU has the workaround enabled? I am asking
>because it looks to me like the way to detect the workaround is based
>on a device (the SCU) and not on the processor. So I am not convinced
>that the DomU is actually using your workaround.

Just checked this. The Xen toolstack creates the device tree with
'compatible = "xen,xenvm-4.10", "xen,xenvm";', but the Linux code uses
"fsl,imx8qm" to detect the SoC and then calls the SCU to get the chip
revision.

After adding an entry on the Linux side, '{ .compatible = "xen,xenvm",
.data = &imx8qm_soc_data, },', it seems to work. It passed a map/unmap
stress test that easily fails without the TLB workaround.
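
Concretely, the hack is just adding the Xen machine compatible to the SoC
match table; a sketch below (the table and data names are my assumption
of what the NXP tree uses):

    #include <linux/mod_devicetable.h>   /* struct of_device_id */

    /* Testing hack only, in the i.MX SoC-id driver (names assumed): */
    static const struct of_device_id imx8_soc_match[] = {
        { .compatible = "fsl,imx8qm", .data = &imx8qm_soc_data, },
        /* Also match the Xen-generated machine compatible so a DomU
         * picks up the i.MX8QM erratum workaround. */
        { .compatible = "xen,xenvm",  .data = &imx8qm_soc_data, },
        { /* sentinel */ },
    };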

I wonder, is it OK to specify a machine compatible in domu.cfg and have
the Xen toolstack use that machine compatible instead of "xen,xenvm"?
Would this be acceptable to the community?
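
I mean something like the following in the guest cfg (a hypothetical key
name; no such option exists today):

    # Hypothetical domu.cfg option: override the top-level machine
    # compatible the toolstack writes into the guest device tree.
    machine_compatible = "fsl,imx8qm"
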
Also, during DomU kernel boot, there is a warning:
[    0.201323] Invalid sched_group_energy for CPU3
[    0.201341] Invalid sched_group_energy for Cluster3
[    0.201353] Invalid sched_group_energy for CPU2
[    0.201365] Invalid sched_group_energy for Cluster2
[    0.201376] Invalid sched_group_energy for CPU1
[    0.201387] Invalid sched_group_energy for Cluster1
[    0.201398] Invalid sched_group_energy for CPU0
[    0.201409] Invalid sched_group_energy for Cluster0

This is because cpu0/1/2/3 are not under cluster nodes in the DTS. As I
am using a big/LITTLE guest, I think I need to create two cluster nodes,
one for vcpu0-1 and the other for vcpu2-3 (sketch below). But this also
needs a Xen toolstack change (:
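
Something like the following cpu-map is what I have in mind (a sketch
only; the exact node and phandle names depend on what the toolstack
generates):

    cpus {
        cpu-map {
            cluster0 {                /* little cluster: vcpu0-1 */
                core0 { cpu = <&cpu0>; };
                core1 { cpu = <&cpu1>; };
            };
            cluster1 {                /* big cluster: vcpu2-3 */
                core0 { cpu = <&cpu2>; };
                core1 { cpu = <&cpu3>; };
            };
        };
    };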

Thanks,
Peng.

>
>Cheers,
>
>-- 
>Julien Grall

-- 

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel