
Re: [Xen-devel] [ARM] gvirt_to_maddr fails when DomU is created





On 11/28/18 6:10 PM, Volodymyr Babchuk wrote:
Hi Julien,

Hi Volodymyr,

On Tue, 27 Nov 2018 at 21:40, Julien Grall <julien.grall@xxxxxxx> wrote:
After creating domU, I'm seeing lots of these messages from the hypervisor:

(XEN) p2m.c:1442: d1v0: gvirt_to_maddr failed va=0xffff80000efc7f0f flags=0x1 par=0x809
(XEN) p2m.c:1442: d1v0: gvirt_to_maddr failed va=0xffff80000efc7f00 flags=0x1 par=0x809
(XEN) p2m.c:1442: d1v0: gvirt_to_maddr failed va=0xffff80000efc7f0f flags=0x1 par=0x809

Interestingly, I'm getting them from both Dom0 and DomU:

(XEN) p2m.c:1442: d0v0: gvirt_to_maddr failed va=0xffff80003efd7f0f flags=0x1 par=0x809
(XEN) p2m.c:1442: d1v0: gvirt_to_maddr failed va=0xffff80000efc7f0f flags=0x1 par=0x809

But only after DomU is created.

I attached GDB and found that this is caused by update_runstate_area:

(gdb) bt
#0  get_page_from_gva (v=0x80005dbe2000, v@entry=0x22f2c8 <schedule+1236>,
      va=va@entry=18446603337277996815, flags=flags@entry=1) at p2m.c:1440
#1  0x000000000024e320 in translate_get_page (write=true, linear=true,
      addr=18446603337277996815, info=...) at guestcopy.c:37
#2  copy_guest (buf=buf@entry=0x80005dbe20d7, addr=addr@entry=18446603337277996815,
      len=len@entry=1, info=..., flags=flags@entry=6) at guestcopy.c:69
#3  0x000000000024e45c in raw_copy_to_guest (to=to@entry=0xffff80003efd7f0f,
      from=from@entry=0x80005dbe20d7, len=len@entry=1) at guestcopy.c:110
#4  0x00000000002497b4 in update_runstate_area (v=v@entry=0x80005dbe2000) at domain.c:287
#5  0x0000000000249eb8 in context_switch (prev=prev@entry=0x80005dbe2000,
      next=next@entry=0x80005bf3c000) at domain.c:344
#6  0x000000000022f2c8 in schedule () at schedule.c:1583
#7  0x0000000000232c10 in __do_softirq (ignore_mask=ignore_mask@entry=0) at softirq.c:50
#8  0x0000000000232ca4 in do_softirq () at softirq.c:64
#9  0x0000000000258254 in leave_hypervisor_tail () at traps.c:2302

This issue is encountered on QEMU-ARMv8. The Dom0 kernel is Linux 4.19.0.
My Xen master is at d8ffac1f7 "xen/arm: gic: Remove duplicated comment
in do_sgi".

The same setup worked perfectly with Xen 4.10.2.

The message is only printed in a debug build. Do you have CONFIG_DEBUG
enabled?

Yes, I do.

update_runstate_area is using a guest virtual address to update the vCPU
runstate. It blindly assumes the vCPU runstate will always be mapped in
stage-1 page-tables. However, if KPTI (Kernel Page Table Isolation) is
enabled, the kernel address space (and therefore the vCPU runstate) will
not be mapped when running at EL0.
I tried to disable KPTI for both Dom0 and DomU kernels (with the nopti
option) and this didn't help at all.

nopti is x86-specific. So did you mean kpti=no?

I can verify that the kernel does not print "CPU features: detected:
Kernel page table isolation (KPTI)", but that's all.

So you should see something similar to:

CPU features: kernel page table isolation forced OFF by command line option

Correct?


Strangely, I only start seeing these messages after I create DomU.
If this were really triggered by KPTI, I should see those errors
right from boot, right?

Not necessarily: you need a context switch to happen while the vCPU is at EL0 to trigger the issue. That is unlikely to happen if you have fewer vCPUs running than available pCPUs. It is more likely to happen when starting your DomU.

Anyway, it is quite interesting because I also managed to reproduce it with KPTI turned off (i.e. kpti=no).

PAR_EL1 contains 0x809, which tells us this is a level 0 translation fault when walking stage-1. So the virtual address is definitely not mapped. I added some code to dump the guest vCPU registers on the fault. All the faults happen at EL0, so somehow the address is getting unmapped when running at EL0.

I have the feeling that kpti=no does not fully disable the feature. I will have a chat tomorrow with my team to see how the option should behave.

In any case, passing a virtual address is just the wrong thing to do, as the guest is free to do whatever it wants in terms of page-tables. The discussion in this thread is an example of what could go wrong :).

So we still want to fix the hypercall no matter the outcome of the discussion regarding kpti=no.

Finally, for the sake of clarification, turning off KPTI (kpti=no) is not recommended unless you really trust your userspace applications. I was interested to know whether the problem was related to the feature or to something different :).

Cheers,

--
Julien Grall

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel

 

