
Re: [Xen-devel] [PATCH v4 5/8] arm/vm_event: get/set registers





On 01/06/16 20:38, Julien Grall wrote:
Hi Tamas,

On 01/06/16 19:21, Tamas K Lengyel wrote:
On Wed, Jun 1, 2016 at 5:24 AM, Julien Grall <julien.grall@xxxxxxx>
wrote:
Hi,


On 01/06/16 09:41, Jan Beulich wrote:

On 31.05.16 at 18:28, <tamas@xxxxxxxxxxxxx> wrote:

On May 31, 2016 01:48, "Jan Beulich" <JBeulich@xxxxxxxx> wrote:


On 30.05.16 at 21:47, <tamas@xxxxxxxxxxxxx> wrote:

On Mon, May 30, 2016 at 5:50 AM, Jan Beulich <JBeulich@xxxxxxxx>
wrote:

On 30.05.16 at 00:37, <tamas@xxxxxxxxxxxxx> wrote:

+struct vm_event_regs_arm32 {
+    uint32_t r0_usr;
+    uint32_t r1_usr;
+    uint32_t r2_usr;
+    uint32_t r3_usr;
+    uint32_t r4_usr;
+    uint32_t r5_usr;
+    uint32_t r6_usr;
+    uint32_t r7_usr;
+    uint32_t r8_usr;
+    uint32_t r9_usr;
+    uint32_t r10_usr;
+    uint32_t r12_usr;
+    uint32_t lr_usr;
+    uint32_t fp;
+    uint32_t pc;
+    uint32_t sp_usr;
+    uint32_t sp_svc;
+    uint32_t spsr_svc;
+};


It would seem more natural for the "ordinary" registers to be
arranged in the numerical sequence, i.e. fp, r12, sp, lr, pc.


Not sure I follow.


For one, it is quite natural for someone looking at a sequence of
register values to assume / expect them to be in natural order.
And then, having them in natural (numeric) order allows e.g.
extracting the register-identifying bits from an instruction and
using them as an array index into (part of) this structure.

(For some background: I've been bitten a number of times by
people sorting x86 registers alphabetically instead of naturally,
i.e. EAX, EBX, ECX, EDX instead of EAX, ECX, EDX, EBX.)
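
Concretely, the kind of use I have in mind is something like this
(just a sketch, assuming a hypothetical layout where r0-r12 are
contiguous and in numerical order; the names are illustrative, not
taken from the patch):

#include <stdint.h>

/* Hypothetical ordered layout: r0-r12 contiguous, in numerical order. */
struct regs_arm32_ordered {
    uint32_t r[13];   /* r0 .. r12 */
    uint32_t sp_usr;  /* r13 */
    uint32_t lr_usr;  /* r14 */
    uint32_t pc;      /* r15 */
};

/* Read the GPR named by the 4-bit register field decoded from an
 * instruction, without a 16-way switch. */
static uint32_t read_gpr(const struct regs_arm32_ordered *regs,
                         unsigned int reg)
{
    if ( reg < 13 )
        return regs->r[reg];
    if ( reg == 13 )
        return regs->sp_usr;
    if ( reg == 14 )
        return regs->lr_usr;
    return regs->pc; /* reg == 15 */
}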


I see; however, I believe that would be a very careless use of this
struct by the consumer, as the register sizes don't even necessarily
match the architecture. For example, we only define the 64-bit x86
registers, so trying to access them as an array of 32-bit registers
wouldn't work at all. Plus, we are not providing a full set of
registers, and I would rather not imply that every element in the
"natural sequence" is present. It may be, or it may not.


Once an ABI is set in stone, and if that ABI allows for optimizations
(by consumers) like the one mentioned, I don't think this would be
careless use. The resulting code is clearly much more efficient than
e.g. a switch() statement with a case label for each and every
register. Well, yes, I already hear the "memory is cheap and hence
code size doesn't matter" argument, but as said elsewhere quite
recently, I don't buy it.


I agree with Jan here.

ARM64 has 31 general-purpose registers (x0-x30). A switch to find a
register based on its index will be quite big.

If you order the registers and provide all the general-purpose
registers (x0 - x30), you will be able to replace it with a single
line (for instance, see select_user_reg in arch/arm/traps.c).
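
Roughly, the pattern that ordering enables looks like this (a sketch
of the idea only, not the actual code in traps.c; the names are made
up):

#include <stdint.h>

/* With x0..x30 contiguous and in numerical order, the lookup
 * collapses to an array access instead of a 31-way switch. */
struct arm64_gprs {
    uint64_t x[31];   /* x0 .. x30 */
};

static uint64_t *select_reg(struct arm64_gprs *regs, unsigned int reg)
{
    /* Caller guarantees reg <= 30 (encoding 31 is SP/XZR, handled
     * separately depending on the instruction). */
    return &regs->x[reg];
}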

The issue is that the x86 vm_event regs struct right now is
32*uint64_t in size. So if we wanted to transmit all ARM64 GPRs plus
CPSR, PC and TTBR0/1, we would end up growing this structure beyond
its current size. I want to avoid that, as it affects ARM32 and x86
introspection applications as well, and it means we can hold fewer
events on the ring. I would say it is better to avoid that than to
potentially save a switch statement in ARM64 introspection
applications.
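
To make the size coupling concrete: the per-architecture register
blocks end up sharing a union in the public vm_event request, so the
largest member sets the request size for every consumer. A simplified
sketch (not the verbatim header; the *_sketch names are placeholders):

#include <stdint.h>

struct vm_event_regs_x86_sketch {
    uint64_t gprs[32];      /* stands in for the 32 uint64_t fields */
};

struct vm_event_regs_arm_sketch {
    uint64_t x[31];         /* hypothetical full ARM64 GPR set */
    uint64_t pc, ttbr0, ttbr1;
    uint32_t cpsr;
};

struct vm_event_request_sketch {
    uint32_t version;
    uint32_t reason;
    uint32_t vcpu_id;
    union {
        struct vm_event_regs_x86_sketch x86;
        struct vm_event_regs_arm_sketch arm;
    } regs;  /* the union is as large as its largest member, so growing
              * the ARM block past the x86 one grows every request */
};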

The design choice made for x86 should not impact the ARM design (and
vice-versa). There are key structures in the public interface which
differ between x86 and ARM (see arch_vcpu_info and arch_shared_info).
And this is fine because Xen is not meant to run an x86 guest on an ARM
hypervisor.

As far as I can tell, we currently support VM_EVENT_REASON_MEM_ACCESS
and VM_EVENT_REASON_GUEST_REQUEST. So technically the structure is set
in stone. However, we have an interface version that could be bumped,
so we can still decide what the sensible choice is.
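
(For reference, the knob in question is just the version constant in
the public vm_event header, along the lines of:

#define VM_EVENT_INTERFACE_VERSION 0x00000002 /* illustrative value only */

bumped whenever the request/response layout changes.)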

With your suggestion, only part of the general-purpose registers will
be present in the vm_event. I understand that the ring has a limited
size. If I counted correctly, the current size of the vm_event
structure is 304 bytes, so 15 vm_event slots are available.

If we grow the structure for ARM64, i.e. by 3 64-bit registers (PC,
TTBR0, TTBR1) and 1 32-bit register (CPSR), that means 311 bytes,
i.e. 13 vm_event slots.

Sorry, I miscalculated. It should be:
  - 332 bytes
  - 12 slots

If the vm_event subsystem is under pressure, I admit that 2 slots
could matter a lot. However, as not all the GPRs will be available in
the structure, you may have to fetch them separately, through
XEN_DOMCTL_getvcpucontext I guess (?). The impact of introspection on
the guest will then be significant.

I cannot tell how often this will be the case, but I can tell you a
compiler can do anything with register allocation. I.e. it could
decide to favour allocating registers x20-x30 (because big numbers
are nicer).

If you are still concerned about the pressure on the ring page,
Razvan suggested supporting multiple ring pages.

Regards,


--
Julien Grall

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel

 

