[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH v2 01/13] xsplice: Design document (v5).



>>> On 14.01.16 at 22:46, <konrad.wilk@xxxxxxxxxx> wrote:
> +## Patching code
> +
> +The first mechanism to patch that comes in mind is in-place replacement.
> +That is replace the affected code with new code. Unfortunately the x86
> +ISA is variable size which places limits on how much space we have available
> +to replace the instructions. That is not a problem if the change is smaller
> +than the original opcode and we can fill it with nops. Problems will
> +appear if the replacement code is longer.
> +
> +The second mechanism is by replacing the call or jump to the
> +old function with the address of the new function.
> +
> +A third mechanism is to add a jump to the new function at the
> +start of the old function. N.B. The Xen hypervisor implements the third
> +mechanism.

Are we, btw, convinced that all functions will be at least 5 bytes
long? Granted it's not very likely for a smaller function to be buggy
and needing patching, but you never know... (Ah, just found a
respective note at the very end.)

> +### Example of trampoline and in-place splicing
> +
> +As example we will assume the hypervisor does not have XSA-132 (see
> +*domctl/sysctl: don't leak hypervisor stack to toolstacks*
> +4ff3449f0e9d175ceb9551d3f2aecb59273f639d) and we would like to binary patch
> +the hypervisor with it. The original code looks as so:
> +
> +<pre>
> +   48 89 e0                  mov    %rsp,%rax  
> +   48 25 00 80 ff ff         and    $0xffffffffffff8000,%rax  
> +</pre>
> +
> +while the new patched hypervisor would be:
> +
> +<pre>
> +   48 c7 45 b8 00 00 00 00   movq   $0x0,-0x48(%rbp)  
> +   48 c7 45 c0 00 00 00 00   movq   $0x0,-0x40(%rbp)  
> +   48 c7 45 c8 00 00 00 00   movq   $0x0,-0x38(%rbp)  
> +   48 89 e0                  mov    %rsp,%rax  
> +   48 25 00 80 ff ff         and    $0xffffffffffff8000,%rax  
> +</pre>
> +
> +This is inside the arch_do_domctl. This new change adds 21 extra
> +bytes of code which alters all the offsets inside the function. To alter
> +these offsets and add the extra 21 bytes of code we might not have enough
> +space in .text to squeeze this in.
> +
> +As such we could simplify this problem by only patching the site
> +which calls arch_do_domctl:
> +
> +<pre>
> +<do_domctl>:  
> + e8 4b b1 05 00          callq  ffff82d08015fbb9 <arch_do_domctl>  
> +</pre>
> +
> +with a new address for where the new `arch_do_domctl` would be (this
> +area would be allocated dynamically).
> +
> +Astute readers will wonder what we need to do if we were to patch 
> `do_domctl`
> +- which is not called directly by hypervisor but on behalf of the guests via
> +the `compat_hypercall_table` and `hypercall_table`.
> +Patching the offset in `hypercall_table` for `do_domctl:
> +(ffff82d080103079 <do_domctl>:)
> +<pre>
> +
> + ffff82d08024d490:   79 30  
> + ffff82d08024d492:   10 80 d0 82 ff ff   
> +
> +</pre>
> +with the new address where the new `do_domctl` is possible. The other
> +place where it is used is in `hvm_hypercall64_table` which would need
> +to be patched in a similar way. This would require an in-place splicing
> +of the new virtual address of `arch_do_domctl`.
> +
> +In summary this example patched the callee of the affected function by
> + * allocating memory for the new code to live in,
> + * changing the virtual address in all the functions which called the old
> +   code (computing the new offset, patching the callq with a new callq).
> + * changing the function pointer tables with the new virtual address of
> +   the function (splicing in the new virtual address). Since this table
> +   resides in the .rodata section we would need to temporarily change the
> +   page table permissions during this part.
> +
> +
> +However it has severe drawbacks - the safety checks which have to make sure
> +the function is not on the stack - must also check every caller. For some
> +patches this could mean - if there were an sufficient large amount of
> +callers - that we would never be able to apply the update.

While this is an issue, didn't we settle on doing the patching without
any deep call stacks only? Also why would that problem not apply to
to trampoline example right below this section (after all you can't
just go and patch a multi-byte instruction without making sure no
CPU is about to execute that code).

> +As such having the payload in an ELF file is the sensible way. We would be
> +carrying the various sets of structures (and data) in the ELF sections under
> +different names and with definitions. The prefix for the ELF section name
> +would always be: *.xsplice* to match up to the names of the structures.

Note that the use of * here is confusing - do you mean them to
represent quotes (matching up with this supposedly being a prefix)
or as wild card?

> +The xSplice core code loads the payload as a standard ELF binary, relocates 
> it
> +and handles the architecture-specifc sections as needed. This process is much
> +like what the Linux kernel module loader does.
> +
> +The payload contains a section (xsplice_patch_func) with an array of 
> structures
> +describing the functions to be patched:
> +<pre>
> +struct xsplice_patch_func {  
> +    const char *name;  
> +    unsigned long new_addr;  
> +    const unsigned long old_addr;  

Stray(?, and slightly confusing) const here...

> +    uint32_t new_size;  
> +    const uint32_t long old_size;  

... and here. This one also leaves the reader guess about the
actual type meant.

Also is using "long" here really a good idea? Shouldn't we rather use
fixed width or ELF types?

> +* `old_size` and `new_size` contain the sizes of the respective functions in 
> bytes.
> +   The value **MUST** not be zero.

For old_size I can see this, but can't new_size being zero "NOP out
the entire code sequence"?

> +### XEN_SYSCTL_XSPLICE_UPLOAD (0)
> +
> +Upload a payload to the hypervisor. The payload is verified
> +against basic checks and if there are any issues the proper return code
> +will be returned. The payload is not applied at this time - that is
> +controlled by *XEN_SYSCTL_XSPLICE_ACTION*.
> +
> +The caller provides:
> +
> + * A `struct xen_xsplice_id` called `id` which has the unique id.
> + * `size` the size of the ELF payload (in bytes).
> + * `payload` the virtual address of where the ELF payload is.
> +
> +The `id` could be an UUID that stays fixed forever for a given
> +payload. It can be embedded into the ELF payload at creation time
> +and extracted by tools.
> +
> +The return value is zero if the payload was succesfully uploaded.
> +Otherwise an XEN_EXX return value is provided. Duplicate `id` are not 
> supported.

Ca you, here and further down, make it unambiguous that the error
value returned is negative (or, as with the rc structure fields below,
maybe indeed positive)?

> +### XEN_SYSCTL_XSPLICE_LIST (2)
> +
> +Retrieve an array of abbreviated status and names of payloads that are 
> loaded in the
> +hypervisor.
> +
> +The caller provides:
> +
> + * `version`. Initially (on first hypercall) *MUST* be zero.
> + * `idx` index iterator. On first call *MUST* be zero, subsequent calls 
> varies.
> + * `nr` the max number of entries to populate.
> + * `pad` - *MUST* be zero.
> + * `status` virtual address of where to write `struct xen_xsplice_status`
> +   structures. Caller *MUST* allocate up to `nr` of them.
> + * `id` - virtual address of where to write the unique id of the payload.
> +   Caller *MUST* allocate up to `nr` of them. Each *MUST* be of
> +   **XEN_XSPLICE_NAME_SIZE** size.
> + * `len` - virtual address of where to write the length of each unique id
> +   of the payload. Caller *MUST* allocate up to `nr` of them. Each *MUST* be
> +   of sizeof(uint32_t) (4 bytes).
> +
> +If the hypercall returns an positive number, it is the number (up to `nr`)
> +of the payloads returned, along with `nr` updated with the number of 
> remaining
> +payloads, `version` updated (it may be the same across hypercalls. If it
> +varies the data is stale and further calls could fail). The `status`,
> +`id`, and `len`' are updated at their designed index value (`idx`) with
> +the returned value of data.
> +
> +If the hypercall returns E2BIG the `count` is too big and should be
> +lowered.

s/count/nr/ ?

> +This operation can be preempted by the hypercall returning XEN_EAGAIN.
> +Retry.

Why is this necessary when preemption via the 'nr' field is already
possible?

Also what meaning would have zero as a return value here (not
spelled out above afaics)?

> +struct xen_sysctl_xsplice_list {  
> +    uint32_t version;                       /* IN/OUT: Initially *MUST* be 
> zero.  
> +                                               On subsequent calls reuse 
> value.  
> +                                               If varies between calls, we 
> are  
> +                                             * getting stale data. */  
> +    uint32_t idx;                           /* IN/OUT: Index into array. */  
> +    uint32_t nr;                            /* IN: How many status, id, and 
> len  
> +                                               should fill out.  
> +                                               OUT: How many payloads left. 
> */  
> +    uint32_t pad;                           /* IN: Must be zero. */  
> +    XEN_GUEST_HANDLE_64(xen_xsplice_status_t) status;  /* OUT. Must have 
> enough  
> +                                               space allocate for n of them. 
> */  
> +    XEN_GUEST_HANDLE_64(char) id;           /* OUT: Array of ids. Each 
> member  
> +                                               MUST XEN_XSPLICE_NAME_SIZE in 
> size.  
> +                                               Must have n of them. */  
> +    XEN_GUEST_HANDLE_64(uint32) len;        /* OUT: Array of lengths of ids. 
>  
> +                                               Must have n of them. */  

For all three, perhaps better refer to 'nr' instead of 'n'?

> +Note that patching functions that copy to or from guest memory requires
> +to support alternative support. This is due to SMAP (specifically *stac*
> +and *clac* operations) which is enabled on Broadwell and later architectures.

I think it should be emphasized that this is an example, and there
are other uses of alternative instructions (and likely more to come).

> +The v2 design must also have a mechanism for:
> +
> + *  An dependency mechanism for the payloads. To use that information to 
> load:
> +    - The appropiate payload. To verify that payload is built against the
> +      hypervisor. This can be done via the `build-id`
> +      or via providing an copy of the old code - so that the hypervisor can
> +       verify it against the code in memory.

I was missing this above - do you really intend to do patching without
at least one of those two safety measures?

> +## Signature checking requirements.
> +
> +The signature checking requires that the layout of the data in memory
> +**MUST** be same for signature to be verified. This means that the payload
> +data layout in ELF format **MUST** match what the hypervisor would be
> +expecting such that it can properly do signature verification.
> +
> +The signature is based on the all of the payloads continuously laid out
> +in memory. The signature is to be appended at the end of the ELF payload
> +prefixed with the string '~Module signature appended~\n', followed by
> +an signature header then followed by the signature, key identifier, and 
> signers
> +name.
> +
> +Specifically the signature header would be:
> +
> +<pre>
> +#define PKEY_ALGO_DSA       0  
> +#define PKEY_ALGO_RSA       1  
> +
> +#define PKEY_ID_PGP         0 /* OpenPGP generated key ID */  
> +#define PKEY_ID_X509        1 /* X.509 arbitrary subjectKeyIdentifier */  
> +
> +#define HASH_ALGO_MD4          0  
> +#define HASH_ALGO_MD5          1  
> +#define HASH_ALGO_SHA1         2  
> +#define HASH_ALGO_RIPE_MD_160  3  
> +#define HASH_ALGO_SHA256       4  
> +#define HASH_ALGO_SHA384       5  
> +#define HASH_ALGO_SHA512       6  
> +#define HASH_ALGO_SHA224       7  
> +#define HASH_ALGO_RIPE_MD_128  8  
> +#define HASH_ALGO_RIPE_MD_256  9  
> +#define HASH_ALGO_RIPE_MD_320 10  
> +#define HASH_ALGO_WP_256      11  
> +#define HASH_ALGO_WP_384      12  
> +#define HASH_ALGO_WP_512      13  
> +#define HASH_ALGO_TGR_128     14  
> +#define HASH_ALGO_TGR_160     15  
> +#define HASH_ALGO_TGR_192     16  
> +
> +
> +struct elf_payload_signature {  
> +     u8      algo;           /* Public-key crypto algorithm PKEY_ALGO_*. */  
> +     u8      hash;           /* Digest algorithm: HASH_ALGO_*. */  
> +     u8      id_type;        /* Key identifier type PKEY_ID*. */  
> +     u8      signer_len;     /* Length of signer's name */  
> +     u8      key_id_len;     /* Length of key identifier */  
> +     u8      __pad[3];  
> +     __be32  sig_len;        /* Length of signature data */  
> +};
> +
> +</pre>
> +(Note that this has been borrowed from Linux module signature code.).

It doesn't make clear who's supposed to do that verification. If
the hypervisor, this would seem to imply a whole lot of
cryptography code needing importing...

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.