[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH v1 01/11] xsplice: Design document (v2).

To: Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx>
From: Ross Lagerwall <ross.lagerwall@xxxxxxxxxx>
Date: Tue, 10 Nov 2015 09:55:17 +0000
Cc: Tim Deegan <tim@xxxxxxx>, Ian Jackson <ian.jackson@xxxxxxxxxxxxx>, Ian Campbell <ian.campbell@xxxxxxxxxx>, Jan Beulich <jbeulich@xxxxxxxx>, xen-devel@xxxxxxxxxxxxx
Delivery-date: Tue, 10 Nov 2015 09:56:19 +0000
List-id: Xen developer discussion <xen-devel.lists.xen.org>

On 11/03/2015 06:15 PM, Ross Lagerwall wrote:

From: Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx>

snip

+## Patching code
+
+The first mechanism to patch that comes in mind is in-place replacement.
+That is replace the affected code with new code. Unfortunately the x86
+ISA is variable size which places limits on how much space we have available
+to replace the instructions. That is not a problem if the change is smaller
+than the original opcode and we can fill it with nops. Problems will
+appear if the replacement code is longer.
+
+The second mechanism is by replacing the call or jump to the
+old function with the address of the new function.
+
+A third mechanism is to add a jump to the new function at the
+start of the old function.

Perhaps this document should be clarified to say what is actuallyimplemented in this patch series? I.e. the third mechanism is the onethat is actually implemented.

+
+### Example of trampoline and in-place splicing
+

snip

+## Addendum
+
+Implementation quirks should not be discussed in a design document.
+
+However these observations can provide aid when developing against this
+document.
+
+
+### Alternative assembler
+
+Alternative assembler is a mechanism to use different instructions depending
+on what the CPU supports. This is done by providing multiple streams of code
+that can be patched in - or if the CPU does not support it - padded with
+`nop` operations. The alternative assembler macros cause the compiler to
+expand the code to place a most generic code in place - emit a special
+ELF .section header to tag this location. During run-time the hypervisor
+can leave the areas alone or patch them with an better suited opcodes.

Note that if you patch any function that does a copy to or from guestmemory, alternatives support is _required_ on Broadwell hardware becauseof SMAP (it patches in stac and clac).


This is actually implemented in this patch series.

+
+
+### When to patch
+
+During the discussion on the design two candidates bubbled where
+the call stack for each CPU would be deterministic. This would
+minimize the chance of the patch not being applied due to safety
+checks failing.
+
+#### Rendezvous code instead of stop_machine for patching
+
+The hypervisor's time rendezvous code runs synchronously across all CPUs
+every second. Using the stop_machine to patch can stall the time rendezvous
+code and result in NMI. As such having the patching be done at the tail
+of rendezvous code should avoid this problem.
+
+However the entrance point for that code is
+do_softirq->timer_softirq_action->time_calibration
+which ends up calling on_selected_cpus on remote CPUs.
+
+The remote CPUs receive CALL_FUNCTION_VECTOR IPI and execute the
+desired function.
+
+#### Before entering the guest code.
+
+Before we call VMXResume we check whether any soft IRQs need to be executed.
+This is a good spot because all Xen stacks are effectively empty at
+that point.
+
+To randezvous all the CPUs an barrier with an maximum timeout (which
+could be adjusted), combined with forcing all other CPUs through the
+hypervisor with IPIs, can be utilized to have all the CPUs be lockstep.
+
+The approach is similar in concept to stop_machine and the time rendezvous
+but is time-bound. However the local CPU stack is much shorter and
+a lot more deterministic.


This is implemented in this patch series.

+
+### Compiling the hypervisor code
+
+Hotpatch generation often requires support for compiling the target
+with -ffunction-sections / -fdata-sections.  Changes would have to
+be done to the linker scripts to support this.

As implemented, the current tool for payload generation requires nochanges to Xen.It is also possible to create a payload by compiling a C file containingthe replacement functions that it completely separate from Xen itself.

+
+
+### Generation of xSplice ELF payloads
+
+The design of that is not discussed in this design.
+
+The author of this design envisions objdump and objcopy along
+with special GCC parameters (see above) to create .o.xsplice files
+which can be used to splice an ELF with the new payload.
+
+The ksplice or kpatching code can provide inspiration.


This is already implemented and the tool lives in a separate repo.

+
+### Exception tables and symbol tables growth
+
+We may need support for adapting or augmenting exception tables if
+patching such code.  Hotpatches may need to bring their own small
+exception tables (similar to how Linux modules support this).
+
+If supporting hotpatches that introduce additional exception-locations
+is not important, one could also change the exception table in-place
+and reorder it afterwards.

Almost every patch to a non-trivial function requires additional entriesin the exception table and/or the bug frames.

This patch series allows each payload to introduce their own exceptiontables and bug frames to support this.

+
+### Security
+
+Only the privileged domain should be allowed to do this operation.
+
+
+# v2: Not Yet Done
+
+
+## Goals
+
+The v2 design must also have a mechanism for:
+
+ *  An dependency mechanism for the payloads. To use that information to load:
+    - The appropiate payload. To verify that payload is built against the
+      hypervisor. This can be done via the `build-id` (see later sections),
+      or via providing an copy of the old code - so that the hypervisor can
+       verify it against the code in memory.
+    - To construct an appropiate order of payloads to load in case they
+      depend on each other.
+ * Be able to cope with symbol names in the ELF payload.


I don't understand what this means.

+ * Be able to patch .rodata, .bss, and .data sections.

To clarify, this patch series allows a payload to introduce new .rodata,.bss, and .data sections. So if you change a string in a function, thepayload produced will have a new function which overrides the old oneand a new .rodata section. The new function will reference the newstring in the new .rodata section.

What is not implemented in this series is _in-place_ patching orupdating of these sections. Hook functions (coming in V2) will allowcustom modification of these sections during patch apply. IMO,performing automated in-place patching of these sections (along within-place patching of text sections) is crazy :-)

+ * Further safety checks.


Such as?

+
+### Hypervisor ID (build-id)
+
+The build-id can help with:
+
+  * Prevent loading of wrong hotpatches (intended for other builds)
+
+  * Allow to identify suitable hotpatches on disk and help with runtime
+    tooling (if laid out using build ID)
+
+The build-id (aka hypervisor id) can be easily obtained by utilizing
+the ld --build-id operatin which (copied from ld):
+
+<pre>
+--build-id
+    --build-id=style
+        Request creation of ".note.gnu.build-id" ELF note section.  The 
contents of the note are unique bits identifying this
+        linked file.  style can be "uuid" to use 128 random bits, "sha1" to 
use a 160-bit SHA1 hash on the normative parts of the
+        output contents, "md5" to use a 128-bit MD5 hash on the normative parts of the 
output contents, or "0xhexstring" to use a
+        chosen bit string specified as an even number of hexadecimal digits ("-" and 
":" characters between digit pairs are
+        ignored).  If style is omitted, "sha1" is used.
+
+        The "md5" and "sha1" styles produces an identifier that is always the 
same in an identical output file, but will be
+        unique among all nonidentical output files.  It is not intended to be 
compared as a checksum for the file's contents.  A
+        linked file may be changed later by other tools, but the build ID bit 
string identifying the original linked file does
+        not change.
+
+        Passing "none" for style disables the setting from any "--build-id" 
options earlier on the command line.
+
+</pre>
+
+
+### xSplice interdependencies
+
+xSplice patches interdependencies are tricky.
+
+There are the ways this can be addressed:
+ * A single large patch that subsumes and replaces all previous ones.
+   Over the life-time of patching the hypervisor this large patch
+   grows to accumulate all the code changes.
+ * Hotpatch stack - where an mechanism exists that loads the hotpatches
+   in the same order they were built in. We would need an build-id
+   of the hypevisor to make sure the hot-patches are build against the
+   correct build.
+ * Payload containing the old code to check against that. That allows
+   the hotpatches to be loaded indepedently (if they don't overlap) - or
+   if the old code also containst previously patched code - even if they
+   overlap.
+
+The disadvantage of the first large patch is that it can grow over
+time and not provide an bisection mechanism to identify faulty patches.
+
+The hot-patch stack puts stricts requirements on the order of the patches
+being loaded and requires an hypervisor build-id to match against.
+
+The old code allows much more flexibility and an additional guard,
+but is more complex to implement.
+
+### Symbol names
+
+
+Xen as it is now, has a couple of non-unique symbol names which will
+make runtime symbol identification hard.  Sometimes, static symbols
+simply have the same name in C files, sometimes such symbols get
+included via header files, and some C files are also compiled
+multiple times and linked under different names (guest_walk.c).
+
+As such we need to modify the linker to make sure that the symbol
+table qualifies also symbols by their source file name.
+
+For the awkward situations in which C-files are compiled multiple
+times patches we would need to some modification in the Xen code.
+
+
+The convention for file-type symbols (that would allow to map many
+symbols to their compilation unit) says that only the basename (i.e.,
+without directories) is embedded.  This creates another layer of
+confusion for duplicate file names in the build tree.

This should all be fixed since Jan's symbol patches so perhaps you candrop this section?

+
+That would have to be resolved.
+
+<pre>
+> find . -name \*.c -print0 | xargs -0 -n1 basename | sort | uniq -c | sort -n 
| tail -n10
+      3 shutdown.c
+      3 sysctl.c
+      3 time.c
+      3 xenoprof.c
+      4 gdbstub.c
+      4 irq.c
+      5 domain.c
+      5 mm.c
+      5 pci.c
+      5 traps.c
+</pre>
+
+### Handle inlined __LINE__
+
+
+This problem is related to hotpatch construction
+and potentially has influence on the design of the hotpatching
+infrastructure in Xen.
+
+For example:
+
+We have file1.c with functions f1 and f2 (in that order).  f2 contains a
+BUG() (or WARN()) macro and at that point embeds the source line number
+into the generated code for f2.
+
+Now we want to hotpatch f1 and the hotpatch source-code patch adds 2
+lines to f1 and as a consequence shifts out f2 by two lines.  The newly
+constructed file1.o will now contain differences in both binary
+functions f1 (because we actually changed it with the applied patch) and
+f2 (because the contained BUG macro embeds the new line number).
+
+Without additional information, an algorithm comparing file1.o before
+and after hotpatch application will determine both functions to be
+changed and will have to include both into the binary hotpatch.
+
+Options:
+
+1. Transform source code patches for hotpatches to be line-neutral for
+   each chunk.  This can be done in almost all cases with either
+   reformatting of the source code or by introducing artificial
+   preprocessor "#line n" directives to adjust for the introduced
+   differences.
+
+   This approach is low-tech and simple.  Potentially generated
+   backtraces and existing debug information refers to the original
+   build and does not reflect hotpatching state except for actually
+   hotpatched functions but should be mostly correct.
+
+2. Ignoring the problem and living with artificially large hotpatches
+   that unnecessarily patch many functions.
+
+   This approach might lead to some very large hotpatches depending on
+   content of specific source file.  It may also trigger pulling in
+   functions into the hotpatch that cannot reasonable be hotpatched due
+   to limitations of a hotpatching framework (init-sections, parts of
+   the hotpatching framework itself, ...) and may thereby prevent us
+   from patching a specific problem.
+
+   The decision between 1. and 2. can be made on a patch--by-patch
+   basis.
+
+3. Introducing an indirection table for storing line numbers and
+   treating that specially for binary diffing. Linux may follow
+   this approach.
+
+   We might either use this indirection table for runtime use and patch
+   that with each hotpatch (similarly to exception tables) or we might
+   purely use it when building hotpatches to ignore functions that only
+   differ at exactly the location where a line-number is embedded.

For BUG(), WARN(), etc., the line number is embedded into the bug frame,not the function itself. As implemented, the patch creation tool handlesthis just fine. Because the tool works on a function-by-function basis,only functions which have actually changed will be included.

+
+Similar considerations are true to a lesser extent for __FILE__, but it
+could be argued that file renaming should be done outside of hotpatches.

The same applies for __FILE__ references in bug frames (e.g. BUG, WARN,ASSERT, etc).

+
+## Signature checking requirements.
+
+The signature checking requires that the layout of the data in memory
+**MUST** be same for signature to be verified. This means that the payload
+data layout in ELF format **MUST** match what the hypervisor would be
+expecting such that it can properly do signature verification.
+
+The signature is based on the all of the payloads continuously laid out
+in memory. The signature is to be appended at the end of the ELF payload
+prefixed with the string '~Module signature appended~\n', followed by
+an signature header then followed by the signature, key identifier, and signers
+name.
+
+Specifically the signature header would be:
+
+<pre>
+#define PKEY_ALGO_DSA       0
+#define PKEY_ALGO_RSA       1
+
+#define PKEY_ID_PGP         0 /* OpenPGP generated key ID */
+#define PKEY_ID_X509        1 /* X.509 arbitrary subjectKeyIdentifier */
+
+#define HASH_ALGO_MD4          0
+#define HASH_ALGO_MD5          1
+#define HASH_ALGO_SHA1         2
+#define HASH_ALGO_RIPE_MD_160  3
+#define HASH_ALGO_SHA256       4
+#define HASH_ALGO_SHA384       5
+#define HASH_ALGO_SHA512       6
+#define HASH_ALGO_SHA224       7
+#define HASH_ALGO_RIPE_MD_128  8
+#define HASH_ALGO_RIPE_MD_256  9
+#define HASH_ALGO_RIPE_MD_320 10
+#define HASH_ALGO_WP_256      11
+#define HASH_ALGO_WP_384      12
+#define HASH_ALGO_WP_512      13
+#define HASH_ALGO_TGR_128     14
+#define HASH_ALGO_TGR_160     15
+#define HASH_ALGO_TGR_192     16
+
+
+struct elf_payload_signature {
+       u8      algo;           /* Public-key crypto algorithm PKEY_ALGO_*. */
+       u8      hash;           /* Digest algorithm: HASH_ALGO_*. */
+       u8      id_type;        /* Key identifier type PKEY_ID*. */
+       u8      signer_len;     /* Length of signer's name */
+       u8      key_id_len;     /* Length of key identifier */
+       u8      __pad[3];
+       __be32  sig_len;        /* Length of signature data */
+};
+
+</pre>
+(Note that this has been borrowed from Linux module signature code.).
+
+
+### .rodata sections
+
+The patching might require strings to be updated as well. As such we must be
+also able to patch the strings as needed. This sounds simple - but the compiler
+has a habit of coalescing strings that are the same - which means if we 
in-place
+alter the strings - other users will be inadvertently affected as well.
+
+This is also where pointers to functions live - and we may need to patch this
+as well. And switch-style jump tables.
+
+To guard against that we must be prepared to do patching similar to
+trampoline patching or in-line depending on the flavour. If we can
+do in-line patching we would need to:
+
+ * alter `.rodata` to be writeable.
+ * inline patch.
+ * alter `.rodata` to be read-only.
+
+If are doing trampoline patching we would need to:
+
+ * allocate a new memory location for the string.
+ * all locations which use this string will have to be updated to use the
+   offset to the string.
+ * mark the region RO when we are done.
+
+### .bss and .data sections.
+
+Patching writable data is not suitable as it is unclear what should be done
+depending on the current state of data. As such it should not be attempted.

Handling of these sections is implemented in this patch series andshould just work. See what I wrote above about how .rodata, .data, .bssis handled.

Hook functions (coming in V2) can be used to fix up writable data. E.g.a hook function might iterate over each domain and apply sometransformation.

+
+
+### Patching code which is in the stack.
+
+We should not patch the code which is on the stack. That can lead
+to corruption.

Due to the way the patching code is done I believe that the onlyfunctions on any stack will be the xsplice code and domain.c:idle_loop(and it shouldn't matter if a trampoline is inserted at the beginning ofidle_loop anyway). Perhaps this section could be merged with the "Whento patch" section above?

+
+### Inline patching
+
+The hypervisor should verify that the in-place patching would fit within
+the code or data.
+
+### Trampoline (e9 opcode)
+
+The e9 opcode used for jmpq uses a 32-bit signed displacement. That means
+we are limited to up to 2GB of virtual address to place the new code
+from the old code. That should not be a problem since Xen hypervisor has
+a very small footprint.
+
+However if we need - we can always add two trampolines. One at the 2GB
+limit that calls the next trampoline.
+
+Please note there is a small limitation for trampolines in
+function entries: The target function (+ trailing padding) must be able
+to accomodate the trampoline. On x86 with +-2 GB relative jumps,
+this means 5 bytes are  required.
+
+Depending on compiler settings, there are several functions in Xen that
+are smaller (without inter-function padding).
+
+<pre>
+readelf -sW xen-syms | grep " FUNC " | \
+    awk '{ if ($3 < 5) print $3, $4, $5, $8 }'
+
+...
+3 FUNC LOCAL wbinvd_ipi
+3 FUNC LOCAL shadow_l1_index
+...
+</pre>
+A compile-time check for, e.g., a minimum alignment of functions or a
+runtime check that verifies symbol size (+ padding to next symbols) for
+that in the hypervisor is advised.
+

The tool for generating payloads currently does perform a compile-timecheck to ensure the function to be replaced is large enough.


--
Ross Lagerwall

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel

References:
- [Xen-devel] [PATCH v1 01/11] xsplice: Design document (v2).
  - From: Ross Lagerwall

Prev by Date: Re: [Xen-devel] [PATCH V8 3/7] libxl: add pvusb API
Next by Date: [Xen-devel] [ovmf test] 63933: all pass - PUSHED
Previous by thread: Re: [Xen-devel] [PATCH v1 01/11] xsplice: Design document (v2).
Next by thread: Re: [Xen-devel] [PATCH v1 01/11] xsplice: Design document (v2).
Index(es):
- Date
- Thread

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.