
Re: [Xen-devel] RFC: very initial PVH design document



On Wed, Aug 27, 2014 at 03:38:42PM -0700, Mukesh Rathor wrote:
> On Wed, 27 Aug 2014 16:45:37 -0400
> Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx> wrote:
> 
> > On Tue, Aug 26, 2014 at 05:33:21PM -0700, Mukesh Rathor wrote:
> > > On Fri, 22 Aug 2014 16:55:08 +0200
> > > Roger Pau Monné <roger.pau@xxxxxxxxxx> wrote:
> > > 
> > > > Hello,
> > > > 
> > > > I've started writing a document in order to describe the
> > > > interface exposed by Xen to PVH guests, and how it should be used
> > > > (by guest OSes). The document is far from complete (see the
> > > > amount of TODOs scattered around), but given the lack of
> > > > documentation regarding PVH I think it's a good starting point.
> > > > The aim of this is that it should be committed to the Xen
> > > > repository once it's ready. Given that this is still a *very*
> > > > early version I'm not even posting it as a patch.
> > > > 
> > > > Please comment, and try to fill the holes if possible ;).
> > > > 
> > > > Roger.
> > > > 
> > > > ---
> > > > # PVH Specification #
> > > > 
> > > > ## Rationale ##
> > > > 
> > > > PVH is a new kind of guest that has been introduced in Xen 4.4 as
> > > > a DomU, and in Xen 4.5 as a Dom0. The aim of PVH is to make use
> > > > of the hardware virtualization extensions present in modern x86
> > > > CPUs in order to improve performance.
> > > > 
> > > > PVH is considered a mix between PV and HVM, and can be seen as a
> > > > PV guest that runs inside of an HVM container, or as a PVHVM guest
> > > > without any emulated devices. The design goal of PVH is to provide
> > > > the best performance possible and to reduce the amount of
> > > > modifications needed for a guest OS to run in this mode (compared
> > > > to pure PV).
> > > > 
> > > > This document tries to describe the interfaces used by PVH guests,
> > > > focusing on how an OS should make use of them in order to support
> > > > PVH.
> > > > 
> > > > ## Early boot ##
> > > > 
> > > > PVH guests use the PV boot mechanism, which means that the kernel
> > > > is loaded and directly launched by Xen (by jumping into the entry
> > > > point). In order to do this Xen ELF Notes need to be added to the
> > > > guest kernel, so that they contain the information needed by Xen.
> > > > Here is an example of the ELF Notes added to the FreeBSD amd64
> > > > kernel in order to boot as PVH:
> > > > 
> > > >     ELFNOTE(Xen, XEN_ELFNOTE_GUEST_OS,       .asciz, "FreeBSD")
> > > >     ELFNOTE(Xen, XEN_ELFNOTE_GUEST_VERSION,  .asciz, __XSTRING(__FreeBSD_version))
> > > >     ELFNOTE(Xen, XEN_ELFNOTE_XEN_VERSION,    .asciz, "xen-3.0")
> > > >     ELFNOTE(Xen, XEN_ELFNOTE_VIRT_BASE,      .quad,  KERNBASE)
> > > >     ELFNOTE(Xen, XEN_ELFNOTE_PADDR_OFFSET,   .quad,  KERNBASE)
> > > >     ELFNOTE(Xen, XEN_ELFNOTE_ENTRY,          .quad,  xen_start)
> > > >     ELFNOTE(Xen, XEN_ELFNOTE_HYPERCALL_PAGE, .quad,  hypercall_page)
> > > >     ELFNOTE(Xen, XEN_ELFNOTE_HV_START_LOW,   .quad,  HYPERVISOR_VIRT_START)
> > > >     ELFNOTE(Xen, XEN_ELFNOTE_FEATURES,       .asciz, "writable_descriptor_tables|auto_translated_physmap|supervisor_mode_kernel|hvm_callback_vector")
> > > >     ELFNOTE(Xen, XEN_ELFNOTE_PAE_MODE,       .asciz, "yes")
> > > >     ELFNOTE(Xen, XEN_ELFNOTE_L1_MFN_VALID,   .long,  PG_V, PG_V)
> > > >     ELFNOTE(Xen, XEN_ELFNOTE_LOADER,         .asciz, "generic")
> > > >     ELFNOTE(Xen, XEN_ELFNOTE_SUSPEND_CANCEL, .long,  0)
> > > >     ELFNOTE(Xen, XEN_ELFNOTE_BSD_SYMTAB,     .asciz, "yes")
> > > 
> > > It will be helpful to add:
> > > 
> > > On the linux side, the above can be found in
> > > arch/x86/xen/xen-head.S.
> > > 
> > > 
> > > > It is important to highlight the following notes:
> > > > 
> > > >   * XEN_ELFNOTE_ENTRY: contains the memory address of the kernel
> > > > entry point.
> > > >   * XEN_ELFNOTE_HYPERCALL_PAGE: contains the memory address of the
> > > > hypercall page inside of the guest kernel (this memory region
> > > > will be filled by Xen prior to booting).
> > > >   * XEN_ELFNOTE_FEATURES: contains the list of features supported
> > > > by the kernel. In this case the kernel is only able to boot as a
> > > > PVH guest, but those options can be mixed with the ones used by
> > > > pure PV guests in order to have a kernel that supports both PV
> > > > and PVH (like Linux). The list of options available can be found
> > > > in the `features.h` public header.
> > > 
> > > Hmm... for linux I'd word that as follows:
> > > 
> > > A PVH guest is started by specifying pvh=1 in the config file.
> > > However, for the guest to be launched as a PVH guest, it must
> > > minimally advertise certain features which are:
> > > auto_translated_physmap, hvm_callback_vector,
> > > writable_descriptor_tables, and supervisor_mode_kernel. This is
> > > done via XEN_ELFNOTE_FEATURES and XEN_ELFNOTE_SUPPORTED_FEATURES.
> > > See linux:arch/x86/xen/xen-head.S for more info. A list of all xen
> > > features can be found in xen:include/public/features.h. However, at
> > > present the absence of these features does not make it
> > > automatically boot in PV mode, but that may change in future. The
> > > ultimate goal is, if a guest supports these features, then boot it
> > > automatically in PVH mode, otherwise boot it in PV mode.
> > > 
> > > [You can leave out the last part if you want, or just take whatever
> > > from above].
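
To make the "minimally advertise" requirement concrete, here is a rough
sketch of a check over the '|'-separated feature string. The helper names
(has_feature, pvh_features_ok) are made up for illustration and are not
taken from Xen or libelf; the sketch also assumes plain feature names only
and ignores the "!" prefix that marks a feature as required.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Features a kernel must advertise to be launched as PVH (from the text above). */
static const char *const pvh_required[] = {
    "auto_translated_physmap",
    "hvm_callback_vector",
    "writable_descriptor_tables",
    "supervisor_mode_kernel",
};

/* Return true if `name` appears as a whole '|'-separated token in `features`. */
static bool has_feature(const char *features, const char *name)
{
    size_t len = strlen(name);
    const char *p = features;

    while (*p != '\0') {
        size_t tok = strcspn(p, "|");   /* length of the current token */

        if (tok == len && strncmp(p, name, len) == 0)
            return true;
        p += tok;
        if (*p == '|')
            p++;                        /* skip the separator */
    }
    return false;
}

/* Hypothetical helper: does this feature string advertise all four features? */
bool pvh_features_ok(const char *features)
{
    size_t n = sizeof(pvh_required) / sizeof(pvh_required[0]);

    for (size_t i = 0; i < n; i++)
        if (!has_feature(features, pvh_required[i]))
            return false;
    return true;
}
```

With the FreeBSD string quoted earlier this returns true; drop any one of
the four names and it returns false.
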
> > > 
> > > > Xen will jump into the kernel entry point defined in
> > > > `XEN_ELFNOTE_ENTRY` with paging enabled (either long or protected
> > > > mode depending on the kernel bitness) and some basic page tables
> > > > setup.
> > > 
> > > If I may rephrase:
> > > 
> > > Guest is launched at the entry point specified in XEN_ELFNOTE_ENTRY
> > > with paging, PAE, and long mode enabled. At present only 64bit mode
> > > is supported, however, in future compat mode support will be added.
> > > An important distinction for a 64bit PVH is that it is launched at
> > > privilege level 0 as opposed to a 64bit PV guest which is launched
> > > at privilege level 3.
> > > 
> > > > Also, the `rsi` (`esi` on 32bits) register is going to contain the
> > > > virtual memory address where Xen has placed the start_info
> > > > structure. The `rsp` (`esp` on 32bits) will contain a stack that
> > > > can be used by the guest kernel. The start_info structure
> > > > contains all the info the guest needs in order to initialize.
> > > > More information about the contents can be found in the `xen.h`
> > > > public header.
> > > 
> > > Since the above is all true for PV guest, you could begin it with:
> > > 
> > > Just like a PV guest, the rsi ....
> > > 
> > > > 
> > > > ### Initial amd64 control registers values ###
> > > > 
> > > > Initial values for the control registers are set up by Xen before
> > > > booting the guest kernel. The guest kernel can expect to find the
> > > > following features enabled by Xen.
> > > > 
> > > > On `CR0` the following bits are set by Xen:
> > > > 
> > > >   * PE (bit 0): protected mode enable.
> > > >   * ET (bit 4): 80387 external math coprocessor.
> > > >   * PG (bit 31): paging enabled.
> > > > 
> > > > On `CR4` the following bits are set by Xen:
> > > > 
> > > >   * PAE (bit 5): PAE enabled.
> > > > 
> > > > And finally on `EFER` the following features are enabled:
> > > > 
> > > >   * LME (bit 8): Long mode enable.
> > > >   * LMA (bit 10): Long mode active.
> > > > 
> > > > *TODO*: do we expect these flags to change? Are there other flags
> > > > that might be enabled depending on the hardware we are running on?
> > > 
> > > Can't think of anything...
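
For reference, the bits listed above can be written down as masks. This is
only a sketch: the bit positions are the architectural ones quoted in the
draft, while the macro names and the pvh_initial_state_ok() helper are
illustrative and not something Xen provides. The caller is assumed to have
read CR0, CR4 and EFER already.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions as listed in the draft above. */
#define X86_CR0_PE   (UINT64_C(1) << 0)    /* protected mode enable */
#define X86_CR0_ET   (UINT64_C(1) << 4)    /* extension type (80387) */
#define X86_CR0_PG   (UINT64_C(1) << 31)   /* paging enabled */
#define X86_CR4_PAE  (UINT64_C(1) << 5)    /* PAE enabled */
#define EFER_LME     (UINT64_C(1) << 8)    /* long mode enable */
#define EFER_LMA     (UINT64_C(1) << 10)   /* long mode active */

/*
 * Hypothetical helper: true if every bit the draft says Xen sets is
 * present. Other bits are ignored, since the TODO leaves open whether
 * more flags may be enabled on some hardware.
 */
bool pvh_initial_state_ok(uint64_t cr0, uint64_t cr4, uint64_t efer)
{
    const uint64_t cr0_req  = X86_CR0_PE | X86_CR0_ET | X86_CR0_PG;
    const uint64_t efer_req = EFER_LME | EFER_LMA;

    return (cr0 & cr0_req) == cr0_req &&
           (cr4 & X86_CR4_PAE) != 0 &&
           (efer & efer_req) == efer_req;
}
```
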
> > 
> > What about the initial segments (ES, DS, FS, GS)? We boot with Xen
> > provided ones and need to swap over from them - so that means
> > the DS and CS are initially set to Xen ones. And we should probably
> > mention that when the OS switches from Xen ones it MUST jump to a
> > CS with CS.L = 1 set, otherwise bad things happen.
> 
> CS.L is already covered above:
>     with paging, PAE, and long mode enabled. At present only 64bit mode
>     is supported, however, in future compat mode support will be added.
> 
> that is the CS.L bit. CS.L==1 ==> 64bit mode, CS.L==0 ==> compat mode.

I mean that we should document what the segments actually look like,
as in what the initial segments the guest boots with are.
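
Something along these lines is the level of detail I have in mind. The
sketch below decodes the attribute bits of a legacy 8-byte code/data
descriptor, including the L bit discussed above. The struct and function
names are made up, and the descriptor values mentioned afterwards are the
usual flat ones, not necessarily what Xen loads at boot.

```c
#include <stdbool.h>
#include <stdint.h>

/* Attribute bits of an 8-byte segment descriptor (architectural layout). */
struct seg_attrs {
    unsigned type;    /* bits 40-43: segment type */
    bool     s;       /* bit 44: 1 = code/data, 0 = system */
    unsigned dpl;     /* bits 45-46: descriptor privilege level */
    bool     present; /* bit 47 */
    bool     l;       /* bit 53: 64-bit code segment */
    bool     db;      /* bit 54: default operand size */
    bool     g;       /* bit 55: granularity */
};

struct seg_attrs decode_descriptor(uint64_t d)
{
    struct seg_attrs a;

    a.type    = (unsigned)((d >> 40) & 0xf);
    a.s       = (d >> 44) & 1;
    a.dpl     = (unsigned)((d >> 45) & 0x3);
    a.present = (d >> 47) & 1;
    a.l       = (d >> 53) & 1;
    a.db      = (d >> 54) & 1;
    a.g       = (d >> 55) & 1;
    return a;
}

/* A CS usable for 64bit mode: present code segment with L = 1 and D = 0. */
bool is_long_mode_code(uint64_t d)
{
    struct seg_attrs a = decode_descriptor(d);

    return a.present && a.s && (a.type & 0x8) && a.l && !a.db;
}
```

For example, the common flat descriptor 0x00af9a000000ffff has L = 1 and
passes the check, while 0x00cf9a000000ffff (L = 0, D = 1) is a 32bit code
segment and does not.
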

> 
> 
> > We should probably mention that MSR_FS_BASE, MSR_GS_BASE
> > and MSR_KERNEL_GS_BASE are zeroed out. Not sure about any other MSR?
> 
> Could.

Perhaps say that any other MSRs are treated the same as they are
under an HVM guest.
> 
> > Should we have a blurb about IDT and GDT and that the PV hypercalls
> > for that will be ignored?
> 
> and that they are native and guest managed.

Right. Which means that during early bootup one has to be extra
careful to not get a #GP as there are no page-fault handlers set up.
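
Since the IDT is native and guest managed, the guest fills in the gates
itself. As a rough sketch, a 64bit interrupt gate can be built as below
(for instance for vector 13, #GP, or vector 14, #PF). The struct and
function names are illustrative only, the handler address and selector are
whatever the guest chooses, and loading the table with lidt is not shown.

```c
#include <stdint.h>

/* 16-byte IDT gate descriptor as used in long mode (architectural layout). */
struct idt_gate64 {
    uint16_t offset_low;   /* handler address bits 0-15 */
    uint16_t selector;     /* code segment selector */
    uint8_t  ist;          /* bits 0-2: interrupt stack table index */
    uint8_t  type_attr;    /* P, DPL and gate type */
    uint16_t offset_mid;   /* handler address bits 16-31 */
    uint32_t offset_high;  /* handler address bits 32-63 */
    uint32_t reserved;     /* must be zero */
};

_Static_assert(sizeof(struct idt_gate64) == 16, "IDT gate must be 16 bytes");

/* Build a present, DPL 0 interrupt gate pointing at `handler`. */
struct idt_gate64 make_interrupt_gate(uint64_t handler, uint16_t cs_selector)
{
    struct idt_gate64 g = {0};

    g.offset_low  = (uint16_t)(handler & 0xffff);
    g.selector    = cs_selector;
    g.ist         = 0;      /* stay on the current stack */
    g.type_attr   = 0x8e;   /* P = 1, DPL = 0, type = 0xE (interrupt gate) */
    g.offset_mid  = (uint16_t)((handler >> 16) & 0xffff);
    g.offset_high = (uint32_t)(handler >> 32);
    return g;
}
```
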

> 
