
Re: [Xen-devel] [Draft A] Boot ABI for HVM guests without a device-model



On Wed, Jun 10, 2015 at 02:34:00PM +0200, Roger Pau Monné wrote:
> Hello,
> 
> The discussion in [1] led to an agreement on the missing pieces in PVH 
> (or HVM without a device-model) in order to progress with its 
> implementation.
> 
> One of the missing pieces is a new boot ABI, that replaces the PV boot 
> ABI. The aim of this new boot ABI is to remove the limitations of the 

To be fair, there is an existing boot ABI.

It is the same as the PV boot ABI, but since this is a PV autotranslated
guest, some of the values that a PV guest requires are undefined.

With that in mind, why can't we reuse that (xen_start_info) and
treat any PV-specific field as reserved?


> PV boot ABI, that are no longer present when using auto-translated 
> guests. The new boot protocol should allow using the same entry point 
> for both 32bit and 64bit guests, and let the guest choose its bitness 
> at run time without the domain builder knowing in advance.

I like that idea, but it will push the work on 32-bit PVH and
AMD PVH out by at least another half year, which is rather sad.

Also, this change will require modifying the Linux 64-bit PVH
part. That should be mentioned, and it is likely to take
three months as well.


> 
> Roger.
> 
> [1] http://lists.xen.org/archives/html/xen-devel/2015-06/msg00258.html
> 
> ---
> HVM direct boot ABI
> 
> Since the Xen entry point into the kernel can be different from the 
> native entry point, ELFNOTES are used in order to tell the domain 
> builder how to load and jump into the kernel entry point. At least the 
> following ELFNOTES are required in order to use this boot ABI:
> 
> ELFNOTE(Xen, XEN_ELFNOTE_GUEST_OS,       .asciz, "FreeBSD")
> ELFNOTE(Xen, XEN_ELFNOTE_GUEST_VERSION,  .asciz, __XSTRING(__FreeBSD_version))
> ELFNOTE(Xen, XEN_ELFNOTE_XEN_VERSION,    .asciz, "xen-3.0")
> ELFNOTE(Xen, XEN_ELFNOTE_PADDR_OFFSET,   .quad,  KERNBASE)
> ELFNOTE(Xen, XEN_ELFNOTE_PADDR_ENTRY,    .quad,  xen_start32)
> ELFNOTE(Xen, XEN_ELFNOTE_FEATURES,       .asciz, 
> "writable_descriptor_tables|auto_translated_physmap|supervisor_mode_kernel|hvm_callback_vector")

That will choke on older hypervisors: a normal PV guest won't boot
anymore, because older hypervisors will choke on 'hvm_callback_vector'
being in XEN_ELFNOTE_FEATURES.

You have to put that in the XEN_ELFNOTE_SUPPORTED_FEATURES field instead.

> ELFNOTE(Xen, XEN_ELFNOTE_LOADER,         .asciz, "generic")
> 
> The first three notes contain information about the guest kernel and 
> the Xen hypercall ABI version. The following notes are of special 
> interest:
> 
>  * XEN_ELFNOTE_PADDR_OFFSET: the offset of the ELF paddr field from the
>    actual required physical address.
>  * XEN_ELFNOTE_PADDR_ENTRY: the 32bit entry point into the kernel.

Is 'P' supposed to be 'physical'?

I am not sure how this will work with a 64-bit ELF binary like
the Linux kernel. Usually we use the virtual address, but since we
start in 32-bit mode, a 64-bit virtual address won't work.

But the ELF loader could figure out the offset of the virtual
address from the ELF starting point and just call at the delta, in
which case XEN_ELFNOTE_ENTRY can be used with the
understanding that we will just call at that offset.
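
A rough sketch of that delta calculation in C, to make the idea concrete.
The function name, its parameters, and the overflow checks are all
hypothetical (nothing here comes from the Xen ELF loader); it only assumes
the loader knows the virtual entry address, the virtual address the image
is linked at, and the physical address it loaded the image to:

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: turn a (possibly 64-bit) virtual entry address
 * into the 32-bit physical address to jump to, by taking the delta of
 * the entry point from the start of the image and adding it to the
 * physical load address. Returns false if the result cannot be used
 * from 32-bit mode with paging disabled.
 */
static bool entry_to_paddr(uint64_t virt_entry, uint64_t virt_base,
                           uint64_t load_paddr, uint32_t *paddr_out)
{
    if (virt_entry < virt_base)
        return false;                     /* entry lies before the image */

    uint64_t delta = virt_entry - virt_base;

    /* The final address must fit below 4GB. */
    if (load_paddr > UINT32_MAX || delta > UINT32_MAX - load_paddr)
        return false;

    *paddr_out = (uint32_t)(load_paddr + delta);
    return true;
}
```

For example, an image linked at 0xffffffff81000000 with its entry point at
0xffffffff81000200 and loaded at physical 0x1000000 would be entered at
0x1000200.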

>  * XEN_ELFNOTE_FEATURES: features required by the guest kernel in order
>    to run.
> 
> The presence of the XEN_ELFNOTE_PADDR_ENTRY note indicates that the 
> kernel supports the boot ABI described in this document.
> 
> The domain builder will load the kernel into the guest memory space and 
> jump into the entry point defined at XEN_ELFNOTE_PADDR_ENTRY with the 
> following machine state:
> 
>  * esi: contains the physical memory address where the loader has placed
>    the start_info page.
> 
>  * eax: contains the magic value 0xFF6BC1E2.
> 
>  * cr0: bit 31 (PG) must be cleared. Bit 0 (PE) must be set. Other bits
>    are all undefined. 
> 
>  * cs: must be a 32-bit read/execute code segment with an offset of '0'
>    and a limit of '0xFFFFFFFF'. The exact value is undefined.
> 
>  * ds, es, fs, gs, ss: must be a 32-bit read/write data segment with an
>    offset of '0' and a limit of '0xFFFFFFFF'. The exact values are all
>    undefined. 
> 
>  * eflags: bit 17 (VM) must be cleared. Bit 9 (IF) must be cleared. 
>    Other bits are all undefined.
> 
>  * A20 gate: must be enabled.
> 
> All other processor registers and flag bits are undefined. The OS is in 
> charge of setting up its own stack, GDT and IDT.
> 
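
The register requirements listed above can be summarized as a small check.
This is only an illustrative sketch in C: the struct and function names are
hypothetical, and a real guest would test these conditions in its assembly
entry stub rather than from a captured snapshot. The magic value and the bit
positions are the ones given in the draft (cr0 PE is bit 0, PG is bit 31;
eflags IF is bit 9, VM is bit 17):

```c
#include <stdbool.h>
#include <stdint.h>

#define HVM_BOOT_MAGIC 0xFF6BC1E2u   /* value the draft places in eax */
#define CR0_PE         (1u << 0)     /* protected mode enable */
#define CR0_PG         (1u << 31)    /* paging enable */
#define EFLAGS_IF      (1u << 9)     /* interrupt enable flag */
#define EFLAGS_VM      (1u << 17)    /* virtual-8086 mode flag */

/* Hypothetical snapshot of the registers the draft constrains. */
struct boot_state {
    uint32_t eax;     /* magic value */
    uint32_t esi;     /* physical address of the start_info page */
    uint32_t cr0;
    uint32_t eflags;
};

/* Return true if the snapshot matches the entry state in the draft. */
static bool boot_state_ok(const struct boot_state *s)
{
    return s->eax == HVM_BOOT_MAGIC
        && (s->cr0 & CR0_PE) != 0                       /* protected mode on */
        && (s->cr0 & CR0_PG) == 0                       /* paging off */
        && (s->eflags & (EFLAGS_IF | EFLAGS_VM)) == 0;  /* IF and VM clear */
}
```

The segment and A20 requirements are left out because they cannot be
expressed as simple register bit tests.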
> Note that the boot protocol resembles the multiboot1 specification; 
> this is done so OSes with multiboot1 entry points can reuse those if 
> desired. Also note that the processor starts with paging disabled, 
> which means that all the memory addresses in the start_info page will 
> be physical memory addresses.

Wow?! Pagetables disabled?! Why? Usually boot loaders start with some
pagetables set up for the OS, covering at least the kernel and the
ramdisk, either as 1:1 pagetables or something similar.

Why make this work harder for the guest?
Why can't the hypervisor set up most of these things for the guest?

> 
> ---
> Comments for further discussion:
> 
> Do we want to keep using the start_info page? Most of the fields there 

Yes. It suits its purpose here too.

> are not relevant for auto-translated guests, but without it we have to 
> figure out how to pass the following information to the guest:
> 
>  - Flags: SIF_xxx flags, this could probably be done with cpuid instead.
>  - cmd_line: ?
>  - console mfn: ?
>  - console evtchn: ?
>  - console_info address: ?
