
Re: [Xen-devel] [RFC PATCH 07/12] hvmloader: allocate MMCONFIG area in the MMIO hole + minor code refactoring



On Sat, Mar 24, 2018 at 08:32:44AM +1000, Alexey G wrote:
> On Fri, 23 Mar 2018 13:57:11 +0000
> Paul Durrant <Paul.Durrant@xxxxxxxxxx> wrote:
> [...]
> >> A few related thoughts:
> >>
> >> 1. MMCONFIG address is chipset-specific. On Q35 it's PCIEXBAR, on
> >> other x86 systems it may be HECBASE or something else. So we can
> >> assume it is bound to the emulated machine.
> >
> >Xen emulates the machine so it should be emulating PCIEXBAR. 
> 
> Actually, Xen currently emulates only a few devices. The others are
> provided by QEMU; that's the problem.
> 
> >> 2. We rely on QEMU to emulate different machines for us.
> >We should not be. It's a historical artefact that we rely on QEMU for
> >any part of machine emulation.
> 
> HVM guests need to see something more or less close to real hardware to
> run. Even if we later install PV drivers for network/storage/etc usage,
> we still need to support system firmware (SeaBIOS/OVMF) and be able to
> install (ideally) any OS which expects to be installed only on
> real x86 hw. We also need to be ready to fall back to the emulated hw
> if, e.g., the user boots the OS in safe mode.

I think Paul means that Xen should be emulating the platform devices
and part of the southbridge/northbridge functionality, but not all the
emulated devices provided to a guest.

> 
> It all depends on what you mean by not relying on QEMU for any part
> of machine emulation.
> 
> There are a number of mandatory devices which should be provided for a
> typical x86 system. Xen emulates some of them, but there are a number
> which it doesn't. Apart from "classic" devices like RTC, PIT, KBC, etc.
> we need to provide at least storage and network interfaces.
> 
> The Windows installer won't be happy to boot from a PV storage device;
> it prefers to encounter something like AHCI (Windows 7+), ATA (for
> older OSes) or ATAPI if it is an ISO CD.
> Providing emulation for the AHCI+ATA+ATAPI trio alone is a non-trivial
> task. QEMU itself provides only a partial implementation of these; many
> features are unsupported. Another very useful thing to emulate is USB.
> Depending on the controller version and device classes required, this
> may be far more complex to emulate than AHCI+ATA+ATAPI combined.
> 
> So, if you suggest dropping QEMU completely, it means that all this
> functionality must be replaced by our own. Not that hard, but still a
> lot of effort.
> 
> 
> OTOH, if you mean stripping QEMU of general PCI bus control and
> replacing its emulated NB/SB with Xen-owned ones -- well, in theory it
> should be possible, with patches on the QEMU side.
> 
> In fact, the emulated chipset (NB+SB combo without supplemental devices)
> itself is a small part of the required emulation. It's relatively easy
> to provide our own analogs of, e.g., the 'mch' and 'ICH9-LPC' QEMU
> PCIDevices; the problem is gluing all the remaining parts together.
> 
> I assume the final goal in this case is to have only a set of necessary
> QEMU PCIDevices for which we will be providing I/O, MMIO and PCI conf
> trapping facilities. Only devices such as rtl8139, ich9-ahci and a few
> others.
> 
> Basically, this means a new, chipset-less QEMU machine type.
> Well, in theory it is possible with a bit of effort I think. The main
> question is where the NB/SB/PCI bus emulating part will reside in
> this case.

Mostly inside of Xen. Of course the IDE/SATA/USB/Ethernet... part of
the southbridge will be emulated by a device model (ie: QEMU).

As you mention above, I also took a look and it seems like the number
of registers that we should emulate for the Q35 DRAM controller (D0:F0)
is fairly minimal based on the current QEMU implementation. We could
possibly even get away with just emulating PCIEXBAR.
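
To make the PCIEXBAR point a bit more concrete, here is a minimal sketch
of what decoding a guest write to that register could look like. The
field layout (enable in bit 0, length in bits 2:1, base in bits 35:28,
with bits 27 and 26 joining the base for the 128MB and 64MB sizes) is
my reading of the Q35 datasheet and of QEMU's q35 code. The struct and
function names are made up for illustration and are not existing Xen or
QEMU code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed Q35 PCIEXBAR layout (D0:F0, config offset 0x60, 64 bits wide). */
#define PCIEXBAR_EN         (1ULL << 0)
#define PCIEXBAR_LEN_SHIFT  1
#define PCIEXBAR_LEN_MASK   (3ULL << PCIEXBAR_LEN_SHIFT)
#define PCIEXBAR_BASE_MASK  0xFF0000000ULL   /* bits 35:28 */
#define PCIEXBAR_BASE_BIT27 (1ULL << 27)     /* part of base for 128MB/64MB */
#define PCIEXBAR_BASE_BIT26 (1ULL << 26)     /* part of base for 64MB */

/* Hypothetical result type: where the MMCONFIG window should be trapped. */
struct mmcfg_window {
    bool enabled;
    uint64_t base;
    uint64_t size;
};

/* Decode a PCIEXBAR value; returns false for the reserved length encoding. */
static bool pciexbar_decode(uint64_t reg, struct mmcfg_window *w)
{
    uint64_t mask = PCIEXBAR_BASE_MASK;
    uint64_t size;

    w->enabled = false;
    w->base = 0;
    w->size = 0;

    switch ((reg & PCIEXBAR_LEN_MASK) >> PCIEXBAR_LEN_SHIFT) {
    case 0: /* 256 buses */
        size = 256ULL << 20;
        break;
    case 1: /* 128 buses */
        size = 128ULL << 20;
        mask |= PCIEXBAR_BASE_BIT27;
        break;
    case 2: /* 64 buses */
        size = 64ULL << 20;
        mask |= PCIEXBAR_BASE_BIT27 | PCIEXBAR_BASE_BIT26;
        break;
    default: /* reserved encoding */
        return false;
    }

    w->base = reg & mask;
    w->size = size;
    w->enabled = (reg & PCIEXBAR_EN) != 0;
    return true;
}

/* Standard ECAM offset of a config register inside the MMCONFIG window. */
static uint64_t ecam_offset(uint8_t bus, uint8_t dev, uint8_t fn, uint16_t reg)
{
    assert(dev < 32 && fn < 8 && reg < 4096);
    return ((uint64_t)bus << 20) | ((uint64_t)dev << 15) |
           ((uint64_t)fn << 12) | reg;
}
```

With something like this, a write of 0xE0000001 would describe an
enabled 256MB window at 0xE0000000, and an access at offset 0x113010
inside it would target bus 1, device 2, function 3, register 0x10.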

> As this part must still have some privileges, it's basically
> the same decision problem as with QEMU's dwelling place -- stubdomain,
> Dom0 or somewhere else.
> 
> >> 3. There are users which touch chipset-specific PCIEXBAR directly if
> >> they see a Q35 system (OVMF so far)
> >
> >And we should squash such accesses.
> >
> 
> Yes, we have that privilege (i.e. allocating all IO/MMIO bases) for
> hvmloader. OVMF should not differ from SeaBIOS in this respect.
> 
> >The toolstack should be in sole
> >control of the guest memory map. It should be the only one building
> >MCFG, so it should decide where the MMCONFIG regions go, not the
> >firmware running in guest context.
> 
> HVM memory layout is another problem which needs a solution BTW. I had
> to implement one for my PT goals, but it's very radical I'm afraid.
> 
> Right now there are wicked issues present in handling memory layout
> between hvmloader and QEMU. They may see a different memory map, even
> with overlaps in some cases (depending on MMIO hole size and content),
> like an attempt to place an MMIO BAR over memory which is used for vram
> backing storage by QEMU, causing a variety of issues like emulated I/O
> errors (with a storage device) during a guest boot attempt.
> 
> Regarding control of the guest memory map in the toolstack only... The
> problem is, only firmware can see the final memory map at the moment.
> And only the device model knows about invisible "service" ranges for
> emulated devices, like the LFB content (aka "VRAM") when it is not
> mapped to a guest.
> 
> In order to calculate the final memory/MMIO hole split, we need to know:
> 
> 1) all PCI devices on a PCI bus. At the moment Xen contributes only
> devices like PT to the final PCI bus (via QMP device_add). Others are
> QEMU ones. Even the Xen platform PCI device relies on QEMU emulation.
> Non-QEMU device emulators are another source of virtual PCI devices I
> guess.
> 
> 2) all chipset-specific emulated MMIO ranges. MMCONFIG is one of them
> and the largest (up to 256MB for a segment). There are a few other
> smaller ranges, e.g. Root Complex registers. All these ranges depend on
> the emulated chipset.
> 
> 3) all reserved memory ranges (this is the one the toolstack already
> knows)
> 
> 4) all "service" guest memory ranges like backing storage for VRAM in
> QEMU. Emulated Option ROMs should belong here too, but IIRC xen-hvm.c
> either intentionally or by mistake handles them as emulated ranges
> currently.
> 
> If we miss any of these (like what the chipset-specific ranges are and
> their size and alignment requirements), we're in trouble. But if we
> know *all* of these, we can pre-calculate the MMIO hole size. This is
> a bit fragile to do from the toolstack, though: the sizing algorithm
> in the toolstack and the MMIO BAR allocation code in the firmware
> (hvmloader) must be kept synchronized, because it is possible to
> stuff BARs into the MMIO hole in different ways, especially once
> PCI-PCI bridges appear on the scene. Both need to do it in a
> consistent way (resulting in a similar set of gaps between allocated
> BARs), otherwise the expected MMIO hole sizes won't match. That means
> we may need to relocate MMIO BARs to the high MMIO hole, which in turn
> may lead to those overlaps with QEMU memories.
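
The ordering dependency described above can be shown with a small
sketch. It assumes naturally aligned, power-of-two BAR sizes placed
upwards from a base address in the order given. The helper names are
hypothetical, and this is neither the hvmloader allocator nor any
toolstack code, only an illustration of why both sides would have to
agree on the placement order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Round addr up to a power-of-two alignment. */
static uint64_t align_up(uint64_t addr, uint64_t align)
{
    return (addr + align - 1) & ~(align - 1);
}

/*
 * Place BARs one after another starting at base, each naturally aligned
 * to its own size, and return the total span consumed (including any
 * alignment gaps between BARs).
 */
static uint64_t mmio_span(uint64_t base, const uint64_t *bar_sizes, size_t n)
{
    uint64_t next = base;

    for (size_t i = 0; i < n; i++) {
        uint64_t size = bar_sizes[i];

        /* PCI BAR sizes are powers of two. */
        assert(size != 0 && (size & (size - 1)) == 0);
        next = align_up(next, size) + size;
    }

    return next - base;
}
```

With a base of 0xE0000000, placing BARs of 16MB, 1MB and 1MB in that
order consumes 18MB, while placing the same BARs as 1MB, 16MB, 1MB
consumes 33MB because of the alignment gap in front of the 16MB BAR. A
sizing pass and an allocation pass that pick different orders would
therefore disagree on the hole size.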

I agree that the current memory layout management (or the lack of it)
is concerning. Although related, I think this should be tackled as a
separate issue from the chipset one.

Since you already posted the Q35 series I would attempt to get that
done first before jumping into the memory layout one.

Roger.
