To: Ian Campbell <Ian.Campbell@xxxxxxxxxx>
Subject: Re: [Xen-devel] [RFC Patch] Support for making an E820 PCI hole in toolstack (xl + xm)
From: Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx>
Date: Tue, 16 Nov 2010 10:50:16 -0500
Cc: Jeremy Fitzhardinge <jeremy@xxxxxxxx>, "xen-devel@xxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxx>, Keir Fraser <keir@xxxxxxx>, Stefano Stabellini <Stefano.Stabellini@xxxxxxxxxxxxx>, "bruce.edge@xxxxxxxxx" <bruce.edge@xxxxxxxxx>, Gianni Tedesco <gianni.tedesco@xxxxxxxxxx>
Delivery-date: Tue, 16 Nov 2010 07:53:19 -0800
In-reply-to: <1289899586.31507.717.camel@xxxxxxxxxxxxxxxxxxxxxx>
References: <4CE18AD6.5070102@xxxxxxxx> <C907413B.A0AD%keir@xxxxxxx> <20101115231133.GA12364@xxxxxxxxxxxx> <4CE1D921.2010703@xxxxxxxx> <1289899586.31507.717.camel@xxxxxxxxxxxxxxxxxxxxxx>

Disclaimer: this email got a bit lengthy, so make sure you have a cup of
coffee when you read it.

> On an unrelated note I think if we do go down the route of having the
> guest kernel punch the holes itself and such we should do so iff
> XENMEM_memory_map returns either ENOSYS or nr_entries == 1 to leave open
> the possibility of cunning tricks on the tools side in the future.

When would that actually happen? Is that return value returned when the
hypervisor does not implement the call (and in what version was it
implemented)?
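The quoted rule boils down to a small predicate. A minimal sketch, assuming the caller passes in the return code and entry count from its own XENMEM_memory_map call (no hypercall is made here); `should_guest_punch_hole()` is a hypothetical name, not existing Xen or Linux code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical helper: should the guest kernel punch the PCI hole itself?
 * Following the quoted rule, only when XENMEM_memory_map is unimplemented
 * (-ENOSYS) or returned a single entry, i.e. the toolstack did not hand
 * the guest a map with holes of its own.
 */
static bool should_guest_punch_hole(int rc, unsigned int nr_entries)
{
    if (rc == -ENOSYS)
        return true;            /* hypercall not available */
    if (rc != 0)
        return false;           /* other failures: leave the map alone */
    assert(nr_entries > 0);     /* assumption of this sketch: success => >= 1 entry */
    return nr_entries == 1;     /* one flat RAM entry: no hole provided */
}
```

Keeping the test to "ENOSYS or a single entry" is what leaves the toolstack free to supply a richer map later.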

<shudders>

I think we have three options regarding the RFC patch I posted:

1). Continue with this and have the toolstack punch the PCI hole. It would
fill the PCI hole area with INVALID_MFN. The toolstack determines where the
PCI hole starts.

2). Do this in the guest, where the guest calls both XENMEM_machine_memory_map
and XENMEM_memory_map to get an idea of the host-side PCI hole and sets it up.
This requires changes in the hypervisor to allow a non-privileged PV guest to
make the XENMEM_machine_memory_map call. The Linux kernel decides where the
PCI hole starts, and the PCI hole is filled with INVALID_MFN.

3). Unconditionally make a PCI hole starting at 3GB. The PCI hole is filled
with INVALID_MFN.

4). Another one I didn't think of?
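For option 1), the toolstack side amounts to laying out the guest E820 around the hole. A minimal sketch, assuming a PV guest whose map is plain RAM plus the hole, and assuming RAM displaced by the hole is moved above 4GB; `build_e820_with_hole()` is a hypothetical helper, not the code in the RFC patch.

```c
#include <assert.h>
#include <stdint.h>

#define E820_RAM 1              /* standard E820 type for usable RAM */
#define GB (1ULL << 30)

/* Same shape as the classic E820 entry: base, length, type. */
struct e820entry {
    uint64_t addr;
    uint64_t size;
    uint32_t type;
};

/*
 * Hypothetical toolstack helper: lay out `mem` bytes of guest RAM around a
 * PCI hole spanning [hole_start, 4GB). RAM that would overlap the hole is
 * relocated above 4GB (an assumption of this sketch). The hole is left as
 * a gap in the map; its P2M entries would be INVALID_MFN. Legacy low-memory
 * areas are ignored. Returns the number of entries written to map[].
 */
static unsigned int build_e820_with_hole(uint64_t mem, uint64_t hole_start,
                                         struct e820entry map[2])
{
    unsigned int nr = 0;

    assert(hole_start > 0 && hole_start <= 4 * GB);

    uint64_t low = mem < hole_start ? mem : hole_start;
    map[nr++] = (struct e820entry){ 0, low, E820_RAM };

    if (mem > low)              /* remainder goes above the 4GB boundary */
        map[nr++] = (struct e820entry){ 4 * GB, mem - low, E820_RAM };

    return nr;
}
```

For a 5GB guest with the hole at 3GB this yields [0, 3GB) and [4GB, 6GB) as RAM, with [3GB, 4GB) left as the hole.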

For all of these cases, when devices show up we populate the P2M array with
the MFNs on demand. For the first two proposals, the BARs we read from the PCI
devices are going to be written to the P2M array as identity mappings (so
mfn_list[0xc0000] == 0xc0000). The code for this has not been written yet.
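The identity case for 1) and 2) could look like the following. This is a sketch under stated assumptions: `mfn_list` is modelled as a plain array of `unsigned long`, `INVALID_MFN` as all-ones, and `p2m_identity_map_bar()` is a hypothetical helper that only updates the array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT 12
#define INVALID_MFN (~0UL)      /* assumption: all-ones marks an empty slot */

/*
 * Hypothetical sketch: when a device appears, map its BAR pages 1:1 in the
 * guest's P2M array so that mfn_list[pfn] == pfn. `nr_pfns` is the length
 * of the array. The BAR must fall in the hole, i.e. in INVALID_MFN slots.
 */
static void p2m_identity_map_bar(unsigned long *mfn_list, size_t nr_pfns,
                                 uint64_t bar_start, uint64_t bar_size)
{
    unsigned long first = bar_start >> PAGE_SHIFT;
    unsigned long last = (bar_start + bar_size - 1) >> PAGE_SHIFT;

    assert(bar_size > 0 && last < nr_pfns);
    for (unsigned long pfn = first; pfn <= last; pfn++) {
        assert(mfn_list[pfn] == INVALID_MFN);   /* must lie in the hole */
        mfn_list[pfn] = pfn;                    /* identity: PFN == MFN */
    }
}
```

A two-page BAR at 0xc0000000 fills slots 0xc0000 and 0xc0001 with their own frame numbers.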

For the third proposal, we would have non-identity mappings in the P2M array,
as during migration we could move from a device with BARs at 0xc0000 to one
with BARs at 0x20000. So mfn_list[0xc0000] = 0x20000.
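For option 3), the entries are offset rather than identity. A hypothetical sketch with the same array model; `p2m_map_bar_offset()` is an illustrative name, and the values are page frame numbers as in the example above.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical sketch for option 3): the guest always sees the BAR at
 * guest_pfn inside its hole, while the machine BAR (host_mfn) can differ
 * from host to host, so each page is mapped at a constant offset.
 */
static void p2m_map_bar_offset(unsigned long *mfn_list, size_t nr_pfns,
                               unsigned long guest_pfn, unsigned long host_mfn,
                               unsigned long nr_pages)
{
    assert(guest_pfn + nr_pages <= nr_pfns);
    for (unsigned long i = 0; i < nr_pages; i++)
        mfn_list[guest_pfn + i] = host_mfn + i;   /* non-identity entry */
}
```

After migration only `host_mfn` changes; the guest-visible `guest_pfn` stays fixed, which is the point of this option and also why the guest must somehow learn the real MFNs.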

But for the third case I am unsure how we would get the "real" MFNs. We
initially get the BARs via 0xcf8 accesses and, if we don't filter them, they
make it to the ioremap function.

Say the host-side BAR is at 0x20000 and our PCI hole starts at 0xc0000.
ioremap gets called with 0x20000, and in the guest's E820 that region is
'System RAM'. ioremap then runs this check:
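
    /* x86 __ioremap_caller(): refuse to remap pages that are normal RAM */
    last_pfn = last_addr >> PAGE_SHIFT;
    for (pfn = phys_addr >> PAGE_SHIFT; pfn <= last_pfn; pfn++) {
        int is_ram = page_is_ram(pfn);

        /* usable RAM backed by a struct page: bail out */
        if (is_ram && pfn_valid(pfn) && !PageReserved(pfn_to_page(pfn)))
            return NULL;
        WARN_ON_ONCE(is_ram);
    }
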
Ugh, and it will think (correctly) that it falls within RAM.
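The problem with the unfiltered BAR can be reproduced in user space. This sketch reads the mail's addresses as page frame numbers (0xc0000 << 12 is the 3GB mark), replaces page_is_ram() with a lookup in a hypothetical guest E820, and assumes every RAM page is valid and not reserved, so any RAM hit is refused. Both helpers are illustrative, not kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT 12
#define E820_RAM 1

struct e820entry { uint64_t addr; uint64_t size; uint32_t type; };

/* Simplified stand-in for page_is_ram(): is this PFN inside an E820 RAM entry? */
static bool pfn_in_e820_ram(const struct e820entry *map, size_t nr, uint64_t pfn)
{
    uint64_t addr = pfn << PAGE_SHIFT;
    for (size_t i = 0; i < nr; i++)
        if (map[i].type == E820_RAM &&
            addr >= map[i].addr && addr < map[i].addr + map[i].size)
            return true;
    return false;
}

/* Mirrors the kernel loop quoted above: would ioremap refuse [phys, phys+size)? */
static bool ioremap_would_refuse(const struct e820entry *map, size_t nr,
                                 uint64_t phys, uint64_t size)
{
    assert(size > 0);
    uint64_t last_pfn = (phys + size - 1) >> PAGE_SHIFT;
    for (uint64_t pfn = phys >> PAGE_SHIFT; pfn <= last_pfn; pfn++)
        if (pfn_in_e820_ram(map, nr, pfn))
            return true;        /* pfn_valid && !PageReserved assumed */
    return false;
}
```

With a guest map of RAM in [0, 3GB), a host BAR at PFN 0x20000 is refused, while one placed at PFN 0xc0000 (inside the hole) is not.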

If we filter the 0xcf8 accesses, which we can do in the Xen PCI backend case,
we can then provide BARs that always start at 0xC0000. But that does not help
the PV guest know the "real" MFNs, which it needs in order to program the P2M
array. So the Xen PCI front would have to do this, which it could, though it
adds complexity to it.
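Filtering the 0xcf8 accesses starts with decoding the configuration address (PCI configuration mechanism #1: enable bit 31, bus in bits 23:16, device in 15:11, function in 10:8, register in 7:2) and spotting reads of BAR0..BAR5. A minimal sketch; `filter_bar_read()` and the per-device `virt_base` are hypothetical, and BAR sizing writes, I/O BARs and 64-bit BARs are not handled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Fields of a configuration-mechanism-#1 address written to port 0xcf8. */
struct cf8_addr {
    bool enabled;
    uint8_t bus, dev, fn, reg;
};

static struct cf8_addr decode_cf8(uint32_t v)
{
    struct cf8_addr a = {
        .enabled = (v >> 31) & 1,
        .bus = (v >> 16) & 0xff,
        .dev = (v >> 11) & 0x1f,
        .fn  = (v >> 8) & 0x7,
        .reg = v & 0xfc,        /* dword-aligned register offset */
    };
    return a;
}

/* BAR0..BAR5 of a type-0 header live at offsets 0x10..0x24. */
static bool is_bar_reg(uint8_t reg)
{
    return reg >= 0x10 && reg <= 0x24;
}

/*
 * Hypothetical filter for a 32-bit memory BAR read: hand the guest a
 * virtual base inside its PCI hole while keeping the real flag bits 3:0.
 * `virt_base` would come from per-device state kept by the filter.
 */
static uint32_t filter_bar_read(uint32_t real_val, uint32_t virt_base)
{
    assert((virt_base & 0xf) == 0);
    return virt_base | (real_val & 0xf);
}
```

The per-device `virt_base` state is exactly the "filter/keep state" code that would otherwise have to live both in the backend and, for dom0, in the hypervisor.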

We also need to make all of this work with domain zero, and here 1) or 2) can
easily be used, as the Xen hypervisor has given us an E820 nicely peppered
with holes.
(I wonder, what happens if dom0 makes a XENMEM_memory_map call? Does it get
anything?)

If we then go with 3), we would need to instrument the code that reads the
BARs so that it can filter them properly. That would be the low-level Linux
pci_conf_read, and that is not going to happen, so we would have to make the
Xen hypervisor aware of this and, when it traps the in/out, have it provide
new BAR values starting at 0xC0000.

I am not comfortable maintaining this filter/keep-state code in both the Xen
hypervisor and the Xen PCI front module, so I think 3) would not work that
well, unless there are better ways that I have missed?

Back to 1) and 2). Migration would work if we unplug the PCI devices before
suspend and plug them back in on resume; otherwise the PCI BARs might have
changed between migrations. When the guest gets recreated, how does it
iterate over the E820 to create the P2M list? Or is that not done at all, and
we just save the P2M list and restore it as-is on the other side? Naturally,
since we would unplug the PCI devices, the entries in the E820 gaps would be
INVALID_MFN...
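If the P2M is in fact rebuilt from the E820 on resume (the open question above), the gap handling could be as simple as this. A hypothetical sketch: RAM entries keep whatever the restore put there, and every PFN in a gap is reset to INVALID_MFN because the devices were unplugged before suspend; `p2m_invalidate_gaps()` is an illustrative name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT 12
#define E820_RAM 1
#define INVALID_MFN (~0UL)

struct e820entry { uint64_t addr; uint64_t size; uint32_t type; };

static bool pfn_is_ram(const struct e820entry *map, size_t nr, unsigned long pfn)
{
    uint64_t addr = (uint64_t)pfn << PAGE_SHIFT;
    for (size_t i = 0; i < nr; i++)
        if (map[i].type == E820_RAM &&
            addr >= map[i].addr && addr < map[i].addr + map[i].size)
            return true;
    return false;
}

/*
 * Hypothetical resume-time pass: every PFN outside a RAM entry (a gap such
 * as the PCI hole) is reset to INVALID_MFN. Returns how many were reset.
 */
static size_t p2m_invalidate_gaps(unsigned long *mfn_list, size_t nr_pfns,
                                  const struct e820entry *map, size_t nr)
{
    size_t n = 0;

    assert(mfn_list != NULL || nr_pfns == 0);
    for (unsigned long pfn = 0; pfn < nr_pfns; pfn++) {
        if (!pfn_is_ram(map, nr, pfn)) {
            mfn_list[pfn] = INVALID_MFN;
            n++;
        }
    }
    return n;
}
```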

If we consult the E820 during resume, I think doing the PCI hole in the
toolstack has merits, simply because the user can set the PCI hole to an
arbitrary address that is low enough (0x2000 say) to cover all of the machines
that he/she would migrate to. If we do it in the Linux kernel, we do not have
that information. Even if we don't consult the E820, the toolstack approach
still has merits, as the PCI hole start address might differ between the
migration machines: we might have started on a box with the PCI hole way up
(3.9GB) while the other machines might have it at 1.2GB.
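Picking a toolstack-side hole start that works across a set of hosts is a minimum over their hole starts. A sketch with a hypothetical helper, `common_hole_start()`; rounding down to `align` is an assumption added here to keep the boundary aligned.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical toolstack helper: choose a guest PCI hole start valid on every
 * host the guest may migrate to, by taking the lowest host hole start and
 * rounding it down to `align` bytes (which must be a power of two).
 */
static uint64_t common_hole_start(const uint64_t *host_hole_starts, size_t n,
                                  uint64_t align)
{
    assert(n > 0 && align != 0 && (align & (align - 1)) == 0);
    uint64_t lowest = host_hole_starts[0];
    for (size_t i = 1; i < n; i++)
        if (host_hole_starts[i] < lowest)
            lowest = host_hole_starts[i];
    return lowest & ~(align - 1);
}
```

With one host at roughly 3.9GB (0xF9C00000) and another at roughly 1.2GB (0x4CC00000), a 1MB alignment gives the guest a hole starting at 0x4CC00000.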

The other thing I don't know is how all of this works with 32-bit kernels.

P.S.
I've tested 1) with 64-bit, with and without ballooning, and it worked fine.

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel