
Re: [Xen-devel] [Qemu-devel] [PATCH] increase maxmem before calling xc_domain_populate_physmap



On 11/27/14 05:48, Stefano Stabellini wrote:
On Wed, 26 Nov 2014, Don Slutz wrote:
On 11/26/14 13:17, Stefano Stabellini wrote:
On Tue, 25 Nov 2014, Andrew Cooper wrote:
On 25/11/14 17:45, Stefano Stabellini wrote:
Increase maxmem before calling xc_domain_populate_physmap_exact to avoid
the risk of running out of guest memory. This way we can also avoid
complex memory calculations in libxl at domain construction time.

This patch fixes an abort() when assigning more than 4 NICs to a VM.

Signed-off-by: Stefano Stabellini <stefano.stabellini@xxxxxxxxxxxxx>

diff --git a/xen-hvm.c b/xen-hvm.c
index 5c69a8d..38e08c3 100644
--- a/xen-hvm.c
+++ b/xen-hvm.c
@@ -218,6 +218,7 @@ void xen_ram_alloc(ram_addr_t ram_addr, ram_addr_t size, MemoryRegion *mr)
     unsigned long nr_pfn;
     xen_pfn_t *pfn_list;
     int i;
+    xc_dominfo_t info;
     if (runstate_check(RUN_STATE_INMIGRATE)) {
         /* RAM already populated in Xen */

@@ -240,6 +241,13 @@ void xen_ram_alloc(ram_addr_t ram_addr, ram_addr_t size, MemoryRegion *mr)
         pfn_list[i] = (ram_addr >> TARGET_PAGE_BITS) + i;
     }

+    if (xc_domain_getinfo(xen_xc, xen_domid, 1, &info) < 0) {
xc_domain_getinfo()'s interface is mad, and provides no guarantee that
it returns the information for the domain you requested.  It also won't
return -1 on error. The correct error handling is:

(xc_domain_getinfo(xen_xc, xen_domid, 1, &info) != 1) || (info.domid != xen_domid)
It might be wiser to switch to xc_domain_getinfolist
Either needs the same tests, since both return a vector of info.
Right


~Andrew

+        hw_error("xc_domain_getinfo failed");
+    }
+    if (xc_domain_setmaxmem(xen_xc, xen_domid, info.max_memkb +
+                            (nr_pfn * XC_PAGE_SIZE / 1024)) < 0) {
There are two big issues and one minor one with this.
1) You will allocate the videoram again.
2) You will never use the 1 MB already allocated for option ROMs.

And the minor one is that you can increase maxmem more than is needed.
I don't understand: are you aware that setmaxmem doesn't allocate any
memory? It just raises the maximum amount of memory the domain is
allowed to have.

Yes.

But you are right that we would raise the limit more than necessary:
specifically, the videoram would get accounted for twice and we wouldn't
need LIBXL_MAXMEM_CONSTANT. I guess we would have to write a patch for
that.



Here is a better if:

-    if (xc_domain_setmaxmem(xen_xc, xen_domid, info.max_memkb +
-                            (nr_pfn * XC_PAGE_SIZE / 1024)) < 0) {
+    max_pages = info.max_memkb * 1024 / XC_PAGE_SIZE;
+    free_pages = max_pages - info.nr_pages;
+    need_pages = nr_pfn - free_pages;
+    if ((free_pages < nr_pfn) &&
+       (xc_domain_setmaxmem(xen_xc, xen_domid, info.max_memkb +
+                            (need_pages * XC_PAGE_SIZE / 1024)) < 0)) {
That's an interesting idea, but I am not sure if it is safe in all
configurations.

It could make QEMU work better with older libxl and avoid increasing
maxmem more than necessary.
On the other hand I guess it could break things when PoD is used, or in
general when the user purposely sets maxmem on the vm config file.


Works fine in both claim modes and with PoD used (maxmem > memory). Do
not know how to test with tmem. I do not see how it would be worse than
the current code, which does not auto increase. I.e., even without a Xen
change, I think something like this could be done.


My testing shows 32 free pages that I am not sure where they come from.
But the code above is passing with my 8 NICs of e1000.
I think that raising maxmem a bit higher than necessary is not too bad.
If we really care about it, we could lower the limit after QEMU's
initialization is completed.

Ok. I did find the 32: it is VGA_HOLE_SIZE. So here is what I have,
which includes a lot of extra printfs.


--- a/xen-hvm.c
+++ b/xen-hvm.c
@@ -67,6 +67,7 @@ static inline ioreq_t *xen_vcpu_ioreq(shared_iopage_t *shared_page, int vcpu)
 #endif

 #define BUFFER_IO_MAX_DELAY  100
+#define VGA_HOLE_SIZE (0x20)

 typedef struct XenPhysmap {
     hwaddr start_addr;
@@ -219,6 +220,11 @@ void xen_ram_alloc(ram_addr_t ram_addr, ram_addr_t size, MemoryRegion *mr)
     xen_pfn_t *pfn_list;
     int i;
     xc_dominfo_t info;
+    unsigned long max_pages, free_pages, real_free;
+    long need_pages;
+    uint64_t tot_pages, pod_cache_pages, pod_entries;
+
+    trace_xen_ram_alloc(ram_addr, size, mr->name);

     if (runstate_check(RUN_STATE_INMIGRATE)) {
         /* RAM already populated in Xen */
@@ -232,13 +238,6 @@ void xen_ram_alloc(ram_addr_t ram_addr, ram_addr_t size, MemoryRegion *mr)
         return;
     }

-    fprintf(stderr, "%s: alloc "RAM_ADDR_FMT
-            " bytes (%ld Kib) of ram at "RAM_ADDR_FMT
-            " mr.name=%s\n",
-            __func__, size, (long)(size>>10), ram_addr, mr->name);
-
-    trace_xen_ram_alloc(ram_addr, size);
-
     nr_pfn = size >> TARGET_PAGE_BITS;
     pfn_list = g_malloc(sizeof (*pfn_list) * nr_pfn);

@@ -246,11 +245,38 @@ void xen_ram_alloc(ram_addr_t ram_addr, ram_addr_t size, MemoryRegion *mr)
         pfn_list[i] = (ram_addr >> TARGET_PAGE_BITS) + i;
     }

-    if (xc_domain_getinfo(xen_xc, xen_domid, 1, &info) < 0) {
+    if ((xc_domain_getinfo(xen_xc, xen_domid, 1, &info) != 1) ||
+       (info.domid != xen_domid)) {
         hw_error("xc_domain_getinfo failed");
     }
-    if (xc_domain_setmaxmem(xen_xc, xen_domid, info.max_memkb +
-                            (nr_pfn * XC_PAGE_SIZE / 1024)) < 0) {
+    max_pages = info.max_memkb * 1024 / XC_PAGE_SIZE;
+    free_pages = max_pages - info.nr_pages;
+    real_free = free_pages;
+    if (free_pages > VGA_HOLE_SIZE) {
+        free_pages -= VGA_HOLE_SIZE;
+    } else {
+        free_pages = 0;
+    }
+    need_pages = nr_pfn - free_pages;
+    fprintf(stderr, "%s: alloc "RAM_ADDR_FMT
+            " bytes (%ld KiB) of ram at "RAM_ADDR_FMT
+            " mp=%ld fp=%ld nr=%ld need=%ld mr.name=%s\n",
+            __func__, size, (long)(size>>10), ram_addr,
+            max_pages, free_pages, nr_pfn, need_pages,
+            mr->name);
+    if (xc_domain_get_pod_target(xen_xc, xen_domid, &tot_pages,
+                                 &pod_cache_pages, &pod_entries) >= 0) {
+        unsigned long populated = tot_pages - pod_cache_pages;
+        long delta_tot = tot_pages - info.nr_pages;
+
+        fprintf(stderr, "%s: PoD pop=%ld tot=%ld(%ld) cnt=%ld ent=%ld nop=%ld free=%ld\n",
+                __func__, populated, (long)tot_pages, delta_tot,
+                (long)pod_cache_pages, (long)pod_entries,
+                (long)info.nr_outstanding_pages, real_free);
+    }
+    if ((free_pages < nr_pfn) &&
+       (xc_domain_setmaxmem(xen_xc, xen_domid, info.max_memkb +
+                            (need_pages * XC_PAGE_SIZE / 1024)) < 0)) {
         hw_error("xc_domain_setmaxmem failed");
     }
     if (xc_domain_populate_physmap_exact(xen_xc, xen_domid, nr_pfn, 0, 0, pfn_list)) {


Note: I already had a trace_xen_ram_alloc; here is the delta in trace-events
(which has others not yet sent):



diff --git a/trace-events b/trace-events
index eaeecd5..a58fc11 100644
--- a/trace-events
+++ b/trace-events
@@ -908,7 +908,7 @@ pvscsi_tx_rings_ppn(const char* label, uint64_t ppn) "%s page: %"PRIx64""
 pvscsi_tx_rings_num_pages(const char* label, uint32_t num) "Number of %s pages: %u"

 # xen-all.c
-xen_ram_alloc(unsigned long ram_addr, unsigned long size) "requested: %#lx, size %#lx"
+xen_ram_alloc(unsigned long ram_addr, unsigned long size, const char* mr_name) "requested: %#lx size %#lx mr->name=%s"
 xen_client_set_memory(uint64_t start_addr, unsigned long size, bool log_dirty) "%#"PRIx64" size %#lx, log_dirty %i"
 handle_ioreq(void *req, uint32_t type, uint32_t dir, uint32_t df, uint32_t data_is_ptr, uint64_t addr, uint64_t data, uint32_t count, uint32_t size) "I/O=%p type=%d dir=%d df=%d ptr=%d port=%#"PRIx64" data=%#"PRIx64" count=%d size=%d"
 handle_ioreq_read(void *req, uint32_t type, uint32_t df, uint32_t data_is_ptr, uint64_t addr, uint64_t data, uint32_t count, uint32_t size) "I/O=%p read type=%d df=%d ptr=%d port=%#"PRIx64" data=%#"PRIx64" count=%d size=%d"


   -Don Slutz




+        hw_error("xc_domain_setmaxmem failed");
+    }
       if (xc_domain_populate_physmap_exact(xen_xc, xen_domid, nr_pfn, 0, 0, pfn_list)) {
           hw_error("xen: failed to populate ram at " RAM_ADDR_FMT, ram_addr);
       }

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel




 

