[Xen-devel] Realloc VM's memory while running

Hi all,

I am working on a project in which we try to swap a domain's underlying machine memory (MFNs) for another "chunk" of the same size while the VM is running. This can be useful, for example, when a domain running a memory-intensive load suffers performance penalties (e.g. a lot of cache misses): we switch the domain's memory for a chunk allocated such that the number of misses decreases (assuming the allocator is able to take into account the relation between memory addresses and cache lines).

The implementation is mostly done, but we are stuck on a few issues, so I'm asking for some help.

So, the implementation works in the following way:
S1. Pause domain

S2. Allocate new memory for domain

S3. Set up the 'new P2M' table for the newly allocated memory using the 'old P2M' table (based on a 1-to-1 mapping of oldP2M[i] and newP2M[i]).

For this purpose the pages a domain owns are split into 3 types: PT, WR and P2M. P2M pages are in fact also WR pages, but they store the P2M (reached through the pfn_to_mfn_frame_list_list field of the data shared between Xen and the domain). Accordingly, WR pages only need to be copied to the corresponding new page, while PT and P2M pages require that, for each entry in the page, we look up its mapping in the new P2M and write it at the same entry location in the PT or P2M page in the "new" memory.

S4. For each page, copy the old page's metadata (such as count and type info) to the matching new page.

S5. Update the fields of domain's data structures pointing to MFNs (i.e.: domain.arch.pirq_eoi_map_mfn, domain.shared.arch.pfn_to_mfn_frame_list_list, vcpu.arch.[guest_table, guest_table_user], vcpu.vcpu_info_mfn)

S6. Release the domain's old memory by calling relinquish_memory() in a loop, similar to what relinquish_resources() does, but for memory (L4, L3, L2) only.

S7. Update M2P table to reflect the changes

S8. Assign the new memory pages to domain.

S9. Unpause the domain.
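
To make steps S3 and S7 concrete, here is a minimal, self-contained sketch of the per-entry translation and the M2P update as described above. It is not our actual Xen code: translate_pte(), mfn_to_pfn() and update_m2p() are hypothetical helpers, the P2M and M2P are modelled as plain arrays, and the x86-64 PTE layout (frame number in bits 12..51) is an assumption of the sketch. In the hypervisor the reverse lookup would go through the real M2P rather than a linear search.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT    12
#define PTE_PRESENT   0x1ULL
#define PTE_ADDR_MASK 0x000FFFFFFFFFF000ULL  /* assumed: frame in bits 12..51 */
#define INVALID_PFN   (~0ULL)

/* Hypothetical reverse lookup: find the PFN whose old MFN is 'mfn'. */
static uint64_t mfn_to_pfn(const uint64_t *old_p2m, size_t nr, uint64_t mfn)
{
    for (size_t pfn = 0; pfn < nr; pfn++)
        if (old_p2m[pfn] == mfn)
            return pfn;
    return INVALID_PFN;
}

/* Rewrite one page-table entry so it points at the new MFN, keeping flags. */
static uint64_t translate_pte(uint64_t pte, const uint64_t *old_p2m,
                              const uint64_t *new_p2m, size_t nr)
{
    if (!(pte & PTE_PRESENT))
        return pte;                      /* non-present entries are copied as-is */

    uint64_t old_mfn = (pte & PTE_ADDR_MASK) >> PAGE_SHIFT;
    uint64_t pfn = mfn_to_pfn(old_p2m, nr, old_mfn);
    if (pfn == INVALID_PFN)
        return pte;                      /* frame not in the old P2M: left untouched */

    return (pte & ~PTE_ADDR_MASK) |
           ((new_p2m[pfn] << PAGE_SHIFT) & PTE_ADDR_MASK);
}

/* Step S7: invalidate the old MFN slots, then point the new ones at their PFNs. */
static void update_m2p(uint64_t *m2p, size_t m2p_len, const uint64_t *old_p2m,
                       const uint64_t *new_p2m, size_t nr)
{
    for (size_t pfn = 0; pfn < nr; pfn++) {
        assert(old_p2m[pfn] < m2p_len && new_p2m[pfn] < m2p_len);
        m2p[old_p2m[pfn]] = INVALID_PFN;
    }
    for (size_t pfn = 0; pfn < nr; pfn++)
        m2p[new_p2m[pfn]] = pfn;
}
```

Note that in this sketch an entry whose frame is not found in the old P2M is left pointing at the old machine frame; whether such entries exist in our domain is something we have not ruled out.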

The problems we face:

P1. It seems that the translation of PTs and P2Ms works well. We wrote some test scripts that dump and compare the domain's memory before and after the memory switch (for this test to work we pause the domain from the console at the beginning of the test, so the unpause_domain function called in the implementation should have no effect). However, for some RAW pages, ~1% of the total number of pages of a 64MB domain, we can see differences in their content.

The question is: does anything else touch a domain's memory once it is paused?
It is a 1VCPU domain.
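
For reference, the comparison in that test amounts to something like the sketch below. It is an illustration only, not the actual script: count_differing_pages() is a hypothetical name, and it assumes both dumps are flat buffers of 4 KiB pages indexed by PFN.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE 4096

/*
 * Count the pages whose contents differ between two dumps of the same
 * PFN range. If at least one page differs and first_diff is non-NULL,
 * the index of the first differing page is stored in *first_diff.
 */
size_t count_differing_pages(const uint8_t *before, const uint8_t *after,
                             size_t nr_pages, size_t *first_diff)
{
    size_t ndiff = 0;

    for (size_t i = 0; i < nr_pages; i++) {
        if (memcmp(before + i * PAGE_SIZE, after + i * PAGE_SIZE,
                   PAGE_SIZE) != 0) {
            if (ndiff == 0 && first_diff != NULL)
                *first_diff = i;
            ndiff++;
        }
    }
    return ndiff;
}
```

Knowing which PFNs differ (rather than only how many) should make it easier to tell whether the affected pages have something in common.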

P2. Some of the old pages (~2-3%) don't seem to be released. It looks like this happens due to count/type info constraints, with the following error:

d0v1 Error pfn 42b625: rd = 1, od = 32756 caf = 1c00000000000000 taf=7400000000000001

I guess it is in response to a page_get on a page that no longer belongs to the domain, but that shouldn't normally happen ... or am I wrong?

P3. Connecting to the domain's console after it has been unpaused doesn't work. We also tried ping, but the machine is not reachable. How can we debug this issue? Are there any changes that need to be made in the PV OS (Linux) running in the domain? Our assumption was that, as long as the domain has direct memory access and uses the P2M and M2P tables, the changes will be visible to the OS.

Thank you in advance.

