
RE: [Xen-devel] paging mechanism clarification





-----Original Message-----
From: M.A. Williamson on behalf of Mark Williamson
Sent: Wed 14-Mar-07 10:26 AM
To: Pradeep Singh, TLS-Chennai
Cc: xen-devel@xxxxxxxxxxxxxxxxxxx; Petersson, Mats
Subject: Re: [Xen-devel] paging mechanism clarification

>>  Not for paravirtualised (Xen-aware) guests.  They handle their own
>>  translations, which makes it possible to eliminate shadow paging entirely
>> for them; this is a benefit to performance.  Shadow pagetables are normally
>> only used for a PV guest when live migration is in progress.
>
>  Now I am confused again :-(.
>  You mean shadow tables are not used at all for PV guests if I am not
> using live migration? They are used by default only for HVM guests,
> with or without live migration, right?

Normally, PV guests manage the real hardware pagetable so there are no shadow
pagetables.  Xen just limits what they're allowed to do with their pagetables
to prevent abuse.

If you live migrate a PV guest then some shadow paging gets slid in
temporarily underneath it to enable the migration daemon to monitor the
memory activity of the guest.

HVM needs to use shadow paging all the time because the guest doesn't know it's
running on Xen and needs to be shielded from the real memory layout, etc.

affirmative.

>  One more thing, where can I find the code related to this shadow page table
> handling in my source tree? I couldn't even find the shadow page struct in my
> source, but it was available in the LXR repo on xensource.

arch/x86/mm.c
arch/x86/mm/*
arch/x86/mm/shadow/*

Are probably good places to start looking...  I'm not too familiar with where
all the code for this functionality is so there may be other places you
should also look (for starters there is an arch/x86/x86_32/mm.c and an
arch/x86/x86_64/mm.c).

Thank you for helping me out.

--pradeep

Cheers,
Mark

>  Thank you very much
>  --pradeep
>
>  HTH,
>  Cheers,
>  Mark
>
>  >  Thank you
>  >  --pradeep
>  >
>  >  In PV mode, pages that are currently part of a pagetable are only ever
>  > allowed to be mapped readonly in order to prevent tampering by the
>  > guest.
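
To illustrate the read-only rule above, here is a minimal C sketch of the
kind of check a hypervisor could apply when a PV guest asks for a new
mapping. The names (frame_type, pv_mapping_allowed) are hypothetical and
are not taken from the Xen source; the real validation is considerably
more involved.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical classification of a machine frame (not Xen's real types). */
enum frame_type {
    FRAME_NORMAL,      /* ordinary guest data page */
    FRAME_PAGETABLE    /* page currently in use as part of a pagetable */
};

/* Return true if the requested mapping may be installed.
 * A frame that is part of a pagetable may only be mapped read-only. */
static bool pv_mapping_allowed(enum frame_type target, bool writable)
{
    assert(target == FRAME_NORMAL || target == FRAME_PAGETABLE);

    if (target == FRAME_PAGETABLE && writable)
        return false;   /* would let the guest edit a pagetable unchecked */
    return true;
}
```

With this rule a guest can still read its own pagetables, but every change
to them has to go through the hypervisor, which can check it first.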
>  >
>  >  Cheers,
>  >  Mark
>  >
>  >  > > > I hope i made myself clear.
>  >  > > > Please enlighten me :-).
>  >  > > >
>  >  > > > When paging is enabled, we use a shadow page-table, which
>  >  > > > essentially means that the GUEST sees one page-table and the
>  >  > > > processor another (thanks to the fact that the hypervisor
>  >  > > > intercepts the CR3 read/write operations, and when CR3 is read
>  >  > > > back by the guest, we don't send back the value it's ACTUALLY
>  >  > > > POINTING TO IN THE PROCESSOR, but the value that was set by the
>  >  > > > guest). So there are two page-tables.
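
A rough C sketch of the CR3 interception described above. The structure
and function names are hypothetical, made up for illustration only; they
are not the names used in the Xen source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-VCPU paging state (not Xen's real structure). */
struct vcpu_paging {
    uint64_t guest_cr3;   /* value the guest believes is in CR3 */
    uint64_t shadow_cr3;  /* root of the shadow table the CPU really uses */
};

/* Intercepted CR3 write: remember the guest value, use the shadow root. */
static void intercept_cr3_write(struct vcpu_paging *v, uint64_t guest_val,
                                uint64_t shadow_root)
{
    assert(v != NULL);
    v->guest_cr3  = guest_val;
    v->shadow_cr3 = shadow_root;
}

/* Intercepted CR3 read: hand back what the guest wrote, not the real root. */
static uint64_t intercept_cr3_read(const struct vcpu_paging *v)
{
    assert(v != NULL);
    return v->guest_cr3;
}
```

The guest only ever sees guest_cr3, so it cannot tell that the processor
is walking a different table.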
>  >  > > >
>  >  > > > Got this well, thanks Mats :).
>  >  > > >
>  >  > > > To make the page-table updates by the guest visible to the
>  >  > > > hypervisor, all of the guest-page-tables are made read-only (by
>  >  > > > scanning the new CR3 value whenever one is set).
>  >  > > >
>  >  > > > I didn't quite get this either :(
>  >  > > > Sorry, but do you mean CR3 for the guest or for the
>  >  > > > processor? I hope you mean the guest?
>  >  > >
>  >  > > Yes, scan the guest-CR3 to see where it placed the page-tables.
>  >  > >
>  >  > > > Whenever a page-fault happens, the hypervisor has "first look",
>  >  > > > and determines if the update is for a page-table or not. If it
>  >  > > > is a page-table update, the guest operation is emulated (in
>  >  > > > x86_emulate.c), and the result is written to the
>  >  > > > shadow-page-table AND the
>  >  > > >
>  >  > > > Why do we need emulation? Is there some particular reason for it?
>  >  > > > Do you mean that if I am running a 32-bit domU on top of a
>  >  > > > 64-bit processor, the guest operation for updating the page
>  >  > > > table is emulated by the hypervisor? Am I right?
>  >  > >
>  >  > > No, it's simply because we need to see the result of the
>  >  > > instruction and write it to two places (with some modification in
>  >  > > one of those places). So if the code is doing, for example:
>  >  > > "*pte |= 1;" (set a page-table-entry to "present"), we need to
>  >  > > mark both the guest-page-table-entry "present" and our
>  >  > > shadow-entry "present" (and perhaps do some other work too, but
>  >  > > that's the minimum work needed).
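
The "write it to two places" step can be sketched in C roughly as below.
This is only an illustration under simplifying assumptions: the names
(emulate_pte_write, gfn_to_mfn, base_mfn) are hypothetical, the guest is
assumed to be backed by one contiguous run of machine frames, and the
entry layout is reduced to a present bit plus a frame address. The real
shadow code handles far more cases.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PTE_PRESENT   0x1ULL
#define PTE_ADDR_MASK 0x000FFFFFFFFFF000ULL  /* bits 12..51: frame address */
#define PAGE_SHIFT    12

/* Hypothetical guest-frame to machine-frame lookup: guest frames are
 * assumed to be backed contiguously, starting at machine frame base_mfn. */
static uint64_t gfn_to_mfn(uint64_t gfn, uint64_t base_mfn)
{
    return base_mfn + gfn;
}

/* Apply an emulated guest write to a page-table entry.
 * The guest entry gets the value exactly as the guest wrote it; the
 * shadow entry gets the same flags but a machine address. */
static void emulate_pte_write(uint64_t *guest_pte, uint64_t *shadow_pte,
                              uint64_t new_val, uint64_t base_mfn)
{
    assert(guest_pte != NULL && shadow_pte != NULL);

    *guest_pte = new_val;

    if (!(new_val & PTE_PRESENT)) {
        *shadow_pte = 0;              /* not present: nothing to map */
        return;
    }

    uint64_t gfn   = (new_val & PTE_ADDR_MASK) >> PAGE_SHIFT;
    uint64_t mfn   = gfn_to_mfn(gfn, base_mfn);
    uint64_t flags = new_val & ~PTE_ADDR_MASK;

    *shadow_pte = (mfn << PAGE_SHIFT) | flags;
}
```

For "*pte |= 1;" the emulator would compute the new entry value from the
old one and then pass it to a routine like this, so that both copies end
up marked present.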
>  >  > >
>  >  > > This brings one more question to my mind. Why do we use pinning
>  >  > > then?
>  >  >
>  >  > I believe there are two types of pinning! Page-pinning, which is
>  >  > blocking a page from being accessed in an incorrect way [again, I'm
>  >  > not 100% sure how this works, or exactly what it does - just that
>  >  > it's a term used in the general way I described in the previous
>  >  > sentence].
>  >  >
>  >  > > As I see it, it is to avoid shadow page tables being swapped out
>  >  > > before the page tables they actually point to are swapped. Am I
>  >  > > right?
>  >  > >
>  >  > > But according to the interface manual, we use pinning to bind a
>  >  > > VCPU to a specific CPU in an SMP environment. These two look like
>  >  > > pretty orthogonal statements to me, which means I may be
>  >  > > wrong :(.
>  >  > > Can somebody help me in this regard?
>  >  >
>  >  > CPU pinning is to tie a VCPU to a (set of) processor(s). For example,
>  >  > you may want to pin Dom0 to run only on CPU0, and pin a DomU to run
>  >  > on CPUs 1, 2 and 3. That way, Dom0 is ALWAYS able to run on its own
>  >  > CPU, and it's never in contention about which CPU to use, and DomU
>  >  > can run on three CPUs as much as it likes. You could have another
>  >  > DomU pinned to CPU 3 if you wish. That means that CPUs 1 and 2 are
>  >  > exclusively for the first DomU, whilst the second DomU shares CPU3
>  >  > with the first DomU (so they both get half the CPU performance of
>  >  > one CPU - on average over a reasonable amount of time).
>  >  >
>  >  > --
>  >  > Mats
>  >  >
>  >  > > Pointers to actual code will be of great help.
>  >  > >
>  >  > > Thanks a lot Mats.
>  >  > > Thank you all.
>  >  > >
>  >  > > --pradeep
>  >  > >
>  >  > > > Does this mean that on an x86 platform this overkill, or this
>  >  > > > emulation, is skipped altogether?
>  >  > > > Please bear with me as I am an absolute Xen newbie here :-).
>  >  > >
>  >  > > No, it's ALWAYS used for all page-table writes, as far as I
>  >  > > understand.
>  >  > >
>  >  > > --
>  >  > > Mats
>  >  > >
>  >  > > > guest-page-table, but in the shadow-page-table, the value is
>  >  > > > modified to reflect the actual address in machine-space, rather
>  >  > > > than what the guest thinks it should be.
>  >  > > >
>  >  > > > In future versions of AMD processors (and I believe Intel are
>  >  > > > working on something very similar if not the same), there will be
>  >  > > > a mode where the processor is able to work in "nested paging
>  >  > > > mode", which means that there are two "parallel" page-tables. The
>  >  > > > first one is the "guest-page-table", the second one is the
>  >  > > > "host-page-table". In this case, every lookup in the
>  >  > > > guest-page-table will be done through the host-page-table. So we
>  >  > > > have a "simple" way to just take the guest-page-table and
>  >  > > > translate it to machine-physical-address - with the good thing
>  >  > > > that the host-page-table needn't change, since the pages that the
>  >  > > > host consists of are pretty much static for the duration of the
>  >  > > > guest.
>  >  > > >
>  >  > > > Yes, I read about this in an article mentioning how Pacifica
>  >  > > > is better than VT.
>  >  > > >
>  >  > > > Say for example, we have a guest that lives at 256-512MB. The
>  >  > > > guest-page-table would contain, for example, a mapping for
>  >  > > > 0x12200000 -> guest-physical 0x100000 (1MB). The host-page-table
>  >  > > > translates this to 0x10100000 because the 1MB entry in
>  >  > > > guest-address is 256+1MB in machine-address.
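
The numbers in this example can be checked with a small two-stage
translation in C. This is a sketch under the example's own assumptions (a
single 4KB guest mapping, and guest memory held in one contiguous chunk
starting at 256MB); the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define MB(x)     ((uint64_t)(x) << 20)
#define PAGE_SIZE 4096ULL

/* Stage 1 (guest-page-table): the single mapping from the example,
 * guest-virtual 0x12200000 -> guest-physical 0x100000. */
static uint64_t guest_virt_to_guest_phys(uint64_t gva)
{
    const uint64_t va_base = 0x12200000ULL, gpa_base = 0x100000ULL;

    assert(gva >= va_base && gva - va_base < PAGE_SIZE);
    return gpa_base + (gva - va_base);
}

/* Stage 2 (host-page-table): the guest is assumed to occupy machine
 * memory 256MB-512MB as one contiguous chunk. */
static uint64_t guest_phys_to_machine(uint64_t gpa)
{
    assert(gpa < MB(256));
    return MB(256) + gpa;
}

/* Full lookup: every guest-page-table result goes through stage 2. */
static uint64_t translate(uint64_t gva)
{
    return guest_phys_to_machine(guest_virt_to_guest_phys(gva));
}
```

Here MB(256) is 0x10000000, so guest-physical 0x100000 lands at
0x10000000 + 0x100000 = 0x10100000, matching the figure quoted above.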
>  >  > > >
>  >  > > > Exactly, got this well on spot :).
>  >  > > >
>  >  > > > [In reality, it's very likely that the guest never gets all the
>  >  > > > space in one big chunk, but rather a few pages here and a few
>  >  > > > pages there. If there are big chunks, we could use large pages to
>  >  > > > map those!].
>  >  > > >
>  >  > > > Thanks a ton Mats and all.
>  >  > > >
>  >  > > > --pradeep
>  >  > > >
>  >  > > > The support for nested paging (called HAP, Hardware Assisted
>  >  > > > Paging) is
>  >  > > > in the Unstable version of Xen since a few days back.
>  >  > > >
>  >  > > > --
>  >  > > > Mats
>  >  > > >
>  >  > > > > And this whole 2-level paging constitutes Xen's shadow page
>  >  > > > > tables. Right?
>  >  > > > >
>  >  > > > > Is my understanding of Xen's paging mechanism correct? Or am I
>  >  > > > > missing something?
>  >  > > > >
>  >  > > > > Thank you
>  >  > > > >
>  >  > > > > -pradeep
>  >  >
>  >  > _______________________________________________
>  >  > Xen-devel mailing list
>  >  > Xen-devel@xxxxxxxxxxxxxxxxxxx
>  >  > http://lists.xensource.com/xen-devel
>  >
>  >  --
>  >  Dave: Just a question. What use is a unicycle with no seat?  And no
>  >  pedals!
>  >  Mark: To answer a question with a question: What use is a skateboard?
>  >  Dave: Skateboards have wheels.
>  >  Mark: My wheel has a wheel!
