Re: [Xen-devel] Consult some concepts about shadow paging mechanism

To: Gianluca Guida <gianluca.guida@xxxxxxxxxxxxx>
Subject: Re: [Xen-devel] Consult some concepts about shadow paging mechanism
From: Jui-Hao Chiang <windtracekimo@xxxxxxxxx>
Date: Sun, 3 May 2009 09:39:26 -0400
Cc: xen-devel@xxxxxxxxxxxxxxxxxxx
I found the answer: my mistake was passing all four sl2mfn entries
in v->arch.paging.shadow.l3table[] to sh_walk_l2_table().
In fact I only need to pass v->arch.paging.shadow.l3table[0],
because SHADOW_FOREACH_L2E already loops over the four sl2mfns.

But I have another question about traversing the SPT from level 3
through level 2 down to level 1.
When I walk down to the level 1 SPT, I find several inconsistencies
between the gl1e and sl1e contents, using the same check as
sh_audit_l1_table(). Is this normal? I thought they should be kept
consistent at all times.
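
To make the question concrete, here is a minimal sketch of one kind of flag comparison between a guest entry and its shadow. It is a toy model, not Xen code: the function name shadow_flags_plausible and the rules it encodes are assumptions for illustration, based on the earlier point in this thread that a shadow entry may be kept read-only until the guest's Dirty bit has been set. Only the flag bit values are the architectural x86 ones.

```c
#include <stdbool.h>
#include <stdint.h>

/* Architectural x86 page-table entry flag bits. */
#define PTE_PRESENT 0x001u
#define PTE_RW      0x002u
#define PTE_DIRTY   0x040u

/*
 * Hypothetical helper (not a Xen function): returns true when the shadow
 * entry is no more permissive than the guest entry allows under the
 * assumed rules below.
 */
static bool shadow_flags_plausible(uint64_t gl1e, uint64_t sl1e)
{
    /* A shadow entry that is not filled in yet never contradicts the guest. */
    if (!(sl1e & PTE_PRESENT))
        return true;

    /* Assumed rule: a present shadow entry needs a present guest entry. */
    if (!(gl1e & PTE_PRESENT))
        return false;

    /*
     * Assumed rule: the shadow may be writable only if the guest entry is
     * writable and already marked Dirty. A read-only shadow of a writable,
     * clean guest entry is therefore not counted as a mismatch.
     */
    if (sl1e & PTE_RW)
        return (gl1e & PTE_RW) && (gl1e & PTE_DIRTY);

    return true;
}
```

Under these assumptions, a guest entry of 0x3 (present, writable, clean) with a shadow entry of 0x1 (present, read-only) differs bit for bit but is still accepted, so a raw equality test would report more mismatches than this check does.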

My goal is to walk the SPT and GPT on each process context switch
(sh_update_cr3) and, as a first step, gather some statistics, e.g. on
the dirty, accessed, and present bits.
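
As a rough illustration of the statistics meant here, a minimal sketch that tallies the flag bits over one table of raw 64-bit entries. The struct pte_stats type and the pte_stats_accumulate function are hypothetical names for this example and do not exist in Xen; only the flag bit values are the architectural x86 ones.

```c
#include <stddef.h>
#include <stdint.h>

/* Architectural x86 page-table entry flag bits. */
#define PTE_PRESENT  0x001u
#define PTE_ACCESSED 0x020u
#define PTE_DIRTY    0x040u

/* Hypothetical counters for this sketch; not a Xen structure. */
struct pte_stats {
    unsigned long present;
    unsigned long accessed;
    unsigned long dirty;
};

/*
 * Add the flag counts of one table to *st. Accessed and Dirty are only
 * meaningful in present entries, so non-present entries are skipped.
 */
static void pte_stats_accumulate(struct pte_stats *st,
                                 const uint64_t *table, size_t nr_entries)
{
    for (size_t i = 0; i < nr_entries; i++) {
        uint64_t e = table[i];

        if (!(e & PTE_PRESENT))
            continue;
        st->present++;
        if (e & PTE_ACCESSED)
            st->accessed++;
        if (e & PTE_DIRTY)
            st->dirty++;
    }
}
```

The same routine could be run once over a mapped guest table and once over the matching shadow table, so that the two sets of counters can be compared per context switch.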

I then tried another check at the level 2 SPT: skipping every sl1mfn
that does not pass the sh_mfn_is_a_page_table(sl1mfn) check. With that,
the inconsistencies in the level 1 SPT traversal are gone.

Can anyone give me a hint about the right way to do this? Is there
some special type of SPTE that I should not traverse down into?

Many thanks,
Jui-Hao



On Fri, May 1, 2009 at 10:47 PM, Jui-Hao Chiang <windtracekimo@xxxxxxxxx> wrote:
> Hi, sorry for disturbing you guys again.
>
> Assume the guest's paging level is 2 and the shadow is using 3-level PAE.
> I am now trying to dump the L2 shadow page table information at the
> beginning of sh_update_cr3() as follows (the code is essentially copied
> from the sh_audit_l2_table and audit_gfn_to_mfn functions).
>
> The code unexpectedly crashes in guest_l2e_get_flags(*gl2e) in the
> sh_walk_l2_table function I wrote.
> The weird part, however, is that the code doesn't crash in gfn =
> guest_l2e_get_gfn(*gl2e), which accesses *gl2e in much the same way
> as guest_l2e_get_flags.
>
> static inline mfn_t
> convert_gfn_to_mfn(struct vcpu *v, gfn_t gfn, mfn_t gmfn)
> {
>    p2m_type_t p2mt;
>    if ( !shadow_mode_translate(v->domain) )
>        return _mfn(gfn_x(gfn));
>
>    if ( (mfn_to_page(gmfn)->u.inuse.type_info & PGT_type_mask)
>         != PGT_writable_page )
>        return _mfn(gfn_x(gfn)); // This is a paging-disabled shadow
>    else
>        return gfn_to_mfn(v->domain, gfn, &p2mt);
> }
>
> /* JuiHao: walk the l2 shadow page table based on input sl2mfn */
> static int sh_walk_l2_table(struct vcpu *v, mfn_t sl2mfn, mfn_t x)
> {
>        guest_l2e_t *gl2e, *gp;
>        shadow_l2e_t *sl2e;
>        mfn_t sl1mfn, gl2mfn;
>        gfn_t gfn;
>        mfn_t gmfn;
>        int done = 0;
>
>        /* Follow the backpointer in struct shadow_page_info to get guest l2mfn */
>        gl2mfn = _mfn(mfn_to_shadow_page(sl2mfn)->backpointer);
>        gl2e = gp = sh_map_domain_page(gl2mfn);
>
>        SHADOW_FOREACH_L2E(sl2mfn, sl2e, &gl2e, done, v->domain, {
>
>                gfn = guest_l2e_get_gfn(*gl2e);  // ###!!!! Works Fine !!!!!####
>                sl1mfn = shadow_l2e_get_mfn(*sl2e);
>
>                if (mfn_valid(sl1mfn) && (shadow_l2e_get_flags(*sl2e) & _PAGE_PRESENT)) {
>
>                        // We get this gmfn just to double check that it is equal to sl1mfn
>                        gmfn = (guest_l2e_get_flags(*gl2e) & _PAGE_PSE) // ###!!!! CRASH !!!!!####
>                                ? get_fl1_shadow_status(v, gfn)
>                                : get_shadow_status(v, convert_gfn_to_mfn(v, gfn, gl2mfn),
>                                                    SH_type_l1_shadow);
>
>                        if (mfn_x(gmfn) != mfn_x(sl1mfn)) {
>                                /* Print the raw frame numbers; mfn_t itself is a wrapper type */
>                                printk("!! gmfn %" PRI_mfn " != sl1mfn %" PRI_mfn "\n",
>                                       mfn_x(gmfn), mfn_x(sl1mfn));
>                        } else {
>                                printk("going down to traverse level 1 SPT\n");
>                        }
>                }
>
>        });
>        sh_unmap_domain_page(gp);
>        return 0;
> }
>
> Could you help a little bit on this?
> Many thanks,
> Jui-Hao
>
> On Fri, Apr 24, 2009 at 9:32 AM, Gianluca Guida
> <gianluca.guida@xxxxxxxxxxxxx> wrote:
>> On Fri, Apr 24, 2009 at 6:23 AM, Jui-Hao Chiang <windtracekimo@xxxxxxxxx> wrote:
>>> I have some additional questions:
>>> (1) For a normal data page, in order to propagate the Dirty or Accessed
>>> bit from the SPTE to the GPTE, the hypervisor needs to make the SPTE
>>> read-only. When the write page fault for this data page arrives, the
>>> hypervisor can propagate the Dirty or Accessed bit to the GPTE and make
>>> the SPTE R/W. My question is: when does the hypervisor make it
>>> read-only again? Is there a place in the source code you can point to?
>>
>> What happens is this: the guest has to clear the dirty/accessed bit
>> and then flush the TLB (or invlpg the entry).
>> If the pagetable is mapped read-only (as in levels > 1), the write to
>> the pagetable will trigger the emulator, which will update the entry.
>> Otherwise (if the page is out of sync, which means a writable guest
>> pagetable, and this happens when it's an L1), the TLB flush will do the
>> job of updating the shadow entry.
>>
>> Look at how the sh_propagate function works and when it gets called. It's
>> what you're looking for.
>>
>>> (2) How many shadow pages are maintained for each guest domain? If the
>>> hypervisor keeps only one shadow page table for the active process in
>>> each guest domain, then during a guest context switch it might
>>> erase the entire shadow page table and reconstruct it for the new
>>> process, which seems like a lot of overhead. I have checked
>>> sh_update_cr3(), but I am not sure of the detailed mechanism.
>>
>> There's a pool of shadow memory that gets reused in a pseudo-LRU
>> manner. Across cr3 switches, top-level pagetables are kept in memory and
>> unshadowed when evicted by the allocator or when other things happen,
>> mostly based on heuristics and reference counting.
>>
>> Thanks,
>> Gianluca
>>
>> --
>> It was a type of people I did not know, I found them very strange and
>> they did not inspire confidence at all. Later I learned that I had been
>> introduced to electronic engineers.
>>                                                  E. W. Dijkstra
>>
>
