
Re: [Xen-devel] shadow2 corrupting PV guest state



Hi,

You (jeremy) said:
> I've been fighting random crashes in the paravirt tree for a while.  
> After a fair amount of head-banging, it  looks to me like the shadow2 
> code is trashing the guest stack (and maybe register state) at random 
> points.

  I have a question about shadow2 from another point of view.

  I've been porting the PV-on-HVM drivers to the ia64 platform. During
this work, I began to suspect that shadow2 might cause memory corruption.

  I first found the problem as a hypervisor crash during destruction
of an HVM domain with an active VNIF on the ia64 platform. The crash
happened because the hypervisor touched the P2M table still in use by
gnttab_copy during HVM domain destruction. So I looked at the x86 code
for a way to avoid this hypervisor crash.

  So, I found the following:

  * Before shadow2, x86 and ia64 used the same logic for domain
    destruction:
    - first, release grant-table references
    - destroy the page tables for each VCPU
    - destroy the P2M table for the domain
    - relinquish the domain's memory

  * With shadow2, x86 introduces delayed P2M table destruction:
    - release grant-table references
    - destroy the page tables for each VCPU
    - relinquish the domain's memory
    - destroy the P2M table for the domain in domain_destroy()
    *** I am not fully confident in this investigation.
    *** Am I right ?

  Let me show the relevant code...

[common/domain.c]
   203  void domain_kill(struct domain *d)
   204  {
   205      domain_pause(d);
   206
   207      if ( test_and_set_bit(_DOMF_dying, &d->domain_flags) )
   208          return;
   209
   210      gnttab_release_mappings(d);
   211      domain_relinquish_resources(d);
   212      put_domain(d);
   213
   214      send_guest_global_virq(dom0, VIRQ_DOM_EXC);
   215  }

[arch/x86/domain.c]
   930  void domain_relinquish_resources(struct domain *d)
   931  {
   932      struct vcpu *v;
   933      unsigned long pfn;
       ....
   937      /* Drop the in-use references to page-table bases. */
   938      for_each_vcpu ( d, v )
       ....
   979      /* Relinquish every page of memory. */
   980      relinquish_memory(d, &d->xenpage_list);
   981      relinquish_memory(d, &d->page_list);
       ....

  This is the code for the domain_kill phase. As I read it, the
hypervisor relinquishes the domain's memory here.

  On the other hand...

[common/domain.c]
   322  /* Release resources belonging to task @p. */
   323  void domain_destroy(struct domain *d)
   324  {
   325      struct domain **pd;
   326      atomic_t      old, new;
       ....
   354      arch_domain_destroy(d);
   355
   356      free_domain(d);
   357
   358      send_guest_global_virq(dom0, VIRQ_DOM_EXC);
   359  }

[arch/x86/domain.c]
   237  void arch_domain_destroy(struct domain *d)
   238  {
   239      shadow_final_teardown(d);
      ....

[arch/x86/mm/shadow/common.c]
  2580  void shadow_final_teardown(struct domain *d)
   2581  /* Called by arch_domain_destroy(), when it's safe to pull down the p2m map. */
  2582  {
      ....
  2597      /* It is now safe to pull down the p2m map. */
  2598      if ( d->arch.shadow.p2m_pages != 0 )
  2599          shadow_p2m_teardown(d);

  In this code, the P2M table is released.

  If my speculation is correct, shadow2 may cause memory corruption:
between domain_kill and domain_destroy, the P2M map still exists while
the domain's memory has already been relinquished.

  What do you think about this point?

Thanks,
- Tsunehisa Doi

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel


 

