
Re: [Xen-devel] [Patch RFC 00/13] VT-d Asynchronous Device-TLB Flush for ATS Device



>>> Monday, September 28, 2015 2:47 PM,<JBeulich@xxxxxxxx> wrote:
> >>> On 28.09.15 at 05:08, <quan.xu@xxxxxxxxx> wrote:
> >>>> Thursday, September 24, 2015 12:27 AM, Tim Deegan wrote:

> It would be a guest kernel bug, but all _we_ care about is that such a guest
> kernel bug won't affect the hypervisor or other guests.

It won't affect the hypervisor or other guest domains.
Because the required Device-TLB flushes have not completed, the hypercall does
not complete either. The page being freed is still owned by this buggy
guest; it is not released back to Xen or reallocated to other guests.


> You need to answer the question (perhaps just for yourself) taking into account
> Tim's suggestion to hold references to all pages mapped by the IOMMU page tables.

It is safe, but complex.
If Tim can ack all of my memory analysis, would my solution be acceptable
upstream?


Regarding Tim's suggestion "to make the IOMMU table take typed refcounts to
anything it points to, and only drop those refcounts when the flush completes":

From the IOMMU point of view, this means walking the IOMMU table to find these
pages and taking typed refcounts on them.
These pages may be owned by the hardware_domain, a dummy domain, an HVM guest,
etc. Could I narrow it down to the HVM guest? That is, take refcounts not on
anything the table points to, but only on the HVM guest's pages. This would
simplify the design.

From the HVM guest point of view, once the ATS device is assigned, we can:
*pause the HVM guest domain.
*scan the domain's xenpage_list, page_list and arch.relmem_list to collect these
pages, and take a typed refcount (PGT_dev_tlb_page, a new type) on each of them.
*unpause the HVM guest domain.

We can ignore the domain's xenpage_list, because:
((
   Pages on that list may be Xen heap pages mapped into guest domains via
   decrease_reservation() / xenmem_add_to_physmap_one() / p2m_add_foreign(),
   but they are not mapped in the IOMMU table. The following four cases map
   Xen heap pages into guest domains:
          * the shared page for Xen OProfile.
          * the vLAPIC mapping.
          * the grant table shared pages.
          * the domain's shared_info page.
))
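To make the pause/scan/unpause steps above concrete, here is a minimal user-space model. Everything in it is hypothetical: model_page and model_domain are stand-ins, not Xen's struct page_info or struct domain; the reference is kept in a plain counter instead of a PGT_dev_tlb_page type in type_info; and "pausing" is only a flag, not a real domain pause. It just shows which lists get scanned and that xenpage_list is skipped.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model types: NOT Xen's struct page_info / struct domain. */
struct model_page {
    unsigned int dev_tlb_refs;   /* stands in for a PGT_dev_tlb_page typed ref */
    struct model_page *next;     /* singly linked list linkage */
};

struct model_domain {
    int paused;
    struct model_page *page_list;
    struct model_page *relmem_list;
    struct model_page *xenpage_list; /* deliberately not scanned */
};

/* Take one Device-TLB reference on every page of a list; return how many. */
static unsigned int ref_list(struct model_page *head)
{
    unsigned int n = 0;

    for (struct model_page *pg = head; pg != NULL; pg = pg->next) {
        pg->dev_tlb_refs++;
        n++;
    }
    return n;
}

/* Called once when an ATS device is assigned to an HVM guest. */
static unsigned int model_ats_assign(struct model_domain *d)
{
    unsigned int n;

    d->paused = 1;               /* pause: the page lists cannot change now */
    n = ref_list(d->page_list);
    n += ref_list(d->relmem_list);
    /* xenpage_list is skipped: its Xen heap pages are not in the IOMMU table. */
    d->paused = 0;               /* unpause */
    return n;
}
```

Pausing only for the duration of the scan keeps the window short; after that, new allocations would take the reference themselves, as described in the next step.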


* When a new page is assigned to the domain while the ATS device is assigned,
we should also take a typed refcount (PGT_dev_tlb_page) on it.
* When a page is freed while the ATS device is assigned, free_domheap_pages()
should check its typed refcount.
  If the type is PGT_dev_tlb_page, the page should be held on a per-domain page
list and freed in the QI interrupt handler.
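The two bullets above can be sketched in the same hypothetical user-space model. model_assign_page() and model_free_page() are invented names standing in for the allocation path and for the proposed check in free_domheap_pages(); the "freed" flag stands in for actually returning the page to the heap. Deferring a page only re-links it onto a per-domain list through linkage stored in the page itself, so the free path allocates nothing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model types; not Xen's real structures or APIs. */
struct model_page {
    unsigned int dev_tlb_refs;  /* stands in for the PGT_dev_tlb_page typed ref */
    bool freed;                 /* set once the page really goes back to the heap */
    struct model_page *next;    /* list linkage embedded in the page itself */
};

struct model_domain {
    bool ats_assigned;
    struct model_page *pending_free; /* pages waiting for a Device-TLB flush */
};

/* Allocation path: take the extra reference while an ATS device is assigned. */
static void model_assign_page(struct model_domain *d, struct model_page *pg)
{
    if (d->ats_assigned)
        pg->dev_tlb_refs++;
}

/* Free path, standing in for the check proposed for free_domheap_pages(). */
static void model_free_page(struct model_domain *d, struct model_page *pg)
{
    if (d->ats_assigned && pg->dev_tlb_refs > 0) {
        /* Defer: push onto the per-domain list; no memory is allocated. */
        pg->next = d->pending_free;
        d->pending_free = pg;
        return;
    }
    pg->freed = true;           /* no Device-TLB reference: free right away */
}
```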


Just to check: is the following what you mean by typed refcounts?

--- a/xen/include/asm-x86/mm.h
+++ b/xen/include/asm-x86/mm.h
@@ -183,6 +183,7 @@ struct page_info
 #define PGT_seg_desc_page PG_mask(5, 4)  /* using this page in a GDT/LDT?  */
 #define PGT_writable_page PG_mask(7, 4)  /* has writable mappings?         */
 #define PGT_shared_page   PG_mask(8, 4)  /* CoW sharable page              */
+#define PGT_dev_tlb_page  PG_mask(9, 4)  /* Maybe in Device-TLB mapping?   */
 #define PGT_type_mask     PG_mask(15, 4) /* Bits 28-31 or 60-63.           */




* That is, I would define a new refcount type, PGT_dev_tlb_page.
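For reference, a type such as PGT_dev_tlb_page occupies only the top 4 bits of type_info. The sketch below reproduces, from memory and therefore as an assumption, how asm-x86/mm.h builds these constants with PG_shift()/PG_mask(), and decodes the type number back out. page_type_index() is a hypothetical helper added for illustration.

```c
#include <assert.h>
#include <limits.h>

/* Assumed to mirror Xen's asm-x86/mm.h: the page type lives in the top
 * 4 bits of type_info (bits 60-63 on 64-bit, 28-31 on 32-bit). */
#define BITS_PER_LONG   ((int)(sizeof(unsigned long) * CHAR_BIT))
#define PG_shift(idx)   (BITS_PER_LONG - (idx))
#define PG_mask(x, idx) (x ## UL << PG_shift(idx))

#define PGT_writable_page PG_mask(7, 4)
#define PGT_shared_page   PG_mask(8, 4)
#define PGT_dev_tlb_page  PG_mask(9, 4)   /* proposed new type */
#define PGT_type_mask     PG_mask(15, 4)

/* Hypothetical helper: extract the 4-bit type number from a type_info word. */
static unsigned long page_type_index(unsigned long type_info)
{
    return (type_info & PGT_type_mask) >> PG_shift(4);
}
```

Because the type field holds a single 4-bit value, in this encoding a page typed PGT_dev_tlb_page cannot at the same time carry another type such as PGT_writable_page.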



> Once you do that, I don't think there'll be a reason to pause the guest for the
> duration of the flush.
> And really (as pointed out before) pausing the guest would get us _far_ away
> from how real hardware behaves.
> 

Even once I do that, I think the guest should still be paused until the
Device-TLB flush has completed.

As mentioned in a previous email, for example:
The guest issues a do_memory_op hypercall to free pageX (gfn1 <---> mfn1), so
gfn1 is in the freed portion of the GPA space.
Assume that there is a mapping (gfn1 <---> mfn1) in the Device-TLB. If the
Device-TLB flush has not completed when we return to guest mode,
the guest may issue another do_memory_op hypercall to allocate a new pageY
(mfn2) at gfn1.
Then:
the EPT mapping is (gfn1 <---> mfn2), but the Device-TLB mapping is still
(gfn1 <---> mfn1).

Until the Device-TLB flush completes, DMA targeting gfn1 may still write data
into pageX (gfn1 <---> mfn1), and pageX is only released to Xen once the flush
completes. When the guest then reads gfn1 after the DMA, it may see wrong data,
because gfn1 is now backed by pageY.

Right?
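The ordering problem in this example can be replayed in a toy model. All names are hypothetical, and it assumes that once the Device-TLB entry is flushed, the device translates gfn1 through the same updated mapping as EPT. It only shows where the DMA lands depending on whether the flush completed before the guest re-entered.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model: one guest frame (gfn1) and two machine frames (mfn1, mfn2). */
enum { MFN1 = 1, MFN2 = 2, NR_MFNS = 3 };

struct model_state {
    int mem[NR_MFNS];        /* contents of each machine frame */
    int ept_gfn1;            /* CPU view: gfn1 -> mfn via EPT */
    int devtlb_gfn1;         /* device view: cached gfn1 -> mfn */
    bool devtlb_valid;       /* false once the Device-TLB flush completes */
};

/* Device DMA write to gfn1, translated through the (possibly stale) Device-TLB. */
static void dma_write_gfn1(struct model_state *s, int value)
{
    int mfn = s->devtlb_valid ? s->devtlb_gfn1 : s->ept_gfn1;

    s->mem[mfn] = value;
}

/* Guest CPU read of gfn1, translated through EPT. */
static int guest_read_gfn1(const struct model_state *s)
{
    return s->mem[s->ept_gfn1];
}

/* Replay the sequence above and return what the guest reads from gfn1. */
static int replay(bool flush_completed_before_return)
{
    struct model_state s = { {0, 0, 0}, MFN1, MFN1, true };

    /* 1. Guest frees pageX (gfn1 <-> mfn1). */
    if (flush_completed_before_return)
        s.devtlb_valid = false;  /* flush done before re-entering the guest */
    /* 2. Guest allocates pageY (mfn2) at gfn1; EPT now maps gfn1 -> mfn2. */
    s.ept_gfn1 = MFN2;
    /* 3. Device DMAs to gfn1; 4. guest reads gfn1. */
    dma_write_gfn1(&s, 42);
    return guest_read_gfn1(&s);
}
```

With the flush still pending, the DMA data lands in mfn1 and the guest reads pageY's old contents; with the flush completed first, the guest sees the DMA data.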


> The only possibly tricky thing will be how to know in the flush completion handler
> which pages to drop references for, as it doesn't look like you'd be able to put
> them on a list without allocating extra memory for tracking (and allocation in turn
> would be bad as it can fail).
> 

* When a page is freed while the ATS device is assigned, free_domheap_pages()
should check its typed refcount.
  If the type is PGT_dev_tlb_page, the page should be held on a per-domain page
list and freed in the QI interrupt handler.
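Regarding the allocation concern: if the list linkage lives in the page's own metadata, queuing a page needs no extra memory, and the completion handler only has to walk that list. The sketch below models the handler side under that assumption; model_qi_flush_done() is a hypothetical stand-in for the QI completion path, and it ignores locking and the fact that the real handler would run in interrupt context.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model types; not Xen's real structures. */
struct model_page {
    unsigned int dev_tlb_refs;  /* stands in for the PGT_dev_tlb_page typed ref */
    bool freed;                 /* stands in for "returned to Xen" */
    struct model_page *next;    /* linkage embedded in the page: no allocation */
};

struct model_domain {
    struct model_page *pending_free; /* pages deferred by the free path */
};

/* Once the Device-TLB flush is known to be complete, drop the reference on
 * each deferred page and release it. Returns the number of pages released. */
static unsigned int model_qi_flush_done(struct model_domain *d)
{
    unsigned int n = 0;
    struct model_page *pg = d->pending_free;

    d->pending_free = NULL;         /* detach the whole list first */
    while (pg != NULL) {
        struct model_page *next = pg->next;

        assert(pg->dev_tlb_refs > 0);   /* only referenced pages are queued */
        pg->dev_tlb_refs--;
        pg->next = NULL;
        pg->freed = true;
        n++;
        pg = next;
    }
    return n;
}
```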


> > I didn't make the IOMMU table to take typed refcount to anything it
> > points to. This is really complex.
> 
> But unavoidable I think, and with that I'm not sure it makes a lot of sense to do
> further (detailed) review of the initial version of the series.
> 

If it is unavoidable for upstream, I think the IOMMU-related patches (0001-0005
and 0013) are still good, and I should redesign and modify the other parts.
Jan, thanks for your help.


Quan 

> Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 

