|
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [Xen-devel] [PATCH v12 5/6] x86/ioreq server: Asynchronously reset outstanding p2m_ioreq_server entries.
On 07/04/17 15:05, Yu Zhang wrote:
>
>
> On 4/7/2017 9:56 PM, George Dunlap wrote:
>> On 07/04/17 12:31, Jan Beulich wrote:
>>>>>> On 07.04.17 at 12:55, <yu.c.zhang@xxxxxxxxxxxxxxx> wrote:
>>>> On 4/7/2017 6:26 PM, Jan Beulich wrote:
>>>>>>>> On 07.04.17 at 11:53, <yu.c.zhang@xxxxxxxxxxxxxxx> wrote:
>>>>>> On 4/7/2017 5:40 PM, Jan Beulich wrote:
>>>>>>>>>> On 06.04.17 at 17:53, <yu.c.zhang@xxxxxxxxxxxxxxx> wrote:
>>>>>>>> @@ -965,7 +987,7 @@ static mfn_t ept_get_entry(struct p2m_domain
>>>>>>>> *p2m,
>>>>>>>> if ( is_epte_valid(ept_entry) )
>>>>>>>> {
>>>>>>>> if ( (recalc || ept_entry->recalc) &&
>>>>>>>> - p2m_is_changeable(ept_entry->sa_p2mt) )
>>>>>>>> + p2m_check_changeable(ept_entry->sa_p2mt) )
>>>>>>> I think the distinction between these two is rather arbitrary, and I
>>>>>>> also think this is part of the problem above: Distinguishing
>>>>>>> log-dirty
>>>>>>> from ram-rw requires auxiliary data to be consulted. The same
>>>>>>> ought to apply to ioreq-server, and then there wouldn't be a need
>>>>>>> to have two p2m_*_changeable() flavors.
>>>>>> Well, I think we have also discussed this quite long ago, here is
>>>>>> the link.
>>>>>> https://lists.xen.org/archives/html/xen-devel/2016-09/msg01017.html
>>>>> That was a different discussion, namely not considering this ...
>>>>>
>>>>>>> Of course the subsequent use p2m_is_logdirty_range() may then
>>>>>>> need amending.
>>>>>>>
>>>>>>> In the end it looks like you have the inverse problem here compared
>>>>>>> to above: You should return ram-rw when the reset was already
>>>>>>> initiated. At least that's how I would see the logic to match up
>>>>>>> with
>>>>>>> the log-dirty handling (where the _effective_ rather than the last
>>>>>>> stored type is being returned).
>>>>> ... at all.
>>>> Sorry I still do not get it.
>>> Leaving ioreq-server out of the picture, what the function returns
>>> to the caller is the type as it would be if a re-calculation would have
>>> been done on the entry, even if that hasn't happened yet (the
>>> function here shouldn't change any entries after all). I think we
>>> want the same behavior here for what have been ioreq-server
>>> entries (and which would become ram-rw after the next re-calc).
>> How about something like the attached? (In-lined for convenience.)
>>
>> -George
>>
>> After an ioreq server has unmapped, the remaining p2m_ioreq_server
>> entries need to be reset back to p2m_ram_rw. This patch does this
>> asynchronously with the current p2m_change_entry_type_global()
>> interface.
>>
>> New field entry_count is introduced in struct p2m_domain, to record
>> the number of p2m_ioreq_server p2m page table entries. One nature of
>> these entries is that they only point to 4K sized page frames, because
>> all p2m_ioreq_server entries are originated from p2m_ram_rw ones in
>> p2m_change_type_one(). We do not need to worry about the counting for
>> 2M/1G sized pages.
>>
>> This patch disallows mapping of an ioreq server, when there's still
>> p2m_ioreq_server entry left, in case another mapping occurs right after
>> the current one being unmapped, releases its lock, with p2m table not
>> synced yet.
>>
>> This patch also disallows live migration, when there's remaining
>> p2m_ioreq_server entry in p2m table. The core reason is our current
>> implementation of p2m_change_entry_type_global() lacks information
>> to resync p2m_ioreq_server entries correctly if global_logdirty is
>> on.
>>
>> We still need to handle other recalculations, however; which means
>> that when doing a recalculation, if the current type is
>> p2m_ioreq_server, we check to see if p2m->ioreq.server is valid or
>> not. If it is, we leave it as type p2m_ioreq_server; if not, we reset
>> it to p2m_ram as appropriate.
>>
>> To avoid code duplication, lift recalc_type() out of p2m-pt.c and use
>> it for all type recalculations (both in p2m-pt.c and p2m-ept.c).
>>
>> Signed-off-by: Yu Zhang <yu.c.zhang@xxxxxxxxxxxxxxx>
>> Signed-off-by: George Dunlap <george.dunlap@xxxxxxxxxx>
>> Reviewed-by: Paul Durrant <paul.durrant@xxxxxxxxxx>
>> ---
>> xen/arch/x86/hvm/ioreq.c | 8 ++++++
>> xen/arch/x86/mm/hap/hap.c | 9 ++++++
>> xen/arch/x86/mm/p2m-ept.c | 73
>> +++++++++++++++++++++++++++++------------------
>> xen/arch/x86/mm/p2m-pt.c | 58 ++++++++++++++++++++++++-------------
>> xen/arch/x86/mm/p2m.c | 9 ++++++
>> xen/include/asm-x86/p2m.h | 26 ++++++++++++++++-
>> 6 files changed, 135 insertions(+), 48 deletions(-)
>>
>> diff --git a/xen/arch/x86/hvm/ioreq.c b/xen/arch/x86/hvm/ioreq.c
>> index 5bf3b6d..07a6c26 100644
>> --- a/xen/arch/x86/hvm/ioreq.c
>> +++ b/xen/arch/x86/hvm/ioreq.c
>> @@ -955,6 +955,14 @@ int hvm_map_mem_type_to_ioreq_server(struct domain
>> *d, ioservid_t id,
>>
>> spin_unlock_recursive(&d->arch.hvm_domain.ioreq_server.lock);
>>
>> + if ( rc == 0 && flags == 0 )
>> + {
>> + struct p2m_domain *p2m = p2m_get_hostp2m(d);
>> +
>> + if ( read_atomic(&p2m->ioreq.entry_count) )
>> + p2m_change_entry_type_global(d, p2m_ioreq_server,
>> p2m_ram_rw);
>> + }
>> +
>> return rc;
>> }
>>
>> diff --git a/xen/arch/x86/mm/hap/hap.c b/xen/arch/x86/mm/hap/hap.c
>> index a57b385..4b591fe 100644
>> --- a/xen/arch/x86/mm/hap/hap.c
>> +++ b/xen/arch/x86/mm/hap/hap.c
>> @@ -187,6 +187,15 @@ out:
>> */
>> static int hap_enable_log_dirty(struct domain *d, bool_t log_global)
>> {
>> + struct p2m_domain *p2m = p2m_get_hostp2m(d);
>> +
>> + /*
>> + * Refuse to turn on global log-dirty mode if
>> + * there are outstanding p2m_ioreq_server pages.
>> + */
>> + if ( log_global && read_atomic(&p2m->ioreq.entry_count) )
>> + return -EBUSY;
>> +
>> /* turn on PG_log_dirty bit in paging mode */
>> paging_lock(d);
>> d->arch.paging.mode |= PG_log_dirty;
>> diff --git a/xen/arch/x86/mm/p2m-ept.c b/xen/arch/x86/mm/p2m-ept.c
>> index cc1eb21..9f41067 100644
>> --- a/xen/arch/x86/mm/p2m-ept.c
>> +++ b/xen/arch/x86/mm/p2m-ept.c
>> @@ -533,6 +533,7 @@ static int resolve_misconfig(struct p2m_domain *p2m,
>> unsigned long gfn)
>> {
>> for ( gfn -= i, i = 0; i < EPT_PAGETABLE_ENTRIES; ++i )
>> {
>> + p2m_type_t nt;
>> e = atomic_read_ept_entry(&epte[i]);
>> if ( e.emt == MTRR_NUM_TYPES )
>> e.emt = 0;
>> @@ -542,10 +543,15 @@ static int resolve_misconfig(struct p2m_domain
>> *p2m, unsigned long gfn)
>> _mfn(e.mfn), 0, &ipat,
>> e.sa_p2mt ==
>> p2m_mmio_direct);
>> e.ipat = ipat;
>> - if ( e.recalc && p2m_is_changeable(e.sa_p2mt) )
>> - {
>> - e.sa_p2mt = p2m_is_logdirty_range(p2m, gfn +
>> i, gfn + i)
>> - ? p2m_ram_logdirty : p2m_ram_rw;
>> + nt = p2m_recalc_type(e.recalc, e.sa_p2mt, p2m, gfn
>> + i);
>> + if ( nt != e.sa_p2mt ) {
>> + if ( e.sa_p2mt == p2m_ioreq_server )
>> + {
>> + ASSERT(p2m->ioreq.entry_count > 0);
>> + p2m->ioreq.entry_count--;
>> + }
>> +
>> + e.sa_p2mt = nt;
>> ept_p2m_type_to_flags(p2m, &e, e.sa_p2mt,
>> e.access);
>> }
>> e.recalc = 0;
>> @@ -562,23 +568,24 @@ static int resolve_misconfig(struct p2m_domain
>> *p2m, unsigned long gfn)
>>
>> if ( recalc && p2m_is_changeable(e.sa_p2mt) )
>> {
>> - unsigned long mask = ~0UL << (level *
>> EPT_TABLE_ORDER);
>> -
>> - switch ( p2m_is_logdirty_range(p2m, gfn & mask,
>> - gfn | ~mask) )
>> - {
>> - case 0:
>> - e.sa_p2mt = p2m_ram_rw;
>> - e.recalc = 0;
>> - break;
>> - case 1:
>> - e.sa_p2mt = p2m_ram_logdirty;
>> - e.recalc = 0;
>> - break;
>> - default: /* Force split. */
>> - emt = -1;
>> - break;
>> - }
>> + unsigned long mask = ~0UL << (level *
>> EPT_TABLE_ORDER);
>> +
>> + ASSERT(e.sa_p2mt != p2m_ioreq_server);
>> + switch ( p2m_is_logdirty_range(p2m, gfn & mask,
>> + gfn | ~mask) )
>> + {
>> + case 0:
>> + e.sa_p2mt = p2m_ram_rw;
>> + e.recalc = 0;
>> + break;
>> + case 1:
>> + e.sa_p2mt = p2m_ram_logdirty;
>> + e.recalc = 0;
>> + break;
>> + default: /* Force split. */
>> + emt = -1;
>> + break;
>> + }
>
>
> So here I guess only the "ASSERT(e.sa_p2mt != p2m_ioreq_server);" is
> new, right?
Yes, that's right -- the whole block was indented wrong; when I added
the ASSERT() my editor aligned it correctly, and I couldn't very well
leave it that way. :-)
-George
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel
|
![]() |
Lists.xenproject.org is hosted with RackSpace, monitoring our |