[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH 1/3] mwait-idle: add support for using halt



On 3/27/19 9:48 AM, Jan Beulich wrote:
>>>> On 26.03.19 at 22:56, <Brian.Woods@xxxxxxx> wrote:
>> On 3/26/19 10:54 AM, Jan Beulich wrote:
>>>>>> On 19.03.19 at 17:12, <Brian.Woods@xxxxxxx> wrote:
>>>> On 3/15/19 3:37 AM, Jan Beulich wrote:
>>>>> Furthermore I'm then once again wondering what the gain is
>>>>> over using the ACPI driver: The suggested _CST looks to exactly
>>>>> match the data you enter into the table in the later patch. IOW
>>>>> my fundamental concern didn't go away yet: As per the name
>>>>> of the driver, it shouldn't really need to support HLT (or anything
>>>>> other than MWAIT) as an entry method. Hence I think that at
>>>>> the very least you need to extend the description of the change
>>>>> quite a bit to explain why the ACPI driver is not suitable.
>>>>>
>>>>> Depending on how this comes out, it may then still be a matter
>>>>> of discussing whether, rather than fiddling with mwait-idle, it
>>>>> wouldn't be better to have an AMD-specific driver instead. Are
>>>>> there any thoughts in similar directions for Linux?
>>>>
>>>> Because:
>>>> #1 getting the ACPI tables from dom0 is either unreliable (PV dom0) or
>>>> not possible (PVH dom0).
>>>> #2 the changes to the Intel code are minimal.
>>>> #3 worse case, Xen thinks it's using CC6 when it's using CC1.  Not
>>>> perfect but far from fatal or breaking.
>>>
>>> Having thought about this some more, I agree that an AMD-specific
>>> driver would likely go too far. However, that's still no reason to fiddle
>>> with the mwait-idle one - I think you could as well populate the data
>>> as necessary for the ACPI driver to use, removing the dependency
>>> on Dom0. After all that driver already knows of all the entry methods
>>> you may want/need to use (see acpi_idle_do_entry()).
>>>
>> I did a rough example of how that might work and lines of code changed
>> for adding it to cpu_idle was roughly 125.  Seeing as this doesn't
>> compile and doesn't even have comments, I'd say at least 140 lines of
>> code/change (most of those are additive too), a lot of is functionally
>> copied from mwait-idle and how it reads data out of the structures,
>> checks, and populates the cx structures.  The first set of mwait patches
>> is 87 lines changed total.
>>
>> I _could_ try and refactor some of the code and get it down from
>> 125-140, but that would most likely make porting changes even harder for
>> mwait-idle.
> 
> Well, I was rather thinking about something like the change below,
> taking slightly over 100 lines of new code, and not touching
> mwait-idle.c at all. Otoh there are a couple of TBDs in there which
> may cause the patch to further grow once addressed.
> 
> Note that this goes on top of
> https://lists.xenproject.org/archives/html/xen-devel/2018-08/msg00089.html
> which sadly there still wasn't sufficient feedback on to decide where
> to go with the series; all I know is that Andrew (understandably)
> doesn't want to see the last patch go in without vendor confirmation
> (and I'd be fine to drop that last patch if need be, but this shouldn't
> block the earlier patches in the series).
> 
> Jan
> 
> x86/AMD: make C-state handling independent of Dom0
> 
> At least for more recent CPUs, following what BKDG / PPR suggest for the
> BIOS to surface via ACPI we can make ourselves independent of Dom0
> uploading respective data.
> 
> Signed-off-by: Jan Beulich <jbeulich@xxxxxxxx>
> ---
> TBD: Can we set local_apic_timer_c2_ok to true? I can't seem to find any
>       statement in the BKDG / PPR as to whether the LAPIC timer continues
>       running in CC6.
> TBD: We may want to verify that HLT indeed is configured to enter CC6.
> TBD: I guess we could extend this to families older then Fam15 as well.
> 
> --- a/xen/arch/x86/acpi/cpu_idle.c
> +++ b/xen/arch/x86/acpi/cpu_idle.c
> @@ -120,6 +120,8 @@ boolean_param("lapic_timer_c2_ok", local
>   
>   struct acpi_processor_power *__read_mostly processor_powers[NR_CPUS];
>   
> +static int8_t __read_mostly vendor_override;
> +
>   struct hw_residencies
>   {
>       uint64_t mc0;
> @@ -1220,6 +1222,9 @@ long set_cx_pminfo(uint32_t acpi_id, str
>       if ( pm_idle_save && pm_idle != acpi_processor_idle )
>           return 0;
>   
> +    if ( vendor_override > 0 )
> +        return 0;
> +
>       print_cx_pminfo(acpi_id, power);
>   
>       cpu_id = get_cpu_id(acpi_id);
> @@ -1292,6 +1297,98 @@ long set_cx_pminfo(uint32_t acpi_id, str
>       return 0;
>   }
>   
> +static void amd_cpuidle_init(struct acpi_processor_power *power)
> +{
> +    unsigned int i, nr = 0;
> +    const struct cpuinfo_x86 *c = &current_cpu_data;
> +    const unsigned int ecx_req = CPUID5_ECX_EXTENSIONS_SUPPORTED |
> +                                 CPUID5_ECX_INTERRUPT_BREAK;
> +    const struct acpi_processor_cx *cx = NULL;
> +    static const struct acpi_processor_cx fam17[] = {
> +        {
> +            .type = ACPI_STATE_C1,
> +            .entry_method = ACPI_CSTATE_EM_FFH,
> +            .address = 0,
> +            .latency = 1,
> +        },
> +        {
> +            .type = ACPI_STATE_C2,
> +            .entry_method = ACPI_CSTATE_EM_HALT,
> +            .latency = 400,
> +        },
> +    };
> +
> +    if ( pm_idle_save && pm_idle != acpi_processor_idle )
> +        return;
> +
> +    if ( vendor_override < 0 )
> +        return;
> +
> +    switch ( c->x86 )
> +    {
> +    case 0x17:
> +        if ( cpu_has_monitor && c->cpuid_level >= CPUID_MWAIT_LEAF &&
> +             (cpuid_ecx(CPUID_MWAIT_LEAF) & ecx_req) == ecx_req )
> +        {
> +            cx = fam17;
> +            nr = ARRAY_SIZE(fam17);
> +            break;
> +        }
> +        /* fall through */
> +    case 0x15:
> +    case 0x16:
> +        cx = &fam17[1];
> +        nr = ARRAY_SIZE(fam17) - 1;
> +        break;
> +
> +    default:
> +        vendor_override = -1;
> +        return;
> +    }
> +
> +    power->flags.has_cst = true;
> +
> +    for ( i = 0; i < nr; ++i )
> +    {
> +        if ( cx[i].type > max_cstate )
> +            break;
> +        power->states[i + 1] = cx[i];
> +        power->states[i + 1].idx = i + 1;
> +        power->states[i + 1].target_residency = cx[i].latency * 
> latency_factor;
> +    }
> +
> +    if ( i )
> +    {
> +        power->count = i + 1;
> +        power->safe_state = &power->states[i];
> +
> +        if ( !vendor_override )
> +        {
> +            if ( !boot_cpu_has(X86_FEATURE_ARAT) )
> +                hpet_broadcast_init();
> +
> +            if ( !lapic_timer_init() )
> +            {
> +                vendor_override = -1;
> +                cpuidle_init_cpu(power->cpu);
> +                return;
> +            }
> +
> +            if ( !pm_idle_save )
> +            {
> +                pm_idle_save = pm_idle;
> +                pm_idle = acpi_processor_idle;
> +            }
> +
> +            dead_idle = acpi_dead_idle;
> +
> +            vendor_override = 1;
> +        }
> +    }
> +    else
> +        vendor_override = -1;
> +}
> +
>   uint32_t pmstat_get_cx_nr(uint32_t cpuid)
>   {
>       return processor_powers[cpuid] ? processor_powers[cpuid]->count : 0;
> @@ -1437,8 +1534,8 @@ static int cpu_callback(
>       int rc = 0;
>   
>       /*
> -     * Only hook on CPU_UP_PREPARE because a dead cpu may utilize the info
> -     * to enter deep C-state.
> +     * Only hook on CPU_UP_PREPARE / CPU_ONLINE because a dead cpu may 
> utilize
> +     * the info to enter deep C-state.
>        */
>       switch ( action )
>       {
> @@ -1447,6 +1544,12 @@ static int cpu_callback(
>           if ( !rc && cpuidle_current_governor->enable )
>               rc = cpuidle_current_governor->enable(processor_powers[cpu]);
>           break;
> +
> +    case CPU_ONLINE:
> +        if ( boot_cpu_data.x86_vendor == X86_VENDOR_AMD &&
> +             processor_powers[cpu] )
> +            amd_cpuidle_init(processor_powers[cpu]);
> +        break;
>       }
>   
>       return !rc ? NOTIFY_DONE : notifier_from_errno(rc);
> 
> 

This also lacks some of the features of mwait-idle has and duplicates 
the limited functionality.  There's also a lack of comments which may or 
may not be needed.  So that would add to the line change count if you 
care about that.

I'm not sure why you're so adverse to the mwait-idle patches.  We're 
hard coding values in and using mwait (just like Intel is), but the only 
real change we need is using halt for one c-state.

Rather than having both drivers have the ability to read in data from 
structures and duplicate functionality and code, why not just have a 
single driver for hard coded values (mwait-idle) and one for getting the 
values in from dom0 (acpi-idle)?

I think you're getting hung up on the fact that we just don't use mwait 
rather than seeing the larger picture.  Which is, one driver uses hard 
coded values and the other is reading them from Dom0.  We still use 
mwait in the mwait driver, but we just happen to use halt for one 
c-state.  It doesn't require a huge rework of the driver or reducing the 
functionality of it for Intel.  Rather, we're limiting the functionality 
of using hard coded values to one driver that support both vendors with 
minimal changes.

Brian
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.