[Xen-devel] Re: [PATCH][pvops_dom0][4/4] use physical acpi_id in acpi processor parsing logic



On 07/21/09 01:07, Yu, Ke wrote:
> To use the acpi id in the native kernel, I can see at least two kinds of
> conflicts that need to be resolved:
> 1. The kernel assumes it only cares about present CPUs. For a non-present
> CPU, it will simply stop and return, or trigger BUG(). When we switch to the
> acpi id, the acpi processor object may refer to a non-present CPU, so the
> code needs to be able to handle the non-present CPU situation.
>   

The percpu subsystem should be able to deal with accesses to percpu data
of non-present cpus (though it might need some advance preparation to
make sure the memory is allocated).  In general the percpu subsystem is
concerned with making sure that the amount of memory allocated is
"reasonable" - ie, for cpus which are actually present or could be
present, rather than cpus which can never exist on this system (like
running a kernel compiled for 1024 processors on a dual-core laptop).

I assume that ACPI processor IDs are always going to be in the realm of
sensible for the hardware: ie, either CPUs which actually exist, or
which have sockets which could potentially be hotplugged.  In that case
I don't see a problem with making sure they have percpu data allocated.
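
To make the "possible vs. present" distinction concrete, here is a small
userspace sketch. It is only a model, not the kernel's percpu allocator:
MODEL_NR_CPUS, struct model_percpu and the model_* helpers are names made up
for illustration. Storage is allocated for every CPU marked possible, whether
or not it is currently present, and for nothing else.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define MODEL_NR_CPUS 1024      /* compile-time maximum (hypothetical) */

struct model_percpu {
    bool  possible[MODEL_NR_CPUS];  /* could this CPU ever exist here? */
    long *slot[MODEL_NR_CPUS];      /* NULL unless storage was allocated */
};

/* Mark a CPU (or hotpluggable socket) as one that may show up. */
static void model_mark_possible(struct model_percpu *pc, int cpu)
{
    if (cpu >= 0 && cpu < MODEL_NR_CPUS)
        pc->possible[cpu] = true;
}

/*
 * Allocate one zeroed slot per possible CPU that does not have one yet.
 * Returns the number of slots allocated by this call, or -1 on failure.
 */
static int model_alloc_percpu(struct model_percpu *pc)
{
    int allocated = 0;

    for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++) {
        if (!pc->possible[cpu] || pc->slot[cpu])
            continue;
        pc->slot[cpu] = calloc(1, sizeof(*pc->slot[cpu]));
        if (!pc->slot[cpu])
            return -1;
        allocated++;
    }
    return allocated;
}

/* Return the slot for a CPU, or NULL if it is out of range or impossible. */
static long *model_per_cpu_ptr(struct model_percpu *pc, int cpu)
{
    if (cpu < 0 || cpu >= MODEL_NR_CPUS)
        return NULL;
    return pc->slot[cpu];
}
```

In this model a dual-core laptop with one empty hotplug socket would mark
three CPUs possible and pay for three slots, not 1024.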

(Of course in the Xen case this needs a bit more care, since the domain
VCPU count has nothing to do with the host PCPUs, but we can do things
like manipulate the possible CPU set if that helps.)

> 2. The native kernel uses per_cpu data extensively, which is indexed by the
> general cpu id. When switching to the acpi id, this per_cpu data should be
> changed to arrays indexed by acpi id.
>   

How is the acpi id derived?  Is it the same as the local apic id?  Is
it typically the same as the kernel's smp_processor_id, or does it tend
to be different?  If they're different, is the mapping fixed or can it vary?
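
If the two id spaces do turn out to differ, one option would be an explicit
translation table rather than re-indexing everything by acpi id. The sketch
below is purely illustrative and assumes a small, bounded acpi id space;
SKETCH_MAX_ACPI_ID, SKETCH_NO_CPU and the sketch_map_* helpers are invented
names, not existing kernel interfaces. An acpi id with no kernel-visible CPU
maps to SKETCH_NO_CPU, so callers can handle that case instead of hitting a
BUG().

```c
#include <stddef.h>

#define SKETCH_MAX_ACPI_ID 256   /* assumed upper bound on acpi ids */
#define SKETCH_NO_CPU      (-1)  /* acpi id has no kernel-visible CPU */

struct sketch_acpi_map {
    int cpu_of[SKETCH_MAX_ACPI_ID];  /* acpi id -> logical cpu id */
};

/* Start with every acpi id unmapped. */
static void sketch_map_init(struct sketch_acpi_map *m)
{
    for (size_t i = 0; i < SKETCH_MAX_ACPI_ID; i++)
        m->cpu_of[i] = SKETCH_NO_CPU;
}

/* Record that an acpi id corresponds to a logical cpu; -1 on bad input. */
static int sketch_map_bind(struct sketch_acpi_map *m,
                           unsigned int acpi_id, int cpu)
{
    if (acpi_id >= SKETCH_MAX_ACPI_ID || cpu < 0)
        return -1;
    m->cpu_of[acpi_id] = cpu;
    return 0;
}

/* Translate an acpi id; unknown or out-of-range ids yield SKETCH_NO_CPU. */
static int sketch_map_lookup(const struct sketch_acpi_map *m,
                             unsigned int acpi_id)
{
    if (acpi_id >= SKETCH_MAX_ACPI_ID)
        return SKETCH_NO_CPU;
    return m->cpu_of[acpi_id];
}
```

Whether a table like this is enough depends on the answers above: it only
works cleanly if the mapping is fixed once the processor objects have been
enumerated.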

> Take the acpi processor core code (drivers/acpi/processor_core.c) as an
> example: the condition check "BUG_ON((pr->id >= nr_cpu_ids) || (pr->id < 0));"
> needs to change. The per-cpu data processor_device_array and processors need
> to change. And cpu_sys_devices in get_cpu_sysdev needs more thought before
> changing, since it is used globally by other components.
>
> Another example is cpufreq. If we want to use the acpi id in cpufreq, we
> also need to resolve the above two conflicts. For example, in
> drivers/cpufreq/cpufreq.c, the core data structure "cpufreq_policy" is
> per-CPU, and thus needs many changes everywhere it is used. The condition
> checks, like "if (cpu >= nr_cpu_ids) goto err_out;", also need to change.
> Compared with the change in drivers/acpi/processor_core.c, the change in
> cpufreq is more intrusive. Since the acpi processor core code already has the
> Px info parsing functionality, it may be better not to change cpufreq.
>   

OK, to summarize:

The cpufreq subsystem provides two services to the rest of the kernel:

    * the ability to set the overall power management policy
      (performance, powersave, etc)
    * the mechanism and drivers to implement that policy

In this case we still want a way to set the policy, but Xen itself will
implement the mechanism internally without dom0's further involvement
(aside from some info culled from the ACPI tables), right?

But even then, cpufreq is oriented towards controlling the
kernel-visible CPUs, and is ill-suited to controlling the policy of the
host CPUs from the context of one particular domain.

Therefore we need to have new interfaces which:

   1. insert ACPI info that dom0 extracts from various tables into Xen
      (assuming it's impractical for Xen to do this itself)
   2. set the overall power-management policy
   3. let Xen implement that policy without further interaction with dom0
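
To illustrate the shape of those three steps, here is a self-contained
userspace model. Everything in it is hypothetical: the demo_* names, the
struct layouts and the two policies are assumptions made for the example, and
none of it is an existing Xen hypercall or dom0 interface. Step 1 hands over
P-state data keyed by acpi id, step 2 sets a global policy, and step 3 is the
hypervisor choosing an operating point on its own.

```c
#include <stddef.h>

#define DEMO_MAX_PSTATES 8

enum demo_policy { DEMO_POLICY_PERFORMANCE, DEMO_POLICY_POWERSAVE };

struct demo_pstate {
    unsigned int freq_mhz;
    unsigned int power_mw;
};

/* What the hypervisor knows about one physical CPU (hypothetical). */
struct demo_pcpu {
    unsigned int acpi_id;
    size_t nr_states;
    struct demo_pstate states[DEMO_MAX_PSTATES];
};

struct demo_hypervisor {
    enum demo_policy policy;
};

/* Step 1: dom0 passes in the P-states it parsed from the ACPI tables. */
static int demo_upload_pstates(struct demo_pcpu *pcpu, unsigned int acpi_id,
                               const struct demo_pstate *states, size_t n)
{
    if (n == 0 || n > DEMO_MAX_PSTATES)
        return -1;
    pcpu->acpi_id = acpi_id;
    pcpu->nr_states = n;
    for (size_t i = 0; i < n; i++)
        pcpu->states[i] = states[i];
    return 0;
}

/* Step 2: set the overall power-management policy. */
static void demo_set_policy(struct demo_hypervisor *hv, enum demo_policy p)
{
    hv->policy = p;
}

/* Step 3: the hypervisor picks an operating point without asking dom0. */
static const struct demo_pstate *
demo_pick_state(const struct demo_hypervisor *hv, const struct demo_pcpu *pcpu)
{
    const struct demo_pstate *best;

    if (pcpu->nr_states == 0)
        return NULL;
    best = &pcpu->states[0];
    for (size_t i = 1; i < pcpu->nr_states; i++) {
        const struct demo_pstate *s = &pcpu->states[i];
        int better = (hv->policy == DEMO_POLICY_PERFORMANCE)
                         ? s->freq_mhz > best->freq_mhz   /* fastest */
                         : s->freq_mhz < best->freq_mhz;  /* slowest */
        if (better)
            best = s;
    }
    return best;
}
```

A real governor would of course look at load rather than always choosing the
fastest or slowest state; the point here is only that dom0 appears in steps 1
and 2 and not in step 3.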

(And what's missing from this is some way for each individual domain to
set the "importance" of the work being done on each VCPU to allow Xen to
determine what's the appropriate operating point for each PCPU from
timeslice to timeslice.)

Is that accurate?

Thanks,
    J
