On Wed, Jul 02, 2008 at 04:20:33PM +1000, Simon Horman wrote:
> I have done some more investigations and it does really
> seem that calling ia64_sal_get_state_info() via ia64_log_queue()
> in ia64_mca_cpe_int_caller() causes the hypervisor to lock
> up when my EFI RR patches are applied.
>
> As you point out, if xmalloc() was ever called by ia64_log_queue()
> in this context then a BUG would be triggered. As we are not
> seeing that in the wild, then that case must not occur (or occur
> so rarely that no one has seen and reported it yet). This means
> that ia64_sal_get_state_info() must be returning zero.
>
> If I understand correctly, ia64_log_queue() does more or less nothing
> if ia64_sal_get_state_info() returns zero. In other words, if
> ia64_sal_get_state_info() returns zero then, for one reason or
> another, there is no information available at that time. We know
> that because if there was information available then xmalloc()
> would be called and a BUG would be triggered.
>
> Given that without the EFI RR patches the call to ia64_log_queue()
> in ia64_mca_cpe_int_caller() seems to do nothing, and with them the
> call causes a crash, I wonder if the best way forward is to simply
> remove the call.
>
> The section on SAL_GET_STATE (==ia64_sal_get_state_info()) in the System
> Abstraction Layer Specification (Dec 2003) does state "In response to
> the MCA, Processor CMC, or Corrected Platform event, the operating
> system must call the procedure to obtain all the pending processor and
> platform error information that triggered the event."
>
> Does that apply to situations when ia64_mca_cpe_int_caller() is called?
> And if so, can calling ia64_log_queue() be deferred?
ia64_mca_cpe_int_caller() is triggered by the polling timer,
cpe_poll_timer, which sends IA64_CPEP_VECTOR. So I think
ia64_log_queue() can be deferred by using a softirq or tasklet.
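
A minimal sketch of the deferral idea, written as plain user-space C
rather than real Xen code. Every name here (cpe_log_pending,
fake_cpe_int_handler(), fake_cpe_deferred_work(), fake_log_queue()) is
hypothetical and only stands in for the interrupt handler, the
softirq/tasklet body, and ia64_log_queue() respectively:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins; none of these names exist in Xen. */
static volatile bool cpe_log_pending;  /* set from "interrupt" context */
static int cpe_logs_processed;         /* number of deferred runs that did work */

/* Stand-in for the expensive work (firmware call, possible allocation). */
static void fake_log_queue(void)
{
    cpe_logs_processed++;
}

/* "Interrupt" handler: only records that work is pending, nothing else. */
static void fake_cpe_int_handler(void)
{
    cpe_log_pending = true;
}

/* Deferred context (softirq/tasklet stand-in): performs the real work. */
static void fake_cpe_deferred_work(void)
{
    if (!cpe_log_pending)
        return;                 /* nothing was requested */
    cpe_log_pending = false;    /* clear before working so new requests are kept */
    fake_log_queue();
}
```

With a single pending flag, several interrupts that arrive before the
deferred work runs are coalesced into one run. Whether that is
acceptable for CPE logging is an assumption of this sketch.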
To be honest, after a rough look at the SAL specification I don't
understand why the VMM locks up when ia64_sal_get_state_info() is called.
You stated that when ia64_log_queue() is called, the RID is already
EFI's. Have you tracked down the reason, and which firmware
call (PAL/SAL/EFI) is involved?
And have you tracked down where the hypervisor locks up?
I.e., does it lock up in ia64_sal_get_state_info() around the
SAL call, or right in the SAL call?
If the lockup happens in the SAL call, what we can do is take
a closer look at the SAL spec and make sure the calling conditions are met.
If the lockup happens before or after the SAL call, presumably
it would indicate a xen/ia64 VMM bug.
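
One way to tell these cases apart is to record a progress marker
immediately before and after the firmware call. The following is only
a user-space C sketch of that idea; last_stage, traced_fw_call() and
fake_sal_call() are hypothetical names, and in the hypervisor the
marker would have to be printed or stored somewhere that can still be
inspected after the hang:

```c
#include <assert.h>

/* Hypothetical progress marker; not an existing Xen facility. */
enum trace_stage { TRACE_NONE, TRACE_BEFORE_CALL, TRACE_AFTER_CALL };

static volatile enum trace_stage last_stage = TRACE_NONE;

typedef long (*fw_call_fn)(void);

/* Bracket a call with markers so a hang can be placed relative to it. */
static long traced_fw_call(fw_call_fn fn)
{
    long ret;

    last_stage = TRACE_BEFORE_CALL;  /* a hang with this value: inside fn() */
    ret = fn();
    last_stage = TRACE_AFTER_CALL;   /* a hang with this value: after fn() returned */
    return ret;
}

/* Stand-in for the SAL call; always reports "no information" (zero). */
static long fake_sal_call(void)
{
    return 0;
}
```

If the marker stays at TRACE_NONE, the hang happened before the call
was reached.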
--
yamahata