[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH V2] x86, amd_ucode: Skip microcode updates for final levels



On 8/3/2015 1:18 PM, Boris Ostrovsky wrote:
On 08/03/2015 01:52 PM, Aravind Gopalakrishnan wrote:
On 8/3/2015 12:22 PM, Boris Ostrovsky wrote:
On 08/03/2015 12:35 PM, Aravind Gopalakrishnan wrote:
Some of older[Fam10h] systems require that certain number of
applied microcode patch levels should not be overwritten by
the microcode loader. Otherwise, system hangs are known to occur.

The 'final_levels' of patch ids have been obtained empirically.
Refer bug https://bugzilla.suse.com/show_bug.cgi?id=913996
for details of the issue.

The short version is that people have predominantly noticed
system hang issues when trying to update microcode levels
beyond the patch IDs below.
[0x01000098, 0x0100009f, 0x010000af]



  +/*
+ * The 'final_levels' of patch ids have been obtained empirically.
+ * Refer bug https://bugzilla.suse.com/show_bug.cgi?id=913996
+ * for details of the issue. The short version is that people
+ * have predominantly noticed system hang issues when trying to
+ * update microcode levels beyond the patch IDs below.
+ * From internal discussions, we gathered that OS/hypervisor
+ * cannot reliably perform microcode updates beyond these levels
+ * due to hardware issues. Therefore, we need to abort microcode
+ * update process if we hit any of these levels.
+ */
+static unsigned int final_levels[] = {
+    0x01000098,
+    0x0100009f,
+    0x010000af
+};
+
+static bool_t check_final_patch_levels(int cpu)
+{
+    /*
+ * Check the current patch levels on the cpu. If they are equal to + * any of the 'final_levels', then we should not update the microcode
+     * patch on the cpu as system will hang otherwise.
+     */
+    struct ucode_cpu_info *uci = &per_cpu(ucode_cpu_info, cpu);
+    unsigned int i;
+
+    for ( i = 0; i < ARRAY_SIZE(final_levels); i++ )
+        if ( uci->cpu_sig.rev == final_levels[i] )
+            return 1;

(I should have asked this when you posted v1, sorry)

Are these final_levels[] fam10h-specific? Are we guaranteed that other families won't have them? I think family check would be good here. Or at least a comment stating that these levels can only be observed on family 10h (the only mention of family right now is in the commit message)




Yep. These are Fam10h specific microcode patches. And yeah, the HW problem seems to exist only on these Fam10h specific processor variants. So, it's pretty isolated.

And why need a family check? The microcode patches are tied to a specific family/model anyway right?

Patches --- yes. I wasn't sure about levels (I hope we will never see the same level for different families but AFAIK it's not strictly prohibited since level is just a cookie).


Ah. I see what you mean.
I can think of two ways around this-
a. I can move the check_final_patch_levels() call to apply_microcode(). That way, our initial checks in microcode_fits() would have already checked equivalent cpu IDs so we know we are on the right family. b. Introduce a family check in this patch as you suggested (It's a simple change, should work fine too and I think can retain Andrew's 'Reviewed-by' as the logic doesn't change too much..)

Do let me know if you have a preference.


It's only ever going to apply successfully if the equivalent id's match..

And sure, I can massage the comment like so-
" ... The short version is that people who are using certain Fam10h systems are noticing system hang issues when tryingupdate microcode..."

Would that be ok?

Better than nothing (although TBH I'd rather see both the comment and the check).


And sure, I'll include a comment as well.

Thanks,
-Aravind.


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.