Re: [PATCH v3 01/11] xen/manage: keep track of the on-going suspend mode
On Thu, Jun 03, 2021 at 04:11:46PM -0400, Boris Ostrovsky wrote:
> On 6/2/21 3:37 PM, Anchal Agarwal wrote:
> > On Tue, Jun 01, 2021 at 10:18:36AM -0400, Boris Ostrovsky wrote:
> >>
> > The resume won't fail because in the image the xen_vcpu and xen_vcpu_info
> > are the same. These are the same values that got in there during saving of
> > the hibernation image. So whatever xen_vcpu got as a value during boot time
> > registration on resume is essentially lost once the jump into the saved
> > kernel image happens. Interesting part is if KASLR is not enabled boot time
> > vcpup mfn is same as in the image.
>
> Do you start your guest right after you've hibernated it? What happens if
> you create (and keep running) a few other guests in-between? mfn would likely
> be different then I'd think.
>
Yes, I just run it in loops on a single guest and I am able to see the issue
in 20-40 iterations, sometimes maybe sooner. Yeah, you could be right and this
could definitely happen more often depending on what's happening on the dom0
side.

> > Once you enable KASLR this value changes sometimes and whenever that happens
> > resume gets stuck. Does that make sense?
> >
> > No it does not resume successfully if hypercall fails because I was trying
> > to explicitly reset vcpu and invoke hypercall.
> > I am just wondering why does restore logic fails to work here or probably I
> > am missing a critical piece here.
>
> If you are not using KASLR then xen_vcpu_info is at the same address every
> time you boot. So whatever you registered before hibernating stays the same
> when you boot second time and register again, and so successful comparison in
> xen_vcpu_setup() works. (Mostly by chance.)
>
That's what I thought too.
>
> But if KASLR is on then this comparison not failing should cause xen_vcpu
> pointer in the loaded image to become bogus because xen_vcpu is now
> registered for a different xen_vcpu_info address during boot.
>
The reason for that, I think, is that once you jump into the image that
information is lost. But there is some residue somewhere that's causing the
resume to fail. I haven't been able to pinpoint the exact field value that may
be causing that issue.
Correct me if I am wrong here, but even if hypothetically I put in a hack to
tell the kernel to somehow re-register the vcpu, it won't pass because there is
no hypercall to unregister it in the first place? Can the resumed kernel use
the new values in that case? [Now this is me just throwing wild guesses!!]

>
> >>> Another line of thought is something what kexec does to come around this
> >>> problem is to abuse soft_reset and issue it during syscore_resume or may
> >>> be before the image get loaded.
> >>> I haven't experimented with that yet as I am assuming there has to be a
> >>> way to re-register vcpus during resume.
> >>
> >> Right, that sounds like it should work.
> >>
> > You mean soft reset or re-register vcpu?
>
> Doing something along the lines of a soft reset. It should allow you to
> re-register. Not sure how you can use it without Xen changes though.
>
No, not without Xen changes. It won't work. I will have Xen changes in place to
test that on our infrastructure.

-- 
Anchal
>
>
> -boris
>
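
To make the failure mode discussed above concrete, here is a small userspace
model of it. This is only a sketch and not kernel or Xen code: the structs and
the helpers hyp_register(), guest_vcpu_setup() and resume_is_consistent() are
hypothetical names invented for illustration. It assumes, as the thread
suggests, that a vcpu_info registration cannot be undone, and that the setup
path skips registration when xen_vcpu already points at the guest's own
xen_vcpu_info.

```c
/* Userspace model only: none of this is kernel or Xen code. */
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the hypervisor's per-vcpu registration state. */
struct hyp_vcpu {
    bool registered;
    uintptr_t info_addr;      /* address the guest registered for vcpu_info */
};

/* Hypothetical stand-in for the guest kernel's view of one vcpu. */
struct guest_vcpu {
    uintptr_t xen_vcpu;       /* where the guest believes vcpu_info lives */
    uintptr_t xen_vcpu_info;  /* address of its own per-cpu vcpu_info */
};

/* Assumption from the thread: registration works once, with no unregister. */
static int hyp_register(struct hyp_vcpu *h, uintptr_t addr)
{
    if (h->registered)
        return -1;
    h->registered = true;
    h->info_addr = addr;
    return 0;
}

/* Models the shortcut: if xen_vcpu already matches, do not register again. */
static int guest_vcpu_setup(struct guest_vcpu *g, struct hyp_vcpu *h)
{
    if (g->xen_vcpu == g->xen_vcpu_info)
        return 0;
    if (hyp_register(h, g->xen_vcpu_info) != 0)
        return -1;
    g->xen_vcpu = g->xen_vcpu_info;
    return 0;
}

/*
 * image_addr: xen_vcpu_info address saved in the hibernation image.
 * boot_addr:  xen_vcpu_info address of the kernel booted for resume
 *             (equal to image_addr without KASLR, possibly different with it).
 * Returns true when the resumed image and the hypervisor agree.
 */
static bool resume_is_consistent(uintptr_t image_addr, uintptr_t boot_addr)
{
    struct hyp_vcpu hyp = { false, 0 };
    struct guest_vcpu boot = { 0, boot_addr };
    struct guest_vcpu image = { image_addr, image_addr };

    int rc = guest_vcpu_setup(&boot, &hyp);  /* boot kernel registers */
    assert(rc == 0);
    (void)rc;

    /* Jump into the image: boot kernel state is gone, hyp state is not. */
    guest_vcpu_setup(&image, &hyp);          /* comparison passes, no call */
    return image.xen_vcpu == hyp.info_addr;
}
```

In this model resume_is_consistent(0x1000, 0x1000) holds, which corresponds to
the no-KASLR case working mostly by chance, while
resume_is_consistent(0x1000, 0x2000) does not: the image's comparison still
passes, so nothing is re-registered and its xen_vcpu no longer matches what the
hypervisor has on record.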