
Re: [PATCH v3 01/11] xen/manage: keep track of the on-going suspend mode

On Thu, Jun 03, 2021 at 04:11:46PM -0400, Boris Ostrovsky wrote:
> On 6/2/21 3:37 PM, Anchal Agarwal wrote:
> > On Tue, Jun 01, 2021 at 10:18:36AM -0400, Boris Ostrovsky wrote:
> >>
> > The resume won't fail because in the image xen_vcpu and xen_vcpu_info are
> > the same. These are the same values that got in there during saving of the
> > hibernation image. So whatever value xen_vcpu got during boot-time
> > registration on resume is essentially lost once the jump into the saved
> > kernel image happens. The interesting part is that if KASLR is not enabled,
> > the boot-time vcpu mfn is the same as in the image.
> Do you start your guest right after you've hibernated it? What happens if
> you create (and keep running) a few other guests in between? The mfn would
> likely be different then, I'd think.
Yes, I just run it in a loop on a single guest and I am able to see the issue in
20-40 iterations, sometimes maybe sooner. Yeah, you could be right, and this
could definitely happen more often depending on what's happening on the dom0
side.
> > Once you enable KASLR this value changes sometimes and whenever that happens
> > resume gets stuck. Does that make sense?
> >
> > No, it does not resume successfully if the hypercall fails, because I was
> > trying to explicitly reset the vcpu and invoke the hypercall.
> > I am just wondering why the restore logic fails to work here, or probably
> > I am missing a critical piece.
> If you are not using KASLR then xen_vcpu_info is at the same address every
> time you boot. So whatever you registered before hibernating stays the same
> when you boot a second time and register again, and so the comparison in
> xen_vcpu_setup() succeeds. (Mostly by chance.)
That's what I thought too.
> But if KASLR is on, then this comparison not failing should cause the
> xen_vcpu pointer in the loaded image to become bogus, because xen_vcpu is now
> registered for a different xen_vcpu_info address during boot.
The reason for that, I think, is that once you jump into the image that
information is lost. But there is some residue somewhere that's causing the
resume to fail. I haven't been able to pinpoint the exact field value that may
be causing the issue.
Correct me if I am wrong here, but even if I hypothetically put in a hack to
tell the kernel to somehow re-register the vcpu, it won't pass because there is
no hypercall to unregister it in the first place? Can the resumed kernel use
the new values in that case? [Now this is me just throwing out wild guesses!!]
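
To make the scenario above concrete, here is a minimal user-space sketch in C.
It is only a model, not kernel or Xen code: struct model_hv, model_register(),
model_vcpu_setup() and model_resume_consistent() are hypothetical names, the
addresses are made-up numbers, and the "register once, never unregister"
behaviour is an assumption taken from this discussion. It shows why the resume
looks fine when the boot kernel and the image use the same xen_vcpu_info
address, and why the two sides disagree once KASLR moves it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for what the hypervisor remembers about one vCPU. */
struct model_hv {
    bool registered;
    uintptr_t info_addr; /* where the hypervisor keeps writing vcpu_info */
};

/*
 * Assumed register-only interface: there is no unregister, and a second
 * registration for the same vCPU is refused.
 */
static int model_register(struct model_hv *hv, uintptr_t addr)
{
    if (hv->registered)
        return -1;
    hv->registered = true;
    hv->info_addr = addr;
    return 0;
}

/*
 * Models the guest-side check discussed above: if the xen_vcpu pointer
 * already equals the address of xen_vcpu_info, registration is skipped.
 */
static int model_vcpu_setup(struct model_hv *hv, uintptr_t xen_vcpu,
                            uintptr_t xen_vcpu_info)
{
    if (xen_vcpu == xen_vcpu_info)
        return 0;
    return model_register(hv, xen_vcpu_info);
}

/* True when the hypervisor and the resumed image agree on the address. */
static bool model_resume_consistent(uintptr_t boot_info_addr,
                                    uintptr_t image_info_addr)
{
    struct model_hv hv = { false, 0 };

    /* 0 is used as "pointer not set yet", so it is not a valid address. */
    assert(boot_info_addr != 0 && image_info_addr != 0);

    /* Boot kernel: xen_vcpu is unset, so it registers its own address. */
    if (model_vcpu_setup(&hv, 0, boot_info_addr) != 0)
        return false;

    /*
     * After the jump into the image, xen_vcpu holds the value saved at
     * hibernation time, which equals the image's xen_vcpu_info address,
     * so the comparison succeeds and nothing is registered again.
     */
    if (model_vcpu_setup(&hv, image_info_addr, image_info_addr) != 0)
        return false;

    return hv.info_addr == image_info_addr;
}
```

In this model, model_resume_consistent(0x1000, 0x1000) is the no-KASLR case and
returns true, while model_resume_consistent(0x2000, 0x1000) is the KASLR case
and returns false: the hypervisor still writes to the boot kernel's address.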

> >>> Another line of thought is to do what kexec does to get around this
> >>> problem: abuse soft_reset and issue it during syscore_resume, or maybe
> >>> before the image gets loaded.
> >>> I haven't experimented with that yet, as I am assuming there has to be a
> >>> way to re-register vcpus during resume.
> >>
> >> Right, that sounds like it should work.
> >>
> > You mean soft reset or re-register vcpu?
> Doing something along the lines of a soft reset. It should allow you to 
> re-register. Not sure how you can use it without Xen changes though.
No, not without Xen changes. It won't work. I will have Xen changes in place to
test that on our infrastructure.

> -boris


