
Re: [Xen-devel] [RFC V7 2/3] libxl domain snapshot API design




>>> On 10/21/2014 at 12:11 AM, in message 
>>> <1413821501.29506.13.camel@xxxxxxxxxx>,
Ian Campbell <Ian.Campbell@xxxxxxxxxx> wrote: 
> On Fri, 2014-10-10 at 16:48 +0800, Chunyan Liu wrote: 
>  
> > int libxl_domain_snapshot_create(libxl_ctx *ctx, int domid, 
> >                                  libxl_domain_snapshot_args *snapshot, 
> >                                  bool live) 
> >  
> >     Creates a new snapshot of a domain based on the snapshot config 
> >     contained in @snapshot. Saves the domain and takes disk snapshots. 
> >  
> >     ctx (INPUT): context 
> >     domid (INPUT):  domain id 
> >     snapshot (INPUT): configuration of domain snapshot 
> >     live (INPUT):   live snapshot or not 
> >     Returns: 0 on success, -1 on failure 
> >  
> >     ctx: 
> >        context. 
> >  
> >     domid: 
> >        If domain is active, this is the domid of the domain. 
> >        If domain is inactive, set domid=-1. Only disk-only snapshot can be 
> >        performed in that case. 
> >     live: 
> >        true or false. 
> >        when live is 'true', the domain is not paused while creating the 
> >        snapshot, like live migration. This increases the size of the memory 
> >        dump file, but reduces downtime of the guest. 
>  
> >  Only support this flag during external checkpoints. 
>  
> Why?

This refers to the libvirt qemu implementation. I think the reason is time:
with an external snapshot it only needs to create a qcow2 image that
references the original image as its backing file, then switch the VM to the
new qcow2 image and resume. That is much quicker than doing an internal
disk snapshot, and for a live snapshot one certainly wants the VM downtime
to be short. This is my guess, though; please point out any different ideas.
 
>  
> Even if valid for the planned implementation I don't think it belongs in 
> this sort of high level design. There should be an error value 
> indicating that a live checkpoint is not possible, which is the right 
> place to encode this behaviour. 
>  
> >     snapshot: 
> >        memory: 
> >            true or false. 
> >            'false' means disk-only, won't save memory state. 
> >            'true' means saving memory state. Memory would be saved in 
> >            'memory_path'. 
> >        memory_path: 
> >            path to save memory file. NULL when 'memory' is false. 
> >        num_disks: 
> >            number of disks that need to take disk snapshot. 
> >        disks: 
> >            array of disk snapshot configuration. Has num_disks members. 
> >            libxl_device_disk: 
> >                structure to represent which disk. 
> >            name: 
> >                snapshot name. 
>  
> How is this used? Does it get stored somewhere by libxl? 
>  
> >            external: 
> >                true or false. 
> >                'false' means internal disk snapshot. external_format and 
> >                external_path will be ignored. 
> >                'true' means external disk snapshot, then external_format 
> >                and external_path should be provided. 
> >           external_format: 
> >               should be provided when 'external' is true. If not provided, 
> >               will use default 'qcow2'. 
>  
> I think this should say: will use a default appropriate to the disk 
> backend and format of the underlying disk image in use. 
>  
> >               ignored when 'external' is false. 
> >           external_path: 
> >               must be provided when 'external' is true. 
> >               ignored when 'external' is false. 
> >  
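As a concreteness check on the fields above, a caller-side sketch might look
like this. It is only pseudocode against the proposed API: the struct layout,
the array-style 'disks' member, 'xvda', the error constant and all paths are
assumptions for illustration, not an existing libxl interface.

```c
/* Pseudocode against the *proposed* API above -- not compilable today.
 * Field names follow the design; 'xvda', the paths and the error
 * handling are made up. */
libxl_domain_snapshot_args args;

args.memory      = true;                          /* also save memory state */
args.memory_path = "/var/lib/xen/snapshots/vm1.save";
args.num_disks   = 1;
args.disks[0].disk            = xvda;             /* a libxl_device_disk */
args.disks[0].name            = "snap1";
args.disks[0].external        = true;
args.disks[0].external_format = "qcow2";
args.disks[0].external_path   = "/var/lib/xen/images/vm1-snap1.qcow2";

int rc = libxl_domain_snapshot_create(ctx, domid, &args, true /* live */);
if (rc) {
    /* e.g. a hypothetical ERROR_LIVE_CHECKPOINT_NOT_POSSIBLE, per Ian's
     * suggestion: the caller could retry with live = false */
}
```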
> >  
> > int libxl_domain_snapshot_delete(libxl_ctx *ctx, int domid, 
> >                                  libxl_domain_snapshot_args *snapshot); 
> >  
> >     Delete a snapshot. 
> >     This will delete the related domain and related disk snapshots. 
>  
> I think last time we agreed that this operation could not "delete the 
> related domain" because it mustn't be active, and therefore libxl 
> doesn't know about it and that the management of the snapshot storage 
> was a matter for the toolstack's storage management layer, not libxl. 
>  
> I think we ended up proposing a scheme where there was an API which the 
> toolstack could use to tell libxl that a snapshot in an active domain's 
> snapshot chain was to be changed/has changed, so that it could rescan 
> and make any necessary adjustments. 
>  
> I think this is what we were discussing here: 
> http://lists.xen.org/archives/html/xen-devel/2014-09/msg01541.html 
>  
> >  
> >     ctx (INPUT): context 
> >     domid (INPUT): domain id 
> >     snapshot (INPUT): domain snapshot related info 
> >     Returns: 0 on success, -1 on error. 
> >  
> >     About each input, explanation is the same as 
> >     libxl_domain_snapshot_create. 
> >  
> > int libxl_domain_snapshot_revert(libxl_ctx *ctx, int domid,
> >                                  libxl_domain_snapshot_args *snapshot); 
> >  
> >     Revert the domain to a given snapshot. 
> >  
> >     Normally, the domain will revert to the same state the domain was in 
> >     when the snapshot was taken (whether inactive, running, or paused). 
>  
> I don't think inactive makes sense in this interface, there should be no 
> way to create a libxl snapshot of an inactive domain, therefore any 
> reversion to that state will not involve libxl. 

One case is in libvirt: it creates a snapshot, then destroys the domain, but
the domain still exists (inactive). In this case one can still do
snapshot-revert. But maybe we shouldn't include that in libxl, and should
let libvirt handle this case itself.

>  
> Is this operation any different to destroying the domain and using 
> libxl_domain_restore to start a new domain based on the snapshot? Is 
> this operation just a convenience layer over that operation? 

It depends on the implementation. The simple way is to destroy the domain
first, then start a new domain based on the snapshot. But destroying the
domain may not be good for the user (after xl snapshot-revert the domid
changes) and may cause problems in libvirt (it may affect its event
handling?).

Another way is to keep the domain and go through a process like: pause
domain, reload memory, reload disk snapshot, reload config file, resume
domain. More complex, but maybe better. In a previous talk with Jim, he
personally suggested it should not destroy the domain.

>  
> >  
> >     ctx (INPUT): context 
> >     domid (INPUT): domain id 
> >     snapshot (INPUT): snapshot 
> >     Returns: 0 on success, -1 on error. 
> >  
> >     About each input, explanation is the same as 
> >     libxl_domain_snapshot_create. 
> >  
> > 3. Function Implementation 
> >  
> >    libxl_domain_snapshot_create: 
> >        1). check args validation. If domain is inactive, only disk 
> >            snapshot can be done; libxl_domain_snapshot_args:memory should 
> >            be 'false'. 
>  
> I think we discussed last time that if the domain is inactive then libxl 
> doesn't know anything about it and cannot be expected to snapshot it. In 
> this case I think the toolstack's (e.g. libvirt's) storage management is 
> responsible for taking a disk snapshot, libxl is not involved. 
> >        2). if it is not disk-only, save domain memory through save-domain 
> >        3). take disk snapshot by qmp command (if domain is active) or 
> >            qemu-img command (if domain is inactive). 
> >  
> >    libxl_domain_snapshot_delete: 
> >        1). check args validation 
> >        2). remove memory state file if it's not disk-only. 
> >        3). delete disk snapshot. (for internal disk snapshot, through qmp 
> >            command or qemu-img command) 
> >  
> >    libxl_domain_snapshot_revert: 
> >        This may need to hack current libxl code. Could be (?): 
> >        1). pause domain 
> >        2). reload memory 
> >        3). apply disk snapshot. 
> >        4). restore domain config file 
> >        5). resume. 
>  
>  
>  
>  
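The five revert steps at the end could be sketched as follows. This is pure
pseudocode: every helper name in it is a placeholder for whatever the
implementation ends up doing, not an existing libxl function.

```c
/* Pseudocode for the proposed no-destroy revert; all helpers below are
 * placeholders, not existing libxl functions. */
int libxl_domain_snapshot_revert(libxl_ctx *ctx, int domid,
                                 libxl_domain_snapshot_args *snapshot)
{
    int rc;

    rc = pause_domain(ctx, domid);                          /* 1). pause   */
    if (!rc) rc = reload_memory(ctx, domid, snapshot);      /* 2). memory  */
    if (!rc) rc = apply_disk_snapshots(ctx, domid,          /* 3). disks   */
                                       snapshot->disks,
                                       snapshot->num_disks);
    if (!rc) rc = restore_domain_config(ctx, domid);        /* 4). config  */
    if (!rc) rc = resume_domain(ctx, domid);                /* 5). resume  */
    /* on failure the domain is left paused for the caller to inspect */
    return rc;
}
```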




 

