
Re: [Xen-devel] xm/xl block-detach issue



On Mon, 2011-07-11 at 13:09 -0400, Sébastien Riccio wrote:
> On 10.07.2011 20:19, Daniel Stodden wrote: 
> > Okay, that needs to get fixed, but I don't know where. In XCP that's
> > how it's exclusively done, because it's the most general approach.
> 
> My guess at the moment is that it might be a problem with blktap and
> vhd, or blktap and my kernel combo, or blktap and my kernel combo and
> vhd :)
> 
> The kernel I'm currently playing with is 2.6.39.2-xen-stable + blktap,
> built on a Debian squeeze like this:
> 
> git clone git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen.git linux_xen_2.6.39.x-stable
> cd linux_xen_2.6.39.x-stable
> git checkout -b stable/2.6.39.x origin/stable/2.6.39.x
> git remote add daniel git://xenbits.xensource.com/people/dstodden/linux.git
> git fetch daniel
> git merge daniel/blktap/next-2.6.39
> make menuconfig (removing useless stuff, activating the needed ones)
> make bzImage -j9 ; make modules -j9 ; make modules_install
> [ etc ... ]

I'm not familiar with potential open issues in konrad's 2.6.39 tree, but
I wouldn't expect that to be the problem.

> Playing with it some more this morning, I managed to start a VM with a
> vhd file for the disk (after provisioning it with some files and
> reboots...), with this config:
> 
> box# cat /cloud/data2/configs/vm1.test.cfg 
> bootloader = "/usr/bin/pygrub"
> memory = 1024
> name = "vm1"
> vcpus = 4
> #vif = [ 'ip=10.111.5.10, bridge=trunk0, vifname=vm1.0' ]
> disk = [ 'tap2:vhd:/cloud/data2/machines/vm1.vhd,xvda,w' ]
> root = "/dev/xvda1"
> extra = "fastboot"
> on_poweroff = 'destroy'
> on_reboot = 'restart'
> on_crash = 'restart'

Looks good, although I'm not really good with the 'disk' lines either.
E.g. mine just say tap:aio, tap:vhd, etc.
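
For reference, with that style a disk line would look something like the
following (the path is just a placeholder, keep yours):

disk = [ 'tap:vhd:/path/to/vm1.vhd,xvda,w' ]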
> 
> vm1 is up and rocking
> 
> box# xl list
> Name                                        ID   Mem VCPUs      State   Time(s)
> Domain-0                                     0  1024    16     r-----      26.2
> vm1                                          2  1024     4     -b----       2.8
> 
> But now if I issue a ps -aux in the dom0, it displays some processes
> and then the ps hangs.
> (That was not the case before I started vm1.)
> 
> And if I try to list the attached block devices with xl:
> 
> box# xl block-list 2
> Vdev  BE  handle state evt-ch ring-ref BE-path                       
> Segmentation fault

Ouch. What xen tree are you running? Unstable? What does tap-ctl list
say, does that work? You might want to try the 4.1 tree instead. If that
doesn't help, you'll at least want to get yourself a core dump to get an
idea of what you're after.
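
Something like this should tell you whether the tapdisks themselves are
alive (rough sketch, the exact output format depends on your blktap
userspace):

# tap-ctl list
pid=XXXX minor=0 state=0 args=vhd:/cloud/data2/machines/vm1.vhd

If that hangs or comes back empty for a running vhd, the problem is
probably below xl.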

> (dmesg)
> [ 1592.151122] xl[2292]: segfault at 0 ip 00007f7de314e6d2 sp 00007fff610e30b0 error 4 in libc-2.11.2.so[7f7de3117000+158000]

# ulimit -c unlimited
# xl block-list 2
... core dumped.
# gdb $(which xl) core
(gdb) backtrace

> but works if i try to list it with xm:
> 
> box# xm block-list 2
> Vdev  BE handle state evt-ch ring-ref BE-path
> 51712  0    0     4      23       8     /local/domain/0/backend/vbd/2/51712

Well, different codebase...
> 
> I'll try with something other than vhd to see if the same happens, but
> my goal is to use vhds...

Just check if the tapdisks are all working. You didn't see those
failing. But as long as your guests look happy (xl console, poke
around), your problem is elsewhere.
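
E.g.:

# xl console 2
(log in, poke at /dev/xvda1, run df, touch a few files, etc.)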
> 
> 
> > Can you check if it works with some normal disk? Check out modules,
> > install lvm2, make sure you have dm-linear loaded, etc... You were
> > running a custom kernel, right? You're probably just missing sth. 
> 
> modules lists:
> 
> root@xen-blade15:~# lsmod
> Module                  Size  Used by
> blktap                 17941  8 
> ocfs2                 618206  1 
> quota_tree              7539  1 ocfs2
> ocfs2_dlmfs            17331  1 
> ocfs2_stack_o2cb        3482  1 
> ocfs2_dlm             204671  1 ocfs2_stack_o2cb
> ocfs2_nodemanager     186569  14 ocfs2,ocfs2_dlmfs,ocfs2_stack_o2cb,ocfs2_dlm
> ocfs2_stackglue         7437  3 ocfs2,ocfs2_dlmfs,ocfs2_stack_o2cb
> dm_round_robin          2260  1 
> configfs               21658  2 ocfs2_nodemanager
> crc32c                  2688  8 
> iscsi_tcp               8503  6 
> libiscsi_tcp           11604  1 iscsi_tcp
> libiscsi               34844  2 iscsi_tcp,libiscsi_tcp
> scsi_transport_iscsi    28673  3 iscsi_tcp,libiscsi
> openvswitch_mod        71205  3 
> xenfs                   9815  1 
> xfs                   501098  1 
> ext2                   61369  1 
> sg                     27333  0 
> sr_mod                 14760  0 
> cdrom                  35494  1 sr_mod
> xen_evtchn              4739  2 
> loop                   16002  0 
> tpm_tis                 7821  0 
> tpm                    10878  1 tpm_tis
> i7core_edac            15891  0 
> tpm_bios                4921  1 tpm
> dcdbas                  5416  0 
> edac_core              34483  1 i7core_edac
> evdev                   9374  4 
> usb_storage            43361  0 
> thermal_sys            14045  0 
> pcspkr                  1779  0 
> acpi_processor          5423  0 [permanent]
> button                  4199  0 
> usbhid                 34740  0 
> hid                    78436  1 usbhid
> ext4                  255423  1 
> mbcache                 5434  2 ext2,ext4
> jbd2                   48549  2 ocfs2,ext4
> crc16                   1319  1 ext4
> dm_multipath           16384  2 dm_round_robin
> scsi_dh                 4876  1 dm_multipath
> dm_mod                 63657  7 dm_multipath

Do you have dm-linear (I think it's just linear.ko) available? If not,
it might explain yesterday's kpartx issue.
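
A quick way to check is to list the registered device-mapper targets
(rough sketch, version numbers will differ on your box):

# dmsetup targets
striped          v1.4.1
linear           v1.1.0
error            v1.0.1

If 'linear' doesn't show up there, kpartx won't get very far.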

> sd_mod                 34293  6 
> crc_t10dif              1292  1 sd_mod
> uhci_hcd               21828  0 
> megaraid_sas           70747  3 
> ehci_hcd               37665  0 
> scsi_mod              144719  9 iscsi_tcp,libiscsi,scsi_transport_iscsi,sg,sr_mod,usb_storage,scsi_dh,sd_mod,megaraid_sas
> usbcore               137744  5 usb_storage,usbhid,uhci_hcd,ehci_hcd
> bnx2                   70964  0 
> 
> My vhd storage is on an ocfs2 shared storage attached with multipath
> iscsi. I will try it on a local storage too to eliminate
> that possible cause.

Well, yeah, that's a bit thicker than normally recommended for testing
patchworks, but then again, it doesn't really sound like that's your
most immediate problem.

Cheers,
Daniel


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel


 

