xen-devel
Re: [Xen-devel] Greater than 16 xvd devices for blkfront
* Ian Jackson (Ian.Jackson@xxxxxxxxxxxxx) wrote:
> Chris Wright writes ("Re: [Xen-devel] Greater than 16 xvd devices for
> blkfront"):
> > * Daniel P. Berrange (berrange@xxxxxxxxxx) wrote:
> > > + default:
> > > + if (major > 202) {
> > > + minor += (16 * 16 * (major - 202));
> > > + major = 202;
> > > + }
> > > + }
>
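For anyone reading the patch fragment above out of context, here is a minimal standalone sketch of what the quoted `default:` branch appears to do: any major above 202 is folded back into major 202 by pushing the excess into the minor. The `struct devnum` type and the `fold_major` name are made up for illustration and are not part of the patch.

```c
#include <assert.h>

#define XVD_MAJOR 202u  /* Xen VBD major, as in the quoted patch */

/* Hypothetical container for a (major, minor) pair. */
struct devnum {
    unsigned int major;
    unsigned int minor;
};

/*
 * Fold majors above 202 into the minor space of major 202.
 * Each extra major contributes 16 * 16 = 256 minors, i.e. 16 more
 * disks with 16 minors apiece, mirroring the arithmetic in the patch.
 */
static struct devnum fold_major(struct devnum d)
{
    assert(d.major >= XVD_MAJOR);  /* other majors take other branches */
    if (d.major > XVD_MAJOR) {
        d.minor += 16u * 16u * (d.major - XVD_MAJOR);
        d.major = XVD_MAJOR;
    }
    return d;
}
```

Under this reading, a xenstore value that looks like major 203, minor 0 ends up as major 202, minor 256, which would be the seventeenth disk.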
> The root cause of the problem is the incorporation of the Linux device
> numbering scheme into the xenstore protocol, which is wrong I think.
> What Daniel's excellent if rather unpleasant suggestion is doing is to
> regard the xenstore number not as a `Linux device number' but rather
> as a crazy encoding of the disk number.
>
> I think this is fine but it would be good if we could think about what
> the new crazy encoding is, and document it. I infer that in Daniel's
> suggestion it's:
>
> xenstore number = (202 << 8) + (actual disk number << 4)
> | partition number
>
> where the actual disk number starts at 0 for xvda and partition
> numbers are 0 for whole disk or 1..15.
>
> Daniel's solution still doesn't work for partitions >15. Perhaps,
I think that's OK, and effectively a hard limitation w.r.t. lanana:
  202 block     Xen Virtual Block Device
                  0 = /dev/xvda       First Xen VBD whole disk
                 16 = /dev/xvdb       Second Xen VBD whole disk
                 32 = /dev/xvdc       Third Xen VBD whole disk
                    ...
                240 = /dev/xvdp       Sixteenth Xen VBD whole disk

                Partitions are handled in the same way as for IDE
                disks (see major number 3) except that the limit on
                partitions is 15.
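To make the arithmetic concrete, here is a rough sketch of the encoding Ian inferred, checked against the lanana minors above (disk 1 is minor 16, disk 15 is minor 240). The function names are hypothetical; only the constants 202, the 4-bit partition field and the limit of 15 come from the thread.

```c
#include <assert.h>
#include <stdint.h>

#define XVD_MAJOR     UINT32_C(202)  /* from the lanana entry */
#define XVD_PART_BITS 4u             /* 16 minors per disk */
#define XVD_MAX_PART  UINT32_C(15)   /* 0 = whole disk, 1..15 = partitions */

/* xenstore number = (202 << 8) + (disk << 4) | partition */
static uint32_t xvd_encode(uint32_t disk, uint32_t part)
{
    assert(part <= XVD_MAX_PART);  /* partitions > 15 cannot be expressed */
    return (XVD_MAJOR << 8) + ((disk << XVD_PART_BITS) | part);
}

/* Disk index, counting from 0 for xvda. */
static uint32_t xvd_disk(uint32_t num)
{
    return (num - (XVD_MAJOR << 8)) >> XVD_PART_BITS;
}

/* Partition number, 0 meaning the whole disk. */
static uint32_t xvd_part(uint32_t num)
{
    return num & XVD_MAX_PART;
}
```

Note that disk 16 encodes to (203 << 8), which is exactly the value an old guest would read as major 203, minor 0.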
> given that old guests are going to break anyway, we should consider a
> different scheme? Since disks and partitions not supported by the
> old encoding won't work on old guests anyway, we can use a completely
> new encoding for that case provided only that it doesn't use numbers
> of the form (202 << 8) | something
Well, we don't actually need 202, or any minor numbers at all. The major
is only needed for the case where xvd masquerades as IDE or SCSI.
We ripped this wart out for upstream Linux. And the guest can happily
dynamically allocate minor numbers on its own behalf. A disk discovery
event can be completely dynamic; the admin just wouldn't be able to
guarantee which minor slot gets allocated for a particular disk in
a guest. We do have mount by label or UUID for that case.
> Presumably we can safely use at least 31 bits. If we reserve one to
> indicate that this is the new encoding that leaves us with 30 which
> should be enough for a reasonable number of disks with many
> partitions each.
>
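One way the flagged encoding suggested above could look, purely as a sketch: the thread only says one of 31 bits is reserved as a marker and 30 remain. Which bit is the marker and how the 30 bits are split between disk and partition are my assumptions here (bit 30 and an 8-bit partition field), as are all the names.

```c
#include <assert.h>
#include <stdint.h>

#define NEW_FLAG      (UINT32_C(1) << 30)  /* assumed marker bit */
#define NEW_PART_BITS 8u                   /* assumed split, not from the thread */
#define NEW_DISK_BITS (30u - NEW_PART_BITS)

/* Pack disk and partition into the 30 payload bits and set the marker. */
static uint32_t new_encode(uint32_t disk, uint32_t part)
{
    assert(disk < (UINT32_C(1) << NEW_DISK_BITS));
    assert(part < (UINT32_C(1) << NEW_PART_BITS));
    return NEW_FLAG | (disk << NEW_PART_BITS) | part;
}

/* Nonzero if the marker bit says this is the new-style number. */
static int is_new_encoding(uint32_t num)
{
    return (num & NEW_FLAG) != 0;
}

static uint32_t new_disk(uint32_t num)
{
    return (num & ~NEW_FLAG) >> NEW_PART_BITS;
}

static uint32_t new_part(uint32_t num)
{
    return num & ((UINT32_C(1) << NEW_PART_BITS) - 1u);
}
```

With the marker in bit 30, no new-style value can be mistaken for (202 << 8) | something, since those old values all fit in 16 bits.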
> > I didn't think of handling overflow (since the majors for scsi/ide/etc
> > were involved, I expected that to fail). But, aside from crashing an
> > older guest with > 16 disks (not ideal, but I think it's possible
> > already with 0x format), seems good.
>
> If a guest takes the xenstore number to be the concatenation of its
> own major and minor numbers then obviously it is leaving itself open
> to breaking in the future. dom0 admins will just have to Not Do That
> Then. (It's a shame, if true, that the guests don't have actual error
> checking.)
Agreed.
thanks,
-chris
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel