
[Xen-devel] Re: [PATCH 2 of 3] interface: Flesh out the BLKIF_OP_DISCARD description



On Thu, Oct 13, 2011 at 09:00:07AM +0100, Ian Campbell wrote:
> Thanks for splitting these out.
> 
> On Wed, 2011-10-12 at 23:12 +0100, Konrad Rzeszutek Wilk wrote:
> [...]
> > + * The backend can optionally provide two extra XenBus attributes to
> > + * further optimize the discard functionality:
> > + * 'discard-alignment' - Devices that support discard functionality may
> > + * internally allocate space in units that are bigger than the exported
> > + * logical block size. The discard-alignment parameter indicates how many
> > + * bytes the beginning of the partition is offset from the internal
> > + * allocation unit's natural alignment.
> 

[note: I copied the Documentation/ABI/testing/sysfs-block contents]

> So this is to account for the case where a physical device can discard
> e.g. 128K blocks at a time but the VBD (a better term than "partition"
> in the context, I think) starts at e.g. offset 64K within that
> underlying device?

Yes. And tools such as fdisk/gparted can take advantage of that
and create the partitions^WVBDs at the proper spots.
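
To make that concrete, a quick sketch with the numbers from your
example above (128K internal allocation units, VBD starting 64K into
one of them; the values are hypothetical):

	/* Hypothetical values matching the example above. */
	unsigned long granularity = 128 * 1024; /* discard-granularity (bytes) */
	unsigned long vbd_start   =  64 * 1024; /* VBD offset on the device    */

	/* discard-alignment: how far the start of the VBD is offset from
	 * the allocation unit's natural alignment. */
	unsigned long alignment = vbd_start % granularity; /* 64K here */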

> 
> Does this mean that the virtual device can discard the first 64K (and
> thereafter in 128K chunks), or that it cannot because that would overlap
> the first 64K of that block which belongs to something else? Or that it
> can try but it may or may not succeed? What about if the secure flag is
> set?

They are all "best effort": we will try, but we might fail.
> 
> Could we simplify and say that blkback won't expose discard support
> unless the underlying block device is correctly aligned for it? i.e.

I am not sure how we would do that. The discard support works on
full devices - not on LVM volumes, not on partitions. So if the user
does not set up the partitions correctly, it will still try to discard,
just not do a very good job of it.

The current way that Linux reports that the alignment is off is by
exporting the discard_alignment attribute as -1 if the device is
improperly aligned (/sys/block/sda/discard_alignment).
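
So a consumer of that attribute only has to read the value and treat
anything negative as "misaligned". A rough sketch (the device name is
just an example):

	#include <stdio.h>

	int main(void)
	{
		long align = -1;
		FILE *f = fopen("/sys/block/sda/discard_alignment", "r");

		if (f) {
			if (fscanf(f, "%ld", &align) != 1)
				align = -1;
			fclose(f);
		}
		if (align >= 0)
			printf("start is offset %ld bytes from a granule boundary\n",
			       align);
		else
			printf("improperly aligned (or attribute missing)\n");
		return 0;
	}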

> encourage people to align their underlying storage correctly? Presumably
> doing that has other benefits?

That happens automatically if the user partitions with recent versions
of tools like parted/fdisk, which honor these alignment hints.
> 
> > + * 'discard-granularity' - Devices that support discard functionality may
> > + * internally allocate space using units that are bigger than the logical
> > + * block size. The discard-granularity parameter indicates the size of the
> > + * internal allocation unit in bytes if reported by the device. Otherwise
> > + * the discard-granularity will be set to match the device's physical
> > + * block size.
> 
> This is effectively the minimum size you can discard? (modulo the
> sub-block at the front arising from discard-alignment).

Yes.
> 
> Presumably the granularity-sized blocks are self-aligned to that same
> granularity? (again modulo the sub-block at the beginning).

Yes.
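
Which means a frontend that wants its discards to actually cover whole
allocation units could trim the request to granule boundaries first.
Just a sketch (not part of the patch) - both values are the byte
quantities from the XenBus attributes, and offsets are relative to the
start of the VBD:

	#include <stdint.h>

	struct range { uint64_t start, len; };

	/* Shrink [start, start + len) to whole granules; len == 0 in the
	 * result means no full granule fits. Granule boundaries sit at
	 * (n * granularity - alignment) within the VBD. */
	static struct range trim_to_granules(uint64_t start, uint64_t len,
					     uint64_t granularity,
					     uint64_t alignment)
	{
		uint64_t end = start + len;
		/* first granule boundary at or after 'start' */
		uint64_t first = start +
			(granularity - (start + alignment) % granularity) % granularity;
		/* last granule boundary at or before 'end' */
		uint64_t last = end - (end + alignment) % granularity;
		struct range r = { first, last > first ? last - first : 0 };
		return r;
	}

In your example (granularity 128K, alignment 64K) a discard of the
first 256K of the VBD shrinks to [64K, 192K) - exactly one granule.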

> 
> Would there be any benefit to having both these numbers in logical-block
> sized units instead of bytes? The rest of the interface typically uses
> sectors/segments.

Uhh, I would prefer not to - the frontend would have to convert those
values back to bytes when providing them to the block API, and the
backend would have had to convert from bytes to sectors/segments in the
first place.
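
With bytes on the wire the frontend can feed the values straight into
the queue limits, which are in bytes already. Roughly what blkfront
would end up doing (a sketch, error handling omitted):

	unsigned int discard_granularity, discard_alignment;
	struct request_queue *rq = info->rq;	/* blkfront's queue */
	int err;

	err = xenbus_gather(XBT_NIL, info->xbdev->otherend,
			    "discard-granularity", "%u", &discard_granularity,
			    "discard-alignment", "%u", &discard_alignment,
			    NULL);
	if (!err) {
		queue_flag_set_unlocked(QUEUE_FLAG_DISCARD, rq);
		rq->limits.discard_granularity = discard_granularity; /* bytes */
		rq->limits.discard_alignment   = discard_alignment;   /* bytes */
	}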

But this got me thinking - I don't think we actually figure out the
correct block size; we just hard-code 512. But then I am not sure what
Linux is doing either:

scsi 2:0:0:0: Direct-Access     ATA      INTEL SSDSA2M080 2CV1 PQ: 0 ANSI: 5
sd 2:0:0:0: [sda] 156301488 512-byte logical blocks: (80.0 GB/74.5 GiB)
sd 2:0:0:0: Attached scsi generic sg0 type 0
scsi 3:0:0:0: Direct-Access     ATA      ST3250410AS      3.AA PQ: 0 ANSI: 5
sd 3:0:0:0: [sdb] 488397168 512-byte logical blocks: (250 GB/232 GiB)
sd 3:0:0:0: Attached scsi generic sg1 type 0

And logical_block_size is 512, discard_granularity is 512, and
discard_alignment is zero.
