Re: [Xen-devel] PAGE_SIZE (64KB), while block driver 'struct request' deals with < PAGE_SIZE (up to 44KB). Was: Re: [RFC] Support of non-indirect grant backend on 64KB guest
Hi,

On 21/08/15 18:10, Konrad Rzeszutek Wilk wrote:
> On Fri, Aug 21, 2015 at 05:08:35PM +0100, David Vrabel wrote:
>> On 21/08/15 17:05, Konrad Rzeszutek Wilk wrote:
>>>
>>> I have to concur with that. We can't mandate that ARM 64k page MUST use
>>> indirect descriptors.
>>
>> Then it has to be fixed in the block layer to allow < PAGE_SIZE segments
>> and to get the block layer to split requests for blkfront.
>
> Hey Jens,
>
> I am hoping you can help us figure this problem out.
>
> Linux on ARM is capable of using 4KB pages or 64KB pages. Our block
> driver (xen-blkfront) was built with 4KB pages in mind, and without using
> any fancy flags (which some backends lack) the maximum amount of I/O it can
> fit on a ring request is 44KB.
>
> This has the unfortunate effect that when xen-blkfront gets a
> 'struct request' it can contain one page (64KB), which it can't actually
> fit on the ring! And the lowest segment size it advertises is PAGE_SIZE
> (64KB). I believe Julien (who found this) initially tried advertising a
> smaller segment size than PAGE_SIZE (32KB). However, looking at
> __blk_segment_map_sg, it seems to assume the smallest size is PAGE_SIZE,
> which would explain why it did not work.

To be honest, I hadn't tried to see how the block layer would act if I
dropped those checks in blk-settings.c until today. I don't see any
assumption about PAGE_SIZE in __blk_segment_map_sg. However, after dropping
the checks in blk-settings.c (see the quick patch [1]), I got the following
error in the frontend:

bio too big device xvda (128 > 88)
Buffer I/O error on dev xvda, logical block 0, async page read
bio too big device xvda (128 > 88)
Buffer I/O error on dev xvda, logical block 0, async page read

The "bio too big device ..." warning
comes from generic_make_request_checks (linux/block/blk-core.c) and the
stack trace is:

[<fffffe0000096c7c>] dump_backtrace+0x0/0x124
[<fffffe0000096db0>] show_stack+0x10/0x1c
[<fffffe00005885e8>] dump_stack+0x78/0xbc
[<fffffe00000bf7f8>] warn_slowpath_common+0x98/0xd0
[<fffffe00000bf8f0>] warn_slowpath_null+0x14/0x20
[<fffffe00002df304>] generic_make_request_checks+0x114/0x230
[<fffffe00002e0580>] generic_make_request+0x10/0x108
[<fffffe00002e070c>] submit_bio+0x94/0x1e0
[<fffffe00001d573c>] submit_bh_wbc.isra.36+0x100/0x1a8
[<fffffe00001d5bf8>] block_read_full_page+0x320/0x3e8
[<fffffe00001d877c>] blkdev_readpage+0x14/0x20
[<fffffe000014582c>] do_read_cache_page+0x16c/0x1a0
[<fffffe0000145870>] read_cache_page+0x10/0x1c
[<fffffe00002f2908>] read_dev_sector+0x30/0x9c
[<fffffe00002f3d84>] msdos_partition+0x84/0x554
[<fffffe00002f38e4>] check_partition+0xf8/0x21c
[<fffffe00002f2f28>] rescan_partitions+0xb0/0x2a4
[<fffffe00001d98b0>] __blkdev_get+0x228/0x34c
[<fffffe00001d9a14>] blkdev_get+0x40/0x364
[<fffffe00002f0b6c>] add_disk+0x398/0x424
[<fffffe00003d8500>] blkback_changed+0x1200/0x152c
[<fffffe000036a954>] xenbus_otherend_changed+0x9c/0xa4
[<fffffe000036c984>] backend_changed+0xc/0x18
[<fffffe000036a088>] xenwatch_thread+0xa0/0x13c
[<fffffe00000d98d0>] kthread+0xd8/0xf0

The fs buffer code seems to assume that the block driver will always
support at least a bio of PAGE_SIZE.

> One way to make this work is for the driver (xen-blkfront) to split
> the 'struct request' I/O in two internal requests.
>
> But this has to be a normal problem. Surely there are other drivers
> (MMC?) that can't handle PAGE_SIZE and have to break their I/Os.
> Would it make sense for the common block code to be able to deal
> with this?

It will take me a bit of time to fully understand the block layer, as the
changes don't seem trivial from my POV (I don't have any knowledge of it).
So I will wait for feedback from Jens before going further with this
approach.
Regards,

[1] patch:

diff --git a/block/blk-settings.c b/block/blk-settings.c
index e0057d0..ac024e7 100644
--- a/block/blk-settings.c
+++ b/block/blk-settings.c
@@ -251,12 +251,15 @@ EXPORT_SYMBOL(blk_queue_bounce_limit);
  **/
 void blk_limits_max_hw_sectors(struct queue_limits *limits, unsigned int max_hw_sectors)
 {
+#if 0
 	if ((max_hw_sectors << 9) < PAGE_CACHE_SIZE) {
 		max_hw_sectors = 1 << (PAGE_CACHE_SHIFT - 9);
 		printk(KERN_INFO "%s: set to minimum %d\n",
 		       __func__, max_hw_sectors);
 	}
+#endif
+
 	limits->max_sectors = limits->max_hw_sectors = max_hw_sectors;
 }
 EXPORT_SYMBOL(blk_limits_max_hw_sectors);
@@ -351,11 +354,14 @@ EXPORT_SYMBOL(blk_queue_max_segments);
  **/
 void blk_queue_max_segment_size(struct request_queue *q, unsigned int max_size)
 {
+#if 0
 	if (max_size < PAGE_CACHE_SIZE) {
 		max_size = PAGE_CACHE_SIZE;
 		printk(KERN_INFO "%s: set to minimum %d\n",
 		       __func__, max_size);
 	}
+#endif
+
 	q->limits.max_segment_size = max_size;
 }
@@ -777,11 +783,14 @@ EXPORT_SYMBOL_GPL(blk_queue_dma_drain);
  **/
 void blk_queue_segment_boundary(struct request_queue *q, unsigned long mask)
 {
+#if 0
 	if (mask < PAGE_CACHE_SIZE - 1) {
 		mask = PAGE_CACHE_SIZE - 1;
 		printk(KERN_INFO "%s: set to minimum %lx\n",
 		       __func__, mask);
 	}
+#endif
+
 	q->limits.seg_boundary_mask = mask;
 }

-- 
Julien Grall

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel