|
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [Xen-devel] RFC v1: Xen block protocol overhaul - problem statement (with pictures!)
> > > Which has a nice power of two ring to it. (Ah the puns!)
> > >
> > > I like the idea of putting the request on a diet - but too much
> > > could cause us to miss the opportunity to insert other flags on it.
> > > If I recall correctly, the DIF/DIX only need 8 bytes of data.
> > > If we make the assumption that:
> > > I/O request = one ring entry
> >
> > So we only need to reserve 8bytes for each DIF/IDX, even if the request
> > contains a variable number of data? (I mean, block requests can at a
> > minimum contain 4096bytes, or much more)
>
> I need to double check with Martin (CC-ed here). But my recollection
> is that it is just attached the the 'bio'. So if the BIO is 4K or 1MB -
> it would only have one DIF/DIX data type.
And that is semi-correct. If the user did a horrible job (say using
dd) the pages are chained together - and we end up with a link list
of bio's. The last bio would point to a page filled with 'sector's worth
of data has a checksum. Each checksum occupies 8 bytes. So if the
total 'bio' length is say 1MB, this last page is filled with 256 of
checksums - so 2048 bytes of data.
>
> Hmm, but then we operate on the 'struct request' so that might not
> be the case..
> >
> > > and the "one ring entry" can use the the '4' grants if we just have a
> > > 16KB I/O request, but if it is more than that - we use the indirect page
> >
> > Well, on my purpose I've limited the number of segments of a "rw"
> > requests to 2, so it's only 8K, anything bigger has to use indirect
> > descriptors, which can fit 4M of data (because I'm passing 4 grant
> > frames full of "blkif_request_indirect_entry" entries).
>
> <nods>
> >
> > > and can stuff 1MB of data in there.
> > > The extra 32-bytes of space for such things as 'DIF/DIX'. This also
> > > means we could unify the 'struct request' with the 'discard' operation
> > > and it could utilize the 32-bytes of extra unused payload data.
> > >
> > >>>
> > >>>
> > >>> The âoperationâ would be BLKIF_OP_INDIRECT. The read/write/discard,
> > >>> etc operation would now be in indirect.op. The indirect.gref points to
> > >>> a page that is filled with:
> > >>>
> > >>>
> > >>> struct blkif_request_indirect_entry {
> > >>> blkif_sector_t sector_number;
> > >>> struct blkif_request_segment seg;
> > >>> } __attribute__((__packed__));
> > >>> //16 bytes, so we can fit in a page 256 of these structures.
> > >>>
> > >>>
> > >>> This means that with the existing 36 slots in the ring (single page)
> > >>> we can cover: 32 slots * each blkif_request_indirect covers: 256 * 4096
> > >>> ~= 32M. If we donât want to use indirect descriptor we can still use
> > >>> up to 4 pages of the request (as it has enough space to contain four
> > >>> segments and the structure will still be cache-aligned).
> > >>>
Martin asked me why we even do this via these entries. Meaning why
have this tuple of information for each page: <lba, first_sect, last_sect,
gref>.
The lba on the next subsequent indirect entry is going to be incremented by
one. The first_sect and last_sect too... So why not just do:
struct blkif_request_indirect {
uint8_t operation;
blkif_vdev_t handle; /* only for read/write requests */
#ifdef CONFIG_X86_64
uint32_t _pad1; /* offsetof(blkif_request,u.rw.id) == 8 */
#endif
uint64_t id; /* private guest value, echoed in resp */
blkif_sector_t sector_number;/* start sector idx on disk (r/w only) */
grant_ref_t indirect_desc;
uint16_t nr_elems;
}
And the 'indirect_desc' would point to a page that looks quite close to
what the scatterlist looks like:
struct indirect_chain {
uint16_t op_flag; //*Can D_NEXT, D_START, D_END ?
uint16_t next;
uint16_t offset;
uint16_t length;
uint32_t gref;
uint32_t _pad; // Need this in case we ever
want to
// make gref + _pad be a
physical addr.
}
And the page itself would be:
struct indirect_chain[256];
the 'next' would just contain the index inside in indirect_chain page - so from
0->256. The offset and length would reference wherein the page the data is
contained.
This way the 'lba' information is part of the 'blkif_request_indirect' and the
payload info is all in the indirect descriptors.
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel
|
![]() |
Lists.xenproject.org is hosted with RackSpace, monitoring our |