[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [Xen-devel] [RFC PATCH] xen-block: introduces extra request to pass-through SCSI commands
On 29/02/16 04:37, Bob Liu wrote: > 1) What is this patch about? > This patch introduces an new block operation (BLKIF_OP_EXTRA_FLAG). > A request with BLKIF_OP_EXTRA_FLAG set means the following request is an > extra request which is used to pass through SCSI commands. > This is like a simplified version of XEN_NETIF_EXTRA_* in netif.h. > It can be extended easily to transmit other per-request/bio data from frontend > to backend e.g Data Integrity Field per bio. > > 2) Why we need this? > Currently only raw data segments are transmitted from blkfront to blkback, > which > means some advanced features are lost. > * Guest knows nothing about features of the real backend storage. > For example, on bare-metal environment INQUIRY SCSI command can be used > to query storage device information. If it's a SSD or flash device we > can have the option to use the device as a fast cache. > But this can't happen in current domU guests, because blkfront only > knows it's just a normal virtual disk > > * Failover Clusters in Windows > Failover clusters require SCSI-3 persistent reservation target disks, > but now this can't work in domU. > > 3) Known issues: > * Security issues, how to 'validate' this extra request payload. > E.g SCSI operates on LUN bases (the whole disk) while we really just want > to > operate on partitions It's not only validation: some operations just affect the whole LUN (e.g. Reserve/Release). And what about "multi-LUN" commands like "report LUNs"? > * Can't pass SCSI commands through if the backend storage driver is bio-based > instead of request-based. > > 4) Alternative approach: Using PVSCSI instead: > * Doubt PVSCSI can support as many type of backend storage devices as > Xen-block. pvSCSI won't need to support all types of backends. It's enough to support those where passing through SCSI commands makes sense. Seems to be a similar issue as the above mentioned problem with bio-based backend storage drivers. > * Much longer path: > ioctl() -> SCSI upper layer -> Middle layer -> PVSCSI-frontend -> > PVSCSI-backend -> Target framework(LIO?) -> > > With xen-block we only need: > ioctl() -> blkfront -> blkback -> I'd like to see performance numbers before making a decision. > * xen-block has been existed for many years, widely used and more stable. Adding another SCSI passthrough capability wasn't accepted for pvSCSI (that's the reason I used the Target Framework). Why do you think it will be accepted for pvblk? This is not my personal opinion, just a heads up from someone who had a try already. ;-) Juergen > Welcome any input, thank you! > > Signed-off-by: Bob Liu <bob.liu@xxxxxxxxxx> > --- > xen/include/public/io/blkif.h | 73 > +++++++++++++++++++++++++++++++++++++++++ > 1 file changed, 73 insertions(+) > > diff --git a/xen/include/public/io/blkif.h b/xen/include/public/io/blkif.h > index 99f0326..7c10bce 100644 > --- a/xen/include/public/io/blkif.h > +++ b/xen/include/public/io/blkif.h > @@ -635,6 +635,28 @@ > #define BLKIF_OP_INDIRECT 6 > > /* > + * Recognised only if "feature-extra-request" is present in backend xenbus > info. > + * A request with BLKIF_OP_EXTRA_FLAG indicates an extra request is followed > + * in the shared ring buffer. > + * > + * By this way, extra data like SCSI command, DIF/DIX and other > per-request/bio > + * data can be transmitted from frontend to backend. > + * > + * The 'wire' format is like: > + * Request 1: xen_blkif_request > + * [Request 2: xen_blkif_extra_request] (only if request 1 has > BLKIF_OP_EXTRA_FLAG) > + * Request 3: xen_blkif_request > + * Request 4: xen_blkif_request > + * [Request 5: xen_blkif_extra_request] (only if request 4 has > BLKIF_OP_EXTRA_FLAG) > + * ... > + * Request N: xen_blkif_request > + * > + * If a backend does not recognize BLKIF_OP_EXTRA_FLAG, it should *not* > create the > + * "feature-extra-request" node! > + */ > +#define BLKIF_OP_EXTRA_FLAG (0x80) > + > +/* > * Maximum scatter/gather segments per request. > * This is carefully chosen so that sizeof(blkif_ring_t) <= PAGE_SIZE. > * NB. This could be 12 if the ring indexes weren't stored in the same page. > @@ -703,10 +725,61 @@ struct blkif_request_indirect { > }; > typedef struct blkif_request_indirect blkif_request_indirect_t; > > +enum blkif_extra_request_type { > + BLKIF_EXTRA_TYPE_SCSI_CMD = 1, /* Transmit SCSI command. */ > +}; > + > +struct scsi_cmd_req { > + /* > + * Grant mapping for transmiting SCSI command to backend, and > + * also receive sense data from backend. > + * One 4KB page is enough. > + */ > + grant_ref_t cmd_gref; > + /* Length of SCSI command in the grant mapped page. */ > + unsigned int cmd_len; > + > + /* > + * SCSI command may require transmiting data segment length less > + * than a sector(512 bytes). > + * Record num_sg and last segment length in extra request so that > + * backend can know about them. > + */ > + unsigned int num_sg; > + unsigned int last_sg_len; > +}; > + > +/* > + * Extra request, must follow a normal-request and a normal-request can > + * only be followed by one extra request. > + */ > +struct blkif_request_extra { > + uint8_t type; /* BLKIF_EXTRA_TYPE_* */ > + uint16_t _pad1; > +#ifndef CONFIG_X86_32 > + uint32_t _pad2; /* offsetof(blkif_...,u.extra.id) == 8 */ > +#endif > + uint64_t id; > + struct scsi_cmd_req scsi_cmd; > +} __attribute__((__packed__)); > +typedef struct blkif_request_extra blkif_request_extra_t; > + > +struct scsi_cmd_res { > + unsigned int resid_len; > + /* Length of sense data returned in grant mapped page. */ > + unsigned int sense_len; > +}; > + > +struct blkif_response_extra { > + uint8_t type; /* BLKIF_EXTRA_TYPE_* */ > + struct scsi_cmd_res scsi_cmd; > +} __attribute__((__packed__)); > + > struct blkif_response { > uint64_t id; /* copied from request */ > uint8_t operation; /* copied from request */ > int16_t status; /* BLKIF_RSP_??? */ > + struct blkif_response_extra extra; > }; > typedef struct blkif_response blkif_response_t; > > _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxx http://lists.xen.org/xen-devel
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |