[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [Xen-devel] [PATCH 3/3] xen-blkfront: dynamic configuration of per-vbd resources
On 07/22/2016 07:45 PM, Roger Pau Monné wrote: > On Fri, Jul 22, 2016 at 05:43:32PM +0800, Bob Liu wrote: >> >> On 07/22/2016 05:34 PM, Roger Pau Monné wrote: >>> On Fri, Jul 22, 2016 at 04:17:48PM +0800, Bob Liu wrote: >>>> >>>> On 07/22/2016 03:45 PM, Roger Pau Monné wrote: >>>>> On Thu, Jul 21, 2016 at 06:08:05PM +0800, Bob Liu wrote: >>>>>> >>>>>> On 07/21/2016 04:57 PM, Roger Pau Monné wrote: >>>> ..[snip].. >>>>>>>> + >>>>>>>> +static ssize_t dynamic_reconfig_device(struct blkfront_info *info, >>>>>>>> ssize_t count) >>>>>>>> +{ >>>>>>>> + unsigned int i; >>>>>>>> + int err = -EBUSY; >>>>>>>> + >>>>>>>> + /* >>>>>>>> + * Make sure no migration in parallel, device lock is actually a >>>>>>>> + * mutex. >>>>>>>> + */ >>>>>>>> + if (!device_trylock(&info->xbdev->dev)) { >>>>>>>> + pr_err("Fail to acquire dev:%s lock, may be in >>>>>>>> migration.\n", >>>>>>>> + dev_name(&info->xbdev->dev)); >>>>>>>> + return err; >>>>>>>> + } >>>>>>>> + >>>>>>>> + /* >>>>>>>> + * Prevent new requests and guarantee no uncompleted reqs. >>>>>>>> + */ >>>>>>>> + blk_mq_freeze_queue(info->rq); >>>>>>>> + if (part_in_flight(&info->gd->part0)) >>>>>>>> + goto out; >>>>>>>> + >>>>>>>> + /* >>>>>>>> + * Front Backend >>>>>>>> + * Switch to XenbusStateClosed >>>>>>>> + * frontend_changed(): >>>>>>>> + * case XenbusStateClosed: >>>>>>>> + * >>>>>>>> xen_blkif_disconnect() >>>>>>>> + * Switch to >>>>>>>> XenbusStateClosed >>>>>>>> + * blkfront_resume(): >>>>>>>> + * frontend_changed(): >>>>>>>> + * reconnect >>>>>>>> + * Wait until XenbusStateConnected >>>>>>>> + */ >>>>>>>> + info->reconfiguring = true; >>>>>>>> + xenbus_switch_state(info->xbdev, XenbusStateClosed); >>>>>>>> + >>>>>>>> + /* Poll every 100ms, 1 minute timeout. */ >>>>>>>> + for (i = 0; i < 600; i++) { >>>>>>>> + /* >>>>>>>> + * Wait backend enter XenbusStateClosed, >>>>>>>> blkback_changed() >>>>>>>> + * will clear reconfiguring. >>>>>>>> + */ >>>>>>>> + if (!info->reconfiguring) >>>>>>>> + goto resume; >>>>>>>> + schedule_timeout_interruptible(msecs_to_jiffies(100)); >>>>>>>> + } >>>>>>> >>>>>>> Instead of having this wait, could you just set info->reconfiguring = >>>>>>> 1, set >>>>>>> the frontend state to XenbusStateClosed and mimic exactly what a resume >>>>>>> from >>>>>>> suspension does? blkback_changed would have to set the frontend state >>>>>>> to >>>>>>> InitWait when it detects that the backend has switched to Closed, and >>>>>>> call >>>>>>> blkfront_resume. >>>>>> >>>>>> >>>>>> I think that won't work. >>>>>> In the real "resume" case, the power management system will trigger all >>>>>> ->resume() path. >>>>>> But there is no place for dynamic configuration. >>>>> >>>>> Hello, >>>>> >>>>> I think it should be possible to set info->reconfiguring and wait for the >>>>> backend to switch to state Closed, at that point we should call >>>>> blkif_resume >>>>> (from blkback_changed) and the backend will follow the reconection. >>>>> >>>> >>>> Okay, I get your point. Yes, that's an option. >>>> >>>> But this will make 'dynamic configuration' to be async, I'm worry about >>>> the end-user will get panic. >>>> E.g >>>> A end-user "echo <new value> > /sys/devices/vbd-xxx/max_indirect_segs", >>>> but then the device will be Closed and disappeared, the user have to wait >>>> for a random time so that the device can resume. >>> >>> That should not happen, AFAICT on migration the device never dissapears. >> >> Oh, yes. >> >>> alloc_disk and friends should not be called on resume from migration (see >>> the switch in blkfront_connect, you should take the BLKIF_STATE_SUSPENDED >>> path for the reconfiguration). >>> >> >> What about if the end-user starts I/O immediately after writing new value to >> /sys? >> But the resume is still in progress. > > blkif_free already stops the queues by calling blk_mq_stop_hw_queues, and > blkif_queue_request will refuse to put anything on the ring if the state > is different than connected, which in turn makes blkif_queue_rq call > blk_mq_stop_hw_queue to stop the queue, so it should be safe. > But this will surprise the end-user, our user script is like this: 1) echo <new value> > /sys/xxxx 2) Start I/O immediately. ^^^ Fail because requests would be refused(even software queue was still freezed). It's not good for the end user have to wait for a random time before restart I/O. There are two more concerns I have: * blkif_resume() may fail, how the end-user can aware that if "echo <new value> > /sys/xxx" already returned success. * We get the device lock and blk_mq_freeze_queue() in dynamic_reconfig_device(), and then have to release in blkif_recover() asynchronously which makes the code more difficult to readable. As a result, I still prefer current synchronize way so that we can know whether blkif_resume fails, not break the end-user and more straightforward code. -- Regards, -Bob _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxx https://lists.xen.org/xen-devel
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |