
Re: [Xen-devel] [PATCH 3/3] xen-blkfront: dynamic configuration of per-vbd resources



On 07/25/2016 05:20 PM, Roger Pau Monné wrote:
> On Sat, Jul 23, 2016 at 06:18:23AM +0800, Bob Liu wrote:
>>
>> On 07/22/2016 07:45 PM, Roger Pau Monné wrote:
>>> On Fri, Jul 22, 2016 at 05:43:32PM +0800, Bob Liu wrote:
>>>>
>>>> On 07/22/2016 05:34 PM, Roger Pau Monné wrote:
>>>>> On Fri, Jul 22, 2016 at 04:17:48PM +0800, Bob Liu wrote:
>>>>>>
>>>>>> On 07/22/2016 03:45 PM, Roger Pau Monné wrote:
>>>>>>> On Thu, Jul 21, 2016 at 06:08:05PM +0800, Bob Liu wrote:
>>>>>>>>
>>>>>>>> On 07/21/2016 04:57 PM, Roger Pau Monné wrote:
>>>>>> ..[snip]..
>>>>>>>>>> +
>>>>>>>>>> +static ssize_t dynamic_reconfig_device(struct blkfront_info *info,
>>>>>>>>>> +				       ssize_t count)
>>>>>>>>>> +{
>>>>>>>>>> +	unsigned int i;
>>>>>>>>>> +	int err = -EBUSY;
>>>>>>>>>> +
>>>>>>>>>> +	/*
>>>>>>>>>> +	 * Make sure no migration happens in parallel; the device
>>>>>>>>>> +	 * lock is actually a mutex.
>>>>>>>>>> +	 */
>>>>>>>>>> +	if (!device_trylock(&info->xbdev->dev)) {
>>>>>>>>>> +		pr_err("Failed to acquire dev:%s lock, may be in migration.\n",
>>>>>>>>>> +		       dev_name(&info->xbdev->dev));
>>>>>>>>>> +		return err;
>>>>>>>>>> +	}
>>>>>>>>>> +
>>>>>>>>>> +	/*
>>>>>>>>>> +	 * Prevent new requests and guarantee no uncompleted requests.
>>>>>>>>>> +	 */
>>>>>>>>>> +	blk_mq_freeze_queue(info->rq);
>>>>>>>>>> +	if (part_in_flight(&info->gd->part0))
>>>>>>>>>> +		goto out;
>>>>>>>>>> +
>>>>>>>>>> +	/*
>>>>>>>>>> +	 * Front                                Backend
>>>>>>>>>> +	 * Switch to XenbusStateClosed
>>>>>>>>>> +	 *                                      frontend_changed():
>>>>>>>>>> +	 *                                       case XenbusStateClosed:
>>>>>>>>>> +	 *                                        xen_blkif_disconnect()
>>>>>>>>>> +	 *                                        Switch to XenbusStateClosed
>>>>>>>>>> +	 * blkfront_resume():
>>>>>>>>>> +	 *                                      frontend_changed():
>>>>>>>>>> +	 *                                       reconnect
>>>>>>>>>> +	 * Wait until XenbusStateConnected
>>>>>>>>>> +	 */
>>>>>>>>>> +	info->reconfiguring = true;
>>>>>>>>>> +	xenbus_switch_state(info->xbdev, XenbusStateClosed);
>>>>>>>>>> +
>>>>>>>>>> +	/* Poll every 100ms, 1 minute timeout. */
>>>>>>>>>> +	for (i = 0; i < 600; i++) {
>>>>>>>>>> +		/*
>>>>>>>>>> +		 * Wait for the backend to enter XenbusStateClosed;
>>>>>>>>>> +		 * blkback_changed() will clear reconfiguring.
>>>>>>>>>> +		 */
>>>>>>>>>> +		if (!info->reconfiguring)
>>>>>>>>>> +			goto resume;
>>>>>>>>>> +		schedule_timeout_interruptible(msecs_to_jiffies(100));
>>>>>>>>>> +	}
>>>>>>>>>
>>>>>>>>> Instead of having this wait, could you just set info->reconfiguring = 1,
>>>>>>>>> set the frontend state to XenbusStateClosed and mimic exactly what a
>>>>>>>>> resume from suspension does? blkback_changed would have to set the
>>>>>>>>> frontend state to InitWait when it detects that the backend has
>>>>>>>>> switched to Closed, and call blkfront_resume.
>>>>>>>>
>>>>>>>>
>>>>>>>> I think that won't work.
>>>>>>>> In the real "resume" case, the power management system triggers the
>>>>>>>> whole ->resume() path, but there is no such path for a dynamic
>>>>>>>> configuration change.
>>>>>>>
>>>>>>> Hello,
>>>>>>>
>>>>>>> I think it should be possible to set info->reconfiguring and wait for
>>>>>>> the backend to switch to state Closed; at that point we should call
>>>>>>> blkif_resume (from blkback_changed) and the backend will follow the
>>>>>>> reconnection.
>>>>>>>
>>>>>>
>>>>>> Okay, I get your point. Yes, that's an option.
>>>>>>
>>>>>> But this will make the 'dynamic configuration' asynchronous, and I'm
>>>>>> worried it will alarm the end-user.
>>>>>> E.g. an end-user runs
>>>>>> "echo <new value> > /sys/devices/vbd-xxx/max_indirect_segs",
>>>>>> but then the device is Closed and disappears, and the user has to wait
>>>>>> an arbitrary amount of time before the device resumes.
>>>>>
>>>>> That should not happen; AFAICT on migration the device never disappears.
>>>>
>>>> Oh, yes.
>>>>
>>>>> alloc_disk and friends should not be called on resume from migration (see
>>>>> the switch in blkfront_connect; you should take the BLKIF_STATE_SUSPENDED
>>>>> path for the reconfiguration).
>>>>>
>>>>
>>>> What if the end-user starts I/O immediately after writing the new value
>>>> to /sys, while the resume is still in progress?
>>>
>>> blkif_free already stops the queues by calling blk_mq_stop_hw_queues, and
>>> blkif_queue_request will refuse to put anything on the ring if the state
>>> is different from connected, which in turn makes blkif_queue_rq call
>>> blk_mq_stop_hw_queue to stop the queue, so it should be safe.
>>>
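(For reference, the refusal path described above is roughly the following;
condensed from xen-blkfront and not verbatim:)

static int blkif_queue_request(struct request *req,
			       struct blkfront_ring_info *rinfo)
{
	/* Refuse to put anything on the ring unless connected. */
	if (unlikely(rinfo->dev_info->connected != BLKIF_STATE_CONNECTED))
		return 1;
	/* ... build and submit the ring request ... */
	return 0;
}

static int blkif_queue_rq(struct blk_mq_hw_ctx *hctx,
			  const struct blk_mq_queue_data *qd)
{
	struct blkfront_info *info = hctx->queue->queuedata;
	struct blkfront_ring_info *rinfo = &info->rinfo[hctx->queue_num];

	blk_mq_start_request(qd->rq);
	if (blkif_queue_request(qd->rq, rinfo)) {
		/* Not connected (or ring full): stop the hw queue; it is
		 * restarted by kick_pending_request_queues() on reconnect. */
		blk_mq_stop_hw_queue(hctx);
		return BLK_MQ_RQ_QUEUE_BUSY;
	}
	flush_requests(rinfo);
	return BLK_MQ_RQ_QUEUE_OK;
}
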
>>
>> But this will surprise the end-user. Our user script is like this:
>> 1) echo <new value> > /sys/xxxx
>> 2) Start I/O immediately.
>>      ^^^ Fails because requests would be refused (even though the software
>>          queue is still frozen).
> 
> Which error do you get? AFAICT reads/writes should either block when the
> queue is full, or if the fd is in non-blocking mode it should return EAGAIN
> (which a properly coded application should be prepared to deal with
> gracefully if it's using non-blocking fds anyway).

You are right, the reads/writes will just block, which is fine.
>  
>> It's not good for the end user to have to wait an arbitrary time before
>> restarting I/O.
>>
>> There are two more concerns I have:
>>  * blkif_resume() may fail; how can the end-user be aware of that if
>>    "echo <new value> > /sys/xxx" has already returned success?
> 
> If you really think this is needed, can't you use a monitor or some kind of 
> condition with a timeout instead of open-coding it? Although I'm still not 
> convinced that blocking here is TRTTD.
> 
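If I understand the suggestion correctly, it would be something like this
(a rough sketch; the backend_closed field is a made-up name, and
init_completion() would be done at device setup):

	struct blkfront_info {
		/* ... existing fields ... */
		struct completion backend_closed;
	};

	static ssize_t dynamic_reconfig_device(struct blkfront_info *info,
					       ssize_t count)
	{
		/* ... take the device lock and freeze the queue as before ... */
		reinit_completion(&info->backend_closed);
		info->reconfiguring = true;
		xenbus_switch_state(info->xbdev, XenbusStateClosed);
		/* Wait up to 1 minute for blkback_changed() to signal Closed. */
		if (!wait_for_completion_timeout(&info->backend_closed, 60 * HZ))
			return -EBUSY;
		/* ... resume and reconnect ... */
		return count;
	}

	/* And in blkback_changed(), on XenbusStateClosed: */
	if (info->reconfiguring)
		complete(&info->backend_closed);
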

Let me ask another way: if we move blkfront_resume() into blkback_changed(),
should we check the return value of blkfront_resume()? And what should we do
if it returns an error?
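I.e., something along these lines (a hypothetical sketch, not actual code
from the patch):

	static void blkback_changed(struct xenbus_device *dev,
				    enum xenbus_state backend_state)
	{
		struct blkfront_info *info = dev_get_drvdata(&dev->dev);

		switch (backend_state) {
		case XenbusStateClosed:
			if (info->reconfiguring) {
				int err = blkfront_resume(dev);
				if (err) {
					/* The sysfs write has already
					 * returned success; there is no one
					 * left to report this error to. */
				}
			}
			break;
		/* ... other states ... */
		}
	}
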


>>  * We take the device lock and blk_mq_freeze_queue() in
>>    dynamic_reconfig_device(), and then have to release them in
>>    blkif_recover() asynchronously, which makes the code harder to read.
> 
> I'm not sure I'm following here, do you mean that you will pick the lock in
> dynamic_reconfig_device and release it in blkif_recover? Why wouldn't you
> release the lock in dynamic_reconfig_device after doing whatever is needed?

Yes.

Both 'dynamic configuration' and migration (xenbus_dev_resume()) use
{ blkfront_resume(); blkif_recover(); } and depend on changes of xbdev->state.
If they happen simultaneously, the state machine of xbdev->state becomes a
mess and is very difficult to track.

The lock (actually a mutex) acts as a big lock to make sure no such race can
happen at all.
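
To make the pairing concrete, this is roughly what it looks like (sketch):

	static ssize_t dynamic_reconfig_device(struct blkfront_info *info,
					       ssize_t count)
	{
		if (!device_trylock(&info->xbdev->dev))	/* taken here...  */
			return -EBUSY;
		blk_mq_freeze_queue(info->rq);		/* ...and here... */

		/* ... trigger the reconnection and return; the lock and the
		 * queue freeze stay held across the state changes ... */
	}

	static int blkif_recover(struct blkfront_info *info)
	{
		/* ... reconnection finished ... */
		blk_mq_unfreeze_queue(info->rq);   /* ...but only released  */
		device_unlock(&info->xbdev->dev);  /* here, asynchronously. */
		return 0;
	}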

Thanks,
Bob Liu



_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel

 

