
[Xen-devel] [PATCH 0/1] qemu-qdisk: indirect descriptors



In the meantime I tried implementing indirect descriptors for qemu. The
implementation is described further in the next mail. It is based on the
current staging branch of qemu.

In my tests I did not observe an improvement. Compared with the staging
branch, the decrease in bandwidth starts earlier as the block size
increases, especially for higher values of iodepth[1].
I ran both under gprof and all the results are available on my github[2],
but below is part of the flat profile for staging and for indirect
descriptors when fio is run with iodepth=256 and bs=256 for 300 sec.

In the indirect descriptors implementation a larger share of the time is
spent in the ioreq_unmap function (10.13% vs. 9.58% for staging), with far
fewer calls (83570 vs. 1186036). I also tried to check whether it cooperates
better with grant copy, running vmstat at the same time, but then memory is
rapidly exhausted and the system starts swapping; parts of the listings are
below. That is not the case for grant copy on its own.
I also tried different values of MAX_INDIRECT_SEGMENTS from the set
{256, 128, 64, 32, 16}, without much difference.

I would appreciate any suggestions on how to approach the problem.

flat profiles:
indirect descriptors
 Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total           
 time   seconds   seconds    calls   s/call   s/call  name    
 13.19      1.12     1.12   653798     0.00     0.00  get_clock_realtime
 10.13      1.98     0.86    83570     0.00     0.00  ioreq_unmap
  4.77      2.38     0.41 31245461     0.00     0.00  rcu_read_unlock
  4.12      2.73     0.35    83423     0.00     0.00  ioreq_map
  3.65      3.04     0.31 20900170     0.00     0.00  phys_page_find
  3.12      3.31     0.27 20886790     0.00     0.00  address_space_rw
  2.24      3.50     0.19 20886790     0.00     0.00  address_space_translate
  2.00      3.67     0.17 10849312     0.00     0.00  test_and_clear_bit
  1.88      3.83     0.16 31245456     0.00     0.00  rcu_read_lock
  1.71      3.98     0.14 41773586     0.00     0.00  memory_access_is_direct
  1.65      4.12     0.14 10330994     0.00     0.00  xen_map_cache_unlocked
  1.59      4.25     0.14 20886785     0.00     0.00  address_space_translate_internal
  1.53      4.38     0.13 10339152     0.00     0.00  cpu_inw
  1.41      4.50     0.12 10458730     0.00     0.00  find_portio
  1.30      4.61     0.11 10389053     0.00     0.00  cpu_physical_memory_rw
  1.12      4.71     0.10 10358655     0.00     0.00  qemu_get_ram_block
  1.06      4.79     0.09 31245450     0.00     0.00  xen_enabled
  1.06      4.88     0.09 10447242     0.00     0.00  portio_read
  1.06      4.97     0.09   237496     0.00     0.00  cpu_ioreq_pio
  1.06      5.07     0.09     1557     0.00     0.00  vnc_refresh_server_sur

 staging 
 Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total           
 time   seconds   seconds    calls   s/call   s/call  name    
 11.51      1.61     1.61   970388     0.00     0.00  get_clock_realtime
  9.58      2.95     1.34  1186036     0.00     0.00  ioreq_unmap
  5.50      3.72     0.77  1187881     0.00     0.00  ioreq_map
  4.15      4.30     0.58 31195245     0.00     0.00  rcu_read_unlock
  2.50      4.65     0.35 31195243     0.00     0.00  rcu_read_lock
  2.50      5.00     0.35 20866261     0.00     0.00  phys_page_find
  1.79      5.25     0.25 20852888     0.00     0.00  address_space_rw
  1.36      5.44     0.19  4912499     0.00     0.00  qemu_coroutine_switch
  1.22      5.61     0.17 20852881     0.00     0.00  address_space_translate
  1.22      5.78     0.17  6141137     0.00     0.00  bdrv_is_inserted
  1.07      5.93     0.15  2455277     0.00     0.00  tracked_request_end
  1.07      6.08     0.15  1187877     0.00     0.00  ioreq_parse
  1.00      6.22     0.14 20852887     0.00     0.00  address_space_translate_internal
  1.00      6.36     0.14  2456463     0.00     0.00  qemu_aio_unref
  1.00      6.50     0.14  2456156     0.00     0.00  qemu_coroutine_enter
  0.93      6.63     0.13 41705784     0.00     0.00  memory_access_is_direct
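
As a rough cross-check of the two flat profiles above, the per-call self
time can be derived by dividing "self seconds" by "calls" (gprof rounds the
s/call columns to 0.00 here). This is only a minimal sketch: the figures are
copied from the listings, the helper name per_call_us is purely illustrative,
and with 0.01 s samples the results are coarse.

```python
# Self seconds and call counts copied from the gprof flat profiles above.
PROFILES = {
    "indirect": {"ioreq_unmap": (0.86, 83570), "ioreq_map": (0.35, 83423)},
    "staging": {"ioreq_unmap": (1.34, 1186036), "ioreq_map": (0.77, 1187881)},
}


def per_call_us(self_seconds, calls):
    """Average self time per call, in microseconds."""
    return self_seconds / calls * 1e6


for func in ("ioreq_unmap", "ioreq_map"):
    indirect = per_call_us(*PROFILES["indirect"][func])
    staging = per_call_us(*PROFILES["staging"][func])
    print(f"{func}: indirect {indirect:.2f} us/call, "
          f"staging {staging:.2f} us/call, "
          f"ratio {indirect / staging:.1f}x")
```

On these numbers a single ioreq_unmap call costs about 10 us with indirect
descriptors against about 1 us on staging, so the per-call cost grows while
the call count shrinks.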

vmstat listings:
grant map
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 1  1     11     62   2775    638    0    0  4052   244 16250 14124  6 20 71  2  1
 1  0     11     58   2779    638    0    0  4308     0 16227 14254  7 18 74  1  0
 1  0     11     56   2781    638    0    0  2320  1456 16310 14124  6 19 74  0  1
 1  0     13     67   2776    631    0    1  3924  1372 14720 14019  6 20 74  0  1
 1  0     13     66   2779    631    0    0  2768     0 16105 14038  6 19 74  0  0
 1  0     13     63   2782    631    0    0  3000     0 14471 14002  6 19 74  0  0
 1  0     13     58   2786    632    0    0  3988    36 12383 13135  7 19 73  1  0
 1  0     13     56   2789    632    0    0  2488   116 12417 13853  6 20 74  0  0
 1  0     13     61   2788    627    0    0  2556   296 12402 13382  7 20 73  0  0
 2  0     13     59   2791    627    0    0  2552     0 16114 14085  7 18 74  0  1
 1  0     13     56   2793    627    0    0  2320     0 16155 14092  5 20 75  0  0
 1  0     14     69   2787    621    0    1  2848  1248 16766 14480  7 19 73  1  1
 1  0     14     65   2792    620    0    0  4356     6 16369 14136  6 20 74  0  0
 1  0     14     62   2795    621    0    0  3020     0 16079 14079  7 19 74  0  1
 1  0     14     59   2798    621    0    0  2964     0 16229 14084  5 19 75  0  1
 1  0     14     57   2800    621    0    0  2172     0 16454 14257  6 18 75  0  0
 2  0     15     69   2794    614    0    0  3024   712 16416 14241  7 18 73  1  1
 1  0     15     67   2797    615    0    0  2936    32 16168 14084  6 19 74  0  0

grant map with indirect descriptors
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 2  1      0     89   1900   1477    0    0  5760    24 9876 11438  5 19 76  0  0
 2  0      0     83   1906   1478    0    0  5568     0 8829 12670  5 19 76  1  0
 1  0      0     78   1911   1477    0    0  4736     0 8649 11291  5 19 76  0  1
 1  0      0     73   1916   1478    0    0  5120   984 8746 11946  5 20 75  0  0
 1  0      0     66   1922   1478    0    0  6016     0 8959 11785  6 18 76  0  1
 1  0      0     61   1927   1478    0    0  5312    32 9031 11559  5 18 76  1  0
 2  0      0     56   1932   1477    0    0  4608     0 9170 12156  5 19 75  0  1
 2  0      0     63   1937   1466    0    0  4992    28 8205 11871  5 21 74  0  0
 2  0      0     57   1942   1466    0    0  4928     0 8249 12198  5 18 76  0  0
 1  0      0     67   1948   1450    0    0  5376     8 10813 11381  6 20 74  0  0
 2  0      0     63   1952   1450    0    0  4288   192 9651 11814  5 20 70  4  1
 1  0      0     59   1956   1450    0    0  4096     0 8960 12058  4 19 76  0  0
 1  0      0     68   1962   1434    0    0  5184    12 9207 12089  5 20 75  0  1
 1  0      0     64   1966   1434    0    0  4096     0 8433 12016  5 20 75  0  0
 1  0      0     60   1970   1434    0    0  4224   140 10919 10750  5 18 76  0  0
 1  0      0     55   1976   1434    0    0  5440     0 8362 12207  5 19 76  0  0
 1  0      0     60   1980   1425    0    0  3776    64 8437 12020  6 19 74  1  1
 1  0      0     55   1984   1425    0    0  4416     0 8902 11962  6 17 76  0  0

grant copy
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 3  1      0     63   2789    760    0    0  2268     8 36651 32671  6 19 74  0  1
 0  0      0     62   2791    760    0    0  1240     4 36465 33543  5 18 75  1  1
 1  0      0     61   2792    760    0    0  1584     0 36237 32312  4 21 74  0  1
 3  0      0     59   2794    760    0    0  1628     0 36475 32888  4 20 75  0  1
 2  0      0     57   2796    760    0    0  1968     0 34898 33329  5 19 75  0  1
 0  0      0     55   2798    759    0    0  1948     0 31510 31938  4 20 75  0  1
 1  0      0     66   2794    753    0    0  2244    12 36692 34147  5 18 75  1  1
 1  1      0     64   2796    753    0    0  1792    20 29159 32907  5 18 76  0  1
 1  0      0     62   2798    753    0    0  2416     0 37445 35323  2 19 77  1  1
 2  0      0     59   2800    753    0    0  2188     0 35741 32670  4 20 76  0  1
 1  0      0     58   2802    753    0    0  1772     0 36770 34468  4 17 78  0  1
 0  0      0     56   2803    752    0    0  1260     0 36317 33152  4 19 76  0  1
 1  0      0     55   2805    753    0    0  1216     0 36364 32263  4 19 76  0  1
 4  0      0     67   2802    743    0    0  1068    16 35886 32045  5 19 75  1  1
 2  0      0     65   2805    743    0    0  2928     0 28347 33364  5 20 74  0  1
 2  0      0     62   2807    743    0    0  1944     0 36737 35010  4 17 78  0  1
 1  0      0     61   2808    743    0    0  1540     0 35855 31968  3 20 76  0  1
 1  0      0     58   2810    743    0    0  2268     0 36047 31639  4 20 75  0  1

grant copy with indirect descriptors
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 0 29    296     55      9    150    0  153 124752 167260 11989 13103  2  7  6 84  0
 0 28    296     53      9    156    0    0  2692   620  596  852  0  0  0 100  0
 0 27    296     72      1    159    0    0  7896  2072  675  882  0  2  0 98  0
 2 11    313     53     13    146    0   17 35392 17032 2920 3640  1  5 10 83  0
 0 13    324     58      1    139    0   10 25240 10688 2270 2584  0  3 22 74  0
 0 27    447     72      0    117    0  122 126116 120204 10926 12598  1  5  4 90  0
 1 24    450     67      0    130    0    3 11968  3608  772 1283  0  2  5 93  0
 0 21    486     71      0    133    0   35  5176 34968  596  952  0  2 46 53  0
 0 18    486     62      0    141    0    0  9352     0  652 1029  0  1 35 64  0
 0 14    488     80      0    146    0    2 12068  2124  584  706  1  2 26 71  0
 0 27    619     59      8    104    0  131 126264 128104 10722 12416  1  7  5 87  0
 0 22    619     78      0    111    0    0 16652    28 1267 1865  0  2  0 98  0
 0 25    800     68      5     81    0  180 166844 176752 13875 16661  1  6  0 92  0
 0 19    800     55      1    100    0    0 14436   200  763  909  0  1  0 99  0
 2 15    801     57      6    105    0    1 16308  1080 1103 1461  0  4 17 79  0
 0 28    832     68      0     82    0   30 179036 30080 14029 16746  2  8  6 83  0
 0 17    831     57      0     94    0    0 14140     0 1082 1316  0  2 13 86  0
 1 18    849     68      1    101    0   17  9908 17444  691  873  0  2  6 92  0
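
To compare the four vmstat listings by numbers rather than by eye, the rows
can be parsed and summarised per column. This is a minimal sketch and not
part of the patch: the function names are illustrative, and the sample rows
are the first three of the "grant copy with indirect descriptors" listing.
The parser collects all numeric tokens and regroups them in sets of 17, so
it also copes with rows that a mail client has wrapped.

```python
VMSTAT_COLUMNS = "r b swpd free buff cache si so bi bo in cs us sy id wa st".split()


def parse_vmstat(text):
    """Return one dict per vmstat row; header lines are skipped."""
    numbers = []
    for line in text.splitlines():
        tokens = line.split()
        # Only purely numeric lines carry data (headers contain letters).
        if tokens and all(tok.isdigit() for tok in tokens):
            numbers.extend(int(tok) for tok in tokens)
    width = len(VMSTAT_COLUMNS)
    if len(numbers) % width != 0:
        raise ValueError("incomplete vmstat row")
    return [dict(zip(VMSTAT_COLUMNS, numbers[i:i + width]))
            for i in range(0, len(numbers), width)]


def column_mean(rows, name):
    """Arithmetic mean of one vmstat column over all rows."""
    return sum(row[name] for row in rows) / len(rows)


SAMPLE = """\
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 0 29    296     55      9    150    0  153 124752 167260 11989 13103  2  7  6 84  0
 0 28    296     53      9    156    0    0  2692   620  596  852  0  0  0 100  0
 0 27    296     72      1    159    0    0  7896  2072  675  882  0  2  0 98  0
"""

rows = parse_vmstat(SAMPLE)
print("mean wa:", column_mean(rows, "wa"))
print("max so:", max(row["so"] for row in rows))
```

Running the same functions over each full listing gives per-configuration
averages for columns such as wa, so, in and cs.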

[1] https://docs.google.com/spreadsheets/d/1E6AMiB8ceJpExL6jWpH9u2yy6DZxzhmDUyFf-eUuJ0c/edit#gid=1390267663
[2] https://github.com/paulina-szubarczyk/xen-benchmark/tree/master/gprof

Thanks and regards, 
Paulina

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel

 

