On a healthy, idle Luminous 12.2.10 cluster, I am trying to run rados bench tests, but the read tests (both 'seq' and 'rand') often take far too long to complete. I'm wondering how to diagnose where the problem lies.

I populate the data with rados bench write as follows:

$ rados bench -p rbd -t 4 30 write --no-cleanup -b 4M
This runs for 30 seconds as expected and exits.

Then I try to run 'seq' read tests, and they do not complete within the allotted time. The "finished" column takes a very long time to catch up to the "started" count, and the test often runs far longer than the given time limit (is that a bug in itself?).

$ rados bench -p rbd -t 4 10 seq
hints = 1
  sec Cur ops   started  finished  avg MB/s  cur MB/s last lat(s)  avg lat(s)
    0       0         0         0         0         0           -           0
    1       4        30        26   103.983       104  0.00539085   0.0157025
    2       4        30        26   51.9923         0           -   0.0157025
    3       4        30        26   34.6622         0           -   0.0157025
    4       4        30        26   25.9968         0           -   0.0157025
    5       4        30        26   20.7975         0           -   0.0157025
    6       4        30        26   17.3313         0           -   0.0157025
    7       4        30        26   14.8555         0           -   0.0157025
    8       4        30        26   12.9986         0           -   0.0157025
    9       4        30        26   11.5543         0           -   0.0157025
   10       4        30        26   10.3989         0           -   0.0157025
...

At this point it continues for quite a while before it finally exits after several minutes. The "avg MB/s" value trends down toward 0 since all IO seems to be stuck. 'rand' read testing is not faring any better. The health remains "OK" during this process, and I'm not sure where to look to find the bottleneck.
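
As a rough sanity check on the "avg MB/s" column, here is a small sketch of the arithmetic. It assumes the column is simply finished objects times object size divided by elapsed seconds; that is my reading of the output, not something I have checked against the rados source. The function name avg_mbps is only a helper I made up for this illustration.

```shell
#!/bin/sh
# Hypothetical helper: average throughput in MB/s, assuming
# avg = finished objects * object size (MB) / elapsed seconds.
avg_mbps() {
  # $1 = finished objects, $2 = object size in MB, $3 = elapsed seconds
  awk -v n="$1" -v sz="$2" -v t="$3" 'BEGIN { printf "%.1f\n", n * sz / t }'
}

# 26 objects of 4 MB finished in the first second, then nothing more completes.
for t in 1 2 5 10; do
  echo "sec $t: $(avg_mbps 26 4 "$t") MB/s"
done
```

With 26 finished objects of 4 MB each, this gives about 104, 52, 20.8 and 10.4 MB/s at seconds 1, 2, 5 and 10, which lines up with the table above. So the falling average looks like a fixed 104 MB being divided by a growing elapsed time, with no reads completing after the first second.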

Thanks,
  Wyllys Ingersoll