[ceph-users] Re: adding block.db to OSD

28 Apr 2020

Hi Igor,

but the performance issue is still present even on the recreated OSD.

# ceph tell osd.38 bench -f plain 12288000 4096
bench: wrote 12 MiB in blocks of 4 KiB in 1.63389 sec at 7.2 MiB/sec 1.84k IOPS

vs.

# ceph tell osd.10 bench -f plain 12288000 4096
bench: wrote 12 MiB in blocks of 4 KiB in 10.7454 sec at 1.1 MiB/sec 279 IOPS

Both are backed by the same Samsung SSD as block.db.

Greets,
Stefan
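
The two 4 KiB bench runs above can be cross-checked with a little arithmetic. The following is a minimal sketch, not part of the original message: the helper name `bench_stats` is hypothetical, and the input figures are simply copied from the two `ceph tell ... bench` outputs quoted above.

```python
def bench_stats(total_bytes: int, block_size: int, elapsed_sec: float) -> dict:
    """Derive write count, IOPS and MiB/s from the three raw bench figures."""
    writes = total_bytes // block_size
    return {
        "writes": writes,
        "iops": writes / elapsed_sec,
        "mib_per_sec": total_bytes / elapsed_sec / (1024 * 1024),
    }


# Figures copied from the two bench runs quoted above (12288000 bytes, 4096-byte blocks).
osd38 = bench_stats(12288000, 4096, 1.63389)
osd10 = bench_stats(12288000, 4096, 10.7454)

for name, stats in (("osd.38", osd38), ("osd.10", osd10)):
    print(f"{name}: {stats['iops']:.0f} IOPS, {stats['mib_per_sec']:.1f} MiB/s")

# Relative difference between the two OSDs for small writes.
print(f"osd.38 is {osd38['iops'] / osd10['iops']:.1f}x faster at 4 KiB writes")
```

Both runs issue 3000 writes, so the gap of roughly a factor of six to seven comes entirely from the elapsed time, which is consistent with the reported 1.84k versus 279 IOPS.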

On 28.04.20 at 19:12, Stefan Priebe - Profihost AG wrote:
> Hi Igor,
> On 27.04.20 at 15:03, Igor Fedotov wrote:
>> Just left a comment at https://tracker.ceph.com/issues/44509
>>
>> Generally bdev-new-db performs no migration; RocksDB might eventually do
>> that, but there is no guarantee it moves everything.
>>
>> One should use bluefs-bdev-migrate to do actual migration.
>>
>> And I think that's the root cause for the above ticket.
> 
> perfect - this removed all spillover in seconds.
> 
> Greets,
> Stefan
> 
> 
>> Thanks,
>>
>> Igor
>>
>> On 4/24/2020 2:37 PM, Stefan Priebe - Profihost AG wrote:
>>> No, not a standalone WAL. I wanted to ask whether bdev-new-db migrates
>>> the DB and WAL from HDD to SSD.
>>>
>>> Stefan
>>>
>>>> On 24.04.2020 at 13:01, Igor Fedotov <ifedotov(a)suse.de> wrote:
>>>>
>>>> 
>>>>
>>>> Unless you have 3 different types of disks behind the OSD (e.g. HDD,
>>>> SSD, NVMe), a standalone WAL makes no sense.
>>>>
>>>>
>>>> On 4/24/2020 1:58 PM, Stefan Priebe - Profihost AG wrote:
>>>>> Is the WAL device missing? Do I need to run *bluefs-bdev-new-db and
>>>>> WAL*?
>>>>>
>>>>> Greets,
>>>>> Stefan
>>>>>
>>>>>> On 24.04.2020 at 11:32, Stefan Priebe - Profihost AG
>>>>>> <s.priebe(a)profihost.ag> wrote:
>>>>>>
>>>>>> Hi Igor,
>>>>>>
>>>>>> there must be a difference. I purged osd.0 and recreated it.
>>>>>>
>>>>>> Now it gives:
>>>>>> ceph tell osd.0 bench
>>>>>> {
>>>>>>    "bytes_written": 1073741824,
>>>>>>    "blocksize": 4194304,
>>>>>>    "elapsed_sec": 8.1554735639999993,
>>>>>>    "bytes_per_sec": 131659040.46819863,
>>>>>>    "iops": 31.389961354303033
>>>>>> }
>>>>>>
>>>>>> What's wrong with adding a block.db device later?
>>>>>>
>>>>>> Stefan
>>>>>>
>>>>>> On 23.04.20 at 20:34, Stefan Priebe - Profihost AG wrote:
>>>>>>> Hi,
>>>>>>> if the OSDs are idle the difference is even worse:
>>>>>>> # ceph tell osd.0 bench
>>>>>>> {
>>>>>>>     "bytes_written": 1073741824,
>>>>>>>     "blocksize": 4194304,
>>>>>>>     "elapsed_sec": 15.396707875000001,
>>>>>>>     "bytes_per_sec": 69738403.346825853,
>>>>>>>     "iops": 16.626931034761871
>>>>>>> }
>>>>>>> # ceph tell osd.38 bench
>>>>>>> {
>>>>>>>     "bytes_written": 1073741824,
>>>>>>>     "blocksize": 4194304,
>>>>>>>     "elapsed_sec": 6.8903985170000004,
>>>>>>>     "bytes_per_sec": 155831599.77624846,
>>>>>>>     "iops": 37.153148597776521
>>>>>>> }
>>>>>>> Stefan
>>>>>>> On 23.04.20 at 14:39, Stefan Priebe - Profihost AG wrote:
>>>>>>>> Hi,
>>>>>>>> On 23.04.20 at 14:06, Igor Fedotov wrote:
>>>>>>>>> I don't recall any additional tuning to be applied to the new DB
>>>>>>>>> volume. And I assume the hardware is pretty much the same...
>>>>>>>>>
>>>>>>>>> Do you still have any significant amount of data spilled over
>>>>>>>>> for these updated OSDs? If not, I don't have any valid
>>>>>>>>> explanation for the phenomenon.
>>>>>>>>
>>>>>>>> just the 64k from here:
>>>>>>>> https://tracker.ceph.com/issues/44509
>>>>>>>>
>>>>>>>>> You might want to try "ceph osd bench" to compare OSDs under
>>>>>>>>> pretty much the same load. Any difference observed?
>>>>>>>>
>>>>>>>> Servers are the same HW. OSD Bench is:
>>>>>>>> # ceph tell osd.0 bench
>>>>>>>> {
>>>>>>>>      "bytes_written": 1073741824,
>>>>>>>>      "blocksize": 4194304,
>>>>>>>>      "elapsed_sec": 16.091414781000001,
>>>>>>>>      "bytes_per_sec": 66727620.822242722,
>>>>>>>>      "iops": 15.909104543266945
>>>>>>>> }
>>>>>>>>
>>>>>>>> # ceph tell osd.36 bench
>>>>>>>> {
>>>>>>>>      "bytes_written": 1073741824,
>>>>>>>>      "blocksize": 4194304,
>>>>>>>>      "elapsed_sec": 10.023828538,
>>>>>>>>      "bytes_per_sec": 107118933.6419194,
>>>>>>>>      "iops": 25.539143953780986
>>>>>>>> }
>>>>>>>>
>>>>>>>>
>>>>>>>> OSD 0 is a Toshiba MG07SCA12TA SAS 12G
>>>>>>>> OSD 36 is a Seagate ST12000NM0008-2H SATA 6G
>>>>>>>>
>>>>>>>> SSDs are all the same, like the rest of the HW. Both drives
>>>>>>>> should give the same performance according to their specs. The
>>>>>>>> only other difference is that OSD 36 was created directly with
>>>>>>>> the block.db device (Nautilus 14.2.7) and OSD 0 (14.2.8) was not.
>>>>>>>>
>>>>>>>> Stefan
>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 4/23/2020 8:35 AM, Stefan Priebe - Profihost AG wrote:
>>>>>>>>>> Hello,
>>>>>>>>>>
>>>>>>>>>> is there anything else needed beside running:
>>>>>>>>>> ceph-bluestore-tool --path /var/lib/ceph/osd/ceph-${OSD}
>>>>>>>>>> bluefs-bdev-new-db --dev-target /dev/vgroup/lvdb-1
>>>>>>>>>>
>>>>>>>>>> I did so some weeks ago and currently I'm seeing that all OSDs
>>>>>>>>>> originally deployed with --block-db show 10-20% I/O waits, while
>>>>>>>>>> all those that got converted using ceph-bluestore-tool show
>>>>>>>>>> 80-100% I/O waits.
>>>>>>>>>>
>>>>>>>>>> Also is there some tuning available to use more of the SSD? The
>>>>>>>>>> SSD (block-db) is only saturated at 0-2%.
>>>>>>>>>>
>>>>>>>>>> Greets,
>>>>>>>>>> Stefan
>>>>>>>>>> _______________________________________________
>>>>>>>>>> ceph-users mailing list -- ceph-users(a)ceph.io
>>>>>>>>>> To unsubscribe send an email to ceph-users-leave(a)ceph.io
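
The migration step Igor recommends in this thread (bluefs-bdev-migrate, run after bluefs-bdev-new-db) might be assembled as in the sketch below. It only builds and prints the command line; it does not run anything. Several things here are assumptions rather than statements from the thread: the helper name `migrate_command` is hypothetical, the `--devs-source` option comes from the ceph-bluestore-tool documentation, and the `block` and `block.db` paths assume the usual symlinks inside the OSD directory. The OSD would normally need to be stopped before running the tool.

```python
import shlex


def migrate_command(osd_id: int, osd_root: str = "/var/lib/ceph/osd") -> list[str]:
    """Build (but do not run) a bluefs-bdev-migrate command for one OSD.

    Assumption: BlueFS data that still sits on the main (slow) device is
    moved to the DB device that was attached earlier with bluefs-bdev-new-db.
    """
    osd_path = f"{osd_root}/ceph-{osd_id}"
    return [
        "ceph-bluestore-tool",
        "--path", osd_path,
        "bluefs-bdev-migrate",
        "--devs-source", f"{osd_path}/block",    # assumed source: main device symlink
        "--dev-target", f"{osd_path}/block.db",  # assumed target: DB device symlink
    ]


# Dry run for osd.0: print the command so it can be reviewed before use.
cmd = migrate_command(0)
print(shlex.join(cmd))
```

Printing the command instead of executing it keeps the sketch safe to try on any machine; the paths and options should be checked against the installed Ceph release before anything is actually run.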
