Hi all
I want to study the effect of BlueStore RocksDB compression in Ceph and whether it is worth tuning. Currently, BlueStore RocksDB compression is disabled by default in Ceph.
I simply switched the RocksDB compression algorithm and then ran a 4KB rand read fio test, but the results show no difference.
So I would like to ask: is there any scenario where RocksDB compression matters in Ceph, and if so, what type of test would best demonstrate the impact of RocksDB compression on the upper layers of Ceph?
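For reference, switching the algorithm can be done along these lines (a sketch only; the compression name is just an example, the rest of the existing bluestore_rocksdb_options string has to be carried over unchanged, and the OSDs need a restart afterwards):
// replace compression=kNoCompression with e.g. kLZ4Compression, keeping all other options as they are
ceph config set osd bluestore_rocksdb_options "compression=kLZ4Compression,<rest of existing options string>"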
// create 10GB image
ceph osd pool create rbd 128
rbd pool init rbd
rbd create --size 10240 --image rbd/image --image-format 2 --thick-provision
//fio
[global]
ioengine=rbd
iodepth=128
rw=randwrite
bs=4KB
time_based=1
ramp_time=60s
runtime=300s
clientname=admin
pool=rbd
group_reporting
buffer_compress_percentage=80
refill_buffers
buffer_pattern=0xdeadbeef
[volumes]
rbdname=image
numjobs=1
Thanks
-Hualong
Hi,
today I saw a strange situation: files that were copied to a CephFS via Ganesha NFS (deployed via cephadm) disappeared from the NFS directory and did not show up again until I restarted the Ganesha instance. This could be observed from different NFS client hosts. While the files were not showing up over NFS, they could still be seen via cephfs-shell, so I assume the issue is somewhere on the NFS side. Interestingly, the folder modification time (shown by ls) matched the time the file was copied into that folder.
Any idea what the issue could be here? Unfortunately there were no interesting logs, and I haven't found a way to reproduce this yet.
The ceph version in use is 17.2.5.
Thanks,
Patrick
Hi everyone, I've got a quick question regarding one of our RadosGW buckets.
This bucket is used to store Docker registries; the total amount of
data we store is supposed to be about 4.5 TB, BUT Ceph tells us we are
actually using ~53 TB.
One interesting thing is that this bucket seems to have been resharded for an
unknown reason, as resharding is supposed to be disabled by default. Even
taking that into account, we shouldn't see such a massive amount of
additional data, should we?
Here is the bucket stats of it:
https://paste.opendev.org/show/bdWFRvNFtxyHnbPfXWu9/
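For what it's worth, here are additional commands I can run to gather more data if that helps (the bucket name is just a placeholder):
// data already deleted but still waiting for RGW garbage collection
radosgw-admin gc list --include-all
// consistency check of the bucket index
radosgw-admin bucket check --bucket=<bucket-name>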
Has anybody run into a 'stuck' OSD service specification? I've tried
to delete it, but it's stuck in 'deleting' state, and has been for
quite some time (even prior to upgrade, on 15.2.x). This is on 16.2.3:
NAME PORTS RUNNING REFRESHED AGE PLACEMENT
osd.osd_spec 504/525 <deleting> 12m label:osd
root@ceph01:/# ceph orch rm osd.osd_spec
Removed service osd.osd_spec
From active monitor:
debug 2021-05-06T23:14:48.909+0000 7f17d310b700 0
log_channel(cephadm) log [INF] : Remove service osd.osd_spec
Yet in ls, it's still there, same as above. --export on it:
root@ceph01:/# ceph orch ls osd.osd_spec --export
service_type: osd
service_id: osd_spec
service_name: osd.osd_spec
placement: {}
unmanaged: true
spec:
  filter_logic: AND
  objectstore: bluestore
We've tried --force, as well, with no luck.
To be clear, the --export even prior to delete looks nothing like the
actual service specification we're using, even after I re-apply it, so
something seems 'bugged'. Here's the OSD specification we're applying:
service_type: osd
service_id: osd_spec
placement:
  label: "osd"
data_devices:
  rotational: 1
db_devices:
  rotational: 0
db_slots: 12
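For completeness, we re-apply the spec with the usual cephadm workflow (the file name below is just whatever we saved it as):
ceph orch apply -i osd_spec.yaml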
I would appreciate any insight into how to clear this up (without
removing the actual OSDs; we just want to apply the updated
service specification - we used to use host placement rules and are
switching to label-based).
Thanks,
David
Hi everyone,
Today is the last day to sponsor Cephalocon Amsterdam 2023! I want to
thank our current sponsors:
Platinum: IBM
Silver: 42on, Canonical Ubuntu, Clyso
Startup: Koor
Also, thank you to Clyso for their lanyard add-on and to 42on for the
offsite attendee party.
We are still short of covering the costs for the event, so I'm asking
contributors and members of the Ceph Foundation to consider
applying today.
https://events.linuxfoundation.org/cephalocon/sponsor/
Sponsor Prospectus:
https://events.linuxfoundation.org/wp-content/uploads/2023/03/sponsor-ceph-…
Please get in touch with us at sponsorships(a)ceph.foundation to get
started. Thank you!
--
Mike Perez
Hi all,
osd_heartbeat_grace = 20 and osd_pool_default_read_lease_ratio = 0.8 by
default, so a PG will wait up to 16s (20 * 0.8) in the worst case when an OSD
restarts. This wait time is too long; client I/O stalling for that long is not
acceptable. I think lowering osd_pool_default_read_lease_ratio is a good way
to reduce it. Does anyone have good suggestions for reducing the PG wait time?
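For example, this is the kind of change I have in mind (0.4 is only an illustrative value; the read lease then becomes osd_heartbeat_grace * ratio = 20 * 0.4 = 8s):
ceph config set global osd_pool_default_read_lease_ratio 0.4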
Best Regard
Yite Gu
Hi,
I have a problem on one of ceph clusters I do not understand.
ceph 17.2.5 on 17 servers, 400 HDD OSDs, 10 and 25Gb/s NICs
A 3TB rbd image is on an erasure-coded 8+3 pool with 128 PGs, xfs filesystem,
4MB objects in the rbd image, mostly empty.
I have created a bunch of 10G files; most of them were written at
1.5GB/s, but a few of them were really slow, ~10MB/s, a factor of 100.
When reading these files back, the fast-written ones are read fast,
~2-2.5GB/s, while the slowly written ones are also extremely slow to read;
iotop shows between 1 and 30 MB/s reading speed.
This does not happen at all on replicated images. There are some OSDs
with higher apply/commit latency, eg 200ms, but there are no slow ops.
The tests were actually done on a Proxmox VM with librbd, but the same
happens with krbd, and on bare metal with a mounted krbd as well.
I have tried to check all OSDs for laggy drives, but they all look about
the same.
I have also copied the entire image with "rados get ...", object by object;
the strange thing here is that most objects were copied within
0.1-0.2s, but quite a few took more than 1s.
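(In case someone wants to reproduce the per-object check, a rough sketch of how it can be done; pool and image names are placeholders, and with an EC data pool the rbd_data objects live in the data pool:)
# find the image's data-object prefix, then time a "rados get" of every object
prefix=$(rbd info rbd/testimage | awk '/block_name_prefix/ {print $2}')
rados -p ec_data_pool ls | grep "$prefix" | while read obj; do
  /usr/bin/time -f "$obj: %e s" rados -p ec_data_pool get "$obj" /dev/null
done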
The cluster is quite busy with base traffic of ~1-2GB/s, so the speeds
can vary due to that. But I would not expect a factor of 100 slowdown
for some writes/reads with rbds.
Any clues on what might be wrong or what else to check? I have another
similar ceph cluster where everything looks fine.
Best,
Andrej
--
_____________________________________________________________
prof. dr. Andrej Filipcic, E-mail: Andrej.Filipcic(a)ijs.si
Department of Experimental High Energy Physics - F9
Jozef Stefan Institute, Jamova 39, P.o.Box 3000
SI-1001 Ljubljana, Slovenia
Tel.: +386-1-477-3674 Fax: +386-1-477-3166
-------------------------------------------------------------
Hi all,
I'm using RGW multisite with Ceph 17.2.5, deployed with Rook.
I found a number of bucket.sync-status mdlog objects carrying the names of buckets that were deleted during maintenance.
Test environment:
bash-4.4$ rados -p master.rgw.log ls | grep bucket.sync-status | grep test1
bucket.sync-status.a788ebed-10a9-48da-8fd4-709323da68e7:test1:8da53b60-0940-46e1-a821-551347d82d2c.16016.2:5
bucket.sync-status.a788ebed-10a9-48da-8fd4-709323da68e7:test1:8da53b60-0940-46e1-a821-551347d82d2c.16016.2:4
bucket.sync-status.a788ebed-10a9-48da-8fd4-709323da68e7:test1:8da53b60-0940-46e1-a821-551347d82d2c.16016.2:1
bucket.sync-status.a788ebed-10a9-48da-8fd4-709323da68e7:test1:8da53b60-0940-46e1-a821-551347d82d2c.16016.2:7
bucket.sync-status.a788ebed-10a9-48da-8fd4-709323da68e7:test1:8da53b60-0940-46e1-a821-551347d82d2c.16016.2:2
bucket.sync-status.a788ebed-10a9-48da-8fd4-709323da68e7:test1:8da53b60-0940-46e1-a821-551347d82d2c.16016.2:10
bucket.sync-status.a788ebed-10a9-48da-8fd4-709323da68e7:test1:8da53b60-0940-46e1-a821-551347d82d2c.16016.2:6
bucket.sync-status.a788ebed-10a9-48da-8fd4-709323da68e7:test1:8da53b60-0940-46e1-a821-551347d82d2c.16016.2:3
bucket.sync-status.a788ebed-10a9-48da-8fd4-709323da68e7:test1:8da53b60-0940-46e1-a821-551347d82d2c.16016.2:0
bucket.sync-status.a788ebed-10a9-48da-8fd4-709323da68e7:test1:8da53b60-0940-46e1-a821-551347d82d2c.16016.2:9
bucket.sync-status.a788ebed-10a9-48da-8fd4-709323da68e7:test1:8da53b60-0940-46e1-a821-551347d82d2c.16016.2:8
bash-4.4$ radosgw-admin bucket list
[
"rook-ceph-bucket-checker-94e835a7-6356-46bf-a90c-591b23b15959",
"230314",
"cyyoon",
"test23031303"
]
How can I delete the bucket.sync-status mdlog objects of the deleted buckets in this situation? Should I proceed with an mdlog trim?
As these entries accumulated in the prod environment, a large-omap issue occurred in the log pool.
root@osd-001:~# radosgw-admin log list| grep [DELETED BUCKET NAME] | wc -l
25636
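If deleting these objects directly turns out to be the right approach, what I am considering is something along these lines (only a sketch; pool and object names are from the test environment above):
# list the stale sync-status objects of the deleted bucket, then remove them
rados -p master.rgw.log ls | grep '^bucket.sync-status.*:test1:' > stale_objs
while read obj; do rados -p master.rgw.log rm "$obj"; done < stale_objs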
Any other ideas as to what might be causing this, or anything else we could try to help diagnose or fix this? Thanks in advance!