It looks like this procedure crashes the Ceph node. I have now tried it a second time, after updating, and it crashed again.
el7 + Nautilus -> rbd snapshot map -> LV mount -> crash
(The LVs do not even have duplicate names.)
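Roughly, the sequence is the following (image, snapshot and VG/LV names here are just placeholders, not the real ones):

  rbd snap create mypool/myimage@snap1
  rbd map mypool/myimage@snap1          # maps the snapshot read-only as /dev/rbdX
  pvscan --cache                        # pick up the PVs on the mapped device
  vgchange -ay myvg                     # activate the LVs found on the snapshot
  mount -o ro /dev/myvg/mylv /mnt/snap  # it is around this point that the node crashes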
Good morning,
Cephadm Reef 18.2.1. We recently added 4 hosts and changed a failure
domain from host to datacenter which is the reason for the large
misplaced percentage.
We were seeing some pretty crazy spikes in "OSD Read Latencies" and "OSD
Write Latencies" on the dashboard. Most of the time everything is fine,
but then for periods of 1-4 hours latencies will go to 10+ seconds for
one or more OSDs. This also happens outside scrub hours, and it is not
the same OSDs every time. The affected OSDs are HDDs with DB/WAL on NVMe.
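For reference, the latencies can also be sampled from the CLI with standard commands (osd.112 below is just the OSD that appears in the log snippet):

  ceph osd perf                                 # per-OSD commit/apply latency in ms
  ceph daemon osd.112 dump_historic_slow_ops    # run on that OSD's host, lists recent slow ops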
Log snippet:
"
...
2024-03-22T06:48:22.859+0000 7fb184b52700 1 heartbeat_map is_healthy
'OSD::osd_op_tp thread 0x7fb169898700' had timed out after 15.000000954s
2024-03-22T06:48:22.859+0000 7fb185b54700 1 heartbeat_map is_healthy
'OSD::osd_op_tp thread 0x7fb169898700' had timed out after 15.000000954s
2024-03-22T06:48:22.864+0000 7fb169898700 1 heartbeat_map clear_timeout
'OSD::osd_op_tp thread 0x7fb169898700' had timed out after 15.000000954s
2024-03-22T06:48:22.864+0000 7fb169898700 0
bluestore(/var/lib/ceph/osd/ceph-112) log_latency slow operation
observed for submit_transact, latency = 17.716707230s
2024-03-22T06:48:22.880+0000 7fb1748ae700 0
bluestore(/var/lib/ceph/osd/ceph-112) log_latency_fn slow operation
observed for _txc_committed_kv, latency = 17.732601166s, txc =
0x55a5bcda0f00
2024-03-22T06:48:38.077+0000 7fb184b52700 1 heartbeat_map is_healthy
'OSD::osd_op_tp thread 0x7fb169898700' had timed out after 15.000000954s
2024-03-22T06:48:38.077+0000 7fb184b52700 1 heartbeat_map is_healthy
'OSD::osd_op_tp thread 0x7fb169898700' had timed out after 15.000000954s
...
"
"
[root@dopey ~]# ceph -s
cluster:
id: 8ee2d228-ed21-4580-8bbf-0649f229e21d
health: HEALTH_WARN
1 failed cephadm daemon(s)
Low space hindering backfill (add storage if this doesn't
resolve itself): 1 pg backfill_toofull
services:
mon: 5 daemons, quorum lazy,jolly,happy,dopey,sleepy (age 3d)
mgr: jolly.tpgixt(active, since 10d), standbys: dopey.lxajvk,
lazy.xuhetq
mds: 1/1 daemons up, 2 standby
osd: 540 osds: 539 up (since 6m), 539 in (since 15h); 6250 remapped pgs
data:
volumes: 1/1 healthy
pools: 15 pools, 10849 pgs
objects: 546.35M objects, 1.1 PiB
usage: 1.9 PiB used, 2.3 PiB / 4.2 PiB avail
pgs: 1425479651/3163081036 objects misplaced (45.066%)
6224 active+remapped+backfill_wait
4516 active+clean
67 active+clean+scrubbing
25 active+remapped+backfilling
16 active+clean+scrubbing+deep
1 active+remapped+backfill_wait+backfill_toofull
io:
client: 117 MiB/s rd, 68 MiB/s wr, 274 op/s rd, 183 op/s wr
recovery: 438 MiB/s, 192 objects/s
"
Anyone know what the issue might be? Given that it happens on and off,
with long periods of normal low latencies in between, I think it is
unlikely to be just because the cluster is busy.
Also, how come there's only a small number of PGs doing backfill when we
have such a large misplaced percentage? Can this just be a backfill
reservation logjam?
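For context, the reservation state and limits can be inspected with standard commands like these (osd.112 is just the OSD from the log above):

  ceph config get osd osd_max_backfills            # concurrent backfill reservations per OSD
  ceph config get osd osd_recovery_max_active_hdd
  ceph daemon osd.112 dump_recovery_reservations   # granted/waiting reservations on one OSD (run on its host)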
Best regards,
Torkil
--
Torkil Svensgaard
Systems Administrator
Danish Research Centre for Magnetic Resonance DRCMR, Section 714
Copenhagen University Hospital Amager and Hvidovre
Kettegaard Allé 30, 2650 Hvidovre, Denmark
Hi All,
I'm looking for some help/advice to solve the issue outlined in the heading.
I'm running CephFS (name: cephfs) on a Ceph Reef (v18.2.2 - latest
update) cluster, connecting from a laptop running Rocky Linux v9.3
(latest update) with KDE v5 (latest update).
I've set up the laptop to connect to a number of directories on CephFS
via the `/etc/fstab` file; an example entry is:
`ceph_user@.cephfs=/my_folder /mnt/my_folder ceph noatime,_netdev 0 0`.
Everything is working great; the required Ceph Key is on the laptop
(with a chmod of 600), I can access the files on the Ceph Cluster, etc,
etc, etc - all good.
However, whenever the laptop goes into sleep or hibernate mode (i.e. when
I close the laptop's lid) and I then bring it out of sleep/hibernation
(i.e. I open the lid), I've lost the CephFS mounts. The only way to bring
them back is to run `mount -a` as root (or with sudo). This is, as I'm
sure you'll agree, not a long-term viable option - especially as this is
running as a pilot project and the eventual end users won't have access
to root/sudo.
So I'm seeking the collective wisdom of the community in how to solve
this issue.
I've taken a brief look at autofs, and even half-heartedly had a go at
configuring it, but it didn't seem to work - honestly, it was late and I
wanted to get home after a long day. :-)
Is this the solution to my issue, or is there a better way to construct
the fstab entries, or is there another solution I haven't found yet in
the doco or via google-foo?
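One idea I have not yet tried (so purely an assumption on my part) is to let systemd mount the share on demand rather than at boot, so it gets re-established transparently after a resume, e.g. an fstab entry along these lines:

  ceph_user@.cephfs=/my_folder /mnt/my_folder ceph noatime,_netdev,noauto,x-systemd.automount,x-systemd.idle-timeout=60 0 0

followed by `systemctl daemon-reload` so systemd picks up the new automount unit.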
All help and advice greatly appreciated - thanks in advance
Cheers
Dulux-Oz
Hi,
We have a Reef cluster that started to complain a couple of weeks ago
about ~20 PGs (out of over 10K) not scrubbed/deep-scrubbed in time.
Watching it over the last few days, I saw that this affects only PGs that
have not been scrubbed since mid-February. All the other PGs are scrubbed
regularly.
I decided to check whether one OSD was present in all these PGs and found one!
I restarted this OSD but it had no effect. Looking at the logs for the
suspect OSD, I found nothing related to abnormal behaviour (but the log
is very verbose at restart time, so it's easy to miss something...). And
there is no error associated with the OSD's disk.
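For reference, such a check can be scripted along these lines (a sketch; it assumes jq is installed and that `ceph health detail` lists the affected PG IDs):

  ceph health detail | awk '/not (deep-)?scrubbed since/ {print $2}' | sort -u > pgs.txt
  for pg in $(cat pgs.txt); do ceph pg map "$pg" -f json | jq -r '.acting[]'; done | sort | uniq -c | sort -rn | head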
Any advice about where to look for some useful information would be
appreciated! Should I try to destroy the OSD and re-add it? I'd be more
comfortable if I could find some diagnostics first...
Best regards,
Michel
Hi,
With our small cluster (11 nodes) I notice that Ceph logs a lot.
Besides keeping the logs somewhere «just in case», is there anything we
should check regularly in them (to catch more serious problems early)?
Or can we trust «ceph health» and use the logs only for debugging?
Regards
--
Albert SHIH 🦫 🐸
France
Heure locale/Local time:
Fri 22 Mar 2024 22:28:42 CET
Hello!
After upgrading "5.15.0-84-generic" to "5.15.0-100-generic" (Ubuntu 22.04.2
LTS), commit latency started acting weird with "CT4000MX500SSD" drives.
osd commit_latency(ms) apply_latency(ms)
36 867 867
37 3045 3045
38 15 15
39 18 18
42 1409 1409
43 1224 1224
I downgraded the kernel but the result did not change.
I have a similar build that didn't get upgraded, and it is just fine.
While I was digging I noticed a difference.
This is the high-latency cluster, and as you can see DISC-GRAN=0B and
DISC-MAX=0B:
root@sd-01:~# lsblk -D
NAME                                                                                                  DISC-ALN DISC-GRAN DISC-MAX DISC-ZERO
sdc                                                                                                          0        0B       0B         0
├─ceph--76b7d255--2a01--4bd4--8d3e--880190181183-osd--block--201d5050--db0c--41b4--85c4--6416ee989d6c        0        0B       0B         0
└─ceph--76b7d255--2a01--4bd4--8d3e--880190181183-osd--block--5a376133--47de--4e29--9b75--2314665c2862
root@sd-01:~# find /sys/ -name provisioning_mode -exec grep -H . {} + | sort
/sys/devices/pci0000:80/0000:80:03.0/0000:81:00.0/host0/port-0:0/end_device-0:0/target0:0:0/0:0:0:0/scsi_disk/0:0:0:0/provisioning_mode:full
------------------------------------------------------------------------------------------
This is the low-latency cluster, and as you can see DISC-GRAN=4K and
DISC-MAX=2G:
root@ud-01:~# lsblk -D
NAME                                                                                                  DISC-ALN DISC-GRAN DISC-MAX DISC-ZERO
sdc                                                                                                          0        4K       2G         0
├─ceph--7496095f--18c7--41fd--90f2--d9b3e382bc8e-osd--block--ec86a029--23f7--4328--9600--a24a290e3003        0        4K       2G         0
└─ceph--7496095f--18c7--41fd--90f2--d9b3e382bc8e-osd--block--5b69b748--d899--4f55--afc3--2ea3c8a05ca1
root@ud-01:~# find /sys/ -name provisioning_mode -exec grep -H . {} + | sort
/sys/devices/pci0000:00/0000:00:11.4/ata3/host2/target2:0:0/2:0:0:0/scsi_disk/2:0:0:0/provisioning_mode:writesame_16
I think the problem is related to provisioning_mode, but I don't really
understand the reason.
I booted with a live ISO and the drive was still provisioning_mode:full,
so this is not related to my OS at all.
Something changed with the upgrade; I think that during the boot sequence
the negotiation between the LSI controller, the drives and the kernel
started to assign provisioning_mode:full, but I'm not sure.
What should I do ?
Best regards.
Hi all,
I have set up a test cluster with 3 servers. Everything has default
values, with a replication factor of 3.
I have created one volume called gds-common, and the data pool has been
configured with compression lz4 and compression_mode aggressive.
I have copied 71TB of data to this volume but I cannot get my head
around the usage information on the cluster.
Most of this data is quite small files containing plain text,
so I expect the compression ratio to be quite good.
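For reference, this kind of compression setup corresponds to the standard pool options, something like (pool name taken from the ceph df output below):

  ceph osd pool set gds-common_data compression_algorithm lz4
  ceph osd pool set gds-common_data compression_mode aggressive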
With both the source data storage (where I copy from) and the Ceph
filesystem mounted, df -h gives:
urd-gds-031:/gds-common                                163T   71T   92T  44% /gds-common
10.10.100.0:6789,10.10.100.1:6789,10.10.100.2:6789:/    92T   68T   25T  74% /ceph-gds-common
Looking at this, the compression ratio does not seem to be that good,
or is the used column showing an uncompressed value?
Using ceph and the command ceph df detail:
--- RAW STORAGE ---
CLASS SIZE AVAIL USED RAW USED %RAW USED
hdd 262 TiB 94 TiB 168 TiB 168 TiB 64.10
TOTAL 262 TiB 94 TiB 168 TiB 168 TiB 64.10
--- POOLS ---
POOL                  ID  PGS   STORED   (DATA)   (OMAP)   OBJECTS  USED      (DATA)   (OMAP)  %USED  MAX AVAIL  QUOTA OBJECTS  QUOTA BYTES  DIRTY  USED COMPR  UNDER COMPR
.mgr                   1    1   24 MiB   24 MiB      0 B        8    73 MiB   73 MiB      0 B      0     25 TiB            N/A          N/A    N/A         0 B          0 B
gds-common_data        2 1024   67 TiB   67 TiB      0 B   23.31M   167 TiB  167 TiB      0 B  69.43     25 TiB            N/A          N/A    N/A      35 TiB       70 TiB
gds-common_metadata    3   32  4.0 GiB  251 MiB  3.8 GiB  680.88k    12 GiB  753 MiB   11 GiB   0.02     25 TiB            N/A          N/A    N/A         0 B          0 B
.rgw.root              4   32  1.4 KiB  1.4 KiB      0 B        4    48 KiB   48 KiB      0 B      0     25 TiB            N/A          N/A    N/A         0 B          0 B
default.rgw.log        5   32    182 B    182 B      0 B        2    24 KiB   24 KiB      0 B      0     25 TiB            N/A          N/A    N/A         0 B          0 B
default.rgw.control    6   32      0 B      0 B      0 B        7       0 B      0 B      0 B      0     25 TiB            N/A          N/A    N/A         0 B          0 B
default.rgw.meta       7   32      0 B      0 B      0 B        0       0 B      0 B      0 B      0     25 TiB            N/A          N/A    N/A         0 B          0 B
From my understanding the raw storage USED includes all 3 copies, so
this means 56TB per copy, which gives a compression ratio of about 20%
if this is a compressed value?
Looking at the pool gds-common_data, the STORED value of 67TB is an
uncompressed, per-copy value, right?
The USED value for gds-common_data is the raw usage of all 3 copies,
right?
The %RAW USED value makes sense (64.10), but the gds-common_data %USED
differs (69.43) and I cannot figure out what this value relates to.
UNDER COMPR is the amount of data that Ceph has recognized as eligible
for compression (70TB), so that is about all the data.
I did not understand the value USED COMPR (35TB); does this specify how
much it has been compressed, i.e. 70TB has been compressed down to 35TB?
But which values are reported as compressed and which values show the
raw uncompressed sizes? Are all values uncompressed, and the only place
I see compression is in "USED COMPR" and "UNDER COMPR"?
But when do I run out of storage in my cluster, and what value should I
keep my eyes on if %USED is calculated on uncompressed data?
Does this mean that I have more storage available than shown by %USED?
Does df -h on a mount show the uncompressed used value?
Then there is mon_osd_full_ratio: does this mean that the first OSD that
reaches .95 full (the default) makes the system stop client writes and
so on?
And does mon_osd_full_ratio always reach its limit before %RAW USED
reaches 100% or a pool's %USED reaches 100%, or what happens if one of
the used values reaches 100% before mon_osd_full_ratio?
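For reference, the ratios currently in effect can be checked with a standard command:

  ceph osd dump | grep ratio    # shows full_ratio, backfillfull_ratio, nearfull_ratio

but I am still unsure how they interact with the compressed vs. uncompressed numbers above.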
I am sorry for all the questions, but even after reading the documentation
I do not seem to be able to figure this out.
All help is appreciated.
Many thanks in advance!
Best regards
Marcus
Possibly a naive question, and possibly seemingly trivial, but is there any good reason to return a “1” on success for cephadm host-maintenance enter and exit:
~$ sudo cephadm host-maintenance enter --fsid XXXX-XXXXXX-XXXX-XXXXX
Inferring config /var/lib/ceph/XXXX-XXXXXX-XXXX-XXXXX/config/ceph.conf
Requested to place host into maintenance
success - systemd target ceph-XXXX-XXXXXX-XXXX-XXXXX.target disabled
~$ echo $?
1
~$ sudo cephadm host-maintenance exit --fsid XXXX-XXXXXX-XXXX-XXXXX
Inferring config /var/lib/ceph/XXXX-XXXXXX-XXXX-XXXXX/config/ceph.conf
Requested to exit maintenance state
success - systemd target ceph-XXXX-XXXXXX-XXXX-XXXXX.target enabled and started
~$ echo $?
1
Hello,
The problem is a mon stuck in the probing state.
The environment is Ceph 18.2.1 on Ubuntu 22.04 with RDMA, 5 mons. One
mon, memb4, is out of quorum.
The debug log is attached.
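For context, the probing state can be seen with standard commands along these lines (mon name memb4 as above):

  ceph daemon mon.memb4 mon_status          # on memb4's host: shows "state": "probing" and the peers it sees
  ceph quorum_status --format json-pretty   # from a mon that is in quorum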
Thanks.
I originally built my sandbox Ceph cluster (Reef v18.2.1) using Cephadm and Ansible. It's stable and works fine.
Now that Reef v18.2.2 has come out, is there a set of instructions on how to upgrade to the latest version using Cephadm?
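From the cephadm docs, the orchestrator-driven upgrade appears to be roughly the following (version string taken from above; check cluster health first):

  ceph orch upgrade check --ceph-version 18.2.2   # optional: verify the target image is available
  ceph orch upgrade start --ceph-version 18.2.2
  ceph orch upgrade status                        # monitor progress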
-- Michael