It looks like this procedure crashes the Ceph node. I have now tried it a second time, after updating, and it crashed again.
el7 + Nautilus -> rbd snapshot map -> LV mount -> crash
(The LVs do not even have duplicate names.)
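Roughly, the sequence is the following (image, snapshot and VG/LV names here are just placeholders, not the real ones):

  rbd snap create mypool/myimage@snap1
  rbd map mypool/myimage@snap1          # maps the snapshot read-only as /dev/rbdX
  pvscan --cache                        # pick up the PVs on the mapped device
  vgchange -ay myvg                     # activate the LVs found on the snapshot
  mount -o ro /dev/myvg/mylv /mnt/snap  # it is around this point that the node crashes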
Good morning,
Cephadm Reef 18.2.1. We recently added 4 hosts and changed a failure
domain from host to datacenter which is the reason for the large
misplaced percentage.
We were seeing some pretty crazy spikes in "OSD Read Latencies" and "OSD
Write Latencies" on the dashboard. Most of the time everything is fine,
but then for periods of 1-4 hours latencies will go to 10+ seconds for
one or more OSDs. This also happens outside scrub hours, and it is not
the same OSDs every time. The affected OSDs are HDDs with DB/WAL on NVMe.
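For reference, the latencies can also be sampled from the CLI with standard commands (osd.112 below is just the OSD that appears in the log snippet):

  ceph osd perf                                 # per-OSD commit/apply latency in ms
  ceph daemon osd.112 dump_historic_slow_ops    # run on that OSD's host, lists recent slow ops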
Log snippet:
"
...
2024-03-22T06:48:22.859+0000 7fb184b52700 1 heartbeat_map is_healthy
'OSD::osd_op_tp thread 0x7fb169898700' had timed out after 15.000000954s
2024-03-22T06:48:22.859+0000 7fb185b54700 1 heartbeat_map is_healthy
'OSD::osd_op_tp thread 0x7fb169898700' had timed out after 15.000000954s
2024-03-22T06:48:22.864+0000 7fb169898700 1 heartbeat_map clear_timeout
'OSD::osd_op_tp thread 0x7fb169898700' had timed out after 15.000000954s
2024-03-22T06:48:22.864+0000 7fb169898700 0
bluestore(/var/lib/ceph/osd/ceph-112) log_latency slow operation
observed for submit_transact, latency = 17.716707230s
2024-03-22T06:48:22.880+0000 7fb1748ae700 0
bluestore(/var/lib/ceph/osd/ceph-112) log_latency_fn slow operation
observed for _txc_committed_kv, latency = 17.732601166s, txc =
0x55a5bcda0f00
2024-03-22T06:48:38.077+0000 7fb184b52700 1 heartbeat_map is_healthy
'OSD::osd_op_tp thread 0x7fb169898700' had timed out after 15.000000954s
2024-03-22T06:48:38.077+0000 7fb184b52700 1 heartbeat_map is_healthy
'OSD::osd_op_tp thread 0x7fb169898700' had timed out after 15.000000954s
...
"
"
[root@dopey ~]# ceph -s
cluster:
id: 8ee2d228-ed21-4580-8bbf-0649f229e21d
health: HEALTH_WARN
1 failed cephadm daemon(s)
Low space hindering backfill (add storage if this doesn't
resolve itself): 1 pg backfill_toofull
services:
mon: 5 daemons, quorum lazy,jolly,happy,dopey,sleepy (age 3d)
mgr: jolly.tpgixt(active, since 10d), standbys: dopey.lxajvk,
lazy.xuhetq
mds: 1/1 daemons up, 2 standby
osd: 540 osds: 539 up (since 6m), 539 in (since 15h); 6250 remapped pgs
data:
volumes: 1/1 healthy
pools: 15 pools, 10849 pgs
objects: 546.35M objects, 1.1 PiB
usage: 1.9 PiB used, 2.3 PiB / 4.2 PiB avail
pgs: 1425479651/3163081036 objects misplaced (45.066%)
6224 active+remapped+backfill_wait
4516 active+clean
67 active+clean+scrubbing
25 active+remapped+backfilling
16 active+clean+scrubbing+deep
1 active+remapped+backfill_wait+backfill_toofull
io:
client: 117 MiB/s rd, 68 MiB/s wr, 274 op/s rd, 183 op/s wr
recovery: 438 MiB/s, 192 objects/s
"
Anyone know what the issue might be? Given that it happens on and off,
with long periods of normal low latencies in between, I think it is
unlikely to be just because the cluster is busy.
Also, how come there's only a small number of PGs doing backfill when we
have such a large misplaced percentage? Can this just be a backfill
reservation logjam?
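For context, the reservation state and limits can be inspected with standard commands like these (osd.112 is just the OSD from the log above):

  ceph config get osd osd_max_backfills            # concurrent backfill reservations per OSD
  ceph config get osd osd_recovery_max_active_hdd
  ceph daemon osd.112 dump_recovery_reservations   # granted/waiting reservations on one OSD (run on its host)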
Best regards,
Torkil
--
Torkil Svensgaard
Systems Administrator
Danish Research Centre for Magnetic Resonance DRCMR, Section 714
Copenhagen University Hospital Amager and Hvidovre
Kettegaard Allé 30, 2650 Hvidovre, Denmark
Hi All,
I'm looking for some help/advice to solve the issue outlined in the heading.
I'm running CephFS (name: cephfs) on a Ceph Reef (v18.2.2 - latest
update) cluster, connecting from a laptop running Rocky Linux v9.3
(latest update) with KDE v5 (latest update).
I've set up the laptop to connect to a number of directories on CephFS
via the `/etc/fstab` file; an example entry is:
`ceph_user@.cephfs=/my_folder /mnt/my_folder ceph noatime,_netdev 0 0`.
Everything is working great; the required Ceph Key is on the laptop
(with a chmod of 600), I can access the files on the Ceph Cluster, etc,
etc, etc - all good.
However, whenever the laptop goes into sleep or hibernate mode (i.e. when
I close the laptop's lid) and I then bring it out of sleep/hibernation
(i.e. I open the lid), I've lost the CephFS mounts. The only way to bring
them back is to run `mount -a` as root (or with sudo). This is, as I'm
sure you'll agree, not a long-term viable option - especially as this is
running as a pilot project and the eventual end users won't have access
to root/sudo.
So I'm seeking the collective wisdom of the community in how to solve
this issue.
I've taken a brief look at autofs, and even half-heartedly had a go at
configuring it, but it didn't seem to work - honestly, it was late and I
wanted to get home after a long day. :-)
Is this the solution to my issue, or is there a better way to construct
the fstab entries, or is there another solution I haven't found yet in
the doco or via google-foo?
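One idea I have not yet tried (so purely an assumption on my part) is to let systemd mount the share on demand rather than at boot, so it gets re-established transparently after a resume, e.g. an fstab entry along these lines:

  ceph_user@.cephfs=/my_folder /mnt/my_folder ceph noatime,_netdev,noauto,x-systemd.automount,x-systemd.idle-timeout=60 0 0

followed by `systemctl daemon-reload` so systemd picks up the new automount unit.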
All help and advice greatly appreciated - thanks in advance
Cheers
Dulux-Oz
Hi,
We have a Reef cluster that started to complain a couple of weeks ago
about ~20 PGs (out of over 10K) not scrubbed/deep-scrubbed in time.
Watching it over the last few days, I saw that this affects only PGs that
have not been scrubbed since mid-February. All the other PGs are scrubbed
regularly.
I decided to check whether one OSD was present in all these PGs and found one!
I restarted this OSD but it had no effect. Looking at the logs for the
suspect OSD, I found nothing related to abnormal behaviour (but the log
is very verbose at restart time, so it's easy to miss something...). And
there is no error associated with the OSD's disk.
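For reference, such a check can be scripted along these lines (a sketch; it assumes jq is installed and that `ceph health detail` lists the affected PG IDs):

  ceph health detail | awk '/not (deep-)?scrubbed since/ {print $2}' | sort -u > pgs.txt
  for pg in $(cat pgs.txt); do ceph pg map "$pg" -f json | jq -r '.acting[]'; done | sort | uniq -c | sort -rn | head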
Any advice about where to look for some useful information would be
appreciated! Should I try to destroy the OSD and re-add it? I'd be more
comfortable if I could find some diagnostics first...
Best regards,
Michel
Hi,
With our small cluster (11 nodes) I notice that Ceph logs a lot.
Besides keeping the logs somewhere «just in case», is there anything we
should check regularly in them (to catch more serious problems early)?
Or can we trust «ceph health» and use the logs only for debugging?
Regards
--
Albert SHIH 🦫 🐸
France
Heure locale/Local time:
Fri 22 Mar 2024 22:28:42 CET
Hello!
After upgrading "5.15.0-84-generic" to "5.15.0-100-generic" (Ubuntu 22.04.2
LTS), commit latency started acting weird with "CT4000MX500SSD" drives.
osd commit_latency(ms) apply_latency(ms)
36 867 867
37 3045 3045
38 15 15
39 18 18
42 1409 1409
43 1224 1224
I downgraded the kernel but the result did not change.
I have a similar build that didn't get upgraded, and it is just fine.
While I was digging I noticed a difference.
This is the high-latency cluster, and as you can see DISC-GRAN=0B and
DISC-MAX=0B:
root@sd-01:~# lsblk -D
NAME                                                                                                  DISC-ALN DISC-GRAN DISC-MAX DISC-ZERO
sdc                                                                                                          0        0B       0B         0
├─ceph--76b7d255--2a01--4bd4--8d3e--880190181183-osd--block--201d5050--db0c--41b4--85c4--6416ee989d6c        0        0B       0B         0
└─ceph--76b7d255--2a01--4bd4--8d3e--880190181183-osd--block--5a376133--47de--4e29--9b75--2314665c2862
root@sd-01:~# find /sys/ -name provisioning_mode -exec grep -H . {} + | sort
/sys/devices/pci0000:80/0000:80:03.0/0000:81:00.0/host0/port-0:0/end_device-0:0/target0:0:0/0:0:0:0/scsi_disk/0:0:0:0/provisioning_mode:full
------------------------------------------------------------------------------------------
This is the low-latency cluster, and as you can see DISC-GRAN=4K and
DISC-MAX=2G:
root@ud-01:~# lsblk -D
NAME                                                                                                  DISC-ALN DISC-GRAN DISC-MAX DISC-ZERO
sdc                                                                                                          0        4K       2G         0
├─ceph--7496095f--18c7--41fd--90f2--d9b3e382bc8e-osd--block--ec86a029--23f7--4328--9600--a24a290e3003        0        4K       2G         0
└─ceph--7496095f--18c7--41fd--90f2--d9b3e382bc8e-osd--block--5b69b748--d899--4f55--afc3--2ea3c8a05ca1
root@ud-01:~# find /sys/ -name provisioning_mode -exec grep -H . {} + | sort
/sys/devices/pci0000:00/0000:00:11.4/ata3/host2/target2:0:0/2:0:0:0/scsi_disk/2:0:0:0/provisioning_mode:writesame_16
I think the problem is related to provisioning_mode, but I don't really
understand the reason.
I booted with a live ISO and the drive was still provisioning_mode:full,
so this is not related to my OS at all.
Something changed with the upgrade; I think that during the boot sequence
the negotiation between the LSI controller, the drives and the kernel
started to assign provisioning_mode:full, but I'm not sure.
What should I do ?
Best regards.
Hi all,
I have set up a test cluster with 3 servers. Everything has default
values, with a replication factor of 3.
I have created one volume called gds-common, and the data pool has been
configured with compression lz4 and compression_mode aggressive.
I have copied 71TB of data to this volume but I cannot get my head
around the usage information on the cluster.
Most of this data is quite small files containing plain text,
so I expect the compression ratio to be quite good.
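For reference, this kind of compression setup corresponds to the standard pool options, something like (pool name taken from the ceph df output below):

  ceph osd pool set gds-common_data compression_algorithm lz4
  ceph osd pool set gds-common_data compression_mode aggressive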
With both the source data storage (where I copy from) and the Ceph
filesystem mounted, df -h gives:
urd-gds-031:/gds-common                                163T   71T   92T  44% /gds-common
10.10.100.0:6789,10.10.100.1:6789,10.10.100.2:6789:/    92T   68T   25T  74% /ceph-gds-common
Looking at this, the compression ratio does not seem to be that good,
or is the used column showing an uncompressed value?
Using ceph and the command ceph df detail:
--- RAW STORAGE ---
CLASS SIZE AVAIL USED RAW USED %RAW USED
hdd 262 TiB 94 TiB 168 TiB 168 TiB 64.10
TOTAL 262 TiB 94 TiB 168 TiB 168 TiB 64.10
--- POOLS ---
POOL                  ID  PGS   STORED   (DATA)   (OMAP)   OBJECTS  USED      (DATA)   (OMAP)  %USED  MAX AVAIL  QUOTA OBJECTS  QUOTA BYTES  DIRTY  USED COMPR  UNDER COMPR
.mgr                   1    1   24 MiB   24 MiB      0 B        8    73 MiB   73 MiB      0 B      0     25 TiB            N/A          N/A    N/A         0 B          0 B
gds-common_data        2 1024   67 TiB   67 TiB      0 B   23.31M   167 TiB  167 TiB      0 B  69.43     25 TiB            N/A          N/A    N/A      35 TiB       70 TiB
gds-common_metadata    3   32  4.0 GiB  251 MiB  3.8 GiB  680.88k    12 GiB  753 MiB   11 GiB   0.02     25 TiB            N/A          N/A    N/A         0 B          0 B
.rgw.root              4   32  1.4 KiB  1.4 KiB      0 B        4    48 KiB   48 KiB      0 B      0     25 TiB            N/A          N/A    N/A         0 B          0 B
default.rgw.log        5   32    182 B    182 B      0 B        2    24 KiB   24 KiB      0 B      0     25 TiB            N/A          N/A    N/A         0 B          0 B
default.rgw.control    6   32      0 B      0 B      0 B        7       0 B      0 B      0 B      0     25 TiB            N/A          N/A    N/A         0 B          0 B
default.rgw.meta       7   32      0 B      0 B      0 B        0       0 B      0 B      0 B      0     25 TiB            N/A          N/A    N/A         0 B          0 B
From my understanding the raw storage USED includes all 3 copies, so
this means 56TB per copy, which gives a compression ratio of about 20%
if this is a compressed value?
Looking at the pool gds-common_data, the STORED value of 67TB is an
uncompressed, per-copy value, right?
The USED value for gds-common_data is the raw usage of all 3 copies,
right?
The %RAW USED value makes sense (64.10), but the gds-common_data %USED
differs (69.43) and I cannot figure out what this value relates to.
UNDER COMPR is the amount of data that Ceph has recognized as eligible
for compression (70TB), so that is about all the data.
I did not understand the value USED COMPR (35TB); does this specify how
much it has been compressed, i.e. 70TB has been compressed down to 35TB?
But which values are reported as compressed and which values show the
raw uncompressed sizes? Are all values uncompressed, and the only place
I see compression is in "USED COMPR" and "UNDER COMPR"?
But when do I run out of storage in my cluster, and what value should I
keep my eyes on if %USED is calculated on uncompressed data?
Does this mean that I have more storage available than shown by %USED?
Does df -h on a mount show the uncompressed used value?
Then there is mon_osd_full_ratio: does this mean that the first OSD that
reaches .95 full (the default) makes the system stop client writes and
so on?
And does mon_osd_full_ratio always reach its limit before %RAW USED
reaches 100% or a pool's %USED reaches 100%, or what happens if one of
the used values reaches 100% before mon_osd_full_ratio?
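For reference, the ratios currently in effect can be checked with a standard command:

  ceph osd dump | grep ratio    # shows full_ratio, backfillfull_ratio, nearfull_ratio

but I am still unsure how they interact with the compressed vs. uncompressed numbers above.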
I am sorry for all the questions, but even after reading the documentation
I do not seem to be able to figure this out.
All help is appreciated.
Many thanks in advance!
Best regards
Marcus
Possibly a naive question, and possibly seemingly trivial, but is there any good reason to return a “1” on success for cephadm host-maintenance enter and exit:
~$ sudo cephadm host-maintenance enter --fsid XXXX-XXXXXX-XXXX-XXXXX
Inferring config /var/lib/ceph/XXXX-XXXXXX-XXXX-XXXXX/config/ceph.conf
Requested to place host into maintenance
success - systemd target ceph-XXXX-XXXXXX-XXXX-XXXXX.target disabled
~$ echo $?
1
~$ sudo cephadm host-maintenance exit --fsid XXXX-XXXXXX-XXXX-XXXXX
Inferring config /var/lib/ceph/XXXX-XXXXXX-XXXX-XXXXX/config/ceph.conf
Requested to exit maintenance state
success - systemd target ceph-XXXX-XXXXXX-XXXX-XXXXX.target enabled and started
~$ echo $?
1
Hello,
The problem is a mon stuck in the probing state.
The environment is Ceph 18.2.1 on Ubuntu 22.04 with RDMA, 5 mons. One
mon, memb4, is out of quorum.
The debug log is attached.
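For context, the probing state can be seen with standard commands along these lines (mon name memb4 as above):

  ceph daemon mon.memb4 mon_status          # on memb4's host: shows "state": "probing" and the peers it sees
  ceph quorum_status --format json-pretty   # from a mon that is in quorum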
Thanks.
I originally built my sandbox Ceph cluster (Reef v18.2.1) using Cephadm and Ansible. It's stable and works fine.
Now that Reef v18.2.2 has come out, is there a set of instructions on how to upgrade to the latest version using Cephadm?
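From the cephadm docs, the orchestrator-driven upgrade appears to be roughly the following (version string taken from above; check cluster health first):

  ceph orch upgrade check --ceph-version 18.2.2   # optional: verify the target image is available
  ceph orch upgrade start --ceph-version 18.2.2
  ceph orch upgrade status                        # monitor progress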
-- Michael