Good day all,
I have an issue with a few OSDs (on two different nodes) that attempt to
start but fail / crash quite quickly. They are all LVM disks.
I've tried upgrading the software and running health checks on the hardware
(nodes and disks), and there don't seem to be any issues there.
Recently I've had a few "other" disks physically fail in the cluster, and I
now have one PG down which is blocking some IO on CephFS.
I've added the output of the osd journalctl and the osd log below in case
it's helpful to identify anything obvious.
I also set debug bluefs = 20, which I saw suggested in another post.
I recently manually upgraded this node to 17.2.0 before the problem
began, and later to 17.2.5. The other OSDs on this node start and run fine.
The other node (15.2.17) also has a few OSDs that will not start and some
that run without issue.
Could anyone point me in the right direction to investigate and solve my
osd issues.
https://pastebin.com/3PkCabdf
https://pastebin.com/BT9bnhSb
Production system mainly used for CephFS
OS: Ubuntu 20.04.5 LTS
Ceph versions: 15.2.17 - Octopus (one OSD node manually upgraded to 17.2.5
- Quincy)
Erasure data pool (K=4, M=2) - the journals for each OSD are co-located
on each drive
Kind regards
Geoffrey Rhodes
Hello All,
In Ceph Quincy I am not able to find the rbd_mirror_journal_max_fetch_bytes
config option for rbd-mirror.
I configured a Ceph cluster of almost 400 TB and enabled rbd-mirror. In the
starting stage I was able to achieve almost 9 GB/s, but after the replication
of all the existing images completed, the rbd-mirror speed automatically
dropped to between 4 and 5 Mbps.
On my primary cluster we are continuously writing 50 to 400 Mbps of data, but
the replication speed we get is only 4 to 5 Mbps, even though we have 10 Gbps
of replication network bandwidth.
Note: I also tried to find the option rbd_mirror_journal_max_fetch_bytes,
but I am not able to find this option in the configuration. When I try to
set it from the command line, it shows an error:
command:
ceph config set client.rbd rbd_mirror_journal_max_fetch_bytes 33554432
error:
Error EINVAL: unrecognized config option 'rbd_mirror_journal_max_fetch_bytes'
Cluster version:
ceph version 17.2.5 (98318ae89f1a893a6ded3a640405cdbb33e08757) quincy (stable)
Please suggest an alternative way to configure this option, or how I can
improve the replication network speed.
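One way to check whether a given option exists at all in your running release is to query `ceph config help` (a sketch of the idea, not a fix for the speed itself; the command prints the option's description when it is known and fails otherwise):

```shell
# Ask the cluster whether this option is known to the running Ceph version.
ceph config help rbd_mirror_journal_max_fetch_bytes \
    || echo "option not present in this version"

# List the rbd_mirror_* options this version does know about.
ceph config ls | grep '^rbd_mirror'
```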
Hello,
If buffered_io is enabled, is there a way to know exactly how much physical
memory each OSD is using?
What I've found is dump_mempools, whose last entries are the following, but
do these bytes reflect the real physical memory usage?
"total": {
    "items": 60005205,
    "bytes": 995781359
}
Also, which metric exposes this value? I haven't found one.
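For reference, a minimal sketch of pulling the mempool total out of the dump_mempools output (assuming the JSON layout of recent releases; the sample data here stands in for the real output of `ceph daemon osd.<id> dump_mempools`):

```python
import json

# Sketch: parse dump_mempools output and report the total bytes tracked
# by Ceph's mempools. Assumption: the JSON has a top-level "mempool"
# object containing a "total" entry, as in recent Ceph releases.
sample = '''
{
  "mempool": {
    "by_pool": {
      "bluestore_cache_data": {"items": 1000, "bytes": 4096000}
    },
    "total": {"items": 60005205, "bytes": 995781359}
  }
}
'''

def mempool_total_bytes(dump: str) -> int:
    data = json.loads(dump)
    # Some releases wrap the pools in a "mempool" key; handle both layouts.
    pools = data.get("mempool", data)
    return pools["total"]["bytes"]

print(mempool_total_bytes(sample))  # 995781359
```

Note that mempool byte counts only cover allocations attributed to Ceph's mempools, so they need not equal the OSD process's physical RSS.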
Thank you
Hi,
I wonder if it is possible to define a host pattern which includes the
host names ceph01…ceph19, but no other hosts, especially not ceph00. That
means this pattern is wrong: ceph[01][0-9], since it also matches ceph00.
Not really a problem, but it seems that the statement "“host-pattern” is a
regex that matches against hostnames and returns only matching hosts"¹ is
not defined more precisely in the docs.
1) https://docs.ceph.com/en/latest/cephadm/host-management/
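One regex that covers exactly ceph01…ceph19 is an alternation over the two tens digits. A quick sketch in Python (assuming, per the docs statement quoted above, that the pattern is an ordinary regex matched against the hostname; anchors included for safety):

```python
import re

# Matches ceph01 through ceph19, but neither ceph00 nor ceph20.
# 0[1-9] covers ceph01..ceph09, 1[0-9] covers ceph10..ceph19.
pattern = re.compile(r"^ceph(0[1-9]|1[0-9])$")

hosts = ["ceph00"] + [f"ceph{i:02d}" for i in range(1, 20)] + ["ceph20"]
matches = [h for h in hosts if pattern.match(h)]
print(matches)  # ceph01 .. ceph19 only
```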
> >
> > Hi all,
> >
> > we are observing a problem on a libvirt virtualisation cluster that
> might come from ceph rbd clients. Something went wrong during execution
> of a live-migration operation and as a result we have two instances of
> the same VM running on 2 different hosts, the source- and the
> destination host. What we observe now is that the exclusive lock of the
> RBD disk image moves between these two clients periodically (every few
> minutes the owner flips).
>
> Hi Frank,
>
> If you are talking about RBD exclusive lock feature ("exclusive-lock"
> under "features" in "rbd info" output) then this is expected. This
> feature provides automatic cooperative lock transitions between clients
> to ensure that only a single client is writing to the image at any
> given time. It's there to protect internal per-image data structures
> such as the object map, the journal or the client-side PWL (persistent
> write log) cache from concurrent modifications in case the image is
> opened by two or more clients. The name is confusing but it's NOT
> about preventing other clients from opening and writing to the image.
> Rather it's about serializing those writes.
>
I remember asking this quite some time ago as well. Maybe this is helpful:
https://www.wogri.at/scripts/ceph-libvirt-locking/
Hello Team,
Please help me. I deployed two Ceph clusters in a 6-node configuration with almost 800 TB of capacity, set up as DC-DR for data high availability. I enabled RGW and RBD block device mirroring for data replication. We have a 10 Gbps fiber replication network.
When we first started rbd-mirror from our DC to DR and replicated our existing data, we got almost 8 Gbps replication speed and it worked fine. Once all the existing image data had been replicated, we began facing a replication speed issue: now we only get 5 to 10 Mbps. We tried to find options like rbd_journal_max_payload_bytes and rbd_mirror_journal_max_fetch_bytes; we tried increasing the max payload size but saw no change in speed, and the rbd_mirror_journal_max_fetch_bytes option is not available in our Ceph version. I also tried to modify and increase some other values, such as:
rbd_mirror_memory_target
rbd_mirror_memory_cache_min
You can also find some references regarding increasing these values for performance.
Eugen
[1]
https://tracker.ceph.com/projects/ceph/repository/revisions/1ef12ea0d29f955…
[2]
https://github.com/ceph/ceph/pull/27670
Information about my Ceph cluster:
Version: ceph version 17.2.5 (98318ae89f1a893a6ded3a640405cdbb33e08757) quincy (stable)
rbd-mirror daemon version: 17.2.5
Mirror mode: pool
Max images mirrored at a time: 5
Replication network: 10 Gbps (dedicated)
Client: on the DC cluster we are continuously writing 50 to 400 Mbps of
data, but replication is only 5 to 10 Mbps.
Issue: we only get 4 to 5 Mbps of replication speed, even though we have
10 Gbps of replication network bandwidth.
Note: I also tried to find the option rbd_mirror_journal_max_fetch_bytes,
but I am not able to find this option in the configuration. When I try to
set it from the command line, it shows an error:
command:
ceph config set client.rbd rbd_mirror_journal_max_fetch_bytes 33554432
error:
Error EINVAL: unrecognized config option 'rbd_mirror_journal_max_fetch_bytes'
We encountered the following problems while trying to perform
maintenance on a Ceph cluster:
The cluster consists of 7 Nodes with 10 OSDs each.
There are 4 pools on it: 3 of them are replicated pools with 3/2
size/min_size and one is an erasure coded pool with m=2 and k=5.
The following global flags were set:
* noout
* norebalance
* nobackfill
* norecover
Then, after those flags were set, all OSDs were stopped via the command
ceph osd stop, which seems to have caused the issue.
After maintenance was done, all OSDs were started again via systemctl.
Only about half of the 70 OSDs in total started at first - while the
other half started, but got killed after a few seconds with the
following log messages:
ceph-osd[197270]: 2023-01-24T13:39:12.103+0100 7ff3fcf8d700 -1 osd.51
12161 map says i am stopped by admin. shutting down.
ceph-osd[197270]: 2023-01-24T13:39:12.103+0100 7ff40da55700 -1 received
signal: Interrupt from Kernel ( Could be generated by pthread_kill(),
raise(), abort(), alarm() ) UID: 0
ceph-osd[197270]: 2023-01-24T13:39:12.103+0100 7ff40da55700 -1 osd.51
12161 *** Got signal Interrupt ***
ceph-osd[197270]: 2023-01-24T13:39:12.103+0100 7ff40da55700 -1 osd.51
12161 *** Immediate shutdown (osd_fast_shutdown=true) ***
And indeed, when looking into the osd map via ceph osd dump, the
remaining OSDs seem to be marked as stopped:
osd.50 down out weight 0 up_from 9213 up_thru 9416 down_at 9760
last_clean_interval [9106,9207)
[v2:10.0.1.61:6813/6211,v1:10.0.1.61:6818/6211]
[v2:10.0.0.61:6814/6211,v1:10.0.0.61:6816/6211] exists,stop
9a2590c4-f50b-4550-bfd1-5aafb543cb59
We were able to restore some of the remaining OSDs by running
ceph osd out XX
ceph osd in XX
and then starting the service again (via systemctl start). This did work
for most OSDs, except for the OSDs located on one specific host. Some
OSDs required several restarts until they stopped killing themselves a
few seconds after starting.
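For the record, the out/in/restart sequence above can be sketched as a small loop (an assumption-laden sketch: the OSD ids and the ceph-osd@<id> systemd unit names must be adapted to your deployment):

```shell
#!/bin/sh
# Sketch: try to clear the "stop" state for a list of OSD ids by marking
# them out and back in, then restarting the systemd unit.
for id in 50 51 52; do          # replace with your affected OSD ids
    ceph osd out "$id"
    ceph osd in "$id"
    systemctl restart "ceph-osd@$id"
    # The osdmap entry should no longer show "stop" for this OSD.
    ceph osd dump | grep "^osd.$id "
done
```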
This whole issue seems to be caused by the OSDs being marked as stopped
in the OSD map [1]. Apparently this state should get reset when
re-starting the OSD again [2], but for some reason this doesn't happen
for some of the OSDs. This behavior seems to have been introduced via
the following pull request [3]. We have also found the following commit
where the logic regarding stop seemed to have been introduced [4].
We were looking into commands that reset the stopped status of the OSD
in the OSD map, but did not find any way of forcing this.
Since we are out of ideas on how to proceed with the remaining 10 OSDs
that cannot get brought up: How does one recover from this situation? It
seems like by running ceph osd stop the cluster got in a state that
seems irrecoverable with the normal CLI commands available. We even
looked into the possibility of manually manipulating the osdmap via the
osdmaptool, but there doesn't seem to be a way to edit the start/stopped
status and it also seems like a very invasive procedure. There does not
seem to be any way we can see of recovering from this, apart from
rebuilding all the OSDs - which we refrained from for now.
Kind Regards
Hanreich Stefan
[1]
https://github.com/ceph/ceph/blob/63a77b2c5b683cb241f865daec92c046152175b4/…
[2]
https://github.com/ceph/ceph/blob/63a77b2c5b683cb241f865daec92c046152175b4/…
[3] https://github.com/ceph/ceph/pull/43664
[4]
https://github.com/ceph/ceph/commit/5dbae13ce0f5b0104ab43e0ccfe94f832d0e1268
Dear Ceph-Users,
I am struggling to replace a disk. My Ceph cluster is not replacing the old OSD even though I ran:
ceph orch osd rm 232 --replace
OSD 232 is still shown in the OSD list, but the new HDD is being deployed as a new OSD. This wouldn't bother me much if the new OSD were also placed on the BlueStore DB / NVMe, but it isn't.
My steps:
"ceph orch osd rm 232 --replace"
Remove the failed HDD.
Add the new one.
Convert the disk within the server's BIOS so that the node can have direct access to it.
It shows up as /dev/sdt.
Enter maintenance mode.
Reboot the server.
The drive is now /dev/sdm (which the old drive had).
"ceph orch device zap node-x /dev/sdm"
A new OSD is placed on the cluster.
Can you give me a hint, where did I take a wrong turn? Why is the disk not being used as OSD 232?
Best
Ken
P.S. Sorry for sending this message twice; somehow this mail address was no longer subscribed to the list.