Hi,
I wonder if it is possible to define a host pattern that includes the
host names ceph01…ceph19 but no other hosts, especially not ceph00. That
means this pattern is wrong: ceph[01][0-9], since it also matches ceph00.
Not really a problem, but the statement that "'host-pattern' is a regex that
matches against hostnames and returns only matching hosts"¹ is not specified
any more precisely in the docs.
1) https://docs.ceph.com/en/latest/cephadm/host-management/
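For what it is worth, what I would try (assuming the pattern really is an
ordinary regex and is effectively matched against the whole hostname, which I
have not verified) is an alternation instead of a character class, e.g. in a
placement spec (service type chosen only as an example):

  service_type: osd
  placement:
    host_pattern: 'ceph(0[1-9]|1[0-9])'

i.e. ceph0 followed by 1-9, or ceph1 followed by 0-9, which excludes ceph00.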
> >
> > Hi all,
> >
> > we are observing a problem on a libvirt virtualisation cluster that
> might come from ceph rbd clients. Something went wrong during execution
> of a live-migration operation and as a result we have two instances of
> the same VM running on 2 different hosts, the source- and the
> > destination host. What we observe now is that the exclusive lock of the
> RBD disk image moves between these two clients periodically (every few
> minutes the owner flips).
>
> Hi Frank,
>
> If you are talking about RBD exclusive lock feature ("exclusive-lock"
> under "features" in "rbd info" output) then this is expected. This
> feature provides automatic cooperative lock transitions between clients
> to ensure that only a single client is writing to the image at any
> given time. It's there to protect internal per-image data structures
> such as the object map, the journal or the client-side PWL (persistent
> write log) cache from concurrent modifications in case the image is
> opened by two or more clients. The name is confusing but it's NOT
> about preventing other clients from opening and writing to the image.
> Rather it's about serializing those writes.
>
I remember asking this quite some time ago as well. Maybe this is helpful:
https://www.wogri.at/scripts/ceph-libvirt-locking/
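For anyone debugging this kind of flapping, a quick way to see which client
holds the lock at a given moment (pool and image names below are placeholders):

  rbd lock ls <pool>/<image>      # shows the current lock and its owner
  rbd status <pool>/<image>       # shows the watchers, i.e. clients that have the image open

Watching the lock owner change every few minutes would confirm the cooperative
lock transitions described above.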
Hello Team,
Please help me. I have deployed two Ceph clusters with a 6-node configuration and almost 800 TB of capacity, set up as DC-DR for data high availability. I enabled RGW and RBD block device mirroring for data replication, and we have a dedicated 10 Gbps fiber replication network.
When we first started rbd-mirror from our DC to the DR site, we got almost 8 Gbps of replication speed while the existing data was being copied, and it worked fine. Once all the existing image data had been replicated, we started facing a replication speed issue: now we only get 5 to 10 Mbps. We tried options like rbd_journal_max_payload_bytes and rbd_mirror_journal_max_fetch_bytes; we increased the max payload size, but it made no difference to the speed, and the rbd_mirror_journal_max_fetch_bytes option cannot be found in our Ceph version. I also tried modifying and increasing some other values, such as:
rbd_mirror_memory_target
rbd_mirror_memory_cache_min
You can also find some references regarding these values for increasing performance.
Eugen
[1]
https://tracker.ceph.com/projects/ceph/repository/revisions/1ef12ea0d29f955…
[2]
https://github.com/ceph/ceph/pull/27670
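In case it helps, a hedged example of how those two values could be raised via
the central config (the section name and the numbers are illustrative only,
not tested recommendations):

  ceph config set client rbd_mirror_memory_target 4294967296      # 4 GiB
  ceph config set client rbd_mirror_memory_cache_min 1073741824   # 1 GiB

Restarting the rbd-mirror daemon afterwards makes sure the new values are
picked up.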
Information about my Ceph cluster:
Version: ceph version 17.2.5 (98318ae89f1a893a6ded3a640405cdbb33e08757) quincy (stable)
rbd-mirror daemon version: 17.2.5
Mirror mode: pool
Max images mirrored at a time: 5
Replication network: 10 Gbps (dedicated)
Client: on the DC cluster we are continuously writing data at 50 to 400 Mbps, but
replication is only 5 to 10 Mbps.
Issue: we only get 4 to 5 Mbps of replication speed, even though we have 10 Gbps of
replication network bandwidth.
Note: I also tried to find the option rbd_mirror_journal_max_fetch_bytes, but I am
not able to find this option in the configuration. Also, when I try to set it from
the command line, it shows an error:
command:
ceph config set client.rbd rbd_mirror_journal_max_fetch_bytes 33554432
error:
Error EINVAL: unrecognized config option 'rbd_mirror_journal_max_fetch_bytes'
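A hedged way to check which mirror-related options actually exist in a given
release before trying to set them (only the standard ceph config subcommands
assumed):

  ceph config ls | grep rbd_mirror                  # option names known to this release
  ceph config help rbd_journal_max_payload_bytes    # description and default of a single option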
We encountered the following problems while trying to perform
maintenance on a Ceph cluster:
The cluster consists of 7 Nodes with 10 OSDs each.
There are 4 pools on it: 3 of them are replicated pools with 3/2
size/min_size and one is an erasure coded pool with m=2 and k=5.
The following global flags were set:
* noout
* norebalance
* nobackfill
* norecover
Then, after those flags were set, all OSDs were stopped via the command
ceph osd stop, which seems to have caused the issue.
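For completeness, the sequence was roughly the following (the OSD id list is
abbreviated here):

  ceph osd set noout
  ceph osd set norebalance
  ceph osd set nobackfill
  ceph osd set norecover
  ceph osd stop 0 1 2 ...    # this is the step that appears to have caused the problem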
After maintenance was done, all OSDs were started again via systemctl.
Only about half of the 70 OSDs in total came up at first; the other half
started, but got killed after a few seconds with the following log
messages:
ceph-osd[197270]: 2023-01-24T13:39:12.103+0100 7ff3fcf8d700 -1 osd.51
12161 map says i am stopped by admin. shutting down.
ceph-osd[197270]: 2023-01-24T13:39:12.103+0100 7ff40da55700 -1 received
signal: Interrupt from Kernel ( Could be generated by pthread_kill(),
raise(), abort(), alarm() ) UID: 0
ceph-osd[197270]: 2023-01-24T13:39:12.103+0100 7ff40da55700 -1 osd.51
12161 *** Got signal Interrupt ***
ceph-osd[197270]: 2023-01-24T13:39:12.103+0100 7ff40da55700 -1 osd.51
12161 *** Immediate shutdown (osd_fast_shutdown=true) ***
And indeed, when looking into the osd map via ceph osd dump, the
remaining OSDs seem to be marked as stopped:
osd.50 down out weight 0 up_from 9213 up_thru 9416 down_at 9760
last_clean_interval [9106,9207)
[v2:10.0.1.61:6813/6211,v1:10.0.1.61:6818/6211]
[v2:10.0.0.61:6814/6211,v1:10.0.0.61:6816/6211] exists,stop
9a2590c4-f50b-4550-bfd1-5aafb543cb59
We were able to restore some of the remaining OSDs by running
ceph osd out XX
ceph osd in XX
and then starting the service again (via systemctl start). This did work
for most OSDs, except for the OSDs that are located on one specific
host. Some OSDs required several restarts until they did not kill
themselves a few seconds after starting.
This whole issue seems to be caused by the OSDs being marked as stopped
in the OSD map [1]. Apparently this state should get reset when
re-starting the OSD again [2], but for some reason this doesn't happen
for some of the OSDs. This behavior seems to have been introduced via
the following pull request [3]. We have also found the following commit
where the logic regarding stop seemed to have been introduced [4].
We were looking into commands that reset the stopped status of the OSD
in the OSD map, but did not find any way of forcing this.
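For reference, this is how we are checking which OSDs are still flagged as
stopped, simply by filtering the ceph osd dump output (the JSON field names
are as we see them in Quincy, and jq is assumed to be available):

  ceph osd dump | grep ',stop'
  ceph osd dump --format json | jq '.osds[] | select(.state | index("stop")) | .osd'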
Since we are out of ideas on how to proceed with the remaining 10 OSDs
that cannot be brought up: how does one recover from this situation? It
seems that by running ceph osd stop the cluster got into a state that
is irrecoverable with the normal CLI commands available. We even
looked into the possibility of manually manipulating the osdmap via the
osdmaptool, but there doesn't seem to be a way to edit the start/stopped
status and it also seems like a very invasive procedure. There does not
seem to be any way we can see of recovering from this, apart from
rebuilding all the OSDs - which we refrained from for now.
Kind Regards
Hanreich Stefan
[1]
https://github.com/ceph/ceph/blob/63a77b2c5b683cb241f865daec92c046152175b4/…
[2]
https://github.com/ceph/ceph/blob/63a77b2c5b683cb241f865daec92c046152175b4/…
[3] https://github.com/ceph/ceph/pull/43664
[4]
https://github.com/ceph/ceph/commit/5dbae13ce0f5b0104ab43e0ccfe94f832d0e1268
Dear Ceph-Users,
I am struggling to replace a disk. My Ceph cluster is not replacing the old OSD even though I did:
ceph orch osd rm 232 --replace
OSD 232 is still shown in the OSD list, but the new HDD gets deployed as a new OSD. This wouldn't bother me much if the new OSD were also placed on the BlueStore DB device (NVMe), but it isn't.
My steps:
"ceph orch osd rm 232 --replace"
Remove the failed HDD.
Add the new one.
Convert the disk within the server's BIOS so that the node has direct access to it.
It shows up as /dev/sdt.
Enter maintenance mode.
Reboot the server.
The drive is now /dev/sdm (the name the old drive had).
"ceph orch device zap node-x /dev/sdm"
A new OSD is placed on the cluster.
Can you give me a hint as to where I took a wrong turn? Why is the disk not being used as OSD 232?
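For comparison, this is the replacement flow I understood from the docs (the
host name and device path are from my setup; whether the service spec then
re-creates the OSD with the old id automatically is my assumption):

  ceph orch osd rm 232 --replace            # keeps the id and marks osd.232 destroyed
  ceph orch osd rm status                   # wait here until the drain/removal has finished
  # physically swap the disk, then wipe the new device:
  ceph orch device zap node-x /dev/sdm --force
  # expectation: the OSD service spec (or a manual add) re-creates osd.232 on it:
  ceph orch daemon add osd node-x:/dev/sdm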
Best
Ken
P.S. Sorry for double sending this message, somehow this mail-address was not subscribed to the list anymore.
Hi,
Is the creation of RBD volumes and RGW buckets audited? If yes, what do
the audit logs look like? Is there any documentation about it? I tried to
find the related entries in the "/var/log/ceph/ceph.audit.log" file
but didn't find any.
Thanks,
Jinhao
Are there alternatives to TheJJ balancer? I have a (temporary) rebalance
problem, and that code chokes[1].
Essentially, I have a few PGs in remapped+backfill_toofull, but plenty of
space in the parent's parent bucket(s).
[1] https://github.com/TheJJ/ceph-balancer/issues/23
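For context, I am checking the utilization with the standard commands, nothing
exotic:

  ceph health detail      # lists the PGs stuck in backfill_toofull
  ceph osd df tree        # per-OSD and per-bucket utilization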
On Wed, Dec 14, 2022 at 6:55 AM Denis Polom <denispolom(a)gmail.com> wrote:
> Hi,
>
> looks like TheJJ balancer solved the issue!
>
> Thx!
>
>
> On 11/9/22 13:35, Denis Polom wrote:
> > Hi Stefan,
> >
> > thank you for the help. Looks very interesting, and the command you sent
> > helps to get better insight into that. Still wondering why some OSDs are
> > primary for more PGs than others. I was thinking that the balancer and
> > CRUSH should take care of that.
> >
> > I will try the balancer you sent a link for and will post the result. But
> > this will take more time, as first I have to test it on some non-production
> > Ceph.
> >
> > Thx!
> >
> >
> > On 11/9/22 08:20, Stefan Kooman wrote:
> >> On 11/1/22 13:45, Denis Polom wrote:
> >>> Hi
> >>>
> >>> I observed on my Ceph cluster running latest Pacific that same size
> >>> OSDs are utilized differently even if balancer is running and
> >>> reports status as perfectly balanced.
> >>>
> >>
> >> That might be true because the primary PGs are not evenly balanced.
> >> You can check that with: ceph pg dump. The last part of the output is
> >> an overview of how many PGs each OSD is primary for. To get more detail
> >> per pool you can run this (source: unknown, but it works :-)):
> >>
> >> "ceph pg dump | awk '
> >> BEGIN { IGNORECASE = 1 }
> >> /^PG_STAT/ { col=1; while($col!="UP") {col++}; col++ }
> >> /^[0-9a-f]+\.[0-9a-f]+/ { match($0,/^[0-9a-f]+/); pool=substr($0,
> >> RSTART, RLENGTH); poollist[pool]=0;
> >> up=$col; i=0; RSTART=0; RLENGTH=0; delete osds;
> >> while(match(up,/[0-9]+/)>0) { osds[++i]=substr(up,RSTART,RLENGTH); up
> >> = substr(up, RSTART+RLENGTH) }
> >> for(i in osds) {array[osds[i],pool]++; osdlist[osds[i]];}
> >> }
> >> END {
> >> printf("\n");
> >> printf("pool :\t"); for (i in poollist) printf("%s\t",i); printf("|
> >> SUM \n");
> >> for (i in poollist) printf("--------"); printf("----------------\n");
> >> for (i in osdlist) { printf("osd.%i\t", i); sum=0;
> >> for (j in poollist) { printf("%i\t", array[i,j]); sum+=array[i,j];
> >> sumpool[j]+=array[i,j] }; printf("| %i\n",sum) }
> >> for (i in poollist) printf("--------"); printf("----------------\n");
> >> printf("SUM :\t"); for (i in poollist) printf("%s\t",sumpool[i]);
> >> printf("|\n");
> >> }'"
> >>
> >> On 11/15/2022 at 14:35 UTC there is a talk about this: New workload
> >> balancer in Ceph (Ceph Virtual 2022).
> >>
> >> The balancer made by Jonas Jelten works very well for us (though it does
> >> not balance primary PGs): https://github.com/TheJJ/ceph-balancer. It
> >> outperforms the ceph balancer module by far and converges faster. This is
> >> true up to and including the Octopus release.
> >>
> >> Gr. Stefan
> _______________________________________________
> ceph-users mailing list -- ceph-users(a)ceph.io
> To unsubscribe send an email to ceph-users-leave(a)ceph.io
>
--
Jeremy Austin
jhaustin(a)gmail.com
Hello,
I'm currently investigating a downed ceph cluster that I cannot communicate with.
Setup is:
3 hosts, each with 12 disks (OSD/MON)
3 VMs with MON/MDS/MGR
The VMs are unavailable at the moment, and one of the hosts is online with OSD/MON running.
When I issue the command ceph -s, nothing happens, and after 5 minutes the following appears:
2023-01-26T12:40:08.111+0100 7f68b4b8b700 0 monclient(hunting): authenticate timed out after 300
What would be the best way of troubleshooting this?
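So far the only idea I have is to query the surviving mon directly over its
admin socket (the mon name below is a placeholder for the actual daemon name;
under cephadm this has to be run inside the mon container):

  ceph daemon mon.<name> mon_status      # local admin socket, works without quorum

and to retry ceph -s with a shorter --connect-timeout while checking network
connectivity between the clients and the mon addresses.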
Venlig hilsen - Mit freundlichen Grüßen - Kind Regards,
Jens Galsgaard
Hi everyone,
Ceph Days are coming to Southern California, co-located with our friends at
SCALE - the Southern California Linux Expo! The event will be a full day of
Ceph content on March 9.
The CFP ends on *2023-02-02*, so start drafting and completing your
proposals quickly.
https://survey.zohopublic.com/zs/3UBUpC
https://ceph.io/en/community/events/2023/ceph-days-socal/
Here are some suggested topics:
- Ceph operations, management, and development
- New and proposed Ceph features, development status
- Ceph development roadmap
- Best practices
- Ceph use-cases, solution architectures, and user experiences
- Ceph performance and optimization
- Platform Integrations
- Kubernetes, OpenShift
- OpenStack (Cinder, Manila, etc.)
- Spark
- Multi-site and multi-cluster data services
- Persistent memory, ZNS SSDs, SMR HDDs, DPUs, and other new hardware
technologies
- Storage management, monitoring, and deployment automation
- Experiences deploying and operating Ceph in production and/or at scale
- Small-scale or edge deployments
- Long-term, archival storage
- Data compression, deduplication, and storage optimization
- Developer processes, tools, challenges
- Ceph testing infrastructure, tools
- Ceph community issues, outreach, and project governance
- Ceph documentation, training, and learner experience
--
Mike Perez
There are other Ceph speaking opportunities to consider:
- Ceph Days NYC <https://ceph.io/en/community/events/2023/ceph-days-nyc/> -
February 21st, 2023 - Schedule and registration are available
- Ceph Days Southern California
<https://ceph.io/en/community/events/2023/ceph-days-socal/> - March 9th,
2023 - CFP open until *February 2nd*.
- Cephalocon 2023 <https://events.linuxfoundation.org/cephalocon/>
(co-located
with KubeCon in Amsterdam) - April 16 - 18 - CFP now available!
- Ceph Tech Talk (virtual) <https://ceph.io/en/community/tech-talks/> -
Monthly
Make sure to join our Announcement list or social media for further updates
on events
- Ceph Announcement list
<https://lists.ceph.io/postorius/lists/ceph-announce.ceph.io/>
- Twitter <https://twitter.com/ceph>
- LinkedIn <https://www.linkedin.com/company/ceph/>
- FaceBook <https://www.facebook.com/cephstorage/>