On Mon, May 3, 2021 at 12:00 PM Magnus Harlander <magnus(a)harlan.de> wrote:
> On 03.05.21 at 11:22, Ilya Dryomov wrote:
> max_osd 12
> I never had more than 10 OSDs on the two OSD nodes of this cluster.
> I was running a 3-OSD-node cluster earlier with more than 10
> OSDs, but the current cluster was set up from scratch and
> I definitely don't remember ever having more than 10 OSDs!
> Very strange!
> I had to replace 2 disks because of DOA problems, but for that
> I removed 2 OSDs and created new ones.
> I used ceph-deploy to create the new OSDs.
> To delete osd.8 I used:
> # take it out
> ceph osd out 8
> # wait for rebalancing to finish
> systemctl stop ceph-osd@8
> # wait for a healthy cluster
> ceph osd purge 8 --yes-i-really-mean-it
> # edit ceph.conf and remove osd.8
> ceph-deploy --overwrite-conf admin s0 s1
> # Add the new disk and:
> ceph-deploy osd create --data /dev/sdc s0
> It gets created with the next free OSD number (8), because purge releases 8 for reuse.
It would be nice to track it down, but for the immediate issue of
kernel 5.11 not working, "ceph osd setmaxosd 10" should fix it.
On Mon, May 3, 2021 at 9:20 AM Magnus Harlander <magnus(a)harlan.de> wrote:
> On 03.05.21 at 00:44, Ilya Dryomov wrote:
> On Sun, May 2, 2021 at 11:15 PM Magnus Harlander <magnus(a)harlan.de> wrote:
> I know there is a thread about problems with mounting cephfs with 5.11 kernels.
> Hi Magnus,
> What is the output of "ceph config dump"?
> Instead of providing those lines, can you run "ceph osd getmap 64281 -o
> osdmap.64281" and attach osdmap.64281 file?
> Hi Ilya,
> [root@s1 ~]# ceph config dump
> WHO MASK LEVEL OPTION VALUE RO
> global basic device_failure_prediction_mode local
> global advanced ms_bind_ipv4 false
> mon advanced auth_allow_insecure_global_id_reclaim false
> mon advanced mon_lease 8.000000
> mgr advanced mgr/devicehealth/enable_monitoring true
> getmap output is attached,
I see the problem, but I don't understand the root cause yet. It is
related to the two missing OSDs:
> May 02 22:54:05 islay kernel: libceph: no match of type 1 in addrvec
> May 02 22:54:05 islay kernel: libceph: corrupt full osdmap (-2) epoch 64281 off 3154 (00000000a90fe1d7 of 000000000083f4bd-00000000c03bdc9b)
> max_osd 12
> osd.0 up in ... [v2:192.168.200.141:6804/3027,v1:192.168.200.141:6805/3027] ... exists,up 631bc170-45fd-4948-9a5e-4c278569c0bc
> osd.1 up in ... [v2:192.168.200.140:6811/3066,v1:192.168.200.140:6813/3066] ... exists,up 660a762c-001d-4160-a9ee-d0acd078e776
> osd.2 up in ... [v2:192.168.200.141:6815/3008,v1:192.168.200.141:6816/3008] ... exists,up e4d94d3a-ec58-46a1-b61c-c47dd39012ed
> osd.3 up in ... [v2:192.168.200.140:6800/3067,v1:192.168.200.140:6801/3067] ... exists,up 26d25060-fd99-4d15-a1b2-ebb77646671e
> osd.4 up in ... [v2:192.168.200.140:6804/3049,v1:192.168.200.140:6806/3049] ... exists,up 238f197d-ecbc-4588-8a99-6a63c9bb1a17
> osd.5 up in ... [v2:192.168.200.140:6816/3073,v1:192.168.200.140:6817/3073] ... exists,up a9dcb26f-0f1c-4067-a26b-a29939285e0b
> osd.6 up in ... [v2:192.168.200.141:6808/3020,v1:192.168.200.141:6809/3020] ... exists,up f399b47d-063f-4b2f-bd93-289377dc9945
> osd.7 up in ... [v2:192.168.200.141:6800/3023,v1:192.168.200.141:6801/3023] ... exists,up 3557ceca-7bd8-401e-abd3-59bee168e8f6
> osd.8 up in ... [v2:192.168.200.141:6812/3017,v1:192.168.200.141:6813/3017] ... exists,up 7f9cad3f-163d-4bb7-85b2-fffd46982fff
> osd.9 up in ... [v2:192.168.200.140:6805/3053,v1:192.168.200.140:6807/3053] ... exists,up c543b12a-f9bf-4b83-af16-f6b8a3926e69
The kernel client is failing to parse addrvec entries for non-existent
osd10 and osd11. It is probably being too stringent, but before fixing
it I'd like to understand what happened to those OSDs. It looks like
they were removed but not completely.
What led to their removal? What commands were used?
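One way to confirm the mismatch described above is to compare max_osd with the OSD ids that actually appear in the map. The following is a minimal sketch, assuming the text form of the map as quoted in this thread (a "max_osd N" line followed by "osd.N ..." lines). The function name missing_osd_ids is made up for illustration and is not part of any Ceph tool.

```python
import re


def missing_osd_ids(osd_dump_text):
    """Return ids below max_osd that have no 'osd.N' line in the dump text."""
    max_osd = None
    present = set()
    for raw in osd_dump_text.splitlines():
        # Tolerate mail-quote prefixes such as "> " in front of each line.
        line = raw.lstrip("> ").strip()
        m = re.match(r"max_osd\s+(\d+)$", line)
        if m:
            max_osd = int(m.group(1))
            continue
        m = re.match(r"osd\.(\d+)\s", line)
        if m:
            present.add(int(m.group(1)))
    if max_osd is None:
        raise ValueError("no max_osd line found")
    return [i for i in range(max_osd) if i not in present]


# Stand-in for the dump quoted above: max_osd 12, but only osd.0 to osd.9 exist.
sample = "max_osd 12\n" + "".join(
    "> osd.%d up in ... exists,up\n" % i for i in range(10)
)
print(missing_osd_ids(sample))  # [10, 11]
```

An empty list would mean max_osd matches the OSDs that exist; here the two trailing ids are the ones the kernel client trips over.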
I have a lot of multipart uploads that look like they never finished. Some
of them date back to 2019.
Is there a way to clean them up when they didn't finish in 28 days?
I know I can implement a LC policy per bucket, but how do I implement it
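For reference, the per-bucket rule mentioned above can be sketched as follows. This is a minimal sketch, assuming a boto3 S3 client pointed at the radosgw endpoint and assuming the RGW version in use honours the S3 AbortIncompleteMultipartUpload lifecycle action; the rule id and the helper names are made up for illustration.

```python
def abort_incomplete_uploads_rule(days=28, rule_id="abort-incomplete-mpu"):
    """Build an S3 lifecycle configuration that aborts stale multipart uploads."""
    return {
        "Rules": [
            {
                "ID": rule_id,  # hypothetical rule name
                "Status": "Enabled",
                "Filter": {"Prefix": ""},  # empty prefix: applies to the whole bucket
                "AbortIncompleteMultipartUpload": {"DaysAfterInitiation": days},
            }
        ]
    }


def apply_rule(s3_client, bucket, days=28):
    """Apply the rule to one bucket; s3_client is expected to be a boto3 S3 client."""
    s3_client.put_bucket_lifecycle_configuration(
        Bucket=bucket,
        LifecycleConfiguration=abort_incomplete_uploads_rule(days),
    )
```

Note that put_bucket_lifecycle_configuration replaces the bucket's whole lifecycle configuration, so any existing rules would have to be merged into the "Rules" list first.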
I'm trying to understand what radosgw listens on, and where.
There is a lot of contradictory or redundant information about that.
First, the contradictory information about the socket.
At https://docs.ceph.com/en/pacific/radosgw/config-ref/ it says rgw_socket_path, but at https://docs.ceph.com/en/pacific/man/8/radosgw/ it says 'rgw socket path'.
That inconsistency is quite common in the Ceph documentation. Are both spellings accepted?
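As far as the Ceph configuration documentation describes it, spaces, dashes and underscores are interchangeable in option names inside a configuration file, so both spellings would name the same option. The snippet below is only an illustration of that equivalence, not Ceph's actual parser; normalize_option_name is a made-up helper.

```python
def normalize_option_name(name):
    """Map 'rgw socket path' or 'rgw-socket-path' to 'rgw_socket_path'."""
    # Treat dashes and underscores like spaces, then rejoin the words.
    words = name.strip().replace("-", " ").replace("_", " ").split()
    return "_".join(words)


spellings = ["rgw_socket_path", "rgw socket path", "rgw-socket-path"]
print({normalize_option_name(s) for s in spellings})  # {'rgw_socket_path'}
```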
Next, about naming and the bind IP: where is it defined, and how?
rgw_frontends = "beast ssl_endpoint=0.0.0.0:443 port=443 ..."
That's a lot of redundant or contradictory information. What is the purpose of each one? What is the difference between
rgw_frontends = ".. port = ..."
Or rgw_host and rgw_dns_name: what is the difference?
The documentation provides no help at all:
Description: The DNS name of the served domain. See also the hostnames setting within regions.
The description says nothing new; it just repeats the field name.
Is one of them used by the manager for communication? I already had a problem with the entry in the certificate used by the frontend: it used an IP that came from nowhere.
If fcgi is used, how does the manager find the endpoint?
I recently added 2 OSD nodes to my Nautilus cluster, increasing the OSD
count from 32 to 48 - all 12TB HDDs with NVMe for db.
I generally keep an ssh session open where I can run 'watch ceph -s'. My
observations are mostly based on what I saw from watching this.
Even with 10GB networking, rebalancing 529 pgs took 10 days, during which
there were always a few PGs undersized+degraded, frequent flashes of slow
ops, occasional OSD restarts, and the scrub and deep-scrub backlog steadily
increased. When the backfills completed I had 24 missed deep-scrubs and 10
I suspect that this is because of some settings that I had fiddled with, so
this post may be an advertisement for what not to do to your cluster.
However, I'd like to know if my understanding is accurate. I believe that
my settings resulted
In short, I think I had my config set up so there was contention due to too
many processes trying to do things to some OSDs all at once:
- osd_scrub_during_recovery: I think I had this set to true for the
  first 9 days, but set it to false when I started to realize that it might
  be causing contention.
- osd_max_scrubs: I had this set high - global:30 osd:10. At some
  earlier time when I had a scrub backlog I thought that these were counts
  for simultaneous scrubs across all OSDs rather than 'per OSD'.
  - Now I see why the default is 1.
  - Assumption: on an HDD, multiple competing scrubs cause excessive
    seeking and thus compound the impact on scrub progress.
- osd_max_backfills: I had bumped this up as well - global:30 osd:10,
  thinking it would speed up the rebalancing of my PGs onto my new OSDs.
  - Now, the same thinking as for osd_max_scrubs: compounding
    contention, further compounded by the scrub activity that should have been
    inhibited by osd_scrub_during_recovery:false.
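To put rough numbers on the contention described in the list above, here is a back-of-the-envelope sketch. It assumes that the more specific osd section overrides global for OSD daemons, and it treats scrubs plus backfills per OSD as a simple sum, which is a simplification (it ignores, for example, how backfill reservations are counted for incoming and outgoing traffic). The helper names are made up.

```python
def effective(setting_by_section, daemon_type="osd"):
    """Pick the value a daemon would use: its own section wins over global."""
    if daemon_type in setting_by_section:
        return setting_by_section[daemon_type]
    return setting_by_section["global"]


def per_osd_load(max_scrubs, max_backfills):
    """Rough upper bound on scrub plus backfill operations competing on one OSD."""
    return max_scrubs + max_backfills


# Values from the post: global:30 osd:10 for both settings.
tuned = per_osd_load(
    effective({"global": 30, "osd": 10}),
    effective({"global": 30, "osd": 10}),
)
# Default of 1 for each setting.
default = per_osd_load(1, 1)
print(tuned, default)  # 20 2
```

Under these assumptions a single HDD could be asked to serve up to ten times as many competing operations as with the defaults, which is consistent with the seek contention suspected above.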
I believe that all of this also resulted in my EC pgs (8 + 2) becoming
degraded. My assumption here is that collisions between deep-scrubs and
backfills sometimes locked the backfill process out of a piece of an EC PG,
causing backfill to rebuild instead of copy.
The good news is that I haven't lost any data and, other than the scrub
backlog, things seem to be working smoothly. It seems like with 1 or 2
scrubs (deep or regular) running they are taking about 2 hours per scrub.
As the scrubs progress, more scrub deadlines are missed, so it's not a
steady march to zero.
Please feel free to comment. I'd be glad to know if I'm on the right track
as we expect the cluster to double in size over the next 12 to 18 months.
I'm trying to set up a new ceph cluster, and I've hit a bit of a blank.
I started off with centos7 and cephadm. Worked fine to a point, except I
had to upgrade podman but it mostly worked with octopus.
Since this is a fresh cluster and hence no data at risk, I decided to jump
straight into Pacific when it came out and upgrade. Which is where my
trouble began. Mostly because Pacific needs a version of lvm newer than
what's in centos7.
I can't upgrade to centos8 as my boot drives are not supported by centos8
due to the way Red Hat disabled lots of disk drivers. I think I'm looking at
Ubuntu or debian.
Given cephadm has a very limited set of dependencies, it would be good to have a
support matrix. It would also be good to have a check in cephadm on
upgrade that refuses to proceed if the version of lvm2 is too low on
any host, and lets the admin fix the issue and try again.
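The kind of pre-upgrade check suggested above could look roughly like this. It is a sketch only and not something cephadm provides: it assumes the output of "lvm version" has been collected from each host and contains a line of the form "LVM version: 2.02.187(2)-RHEL7 (...)". The minimum version is a placeholder, since the exact lvm2 version Pacific requires is not stated here.

```python
import re


def parse_lvm_version(lvm_version_output):
    """Extract (major, minor, patch) from the 'LVM version:' line."""
    m = re.search(r"LVM version:\s*(\d+)\.(\d+)\.(\d+)", lvm_version_output)
    if not m:
        raise ValueError("no 'LVM version:' line found")
    return tuple(int(part) for part in m.groups())


def hosts_blocking_upgrade(versions_by_host, minimum):
    """Return the hosts whose lvm2 is older than the required minimum."""
    return sorted(
        host
        for host, output in versions_by_host.items()
        if parse_lvm_version(output) < minimum
    )


# Hypothetical inventory; host names and the threshold are placeholders.
collected = {
    "host-a": "  LVM version:     2.02.187(2)-RHEL7 (2020-03-24)\n",
    "host-b": "  LVM version:     2.03.11(2) (2021-01-08)\n",
}
HYPOTHETICAL_MINIMUM = (2, 3, 0)
print(hosts_blocking_upgrade(collected, HYPOTHETICAL_MINIMUM))  # ['host-a']
```

An upgrade wrapper could refuse to start while this list is non-empty and print the affected hosts, so the admin can fix them and retry.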
I was thinking of upgrading to centos8 for this project anyway, until I
realised that centos8 can't support the hardware I've inherited. But
currently I've got a broken cluster unless I can work out some way to
upgrade lvm in centos7.