Hi all!
The cluster was installed before device classes were a thing, so in
preparation for installing some SSDs into a Ceph cluster with OSDs on 7 machines
I migrated all replicated pools to CRUSH rules with a device class set. That
caused lots of misplaced objects (probably because of changed IDs in the CRUSH
tree), but the cluster was HEALTH_OK the whole time.
I'm now testing the migration of a 4+2 erasure-coded pool to a rule that also
includes the device class. Here are the old and new CRUSH rules:
# Old
rule default.rgw.buckets.data {
    id 3
    type erasure
    step set_chooseleaf_tries 5
    step set_choose_tries 100
    step take default
    step chooseleaf indep 0 type host
    step emit
}
# New
rule default.rgw.buckets.data_hdd {
    id 4
    type erasure
    step set_chooseleaf_tries 5
    step set_choose_tries 100
    step take default class hdd
    step chooseleaf indep 0 type host
    step emit
}
After changing the rule with
ceph osd pool set default.rgw.buckets.data crush_rule default.rgw.buckets.data_hdd
I'm getting quite a few degraded placement groups, and I'm not sure whether I
still have a redundant setup during the migration:
[root@master1 tmp]# ceph version
ceph version 17.2.6 (d7ff0d10654d2280e08f1ab989c7cdf3064446a5) quincy (stable)
[root@master1 tmp]# ceph -s
  cluster:
    id:     3b46f93c-788a-11e9-bc8c-bcaec503b525
    health: HEALTH_WARN
            Degraded data redundancy: 4903/15474 objects degraded (31.685%), 12 pgs degraded

  services:
    mon: 5 daemons, quorum master1.dev,master2.dev,master3.dev,master4.dev,master5.dev (age 10m)
    mgr: master3.dev(active, since 8m), standbys: master5.dev
    mds: 1/1 daemons up, 1 standby
    osd: 14 osds: 14 up (since 8m), 14 in (since 9m)
    rgw: 4 daemons active (4 hosts, 1 zones)

  data:
    volumes: 1/1 healthy
    pools:   21 pools, 586 pgs
    objects: 2.70k objects, 365 MiB
    usage:   281 GiB used, 279 GiB / 560 GiB avail
    pgs:     4903/15474 objects degraded (31.685%)
             7054/15474 objects misplaced (45.586%)
             554 active+clean
             20  active+recovering
             12  active+recovery_wait+degraded

  io:
    recovery: 545 KiB/s, 18 objects/s
[root@master1 tmp]# ceph health detail
HEALTH_WARN Degraded data redundancy: 4903/15474 objects degraded (31.685%), 12 pgs degraded
[WRN] PG_DEGRADED: Degraded data redundancy: 4903/15474 objects degraded (31.685%), 12 pgs degraded
pg 19.0 is active+recovery_wait+degraded, acting [9,2,3,13,11,5]
pg 19.2 is active+recovery_wait+degraded, acting [2,6,0,11,7,12]
pg 19.5 is active+recovery_wait+degraded, acting [5,1,0,11,4,12]
pg 19.6 is active+recovery_wait+degraded, acting [1,6,3,8,7,11]
pg 19.7 is active+recovery_wait+degraded, acting [13,6,10,3,7,8]
pg 19.b is active+recovery_wait+degraded, acting [4,1,13,6,11,9]
pg 19.c is active+recovery_wait+degraded, acting [8,6,11,7,13,2]
pg 19.e is active+recovery_wait+degraded, acting [6,8,0,11,12,4]
pg 19.f is active+recovery_wait+degraded, acting [5,10,13,1,0,4]
pg 19.10 is active+recovery_wait+degraded, acting [3,11,5,9,7,1]
pg 19.1a is active+recovery_wait+degraded, acting [7,0,8,1,11,5]
pg 19.1f is active+recovery_wait+degraded, acting [7,5,8,3,1,11]
[root@master1 tmp]# ceph osd pool list detail | grep default.rgw.buckets.data
pool 19 'default.rgw.buckets.data' erasure profile default.rgw.buckets.data size 6 min_size 5 crush_rule 4 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode off last_change 174 flags hashpspool stripe_width 16384 pg_num_min 32 target_size_ratio 69.8 application rgw
The output of ceph health detail looks to me as if all six chunks of the
erasure-coded data are still there, as expected.
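One way I can think of to double-check (a rough sketch; it assumes jq is
installed and that the pg query output still has top-level "state", "up" and
"acting" fields in this release):

  # PGs of the pool that are degraded vs. only remapped/misplaced
  ceph pg ls-by-pool default.rgw.buckets.data degraded
  ceph pg ls-by-pool default.rgw.buckets.data remapped

  # for a single PG, compare the up and acting sets; a shard shown as
  # 2147483647 (NONE) in the acting set would mean a chunk is really missing
  ceph pg 19.0 query | jq '{state, up, acting}'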
But why are these placement groups shown as degraded? I'd expect only
misplaced object chunks in this scenario, just as with the replicated pools.
Is there actually any reduction in redundancy or not?
The cluster recovers after some time. I've also tested with more (256)
placement groups but the outcome is the same.
Thanks,
LF.
--
Lars Fenneberg, lf(a)elemental.net
There has been a lot of movement in my cluster: a broken node, a replacement,
rebalancing. Now I'm stuck in the upgrade to 18.2.0 (mgr and mon already
upgraded) and the cluster is in a "Global Recovery Event".
The health is OK.
I don't know how to search for the problem.
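Is something like this the right way to dig further? I'm not sure these are
even the right commands:

  # the mgr progress module is what reports the "Global Recovery Event"
  ceph progress
  # detailed health and any PGs that are not active+clean
  ceph health detail
  ceph pg ls | grep -v 'active+clean'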
Hello Everyone,
Recently, an issue related to an inconsistency in the output of
the "ceph config dump" command was reported. The inconsistency
is between the normal (non-pretty-print) and pretty-print
outputs. The non-pretty-print output displays the localized
option name, whereas the pretty-print output displays the
normalized option name. For example:
Normalized: mgr/dashboard/ssl_server_port
Localized: mgr/dashboard/x/ssl_server_port
The fix ensures that the localized option name is shown in all
cases. The issue is tracked in https://tracker.ceph.com/issues/62379
and the fix is not yet merged.
This is to give you a heads-up in case you have any kind of automation
that relies on the pretty-printed output (json, xml). The fix will soon
be made available in the upstream and downstream branches.
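If you want to check whether your automation is affected, a quick comparison
along these lines should show the difference (assuming jq is available and
that the JSON output is a flat array of entries with a "name" field):

  # plain output: currently shows the localized name, e.g. mgr/dashboard/x/ssl_server_port
  ceph config dump | grep ssl_server_port
  # pretty-printed output: currently shows the normalized name, e.g. mgr/dashboard/ssl_server_port
  ceph config dump -f json | jq -r '.[].name' | grep ssl_server_port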
If you have any concerns around this change, please let us know.
Thanks,
-Sridhar
I'd like to try reef, but we are on debian 11 (bullseye).
In the ceph repos, there is debian-quincy/bullseye and
debian-quincy/focal, but under reef there is only focal & jammy.
Is there a reason why there is no reef/bullseye build? I had thought
that the blocker only affected debian-bookworm builds.
Thanks, Chris
Hi, guys:
I'm using Ceph 14 on HDDs and have observed noticeably high latency for
pg.lock(). Further inspection suggests the root cause is the
function pgbackend->objects_read_sync() called in
PrimaryLogPG::do_read(), which holds the PG lock until the disk
read finishes.
My question is: why not use aio for reads, like we do for
writes in BlueStore? Is there any known problem with aio reads in the OSD?
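For reference, the latency shows up in the per-OSD counters and historic ops,
roughly like this (jq assumed; counter and field names may differ slightly on
Nautilus):

  # read and overall op latency counters for one OSD
  ceph daemon osd.0 perf dump | jq '.osd | {op_r_latency, op_latency}'
  # the slowest recent ops and how long each took
  ceph daemon osd.0 dump_historic_ops | jq '.ops[] | {description, duration}'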
Thanks in advance,
Xinying Song
Dear Ceph users,
I see that the OSD page of the Ceph dashboard offers three possibilities
for "removing" an OSD: delete, destroy and purge. The delete operation
additionally offers the "Preserve OSD ID(s) for replacement." option.
I searched for explanations of the differences between the three
commands but didn't find anything definitive, so I'd need some help
with this.
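My current (possibly wrong) understanding is that they roughly map to the
following CLI commands (OSD id 5 is just an example), but I'd like to confirm:

  # "destroy": keep the OSD id and its CRUSH entry (marked destroyed) so a
  # replacement disk can reuse the same id; the cephx key is removed
  ceph osd destroy 5 --yes-i-really-mean-it

  # "purge": remove the OSD completely - CRUSH entry, cephx key and the id itself
  ceph osd purge 5 --yes-i-really-mean-it

  # "delete" seems to go through the orchestrator; with "Preserve OSD ID(s)
  # for replacement." checked it presumably corresponds to --replace
  ceph orch osd rm 5 --replace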
Thanks in advance,
Nicola
We're very happy to announce the first stable release of the Reef series.
We express our gratitude to all members of the Ceph community who
contributed by proposing pull requests, testing this release,
providing feedback, and offering valuable suggestions.
Major Changes from Quincy:
- RADOS: RocksDB has been upgraded to version 7.9.2.
- RADOS: There have been significant improvements to RocksDB iteration
overhead and performance.
- RADOS: The perf dump and perf schema commands have been deprecated
in favor of the new counter dump and counter schema commands (see the
short example after this list).
- RADOS: Cache tiering is now deprecated.
- RADOS: A new feature, the "read balancer", is now available, which
allows users to balance primary PGs per pool on their clusters.
- RGW: Bucket resharding is now supported for multi-site configurations.
- RGW: There have been significant improvements to the stability and
consistency of multi-site replication.
- RGW: Compression is now supported for objects uploaded with
Server-Side Encryption.
- Dashboard: There is a new Dashboard page with improved layout.
Active alerts and some important charts are now displayed inside
cards.
- RBD: Support for layered client-side encryption has been added.
- Telemetry: Users can now opt in to participate in a leaderboard in
the telemetry public dashboards.
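For anyone scripting against the deprecated perf commands, the replacements are
exposed through the daemon admin socket interface, along these lines (osd.0 is
just an example daemon, and the output format may differ):

  # deprecated
  ceph daemon osd.0 perf dump
  ceph daemon osd.0 perf schema
  # new labeled-counter equivalents
  ceph daemon osd.0 counter dump
  ceph daemon osd.0 counter schema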
We encourage you to read the full release notes at
https://ceph.io/en/news/blog/2023/v18-2-0-reef-released/
Getting Ceph
------------
* Git at git://github.com/ceph/ceph.git
* Tarball at https://download.ceph.com/tarballs/ceph-18.2.0.tar.gz
* Containers at https://quay.io/repository/ceph/ceph
* For packages, see https://docs.ceph.com/docs/master/install/get-packages/
* Release git sha1: 5dd24139a1eada541a3bc16b6941c5dde975e26d
Did you know? Every Ceph release is built and tested on resources
funded directly by the non-profit Ceph Foundation.
If you would like to support this and our other efforts, please
consider joining now https://ceph.io/en/foundation/.
Hello all,
"radosgw-admin sync error list" returns errors from 2022. I want to
clear those out.
I tried "radosgw-admin sync error trim" but it seems to do nothing.
The man page seems to offer no suggestions
https://docs.ceph.com/en/quincy/man/8/radosgw-admin/
Any ideas what I need to do to remove old errors? (or at least I want
to see more recent errors)
ceph version 17.2.6 (quincy)
Thanks.
Hi - I have a 4-node cluster and started to have some odd access issues with my file system "Home".
When I started investigating, I saw the message "1 MDSs behind on trimming", but I also noticed that I seem to have 2 MDSs running on each server - 3 daemons up, with 5 standby. Is this expected behavior after the upgrade to 18.2, or did something go wrong?
[root@cube ~]# ceph status
  cluster:
    id:     fe3a7cb0-69ca-11eb-8d45-c86000d08867
    health: HEALTH_WARN
            1 filesystem is degraded
            1 MDSs behind on trimming

  services:
    mon: 3 daemons, quorum rhel1,cube,hiho (age 23m)
    mgr: hiho.bphqff(active, since 23m), standbys: rhel1.owrvaz, cube.sdhftu
    mds: 3/3 daemons up, 5 standby
    osd: 16 osds: 16 up (since 23m), 16 in (since 26h)
    rgw: 4 daemons active (4 hosts, 1 zones)

  data:
    volumes: 0/1 healthy, 1 recovering
    pools:   12 pools, 769 pgs
    objects: 3.64M objects, 3.1 TiB
    usage:   17 TiB used, 49 TiB / 65 TiB avail
    pgs:     765 active+clean
             4   active+clean+scrubbing+deep

  io:
    client: 154 MiB/s rd, 38 op/s rd, 0 op/s wr
[root@cube ~]# ceph health detail
HEALTH_WARN 1 filesystem is degraded; 1 MDSs behind on trimming
[WRN] FS_DEGRADED: 1 filesystem is degraded
fs home is degraded
[WRN] MDS_TRIM: 1 MDSs behind on trimming
mds.home.story.sodtjs(mds.0): Behind on trimming (5546/128) max_segments: 128, num_segments: 5546
[root@cube ~]# ceph fs status home
home - 10 clients
====
RANK STATE MDS ACTIVITY DNS INOS DIRS CAPS
0 replay home.story.sodtjs 802k 766k 36.7k 0
1 resolve home.cube.xljmfz 735k 680k 39.0k 0
2 resolve home.rhel1.nwpmbg 322k 316k 17.5k 0
POOL TYPE USED AVAIL
home.meta metadata 361G 14.9T
home.data data 9206G 14.9T
STANDBY MDS
home.rhel1.ffrufi
home.hiho.mssdyh
home.cube.kmpbku
home.hiho.cfuswn
home.story.gmieio
MDS version: ceph version 18.2.0 (5dd24139a1eada541a3bc16b6941c5dde975e26d) reef (stable)
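For reference, these are the things I was planning to look at next - are they
the right places to check? (assuming a cephadm deployment)

  # how many active MDS ranks the filesystem is supposed to have
  ceph fs get home | grep max_mds
  # which mds daemons the orchestrator has actually deployed, and where
  ceph orch ps --daemon_type mds
  # the journal trimming limit the warning is compared against
  ceph config get mds mds_log_max_segments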