Hi folks,
Originally our osd tree looked like this:
ID  CLASS WEIGHT     TYPE NAME         STATUS REWEIGHT PRI-AFF
 -1       2073.15186 root default
-14        176.63100     rack s01-rack
-19        176.63100         host s01
<snip osds>
-15        171.29900     rack s02-rack
-20        171.29900         host s02
<snip osds>
etc. You get the idea. It's a legacy layout, as we've been upgrading this
cluster since probably Firefly and started with far less hardware.
The crush rule was set up like this originally:
step take default
step chooseleaf firstn 0 type rack
which we have modified to
step take default
step chooseleaf firstn 0 type host
taking advantage of chooseleaf's behaviour (i.e. it descends the tree
below each chosen bucket rather than picking at a single level only).
Now we thought we could get rid of the rack buckets simply by moving the
host buckets to the root using "ceph osd crush move s01 root=default";
however, this resulted in a bunch of data movement.
Swapping the IDs manually in the crushmap seems to work (verified via
crushtool's --compare), e.g. changing the ID of s01 to s01-rack's and
vice versa, including all shadow trees.
Looking around I saw that there is a swap-bucket command, but that swaps
only the bucket contents, not the IDs, so it would result in data movement.
Other than manually editing the crushmap, is there a better way to
achieve this? Is the manual-edit approach optimal?
Cheers,
Zoltan
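For readers following along, here is a sketch of the manual-edit workflow Zoltan describes (file names are illustrative, and the edit itself is done by hand in the decompiled text map):

```shell
# Export and decompile the current CRUSH map:
ceph osd getcrushmap -o crushmap.bin
crushtool -d crushmap.bin -o crushmap.txt

# ... edit crushmap.txt by hand: swap the IDs of s01 and s01-rack
# (and likewise for the other host/rack pairs), including the
# corresponding IDs in the shadow (device-class) trees ...

# Recompile and verify that the resulting mappings are unchanged
# before injecting the new map:
crushtool -c crushmap.txt -o crushmap-new.bin
crushtool -i crushmap-new.bin --compare crushmap.bin

# Only once --compare reports no (or acceptable) mapping changes:
ceph osd setcrushmap -i crushmap-new.bin
```

The --compare step is the safety net here: it replays test mappings against both maps and reports how many inputs would move.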
Before I write something that's already been done, are there any built-in
utilities or tools that can tell me if it's safe to reboot a host? I'm
looking for something better than just checking the health status:
something that checks PG status and ensures that a reboot wouldn't take
any undersized PGs offline. Thanks.
-Brett
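One built-in check worth knowing about is `ceph osd ok-to-stop` (available since Luminous): it reports whether stopping the given OSDs would leave any PG unable to serve I/O. A sketch, with illustrative OSD ids:

```shell
# Would stopping OSDs 12, 13 and 14 (e.g. all OSDs on one host)
# leave every PG servable?
ceph osd ok-to-stop 12 13 14

# Related check: whether an OSD could be removed entirely
# without risking data loss:
ceph osd safe-to-destroy 12
```

If the command reports the OSDs are not OK to stop, waiting for recovery to finish and re-running it is the usual loop before a rolling reboot.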
Hi,
after adding an OSD to Ceph it is advisable to create a relevant entry
in the CRUSH map, using a weight that depends on the disk size.
Example:
ceph osd crush set osd.<id> <weight> root=default host=<hostname>
Question:
How is the weight defined depending on disk size?
Which algorithm can be used to calculate the weight?
From my first Ceph installation (Luminous), the CRUSH map entries were:
- HDD device with 1.80TB size (output of lsscsi -s): weight was 1.627229
- NVMe device with 3.20TB size (output of lsscsi -s): weight was 2.910889
THX
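For reference, the weight that Ceph's provisioning tools assign by default is simply the device size in TiB, i.e. bytes / 2^40. lsscsi prints rounded decimal sizes, which is why a disk sold as "1.80TB" can end up with a weight like 1.627229 (that weight corresponds to roughly 1.789 * 10^12 usable bytes). A sketch of the calculation:

```shell
# CRUSH weight = device size in TiB = bytes / 2^40.
# Example with a nominal 1.8 TB (decimal) disk:
awk 'BEGIN { printf "%.6f\n", 1800000000000 / 2^40 }'

# On a live host you would feed in the real byte count, e.g.:
#   blockdev --getsize64 /dev/sdX
```

So no special algorithm is needed: any positive weights work as long as they are proportional to capacity, and size-in-TiB is simply the convention.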
I observe the same issue after adding two new OSD hosts to an almost empty mimic cluster.
> Let's try to restrict discussion to the original thread
> "backfill_toofull while OSDs are not full" and get a tracker opened up
> for this issue.
Is this the issue you are referring to: https://tracker.ceph.com/issues/41255 ?
I have a number of larger rebalance operations ahead and will probably see this for a couple of days. If there is any information (logs etc.) I can provide, please let me know. Status right now is:
[root@ceph-01 ~]# ceph status
  cluster:
    id:     e4ece518-f2cb-4708-b00f-b6bf511e91d9
    health: HEALTH_ERR
            15227159/90990337 objects misplaced (16.735%)
            Degraded data redundancy (low space): 64 pgs backfill_toofull
            too few PGs per OSD (29 < min 30)

  services:
    mon: 3 daemons, quorum ceph-01,ceph-02,ceph-03
    mgr: ceph-01(active), standbys: ceph-03, ceph-02
    mds: con-fs-1/1/1 up {0=ceph-12=up:active}, 1 up:standby-replay
    osd: 208 osds: 208 up, 208 in; 273 remapped pgs

  data:
    pools:   7 pools, 790 pgs
    objects: 9.45 M objects, 17 TiB
    usage:   21 TiB used, 1.4 PiB / 1.4 PiB avail
    pgs:     15227159/90990337 objects misplaced (16.735%)
             517 active+clean
             190 active+remapped+backfill_wait
              64 active+remapped+backfill_wait+backfill_toofull
              19 active+remapped+backfilling

  io:
    client:   893 KiB/s rd, 6.3 MiB/s wr, 208 op/s rd, 306 op/s wr
    recovery: 298 MiB/s, 156 objects/s
Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
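For anyone hitting the same state, a sketch of read-only commands that gather the information usually requested on such a tracker issue (nothing here modifies the cluster):

```shell
# Which PGs are flagged backfill_toofull, and where they map:
ceph pg ls backfill_toofull

# Per-OSD utilization, to compare against the thresholds:
ceph osd df tree

# The configured nearfull/backfillfull/full thresholds:
ceph osd dump | grep ratio
```

If no OSD is actually above backfillfull_ratio while PGs are flagged toofull, that output is exactly what makes the report actionable.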
Hi,
I have created OSDs on HDD without putting the DB on a faster drive.
In order to improve performance I now have a single 3.8TB SSD.
I modified /etc/ceph/ceph.conf by adding this in [global]:
bluestore_block_db_size = 53687091200
This should create the RocksDB DB with a size of 50GB.
Then I tried to move DB to a new device (SSD) that is not formatted:
root@ld5505:~# ceph-bluestore-tool bluefs-bdev-new-db –-path
/var/lib/ceph/osd/ceph-76 --dev-target /dev/sdbk
too many positional options have been specified on the command line
Checking the content of /var/lib/ceph/osd/ceph-76, it appears that
there's no link to block.db:
root@ld5505:~# ls -l /var/lib/ceph/osd/ceph-76/
insgesamt 52
-rw-r--r-- 1 ceph ceph 418 Aug 27 11:08 activate.monmap
lrwxrwxrwx 1 ceph ceph 93 Aug 27 11:08 block -> /dev/ceph-8cd045dc-9eb2-47ad-9668-116cf425a66a/osd-block-9c51bde1-3c75-4767-8808-f7e7b58b8f97
-rw-r--r-- 1 ceph ceph 2 Aug 27 11:08 bluefs
-rw-r--r-- 1 ceph ceph 37 Aug 27 11:08 ceph_fsid
-rw-r--r-- 1 ceph ceph 37 Aug 27 11:08 fsid
-rw------- 1 ceph ceph 56 Aug 27 11:08 keyring
-rw-r--r-- 1 ceph ceph 8 Aug 27 11:08 kv_backend
-rw-r--r-- 1 ceph ceph 21 Aug 27 11:08 magic
-rw-r--r-- 1 ceph ceph 4 Aug 27 11:08 mkfs_done
-rw-r--r-- 1 ceph ceph 41 Aug 27 11:08 osd_key
-rw-r--r-- 1 ceph ceph 6 Aug 27 11:08 ready
-rw-r--r-- 1 ceph ceph 3 Aug 27 11:08 require_osd_release
-rw-r--r-- 1 ceph ceph 10 Aug 27 11:08 type
-rw-r--r-- 1 ceph ceph 3 Aug 27 11:08 whoami
root@ld5505:~# more /var/lib/ceph/osd/ceph-76/bluefs
1
Questions:
How can I add a DB device on this new SSD for every single existing OSD?
How can I increase the DB size later, in case it turns out to be insufficient?
THX
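The "too many positional options" error above is almost certainly caused by the first character of "–-path" being an en-dash (–) rather than two ASCII hyphens (--), so the argument is parsed as positional. A corrected sketch (device names illustrative; the OSD must be stopped first, and since one SSD is meant to serve several OSDs, each OSD would normally get its own partition or LV rather than the whole device):

```shell
# Stop the OSD before operating on its store:
systemctl stop ceph-osd@76

# Note: two ASCII hyphens before each option name.
ceph-bluestore-tool bluefs-bdev-new-db \
    --path /var/lib/ceph/osd/ceph-76 \
    --dev-target /dev/sdbk1    # one partition/LV per OSD (illustrative)

systemctl start ceph-osd@76
```

Repeating this per OSD, each with its own target partition/LV, is the pattern; growing a DB later generally means migrating it to a larger partition rather than resizing in place.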
Hi,
I am using ceph version 13.2.6 (mimic) on a test setup, trying out CephFS.
My ceph health status is showing a warning.
"ceph health"
HEALTH_WARN Degraded data redundancy: 1197023/7723191 objects degraded
(15.499%)
"ceph health detail"
HEALTH_WARN Degraded data redundancy: 1197128/7723191 objects degraded
(15.500%)
PG_DEGRADED Degraded data redundancy: 1197128/7723191 objects degraded
(15.500%)
pg 2.0 is stuck undersized for 1076.454929, current state active+undersized+
pg 2.2 is stuck undersized for 1076.456639, current state active+undersized+
pg 2.3 is stuck undersized for 1076.456113, current state active+undersized+
pg 2.7 is stuck undersized for 1076.456342, current state active+undersized+
pg 2.8 is stuck undersized for 1076.455920, current state active+undersized+
pg 2.a is stuck undersized for 1076.486412, current state active+undersized+
pg 2.b is stuck undersized for 1076.485975, current state active+undersized+
pg 2.f is stuck undersized for 1076.486953, current state active+undersized+
pg 2.10 is stuck undersized for 1076.486763, current state active+undersized
pg 2.12 is stuck undersized for 1076.486539, current state active+undersized
pg 2.13 is stuck undersized for 1075.419199, current state active+undersized
pg 2.17 is stuck undersized for 1076.455424, current state active+undersized
pg 2.18 is stuck undersized for 1075.419639, current state active+undersized
pg 2.1a is stuck undersized for 1076.455966, current state active+undersized
pg 2.1b is stuck undersized for 1076.486677, current state active+undersized
pg 2.1f is stuck undersized for 1076.455572, current state active+undersized
How do I bring the health status back to OK?
regards
Amudhan
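For what it's worth, "stuck undersized" right after setup usually means CRUSH cannot find enough failure domains for the pool's replica count (e.g. size=3 but fewer than 3 hosts while the rule separates replicas by host). A sketch of checks, with the pool/PG id taken from the output above:

```shell
# Replica count (size/min_size) and crush_rule per pool:
ceph osd pool ls detail

# How many hosts actually have OSDs up:
ceph osd tree

# Why a specific PG has too few acting replicas:
ceph pg 2.0 query
```

If the host count is the problem, either adding hosts or reducing the pool's size (or changing the rule's failure domain, for a test setup) clears the warning.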
We're mainly using CephFS with the CentOS/RHEL 7 kernel client, and I'm
pondering whether I should go for "bluestore compression mode" = passive
or aggressive with this client, to get compression on (preferably) only
compressible objects.
Is there any list of CephFS clients that send compressible hints?
If not, is there any other way to detect this, other than checking a
client by just writing some compressible and some incompressible data?
The documentation is rather sparse on this matter, as far as I can see.
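For context, the documented semantics are: passive compresses only data the client hints as COMPRESSIBLE, while aggressive compresses everything except data hinted INCOMPRESSIBLE. A sketch for experimenting per pool (pool name illustrative) and checking whether compression actually happens:

```shell
# Per-pool compression settings:
ceph osd pool set cephfs_data compression_mode aggressive
ceph osd pool set cephfs_data compression_algorithm snappy

# Rough effectiveness check from a running OSD's perf counters
# after writing some test data:
ceph daemon osd.0 perf dump | grep -i compress
```

Comparing the compressed counters under passive vs. aggressive for the same write workload is one practical way to tell whether a given client sends hints at all.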
I was a little bit afraid I would be deleting this snapshot without any
result. How do I fix this error (pg repair is not working)?
pg 17.36 is active+clean+inconsistent, acting [7,29,12]
2019-08-30 10:40:04.580470 7f9b3f061700 -1 log_channel(cluster) log
[ERR] : repair 17.36
17:6ca1f70a:::rbd_data.1f114174b0dc51.0000000000000974:head : expected
clone 17:6ca1f70a:::rbd_data.1f114174b0dc51.0000000000000974:4 1 missing
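Since the error mentions a missing clone, pulling the scrub inconsistency details is the natural first step before attempting anything destructive; a sketch, with the PG id from the message above:

```shell
# Object-level scrub findings for the inconsistent PG:
rados list-inconsistent-obj 17.36 --format=json-pretty

# Snapshot/clone-level findings, which is what a "missing clone"
# error points at:
rados list-inconsistent-snapset 17.36 --format=json-pretty
```

The JSON output identifies which replica disagrees and in what way, which determines whether repair can fix it or manual intervention is needed.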
Hi,
I've upgraded to Nautilus from Mimic a while ago and enabled the pg_autoscaler.
When pg_autoscaler was activated I got a HEALTH_WARN regarding:
POOL_TARGET_SIZE_BYTES_OVERCOMMITTED 1 subtrees have overcommitted pool target_size_bytes
Pools ['cephfs_data_reduced', 'cephfs_data', 'cephfs_metadata'] overcommit available storage by 1.460x due to target_size_bytes 0 on pools []
POOL_TARGET_SIZE_RATIO_OVERCOMMITTED 1 subtrees have overcommitted pool target_size_ratio
Pools ['cephfs_data_reduced', 'cephfs_data', 'cephfs_metadata'] overcommit available storage by 1.460x due to target_size_ratio 0.000 on pools []
Both target_size_bytes and target_size_ratio on all the pools are set to 0, so I started to wonder why this error message appears.
My autoscale-status looks like this:
POOL                 SIZE    TARGET SIZE  RATE  RAW CAPACITY  RATIO   TARGET RATIO  BIAS  PG_NUM  NEW PG_NUM  AUTOSCALE
cephfs_metadata      16708M               4.0   34465G        0.0019                1.0   8                   warn
cephfs_data_reduced  15506G               2.0   34465G        0.8998                1.0   375                 warn
cephfs_data          6451G                3.0   34465G        0.5616                1.0   250                 warn
So the ratios in total add up to 1.4633.
Isn't a combined ratio of 1.0 across all pools equal to full?
I also enabled the Dashboard and saw that the PG Status showed "645% clean" PGs.
This cluster was originally installed with version Jewel, so could it be some legacy setting or similar that is causing this?
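If it is the warning itself that needs addressing, giving the autoscaler explicit expectations per pool is the documented knob; a sketch, with the pool names from the status above and illustrative ratio values:

```shell
# Tell the autoscaler roughly what fraction of the cluster each
# pool is expected to consume (values are guesses, not recommendations):
ceph osd pool set cephfs_data_reduced target_size_ratio 0.5
ceph osd pool set cephfs_data target_size_ratio 0.3
ceph osd pool set cephfs_metadata target_size_ratio 0.01

# Re-check what the autoscaler now computes:
ceph osd pool autoscale-status
```

Note that the RATIO column already includes the replication factor (RATE × SIZE / RAW CAPACITY), which is why the per-pool ratios in the output above can legitimately sum past 1.0 while the overcommit check complains.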