I guess I should probably have been more clear, this is one pool of many, so the other OSDs aren't idle.
So I don't necessarily think that the PG bump would be the worst thing to try, and it's definitely not as bad as I may have made it sound.
Thanks,
Reed
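For reference, the ratio math Anthony walks through below, as a quick sketch (the pool-set command at the end is commented out and uses a hypothetical pool name):

```shell
# PG ratio = pg_num * pool_size / number of OSDs
# This pool: EC 8+2 (size 10), 128 PGs, 240 OSDs
pg_num=128
pool_size=10
num_osds=240
echo $(( pg_num * pool_size / num_osds ))   # prints 5 (5.33 before integer truncation)

# Raising the pool's pg_num (pool name "rbd-ec" is hypothetical);
# since Nautilus, pgp_num follows pg_num automatically:
# ceph osd pool set rbd-ec pg_num 2048
```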
> On May 27, 2021, at 11:37 PM, Anthony D'Atri <anthony.datri(a)gmail.com> wrote:
>
> That gives you a PG ratio of …. 5.3 ???
>
> Run `ceph osd df`; I wouldn’t be surprised if some of your drives have 0 PGs on them, and I would certainly suspect that they aren’t at all even.
>
> There are bottlenecks in the PG code, and in the OSD code — one reason why with NVMe clusters it’s common to split each drive into at least 2 OSDs. With spinners you don’t want to do that, but you get the idea.
>
> The pg autoscaler is usually out of its Vulcan mind. 512 would give you a ratio of just 21.
>
> Prior to 12.2.1 conventional wisdom was a PG ratio of 100-200 on spinners.
>
> 2048 PGs would give you a ratio of 85, which current (retconned) guidance would call good. I’d probably go to 4096 but 2048 would be way better than 128.
>
> I strongly suspect that PG splitting would still get you done faster than the way it is, esp. if you’re running BlueStore OSDs.
>
> Try bumping pg_num up to say 256 and see how bad it is, and whether, once pgp_num catches up, your ingest rate isn’t a bit higher than it was before.
>
>> EC8:2, across about 16 hosts, 240 OSDs, with 24 of those being 8TB 7.2k SAS, and the other 216 being 2TB 7.2K SATA. So there are quite a few spindles in play here.
>> Only 128 PGs in this pool, but it's the only RBD image in this pool. Autoscaler recommends going to 512, but I was hoping to avoid the performance overhead of the PG splits if possible, given perf is bad enough as is.
>
>
Hello,
I am trying to place the two MDS daemons for CephFS on dedicated nodes. For that purpose I tried out a few different "ceph orch apply ..." commands with a label, but in the end it looks like I messed up the placement, as I now have two mds service_types, as you can see below:
# ceph orch ls --service-type mds --export
service_type: mds
service_id: ceph1fs
service_name: mds.ceph1fs
placement:
  count: 2
  hosts:
  - ceph1g
  - ceph1a
---
service_type: mds
service_id: label:mds
service_name: mds.label:mds
placement:
  count: 2
This second entry at the bottom seems totally wrong and I would like to remove it, but I haven't found out how to remove it completely. Any ideas?
Ideally I just want to place two MDS daemons on node ceph1a and ceph1g.
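A sketch of what that cleanup might look like, treating this as an untested suggestion (the removal command takes the service_name shown in the export):

```shell
# Remove the stray service by its service_name from "ceph orch ls --export":
ceph orch rm "mds.label:mds"

# Re-apply the intended placement, pinning the two daemons to the two hosts:
ceph orch apply mds ceph1fs --placement="ceph1a ceph1g"
```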
Regards,
Mabi
I'm attempting to get ceph up and running, and currently feel like I'm
going around in circles.
I'm attempting to use cephadm and Pacific, currently on debian buster,
mostly because centos7 ain't supported any more and centos8 ain't supported
by some of my hardware.
Anyway, I have a few nodes with 59x 7.2TB disks, but for some reason the osd
daemons don't start; the disks get formatted and the OSDs are created, but
the daemons never come up.
They are probably the wrong spec for ceph (48GB of memory and only 4 cores),
but I was expecting them to start and be either dirt slow or crash later.
Anyway, I've got up to 30 of them, so I was hoping to get at least
6PB of raw storage out of them.
As yet I've not spotted any helpful error messages.
This is for an archive / slow ceph cluster, so I'm not expecting speed.
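One thing worth checking, purely as a guess: with 59 OSDs per node and the default 4 GiB osd_memory_target, the daemons would want roughly 236 GiB on a 48 GB box, so the kernel may be OOM-killing them at startup. Something along these lines might confirm or rule that out:

```shell
# Pull the logs for one of the failed OSDs (substitute your osd id):
cephadm logs --name osd.0

# Lowering the per-OSD memory target is one thing to try on such nodes:
ceph config set osd osd_memory_target 1073741824   # 1 GiB, well below the 4 GiB default
```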
Thanks in advance.
Peter.
Hi,
I have removed one node, but now ceph seems to be stuck in:
Degraded data redundancy: 67/2393 objects degraded (2.800%), 12 pgs
degraded, 12 pgs undersized
How to "force" rebalancing? Or should I just wait a little bit more?
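A couple of commands that can show whether recovery is actually progressing or genuinely stuck (a sketch, not cluster-specific advice):

```shell
# Watch recovery progress and list the stuck PGs:
ceph -s
ceph pg dump_stuck undersized

# If a pool can no longer place all replicas after losing the node
# (e.g. size 3 with too few hosts left), recovery will never finish,
# and the pool size or CRUSH rule needs revisiting instead of waiting.
```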
Kind regards,
rok
Hi
The server runs 15.2.9 and has 15 HDDs and 3 SSDs.
The OSDs were created with this YAML file:
hdd.yml
--------
service_type: osd
service_id: hdd
placement:
  host_pattern: 'pech-hd-*'
data_devices:
  rotational: 1
db_devices:
  rotational: 0
The result was that the 3 SSDs were added to 1 VG with 15 LVs on it.
# vgs | egrep "VG|dbs"
  VG                                                  #PV #LV #SN Attr   VSize  VFree
  ceph-block-dbs-563432b7-f52d-4cfe-b952-11542594843b   3  15   0 wz--n- <5.24t 48.00m
One of the OSDs failed and I ran rm with replace
# ceph orch osd rm 178 --replace
and the result is
# ceph osd tree | grep -E "ID|destroyed"
ID   CLASS  WEIGHT    TYPE NAME  STATUS     REWEIGHT  PRI-AFF
178  hdd    12.82390  osd.178    destroyed         0  1.00000
But I'm not able to replace the disk with the same YAML file as shown
above.
# ceph orch apply osd -i hdd.yml --dry-run
################
OSDSPEC PREVIEWS
################
+---------+------+------+------+----+-----+
|SERVICE |NAME |HOST |DATA |DB |WAL |
+---------+------+------+------+----+-----+
+---------+------+------+------+----+-----+
I guess this is the wrong way to do it, but I can't find the answer in
the documentation.
So how can I replace this failed disk in Cephadm?
--
Kai Stian Olstad
Hello,
I have by mistake re-installed the OS of an OSD node of my Octopus cluster (managed by cephadm). Luckily the OSD data is on a separate disk and did not get affected by the re-install.
Now I have the following state:
health: HEALTH_WARN
1 stray daemon(s) not managed by cephadm
1 osds down
1 host (1 osds) down
To fix that I tried to run:
# ceph orch daemon add osd ceph1f:/dev/sda
Created no osd(s) on host ceph1f; already created?
That did not work, so I tried:
# ceph cephadm osd activate ceph1f
no valid command found; 10 closest matches:
...
Error EINVAL: invalid command
Did not work either. So I wanted to ask how can I "adopt" back an OSD disk to my cluster?
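A sketch of one way back, with the caveat that "ceph cephadm osd activate" only exists from Pacific onward, which would explain the EINVAL on Octopus:

```shell
# Make sure the reinstalled host is known to the orchestrator again:
ceph orch host add ceph1f

# On ceph1f itself, check that ceph-volume still sees the intact OSD LVs:
cephadm ceph-volume lvm list

# On Pacific and later, this recreates the daemons from existing OSD disks:
# ceph cephadm osd activate ceph1f
```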
Thanks for your help.
Regards,
Mabi
After scaling the number of MDS daemons down, we now have a daemon stuck in the
"up:stopping" state. The documentation says it can take several minutes to stop the
daemon, but it has been stuck in this state for almost a full day. According to
the "ceph fs status" output attached below, it still holds information about 2
inodes, which we assume is the reason why it cannot stop completely.
Does anyone know what we can do to finally stop it?
cephfs - 71 clients
======
RANK  STATE     MDS          ACTIVITY     DNS    INOS
 0    active    ceph-mon-01  Reqs:  0 /s  15.7M  15.4M
 1    active    ceph-mon-02  Reqs: 48 /s  19.7M  17.1M
 2    stopping  ceph-mon-03               0      2
POOL TYPE USED AVAIL
cephfs_metadata metadata 652G 185T
cephfs_data data 1637T 539T
STANDBY MDS
ceph-mon-03-mds-2
MDS version: ceph version 15.2.11 (e3523634d9c2227df9af89a4eac33d16738c49cb) octopus (stable)
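Some commands that can show what the stopping rank is still holding on to (a sketch; the fail command at the end is a last resort and is commented out):

```shell
# Inspect the stuck rank's in-flight operations and attached clients:
ceph tell mds.ceph-mon-03 ops
ceph tell mds.ceph-mon-03 session ls

# Failing the rank hands its state to a standby and often
# un-sticks the shutdown:
# ceph mds fail ceph-mon-03
```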
Hi! Is it (technically) possible to instruct cephfs to store files < 1MiB on a (replicated) pool
and the other files on another (ec) pool?
And even more, is it possible to make the same kind of decision based on the path of the file?
(let's say critical files with names matching r"/critical_path/critical_.*" should go in a 6x replicated ssd pool)
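The path-based part is what CephFS file layouts do; size-based placement is not something layouts can express, since the pool is chosen when the file is created, before its size is known. A sketch of the path-based part (pool name and mount point are hypothetical):

```shell
# Add the extra pool to the filesystem, then pin a directory to it;
# new files under that directory land in the named pool:
ceph fs add_data_pool cephfs repl6-ssd
setfattr -n ceph.dir.layout.pool -v repl6-ssd /mnt/cephfs/critical_path
```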
Thank you!
Adrian
Could you share the output of
lsblk -o name,rota,size,type
from the affected osd node?
My spec file is for a tiny lab cluster, in your case the db drive size
should be something like '5T:6T' to specify a range.
How large are the HDDs? Also, maybe you should use the option
'filter_logic: AND', but I'm not sure if that's already the default; I
remember there were issues in Nautilus because the default was OR. I
tried this just recently with a similar version, I believe 15.2.8, and
it worked for me, but again, it's just a tiny virtual lab cluster.
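Putting those pieces together, a spec along these lines is what the above suggests (the size range and filter_logic values here are illustrative, not tested against your hardware):

```yaml
service_type: osd
service_id: hdd
placement:
  host_pattern: 'pech-hd-*'
filter_logic: AND
data_devices:
  rotational: 1
db_devices:
  rotational: 0
  size: '30G:'     # "at least 30G"; a bounded range like '5T:6T' also works
```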
Zitat von Kai Stian Olstad <ceph+list(a)olstad.com>:
> On 26.05.2021 11:16, Eugen Block wrote:
>> Yes, the LVs are not removed automatically, you need to free up the
>> VG, there are a couple of ways to do so, for example remotely:
>>
>> pacific1:~ # ceph orch device zap pacific4 /dev/vdb --force
>>
>> or directly on the host with:
>>
>> pacific1:~ # cephadm ceph-volume lvm zap --destroy /dev/<CEPH_VG>/<DB_LV>
>
> Thanks,
>
> I used the cephadm command and deleted the LV and the VG now has free space
>
> # vgs | egrep "VG|dbs"
>   VG                                                  #PV #LV #SN Attr   VSize  VFree
>   ceph-block-dbs-563432b7-f52d-4cfe-b952-11542594843b   3  14   0 wz--n- <5.24t 357.74g
>
> But it doesn't seem to be able to use it, because it can't find anything
>
> # ceph orch apply osd -i hdd.yml --dry-run
> ################
> OSDSPEC PREVIEWS
> ################
> +---------+------+-------------+----------+----+-----+
> |SERVICE |NAME |HOST |DATA |DB |WAL |
> +---------+------+-------------+----------+----+-----+
> +---------+------+-------------+----------+----+-----+
>
> I tried adding size as you have in your configuration
> db_devices:
> rotational: 0
> size: '30G:'
>
> Still it was unable to create the OSD.
>
> If I removed the ':' so it is an exact 30G size, it did find the disk,
> but the DB was not placed on an SSD since I do not have one of exactly
> 30 GB
> ################
> OSDSPEC PREVIEWS
> ################
> +---------+------+-------------+----------+----+-----+
> |SERVICE |NAME |HOST |DATA |DB |WAL |
> +---------+------+-------------+----------+----+-----+
> |osd |hdd |pech-hd-7 |/dev/sdt |- |- |
> +---------+------+-------------+----------+----+-----+
>
>
> To me it looks like cephadm can't use/find the free space on the VG
> and use it as a new LV for the OSD.
>
>
> --
> Kai Stian Olstad
> _______________________________________________
> ceph-users mailing list -- ceph-users(a)ceph.io
> To unsubscribe send an email to ceph-users-leave(a)ceph.io
Hello all,
is there any best practice for the balance mode when I have an HAProxy
in front of my rgw_frontend?
currently we use "balance leastconn".
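leastconn is generally a reasonable fit for RGW, since request durations vary widely with object size. A minimal backend stanza for comparison (server names and addresses are placeholders):

```
backend rgw
    balance leastconn
    # roundrobin is the usual alternative when requests are uniform:
    # balance roundrobin
    server rgw1 10.0.0.11:8080 check
    server rgw2 10.0.0.12:8080 check
```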
Cheers
Boris