Just making sure this makes the list:
Mac Wynkoop
---------- Forwarded message ---------
From: 胡 玮文 <huww98(a)outlook.com>
Date: Wed, Oct 7, 2020 at 9:00 PM
Subject: Re: pool pgp_num not updated
To: Mac Wynkoop <mwynkoop(a)netdepot.com>
Hi,
You can read about this behavior at
https://ceph.io/rados/new-in-nautilus-pg-merging-and-autotuning/
In short, Ceph will not increase pgp_num while more than 5% of objects are
misplaced (by default); once misplaced drops below 5%, it increases pgp_num
gradually until it reaches the value you set. The 5% threshold is controlled
by the target_max_misplaced_ratio config option.
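For reference, a hedged sketch of inspecting and raising that threshold
(assumes a Mimic-or-later config database; raising it means accepting more
concurrent data movement):

  # show the current threshold (default 0.05, i.e. 5%)
  ceph config get mgr target_max_misplaced_ratio
  # let pgp_num keep increasing while up to 7% of objects are misplaced
  ceph config set mgr target_max_misplaced_ratio 0.07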
On Oct 8, 2020, at 03:22, Mac Wynkoop <mwynkoop(a)netdepot.com> wrote:
Well, backfilling sure, but will it allow me to actually change the pgp_num
as more space frees up? Because the issue is that I cannot modify that
value.
Thanks,
Mac Wynkoop, Senior Datacenter Engineer
NetDepot.com: Cloud Servers; Delivered
Houston | Atlanta | NYC | Colorado Springs
1-844-25-CLOUD Ext 806
On Wed, Oct 7, 2020 at 1:50 PM Eugen Block <eblock(a)nde.ag> wrote:
Yes, I think that’s exactly the reason. As soon as the cluster has
more space the backfill will continue.
Zitat von Mac Wynkoop <mwynkoop(a)netdepot.com>:
The cluster is currently in a warn state, here's the scrubbed output of
ceph -s:
  cluster:
    id:     *redacted*
    health: HEALTH_WARN
            noscrub,nodeep-scrub flag(s) set
            22 nearfull osd(s)
            2 pool(s) nearfull
            Low space hindering backfill (add storage if this doesn't resolve itself): 277 pgs backfill_toofull
            Degraded data redundancy: 32652738/3651947772 objects degraded (0.894%), 281 pgs degraded, 341 pgs undersized
            1214 pgs not deep-scrubbed in time
            2647 pgs not scrubbed in time
            2 daemons have recently crashed

  services:
    mon:         5 daemons, *redacted* (age 44h)
    mgr:         *redacted*
    osd:         162 osds: 162 up (since 44h), 162 in (since 4d); 971 remapped pgs
                 flags noscrub,nodeep-scrub
    rgw:         3 daemons active *redacted*
    tcmu-runner: 18 daemons active *redacted*

  data:
    pools:   10 pools, 2648 pgs
    objects: 409.56M objects, 738 TiB
    usage:   1.3 PiB used, 580 TiB / 1.8 PiB avail
    pgs:     32652738/3651947772 objects degraded (0.894%)
             517370913/3651947772 objects misplaced (14.167%)
             1677 active+clean
             477  active+remapped+backfill_wait
             100  active+remapped+backfill_wait+backfill_toofull
             80   active+undersized+degraded+remapped+backfill_wait
             60   active+undersized+degraded+remapped+backfill_wait+backfill_toofull
             42   active+undersized+degraded+remapped+backfill_toofull
             33   active+undersized+degraded+remapped+backfilling
             25   active+remapped+backfilling
             25   active+remapped+backfill_toofull
             24   active+undersized+remapped+backfilling
             23   active+forced_recovery+undersized+degraded+remapped+backfill_wait
             19   active+forced_recovery+undersized+degraded+remapped+backfill_wait+backfill_toofull
             15   active+undersized+remapped+backfill_wait
             14   active+undersized+remapped+backfill_wait+backfill_toofull
             12   active+forced_recovery+undersized+degraded+remapped+backfill_toofull
             12   active+forced_recovery+undersized+degraded+remapped+backfilling
             5    active+undersized+remapped+backfill_toofull
             3    active+remapped
             1    active+undersized+remapped
             1    active+forced_recovery+undersized+remapped+backfilling

  io:
    client:   287 MiB/s rd, 40 MiB/s wr, 1.94k op/s rd, 165 op/s wr
    recovery: 425 MiB/s, 225 objects/s
Now as you can see, we do have a lot of backfill operations going on at the
moment. Does that actually prevent Ceph from modifying the pgp_num value of a
pool?
Thanks,
Mac Wynkoop
On Wed, Oct 7, 2020 at 8:57 AM Eugen Block <eblock(a)nde.ag> wrote:
What is the current cluster status, is it healthy? Maybe increasing
pg_num would hit the limit of mon_max_pg_per_osd? Can you share 'ceph
-s' output?
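A hedged aside: that limit can be read straight from the config database on
Mimic or later, e.g.:

  ceph config get mon mon_max_pg_per_osd   # default is 250 on recent releases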
Zitat von Mac Wynkoop <mwynkoop(a)netdepot.com>:
Right, both Norman and I set pg_num before pgp_num. For example, here are my
current pool settings:
"pool 40 '*redacted*.rgw.buckets.data' erasure size 9 min_size 7 crush_rule 2 object_hash rjenkins pg_num 2048 pgp_num 1024 pgp_num_target 2048 last_change 8458830 lfor 0/0/8445757 flags hashpspool,ec_overwrites,nodelete,backfillfull stripe_width 24576 fast_read 1 application rgw"
So, when I set:
"ceph osd pool set hou-ec-1.rgw.buckets.data pgp_num 2048"
it returns:
"set pool 40 pgp_num to 2048"
But upon checking the pool details again:
"pool 40 '*redacted*.rgw.buckets.data' erasure size 9 min_size 7 crush_rule 2 object_hash rjenkins pg_num 2048 pgp_num 1024 pgp_num_target 2048 last_change 8458870 lfor 0/0/8445757 flags hashpspool,ec_overwrites,nodelete,backfillfull stripe_width 24576 fast_read 1 application rgw"
the pgp_num value still has not increased. Am I just doing something totally
wrong?
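For what it's worth, a hedged way to watch whether pgp_num creeps up at all
as the misplaced ratio falls (pool name pattern assumed):

  watch -n 60 'ceph osd pool ls detail | grep rgw.buckets.data; ceph -s | grep misplaced'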
Thanks,
Mac Wynkoop
On Tue, Oct 6, 2020 at 2:32 PM Marc Roos <M.Roos(a)f1-outsourcing.eu>
wrote:
pg_num and pgp_num need to be the same, not?
3.5.1. Set the Number of PGs
To set the number of placement groups in a pool, you must specify the number
of placement groups at the time you create the pool. See Create a Pool for
details. Once you set placement groups for a pool, you can increase the
number of placement groups (but you cannot decrease the number of placement
groups). To increase the number of placement groups, execute the following:
ceph osd pool set {pool-name} pg_num {pg_num}
Once you increase the number of placement groups, you must also increase the
number of placement groups for placement (pgp_num) before your cluster will
rebalance. The pgp_num should be equal to the pg_num. To increase the number
of placement groups for placement, execute the following:
ceph osd pool set {pool-name} pgp_num {pgp_num}
https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/4/html/s…
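A concrete run of both steps, using a hypothetical pool named mypool:

  ceph osd pool set mypool pg_num 2048
  ceph osd pool set mypool pgp_num 2048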
-----Original Message-----
To: norman
Cc: ceph-users
Subject: [ceph-users] Re: pool pgp_num not updated
Hi everyone,
I'm seeing a similar issue here. Any ideas on this?
Mac Wynkoop,
On Sun, Sep 6, 2020 at 11:09 PM norman <norman.kern(a)gmx.com> wrote:
Hi guys,
When I updated the pg_num of a pool, I found it didn't work (no rebalancing
happened). Does anyone know the reason? The pool's info:
pool 21 'openstack-volumes-rs' replicated size 3 min_size 2 crush_rule 21 object_hash rjenkins pg_num 1024 pgp_num 512 pgp_num_target 1024 autoscale_mode warn last_change 85103 lfor 82044/82044/82044 flags hashpspool,nodelete,selfmanaged_snaps stripe_width 0 application rbd
        removed_snaps [1~1e6,1e8~300,4e9~18,502~3f,542~11,554~1a,56f~1d7]
pool 22 'openstack-vms-rs' replicated size 3 min_size 2 crush_rule 22 object_hash rjenkins pg_num 512 pgp_num 512 pg_num_target 256 pgp_num_target 256 autoscale_mode warn last_change 84769 lfor 0/0/55294 flags hashpspool,nodelete,selfmanaged_snaps stripe_width 0 application rbd
The pgp_num_target is set, but pgp_num is not. I had scaled out new OSDs and
the cluster was backfilling before I set the value; could that be the reason?
Hi guys,
I have stumbled over an error that I cannot solve after more than two days of trying. Now I turn to you in the hope of getting some help! :)
First of all, I am totally new to Ceph and this is just a test setup for now. I have followed the instructions on the ceph.com homepage as far as I understand.
I am about to set up a Ceph storage cluster on three different nodes (running Ubuntu server 20.04). I am using Ceph Octopus.
On these 3 nodes I have installed 3 monitors, 3 managers and 12 OSDs (I do know that OSDs and Monitors on the same machines are not recommended but for this test setup I do it anyway).
So far everything is working well for me. However, when I try to create an RBD pool it fails with an error message that is strange (at least to me):
2020-10-08T07:48:36.601+0000 7f8b9b7fe700 -1 librbd::image::GetMetadataRequest: 0x564279b9d670 handle_metadata_list: failed to retrieve image metadata: (1) Operation not permitted
2020-10-08T07:48:36.601+0000 7f8bb6af8380 -1 librbd::PoolMetadata: list: failed listing metadata: (1) Operation not permitted
2020-10-08T07:48:36.601+0000 7f8bb6af8380 -1 librbd::Config: apply_pool_overrides: failed to read pool config overrides: (1) Operation not permitted
2020-10-08T07:48:36.601+0000 7f8b9b7fe700 -1 librbd::image::ValidatePoolRequest: handle_read_rbd_info: failed to read RBD info: (1) Operation not permitted
rbd: pool already registered to a different application.
I run the commands as the Linux root user and as the Ceph user client.admin (I have turned off AppArmor and other hardening things as well). The Ceph user client.admin has the following setup in its keyring:
[client.admin]
key = .....
caps mds = "allow *"
caps mgr = "allow *"
caps mon = "allow *"
For more information regarding permissions in the system see the attached file permissions.txt that contains the output of the ceph auth list command.
I have also attached a file (report.txt) containing the output from the ceph report command.
(NOTE: I have removed some information from the files, i.e. keys, fsid, fingerprints, uuid, and replaced them with "..." instead)
Can anyone please help me understand why I get this error and what I need to do in order to solve it?
Thanks in advance!
Best regards,
Fredrik
Hi,
When I run `ceph orch ps` I see a couple of containers running on our MON
nodes whose names end with the `-safe` suffix, and I was wondering what
they are?
I couldn't find information about it in https://docs.ceph.com
This cluster is running Ceph 15.2.5, recently upgraded from 15.2.4
Many thanks,
Sebastian
Normally I would install ceph-common.rpm and access some rbd image via
rbdmap. What would be the best way to do this on an old el6? There is
not even a luminous el6 on download.ceph.com.
Hello all!
I am configuring a new storage class on my Kubernetes cluster,
pointing to a pool on a Ceph cluster which was recently upgraded to
Nautilus (was Luminous).
The old storage class points to a Luminous pool in a separate cluster
and works fine. On the new one, I think I did the configuration
properly, yet when creating a volume I get this:
2020-10-07 10:00:53.849128 7f2f8c6f9700 0 -- 10.2.3.13:0/3520982056 >>
10.2.3.23:6789/0 pipe(0x7f2f780008c0 sd=3 :60192 s=1 pgs=0 cs=0 l=1
c=0x7f2f780068e0).connect protocol feature mismatch, my 27ffffffefdfbfff
< peer 27fddff8efacbfff missing 200000
Looking at
https://ceph.io/geen-categorie/feature-set-mismatch-error-on-ceph-kernel-cl…
it looks like the 200000 corresponds to CEPH_FEATURE_MON_GV, which, by the
way, is listed here
way is listed here
https://github.com/ceph/ceph/pull/8214
as a feature which could/should be removed.
Things being as I described, I guess it would be safe to change the
value of the tunable, correct?
Unfortunately, I was unable to find any way to achieve this... the
trivial "ceph osd crush set-tunable mon_gv 0" does not work.
Any idea, please, how to fix my error?
Upgrading the Ceph packages on the Kubernetes workers (now at Luminous) would
help, maybe?
Thanks!
Fulvio
--
Fulvio Galeazzi
GARR-CSD Department
skype: fgaleazzi70
tel.: +39-334-6533-250
Hello everyone,
I have a single Ceph cluster with multiple pools, used for Kubernetes PVs and
for virtualization (the biggest pool; we use snapshots there). Every client
cluster has its own pool with max_bytes and max_objects quotas in place. When
quotas are reached, Ceph halts all write I/O (which I suppose is OK).
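(For context, the quotas in question are the per-pool ones, set along these
lines; the values match my test pool below:)

  ceph osd pool set-quota mirektest max_objects 4000
  ceph osd pool set-quota mirektest max_bytes 16106127360   # 15 GiB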
I am trying to set up the best possible monitoring to avoid hitting pool
quotas, but I am really struggling to find the exact metrics Ceph is driven
by, because almost every command I try gives me different results (rados df,
rbd du, ceph health detail), and I feel like the quotas are not usable for
this purpose.
After some fiddling with a test pool, I have somehow gotten it into a state
where there are phantom objects in the stats (maybe also written to the OSDs)
which cannot be listed, accessed, or deleted, but which are counted against
the pool quotas.
-> ceph version
ceph version 14.2.10 (9f0d3f5a3ce352651da4c2437689144fcbec0131) nautilus
(stable)
I have my test pool here (quotas 4k objects, 15 GiB data):
-> ceph osd pool ls detail | grep mirektest
pool 2 'mirektest' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode warn last_change 21360 lfor 0/10929/10927 flags hashpspool,selfmanaged_snaps max_bytes 16106127360 max_objects 4000 stripe_width 0 target_size_bytes 10737418240 application rbd
Cluster health says the pool is getting full:
-> ceph health detail
HEALTH_WARN 1 pools nearfull
POOL_NEAR_FULL 1 pools nearfull
pool 'mirektest' has 3209 objects (max 4000)
pool 'mirektest' has 13 GiB (max 15 GiB)
But the pool looks completely empty (no output from rbd ls, so I tried JSON output as well):
-> rbd -p mirektest --format json ls
[]
-> rbd -p mirektest --format json du
{"images":[],"total_provisioned_size":0,"total_used_size":0}
I am not using pool snapshots:
-> rados -p mirektest lssnap
0 snaps
But there are some RADOS object in it:
-> rados -p mirektest df
POOL_NAME  USED     OBJECTS  CLONES  COPIES  MISSING_ON_PRIMARY  UNFOUND  DEGRADED  RD_OPS    RD       WR_OPS   WR       USED COMPR  UNDER COMPR
mirektest  192 KiB  3209     0       9627    0                   0        0         23998515  192 GiB  1221030  297 GiB  0 B         0 B
Okay, let's try to list them:
-> rados -p mirektest ls --all
rbd_directory
rbd_info
rbd_trash
That's weird, I expected more than 3 objects. Let's check their sizes and
content:
-> rados -p mirektest stat rbd_directory
mirektest/rbd_directory mtime 2020-10-07 11:55:16.000000, size 0
-> rados -p mirektest stat rbd_info
mirektest/rbd_info mtime 2019-12-05 12:11:39.000000, size 19
-> rados -p mirektest get rbd_info -
overwrite validated
-> rados -p mirektest stat rbd_trash
mirektest/rbd_trash mtime 2020-10-07 11:55:17.000000, size 0
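One more check, since an rbd_trash object exists: images moved to the RBD
trash do not show up in plain "rbd ls", so (a hedged guess) the phantom
objects could belong to trashed images:

  rbd trash ls --all mirektest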
Do any of you have an idea what's happening here? I am trying to find a way
to clean up the pool without disturbing existing content in general. I also
have no idea how to replicate this, and it's not the case for every pool in
the cluster. For example, there is a pool which was used for Kubernetes that
has no issues:
-> rados df
POOL_NAME  USED     OBJECTS  CLONES  COPIES  MISSING_ON_PRIMARY  UNFOUND  DEGRADED  RD_OPS   RD       WR_OPS   WR       USED COMPR  UNDER COMPR
pool12     192 KiB  3        0       9       0                   0        0         8710688  807 GiB  5490880  576 GiB  0 B         0 B
I don't know how to check the other pools, because there is data in them. How
can I rely on quotas when an empty pool already shows 13 GiB of data and 3.2k
objects (which at least is internally consistent, since 3200 * 4 MiB = 12.8
GiB)?
Also, is there any way to calculate the same object/byte counts that Ceph
uses to enforce the quotas?
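The closest I have found (hedged, from my own reading): "ceph df detail"
prints per-pool OBJECTS and STORED next to the QUOTA OBJECTS / QUOTA BYTES
columns, and those appear to be the same pool stats the quota check compares
against:

  ceph df detail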
Thanks for your time if you read all the way to the end, really appreciated!
:-)
Any idea or hint is welcome.
--
Miroslav Kalina
Systems development specialist
miroslav.kalina(a)livesport.eu
+420 773 071 848
Livesport s.r.o.
Aspira Business Centre
Bucharova 2928/14a, 158 00 Praha 5
www.livesport.eu
Hi all,
We have a Ceph cluster which has been expanded from 10 to 16 nodes.
Each node has between 14 and 16 OSDs of which 2 are NVMe disks.
Most disks (except NVMe's) are 16TB large.
The expansion to 16 nodes went OK, but we configured the system to prevent
automatic rebalancing onto the new disks (weight was set to 0) so we could
control the expansion.
We started adding 6 disks last week (1 disk on each new node), which didn't
cause many issues. When the Ceph status indicated the PG degradation recovery
was almost finished, we added 2 more disks on each node.
All seemed to go fine until yesterday morning... I/Os towards the system were
slowing down. Diving into the nodes, we could see that the OSD daemons were
consuming the CPU, resulting in average CPU loads going near 10 (!).
Neither the RGWs nor the monitors nor other involved servers are having CPU
issues (except for the management server, which is fighting with Prometheus),
so the latency seems to be related to the OSD hosts.
All of the hosts are interconnected with 25Gbit connections, no bottlenecks
are reached on the network either.
Important piece of information: We are using erasure coding (6/3), and we
do have a lot of small files...
The current health detail indicates degraded data redundancy:
1192911/103387889228 objects degraded (1 pg degraded, 1 pg undersized).
Diving into the historic ops of an OSD, we can see that the main latency sits
between the events "queued_for_pg" and "reached_pg" (averaging about 3
seconds).
As the system load is quite high, I assume the systems are busy recalculating
the erasure code chunks to make use of the new disks we've added (though I'm
not sure), but I was wondering how I can better fine-tune the system or
pinpoint the exact bottleneck.
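For reference, the usual knobs for trading recovery speed for client latency
while investigating (values are illustrative, defaults vary by release;
revert once the cluster settles):

  ceph config set osd osd_max_backfills 1
  ceph config set osd osd_recovery_max_active 1
  ceph config set osd osd_recovery_sleep_hdd 0.2   # seconds between recovery ops on HDD OSDs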
Latency towards the disks doesn't seem to be an issue at first sight...
We are running Ceph 14.2.11
Who can give me some thoughts on how I can better pinpoint the bottleneck?
Thanks
Kristof
All;
I've finally gotten around to setting up iSCSI gateways on my primary production cluster, and performance is terrible.
We're talking 1/4 to 1/3 of our current solution.
I see no evidence of network congestion on any involved network link. I see no evidence CPU or memory being a problem on any involved server (MON / OSD / gateway /client).
What can I look at to tune this, preferably on the iSCSI gateways?
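A hedged first step: benchmark an RBD image with librbd directly from a
gateway node, bypassing iSCSI entirely, to see whether the slowdown lives in
the RBD layer or in the iSCSI path (pool/image names are placeholders):

  rbd bench --io-type write --io-size 4K --io-threads 16 --io-total 1G iscsi-pool/test-image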
Thank you,
Dominic L. Hilsbos, MBA
Director - Information Technology
Perform Air International, Inc.
DHilsbos(a)PerformAir.com
www.PerformAir.com
Hi guys,
When I updated the pg_num of a pool, I found it didn't work (no rebalancing
happened). Does anyone know the reason? The pool's info:
pool 21 'openstack-volumes-rs' replicated size 3 min_size 2 crush_rule 21 object_hash rjenkins pg_num 1024 pgp_num 512 pgp_num_target 1024 autoscale_mode warn last_change 85103 lfor 82044/82044/82044 flags hashpspool,nodelete,selfmanaged_snaps stripe_width 0 application rbd
        removed_snaps [1~1e6,1e8~300,4e9~18,502~3f,542~11,554~1a,56f~1d7]
pool 22 'openstack-vms-rs' replicated size 3 min_size 2 crush_rule 22 object_hash rjenkins pg_num 512 pgp_num 512 pg_num_target 256 pgp_num_target 256 autoscale_mode warn last_change 84769 lfor 0/0/55294 flags hashpspool,nodelete,selfmanaged_snaps stripe_width 0 application rbd
The pgp_num_target is set, but pgp_num is not. I had scaled out new OSDs and
the cluster was backfilling before I set the value; could that be the reason?