I have a Nautilus cluster with 7 nodes and 210 HDDs. I recently added the 7th node with 30 OSDs, which is currently rebalancing very slowly. I just noticed that its Ethernet interface only negotiated a 1Gb connection, even though it is a 10Gb interface. I'm not sure why, but I would like to reboot the node to get the interface back to 10Gb.
Is it OK to do this? What should I do to prep the cluster for the reboot?
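The prep I have seen suggested elsewhere is simply to pause data movement while the node is down; a minimal sketch, assuming a short planned reboot (not verified against a cluster that is already backfilling):
```
ceph osd set noout          # don't mark the node's OSDs out while it reboots
ceph osd set norebalance    # optionally pause the ongoing rebalance as well
# ... reboot the node, confirm the link renegotiates at 10Gb, wait for OSDs to come back up ...
ceph osd unset norebalance
ceph osd unset noout
```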
Jeffrey Turmelle
International Research Institute for Climate & Society <https://iri.columbia.edu/>
The Climate School <https://climate.columbia.edu/> at Columbia University <https://columbia.edu/>
845-652-3461
Hi,
I have many 'not {deep-}scrubbed in time' warnings and 1 PG remapped+backfilling,
and I don't understand why this backfilling is taking so long.
root@hbgt-ceph1-mon3:/# ceph -s
  cluster:
    id:     c300532c-51fa-11ec-9a41-0050569c3b55
    health: HEALTH_WARN
            15 pgs not deep-scrubbed in time
            13 pgs not scrubbed in time

  services:
    mon: 3 daemons, quorum hbgt-ceph1-mon1,hbgt-ceph1-mon2,hbgt-ceph1-mon3 (age 36h)
    mgr: hbgt-ceph1-mon2.nteihj(active, since 2d), standbys: hbgt-ceph1-mon1.thrnnu, hbgt-ceph1-mon3.gmfzqm
    osd: 60 osds: 60 up (since 13h), 60 in (since 13h); 1 remapped pgs
    rgw: 3 daemons active (3 hosts, 2 zones)

  data:
    pools:   13 pools, 289 pgs
    objects: 67.74M objects, 127 TiB
    usage:   272 TiB used, 769 TiB / 1.0 PiB avail
    pgs:     288 active+clean
             1   active+remapped+backfilling

  io:
    client:   3.3 KiB/s rd, 1.5 MiB/s wr, 3 op/s rd, 8 op/s wr
    recovery: 790 KiB/s, 0 objects/s
What can I do to understand this slow recovery (is it the backfill action)?
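A minimal sketch of the knobs usually involved (the option names are standard OSD settings; the values are only illustrative and nothing has been changed here yet):
```
# Which PG is backfilling, and between which OSDs
ceph pg dump pgs_brief | grep backfill
# Current backfill/recovery throttles (on newer releases with the mClock
# scheduler these may be capped unless the override option is enabled)
ceph config get osd osd_max_backfills
ceph config get osd osd_recovery_max_active
# Illustrative bump only -- raising this speeds up backfill at the cost of client I/O
ceph config set osd osd_max_backfills 2
```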
Thank you
'Jof
Hi to all, and thanks for sharing your experience with Ceph!
We have a simple setup with 9 OSDs, all HDD, across 3 nodes (3 OSDs per node).
We started the cluster with the default, easy bootstrap to test how it works with HDDs. Then we decided to add SSDs and create a pool that uses only SSDs.
In order to have HDD-only pools and SSD-only pools, we edited the crushmap to add class hdd.
We have not added anything for the SSDs yet, no disks and no rules; we only added the device class to the existing rules.
So here are the rules before introducing class hdd:
# rules
rule replicated_rule {
id 0
type replicated
min_size 1
max_size 10
step take default
step chooseleaf firstn 0 type host
step emit
}
rule erasure-code {
id 1
type erasure
min_size 3
max_size 4
step set_chooseleaf_tries 5
step set_choose_tries 100
step take default
step chooseleaf indep 0 type host
step emit
}
rule erasure2_1 {
id 2
type erasure
min_size 3
max_size 3
step set_chooseleaf_tries 5
step set_choose_tries 100
step take default
step chooseleaf indep 0 type host
step emit
}
rule erasure-pool.meta {
id 3
type erasure
min_size 3
max_size 3
step set_chooseleaf_tries 5
step set_choose_tries 100
step take default
step chooseleaf indep 0 type host
step emit
}
rule erasure-pool.data {
id 4
type erasure
min_size 3
max_size 3
step set_chooseleaf_tries 5
step set_choose_tries 100
step take default
step chooseleaf indep 0 type host
step emit
}
And here are the rules after adding class hdd:
# rules
rule replicated_rule {
id 0
type replicated
min_size 1
max_size 10
step take default class hdd
step chooseleaf firstn 0 type host
step emit
}
rule erasure-code {
id 1
type erasure
min_size 3
max_size 4
step set_chooseleaf_tries 5
step set_choose_tries 100
step take default class hdd
step chooseleaf indep 0 type host
step emit
}
rule erasure2_1 {
id 2
type erasure
min_size 3
max_size 3
step set_chooseleaf_tries 5
step set_choose_tries 100
step take default class hdd
step chooseleaf indep 0 type host
step emit
}
rule erasure-pool.meta {
id 3
type erasure
min_size 3
max_size 3
step set_chooseleaf_tries 5
step set_choose_tries 100
step take default class hdd
step chooseleaf indep 0 type host
step emit
}
rule erasure-pool.data {
id 4
type erasure
min_size 3
max_size 3
step set_chooseleaf_tries 5
step set_choose_tries 100
step take default class hdd
step chooseleaf indep 0 type host
step emit
}
Just doing this triggered misplacement of all the PGs belonging to the EC pools.
Is that correct? And why?
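For reference, here is a sketch of how the placement change can be checked offline with crushtool before injecting an edited map (the file names are just placeholders):
```
# Grab and decompile the current CRUSH map, then compile the edited copy
ceph osd getcrushmap -o crush.bin
crushtool -d crush.bin -o crush.txt
crushtool -c crush-edited.txt -o crush-new.bin
# Show the mappings a rule would produce (e.g. rule id 1 with 3 replicas)
# and diff the two outputs to see how many PGs would move
crushtool -i crush.bin     --test --rule 1 --num-rep 3 --show-mappings > before.txt
crushtool -i crush-new.bin --test --rule 1 --num-rep 3 --show-mappings > after.txt
diff before.txt after.txt | head
```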
Best regards
Alessandro Bolgia
I've set up RadosGW with STS on top of my Ceph cluster. It works great, but I'm also trying to set up authentication with an OpenID Connect provider. I'm having a hard time troubleshooting issues because the radosgw log file doesn't have much information in it. For example, when I try to use the `sts:AssumeRoleWithWebIdentity` API it fails with `{'Code': 'AccessDenied', ...}`, and all I see is the beast log showing an HTTP 403.
Is there a way to enable more verbose logging so I can see what is failing and why I'm getting certain errors with the STS, S3, or IAM APIs?
My ceph.conf looks like this for each node (mildly redacted):
```
[client.radosgw.pve4]
host = pve4
keyring = /etc/pve/priv/ceph.client.radosgw.keyring
log file = /var/log/ceph/client.radosgw.$host.log
rgw_dns_name = s3.lab
rgw_frontends = beast endpoint=0.0.0.0:7480 ssl_endpoint=0.0.0.0:443 ssl_certificate=/etc/pve/priv/ceph/s3.lab.crt ssl_private_key=/etc/pve/priv/ceph/s3.lab.key
rgw_sts_key = 1111111111111111
rgw_s3_auth_use_sts = true
rgw_enable_apis = s3, s3website, admin, sts, iam
```
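The only extra logging I'm aware of is the generic debug levels; a minimal sketch of what I was going to try, assuming the gateway picks up options registered under the client.radosgw.pve4 name (20/20 is very verbose, so only for a short debugging window):
```
# Raise RGW debug logging at runtime via the mon config database
# (assumption: the gateway reads options registered under this entity name)
ceph config set client.radosgw.pve4 debug_rgw 20/20
ceph config set client.radosgw.pve4 debug_ms 1

# ...or the equivalent in the [client.radosgw.pve4] section of ceph.conf,
# followed by a gateway restart:
#   debug rgw = 20
#   debug ms = 1
```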
Hello
Looking to get some official guidance on PG and PGP sizing.
Is the goal to maintain approximately 100 PGs per OSD per pool, or for the
cluster in general?
Assume the following scenario:
Cluster with 80 OSD across 8 nodes;
3 Pools:
- Pool1 = Replicated 3x
- Pool2 = Replicated 3x
- Pool3 = Erasure Coded 6-4
Assuming the well-publicized formula, with (Target PGs per OSD) = 100:
PGs per pool = [ (Target PGs per OSD) * (# of OSDs) ] / (Replica Size)
- Pool1 = (100*80)/3 = 2666.67 => 4096
- Pool2 = (100*80)/3 = 2666.67 => 4096
- Pool3 = (100*80)/10 = 800 => 1024
Total cluster would have 9216 PGs and PGPs.
Are there any implications (performance, monitor, MDS, or RGW sizing) of how
many PGs are created on the cluster?
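A minimal sketch of cross-checking these numbers against the PG autoscaler (the pool names are the ones from the example above, so placeholders):
```
# Show the autoscaler's view of current vs. suggested pg_num per pool
ceph osd pool autoscale-status
# Have the autoscaler only warn about (or fully manage) a pool's pg_num
ceph osd pool set Pool1 pg_autoscale_mode warn
```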
Looking for validation and / or clarification of the above.
Thank you.
The latest version of quincy seems to be having problems cleaning up multipart fragments from canceled uploads.
The bucket is empty:
% s3cmd -c .s3cfg ls s3://warp-benchmark
%
However, it's got 11 TB of data and 700k objects:
# radosgw-admin bucket stats --bucket=warp-benchmark
{
"bucket": "warp-benchmark",
"num_shards": 10,
"tenant": "",
"zonegroup": "6be863e8-a9f2-42c9-b114-c8651b1f1afa",
"placement_rule": "ssd.ec63",
"explicit_placement": {
"data_pool": "",
"data_extra_pool": "",
"index_pool": ""
},
"id": "aa099b5e-01d5-4394-b287-df99a4d63298.18924.1",
"marker": "aa099b5e-01d5-4394-b287-df99a4d63298.37403.1",
"index_type": "Normal",
"owner": "warp_benchmark",
"ver": "0#5580404,1#5593184,2#5586262,3#5591427,4#5591937,5#5588120,6#5589760,7#5582923,8#5579062,9#5578699",
"master_ver": "0#0,1#0,2#0,3#0,4#0,5#0,6#0,7#0,8#0,9#0",
"mtime": "0.000000",
"creation_time": "2023-02-10T21:45:12.721604Z",
"max_marker": "0#,1#,2#,3#,4#,5#,6#,7#,8#,9#",
"usage": {
"rgw.main": {
"size": 12047620866048,
"size_actual": 12047620866048,
"size_utilized": 12047620866048,
"size_kb": 11765254752,
"size_kb_actual": 11765254752,
"size_kb_utilized": 11765254752,
"num_objects": 736113
}
},
"bucket_quota": {
"enabled": false,
"check_on_raw": false,
"max_size": -1,
"max_size_kb": 0,
"max_objects": -1
}
}
A bucket list shows that they are all multipart fragments:
# radosgw-admin bucket list --bucket=warp-benchmark
[
... (LOTS OF THESE)
{
"name": "_multipart_(2F3(gCS/1.GagoUCrCRqawswb6.rnd.tg1efLm7-es41Xg3i-Nm6bYjS-c-No79.12",
"instance": "",
"ver": {
"pool": 20,
"epoch": 30984
},
"locator": "",
"exists": "true",
"meta": {
"category": 1,
"size": 16777216,
"mtime": "2023-02-16T00:03:01.586472Z",
"etag": "e7475bca6a58de35648ca5f25d6653bf",
"storage_class": "",
"owner": "warp_benchmark",
"owner_display_name": "Warp Benchmark",
"content_type": "",
"accounted_size": 16777216,
"user_data": "",
"appendable": "false"
},
"tag": "_YdopX7yxnVrvg2h35MIQGN3vsPyZx5W",
"flags": 0,
"pending_map": [],
"versioned_epoch": 0
}
]
Note that the timestamp is from 2 weeks ago, so a lifecycle policy of "clean up after 1 day" should delete them:
% cat cleanup-multipart.xml
<LifecycleConfiguration>
<Rule>
<ID>abort-multipart-rule</ID>
<Filter>
<Prefix></Prefix>
</Filter>
<Status>Enabled</Status>
<AbortIncompleteMultipartUpload>
<DaysAfterInitiation>1</DaysAfterInitiation>
</AbortIncompleteMultipartUpload>
</Rule>
</LifecycleConfiguration>
% s3cmd dellifecycle s3://warp-benchmark
s3://warp-benchmark/: Lifecycle Policy deleted
% s3cmd setlifecycle cleanup-multipart.xml s3://warp-benchmark
s3://warp-benchmark/: Lifecycle Policy updated
A secondary problem is that the lifecycle policy never runs automatically and is stuck in the UNINITIAL state. This problem is for another day of debugging.
# radosgw-admin lc list
[
{
"bucket": ":warp-benchmark:aa099b5e-01d5-4394-b287-df99a4d63298.37403.1",
"started": "Thu, 01 Jan 1970 00:00:00 GMT",
"status": "UNINITIAL"
}
]
However, it can be started manually:
# radosgw-admin lc process
# radosgw-admin lc list
[
{
"bucket": ":warp-benchmark:aa099b5e-01d5-4394-b287-df99a4d63298.37403.1",
"started": "Wed, 01 Mar 2023 17:35:27 GMT",
"status": "COMPLETE"
}
]
This has no effect on the bucket and the bucket stats show the exact same size and object count (output omitted for brevity).
Running a gc pass also has no effect:
# radosgw-admin gc list
[]
# radosgw-admin gc process
# radosgw-admin gc list
[]
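The only interim workaround I can think of is aborting the stale uploads from the client side and forcing gc over all entries; a rough sketch, where the object key and upload ID would come from the multipart listing:
```
# List in-progress multipart uploads for the bucket
s3cmd multipart s3://warp-benchmark
# Abort one of them (OBJECT_KEY and UPLOAD_ID come from the listing above)
s3cmd abortmp s3://warp-benchmark/OBJECT_KEY UPLOAD_ID
# Force gc to process all entries, not only the already-expired ones
radosgw-admin gc process --include-all
```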
Any ideas?
Hello
We are planning to start QE validation of the release next week.
If you have PRs that should be part of it, please let us know ASAP by
adding "needs-qa" to them for the 'quincy' milestone.
Thx
YuriW