Hi Ceph users
We are using Ceph Pacific (16) in this specific deployment.
In our use case we do not want our users to be able to generate signature v4 (presigned) URLs, because these bypass the policies we set on buckets (e.g. IP restrictions).
Currently we have a sidecar reverse proxy running that filters out requests carrying signature-URL-specific query parameters.
This is obviously not very efficient and we are looking to replace this somehow in the future.
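For context, this is the kind of URL generation we want to block (endpoint, bucket and object names are just examples, using the AWS CLI against RGW):
```bash
# Any user with valid credentials can mint a signature v4 URL; the resulting
# link then works from any IP, bypassing the bucket policy.
aws --endpoint-url https://rgw.example.com s3 presign \
    s3://somebucket/somefile.bin --expires-in 3600
# The URL carries X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Signature=... query
# parameters, which is what our sidecar proxy currently matches to return 403.
```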
1. Is there an option in RGW to disable these signed URLs (e.g. returning status 403)?
2. If not is this planned or would it make sense to add it as a configuration option?
3. Or is the behaviour of signature v4 URLs not respecting bucket policies in RGW a bug, and should the policies actually be applied?
Thank you for your help, and let me know if you have any questions.
Marc Singer
Hi folks,
I am currently testing erasure-code-lrc [1] in a multi-room, multi-rack setup.
The idea is to be able to repair disk failures within the rack itself to
lower bandwidth usage.
```bash
ceph osd erasure-code-profile set lrc_hdd \
plugin=lrc \
crush-root=default \
crush-locality=rack \
crush-failure-domain=host \
crush-device-class=hdd \
mapping=__DDDDD__DDDDD__DDDDD__DDDDD \
layers='
[
[ "_cDDDDD_cDDDDD_cDDDDD_cDDDDD", "" ],
[ "cDDDDDD_____________________", "" ],
[ "_______cDDDDDD______________", "" ],
[ "______________cDDDDDD_______", "" ],
[ "_____________________cDDDDDD", "" ],
]' \
crush-steps='[
[ "choose", "room", 4 ],
[ "choose", "rack", 1 ],
[ "chooseleaf", "host", 7 ],
]'
```
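For reference, a pool can be created from this profile and the resulting mapping inspected like so (pool name and PG count are examples):
```bash
ceph osd pool create lrc_test 32 32 erasure lrc_hdd   # EC pool using the profile
ceph osd crush rule dump lrc_test                     # show the generated multi-step rule
ceph pg ls-by-pool lrc_test                           # check which OSDs each PG maps to
```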
The rule picks 4 out of 5 rooms and keeps each PG within one rack, as expected!
However, it looks like the PGs will not move to another room if a PG is
undersized or an entire room or rack is down!
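The behaviour can also be checked offline against the compiled CRUSH map (the rule id and osd id below are examples):
```bash
ceph osd getcrushmap -o crushmap.bin                      # export the binary CRUSH map
crushtool -i crushmap.bin --test --rule 1 --num-rep 28 --show-mappings
crushtool -i crushmap.bin --test --rule 1 --num-rep 28 \
    --weight 12 0 --show-bad-mappings                     # re-run with osd.12 "failed"
```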
Questions:
* Am I missing something to allow LRC PGs to move across racks/rooms for repair?
* Is it even possible to build such a 'multi-stage' CRUSH map?
Thanks for your help,
Ansgar
[1] https://docs.ceph.com/en/quincy/rados/operations/erasure-code-lrc/
I want to perform a non-cephadm upgrade from Quincy to Reef. The reason for not using cephadm is that we do not want to run Ceph in containers.
My test deployment is as given below.
Total cluster hosts : 5
ceph-mon hosts: 3
ceph-mgr hosts: 3 (the active ceph-mgr on one node, and the other ceph-mgr daemons each on a ceph-mon host)
ceph-mds : 1
ceph-osd : 5 (one ceph-osd on each host in the cluster)
While following the steps at https://docs.ceph.com/en/latest/releases/reef/#upgrading-non-cephadm-cluste… I got stuck at the step "Upgrade monitors by installing the new packages and restarting the monitor daemons." When I try to upgrade only ceph-mon using "apt upgrade ceph-mon", it upgrades all packages including ceph-mgr, ceph-mds, ceph-osd, etc., as the ceph-mon package depends on them.
My question is: does this mean I need to upgrade all Ceph packages (ceph, ceph-common) and restart only the monitor daemons first? Or is there a way to upgrade only the ceph-mon package first, then ceph-mgr, ceph-osd, and so on?
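In other words, is the intended flow roughly the following (a sketch, assuming the Debian packages and their systemd targets)?
```bash
# On each mon host: upgrade the binaries, but restart only the monitor.
apt update
apt install --only-upgrade ceph ceph-common ceph-mon ceph-mgr ceph-osd ceph-mds
systemctl restart ceph-mon.target   # running daemons keep the old code until restarted
ceph versions                       # the "mon" section should now report the Reef version
# ...then restart ceph-mgr.target, ceph-osd.target and ceph-mds.target on
# their hosts, in the order given in the release notes.
```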
I have deployed a Ceph cluster with cephadm which has three monitors and
three OSDs.
Each node has one interface on the 192.168.0.0/24 network.
I want to change the addresses of the machines to the 10.4.4.0/24 range.
Is there a solution for this change without data loss or downtime?
I changed the public_network in the mon config and changed the node IPs,
but it did not work.
How can I solve this problem?
```
ceph orch host ls
HOST     ADDR           LABELS      STATUS
ceph-01  192.168.0.130  _admin,rgw
ceph-02  192.168.0.131  _admin,rgw
ceph-03  192.168.0.132  _admin,rgw
3 hosts in cluster
```
```
[root@ceph-01 ~]# ceph config get mon public_network
192.168.0.0/24
```
```
[root@ceph-01 ~]# ceph orch ls
NAME                               PORTS        RUNNING  REFRESHED  AGE  PLACEMENT
alertmanager                       ?:9093,9094  1/1      112s ago   9M   count:1
ceph-exporter                                   3/3      114s ago   8M   *
crash                                           3/3      114s ago   9M   *
grafana                            ?:3000       1/1      112s ago   8M   count:1
mgr                                             2/2      113s ago   9M   count:2
mon                                             3/3      114s ago   8M   count:3
node-exporter                      ?:9100       3/3      114s ago   9M   *
osd.dashboard-admin-1685787597651               6        114s ago   8M   *
prometheus                         ?:9095       1/1      112s ago   3M   count:1
```
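For reference, this is roughly what I tried (a sketch; new addresses are examples):
```bash
ceph config set mon public_network 10.4.4.0/24   # point the mons at the new network
# ...then changed each node's interface to 10.4.4.x in the OS network config,
# but the cluster did not come back up cleanly.
# Do I also need something like the following for each host?
# ceph orch host set-addr ceph-01 10.4.4.130
```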
Hi,
We have 2 clusters (v18.2.1), primarily used for RGW, which hold over 2 billion RGW objects. They are also in a multisite configuration totaling 2 zones, and we have around 2 Gbps of dedicated (P2P) bandwidth for the multisite traffic. Using "radosgw-admin sync status" on zone 2, we see that all 128 shards are recovering, and unfortunately there is very little data transfer from the primary zone, i.e., the link utilization is barely 100 Mbps out of 2 Gbps. Our objects are also quite small, averaging 1 MB in size.
On further inspection, we noticed that the RGW access logs at the primary site mostly yield "304 Not Modified" for requests coming from the site-2 RGWs. Is this expected? Here are some of the logs (information is redacted):
```
root@host-04:~# tail -f /var/log/haproxy-msync.log
Feb 12 05:06:51 host-04 haproxy[971171]: 10.1.85.14:33730 [12/Feb/2024:05:06:51.047] https~ backend/host-04-msync 0/0/0/2/2 304 143 - - ---- 56/55/1/0/0 0/0 "GET /bucket1/object1.jpg?rgwx-zonegroup=71dceb3d-3092-4dc6-897f-a9abf60c9972&rgwx-prepend-metadata=true&rgwx-sync-manifest&rgwx-sync-cloudtiered&rgwx-skip-decrypt&rgwx-if-not-replicated-to=a8204ce2-b69e-4d90-bca1-93edd05a1a29%3Abucket1%3A8b96aea5-c763-40a3-8430-efd67cff0c62.20010.7 HTTP/1.1"
Feb 12 05:06:51 host-04 haproxy[971171]: 10.1.85.14:59730 [12/Feb/2024:05:06:51.048] https~ backend/host-04-msync 0/0/0/2/2 304 143 - - ---- 56/55/3/1/0 0/0 "GET /bucket1/object91.jpg?rgwx-zonegroup=71dceb3d-3092-4dc6-897f-a9abf60c9972&rgwx-prepend-metadata=true&rgwx-sync-manifest&rgwx-sync-cloudtiered&rgwx-skip-decrypt&rgwx-if-not-replicated-to=a8204ce2-b69e-4d90-bca1-93edd05a1a29%3Abucket1%3A8b96aea5-c763-40a3-8430-efd67cff0c62.20010.7 HTTP/1.1"
```
We also took a look at our Grafana instance: out of 1000 requests/second, 200 are "200 OK" and 800 are "304 Not Modified". Sync threads run on only 2 RGW daemons per zone, which sit behind a load balancer. "radosgw-admin sync error list" also contains around 20 errors, which are mostly automatically recoverable.
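For completeness, these are the commands we have been using to inspect the sync state (the bucket name is an example):
```bash
radosgw-admin sync status                          # shows all 128 data shards recovering
radosgw-admin sync error list                      # ~20 mostly auto-recoverable errors
radosgw-admin bucket sync status --bucket=bucket1  # per-bucket view of the backlog
```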
Do we understand correctly that this means the RGW multisite sync logs in the log pool are yet to be generated, or something of that sort? Please give us some insight and let us know how to resolve this.
Thanks,
Saif
Just in case anybody is interested: Using dm-cache works and boosts
performance -- at least for my use case.
The "challenge" was to get 100 (identical) Linux-VMs started on a three
node hyperconverged cluster. The hardware is nothing special, each node
has a Supermicro server board with a single CPU with 24 cores and 4 x 4
TB hard disks. And there's that extra 1 TB NVMe...
I know that the general recommendation is to use the NVMe for WAL and
metadata, but this didn't seem appropriate for my use case, and I'm still
not quite sure about the failure scenarios with that configuration. So
instead I made each drive a logical volume (managed by an OSD) and added
85 GiB of NVMe to each LV as a read-only cache.
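In case anyone wants to replicate it, the per-node setup looks roughly like this (a sketch; device and VG/LV names are examples, and I attach the cache in writethrough mode so the NVMe never holds the only copy of any data):
```bash
# One VG spanning the 4 HDDs plus the NVMe; each OSD LV is pinned to one HDD.
vgcreate ceph-vg /dev/sda /dev/sdb /dev/sdc /dev/sdd /dev/nvme0n1
for d in a b c d; do
    lvcreate -n osd-sd$d -l 100%PVS ceph-vg /dev/sd$d    # data LV on a single HDD
    lvcreate -n cache-sd$d -L 85G ceph-vg /dev/nvme0n1   # 85 GiB cache LV on the NVMe
    lvconvert -y --type cache --cachevol cache-sd$d \
        --cachemode writethrough ceph-vg/osd-sd$d        # attach dm-cache to the data LV
done
```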
Each VM uses as its system disk an RBD image cloned from a snapshot of the
master image. The idea was that with this configuration, all VMs should
share most (actually almost all) of the data on their system disks, and
this data should be available from the cache.
Well, it works. When booting the 100 VMs, almost all read operations are
satisfied from the cache. So I get close to NVMe speed but have paid
for conventional hard drives only (well, SSDs aren't that much more
expensive nowadays, but the hardware is 4 years old).
So, nothing sophisticated, but as I couldn't find anything about this
kind of setup, it might be of interest nevertheless.
- Michael
Hey ceph-users,
I just noticed issues with ceph-crash using the Debian/Ubuntu packages
(package: ceph-base):
While the /var/lib/ceph/crash/posted folder is created by the package
install, it is not properly chowned to ceph:ceph by the postinst script.
This might also affect RPM-based installs somehow, but I did not look
into that.
I opened a bug report with all the details and two ideas to fix this:
https://tracker.ceph.com/issues/64548
The wrong ownership causes ceph-crash to NOT work at all. I myself
missed quite a few crash reports. All of them were just sitting around
on the machines, but were reported right after I ran:
```
chown ceph:ceph /var/lib/ceph/crash/posted
systemctl restart ceph-crash.service
```
You might want to check whether you are affected as well.
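A quick way to check (paths as created by the Debian/Ubuntu packages):
```bash
stat -c '%U:%G' /var/lib/ceph/crash/posted   # should print ceph:ceph, not root:root
ls /var/lib/ceph/crash/                      # unposted crash dirs accumulate here otherwise
```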
Failing to post crashes to the local cluster also means they are not
reported back via telemetry.
Regards
Christian
Please don't drop the list from your response.
The first question coming to mind is: why do you have a cache tier if
all your pools are on NVMe devices anyway? I don't see any benefit here.
Did you try the suggested workaround and disable the cache tier?
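Roughly, that would be the following (a sketch; pool names taken from your output):
```bash
ceph osd pool set vms_cache hit_set_count 0   # workaround from the thread below
ceph osd tier cache-mode vms_cache proxy      # pass all ops through to the base pool
rados -p vms_cache cache-flush-evict-all      # flush/evict the remaining objects
ceph osd tier remove-overlay vms              # detach the cache tier from the base pool
```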
Quoting Cedric <yipikai7(a)gmail.com>:
> Thanks Eugen, see attached info.
>
> Some more details:
>
> - commands that actually hang: ceph balancer status ; rbd -p vms ls ;
> rados -p vms_cache cache-flush-evict-all
> - all scrubs running on vms_cache PGs stall / restart in a loop
> without actually doing anything
> - all I/O is 0, both from ceph status and from iostat on the nodes
>
> On Tue, Feb 20, 2024 at 10:00 AM Eugen Block <eblock(a)nde.ag> wrote:
>>
>> Hi,
>>
>> some more details would be helpful, for example what's the pool size
>> of the cache pool? Did you issue a PG split before or during the
>> upgrade? This thread [1] deals with the same problem, the described
>> workaround was to set hit_set_count to 0 and disable the cache layer
>> until that is resolved. Afterwards you could enable the cache layer
>> again. But keep in mind that the code for cache tier is entirely
>> removed in Reef (IIRC).
>>
>> Regards,
>> Eugen
>>
>> [1]
>> https://ceph-users.ceph.narkive.com/zChyOq5D/ceph-strange-issue-after-addin…
>>
>> Quoting Cedric <yipikai7(a)gmail.com>:
>>
>> > Hello,
>> >
>> > Following an upgrade from Nautilus (14.2.22) to Pacific (16.2.13), we
>> > encountered an issue with a cache pool becoming completely stuck;
>> > relevant messages below:
>> >
>> > pg xx.x has invalid (post-split) stats; must scrub before tier agent
>> > can activate
>> >
>> > In the OSD logs, scrubs are starting in a loop without succeeding for
>> > all PGs of this pool.
>> >
>> > What we already tried without luck so far:
>> >
>> > - shutdown / restart OSD
>> > - rebalance pg between OSD
>> > - raise the memory on OSD
>> > - repeer PG
>> >
>> > Any idea what is causing this? Any help will be greatly appreciated.
>> >
>> > Thanks
>> >
>> > Cédric
Hello. We have a requirement to change the hostname on some of our OSD
nodes. All of our nodes are Ubuntu 22.04 based and have been deployed
using the 17.2.7 orchestrator.
1. Is there a procedure to rename an existing node without rebuilding it,
and have it detected by the Ceph Orchestrator?
If not,
2. To minimize the impact on the cluster (rebuilding OSDs / rebalancing,
etc.), is it possible to REINTRODUCE the existing OSDs into the cluster on
the newly rebuilt node? Is there a ceph orch process to scan a node's
local OSDs, detect them, and create the OSD daemons?
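For example, would something like this be the supported path after the rebuild (a sketch; hostname and address are examples)?
```bash
ceph orch host add new-hostname 10.0.0.5   # register the renamed/rebuilt node
ceph cephadm osd activate new-hostname     # scan for existing OSD LVs and adopt them
```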
Thank you.