Hello!
I am looking to simplify ceph management on bare-metal by deploying Rook onto kubernetes that has been deployed on bare metal (rke). I have used rook in a cloud environment but I have not used it on bare-metal. I am wondering if anyone here runs rook in bare-metal? Would you recommend it to cephadm or would you steer clear of it?
Hello Ceph-Users,
context or motivation of my question is S3 bucket policies and other
cases using the source IP address as condition.
I was wondering if and how RadosGW is able to access the source IP
address of clients if receiving their connections via a loadbalancer /
reverse proxy like HAProxy.
So naturally that is where the connection originates from in that case,
rendering a policy based on IP addresses useless.
Depending on whether the connection balanced as HTTP or TCP there are
two ways to carry information about the actual source:
* In case of HTTP via headers like "X-Forwarded-For". This is
apparently supported only for logging the source in the "rgw ops log" ([1])?
Or is this info used also when evaluating the source IP condition within
a bucket policy?
* In case of TCP loadbalancing, there is the proxy protocol v2. This
unfortunately seems not even supposed by the BEAST library which RGW uses.
I opened feature requests ...
** https://tracker.ceph.com/issues/59422
** https://github.com/chriskohlhoff/asio/issues/1091
** https://github.com/boostorg/beast/issues/2484
but there is no outcome yet.
Regards
Christian
[1]
https://docs.ceph.com/en/quincy/radosgw/config-ref/#confval-rgw_remote_addr…
Hi folks,
In the multisite environment, we can get one realm that contains multiple zonegroups, each in turn can have multiple zones. However, the purpose of zonegroup isn't clear to me. It seems that when a user is created, its metadata is synced to all zones within the same realm, regardless whether they are in different zonegroups or not. The same happens to buckets. Therefore, what is the purpose of having zonegroups? Wouldn't it be easier to just have realm and zones?
Thanks,Yixin
Hello!
Releasing Reef
---------------------
* RC2 is out but we still have several PRs to go, including blockers.
* RC3 might be worth doing but we Reef shall go before end of the month.
Misc
-------
* For the sake of unittesting of dencoders interoperatbility we're going
to impose some extra work (like registering types within ceph-dencoder)
on developers writing encodable structs. This will be discussed further
in a CDM.
* A lab issue got fixed.
Regards
Radek
Dear Ceph community,
I'm facing an issue with ACL changes in the secondary zone of my Ceph
cluster after making modifications to the API name in the master zone of my
master zonegroup. I would appreciate any insights or suggestions on how to
resolve this problem.
Here's the background information on my setup:
- I have two clusters that are part of a single realm.
- Each cluster has a zone within a single zonegroup.
- Initially, all functionality was working perfectly fine.
However, the problem arose when I changed the API name in the master zone
of my master zonegroup. Since then, all functionalities appear to be
functioning as expected, except for ACL changes, which have become
extremely slow specifically in the secondary zone. Whenever I attempt to
change the ACL in the secondary zone, it takes approximately 10 seconds or
more for a response to be received.
I would like to understand why this delay is occurring and find a solution
to improve the performance of ACL changes in the secondary zone. Any
suggestions, explanations, or guidance would be greatly appreciated.
Thank you in advance for your help.
We had a fauly disk which was causing many errors, and replacement took a
while so we had to try to stop ceph from using the OSD in during this time.
However I think we must have done that wrong and after the disk replacement
our ceph orch seems to have picked up /dev/sdp and added the a new osd and
automatically (588), without a separate DB device (since that was still
taken by the old OSD 31 maybe? I'm not sure how to ).
This led to issues where osd31 of course wouldn't start, and some actions
were attempted to clear this out, which might have just caused more harm.
Long story short, we are currently in a odd position where we have still
have ceph-volume lvm list osd.31 with only a [db] section:
====== osd.31 ======
[db]
/dev/ceph-1b309b1e-a4a6-4861-b16c-7c06ecde1a3d/osd-db-fb09a714-f955-4418-99f2-6bccd8c6220e
block device
/dev/ceph-48f7dbd8-4a7c-4f7e-8962-104e756ae864/osd-block-33538b36-52b3-421d-bf66-6c729a057707
block uuid bykFYi-z8T6-OWXp-i1OB-H7CE-uLDm-Td6QTI
cephx lockbox secret
cluster fsid 5406fed0-d52b-11ec-beff-7ed30a54847b
cluster name ceph
crush device class None
db device
/dev/ceph-1b309b1e-a4a6-4861-b16c-7c06ecde1a3d/osd-db-fb09a714-f955-4418-99f2-6bccd8c6220e
db uuid Vy3aOA-qseQ-RIDT-741e-z7o0-y376-kKTXRE
encrypted 0
osd fsid 33538b36-52b3-421d-bf66-6c729a057707
osd id 31
osdspec affinity osd_spec
type db
vdo 0
devices /dev/nvme0n1
and a seperate extra osd.588 (which is running) which has taken only the
[block] device
===== osd.588 ======
[block]
/dev/ceph-f63ef837-3b18-47a4-be55-d5c2c0db8927/osd-block-58b33b8f-9623-46b3-a86a-3061602a76b5
block device
/dev/ceph-f63ef837-3b18-47a4-be55-d5c2c0db8927/osd-block-58b33b8f-9623-46b3-a86a-3061602a76b5
block uuid KYHzBq-zgJJ-Nw93-j7Jx-Oz5i-BMuU-ndtTCH
cephx lockbox secret
cluster fsid 5406fed0-d52b-11ec-beff-7ed30a54847b
cluster name ceph
crush device class
encrypted 0
osd fsid 58b33b8f-9623-46b3-a86a-3061602a76b5
osd id 588
osdspec affinity all-available-devices
type block
vdo 0
devices /dev/sdp
I figured the best action was to clear out both of these faulty OSDs via
orch "ceph orch osd rm XX" but osd 31 isn't recognized
[ceph: root@mimer-osd01 /]# ceph orch osd rm 31
Unable to find OSDs: ['31']
Deleting 588 is recognized. Should I attempt to clear out the osd.31 from
ceph-volume manually?
I'd really like to get back to a situation where I have osd.31 with the osd
fsid that matches the device names, with /dev/sdp and /dev/nmve0n1 but I'm
really afraid of just breaking things even more.
From what i can see from files laying around, the OSD spec we have is
simply:
placement:
host_pattern: "mimer-osd01"
service_id: osd_spec
service_type: osd
spec:
data_devices:
rotational: 1
db_devices:
rotational: 0
in case this matters. I appreciate any help or guidance.
Best regards, Mikael
Hi,
I will be deploying a Proxmox HCI cluster with 3 nodes. Each node has 3
nvme disks of 3.8Tb each and a 4th nvme disk of 7.6Tb. Technically I need
one pool.
Is it good practice to use all disks to create the one pool I need, or is
it better to create two pools, one on each group of disks?
If the former is good (use all disks and create one pool), should I take
into account the difference in disk size?
Regards.
Hi.
I have a Ceph (NVME) based cluster with 12 hosts and 40 OSD's .. currently it is backfilling pg's but I cannot get it to run more than 20 backfilling (pgs) at the same time (6+2 profile)
osd_max_backfills = 100 and osd_recovery_max_active_ssd = 50 (non-sane) but it still stops at 20 with 40+ in backfill_wait
Any idea about how to speed it up?
Thanks.
Hello guys,
We have a Ceph cluster that runs just fine with Ceph Octopus; we use RBD
for some workloads, RadosGW (via S3) for others, and iSCSI for some Windows
clients.
Recently, we had the need to add some VMWare clusters as clients for the
iSCSI GW and also Windows systems with the use of Clustered Storage Volumes
(CSV), and we are facing a weird situation. In windows for instance, the
iSCSI block can be mounted, formatted and consumed by all nodes, but when
we add in the CSV it fails with some generic exception. The same happens in
VMWare, when we try to use it with VMFS it fails.
We do not seem to find the root cause for these errors. However, the errors
seem to be linked to the situation of multiple nodes consuming the same
block by shared file systems. Have you guys seen this before?
Are we missing some basic configuration in the iSCSI GW?
Awesome, thanks for the info!
By any chance, do you happen to know what configurations you needed to
adjust to make Veeam perform a bit better?
On Fri, Jun 23, 2023 at 10:42 AM Anthony D'Atri <aad(a)dreamsnake.net> wrote:
> Yes, with someone I did some consulting for. Veeam seems to be one of the
> prevalent uses for ceph-iscsi, though I'd try to use the native RBD client
> instead if possible.
>
> Veeam appears by default to store really tiny blocks, so there's a lot of
> protocol overhead. I understand that Veeam can be configured to use "large
> blocks" that can make a distinct difference.
>
>
>
> On Jun 23, 2023, at 09:33, Work Ceph <work.ceph.user.mailing(a)gmail.com>
> wrote:
>
> Great question!
>
> Yes, one of the slowness was detected in a Veeam setup. Have you
> experienced that before?
>
> On Fri, Jun 23, 2023 at 10:32 AM Anthony D'Atri <aad(a)dreamsnake.net>
> wrote:
>
>> Are you using Veeam by chance?
>>
>> > On Jun 22, 2023, at 21:18, Work Ceph <work.ceph.user.mailing(a)gmail.com>
>> wrote:
>> >
>> > Hello guys,
>> >
>> > We have a Ceph cluster that runs just fine with Ceph Octopus; we use RBD
>> > for some workloads, RadosGW (via S3) for others, and iSCSI for some
>> Windows
>> > clients.
>> >
>> > We started noticing some unexpected performance issues with iSCSI. I
>> mean,
>> > an SSD pool is reaching 100MB of write speed for an image, when it can
>> > reach up to 600MB+ of write speed for the same image when mounted and
>> > consumed directly via RBD.
>> >
>> > Is that performance degradation expected? We would expect some
>> degradation,
>> > but not as much as this one.
>> >
>> > Also, we have a question regarding the use of Intel Turbo boost. Should
>> we
>> > disable it? Is it possible that the root cause of the slowness in the
>> iSCSI
>> > GW is caused by the use of Intel Turbo boost feature, which reduces the
>> > clock of some cores?
>> >
>> > Any feedback is much appreciated.
>> > _______________________________________________
>> > ceph-users mailing list -- ceph-users(a)ceph.io
>> > To unsubscribe send an email to ceph-users-leave(a)ceph.io
>>
>>
>