Hi everyone,
This is the second release candidate for Reef.
The Reef release comes with a new RocksDB version (7.9.2) [0], which
incorporates several performance improvements and features. Our
internal testing doesn't show any side effects from the new version,
but we are very eager to hear community feedback on it. This is the
first release with the ability to tune RocksDB settings per column
family [1], which allows more granular tuning to be applied to the
different kinds of data stored in RocksDB. A new set of settings is
used in Reef to optimize performance for most kinds of workloads; it
carries a slight penalty in some cases, which is outweighed by large
improvements in compactions and write amplification for use cases such
as RGW. We highly encourage community members to try these settings
against their own performance benchmarks and use cases. The detailed
list of RocksDB and BlueStore changes can be found in
https://pad.ceph.com/p/reef-rc-relnotes.
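For anyone who wants to compare the new defaults with their current
settings, a minimal way to inspect (and, if needed, override) them is
sketched below. This is only an illustration: bluestore_rocksdb_cf,
bluestore_rocksdb_cfs and bluestore_rocksdb_options are the column-family
related settings as we understand them, the example value is made up, and
the authoritative defaults are in the pad above.

  # show the column-family sharding and RocksDB options currently in effect
  ceph config get osd bluestore_rocksdb_cf
  ceph config get osd bluestore_rocksdb_cfs
  ceph config get osd bluestore_rocksdb_options

  # example only: override the global RocksDB options for all OSDs
  ceph config set osd bluestore_rocksdb_options "compression=kNoCompression,max_write_buffer_number=64"

Note that, as far as we know, the column-family layout itself is applied
when an OSD is provisioned; existing OSDs can be resharded offline with
ceph-bluestore-tool.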
If any of our community members would like to help us with performance
investigations or regression testing of the Reef release candidate,
please feel free to provide feedback via email or in
https://pad.ceph.com/p/reef_scale_testing. For more active
discussions, please use the #ceph-at-scale Slack channel on
ceph-storage.slack.com.
This RC has gone through only partial testing due to issues we are
experiencing in the Sepia lab.
Please try it out and report any issues you encounter. Happy testing!
Thanks,
YuriW
Dear Ceph community,
We want to restructure (i.e. move around) a lot of data (hundreds of
terabytes) in our CephFS.
And now I was wondering what happens within snapshots when I move data
around within a snapshotted folder.
I.e., do I need to account for a lot of increased storage usage due to
older snapshots differing from the new, restructured state?
In the end, these are just metadata changes. Are the snapshots aware of this?
Consider the following examples.
Copying data:
Let's say I have a folder /test, with a file XYZ in sub-folder
/test/sub1 and an empty sub-folder /test/sub2.
I create snapshot snapA in /test/.snap, copy XYZ to sub-folder
/test/sub2, delete it from /test/sub1 and create another snapshot snapB.
I would then have two snapshots, each with a distinct copy of XYZ, hence
using double the space in the FS:
/test/.snap/snapA/sub1/XYZ <-- copy 1
/test/.snap/snapA/sub2/
/test/.snap/snapB/sub1/
/test/.snap/snapB/sub2/XYZ <-- copy 2
Moving data:
Let's assume the same structure.
But now, after creating snapshot snapA, I move XYZ to sub-folder
/test/sub2 and then create the second snapshot snapB.
The directory tree will look the same. But how is this treated internally?
Once I move the data, will an actual copy be created in snapA to
represent the old state?
Or will it still reference the same data (like a link to the inode), and
hence not double the storage used for that file?
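For what it's worth, here is roughly how I intend to test this on a small
scale before touching the real data. Just a sketch, assuming the usual
mkdir-in-.snap way of creating snapshots and that the recursive statistics
and pool usage reflect what I'm after:

  mkdir -p /test/sub1 /test/sub2
  dd if=/dev/urandom of=/test/sub1/XYZ bs=1M count=1024

  mkdir /test/.snap/snapA          # snapshot the old state
  mv /test/sub1/XYZ /test/sub2/XYZ
  mkdir /test/.snap/snapB          # snapshot the new state

  # compare recursive usage and pool usage before and after
  getfattr -n ceph.dir.rbytes /test
  ceph df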
I couldn't find (or understand) anything related to this in the docs.
The closest seems to be the hard-link section here:
https://docs.ceph.com/en/quincy/dev/cephfs-snapshots/#hard-links
which unfortunately goes a bit over my head, so I'm not sure whether it
answers my question.
Thank you all for your help. Appreciate it.
Best Wishes,
Mathias Kuhring
Hello Ceph-Users,
The context or motivation of my question is S3 bucket policies and other
cases that use the client's source IP address as a condition.
I was wondering if and how RadosGW is able to determine the source IP
address of clients when it receives their connections via a load balancer /
reverse proxy such as HAProxy.
In that case the connection naturally originates from the proxy, rendering
a policy based on client IP addresses useless.
Depending on whether the connection is balanced as HTTP or TCP, there are
two ways to carry information about the actual source:
* In case of HTTP, via headers like "X-Forwarded-For". This is apparently
supported only for logging the source in the "rgw ops log" ([1])? Or is
this info also used when evaluating the source IP condition within a
bucket policy? (See the config sketch after this list.)
* In case of TCP load balancing, there is the PROXY protocol v2. This
unfortunately does not even seem to be supported by the Beast library
which RGW uses.
I opened feature requests ...
** https://tracker.ceph.com/issues/59422
** https://github.com/chriskohlhoff/asio/issues/1091
** https://github.com/boostorg/beast/issues/2484
but there is no outcome yet.
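For completeness, this is the kind of setup I have in mind for the HTTP
case. A sketch only: rgw_remote_addr_param is the option from [1], the
HAProxy backend name and address are made up, and whether bucket policies
actually evaluate this value is exactly my question.

  # HAProxy (HTTP mode): add the client address as X-Forwarded-For
  backend rgw
      option forwardfor
      server rgw1 192.0.2.10:8080 check

  # tell RGW to take the client address from that header
  ceph config set client.rgw rgw_remote_addr_param http_x_forwarded_for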
Regards
Christian
[1]
https://docs.ceph.com/en/quincy/radosgw/config-ref/#confval-rgw_remote_addr…
Hi folks,
In a multisite environment, we can have one realm that contains multiple
zonegroups, each of which can in turn have multiple zones. However, the
purpose of a zonegroup isn't clear to me. It seems that when a user is
created, its metadata is synced to all zones within the same realm,
regardless of whether they are in different zonegroups or not. The same
happens to buckets. So what is the purpose of having zonegroups? Wouldn't
it be easier to just have realms and zones?
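To make the question a bit more concrete, this is roughly what I have been
comparing. A sketch only, with made-up zonegroup and zone names:

  radosgw-admin zonegroup list
  radosgw-admin zonegroup get --rgw-zonegroup=zg1
  radosgw-admin zone get --rgw-zone=zone-a

  # data sync relationships seem to be configured per zone within a
  # zonegroup (sync_from / sync_from_all), while user metadata shows up
  # everywhere in the realm
  radosgw-admin zone modify --rgw-zone=zone-a --sync-from-all=false --sync-from=zone-b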
Thanks,
Yixin
Hello guys,
We have a Ceph cluster that runs just fine with Ceph Octopus; we use RBD
for some workloads, RadosGW (via S3) for others, and iSCSI for some Windows
clients.
Recently, we had the need to add some VMware clusters as clients of the
iSCSI GW, as well as Windows systems that use Cluster Shared Volumes
(CSV), and we are facing a weird situation. In Windows, for instance, the
iSCSI block device can be mounted, formatted and consumed by all nodes,
but when we add it to the CSV it fails with a generic exception. The same
happens in VMware: when we try to use it with VMFS, it fails.
We have not been able to find the root cause of these errors. However,
they seem to be linked to multiple nodes consuming the same block device
via shared file systems. Have you seen this before?
Are we missing some basic configuration in the iSCSI GW?
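If it helps, we can share output from checks like the following on the
gateway nodes. Just a sketch of what we are looking at:

  # inspect the gateway configuration, exported disks and attached initiators
  gwcli ls

  # watch the ceph-iscsi services while CSV validation or the VMFS format
  # operation fails on the client side
  journalctl -u tcmu-runner -u rbd-target-api -f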
Awesome, thanks for the info!
By any chance, do you happen to know what configurations you needed to
adjust to make Veeam perform a bit better?
On Fri, Jun 23, 2023 at 10:42 AM Anthony D'Atri <aad(a)dreamsnake.net> wrote:
> Yes, with someone I did some consulting for. Veeam seems to be one of the
> prevalent uses for ceph-iscsi, though I'd try to use the native RBD client
> instead if possible.
>
> Veeam appears by default to store really tiny blocks, so there's a lot of
> protocol overhead. I understand that Veeam can be configured to use "large
> blocks" that can make a distinct difference.
>
>
>
> On Jun 23, 2023, at 09:33, Work Ceph <work.ceph.user.mailing(a)gmail.com>
> wrote:
>
> Great question!
>
> Yes, one of the slow cases was detected in a Veeam setup. Have you
> experienced that before?
>
> On Fri, Jun 23, 2023 at 10:32 AM Anthony D'Atri <aad(a)dreamsnake.net>
> wrote:
>
>> Are you using Veeam by chance?
>>
>> > On Jun 22, 2023, at 21:18, Work Ceph <work.ceph.user.mailing(a)gmail.com>
>> wrote:
>> >
>> > Hello guys,
>> >
>> > We have a Ceph cluster that runs just fine with Ceph Octopus; we use RBD
>> > for some workloads, RadosGW (via S3) for others, and iSCSI for some
>> Windows
>> > clients.
>> >
>> > We started noticing some unexpected performance issues with iSCSI. I
>> > mean, an SSD pool is reaching 100 MB/s of write speed for an image,
>> > when it can reach 600+ MB/s of write speed for the same image when
>> > mounted and consumed directly via RBD.
>> >
>> > Is that performance degradation expected? We would expect some
>> > degradation, but not as much as this.
>> >
>> > Also, we have a question regarding the use of Intel Turbo Boost. Should
>> > we disable it? Is it possible that the root cause of the slowness in the
>> > iSCSI GW is the use of the Intel Turbo Boost feature, which reduces the
>> > clock of some cores?
>> >
>> > Any feedback is much appreciated.
>> > _______________________________________________
>> > ceph-users mailing list -- ceph-users(a)ceph.io
>> > To unsubscribe send an email to ceph-users-leave(a)ceph.io
>>
>>
>
Hi,
yesterday I added a new zonegroup, and it seems to cycle over the same
requests over and over again.
In the log of the main zone I see these requests:
2023-06-20T09:48:37.979+0000 7f8941fb3700 1 beast: 0x7f8a602f3700:
fd00:2380:0:24::136 - - [2023-06-20T09:48:37.979941+0000] "GET
/admin/log?type=metadata&id=62&period=e8fc96f1-ae86-4dc1-b432-470b0772fded&max-entries=100&&rgwx-zonegroup=b39392eb-75f8-47f0-b4f3-7d3882930b26
HTTP/1.1" 200 44 - - -
The only thing that changes is the &id parameter.
We have two other zonegroups that are configured identically (ceph.conf
and period), and these don't seem to spam the main RGW.
root@host:~# radosgw-admin sync status
          realm 5d6f2ea4-b84a-459b-bce2-bccac338b3ef (main)
      zonegroup b39392eb-75f8-47f0-b4f3-7d3882930b26 (dc3)
           zone 96f5eca9-425b-4194-a152-86e310e91ddb (dc3)
  metadata sync syncing
                full sync: 0/64 shards
                incremental sync: 64/64 shards
                metadata is caught up with master
root@host:~# radosgw-admin period get
{
    "id": "e8fc96f1-ae86-4dc1-b432-470b0772fded",
    "epoch": 92,
    "predecessor_uuid": "5349ac85-3d6d-4088-993f-7a1d4be3835a",
    "sync_status": [
        "",
        ...
        ""
    ],
    "period_map": {
        "id": "e8fc96f1-ae86-4dc1-b432-470b0772fded",
        "zonegroups": [
            {
                "id": "b39392eb-75f8-47f0-b4f3-7d3882930b26",
                "name": "dc3",
                "api_name": "dc3",
                "is_master": "false",
                "endpoints": [],
                "hostnames": [],
                "hostnames_s3website": [],
                "master_zone": "96f5eca9-425b-4194-a152-86e310e91ddb",
                "zones": [
                    {
                        "id": "96f5eca9-425b-4194-a152-86e310e91ddb",
                        "name": "dc3",
                        "endpoints": [],
                        "log_meta": "false",
                        "log_data": "false",
                        "bucket_index_max_shards": 11,
                        "read_only": "false",
                        "tier_type": "",
                        "sync_from_all": "true",
                        "sync_from": [],
                        "redirect_zone": ""
                    }
                ],
                "placement_targets": [
                    {
                        "name": "default-placement",
                        "tags": [],
                        "storage_classes": [
                            "STANDARD"
                        ]
                    }
                ],
                "default_placement": "default-placement",
                "realm_id": "5d6f2ea4-b84a-459b-bce2-bccac338b3ef",
                "sync_policy": {
                    "groups": []
                }
            },
...
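If useful, I can also provide output from the following. A sketch, run
from the dc3 side, to see whether anything is actually pending for
metadata sync:

  radosgw-admin metadata sync status
  radosgw-admin mdlog status --rgw-zonegroup=dc3
  radosgw-admin sync error list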
--
This time, the "UTF-8 problems" self-help group will exceptionally meet
in the large hall.
Hi, when starting an upgrade from 15.2.17 I got this error:
Module 'cephadm' has failed: Expecting value: line 1 column 1 (char 0)
The cluster was in HEALTH_OK before starting.
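In case it points someone in the right direction, these are the first
things I plan to check. A sketch only; the message looks like a JSON
parse failure somewhere inside the mgr module:

  ceph health detail
  ceph mgr module ls
  ceph config-key dump | grep -i cephadm | head

  # fail over to a standby mgr to see whether the module recovers
  ceph mgr fail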
Hello,
we have a Ceph 17.2.5 cluster with a total of 26 nodes, 15 of which have
faulty NVMe drives where the db/wal resides (one NVMe for the first 6 OSDs
and another for the remaining 6).
We replaced them with new drives and used pvmove to migrate the data, so
as to avoid losing the OSDs.
So far, there are no issues, and the OSDs are functioning properly.
Ceph sees the correct new disks:
root@node02:/# ceph daemon osd.26 list_devices
[
    {
        "device": "/dev/nvme0n1",
        "device_id": "INTEL_SSDPEDME016T4S_CVMD516500851P6KGN"
    },
    {
        "device": "/dev/sdc",
        "device_id": "SEAGATE_ST18000NM004J_ZR52TT830000C148JFSJ"
    }
]
However, the Cephadm GUI still shows the old NVMe drives and hasn't
recognized the device change.
How can we make the GUI and Cephadm recognize the new devices?
I tried restarting the managers, thinking that they would rescan the OSDs
during startup, but it didn't work.
If you have any ideas, I would appreciate them.
Should I perform something like this: ceph orch daemon reconfig osd.*
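Or would something like the following be the right way to force a rescan?
A sketch only, I have not tried the refresh flag yet:

  # ask cephadm to re-run the device inventory on all hosts
  ceph orch device ls --refresh

  # and/or fail over to a standby mgr so the dashboard re-reads the inventory
  ceph mgr fail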
Thank you for your help.