Hi,
I have a request about docs.ceph.com. Could you provide per-minor-version
views on docs.ceph.com? Currently, we can select the Ceph version by using
`https://docs.ceph.com/en/<version>`. In this case, we can use a major
version's code name (e.g., "quincy") or "latest". However, we can't use
minor version numbers like "v17.2.6". It would be convenient for me (and I
guess for many other users, too) to be able to select the documentation for
the version we actually use.
In my recent case, I read the quincy mclock documentation because I use
v17.2.6. However, that documentation has changed a lot between v17.2.6 and
the latest quincy version because of the recent mclock rework.
Thanks,
Satoru
Hi everyone,
This is the second release candidate for Reef.
The Reef release comes with a new RocksDB version (7.9.2) [0], which
incorporates several performance improvements and features. Our
internal testing doesn't show any side effects from the new version,
but we are very eager to hear community feedback on it. This is the
first release to have the ability to tune RocksDB settings per column
family [1], which allows for more granular tunings to be applied to
different kinds of data stored in RocksDB. A new set of settings has
been applied in Reef to optimize performance for most kinds of workloads,
with a slight penalty in some cases that is outweighed by large
improvements, in terms of compactions and write amplification, in use
cases such as RGW. We would highly encourage community members to give
these a try against their performance benchmarks and use cases. The
detailed list of changes to RocksDB and BlueStore can be found
in https://pad.ceph.com/p/reef-rc-relnotes.
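If you want to inspect the new per-column-family defaults on a running
cluster, something along these lines should work (the option name is the
BlueStore setting for per-column-family tuning; osd.0 is just an example):

# Show the effective per-column-family RocksDB settings on one OSD
ceph config show osd.0 bluestore_rocksdb_cfs
# Or query the value configured for all OSDs
ceph config get osd bluestore_rocksdb_cfs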
If any of our community members would like to help us with performance
investigations or regression testing of the Reef release candidate,
please feel free to provide feedback via email or in
https://pad.ceph.com/p/reef_scale_testing. For more active
discussions, please use the #ceph-at-scale slack channel in
ceph-storage.slack.com.
This RC has gone through only partial testing due to issues we are
experiencing in the sepia lab.
Please try it out and report any issues you encounter. Happy testing!
Thanks,
YuriW
I'm in the process of adding the radosgw service to our OpenStack cloud
and hoping to re-use keystone for discovery and auth. Things seem to
work fine with many keystone tenants, but as soon as we try to do
something in a project with a '-' in its name everything fails.
Here's an example, using the openstack swift cli:
root@cloudcontrol2001-dev:~# OS_PROJECT_ID="testlabs" openstack container create 'makethiscontainer'
+---------------+-------------------+----------------------------------------------------+
| account       | container         | x-trans-id                                         |
+---------------+-------------------+----------------------------------------------------+
| AUTH_testlabs | makethiscontainer | tx0000008c311dbda86c695-0064ac5fad-6927acd-default |
+---------------+-------------------+----------------------------------------------------+
root@cloudcontrol2001-dev:~# OS_PROJECT_ID="service" openstack container create 'makethiscontainer'
+--------------+-------------------+----------------------------------------------------+
| account      | container         | x-trans-id                                         |
+--------------+-------------------+----------------------------------------------------+
| AUTH_service | makethiscontainer | tx00000b341a22866f65e44-0064ac5fb7-6927acd-default |
+--------------+-------------------+----------------------------------------------------+
root@cloudcontrol2001-dev:~# OS_PROJECT_ID="admin-monitoring" openstack container create 'makethiscontainer'
Bad Request (HTTP 400) (Request-ID: tx00000f7326bb541b4d2a9-0064ac5fc2-6927acd-default)
Before I dive into the source code, is this a known issue and/or
something I can configure? Dash-named projects work fine in keystone and
seem to also work fine with standalone radosgw; I assume the issue is
somewhere in the communication between the two. I suspected the implicit
user creation code, but that seems to be working properly:
# radosgw-admin user list
[
"cloudvirt-canary$cloudvirt-canary",
"testlabs$testlabs",
"paws-dev$paws-dev",
"andrewtestproject$andrewtestproject",
"admin-monitoring$admin-monitoring",
"taavi-test-project$taavi-test-project",
"admin$admin",
"taavitestproject$taavitestproject",
"bastioninfra-codfw1dev$bastioninfra-codfw1dev",
]
Here is the radosgw section of my ceph.conf:
[client.radosgw]
host = 10.192.20.9
keyring = /etc/ceph/ceph.client.radosgw.keyring
rgw frontends = "civetweb port=18080"
rgw_keystone_verify_ssl = false
rgw_keystone_api_version = 3
rgw_keystone_url = https://openstack.codfw1dev.wikimediacloud.org:25000
rgw_keystone_accepted_roles = 'reader, admin, member'
rgw_keystone_implicit_tenants = true
rgw_keystone_admin_domain = default
rgw_keystone_admin_project = service
rgw_keystone_admin_user = swift
rgw_keystone_admin_password = (redacted)
rgw_s3_auth_use_keystone = true
rgw_swift_account_in_url = true
rgw_user_default_quota_max_objects = 4096
rgw_user_default_quota_max_size = 8589934592
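For reference, since rgw_swift_account_in_url is enabled, the project ends
up in the request path, so I assume the failing request looks roughly like
this (path prefix assumed to be the default, token obtained from keystone):

# Hypothetical reproduction against the rgw endpoint configured above; the
# dash-containing project appears literally in the account segment
curl -i -X PUT -H "X-Auth-Token: $OS_TOKEN" \
  "http://10.192.20.9:18080/swift/v1/AUTH_admin-monitoring/makethiscontainer"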
And here's a debug log of a failed transaction:
https://phabricator.wikimedia.org/P49539
Thanks in advance!
I'm not sure if it's a bug in Cephadm, but it looks like it. I've got Loki deployed on one machine and Promtail deployed on all machines. After creating a login, I can only view the logs of the host that Loki is running on.
When inspecting the Promtail configuration, the configured URL for Loki is set to http://host.containers.internal:3100. Shouldn't this be configured by Cephadm to point to the Loki host?
This looks a lot like the issues with incorrectly set Grafana or Prometheus URLs; bug 57018 was created for that. Should I create another bug report?
And does anyone know a workaround to set the correct URL for the time being?
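The only workaround I can think of is overriding the generated Promtail configuration with a custom one, assuming the config-key override mechanism cephadm uses for the other monitoring services also applies to promtail (the key path below is an assumption on my part):

# Assumed workaround: provide a custom promtail.yml that points at the Loki
# host, then redeploy promtail (config-key path is a guess based on the other
# monitoring services)
ceph config-key set mgr/cephadm/services/promtail/promtail.yml -i promtail.yml
ceph orch redeploy promtail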
Best regards,
Sake
Hello
I'm trying to add MONs in advance of a planned downtime.
This has actually ended up removing an existing MON, which isn't helpful.
The error I'm seeing is:
Invalid argument: /var/lib/ceph/mon/ceph-<hostname>/store.db: does not
exist (create_if_missing is false)
error opening mon data directory at '/var/lib/ceph/mon/ceph-<hostname>':
(22) Invalid argument
It appears that the fsid is being stripped from the mon data path, because
the directory was there; it has now been moved to /var/lib/ceph/<fsid>/removed.
This appears to be similar to:
https://tracker.ceph.com/issues/45167
which was closed for lack of a reproducer.
The command I ran was:
sudo ceph orch apply mon --placement="comma-separated hostname list"
after running that with "--dry-run".
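For concreteness, the sequence looked roughly like this (hostnames are
placeholders):

sudo ceph orch apply mon --placement="mon1,mon2,mon3,mon4,mon5" --dry-run
sudo ceph orch apply mon --placement="mon1,mon2,mon3,mon4,mon5"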
I would be grateful for some advice here - I wasn't expecting this to reduce the
MON count.
Best Wishes,
Adam
Hi *,
last week I successfully upgraded a customer cluster from Nautilus to
Pacific with no real issues; their main use case is RGW. A couple of hours
after most of the OSDs had been upgraded (the RGWs had not been yet), their
application software reported an error: it couldn't write to a bucket.
This error occurred again two days ago; in the RGW logs I found messages
showing that resharding was happening at that time. I'm aware that this is
nothing unusual, but I can't find anything helpful on how to prevent it,
except for disabling dynamic resharding and then resharding manually during
maintenance windows (sketched below). We don't know yet whether any data is
actually missing after bucket access recovered; that still needs to be
investigated. Since Nautilus already had dynamic resharding enabled, I
wonder if they were just lucky until now, for example because resharding
happened while no data was being written to the buckets, or whether
resharding simply didn't happen until now; I have no access to the cluster,
so I don't have any bucket stats available right now. I found this thread
[1] about an approach to prevent blocked IO, but it's from 2019 and I don't
know how far that got.
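For reference, this is the kind of manual procedure I have in mind (bucket
name and shard count are just placeholders):

# Disable dynamic resharding for the RGWs
ceph config set client.rgw rgw_dynamic_resharding false
# During a maintenance window, check the current shard and object counts
radosgw-admin bucket stats --bucket=mybucket
# Then reshard manually to a suitable number of shards
radosgw-admin bucket reshard --bucket=mybucket --num-shards=101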
There are many users/operators on this list who use RGW more than I do -
how do you deal with this? Are your clients better prepared for these
events? Any comments are appreciated!
Thanks,
Eugen
[1]
https://lists.ceph.io/hyperkitty/list/dev@ceph.io/thread/NG56XXAM5A4JONT4BG…
Hello,
I have a cluster with this configuration:
osd pool default size = 3
osd pool default min size = 1
I have 5 monitor nodes and 7 OSD nodes.
I have changed the CRUSH map to divide the cluster into two
datacenters: the first one will hold two copies of the data and the
second one will hold a single copy, for emergency use only.
For now, the whole cluster is still in one location.
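For illustration, this is the kind of rule I have in mind (the datacenter
bucket names "dc1" and "dc2" are placeholders):

rule replicated_2plus1 {
    id 1
    type replicated
    # two copies in the primary datacenter
    step take dc1
    step chooseleaf firstn 2 type host
    step emit
    # one copy in the emergency datacenter
    step take dc2
    step chooseleaf firstn 1 type host
    step emit
}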
This cluster has 1 PiB of raw capacity, so it would be very expensive
to add a further 300 TB of capacity to get 2+2 data redundancy.
Will this work?
If I turn off the 1/3 location, will the cluster still be operational? I
believe it will, and that this is the better choice. And what if the 2/3
location dies? This cluster hosts a CephFS pool, and that is the main
use of the cluster.
Many thanks for your advice.
Sincerely
Jan Marek
--
Ing. Jan Marek
University of South Bohemia
Academic Computer Centre
Phone: +420389032080
http://www.gnu.org/philosophy/no-word-attachments.cs.html
Hi,
In our cluster, the monitors' logs grow to a couple of GBs within days.
There are quite a lot of debug messages from rocksdb, osd, mgr and mds.
These should not be necessary on a well-running cluster. How can I turn
this logging off?
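I assume the relevant knobs are the per-subsystem debug levels, something
like the following (subsystem names taken from the messages I see, levels
just an example), but I'm not sure this is the right way:

# Lower the log level / in-memory level for the noisy subsystems on the mons
ceph config set mon debug_rocksdb 1/5
ceph config set mon debug_osd 0/5
ceph config set mon debug_mgr 0/5
ceph config set mon debug_mds 0/5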
Thanks,
Ben