Hi,
I have a request about docs.ceph.com. Could you provide per-minor-version
views on docs.ceph.com? Currently, we can select the Ceph version by using
`https://docs.ceph.com/en/<version>`. In this case, we can use a major
version's code name (e.g., "quincy") or "latest". However, we can't use
minor version numbers like "v17.2.6". It would be convenient for me (and I
guess for many other users, too) to be able to select the documentation for
the version we actually use.
In my recent case, I read the quincy mclock documentation because I use
v17.2.6. However, that documentation has changed a lot between v17.2.6 and
the latest quincy version because of the recent mclock rework.
Thanks,
Satoru
Hi everyone,
This is the second release candidate for Reef.
The Reef release comes with a new RocksDB version (7.9.2) [0], which
incorporates several performance improvements and features. Our
internal testing doesn't show any side effects from the new version,
but we are very eager to hear community feedback on it. This is the
first release to have the ability to tune RocksDB settings per column
family [1], which allows for more granular tunings to be applied to
different kinds of data stored in RocksDB. A new set of settings has
been applied in Reef to optimize performance for most kinds of workloads,
with a slight penalty in some cases that is outweighed by large
improvements, in terms of compactions and write amplification, in use
cases such as RGW. We would highly encourage community members to give
these a try against their performance benchmarks and use cases. The
detailed list of changes to RocksDB and BlueStore can be found
in https://pad.ceph.com/p/reef-rc-relnotes.
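If you want to inspect the new per-column-family defaults on a running
cluster, something along these lines should work (the option name is the
BlueStore setting for per-column-family tuning; osd.0 is just an example):

# Show the effective per-column-family RocksDB settings on one OSD
ceph config show osd.0 bluestore_rocksdb_cfs
# Or query the value configured for all OSDs
ceph config get osd bluestore_rocksdb_cfs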
If any of our community members would like to help us with performance
investigations or regression testing of the Reef release candidate,
please feel free to provide feedback via email or in
https://pad.ceph.com/p/reef_scale_testing. For more active
discussions, please use the #ceph-at-scale slack channel in
ceph-storage.slack.com.
This RC has gone through only partial testing due to issues we are
experiencing in the sepia lab.
Please try it out and report any issues you encounter. Happy testing!
Thanks,
YuriW
I'm in the process of adding the radosgw service to our OpenStack cloud
and hoping to re-use keystone for discovery and auth. Things seem to
work fine with many keystone tenants, but as soon as we try to do
something in a project with a '-' in its name everything fails.
Here's an example, using the openstack swift cli:
root@cloudcontrol2001-dev:~# OS_PROJECT_ID="testlabs" openstack container create 'makethiscontainer'
+---------------+-------------------+----------------------------------------------------+
| account       | container         | x-trans-id                                         |
+---------------+-------------------+----------------------------------------------------+
| AUTH_testlabs | makethiscontainer | tx0000008c311dbda86c695-0064ac5fad-6927acd-default |
+---------------+-------------------+----------------------------------------------------+
root@cloudcontrol2001-dev:~# OS_PROJECT_ID="service" openstack container create 'makethiscontainer'
+--------------+-------------------+----------------------------------------------------+
| account      | container         | x-trans-id                                         |
+--------------+-------------------+----------------------------------------------------+
| AUTH_service | makethiscontainer | tx00000b341a22866f65e44-0064ac5fb7-6927acd-default |
+--------------+-------------------+----------------------------------------------------+
root@cloudcontrol2001-dev:~# OS_PROJECT_ID="admin-monitoring" openstack container create 'makethiscontainer'
Bad Request (HTTP 400) (Request-ID: tx00000f7326bb541b4d2a9-0064ac5fc2-6927acd-default)
Before I dive into the source code, is this a known issue and/or
something I can configure? Dash-named projects work fine in keystone and
seem to also work fine with standalone radosgw; I assume the issue is
somewhere in the communication between the two. I suspected the implicit
user creation code, but that seems to be working properly:
# radosgw-admin user list
[
"cloudvirt-canary$cloudvirt-canary",
"testlabs$testlabs",
"paws-dev$paws-dev",
"andrewtestproject$andrewtestproject",
"admin-monitoring$admin-monitoring",
"taavi-test-project$taavi-test-project",
"admin$admin",
"taavitestproject$taavitestproject",
"bastioninfra-codfw1dev$bastioninfra-codfw1dev",
]
Here is the radosgw section of my ceph.conf:
[client.radosgw]
host = 10.192.20.9
keyring = /etc/ceph/ceph.client.radosgw.keyring
rgw frontends = "civetweb port=18080"
rgw_keystone_verify_ssl = false
rgw_keystone_api_version = 3
rgw_keystone_url = https://openstack.codfw1dev.wikimediacloud.org:25000
rgw_keystone_accepted_roles = 'reader, admin, member'
rgw_keystone_implicit_tenants = true
rgw_keystone_admin_domain = default
rgw_keystone_admin_project = service
rgw_keystone_admin_user = swift
rgw_keystone_admin_password = (redacted)
rgw_s3_auth_use_keystone = true
rgw_swift_account_in_url = true
rgw_user_default_quota_max_objects = 4096
rgw_user_default_quota_max_size = 8589934592
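For reference, since rgw_swift_account_in_url is enabled, the project ends
up in the request path, so I assume the failing request looks roughly like
this (path prefix assumed to be the default, token obtained from keystone):

# Hypothetical reproduction against the rgw endpoint configured above; the
# dash-containing project appears literally in the account segment
curl -i -X PUT -H "X-Auth-Token: $OS_TOKEN" \
  "http://10.192.20.9:18080/swift/v1/AUTH_admin-monitoring/makethiscontainer"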
And here's a debug log of a failed transaction:
https://phabricator.wikimedia.org/P49539
Thanks in advance!
I'm not sure if it's a bug in Cephadm, but it looks like it. I've got Loki deployed on one machine and Promtail deployed on all machines. After creating a login, I can only view the logs of the host that Loki is running on.
When inspecting the Promtail configuration, the configured URL for Loki is set to http://host.containers.internal:3100. Shouldn't this be configured by Cephadm to point to the Loki host?
This looks a lot like the issues with incorrectly set Grafana or Prometheus URLs; bug 57018 was created for that. Should I create another bug report?
And does anyone know a workaround to set the correct URL for the time being?
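The only workaround I can think of is overriding the generated Promtail configuration with a custom one, assuming the config-key override mechanism cephadm uses for the other monitoring services also applies to promtail (the key path below is an assumption on my part):

# Assumed workaround: provide a custom promtail.yml that points at the Loki
# host, then redeploy promtail (config-key path is a guess based on the other
# monitoring services)
ceph config-key set mgr/cephadm/services/promtail/promtail.yml -i promtail.yml
ceph orch redeploy promtail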
Best regards,
Sake
Hello
I'm trying to add MONs in advance of a planned downtime.
This has actually ended up removing an existing MON, which isn't helpful.
The error I'm seeing is:
Invalid argument: /var/lib/ceph/mon/ceph-<hostname>/store.db: does not
exist (create_if_missing is false)
error opening mon data directory at '/var/lib/ceph/mon/ceph-<hostname>':
(22) Invalid argument
It appears that the fsid is being stripped from the mon data path, because
the directory was there; it has now been moved to /var/lib/ceph/<fsid>/removed.
This appears to be similar to:
https://tracker.ceph.com/issues/45167
which was closed for lack of a reproducer.
The command I ran was:
sudo ceph orch apply mon --placement="comma-separated hostname list"
after running that with "--dry-run".
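For concreteness, the sequence looked roughly like this (hostnames are
placeholders):

sudo ceph orch apply mon --placement="mon1,mon2,mon3,mon4,mon5" --dry-run
sudo ceph orch apply mon --placement="mon1,mon2,mon3,mon4,mon5"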
I would be grateful for some advice here - I wasn't expecting this to reduce the
MON count.
Best Wishes,
Adam
Hi *,
last week I successfully upgraded a customer cluster from Nautilus to
Pacific with no real issues; their main use case is RGW. A couple of hours
after most of the OSDs had been upgraded (the RGWs had not been yet), their
application software reported an error: it couldn't write to a bucket.
This error occurred again two days ago; in the RGW logs I found messages
showing that resharding was happening at that time. I'm aware that this is
nothing unusual, but I can't find anything helpful on how to prevent it,
except for disabling dynamic resharding and then resharding manually during
maintenance windows (sketched below). We don't know yet whether any data is
actually missing after bucket access recovered; that still needs to be
investigated. Since Nautilus already had dynamic resharding enabled, I
wonder if they were just lucky until now, for example because resharding
happened while no data was being written to the buckets, or whether
resharding simply didn't happen until now; I have no access to the cluster,
so I don't have any bucket stats available right now. I found this thread
[1] about an approach to prevent blocked IO, but it's from 2019 and I don't
know how far that got.
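For reference, this is the kind of manual procedure I have in mind (bucket
name and shard count are just placeholders):

# Disable dynamic resharding for the RGWs
ceph config set client.rgw rgw_dynamic_resharding false
# During a maintenance window, check the current shard and object counts
radosgw-admin bucket stats --bucket=mybucket
# Then reshard manually to a suitable number of shards
radosgw-admin bucket reshard --bucket=mybucket --num-shards=101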
There are many users/operators on this list who use RGW more than I do -
how do you deal with this? Are your clients better prepared for these
events? Any comments are appreciated!
Thanks,
Eugen
[1]
https://lists.ceph.io/hyperkitty/list/dev@ceph.io/thread/NG56XXAM5A4JONT4BG…
Hello,
I have a cluster with this configuration:
osd pool default size = 3
osd pool default min size = 1
I have 5 monitor nodes and 7 OSD nodes.
I have changed the CRUSH map to divide the cluster into two
datacenters: the first one will hold two copies of the data and the
second one will hold a single copy, for emergency use only.
For now, the whole cluster is still in one location.
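For illustration, this is the kind of rule I have in mind (the datacenter
bucket names "dc1" and "dc2" are placeholders):

rule replicated_2plus1 {
    id 1
    type replicated
    # two copies in the primary datacenter
    step take dc1
    step chooseleaf firstn 2 type host
    step emit
    # one copy in the emergency datacenter
    step take dc2
    step chooseleaf firstn 1 type host
    step emit
}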
This cluster has 1 PiB of raw capacity, so it would be very expensive
to add a further 300 TB of capacity to get 2+2 data redundancy.
Will this work?
If I turn off the 1/3 location, will the cluster still be operational? I
believe it will, and that this is the better choice. And what if the 2/3
location dies? This cluster hosts a CephFS pool, and that is the main
use of the cluster.
Many thanks for your advice.
Sincerely
Jan Marek
--
Ing. Jan Marek
University of South Bohemia
Academic Computer Centre
Phone: +420389032080
http://www.gnu.org/philosophy/no-word-attachments.cs.html
Hi,
In our cluster, the monitors' logs grow to a couple of GBs within days.
There are quite a lot of debug messages from rocksdb, osd, mgr and mds.
These should not be necessary on a well-running cluster. How can I turn
this logging off?
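I assume the relevant knobs are the per-subsystem debug levels, something
like the following (subsystem names taken from the messages I see, levels
just an example), but I'm not sure this is the right way:

# Lower the log level / in-memory level for the noisy subsystems on the mons
ceph config set mon debug_rocksdb 1/5
ceph config set mon debug_osd 0/5
ceph config set mon debug_mgr 0/5
ceph config set mon debug_mds 0/5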
Thanks,
Ben