Hi,
Currently running Mimic 13.2.5.
We had reports this morning of timeouts and failures with PUT and GET
requests to our Ceph RGW cluster. I found these messages in the RGW
log:
RGWReshardLock::lock failed to acquire lock on
bucket_name:bucket_instance ret=-16
NOTICE: resharding operation on bucket index detected, blocking
block_while_resharding ERROR: bucket is still resharding, please retry
These were preceded by many of the following, which I think are normal/expected:
check_bucket_shards: resharding needed: stats.num_objects=6415879
shard max_objects=6400000
Our RGW cluster sits behind haproxy, which notified me approx 90
seconds after the first 'resharding needed' message that no backends
were available. It appears the dynamic reshard process caused the
RGWs to lock up for a period of time. Roughly 2 minutes later the
reshard error messages stopped and operation returned to normal.
Looking back through previous RGW logs, I see a similar event from
about a week ago, on the same bucket. We have several buckets with
shard counts exceeding 1k (this one only has 128), and much larger
object counts, so clearly this isn't the first time dynamic sharding
has been invoked on this cluster.
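For reference, the commands I'd use to inspect the reshard state the next time this happens are along these lines (the bucket name is a placeholder):

  radosgw-admin reshard list
  radosgw-admin reshard status --bucket=<bucket>
  radosgw-admin bucket limit check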
Has anyone seen this? I expect it will come up again, and can turn up
debugging if that'll help. Thanks for any assistance!
Josh
After upgrading one of our clusters from Luminous 12.2.12 to Nautilus 14.2.6, I am seeing 100% CPU usage by a single ceph-mgr thread (found using 'top -H'). We noticed this because Prometheus was unable to report certain pieces of data, specifically OSD usage and OSD apply and commit latency, which resembles issues people were having with previous versions of Nautilus.
Bryan Stillwell previously reported this on a separate 14.2.5 cluster we have here:
https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/VW3GNVJGOOW…
That issue was resolved with the upgrade to 14.2.6.
We are seeing a similar issue on this other cluster, with a couple of differences:
- This cluster has 1900+ OSDs in it; the previous one had 300+.
- The top CPU consumer is libceph-common, instead of mmap:
4.86% libceph-common.so.0 [.] EventCenter::create_time_event
2.78% [kernel] [k] nmi
2.64% libstdc++.so.6.0.19 [.] __dynamic_cast
On all our other clusters that have been upgraded to 14.2.6 we are not experiencing this issue, the next largest being 800+ OSDs.
We feel this is related to the size of the cluster, similar to the previous report.
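In case it's useful, the per-thread and per-symbol breakdown above can be reproduced with something like the following (assuming perf is installed and a single ceph-mgr runs on the host):

  top -H -p $(pidof ceph-mgr)
  perf top -p $(pidof ceph-mgr)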
Is anyone else experiencing this, and/or can anyone provide some direction on how to go about resolving it?
Thanks,
Joe
On our test cluster after upgrading to 14.2.5 I'm having problems with the mons pegging a CPU core while moving data around. I'm currently converting the OSDs from FileStore to BlueStore by marking the OSDs out in multiple nodes, destroying the OSDs, and then recreating them with ceph-volume lvm batch. This seems to get the ceph-mon process into a state where it pegs a CPU core on one of the mons:
1764450 ceph 20 0 4802412 2.1g 16980 S 100.0 28.1 4:54.72 ceph-mon
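For context, the per-node conversion procedure is roughly the following (the OSD id and device names are placeholders):

  ceph osd out 12
  # once data has migrated off, stop and destroy the OSD
  systemctl stop ceph-osd@12
  ceph osd destroy 12 --yes-i-really-mean-it
  ceph-volume lvm zap --destroy /dev/sdb
  ceph-volume lvm batch --bluestore /dev/sdb /dev/sdc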
Has anyone else run into this with 14.2.5 yet? I didn't see this problem while the cluster was running 14.2.4.
Thanks,
Bryan
Hi,
I am seeing an unusual slowdown using VMware with iSCSI gateways. I have two
iSCSI gateways with two RBD images. I found the following in the
logs:
Dec 24 09:00:26 ceph-iscsi2 tcmu-runner: 2019-12-24 09:00:26.040 969 [INFO] alua_implicit_transition:562 rbd/pool1.vmware_iscsi1: Starting lock acquisition operation.
2019-12-24 09:00:26.040 969 [INFO] alua_implicit_transition:557 rbd/pool1.vmware_iscsi1: Lock acquisition operation is already in process.
2019-12-24 09:00:26.973 969 [WARN] tcmu_rbd_lock:744 rbd/pool1.vmware_iscsi1: Acquired exclusive lock.
Dec 24 09:00:26 ceph-iscsi2 tcmu-runner: tcmu_rbd_lock:744 rbd/pool1.vmware_iscsi1: Acquired exclusive lock.
Dec 24 09:00:28 ceph-iscsi2 tcmu-runner: 2019-12-24 09:00:28.099 969 [WARN] tcmu_notify_lock_lost:201 rbd/pool1.vmware_iscsi1: Async lock drop. Old state 1
Dec 24 09:00:28 ceph-iscsi2 tcmu-runner: tcmu_notify_lock_lost:201 rbd/pool1.vmware_iscsi1: Async lock drop. Old state 1
Dec 24 09:00:28 ceph-iscsi2 tcmu-runner: alua_implicit_transition:562 rbd/pool1.vmware_iscsi1: Starting lock acquisition operation.
Dec 24 09:00:28 ceph-iscsi2 tcmu-runner: 2019-12-24 09:00:28.824 969 [INFO] alua_implicit_transition:562 rbd/pool1.vmware_iscsi1: Starting lock acquisition operation.
2019-12-24 09:00:28.990 969 [WARN] tcmu_rbd_lock:744 rbd/pool1.vmware_iscsi1: Acquired exclusive lock.
Dec 24 09:00:28 ceph-iscsi2 tcmu-runner: tcmu_rbd_lock:744 rbd/pool1.vmware_iscsi1: Acquired exclusive lock.
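If it helps with debugging, the current exclusive-lock holder and watchers can be checked from a Ceph client node with something like this (image spec taken from the logs above):

  rbd lock ls pool1/vmware_iscsi1
  rbd status pool1/vmware_iscsi1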
Can anyone help me, please?
Gesiel
Dear Cephalopodians,
running 13.2.6 on the source cluster and 14.2.5 on the rbd mirror nodes and the target cluster,
I observe regular failures of rbd-mirror processes.
By failures, I mean that traffic stops, but the daemons are still listed as active rbd-mirror daemons in
"ceph -s" and the processes are still running. This coincides with a flood of the messages below in the mirror logs.
It happens "sometimes" when some OSDs go down and come back up in the target cluster (which happens each night, since the disks in that cluster
briefly go offline during "online" SMART self-tests - a problem in itself, but it's a cluster built from hardware that would otherwise have been trashed).
The rbd-mirror daemons keep running in any case, but synchronization stops. If not all of them have failed (we run three, and it usually does not hit all of them),
the "surviving" ones do not seem to take over the images the failed daemons had locked.
Right now I am considering a "quick solution" of regularly restarting the rbd-mirror daemons, but if there are any good ideas on which debug info I could collect
to get this analyzed and fixed, that would of course be appreciated :-).
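If it comes to that, the interim workaround would be something along these lines (the service instance name depends on how the daemons were deployed, so treat it as a placeholder):

  # check replication state against the target cluster
  rbd mirror pool status <pool> --verbose
  # bounce a stuck daemon
  systemctl restart ceph-rbd-mirror@<instance>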
Cheers,
Oliver
-----------------------------------------------
2019-12-24 02:08:51.379 7f31c530e700 -1 rbd::mirror::ImageReplayer: 0x559dcb968d00 [2/aabba863-89fd-4ea5-bb8c-0f417225d394] handle_process_entry_safe: failed to commit journal event: (108) Cannot send after transport endpoint shutdown
2019-12-24 02:08:51.379 7f31c530e700 -1 rbd::mirror::ImageReplayer: 0x559dcb968d00 [2/aabba863-89fd-4ea5-bb8c-0f417225d394] handle_replay_complete: replay encountered an error: (108) Cannot send after transport endpoint shutdown
...
2019-12-24 02:08:54.392 7f31c530e700 -1 rbd::mirror::ImageReplayer: 0x559dcb87bb00 [2/23699357-a611-4557-9d73-6ff5279da991] handle_process_entry_safe: failed to commit journal event: (125) Operation canceled
2019-12-24 02:08:54.392 7f31c530e700 -1 rbd::mirror::ImageReplayer: 0x559dcb87bb00 [2/23699357-a611-4557-9d73-6ff5279da991] handle_replay_complete: replay encountered an error: (125) Operation canceled
2019-12-24 02:08:55.707 7f31ea358700 -1 rbd::mirror::image_replayer::GetMirrorImageIdRequest: 0x559dce2e05b0 handle_get_image_id: failed to retrieve image id: (108) Cannot send after transport endpoint shutdown
2019-12-24 02:08:55.707 7f31ea358700 -1 rbd::mirror::image_replayer::GetMirrorImageIdRequest: 0x559dcf47ee70 handle_get_image_id: failed to retrieve image id: (108) Cannot send after transport endpoint shutdown
...
2019-12-24 02:08:55.716 7f31f5b6f700 -1 rbd::mirror::ImageReplayer: 0x559dcb997680 [2/f8218221-6608-4a2b-8831-84ca0c2cb418] operator(): start failed: (108) Cannot send after transport endpoint shutdown
2019-12-24 02:09:25.707 7f31f5b6f700 -1 rbd::mirror::InstanceReplayer: 0x559dcabd5b80 start_image_replayer: global_image_id=0577bd16-acc4-4e9a-81f0-c698a24f8771: blacklisted detected during image replay
2019-12-24 02:09:25.707 7f31f5b6f700 -1 rbd::mirror::InstanceReplayer: 0x559dcabd5b80 start_image_replayer: global_image_id=05bd4cca-a561-4a5c-ad83-9905ad5ce34e: blacklisted detected during image replay
2019-12-24 02:09:25.707 7f31f5b6f700 -1 rbd::mirror::InstanceReplayer: 0x559dcabd5b80 start_image_replayer: global_image_id=0e614ece-65b1-4b4a-99bd-44dd6235eb70: blacklisted detected during image replay
-----------------------------------------------
Hi!
I've been running CephFS for a while now and ever since setting it up, I've seen unexpectedly large write i/o on the CephFS metadata pool.
The filesystem is otherwise stable and I'm seeing no usage issues.
From the clients' perspective I'm in a read-intensive environment, yet throughput on the metadata pool is consistently higher than on the data pool.
For example:
# ceph osd pool stats
pool cephfs_data id 1
client io 7.6 MiB/s rd, 19 KiB/s wr, 404 op/s rd, 1 op/s wr
pool cephfs_metadata id 2
client io 338 KiB/s rd, 43 MiB/s wr, 84 op/s rd, 26 op/s wr
I realise, of course, that this is a momentary display of statistics, but I see this unbalanced r/w activity consistently when monitoring it live.
I would like some insight into what may be causing this large imbalance in r/w, especially since I'm in a read-intensive (web hosting) environment.
Some of it may be expected when considering details of my environment and CephFS implementation specifics, so please ask away if more details are needed.
With my experience using NFS, I would start by looking at client io stats, like `nfsstat` and tuning e.g. mount options, but I haven't been able to find such statistics for CephFS clients.
Is there anything of the sort for CephFS? Are similar stats obtainable in some other way?
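The closest I've found so far are the server-side perf counters on the MDS, e.g. (run on the MDS host; the daemon name is a placeholder):

  ceph daemon mds.<name> perf dump
  ceph daemon mds.<name> session ls

but those are aggregates as seen by the MDS rather than per-client statistics like nfsstat.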
This might be a somewhat broad question and shallow description, so yeah, let me know if there's anything you would like more details on.
Thanks a lot,
Samy
Hi.
Before I descend into what happened and why: I'm talking about a
test cluster, so I don't really care about the data in this case.
We've recently started upgrading from luminous to nautilus, and for us that
means we're retiring ceph-disk in favour of ceph-volume with lvm and
dmcrypt.
Our setup is in containers and we've got DBs separated from Data.
When testing our upgrade path we discovered that running the host on
Ubuntu Xenial and the containers on CentOS 7.7 leads to LVM inside the
containers not using lvmetad, because it's too old. That in turn means that
if `vgscan --cache` is not run on the host before adding an LV to a VG,
the metadata for all LVs in that VG is essentially zeroed.
That happened on two out of three hosts for a bunch of OSDs and those OSDs
are gone. I have no way of getting them back, they've been overwritten
multiple times trying to figure out what went wrong.
So now I have a cluster that's got 16 pgs in 'incomplete', 14 of them with 0
objects, 2 with about 150 objects each.
I have found a couple of howtos that tell me to use ceph-objectstore-tool to
find the pgs on the active OSDs, and I've given that a try, but
ceph-objectstore-tool always tells me it can't find the pg I am looking for.
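For reference, the invocations I've been trying look roughly like this (the OSD id and pg id below are just examples):

  systemctl stop ceph-osd@11
  ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-11 --op list-pgs
  ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-11 --pgid 1.2f --op info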
Can I tell ceph to re-init the pgs? Do I have to delete the pools and
recreate them?
There's no data I can't get back in there, I just don't feel like
scrapping and redeploying the whole cluster.
--
Cheers,
Hardy
Hi Ceph Community.
We currently have a Luminous cluster running, and some machines are still on Ubuntu 14.04.
We are looking to upgrade these machines to 18.04, but the only upgrade path for Luminous with the Ceph repo is through 16.04.
Getting to Mimic is doable, but it means first upgrading all those machines to 16.04 and then upgrading again to 18.04 once we are on Mimic, which is becoming a huge time sink.
I did notice that the Ubuntu repos have added 12.2.12 in the 18.04.4 release. Is this a reliable build we can use?
https://ubuntu.pkgs.org/18.04/ubuntu-proposed-main-amd64/ceph_12.2.12-0ubun…
If so then we can go straight to 18.04.4 and not waste so much time.
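If that build is considered usable, I'd confirm on an 18.04.4 test machine which repository actually wins with something like:

  apt-cache policy ceph ceph-osd radosgw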
Best