Hi,
We have an issue in our cluster (Octopus 15.2.7) where we’re unable to remove orphaned objects from a pool, even though these objects can be listed with “rados ls”.
Here is an example of an orphaned object that we can list (not sure why multiple entries are returned with the same name… perhaps related to the issue?)
rados ls -p default.rgw.buckets.data | grep -i 5a5c812a-3d31-xxxx-xxxx-xxxxxxxxxxxx.4811659.83__shadow_anon_backup_xxxx_xx_xx_090109_7812500.bak.vLHmbxS4DAnRMDVjBYG-5X6iSmepDD6
5a5c812a-3d31-xxxx-xxxx-xxxxxxxxxxxx.4811659.83__shadow_anon_backup_xxxx_xx_xx_090109_7812500.bak.vLHmbxS4DAnRMDVjBYG-5X6iSmepDD6
5a5c812a-3d31-xxxx-xxxx-xxxxxxxxxxxx.4811659.83__shadow_anon_backup_xxxx_xx_xx_090109_7812500.bak.vLHmbxS4DAnRMDVjBYG-5X6iSmepDD6
5a5c812a-3d31-xxxx-xxxx-xxxxxxxxxxxx.4811659.83__shadow_anon_backup_xxxx_xx_xx_090109_7812500.bak.vLHmbxS4DAnRMDVjBYG-5X6iSmepDD6
5a5c812a-3d31-xxxx-xxxx-xxxxxxxxxxxx.4811659.83__shadow_anon_backup_xxxx_xx_xx_090109_7812500.bak.vLHmbxS4DAnRMDVjBYG-5X6iSmepDD6
5a5c812a-3d31-xxxx-xxxx-xxxxxxxxxxxx.4811659.83__shadow_anon_backup_xxxx_xx_xx_090109_7812500.bak.vLHmbxS4DAnRMDVjBYG-5X6iSmepDD6
5a5c812a-3d31-xxxx-xxxx-xxxxxxxxxxxx.4811659.83__shadow_anon_backup_xxxx_xx_xx_090109_7812500.bak.vLHmbxS4DAnRMDVjBYG-5X6iSmepDD6
5a5c812a-3d31-xxxx-xxxx-xxxxxxxxxxxx.4811659.83__shadow_anon_backup_xxxx_xx_xx_090109_7812500.bak.vLHmbxS4DAnRMDVjBYG-5X6iSmepDD6
5a5c812a-3d31-xxxx-xxxx-xxxxxxxxxxxx.4811659.83__shadow_anon_backup_xxxx_xx_xx_090109_7812500.bak.vLHmbxS4DAnRMDVjBYG-5X6iSmepDD6
5a5c812a-3d31-xxxx-xxxx-xxxxxxxxxxxx.4811659.83__shadow_anon_backup_xxxx_xx_xx_090109_7812500.bak.vLHmbxS4DAnRMDVjBYG-5X6iSmepDD6
5a5c812a-3d31-xxxx-xxxx-xxxxxxxxxxxx.4811659.83__shadow_anon_backup_xxxx_xx_xx_090109_7812500.bak.vLHmbxS4DAnRMDVjBYG-5X6iSmepDD6
And the error messages when we try to stat/rm the object:
rados stat -p default.rgw.buckets.data 5a5c812a-3d31-xxxx-xxxx-xxxxxxxxxxxx.4811659.83__shadow_anon_backup_xxxx_xx_xx_090109_7812500.bak.vLHmbxS4DAnRMDVjBYG-5X6iSmepDD6
error stat-ing default.rgw.buckets.data/5a5c812a-3d31-xxxx-xxxx-xxxxxxxxxxxx.4811659.83__shadow_anon_backup_xxxx_xx_xx_090109_7812500.bak.vLHmbxS4DAnRMDVjBYG-5X6iSmepDD6: (2) No such file or directory
rados -p default.rgw.buckets.data rm 5a5c812a-3d31-xxxx-xxxx-xxxxxxxxxxxx.4811659.83__shadow_anon_backup_xxxx_xx_xx_090109_7812500.bak.vLHmbxS4DAnRMDVjBYG-5X6iSmepDD6
error removing default.rgw.buckets.data>5a5c812a-3d31-xxxx-xxxx-xxxxxxxxxxxx.4811659.83__shadow_anon_backup_xxxx_xx_xx_090109_7812500.bak.vLHmbxS4DAnRMDVjBYG-5X6iSmepDD6: (2) No such file or directory
The bucket with id "5a5c812a-3d31-xxxx-xxxx-xxxxxxxxxxxx.4811659.83" was deleted from radosgw a few months ago, but we still have approximately 450,000 objects with this bucket id that are orphaned:
cat orphan-list-202101191211.out | grep -i 5a5c812a-3d31-xxxx-xxxx-xxxxxxxxxxxx.4811659.83 | wc -l
448683
I can also see from our metrics that this bucket held about 10TB of compressed data before it was deleted, and that space has not been reclaimed in the pool usage since.
Anyone have any suggestions on how we can remove these objects and reclaim the space?
We’re not using snapshots or cache tiers in our environment.
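One thing we have not ruled out is that these objects might live in a non-default RADOS namespace, which plain “rados stat” / “rados rm” would not look at. In case it is relevant, this is roughly how we would check (the namespace and object name below are placeholders):
# list objects together with their namespace (first column)
rados -p default.rgw.buckets.data ls --all | grep 5a5c812a-3d31
# if a namespace shows up, retry the removal inside that namespace
rados -p default.rgw.buckets.data -N <namespace> rm <object-name>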
Thanks,
James.
Hi,
We are running a Ceph cluster on Ubuntu 18.04 machines with Ceph 14.2.4.
Our CephFS clients are using the kernel module, and we have noticed that
some of them sometimes hang (it has happened at least once) after an MDS restart.
The only way to resolve this is to unmount and remount the mountpoint,
or to reboot the machine if unmounting is not possible.
After some investigation, the problem seems to be that the MDS denies
reconnect attempts from some clients during restart even though the
reconnect interval is not yet reached. In particular, I see the following
log entries. Note that the MDS reports 9 sessions; 9 clients
reconnect (one client has two mountpoints), and then two more clients
try to reconnect after the MDS has already logged "reconnect_done". These two
clients were hanging after the event. The kernel log of one of them is
shown below as well.
Running `ceph tell mds.0 client ls` after the clients have been
rebooted/remounted also shows 11 clients instead of 9.
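For reference, these are the commands I have been using to look at the sessions, plus the eviction command that I assume could be used to drop the stale ones (jq is only used for counting):
# count the sessions the MDS currently tracks
ceph tell mds.0 session ls | jq length
# evict a stale session by id (ids taken from the "denied reconnect" log lines)
ceph tell mds.0 client evict id=24167394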
Do you have any ideas what is wrong here and how it could be fixed? I'm
guessing that the issue is that the MDS apparently has an incorrect
session count and stops the reconnect process too soon. Is this indeed a
bug and if so, do you know what is broken?
Regardless, I also think that the kernel should be able to deal with a
denied reconnect and that it should try again later. Yet, even after
10 minutes, the kernel does not attempt to reconnect. Is this a known
issue or maybe fixed in newer kernels? If not, is there a chance to get
this fixed?
Thanks,
Florian
MDS log:
> 2019-09-26 16:08:27.479 7f9fdde99700 1 mds.0.server reconnect_clients -- 9 sessions
> 2019-09-26 16:08:27.479 7f9fdde99700 0 log_channel(cluster) log [DBG] : reconnect by client.24197043 v1:10.1.4.203:0/990008521 after 0
> 2019-09-26 16:08:27.479 7f9fdde99700 0 log_channel(cluster) log [DBG] : reconnect by client.30487144 v1:10.1.4.146:0/483747473 after 0
> 2019-09-26 16:08:27.479 7f9fdde99700 0 log_channel(cluster) log [DBG] : reconnect by client.21019865 v1:10.1.7.22:0/3752632657 after 0
> 2019-09-26 16:08:27.479 7f9fdde99700 0 log_channel(cluster) log [DBG] : reconnect by client.21020717 v1:10.1.7.115:0/2841046616 after 0
> 2019-09-26 16:08:27.479 7f9fdde99700 0 log_channel(cluster) log [DBG] : reconnect by client.24171153 v1:10.1.7.243:0/1127767158 after 0
> 2019-09-26 16:08:27.479 7f9fdde99700 0 log_channel(cluster) log [DBG] : reconnect by client.23978093 v1:10.1.4.71:0/824226283 after 0
> 2019-09-26 16:08:27.479 7f9fdde99700 0 log_channel(cluster) log [DBG] : reconnect by client.24209569 v1:10.1.4.157:0/1271865906 after 0
> 2019-09-26 16:08:27.479 7f9fdde99700 0 log_channel(cluster) log [DBG] : reconnect by client.20190930 v1:10.1.4.240:0/3195698606 after 0
> 2019-09-26 16:08:27.479 7f9fdde99700 0 log_channel(cluster) log [DBG] : reconnect by client.20190912 v1:10.1.4.146:0/852604154 after 0
> 2019-09-26 16:08:27.479 7f9fdde99700 1 mds.0.59 reconnect_done
> 2019-09-26 16:08:27.483 7f9fdde99700 1 mds.0.server no longer in reconnect state, ignoring reconnect, sending close
> 2019-09-26 16:08:27.483 7f9fdde99700 0 log_channel(cluster) log [INF] : denied reconnect attempt (mds is up:reconnect) from client.24167394 v1:10.1.67.49:0/1483641729 after 0.00400002 (allowed interval 45)
> 2019-09-26 16:08:27.483 7f9fe1087700 0 --1- [v2:10.1.4.203:6800/806949107,v1:10.1.4.203:6801/806949107] >> v1:10.1.67.49:0/1483641729 conn(0x55af50053f80 0x55af50140800 :6801 s=OPENED pgs=21 cs=1 l=0).fault server, going to standby
> 2019-09-26 16:08:27.483 7f9fdde99700 1 mds.0.server no longer in reconnect state, ignoring reconnect, sending close
> 2019-09-26 16:08:27.483 7f9fdde99700 0 log_channel(cluster) log [INF] : denied reconnect attempt (mds is up:reconnect) from client.30586072 v1:10.1.67.140:0/3664284158 after 0.00400002 (allowed interval 45)
> 2019-09-26 16:08:27.483 7f9fe1888700 0 --1- [v2:10.1.4.203:6800/806949107,v1:10.1.4.203:6801/806949107] >> v1:10.1.67.140:0/3664284158 conn(0x55af50055600 0x55af50143000 :6801 s=OPENED pgs=8 cs=1 l=0).fault server, going to standby
Hanging client (10.1.67.49) kernel log:
> 2019-09-26T16:08:27.481676+02:00 hostnamefoo kernel: [708596.227148] ceph: mds0 reconnect start
> 2019-09-26T16:08:27.488943+02:00 hostnamefoo kernel: [708596.233145] ceph: mds0 reconnect denied
> 2019-09-26T16:16:17.541041+02:00 hostnamefoo kernel: [709066.287601] libceph: mds0 10.1.4.203:6801 socket closed (con state NEGOTIATING)
> 2019-09-26T16:16:18.068934+02:00 hostnamefoo kernel: [709066.813064] ceph: mds0 rejected session
> 2019-09-26T16:16:18.068955+02:00 hostnamefoo kernel: [709066.814843] ceph: get_quota_realm: ino (10000000008.fffffffffffffffe) null i_snap_realm
Bonjour,
In the context of Software Heritage (a noble mission to preserve all source code)[0], artifacts have an average size of ~3KB and there are billions of them. They never change and are never deleted. To save space it would make sense to write them, one after the other, into an ever-growing RBD volume (more than 100TB). An index, located somewhere else, would record the offset and size of each artifact in the volume.
I wonder if someone already implemented this idea with success? And if not... does anyone see a reason why it would be a bad idea?
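To make the idea more concrete, here is roughly what I have in mind (pool and image names are placeholders, and the index itself would live outside Ceph):
# create the archive volume once; it can be grown later with "rbd resize"
rbd create --size 100T archive/artifacts
# map it on the writer host (prints e.g. /dev/rbd0)
rbd map archive/artifacts
# append one artifact at the next free byte offset, then record (offset, size) in the external index
dd if=artifact.bin of=/dev/rbd0 oflag=seek_bytes seek=$NEXT_OFFSET conv=notrunc
Reads would then go through the same mapping (or librbd directly) using the recorded offset and size.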
Cheers
[0] https://docs.softwareheritage.org/
--
Loïc Dachary, Artisan Logiciel Libre
We have a fairly old cluster that has been upgraded over time to Nautilus. We were digging through some things and found 3 bucket indexes without a corresponding bucket. They should have been deleted but were somehow left behind. When we try to delete the bucket index, the command refuses because the bucket is not found. The bucket index list command works fine without the bucket, though. Is there a way to delete the indexes? Maybe somehow relink the bucket so it can be deleted again?
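In case it helps to describe what we are considering: the leftover index objects should be the ".dir.<bucket_id>" objects in the index pool, plus the bucket.instance metadata entry. Something like the following is what we had in mind, but we are not sure it is safe (pool name and ids are examples):
# find the stale bucket instance metadata
radosgw-admin metadata list bucket.instance | grep <bucket_name>
# remove the stale instance entry
radosgw-admin metadata rm bucket.instance:<bucket_name>:<bucket_id>
# remove the leftover index objects (one per shard)
rados -p default.rgw.buckets.index ls | grep ".dir.<bucket_id>"
rados -p default.rgw.buckets.index rm ".dir.<bucket_id>.0"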
Thanks,
Kevin
Hello everyone,
Could someone please let me know the recommended modern kernel disk scheduler for SSD and HDD OSDs? The information in the manuals is pretty dated and refers to schedulers that have been removed from recent kernels.
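For reference, this is how I am checking and switching the scheduler on the OSD hosts at the moment (the device name is just an example, and the udev rule is only a sketch):
# show the available schedulers; the active one is in brackets
cat /sys/block/sda/queue/scheduler
# switch at runtime
echo mq-deadline > /sys/block/sda/queue/scheduler
# make it persistent, e.g. via a udev rule for rotational devices only
# /etc/udev/rules.d/60-scheduler.rules:
ACTION=="add|change", KERNEL=="sd[a-z]", ATTR{queue/rotational}=="1", ATTR{queue/scheduler}="mq-deadline"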
Thanks
Andrei
This is an odd one. I don't hit it all the time, so I don't think it's expected behavior.
Sometimes I have no issues enabling rbd-mirror snapshot mode on an RBD image while it is in use by a KVM VM. Other times I hit the following error, and the only way I can get around it is to power down the KVM VM.
root@Ccscephtest1:~# rbd mirror image enable CephTestPool1/vm-101-disk-0 snapshot
2021-01-29T09:29:07.875-0500 7f1e99ffb700 -1 librbd::mirror::snapshot::CreatePrimaryRequest: 0x7f1e7c012440 handle_create_snapshot: failed to create mirror snapshot: (22) Invalid argument
2021-01-29T09:29:07.875-0500 7f1e99ffb700 -1 librbd::mirror::EnableRequest: 0x5597667fd200 handle_create_primary_snapshot: failed to create initial primary snapshot: (22) Invalid argument
2021-01-29T09:29:07.875-0500 7f1ea559f3c0 -1 librbd::api::Mirror: image_enable: cannot enable mirroring: (22) Invalid argument
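For what it's worth, this is the information I can gather when it happens (same pool and image as above); I assume the image features, watchers, and the pool mirror mode are the first things to compare between the working and failing cases:
# image features (exclusive-lock in particular) and current watchers
rbd info CephTestPool1/vm-101-disk-0
rbd status CephTestPool1/vm-101-disk-0
# how mirroring is configured on the pool
rbd mirror pool info CephTestPool1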
Hi all,
I have a cluster with 116 disks (24 new 16TB disks added in December,
the rest being 8TB) running Nautilus 14.2.16.
I moved (8 months ago) from crush_compat to upmap balancing.
But the cluster does not seem well balanced: the number of PGs on the 8TB
disks varies from 26 to 52, and their usage from 35 to 69%.
The recent 16TB disks are more homogeneous, with 48 to 61 PGs and usage
between 30 and 43%.
Last week, I realized that some OSDs were maybe not using upmap, because
"ceph osd crush weight-set ls" returned (compat).
I therefore ran "ceph osd crush weight-set rm-compat", which triggered some
rebalancing. There has been no recovery for 2 days now, but the cluster
is still unbalanced.
As far as I understand, upmap is supposed to reach an equal number of
PGs on all the disks (weighted by their capacity, I assume).
Thus I would expect roughly 30 PGs on the 8TB disks, 60 on the 16TB
disks, and around 50% usage on all of them, which is far from being the case.
The problem is that this impacts the free space reported for the pools
(264TiB, while there is more than 578TiB free in the cluster), because pool
free space seems to be computed from the space available before the first
OSD becomes full.
Is this normal? Did I miss something? What can I do?
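For reference, these are the commands I am using to check the balancer and the per-OSD distribution, plus the setting I am considering lowering (I assume it is the relevant knob here):
# balancer state and per-OSD PG count / usage
ceph balancer status
ceph osd df tree
# ask the upmap balancer for a tighter spread (the default is 5 PGs, I believe)
ceph config set mgr mgr/balancer/upmap_max_deviation 1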
F.
Hi,
I’ve never seen healthy output in our multisite sync status; almost all the sync shards are recovering.
What can I do about the recovering shards?
We have 1 realm, 1 zonegroup, and inside the zonegroup we have 3 zones in 3 different geo locations.
We are using Octopus 15.2.7 for bucket sync with symmetrical replication.
The user is currently migrating their data, and each site is always behind on the data replicated from the site where it was uploaded.
I’ve restarted all RGWs and disabled/re-enabled bucket sync; it started to work again, but I think once the sync gets close to catching up it will stop again because of the recovering shards.
Any idea?
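For reference, these are the commands I have been looking at so far (zone and shard values are just examples):
radosgw-admin sync status
# per-shard detail for data sync from one source zone
radosgw-admin data sync status --source-zone=<zone> --shard-id=0
# any recorded sync errors
radosgw-admin sync error list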
Thank you
Hello all. After running this dev cluster with a single OSD
(/dev/sda, HDD) in each node (6 nodes), I now want to put the metadata on the
NVMe disk which is also used for boot. There is plenty of space left on
the NVMe, so I re-did the logical volumes to make a 50GB LV for the
metadata, thinking I'd put the metadata on the NVMe LV and use the entire
/dev/sda as data. Before I really go down this rabbit hole, I just want
opinions on whether this is something that should work. I've tried both Ceph
15.2.7 and Ceph 15.2.8, each with different errors. This particular trace
is from Ceph 15.2.8. This is under Rook, so Rook is doing:
<snip>
exec: Running command: stdbuf -oL ceph-volume --log-path /tmp/ceph-log lvm batch --prepare --bluestore --yes --osds-per-device 1 /dev/sda --db-devices /dev/cephDB/database --report
provision 2021-01-28 01:46:56.186043 D | exec: --> passed data devices: 1 physical, 0 LVM
provision 2021-01-28 01:46:56.186074 D | exec: --> relative data size: 1.0
provision 2021-01-28 01:46:56.186079 D | exec: --> passed block_db devices: 0 physical, 1 LVM
provision 2021-01-28 01:46:56.186092 D | exec:
provision 2021-01-28 01:46:56.186104 D | exec: Total OSDs: 1
provision 2021-01-28 01:46:56.186107 D | exec:
provision 2021-01-28 01:46:56.186111 D | exec: Type Path LV Size % of device
provision 2021-01-28 01:46:56.186114 D | exec: ----------------------------------------------------------------------------------------------------
provision 2021-01-28 01:46:56.186117 D | exec: data /dev/sda 3.64 TB 100.00%
provision 2021-01-28 01:46:56.186121 D | exec: block_db /dev/cephDB/database 51.65 GB 10000.00%
<snip>
It fails with the stack trace below, essentially complaining that it can't
find a PARTUUID for the LV.
exec: Running command: /usr/sbin/lvcreate --yes -l 953861 -n osd-block-2acd94f6-0fed-423b-8540-ae93c0621c2e ceph-b6da5679-543b-4a79-9cc2-e4e308ba61a4
exec: stderr: Udev is running and DM_DISABLE_UDEV environment variable is set. Bypassing udev, LVM will manage logical volume symlinks in device directory.
exec: stderr: Udev is running and DM_DISABLE_UDEV environment variable is set. Bypassing udev, LVM will obtain device list by scanning device directory.
exec: stderr: Udev is running and DM_DISABLE_UDEV environment variable is set. Bypassing udev, device-mapper library will manage device nodes in device directory.
exec: stdout: Logical volume "osd-block-2acd94f6-0fed-423b-8540-ae93c0621c2e" created.
exec: --> blkid could not detect a PARTUUID for device: /dev/cephDB/database
exec: --> Was unable to complete a new OSD, will rollback changes
exec: Running command: /usr/bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring osd purge-new osd.0 --yes-i-really-mean-it
exec: stderr: purged osd.0
exec: Traceback (most recent call last):
exec: File "/usr/sbin/ceph-volume", line 11, in <module>
exec: load_entry_point('ceph-volume==1.0.0', 'console_scripts', 'ceph-volume')()
exec: File "/usr/lib/python3.6/site-packages/ceph_volume/main.py", line 40, in __init__
exec: self.main(self.argv)
exec: File "/usr/lib/python3.6/site-packages/ceph_volume/decorators.py", line 59, in newfunc
exec: return f(*a, **kw)
exec: File "/usr/lib/python3.6/site-packages/ceph_volume/main.py", line 152, in main
exec: terminal.dispatch(self.mapper, subcommand_args)
exec: File "/usr/lib/python3.6/site-packages/ceph_volume/terminal.py", line 194, in dispatch
exec: instance.main()
exec: File "/usr/lib/python3.6/site-packages/ceph_volume/devices/lvm/main.py", line 42, in main
exec: terminal.dispatch(self.mapper, self.argv)
exec: File "/usr/lib/python3.6/site-packages/ceph_volume/terminal.py", line 194, in dispatch
exec: instance.main()
exec: File "/usr/lib/python3.6/site-packages/ceph_volume/decorators.py", line 16, in is_root
exec: return func(*a, **kw)
exec: File "/usr/lib/python3.6/site-packages/ceph_volume/devices/lvm/batch.py", line 415, in main
exec: self._execute(plan)
exec: File "/usr/lib/python3.6/site-packages/ceph_volume/devices/lvm/batch.py", line 431, in _execute
exec: p.safe_prepare(argparse.Namespace(**args))
exec: File "/usr/lib/python3.6/site-packages/ceph_volume/devices/lvm/prepare.py", line 252, in safe_prepare
exec: self.prepare()
exec: File "/usr/lib/python3.6/site-packages/ceph_volume/decorators.py", line 16, in is_root
exec: return func(*a, **kw)
exec: File "/usr/lib/python3.6/site-packages/ceph_volume/devices/lvm/prepare.py", line 382, in prepare
exec: self.args.block_db_slots)
exec: File "/usr/lib/python3.6/site-packages/ceph_volume/devices/lvm/prepare.py", line 189, in setup_device
exec: name_uuid = self.get_ptuuid(device_name)
exec: File "/usr/lib/python3.6/site-packages/ceph_volume/devices/lvm/prepare.py", line 135, in get_ptuuid
exec: raise RuntimeError('unable to use device')
exec: RuntimeError: unable to use device
provision failed to configure devices: failed to initialize devices: failed ceph-volume: exit status 1
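For what it's worth, the next thing I was planning to try (outside of Rook, just to see whether ceph-volume accepts the LV at all) is passing the db LV in vg/lv notation to "lvm prepare" instead of "lvm batch"; this is only a guess on my part:
ceph-volume lvm prepare --bluestore --data /dev/sda --block.db cephDB/database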
Thanks for any ideas/help.