Hi !
I respond to the list, as it may help others.
I also reorder the response.
> On Mon, Jan 18, 2021 at 2:41 PM Gilles Mocellin <
>
> gilles.mocellin(a)nuagelibre.org> wrote:
> > Hello Cephers,
> >
> > On a new cluster, I only have 2 RBD block images, and the Dashboard
> > doesn't manage to list them correctly.
> >
> > I have this message :
> > Warning
> > Displaying previously cached data for pool veeam-repos.
> >
> > Sometime it disappears, but as soon as I reload or return to the listing
> > page, it's there.
> >
> > What I've seen, is a high CPU load due to ceph-mgr on the active
> > manager.
> > And also stack-traces like this :
[...]
> > dashboard.exceptions.ViewCacheNoDataException: ViewCache: unable to
> > retrieve data
> >
> > I also see that, when I try to edit an image :
> >
> > 2021-01-18T11:13:26.383+0100 7f00199ca700 0 [dashboard ERROR
> > frontend.error]
> > (https://fidcl-mrs4-sto-sds.fidcl.cloud:8443/#/block/rbd/edit/veeam-> > repos%252Fveeam-repo2-vol1
> > <https://fidcl-mrs4-sto-sds.fidcl.cloud:8443/#/block/rbd/edit/veeam-repos%
> > 252Fveeam-repo2-vol1>): Cannot read property 'features_name' of
> > undefined
> >
> > TypeError: Cannot read property 'features_name' of undefined
[...]
> >
> > But that's perhaps just becaus I open an Edit window on the image and it
> > does not have the datas.
> > The Edit window is empty, and I can't edit things, especially, I wan't
> > to resize the image.
> >
[...]
> > --
> > Gilles
Le jeudi 21 janvier 2021, 21:56:58 CET Ernesto Puerta a écrit :
> Hey Gilles,
>
> If I'm not wrong, that exception (ViewCacheNoDataException) happens when
> the dashboard is unable to gather all required data from Ceph within a
> defined timeout (5 secs I think, since the UI refreshes the data every ~5
> seconds).
>
> It'd be great if you could provide the steps to reproduce it and some
> insights into your environment (number of RBD pools, number of RBD images,
> snapshots, etc.).
>
> Kind Regards,
>
> Ernesto
OK,
As it is now, it always hapens, on the image listing, I have the Warning and
the list is not always up to date, if I create an image, I must wait very long
to see it.
Also, I can not edit the 2 big images I have. Perhaps the size is important,
they are 2 images of 40 TB.
If I create a 1 GB test image, I can edit and resize it.
But impossible withe the big image, the windows opens but all the fields are
empty.
Also, if it can matter, the images use a data pool (EC 3+2).
I have 2 pools, a replicated one for metadatas veeam-repos (replic x3), and a
data pool veeam-repos.data (EC 3+2).
My cluster has 6 nodes with AMD 16 cores CPU, 128 GB RAM, 10 8 TB HDD.
So 60 OSD. Soon doubling everything to 12 nodes.
Usage, as the pool and image names can tell, is to mount RBD image as a XFS
filesystem for a Veeam Backup Repository (krbd, because nbd-rbd tailed
regularly, especially during fstrim).
Hi all,
We are testing our S3 Ceph endpoints and we are not satisfied with its
speed. Our results are something between around 120 - 150 MB/s depending
on small/bigger files. This is good for 1Gbps connection, but not for
10GE or more.
We've tried the most recent versions of the AWS CLI, s3cmd, s4cmd, s3fs
... programs. Of course we are using multipart upload/download which is
precondition for parallel upload/download. Also we tried multi-thread
(25 or more threads) transfer in s4cmd but still we don't get proper
results.
For proof of concept that high speed can be achieved we have written
small script in bash which uses multi-part & parallel transfer and can
saturate at least 10GE without problem.
I would like to ask you, if you know proper program and its parameters,
so we can saturate n x 10GE if needed?
We are using the latest nautilus.
S3 gateways have much more computer power and bandwidth to internet then
it is used right now.
Thank you
Regards
Michal Strnad
Hi,
We have in issue in our cluster (octopus 15.2.7) where we’re unable to remove orphaned objects from a pool, despite the fact these objects can be listed with “rados ls”.
Here is an example of an orphaned object which we can list (not sure why multiple objects are returned with the same name…related to the issue perhaps?)
rados ls -p default.rgw.buckets.data | grep -i 5a5c812a-3d31-xxxx-xxxx-xxxxxxxxxxxx.4811659.83__shadow_anon_backup_xxxx_xx_xx_090109_7812500.bak.vLHmbxS4DAnRMDVjBYG-5X6iSmepDD6
5a5c812a-3d31-xxxx-xxxx-xxxxxxxxxxxx.4811659.83__shadow_anon_backup_xxxx_xx_xx_090109_7812500.bak.vLHmbxS4DAnRMDVjBYG-5X6iSmepDD6
5a5c812a-3d31-xxxx-xxxx-xxxxxxxxxxxx.4811659.83__shadow_anon_backup_xxxx_xx_xx_090109_7812500.bak.vLHmbxS4DAnRMDVjBYG-5X6iSmepDD6
5a5c812a-3d31-xxxx-xxxx-xxxxxxxxxxxx.4811659.83__shadow_anon_backup_xxxx_xx_xx_090109_7812500.bak.vLHmbxS4DAnRMDVjBYG-5X6iSmepDD6
5a5c812a-3d31-xxxx-xxxx-xxxxxxxxxxxx.4811659.83__shadow_anon_backup_xxxx_xx_xx_090109_7812500.bak.vLHmbxS4DAnRMDVjBYG-5X6iSmepDD6
5a5c812a-3d31-xxxx-xxxx-xxxxxxxxxxxx.4811659.83__shadow_anon_backup_xxxx_xx_xx_090109_7812500.bak.vLHmbxS4DAnRMDVjBYG-5X6iSmepDD6
5a5c812a-3d31-xxxx-xxxx-xxxxxxxxxxxx.4811659.83__shadow_anon_backup_xxxx_xx_xx_090109_7812500.bak.vLHmbxS4DAnRMDVjBYG-5X6iSmepDD6
5a5c812a-3d31-xxxx-xxxx-xxxxxxxxxxxx.4811659.83__shadow_anon_backup_xxxx_xx_xx_090109_7812500.bak.vLHmbxS4DAnRMDVjBYG-5X6iSmepDD6
5a5c812a-3d31-xxxx-xxxx-xxxxxxxxxxxx.4811659.83__shadow_anon_backup_xxxx_xx_xx_090109_7812500.bak.vLHmbxS4DAnRMDVjBYG-5X6iSmepDD6
5a5c812a-3d31-xxxx-xxxx-xxxxxxxxxxxx.4811659.83__shadow_anon_backup_xxxx_xx_xx_090109_7812500.bak.vLHmbxS4DAnRMDVjBYG-5X6iSmepDD6
5a5c812a-3d31-xxxx-xxxx-xxxxxxxxxxxx.4811659.83__shadow_anon_backup_xxxx_xx_xx_090109_7812500.bak.vLHmbxS4DAnRMDVjBYG-5X6iSmepDD6
And the error message when we try to stat / rm the object:
rados stat -p default.rgw.buckets.data 5a5c812a-3d31-xxxx-xxxx-xxxxxxxxxxxx.4811659.83__shadow_anon_backup_xxxx_xx_xx_090109_7812500.bak.vLHmbxS4DAnRMDVjBYG-5X6iSmepDD6
error stat-ing default.rgw.buckets.data/5a5c812a-3d31-xxxx-xxxx-xxxxxxxxxxxx.4811659.83__shadow_anon_backup_xxxx_xx_xx_090109_7812500.bak.vLHmbxS4DAnRMDVjBYG-5X6iSmepDD6: (2) No such file or directory
rados -p default.rgw.buckets.data rm 5a5c812a-3d31-xxxx-xxxx-xxxxxxxxxxxx.4811659.83__shadow_anon_backup_xxxx_xx_xx_090109_7812500.bak.vLHmbxS4DAnRMDVjBYG-5X6iSmepDD6
error removing default.rgw.buckets.data>5a5c812a-3d31-xxxx-xxxx-xxxxxxxxxxxx.4811659.83__shadow_anon_backup_xxxx_xx_xx_090109_7812500.bak.vLHmbxS4DAnRMDVjBYG-5X6iSmepDD6: (2) No such file or directory
The bucket with id "5a5c812a-3d31-xxxx-xxxx-xxxxxxxxxxxx.4811659.83” was deleted from radosgw a few months ago, but we still have approximately 450,000 objects with this bucket id that are orphaned:
cat orphan-list-202101191211.out | grep -i 5a5c812a-3d31-xxxx-xxxx-xxxxxxxxxxxx.4811659.83 | wc -l
448683
I can also see from our metrics that prior to deletion there was about 10TB of compressed data stored in this bucket, and this has not been reclaimed in the pool usage after the bucket was deleted.
Anyone have any suggestions on how we can remove these objects and reclaim the space?
We’re not using snapshots or cache tiers in our environment.
Thanks,
James.
Hi,
We are running a ceph cluster on Ubuntu 18.04 machines with ceph 14.2.4.
Our cephfs clients are using the kernel module and we have noticed that
some of them are sometimes (at least once) hanging after an MDS restart.
The only way to resolve this is to unmount and remount the mountpoint,
or reboot the machine if unmounting is not possible.
After some investigation, the problem seems to be that the MDS denies
reconnect attempts from some clients during restart even though the
reconnect interval is not yet reached. In particular, I see the following
log entries. Note that there are supposedly 9 sessions. 9 clients
reconnect (one client has two mountpoints) and then two more clients
reconnect after the MDS already logged "reconnect_done". These two
clients were hanging after the event. The kernel log of one of them is
shown below too.
Running `ceph tell mds.0 client ls` after the clients have been
rebooted/remounted also shows 11 clients instead of 9.
Do you have any ideas what is wrong here and how it could be fixed? I'm
guessing that the issue is that the MDS apparently has an incorrect
session count and stops the reconnect process to soon. Is this indeed a
bug and if so, do you know what is broken?
Regardless, I also think that the kernel should be able to deal with a
denied reconnect and that it should try again later. Yet, even after
10 minutes, the kernel does not attempt to reconnect. Is this a known
issue or maybe fixed in newer kernels? If not, is there a chance to get
this fixed?
Thanks,
Florian
MDS log:
> 2019-09-26 16:08:27.479 7f9fdde99700 1 mds.0.server reconnect_clients -- 9 sessions
> 2019-09-26 16:08:27.479 7f9fdde99700 0 log_channel(cluster) log [DBG] : reconnect by client.24197043 v1:10.1.4.203:0/990008521 after 0
> 2019-09-26 16:08:27.479 7f9fdde99700 0 log_channel(cluster) log [DBG] : reconnect by client.30487144 v1:10.1.4.146:0/483747473 after 0
> 2019-09-26 16:08:27.479 7f9fdde99700 0 log_channel(cluster) log [DBG] : reconnect by client.21019865 v1:10.1.7.22:0/3752632657 after 0
> 2019-09-26 16:08:27.479 7f9fdde99700 0 log_channel(cluster) log [DBG] : reconnect by client.21020717 v1:10.1.7.115:0/2841046616 after 0
> 2019-09-26 16:08:27.479 7f9fdde99700 0 log_channel(cluster) log [DBG] : reconnect by client.24171153 v1:10.1.7.243:0/1127767158 after 0
> 2019-09-26 16:08:27.479 7f9fdde99700 0 log_channel(cluster) log [DBG] : reconnect by client.23978093 v1:10.1.4.71:0/824226283 after 0
> 2019-09-26 16:08:27.479 7f9fdde99700 0 log_channel(cluster) log [DBG] : reconnect by client.24209569 v1:10.1.4.157:0/1271865906 after 0
> 2019-09-26 16:08:27.479 7f9fdde99700 0 log_channel(cluster) log [DBG] : reconnect by client.20190930 v1:10.1.4.240:0/3195698606 after 0
> 2019-09-26 16:08:27.479 7f9fdde99700 0 log_channel(cluster) log [DBG] : reconnect by client.20190912 v1:10.1.4.146:0/852604154 after 0
> 2019-09-26 16:08:27.479 7f9fdde99700 1 mds.0.59 reconnect_done
> 2019-09-26 16:08:27.483 7f9fdde99700 1 mds.0.server no longer in reconnect state, ignoring reconnect, sending close
> 2019-09-26 16:08:27.483 7f9fdde99700 0 log_channel(cluster) log [INF] : denied reconnect attempt (mds is up:reconnect) from client.24167394 v1:10.1.67.49:0/1483641729 after 0.00400002 (allowed interval 45)
> 2019-09-26 16:08:27.483 7f9fe1087700 0 --1- [v2:10.1.4.203:6800/806949107,v1:10.1.4.203:6801/806949107] >> v1:10.1.67.49:0/1483641729 conn(0x55af50053f80 0x55af50140800 :6801 s=OPENED pgs=21 cs=1 l=0).fault server, going to standby
> 2019-09-26 16:08:27.483 7f9fdde99700 1 mds.0.server no longer in reconnect state, ignoring reconnect, sending close
> 2019-09-26 16:08:27.483 7f9fdde99700 0 log_channel(cluster) log [INF] : denied reconnect attempt (mds is up:reconnect) from client.30586072 v1:10.1.67.140:0/3664284158 after 0.00400002 (allowed interval 45)
> 2019-09-26 16:08:27.483 7f9fe1888700 0 --1- [v2:10.1.4.203:6800/806949107,v1:10.1.4.203:6801/806949107] >> v1:10.1.67.140:0/3664284158 conn(0x55af50055600 0x55af50143000 :6801 s=OPENED pgs=8 cs=1 l=0).fault server, going to standby
Hanging client (10.1.67.49) kernel log:
> 2019-09-26T16:08:27.481676+02:00 hostnamefoo kernel: [708596.227148] ceph: mds0 reconnect start
> 2019-09-26T16:08:27.488943+02:00 hostnamefoo kernel: [708596.233145] ceph: mds0 reconnect denied
> 2019-09-26T16:16:17.541041+02:00 hostnamefoo kernel: [709066.287601] libceph: mds0 10.1.4.203:6801 socket closed (con state NEGOTIATING)
> 2019-09-26T16:16:18.068934+02:00 hostnamefoo kernel: [709066.813064] ceph: mds0 rejected session
> 2019-09-26T16:16:18.068955+02:00 hostnamefoo kernel: [709066.814843] ceph: get_quota_realm: ino (10000000008.fffffffffffffffe) null i_snap_realm
Bonjour,
In the context Software Heritage (a noble mission to preserve all source code)[0], artifacts have an average size of ~3KB and there are billions of them. They never change and are never deleted. To save space it would make sense to write them, one after the other, in an every growing RBD volume (more than 100TB). An index, located somewhere else, would record the offset and size of the artifacts in the volume.
I wonder if someone already implemented this idea with success? And if not... does anyone see a reason why it would be a bad idea?
Cheers
[0] https://docs.softwareheritage.org/
--
Loïc Dachary, Artisan Logiciel Libre
We have a fairly old cluster that has over time been upgraded to nautilus. We were digging through some things and found 3 bucket indexes without a corresponding bucket. They should have been deleted but somehow were left behind. When we try and delete the bucket index, it will not allow it as the bucket is not found. The bucket index list command works fine though without the bucket. Is there a way to delete the indexes? Maybe somehow relink the bucket so it can be deleted again?
Thanks,
Kevin
Hello everyone,
Could some one please let me know what is the recommended modern kernel disk scheduler that should be used for SSD and HDD osds? The information in the manuals is pretty dated and refer to the schedulers which have been deprecated from the recent kernels.
Thanks
Andrei
This is a odd one. I don't hit it all the time so I don't think its expected behavior.
Sometimes I have no issues enabling rbd-mirror snapshot mode on a rbd when its in use by a KVM VM. Other times I hit the following error, the only way I can get around it is to power down the KVM VM.
root@Ccscephtest1:~# rbd mirror image enable CephTestPool1/vm-101-disk-0 snapshot
2021-01-29T09:29:07.875-0500 7f1e99ffb700 -1 librbd::mirror::snapshot::CreatePrimaryRequest: 0x7f1e7c012440 handle_create_snapshot: failed to create mirror snapshot: (22) Invalid argument
2021-01-29T09:29:07.875-0500 7f1e99ffb700 -1 librbd::mirror::EnableRequest: 0x5597667fd200 handle_create_primary_snapshot: failed to create initial primary snapshot: (22) Invalid argument
2021-01-29T09:29:07.875-0500 7f1ea559f3c0 -1 librbd::api::Mirror: image_enable: cannot enable mirroring: (22) Invalid argument
Hi all,
I have a cluster with 116 disks (24 new disks of 16TB added in december
and the rest of 8TB) running nautilus 14.2.16.
I moved (8 month ago) from crush_compat to upmap balancing.
But the cluster seems not well balanced, with a number of pgs on the 8TB
disks varying from 26 to 52 ! And an occupation from 35 to 69%.
The recent 16 TB disks are more homogeneous with 48 to 61 pgs and space
between 30 and 43%.
Last week, I realized that some osd were maybe not using upmap because I
did a ceph osd crush weight-set ls and got (compat) as result.
Thus I ran a ceph osd crush weight-set rm-compat which triggered some
rebalancing. Now there is no more recovery for 2 days, but the cluster
is still unbalanced.
As far as I understand, upmap is supposed to reach an equal number of
pgs on all the disks (I guess weighted by their capacity).
Thus I would expect more or less 30 pgs on the 8TB disks and 60 on the
16TB and around 50% usage on all. Which is not the case (by far).
The problem is that it impact the free available space in the pools
(264Ti while there is more than 578Ti free in the cluster) because free
space seems to be based on space available before the first osd will be
full !
Is it normal ? Did I missed something ? What could I do ?
F.