Hello Sebastian,
On Fri, Feb 5, 2021 at 8:38 AM Sebastian Knust
<sknust(a)physik.uni-bielefeld.de> wrote:
Hi,
I am running a Ceph Octopus (15.2.8) cluster primarily for CephFS.
v15.2.9 (not yet released) would be the recommended version to start
doing this (it builds some protections into how subvolumes are
snapshotted) but your current workflow looks safe for v15.2.8.
Metadata is stored on SSD, data is stored in three different pools on HDD. Currently, I use 22 subvolumes.
I am rotating snapshots on 16 subvolumes, all in the same pool, which is
the primary data pool for CephFS. Currently I have 41 snapshots per
subvolume. The goal is 50 snapshots (see bottom of mail for details).
Snapshots are only placed in the root subvolume directory, i.e.
/volumes/_nogroup/subvolname/hex-id/.snap
Keep in mind those snapshots will not be visible via the `ceph fs
subvolume` interface. Are you not using the `ceph fs subvolume
snapshot` interface?
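For reference, a minimal sketch of driving snapshots through that interface (the volume, subvolume, and snapshot names below are placeholders):

    # create a snapshot of a subvolume via the mgr subvolume API
    ceph fs subvolume snapshot create cephfs subvolname snap-2021-02-05
    # list snapshots known to the subvolume interface
    ceph fs subvolume snapshot ls cephfs subvolname
    # remove a rotated-out snapshot
    ceph fs subvolume snapshot rm cephfs subvolname snap-2020-12-01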
I create the snapshots from one of the nodes: the complete CephFS is mounted, mkdir and rmdir are performed for each relevant subvolume, then CephFS is unmounted again. All PGs are active+clean most of the time, with only a few in snaptrim for 1-2 minutes after snapshot deletion. I therefore assume that snaptrim is not a limiting factor.
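(As a rough illustration of the rotation described above; mount options, paths, and snapshot names are placeholders, with hex-id standing in for the subvolume's UUID directory:)

    # mount the full filesystem, rotate snapshots per subvolume, unmount
    mount -t ceph mon1:/ /mnt/cephfs -o name=admin,secretfile=/etc/ceph/admin.secret
    mkdir /mnt/cephfs/volumes/_nogroup/subvolname/hex-id/.snap/snap-2021-02-05
    rmdir /mnt/cephfs/volumes/_nogroup/subvolname/hex-id/.snap/snap-2020-12-01
    umount /mnt/cephfs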
Obviously, the total number of snapshots is more than the 400 and 100 I
see mentioned in some documentation. I am unsure if that is an issue
here, as the snapshots are all in disjoint subvolumes.
That guidance is obsolete now with the changes present in v15.2.9.
When mounting the subvolumes with the kernel client (ranging from the CentOS 7 supplied 3.10 up to 5.4.93), after some time and for some subvolumes the kworker process begins to hog 100% CPU and stat operations become very slow (even slower than with the fuse client). I can mostly reproduce this by starting specific rsync operations (with many small files, e.g. CTAN, CentOS, Debian mirrors) and by running a bareos backup. The kworker process seems to remain stuck even after terminating the causing operation, i.e. rsync or bareos-fd.
Interestingly, I can even trigger these issues on a host that has only a
single CephFS subvolume without any snapshots mounted, as long as that
subvolume is in the same pool as other subvolumes with snapshots.
I don't see any abnormal behaviour on the cluster nodes or on other
clients during these kworker hanging phases.
Can you retest with CentOS 8? There are numerous changes and bugfixes
that may have addressed this.
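If it does recur, one way to see what the kernel client is stuck on (assuming debugfs is mounted; the exact directory name under /sys/kernel/debug/ceph depends on the fsid and client id):

    # in-flight MDS and OSD requests for the hung mount
    cat /sys/kernel/debug/ceph/*/mdsc
    cat /sys/kernel/debug/ceph/*/osdc
    # kernel stack of the spinning kworker (replace <pid> with its PID)
    cat /proc/<pid>/stack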
With the fuse client, in normal operation stat calls are about 10-20x slower
than with the kernel client. However, I don't encounter the extreme
slowdown behaviour. I am therefore currently mounting some
known-problematic subvolumes with fuse and non-problematic subvolumes
with the kernel client.
My questions are:
- Is this known or expected behaviour?
No
- I could move the subvolumes with snapshots into a subvolumegroup and snapshot the whole group instead of each subvolume. Is this likely to solve the issues?
No, please don't do this except through the subvolume API. Note: subvolume group snapshots are currently disabled (though they may not yet be disabled in your version of Octopus), but we expect to bring them back soon.
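Once re-enabled, group snapshots would go through the same mgr interface; the command form is roughly as below (names are placeholders and availability depends on your exact release):

    ceph fs subvolumegroup snapshot create cephfs groupname snap-2021-02-05
    ceph fs subvolumegroup snapshot rm cephfs groupname snap-2021-02-05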
- What is the current recommendation regarding CephFS and the max number of snapshots?
A given directory should have less than ~100 snapshots including
inherited snapshots.
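A rough way to see how many snapshots (including inherited ones) apply to a given directory is to count the entries in its .snap directory; the path below is an example:

    ls /mnt/cephfs/volumes/_nogroup/subvolname/hex-id/.snap | wc -l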
--
Patrick Donnelly, Ph.D.
He / Him / His
Principal Software Engineer
Red Hat Sunnyvale, CA
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D