Hello.
I have tried to follow the documented writeback cache tier removal
procedure
(
https://docs.ceph.com/docs/master/rados/operations/cache-tiering/#removing-…)
on a test cluster, and failed.
I have successfully executed this command:
ceph osd tier cache-mode alex-test-rbd-cache proxy
Next, I am supposed to run this:
rados -p alex-test-rbd-cache ls
rados -p alex-test-rbd-cache cache-flush-evict-all
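To check whether the pool has actually drained, I count the remaining
objects with a one-liner like this (the pool name is from my setup):

```shell
# Count the objects still present in the cache pool.
POOL=alex-test-rbd-cache
remaining() {
  rados -p "$POOL" ls | wc -l | tr -d ' '
}
# Only meaningful against a live cluster:
if command -v rados >/dev/null 2>&1; then remaining; fi
```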
The failure mode is that, while client I/O is still going on, I
cannot get the cache pool down to zero objects, even with the help of
"rados -p alex-test-rbd-cache cache-flush-evict-all". And yes, I have
waited more than 20 minutes (my cache tier has hit_set_count 10 and
hit_set_period 120).
I also tried setting both cache_target_dirty_ratio and
cache_target_full_ratio to 0, but it didn't help.
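For completeness, this is how I zeroed the ratios (standard
"ceph osd pool set" syntax, pool name from my setup):

```shell
# Force aggressive flushing/eviction by zeroing the target ratios.
POOL=alex-test-rbd-cache
set_ratios() {
  ceph osd pool set "$POOL" cache_target_dirty_ratio 0
  ceph osd pool set "$POOL" cache_target_full_ratio 0
}
# Only meaningful against a live cluster:
if command -v ceph >/dev/null 2>&1; then set_ratios; fi
```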
Here is the relevant part of the pool setup:
# ceph osd pool ls detail
pool 25 'alex-test-rbd-metadata' replicated size 3 min_size 2
crush_rule 9 object_hash rjenkins pg_num 64 pgp_num 64 autoscale_mode
warn last_change 10973111 lfor 0/10971347/10971345 flags
hashpspool,nodelete stripe_width 0 application rbd
pool 26 'alex-test-rbd-data' erasure size 6 min_size 5 crush_rule 12
object_hash rjenkins pg_num 1024 pgp_num 1024 autoscale_mode warn
last_change 10973112 lfor 10971705/10971705/10971705 flags
hashpspool,ec_overwrites,nodelete,selfmanaged_snaps tiers 27 read_tier
27 write_tier 27 stripe_width 16384 application rbd
removed_snaps [1~3]
pool 27 'alex-test-rbd-cache' replicated size 3 min_size 2 crush_rule
9 object_hash rjenkins pg_num 64 pgp_num 64 autoscale_mode warn
last_change 10973113 lfor 10971705/10971705/10971705 flags
hashpspool,incomplete_clones,nodelete,selfmanaged_snaps tier_of 26
cache_mode proxy target_bytes 10000000000 hit_set
bloom{false_positive_probability: 0.05, target_size: 0, seed: 0} 120s
x10 decay_rate 0 search_last_n 0 stripe_width 0 application rbd
removed_snaps [1~3]
The relevant crush rules are selecting ssds for the
alex-test-rbd-cache and alex-test-rbd-metadata pools (plain old
"replicated size 3" pools), and hdds for alex-test-rbd-data (which is
EC 4+2).
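For reference, rules of this shape are typically created along these
lines; the rule and profile names below are placeholders, not my
actual rule names:

```shell
# Hypothetical re-creation of the device-class split (names are placeholders).
make_rules() {
  # Replicated SSD rule for the cache and metadata pools:
  ceph osd crush rule create-replicated replicated-ssd default host ssd
  # EC 4+2 profile on HDDs for the data pool, and a rule from it:
  ceph osd erasure-code-profile set ec42-hdd k=4 m=2 crush-device-class=hdd
  ceph osd crush rule create-erasure ec42-hdd ec42-hdd
}
# Only meaningful against a live cluster:
if command -v ceph >/dev/null 2>&1; then make_rules; fi
```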
The client workload, which seemingly outpaces the eviction and flushing, is:
for a in `seq 1000 2000`; do
  time rbd import --data-pool alex-test-rbd-data \
    ./Fedora-Cloud-Base-32-1.6.x86_64.raw \
    alex-test-rbd-metadata/Fedora-copy-$a
done
The ceph version is "ceph version 14.2.9
(2afdc1f644870fb6315f25a777f9e4126dacc32d) nautilus (stable)" on all
osds.
The relevant part of "ceph df" is:
RAW STORAGE:
CLASS SIZE AVAIL USED RAW USED %RAW USED
hdd 23 TiB 20 TiB 2.9 TiB 3.0 TiB 12.99
ssd 1.7 TiB 1.7 TiB 19 GiB 23 GiB 1.28
TOTAL 25 TiB 22 TiB 2.9 TiB 3.0 TiB 12.17
POOLS:
    POOL                      ID     STORED      OBJECTS     USED        %USED     MAX AVAIL
    <irrelevant pools omitted>
    alex-test-rbd-metadata    25     237 KiB     2.37k       59 MiB      0         564 GiB
    alex-test-rbd-data        26     691 GiB     198.57k     1.0 TiB     6.52      9.7 TiB
    alex-test-rbd-cache       27     5.1 GiB     2.99k       15 GiB      0.90      564 GiB
The total size and the number of stored objects in the
alex-test-rbd-cache pool oscillate around 5 GB and 3K, respectively,
while "rados -p alex-test-rbd-cache cache-flush-evict-all" is running
in a loop. Without it, the size grows to 6 GB and stays there.
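The loop itself is nothing special; roughly the following (bounded
here so the sketch terminates, whereas on the real cluster I just keep
re-running it):

```shell
# Repeatedly flush/evict and report how many objects are left.
POOL=alex-test-rbd-cache
drain_cache() {
  for pass in 1 2 3 4 5; do
    rados -p "$POOL" cache-flush-evict-all
    left=$(rados -p "$POOL" ls | wc -l | tr -d ' ')
    echo "pass $pass: $left objects left"
    if [ "$left" -eq 0 ]; then break; fi
  done
}
# Only meaningful against a live cluster:
if command -v rados >/dev/null 2>&1; then drain_cache; fi
```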
# ceph -s
cluster:
id: <omitted for privacy>
health: HEALTH_WARN
1 cache pools at or near target size
services:
mon: 3 daemons, quorum xx-4a,xx-3a,xx-2a (age 10d)
mgr: xx-3a(active, since 5w), standbys: xx-2b, xx-2a, xx-4a
mds: cephfs:1 {0=xx-4b=up:active} 2 up:standby
osd: 89 osds: 89 up (since 7d), 89 in (since 7d)
rgw: 3 daemons active (xx-2b, xx-3b, xx-4b)
tcmu-runner: 6 daemons active (<only irrelevant images here>)
data:
pools: 15 pools, 1976 pgs
objects: 6.64M objects, 1.3 TiB
usage: 3.1 TiB used, 22 TiB / 25 TiB avail
pgs: 1976 active+clean
io:
client: 290 KiB/s rd, 251 MiB/s wr, 366 op/s rd, 278 op/s wr
cache: 123 MiB/s flush, 72 MiB/s evict, 31 op/s promote, 3 PGs
flushing, 1 PGs evicting
Is there any workaround, short of somehow telling the client to stop
creating new rbds?
--
Alexander E. Patrakov
CV:
http://pc.cd/PLz7