Hello,
CephFS operations are slow in our cluster; I see a low number of operations and low
throughput in the pools and on all other resources as well. I suspect it is MDS
operations that are causing the issue. I increased mds_cache_memory_limit from 1 GB to
3 GB but am not seeing any improvement in user access times.
How do I monitor MDS operations, i.e. metadata operation latencies, including inode
access and update times and directory operation latencies?
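The closest I have found so far is the MDS admin socket. Assuming the daemon is named
mds.0 and I run these on the node hosting it, something like the following should show
the raw counters, a live view, and the requests currently being processed; I believe the
"mds" section of perf dump has a reply_latency counter (avgcount/sum) that reflects
average request latency, but I am not sure which counters cover inode vs. directory
operations:

# ceph daemon mds.0 perf dump
# ceph daemonperf mds.0
# ceph daemon mds.0 dump_ops_in_flight

Is that the right approach, or is there a better set of counters to watch?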
We are using Ceph version 14.2.3.
I have increased mds_cache_memory_limit but am not sure how to check how much of it is
being used and how effectively we are using it.
# ceph config get mds.0 mds_cache_memory_limit
3221225472
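The only related thing I have found is the admin socket cache status command (again
assuming mds.0); as far as I understand it reports the bytes currently held by the
cache, which I could compare against the 3 GB limit, and perf dump has an mds_mem
section with inode and dentry counts:

# ceph daemon mds.0 cache status

Is that the right way to judge whether the cache is sized well?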
I also see this: we are managing PGs using the autoscaler, but the metadata pool shows a
BIAS of 4.0 whereas all other pools have 1.0. I am not sure what this number means
exactly and how it affects the cluster.
# ceph osd pool autoscale-status | egrep "cephfs|POOL"
POOL               SIZE    TARGET SIZE  RATE  RAW CAPACITY  RATIO   TARGET RATIO  BIAS  PG_NUM  NEW PG_NUM  AUTOSCALE
cephfs01-metadata  1775M                3.0   167.6T        0.0000                4.0        8              on
cephfs01-data0     739.5G               3.0   167.6T        0.0129                1.0       32              on
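For reference, I believe BIAS corresponds to the pool's pg_autoscale_bias option; it
shows up in the pool options here, and presumably could be changed with pool set, though
I have not touched it because I do not know whether the 4.0 on the metadata pool is an
intentional default:

# ceph osd pool ls detail | grep cephfs01-metadata
# ceph osd pool set cephfs01-metadata pg_autoscale_bias <value>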
There is also one large omap object warning.
[root@knode25 /]# ceph health detail
HEALTH_WARN 1 large omap objects
LARGE_OMAP_OBJECTS 1 large omap objects
1 large objects found in pool 'cephfs01-metadata'
Search the cluster log for 'Large omap object found' for more details.
I recently had a similar warning and was able to clear it by running a deep scrub, but I
am not sure why these keep forming or how to solve this for good.
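For the record, what I did last time was roughly the following; the object and PG came
from the cluster log as the health message suggests, and the PG ID below is just a
placeholder for the one I actually found:

# grep 'Large omap object found' /var/log/ceph/ceph.log
# ceph pg deep-scrub 2.1f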
Thanks,
Uday.