[ceph-users] Many ceph commands hang. broken mgr?

24 Nov 2020

Ever since we upgraded from 14.2.9 to 14.2.12 (and beyond), a lot of the ceph commands
just hang.  The mgr daemon also occasionally stops responding to our Prometheus scrapes;
a daemon restart wakes it back up.  I have nothing pointing to these being related, but
it feels that way.

I also tried to get device health monitoring with SMART up and running around the time of
that upgrade.  It never seemed able to pull in and report health across the drives.
I did see the OSD process firing off smartctl on occasion, though, so it was trying to do
something.  Again, I have nothing pointing to this being related, but it feels like it may
be.

Some commands that currently hang:
ceph osd pool autoscale-status
ceph balancer *
ceph iostat (oddly, this spit out a line of all 0 stats once and then hung)
ceph fs status
toggling ceph device monitoring on or off and a lot of the device health stuff too
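
To keep track of which commands hang after each mgr restart or failover, something like the following could be run from an admin node. It is only a sketch: the 30-second timeout is an arbitrary choice, `ceph balancer status` stands in for `ceph balancer *`, and `ceph iostat` is left out because it streams output until interrupted.

```python
import subprocess
import sys

# Commands taken from the list above; "ceph balancer status" is a stand-in
# for "ceph balancer *".
COMMANDS = [
    ["ceph", "osd", "pool", "autoscale-status"],
    ["ceph", "balancer", "status"],
    ["ceph", "fs", "status"],
]


def probe(cmd, timeout=30.0):
    """Run cmd and classify the result as 'ok', 'failed', 'hung' or 'missing'."""
    try:
        result = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before raising this.
        return "hung"
    except FileNotFoundError:
        # The binary is not installed on this host.
        return "missing"
    return "ok" if result.returncode == 0 else "failed"


if __name__ == "__main__":
    for cmd in COMMANDS:
        print(f"{probe(cmd):7} {' '.join(cmd)}", file=sys.stdout)
```

Running it right after a mgr restart and again once the Prometheus scrapes stop would show whether the two symptoms appear together.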

Mgr logs on disk show flavors of this:
2020-11-24 13:05:07.883 7f19e2c40700  0 log_channel(audit) log [DBG] : from='mon.0 -' entity='mon.' cmd=[{,",p,r,e,f,i,x,",:, ,",o,s,d, ,p,e,r,f,",,, ,",f,o,r,m,a,t,",:, ,",j,s,o,n,",}]: dispatch
2020-11-24 13:05:07.895 7f19e2c40700  0 log_channel(audit) log [DBG] : from='mon.0 -' entity='mon.' cmd=[{,",p,r,e,f,i,x,",:, ,",o,s,d, ,p,o,o,l, ,s,t,a,t,s,",,, ,",f,o,r,m,a,t,",:, ,",j,s,o,n,",}]: dispatch
2020-11-24 13:05:08.567 7f19e1c3e700  0 log_channel(cluster) log [DBG] : pgmap v587: 17149 pgs: 1 active+remapped+backfill_wait, 2 active+clean+scrubbing, 55 active+clean+scrubbing+deep, 9 active+remapped+backfilling, 17082 active+clean; 2.1 PiB data, 3.5 PiB used, 2.9 PiB / 6.4 PiB avail; 108 MiB/s rd, 53 MiB/s wr, 1.20k op/s; 7525420/9900121381 objects misplaced (0.076%); 99 MiB/s, 40 objects/s recovering
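
The cmd=[...] field in the audit lines looks like the JSON command with a comma inserted between every character. Assuming that is all that happened to it, taking every second character should recover the original command. The sketch below does that; the function name is made up for illustration, and it expects the log line to be intact on a single line (a line wrapped by a mail client would need to be rejoined with a space first).

```python
import json
import re

# Matches the cmd=[...] field of a mgr audit "dispatch" line.
CMD_FIELD = re.compile(r"cmd=\[(.*)\]: dispatch")


def decode_audit_cmd(line):
    """Return the command dict from a comma-interleaved cmd=[...] field, or None."""
    match = CMD_FIELD.search(line)
    if match is None:
        return None
    interleaved = match.group(1)
    # Assumption: every odd-indexed character is a separator comma, so the
    # even-indexed characters spell out the original JSON text.
    return json.loads(interleaved[::2])


if __name__ == "__main__":
    sample = (
        "log_channel(audit) log [DBG] : from='mon.0 -' entity='mon.' "
        'cmd=[{,",p,r,e,f,i,x,",:, ,",o,s,d, ,p,e,r,f,",,, ,",f,o,r,m,a,t,",:, ,",j,s,o,n,",}]'
        ": dispatch"
    )
    print(decode_audit_cmd(sample))
    # -> {'prefix': 'osd perf', 'format': 'json'}
```

Decoded this way, the two audit entries are just "osd perf" and "osd pool stats" requests in JSON format.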

ceph status:
  cluster:
    id:     971a5242-f00d-421e-9bf4-5a716fcc843a
    health: HEALTH_WARN
            1 nearfull osd(s)
            1 pool(s) nearfull

  services:
    mon: 3 daemons, quorum ceph-mon-01,ceph-mon-03,ceph-mon-02 (age 4h)
    mgr: ceph-mon-01(active, since 97s), standbys: ceph-mon-03, ceph-mon-02
    mds: cephfs:1 {0=ceph-mds-02=up:active} 3 up:standby
    osd: 843 osds: 843 up (since 13d), 843 in (since 2w); 10 remapped pgs
    rgw: 1 daemon active (ceph-rgw-01)

  task status:
    scrub status:
        mds.ceph-mds-02: idle

  data:
    pools:   16 pools, 17149 pgs
    objects: 1.61G objects, 2.1 PiB
    usage:   3.5 PiB used, 2.9 PiB / 6.4 PiB avail
    pgs:     6482000/9900825469 objects misplaced (0.065%)
             17080 active+clean
             54    active+clean+scrubbing+deep
             9     active+remapped+backfilling
             5     active+clean+scrubbing
             1     active+remapped+backfill_wait

  io:
    client:   877 MiB/s rd, 1.8 GiB/s wr, 1.91k op/s rd, 3.33k op/s wr
    recovery: 136 MiB/s, 55 objects/s

ceph config dump:
WHO                MASK LEVEL    OPTION                                         VALUE                                             RO
global                  advanced cluster_network                                192.168.42.0/24                                   *
global                  advanced mon_max_pg_per_osd                             400
global                  advanced mon_pg_warn_max_object_skew                    -1.000000
global                  dev      mon_warn_on_pool_pg_num_not_power_of_two       false
global                  advanced osd_max_backfills                              2
global                  advanced osd_max_scrubs                                 4
global                  advanced osd_scrub_during_recovery                      false
global                  advanced public_network                                 1xx.xx.171.0/24 10.16.171.0/24                    *
  mon                   advanced mon_allow_pool_delete                          true
  mgr                   advanced mgr/balancer/mode                              none
  mgr                   advanced mgr/devicehealth/enable_monitoring             false
  osd                   advanced bluestore_compression_mode                     passive
  osd                   advanced osd_deep_scrub_large_omap_object_key_threshold 2000000
  osd                   advanced osd_op_queue_cut_off                           high                                              *
  osd                   advanced osd_scrub_load_threshold                       5.000000
  mds                   advanced mds_beacon_grace                               300.000000
  mds                   basic    mds_cache_memory_limit                         16384000000
  mds                   advanced mds_log_max_segments                           256
  client                advanced rbd_default_features                           5
    client.libvirt      advanced admin_socket                                   /var/run/ceph/$cluster-$type.$id.$pid.$cctid.asok *
    client.libvirt      basic    log_file                                       /var/log/ceph/qemu-guest-$pid.log                 *

/etc/ceph/ceph.conf is the stub file with fsid and the mons listed.
Yes, I have a drive that just started to tickle the nearfull warning limit.  That's what
pulled me back into the "I should fix this" mode.  I'm manually adjusting
the weight on that one for the time being, along with slowly lowering pg_num on an
oversized pool.  The cluster still has this issue when in HEALTH_OK.

I'm free to do a lot of debugging and poking around even though this is our production
cluster.  The only service I refuse to play around with is the MDS.  That one bites back. 
Does anyone have more ideas on where to look to try and figure out what's going on?

--
Paul Mezzanini
Sr Systems Administrator / Engineer, Research Computing
Information & Technology Services
Finance & Administration
Rochester Institute of Technology
o:(585) 475-3245 | pfmeec(a)rit.edu
