This is expected when using destroy and re-using the OSD.
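Background: `ceph osd destroy` keeps the OSD's entry, including its CRUSH
weight, in the CRUSH map, so re-creating the OSD with
`ceph-volume lvm prepare --osd-id {id}` inherits the old 10 TB weight instead
of deriving one from the new 6 TB disk. The weight set by hand below can be
computed from the disk size, since the default CRUSH weight is the capacity in
TiB. A minimal shell sketch, assuming a nominal 6 TB disk (the exact value
depends on the device's true byte count, e.g. from `blockdev --getsize64`):

```shell
#!/bin/sh
# Derive a CRUSH weight (capacity in TiB) for a replacement disk.
# disk_bytes is a hypothetical nominal 6 TB size; on a real host use
# something like: disk_bytes=$(blockdev --getsize64 /dev/sdX)
disk_bytes=6000000000000
weight=$(awk -v b="$disk_bytes" 'BEGIN { printf "%.5f", b / (1024 ^ 4) }')
echo "$weight"   # ~5.457 for a nominal 6 TB disk
# Then apply it to the re-used OSD id (requires a running cluster):
# ceph osd crush reweight osd.29 "$weight"
```

The small difference from the thread's 5.45789 comes from the real disk being
slightly larger than the nominal 6 TB.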
Paul
--
Paul Emmerich
Looking for help with your Ceph cluster? Contact us at
https://croit.io
croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90
On Thu, Sep 5, 2019 at 1:05 PM Brad Hubbard <bhubbard(a)redhat.com> wrote:
>
> +dev(a)ceph.io -ceph-devel(a)vger.kernel.org
>
> On Thu, Sep 5, 2019 at 8:56 PM Ugis <ugis22(a)gmail.com> wrote:
> >
> > Hi,
> >
> > ceph version 14.2.1 (d555a9489eb35f84f2e1ef49b77e19da9d113972) nautilus (stable)
> >
> > Yesterday I noticed unexpected behavior, probably a bug. It seems Ceph
> > wrongly calculates the OSD size if the disk is replaced with a smaller one.
> >
> > In detail:
> > Starting point: 1 OSD disk had failed, Ceph had rebalanced and the OSD
> > was marked down.
> >
> > I removed the failed disk (10 TB) and replaced it with a smaller 6 TB one.
> > Followed disk replacement instructions here:
> > https://docs.ceph.com/docs/mimic/rados/operations/add-or-rm-osds/
> >
> > Destroy the OSD first:
> > ceph osd destroy {id} --yes-i-really-mean-it
> > Zap a disk for the new OSD, if the disk was used before for other
> > purposes. It’s not necessary for a new disk:
> > ceph-volume lvm zap /dev/sdX
> > Prepare the disk for replacement by using the previously destroyed OSD id:
> > ceph-volume lvm prepare --osd-id {id} --data /dev/sdX
> > And activate the OSD:
> > ceph-volume lvm activate {id} {fsid}
> > I skipped this step as it was not clear which fsid was needed (probably
> > the Ceph cluster fsid) and just started the OSD:
> > systemctl start ceph-osd@29
> >
> > The OSD came up and rebalancing started.
> >
> > After some time Ceph started to complain as follows:
> > # ceph health detail
> > HEALTH_WARN 1 nearfull osd(s); 19 pool(s) nearfull; 10 pgs not
> > deep-scrubbed in time
> > OSD_NEARFULL 1 nearfull osd(s)
> > osd.29 is near full
> >
> > # ceph osd df tree
> > --------------------
> > ID CLASS WEIGHT  REWEIGHT SIZE    RAW USE DATA    OMAP    META    AVAIL   %USE  VAR  PGS STATUS TYPE NAME
> > ...
> > 29 hdd   9.09569 1.00000  5.5 TiB 3.3 TiB 3.3 TiB 981 KiB 4.9 GiB 2.2 TiB 59.75 0.99 590 up    osd.29
> >
> > Later I noticed that the CRUSH weight of osd.29 was still 9.09569, as for
> > the replaced 10 TB disk.
> > I did: ceph osd crush reweight osd.29 5.45789
> > Things got back to normal after the rebalance.
> >
> > I got the impression that Ceph did not realize the OSD had been replaced
> > with a smaller disk. Could that be because I skipped the activation step?
> > Or is this a bug?
> >
> > Best regards,
> > Ugis
>
>
>
> --
> Cheers,
> Brad
> _______________________________________________
> Dev mailing list -- dev(a)ceph.io
> To unsubscribe send an email to dev-leave(a)ceph.io