Hi,
yesterday I had to power off some VMs (Proxmox) backed by RBD images for maintenance.
After the VMs were off, I tried to create a snapshot, which didn't finish even after
half an hour.
Because it was a maintenance window, I rebooted all VM nodes and all Ceph nodes - nothing changed.
Powering on the VMs was impossible; KVM exited with a timeout.
This happened to two of about 15 VMs.
Two of the three images of one VM still had locks, which I removed, but I was still unable
to power them on.
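For reference, listing and removing the locks went roughly like this (a sketch; the pool
name is taken from the snap purge command in the addendum below):

# list current locks on the image
rbd lock ls rbd_hdd_1.8tb_01_3t/vm-29009-disk-2
# remove one lock; <lock-id> and <locker> (e.g. client.1234) come from the list output
rbd lock rm rbd_hdd_1.8tb_01_3t/vm-29009-disk-2 "<lock-id>" <locker>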
I tried to access the image by mapping it with rbd-nbd, which was unsuccessful and logged
this:
[ 8601.746971] block nbd0: Connection timed out
[ 8601.747648] block nbd0: shutting down sockets
[ 8601.747653] block nbd0: Connection timed out
[...]
[ 8601.750419] block nbd0: Connection timed out
[ 8601.750831] print_req_error: 121 callbacks suppressed
[ 8601.750832] blk_update_request: I/O error, dev nbd0, sector 0 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
[ 8601.751261] buffer_io_error: 182 callbacks suppressed
[ 8601.751262] Buffer I/O error on dev nbd0, logical block 0, async page read
[ 8601.751678] blk_update_request: I/O error, dev nbd0, sector 1 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
[...]
[ 8601.760283] ldm_validate_partition_table(): Disk read failed.
[ 8601.760344] Dev nbd0: unable to read RDB block 0
[ 8601.760985] nbd0: unable to read partition table
[ 8601.761282] nbd0: detected capacity change from 0 to 375809638400
[ 8601.761382] ldm_validate_partition_table(): Disk read failed.
[ 8601.761461] Dev nbd0: unable to read RDB block 0
[ 8601.762145] nbd0: unable to read partition table
The rbd-nbd process kept running and had to be killed.
The same thing happened with qemu-nbd.
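The map attempts were roughly as follows (a sketch; the exact options may have differed,
and /dev/nbd0 is just the first free nbd device):

rbd-nbd map rbd_hdd_1.8tb_01_3t/vm-29009-disk-2
qemu-nbd --connect=/dev/nbd0 rbd:rbd_hdd_1.8tb_01_3t/vm-29009-disk-2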
Exporting the image via rbd export worked fine, as did an rbd copy.
Any other operation on the image (feature disable/enable) took forever, so I had to abort
it.
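In commands, roughly (a sketch; the export path and the copy's destination name are made
up for illustration):

# these completed fine
rbd export rbd_hdd_1.8tb_01_3t/vm-29009-disk-2 /backup/vm-29009-disk-2.raw
rbd cp rbd_hdd_1.8tb_01_3t/vm-29009-disk-2 rbd_hdd_1.8tb_01_3t/vm-29009-disk-2-copy
# this kind of operation hung and had to be aborted
rbd feature disable rbd_hdd_1.8tb_01_3t/vm-29009-disk-2 journaling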
It seems that every operation leaves a lock on the image.
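One way to see which clients still hold the image open is rbd status, which lists the
watchers (same assumed pool name):

rbd status rbd_hdd_1.8tb_01_3t/vm-29009-disk-2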
Because it was in the middle of the night, I stopped working on it.
This morning one of the images was accessible again, the others were not.
Does anybody have a hint?
Some system information below.
Regards,
Yves
ceph version 14.2.5 (3ce7517553bdd5195b68a6ffaf0bd7f3acad1647) nautilus (stable)
Primary cluster with a backup cluster (rbd-mirror)
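Since rbd-mirror with journaling is in play, the mirroring status may be worth a look as
well (a sketch, same assumed pool name):

rbd mirror pool status rbd_hdd_1.8tb_01_3t
rbd mirror image status rbd_hdd_1.8tb_01_3t/vm-29009-disk-2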
[global]
auth client required = none
auth cluster required = none
auth service required = none
auth supported = none
cephx_sign_messages = false
cephx require signatures = False
cluster_network = 172.16.230.0/24
debug asok = 0/0
debug auth = 0/0
debug bdev = 0/0
debug bluefs = 0/0
debug bluestore = 0/0
debug buffer = 0/0
debug civetweb = 0/0
debug client = 0/0
debug compressor = 0/0
debug context = 0/0
debug crush = 0/0
debug crypto = 0/0
debug dpdk = 0/0
debug eventtrace = 0/0
debug filer = 0/0
debug filestore = 0/0
debug finisher = 0/0
debug fuse = 0/0
debug heartbeatmap = 0/0
debug javaclient = 0/0
debug journal = 0/0
debug journaler = 0/0
debug kinetic = 0/0
debug kstore = 0/0
debug leveldb = 0/0
debug lockdep = 0/0
debug mds = 0/0
debug mds balancer = 0/0
debug mds locker = 0/0
debug mds log = 0/0
debug mds log expire = 0/0
debug mds migrator = 0/0
debug memdb = 0/0
debug mgr = 0/0
debug mgrc = 0/0
debug mon = 0/0
debug monc = 0/0
debug ms = 0/0
debug none = 0/0
debug objclass = 0/0
debug objectcacher = 0/0
debug objecter = 0/0
debug optracker = 0/0
debug osd = 0/0
debug paxos = 0/0
debug perfcounter = 0/0
debug rados = 0/0
debug rbd = 0/0
debug rbd mirror = 0/0
debug rbd replay = 0/0
debug refs = 0/0
debug reserver = 0/0
debug rgw = 0/0
debug rocksdb = 0/0
debug striper = 0/0
debug throttle = 0/0
debug timer = 0/0
debug tp = 0/0
debug xio = 0/0
fsid = 27fdf1bb-22a1-4d5e-9729-780cbdcd33fe
mon_allow_pool_delete = true
mon_host = 172.16.230.142 172.16.230.144 172.16.230.146
mon_osd_down_out_subtree_limit = host
osd_backfill_scan_max = 16
osd_backfill_scan_min = 4
osd_deep_scrub_interval = 1209600
osd_journal_size = 5120
osd_max_backfills = 1
osd_max_trimming_pgs = 1
osd_pg_max_concurrent_snap_trims = 1
osd_pool_default_min_size = 2
osd_pool_default_size = 3
osd_recovery_max_active = 1
osd_recovery_max_single_start = 1
osd_recovery_op_priority = 1
osd_recovery_threads = 1
osd_scrub_begin_hour = 19
osd_scrub_chunk_max = 1
osd_scrub_chunk_min = 1
osd_scrub_during_recovery = false
osd_scrub_end_hour = 6
osd_scrub_priority = 1
osd_scrub_sleep = 0.5
osd_snap_trim_priority = 1
osd_snap_trim_sleep = 0.005
osd_scrub_max_interval = 1209600
public_network = 172.16.230.0/24
max open files = 131072
osd objectstore = bluestore
osd op threads = 2
osd crush update on start = true
Currently inaccessible image:
rbd image 'vm-29009-disk-2':
    size 200 GiB in 51200 objects
    order 22 (4 MiB objects)
    snapshot_count: 2
    id: 1abd04da8b9a4d
    block_name_prefix: rbd_data.1abd04da8b9a4d
    format: 2
    features: layering, exclusive-lock, object-map, fast-diff, deep-flatten, journaling
    op_features:
    flags:
    create_timestamp: Tue Jul 9 13:07:36 2019
    access_timestamp: Thu Dec 19 01:35:34 2019
    modify_timestamp: Thu Dec 19 00:19:32 2019
    journal: 1abd04da8b9a4d
    mirroring state: enabled
    mirroring global id: c71ec81f-18be-4d0b-93ed-0cebe3e619bb
    mirroring primary: true
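(The output above is from rbd info. Since the journaling feature is enabled, the image
journal can be inspected too; a sketch, using the journal id shown above and the assumed
pool name:)

rbd info rbd_hdd_1.8tb_01_3t/vm-29009-disk-2
# shows the journal's active/minimum sets and its registered clients (e.g. the rbd-mirror peer)
rbd journal status rbd_hdd_1.8tb_01_3t/1abd04da8b9a4d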
Addendum:
If I try to purge the snapshots, the following happens:
rbd snap purge rbd_hdd_1.8tb_01_3t/vm-29009-disk-2
Removing all snapshots: 50% complete...failed.
rbd: removing snaps failed: (2) No such file or directory
Despite the error, rbd ls -l no longer shows any snapshots.
After this, the image is accessible again!
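To double-check that the snapshots are really gone (same assumed pool name):

rbd snap ls rbd_hdd_1.8tb_01_3t/vm-29009-disk-2
rbd ls -l rbd_hdd_1.8tb_01_3t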
I don't have a lot of experience with rbd-nbd, but I suppose it works the same as with rbd.
We use Xen as hypervisor, and sometimes when there is a crash, we need to remove the locks
on the volumes when remapping them, as these are dead locks.
Now, removing the locks will sometimes put a blacklist entry on these addresses with the
client id, and then you just need to remove the blacklist entries. To check the blacklist,
do "ceph osd dump | grep blacklist".
Or put it in a script as well:

for i in $(ceph osd dump | grep blacklist | awk '{print $2}'); do ceph osd blacklist rm "$i"; done
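As a side note, ceph osd blacklist ls lists just the blacklist entries (the address is in
the first column), which avoids grepping the full osd dump:

ceph osd blacklist ls
ceph osd blacklist rm <addr>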
This article helped me a while ago to understand the dead locks on RBD volumes:
https://der-jd.de/blog/2018/12/27/openstack-ceph-luminous-upgrade.html