I've been struggling with this one for a few days now. We had an OSD report as near-full a few days ago. This has happened a couple of times before, and a reweight-by-utilization has sorted it out in the past. I tried the same again, but this time we ended up with a couple of PGs in a backfill_toofull state and a handful of misplaced objects as a result.
I tried the reweight a few more times and it has been moving data around. Another OSD did trigger the near-full alert, but running the reweight a couple more times seems to have spread that data around a bit better. However, the original near-full OSD doesn't seem to have changed much and the backfill_toofull PGs are still there. I'd keep doing the reweight-by-utilization, but I'm not sure whether I'm heading down the right path and whether it will eventually sort itself out.
We have 14 pools, but the vast majority of data resides in just one of those pools (pool 20). The pgs in the backfill state are in pool 2 (as far as I can tell). That particular pool is used for some cephfs stuff and has a handful of large files in there (not sure if this is significant to the problem).
All up, our utilization is showing as 55.13%, but some of our OSDs are showing as 76% in use, with the problem OSD sitting at 85.02%. Right now, I'm just not sure what the proper corrective action is. The last couple of reweights I've run have been a bit more targeted, in that I've set them to only operate on two OSDs at a time. If I run a test-reweight targeting only one OSD, it does say it will reweight osd.9 (the one at 85.02%). I gather this will move data away from this OSD and potentially get it below the threshold. However, at one point in the past couple of days it showed no OSDs in a near-full state, yet the two PGs in backfill_toofull didn't change. That's why I'm not sure continually reweighting is going to solve this issue.
I'm a long way from knowledgeable on Ceph, so I'm not really sure what information is useful here. Here's a bit of info on what I'm seeing; I can provide anything else that might help.
Basically, we have a three-node cluster, but only two of the nodes have OSDs. The third is there simply to enable a quorum to be established. The OSDs are evenly spread across these two nodes and the configuration of each is identical. We are running Jewel and are not in a position to upgrade at this stage.
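For what it's worth, the reweight runs I've been doing look roughly like this (a sketch of my own process; the threshold / max-change / max-OSDs arguments are just values I've been experimenting with, not a recommendation). A dry run first to see which OSDs would be touched:
# ceph osd test-reweight-by-utilization 120 0.05 2
then the real run with the same arguments:
# ceph osd reweight-by-utilization 120 0.05 2
and I gather I could also push osd.9's weight down manually with something like:
# ceph osd reweight 9 0.85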
# ceph --version
ceph version 10.2.11 (e4b061b47f07f583c92a050d9e84b1813a35671e)
# ceph health detail
HEALTH_WARN 2 pgs backfill_toofull; 2 pgs stuck unclean; recovery 33/62099566 objects misplaced (0.000%); 1 near full osd(s)
pg 2.52 is stuck unclean for 201822.031280, current state active+remapped+backfill_toofull, last acting [17,3]
pg 2.18 is stuck unclean for 202114.617682, current state active+remapped+backfill_toofull, last acting [18,2]
pg 2.18 is active+remapped+backfill_toofull, acting [18,2]
pg 2.52 is active+remapped+backfill_toofull, acting [17,3]
recovery 33/62099566 objects misplaced (0.000%)
osd.9 is near full at 85%
# ceph osd df
ID WEIGHT REWEIGHT SIZE USE AVAIL %USE VAR PGS
2 1.37790 1.00000 1410G 842G 496G 59.75 1.08 33
3 1.37790 0.45013 1410G 1079G 259G 76.49 1.39 21
4 1.37790 0.95001 1410G 1086G 253G 76.98 1.40 44
5 1.37790 1.00000 1410G 617G 722G 43.74 0.79 43
6 1.37790 0.65009 1410G 616G 722G 43.69 0.79 39
7 1.37790 0.95001 1410G 495G 844G 35.10 0.64 40
8 1.37790 1.00000 1410G 732G 606G 51.93 0.94 52
9 1.37790 0.70007 1410G 1199G 139G 85.02 1.54 37
10 1.37790 1.00000 1410G 611G 727G 43.35 0.79 41
11 1.37790 0.75006 1410G 495G 843G 35.11 0.64 32
0 1.37790 1.00000 1410G 731G 608G 51.82 0.94 43
12 1.37790 1.00000 1410G 851G 487G 60.36 1.09 44
13 1.37790 1.00000 1410G 378G 960G 26.82 0.49 38
14 1.37790 1.00000 1410G 969G 370G 68.68 1.25 37
15 1.37790 1.00000 1410G 724G 614G 51.35 0.93 35
16 1.37790 1.00000 1410G 491G 847G 34.84 0.63 43
17 1.37790 1.00000 1410G 862G 476G 61.16 1.11 50
18 1.37790 0.80005 1410G 1083G 255G 76.78 1.39 26
19 1.37790 0.65009 1410G 963G 375G 68.29 1.24 23
20 1.37790 1.00000 1410G 724G 614G 51.38 0.93 42
TOTAL 28219G 15557G 11227G 55.13
MIN/MAX VAR: 0.49/1.54 STDDEV: 15.57
# ceph pg ls backfill_toofull
pg_stat objects mip degr misp unf bytes log disklog state state_stamp v reported up up_primary acting acting_primary last_scrub scrub_stamp last_deep_scrub deep_scrub_stamp
2.18 9 0 0 18 0 0 3653 3653 active+remapped+backfill_toofull 2020-10-29 05:31:20.429912 610'549153 656:390372 [9,12] 9 [18,2] 18 594'547482 2020-10-25 20:28:39.680744 594'543841 2020-10-21 21:21:33.092868
2.52 15 0 0 15 0 0 4883 4883 active+remapped+backfill_toofull 2020-10-29 05:31:28.277898 652'502085 656:367288 [17,9] 17 [17,3] 17 594'499108 2020-10-26 11:06:48.417825 594'499108 2020-10-26 11:06:48.417825
pool : 17 18 19 11 20 21 12 13 0 14 1 15 2 16 | SUM
--------------------------------------------------------------------------------------------------------------------------------
osd.4 3 0 0 0 9 2 0 0 12 1 9 0 7 1 | 44
osd.17 1 0 0 0 7 3 1 0 8 1 17 1 11 0 | 50
osd.18 0 0 0 0 9 0 0 0 4 0 7 0 5 0 | 25
osd.5 0 0 0 2 5 1 1 0 5 0 16 0 11 2 | 43
osd.6 0 1 0 1 5 2 0 0 9 0 13 1 7 0 | 39
osd.19 0 0 1 0 8 2 0 1 2 0 6 0 3 0 | 23
osd.7 0 0 0 0 4 1 1 0 3 0 12 0 19 0 | 40
osd.8 0 1 0 0 6 3 0 2 10 1 13 1 15 0 | 52
osd.9 1 0 2 0 10 2 0 0 4 1 6 1 10 0 | 37
osd.10 0 0 1 1 5 2 0 1 7 0 12 0 11 1 | 41
osd.20 1 0 0 0 6 1 0 1 7 0 8 1 17 0 | 42
osd.11 0 0 0 0 4 1 1 1 5 0 11 0 9 0 | 32
osd.12 0 0 1 1 7 1 0 0 5 1 12 1 14 1 | 44
osd.13 0 2 0 0 3 1 0 0 10 1 11 0 10 0 | 38
osd.0 0 1 0 1 6 3 0 1 7 0 11 0 13 0 | 43
osd.14 1 0 0 0 8 1 1 0 4 1 12 0 9 0 | 37
osd.15 1 0 2 1 6 1 1 0 8 0 7 0 6 2 | 35
osd.2 0 2 1 0 7 2 1 0 7 1 4 1 6 0 | 32
osd.3 0 0 0 0 9 0 0 0 2 0 4 0 5 0 | 20
osd.16 0 1 0 1 4 3 1 1 9 0 9 1 12 1 | 43
--------------------------------------------------------------------------------------------------------------------------------
SUM : 8 8 8 8 128 32 8 8 128 8 200 8 200 8 |
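One thing I keep wondering about is whether the backfill-full threshold itself is what is blocking those two PGs, since osd.9 is sitting right on 85% and I believe the Jewel default for osd_backfill_full_ratio is 0.85. If that's the case, I'm assuming something along these lines would let me check the current value (on the node hosting osd.9) and temporarily bump it so the PGs can backfill away from osd.9, putting it back to 0.85 once things are healthy again (commands as I understand them, happy to be corrected):
# ceph daemon osd.9 config get osd_backfill_full_ratio
# ceph tell osd.\* injectargs '--osd_backfill_full_ratio 0.9'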
Hi:
I have this ceph status:
-----------------------------------------------------------------------------
cluster:
id: 039bf268-b5a6-11e9-bbb7-d06726ca4a78
health: HEALTH_WARN
noout flag(s) set
1 osds down
Reduced data availability: 191 pgs inactive, 2 pgs down, 35 pgs incomplete, 290 pgs stale
5 pgs not deep-scrubbed in time
7 pgs not scrubbed in time
327 slow ops, oldest one blocked for 233398 sec, daemons [osd.12,osd.36,osd.5] have slow ops.
services:
mon: 1 daemons, quorum fond-beagle (age 23h)
mgr: fond-beagle(active, since 7h)
osd: 48 osds: 45 up (since 95s), 46 in (since 8h); 4 remapped pgs
flags noout
data:
pools: 7 pools, 2305 pgs
objects: 350.37k objects, 1.5 TiB
usage: 3.0 TiB used, 38 TiB / 41 TiB avail
pgs: 6.681% pgs unknown
1.605% pgs not active
1835 active+clean
279 stale+active+clean
154 unknown
22 incomplete
10 stale+incomplete
2 down
2 remapped+incomplete
1 stale+remapped+incomplete
--------------------------------------------------------------------------------------------
How can I fix all of the unknown, incomplete, remapped+incomplete, etc. PGs? I don't care if I need to remove PGs.
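So far the only things I have found to try are roughly the following (I'm not sure these are the right commands, so please correct me). First, find the down OSD and why each PG is stuck:
# ceph osd tree down
# ceph pg dump_stuck inactive
# ceph pg 2.28 query
(2.28 is just an example PG id here.) And, since I don't mind losing the data, I understand an incomplete PG can be recreated empty as a last resort:
# ceph osd force-create-pg 2.28 --yes-i-really-mean-it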
Hi,
I already submitted a ticket: https://tracker.ceph.com/issues/47951
Maybe other people noticed this as well.
Situation:
- Cluster is running IPv6
- mon_host is set to a DNS entry
- DNS entry is a Round Robin with three AAAA-records
root@wido-standard-benchmark:~# ceph -s
unable to parse addrs in 'mon.objects.xx.xxx.net'
[errno 22] error connecting to the cluster
root@wido-standard-benchmark:~#
The relevant part of the ceph.conf:
[global]
auth_client_required = cephx
auth_cluster_required = cephx
auth_service_required = cephx
mon_host = mon.objects.xxx.xxx.xxx
ms_bind_ipv6 = true
This works fine with 14.2.11 and breaks under 14.2.12
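As a temporary workaround I assume the client could point mon_host at the monitor addresses directly instead of the round-robin name, along these lines (the addresses here are placeholders, not our real ones):
mon_host = [v2:[2001:db8::a]:3300,v1:[2001:db8::a]:6789] [v2:[2001:db8::b]:3300,v1:[2001:db8::b]:6789] [v2:[2001:db8::c]:3300,v1:[2001:db8::c]:6789]
but that rather defeats the purpose of having the DNS Round Robin in the first place.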
Anybody else seeing this as well?
Wido
Dear all,
after breaking my experimental 1-host Ceph cluster and making one of its PGs 'incomplete', I left it in an abandoned state for some time.
Now I have decided to bring it back to life, but I found that it cannot start one of its OSDs (osd.1, to name it).
"ceph osd df" shows :
ID CLASS WEIGHT REWEIGHT SIZE RAW USE DATA OMAP META AVAIL %USE VAR PGS STATUS
0 hdd 0 1.00000 2.7 TiB 1.6 TiB 1.6 TiB 113 MiB 4.7 GiB 1.1 TiB 59.77 0.69 102 up
1 hdd 2.84549 0 0 B 0 B 0 B 0 B 0 B 0 B 0 0 0 down
2 hdd 2.84549 1.00000 2.8 TiB 2.6 TiB 2.5 TiB 57 MiB 3.8 GiB 275 GiB 90.58 1.05 176 up
3 hdd 2.84549 1.00000 2.8 TiB 2.6 TiB 2.5 TiB 57 MiB 3.9 GiB 271 GiB 90.69 1.05 185 up
4 hdd 2.84549 1.00000 2.8 TiB 2.6 TiB 2.5 TiB 63 MiB 4.2 GiB 263 GiB 90.98 1.05 184 up
5 hdd 2.84549 1.00000 2.8 TiB 2.6 TiB 2.5 TiB 52 MiB 3.8 GiB 263 GiB 90.96 1.05 178 up
6 hdd 2.53400 1.00000 2.5 TiB 2.3 TiB 2.3 TiB 173 MiB 5.2 GiB 228 GiB 91.21 1.05 178 up
7 hdd 2.53400 1.00000 2.5 TiB 2.3 TiB 2.3 TiB 147 MiB 5.2 GiB 230 GiB 91.12 1.05 168 up
TOTAL 19 TiB 17 TiB 16 TiB 662 MiB 31 GiB 2.6 TiB 86.48
MIN/MAX VAR: 0.69/1.05 STDDEV: 10.90
"ceph device ls" shows :
DEVICE HOST:DEV DAEMONS LIFE EXPECTANCY
GIGABYTE_GP-ASACNE2100TTTDR_SN191108950380 p10s:nvme0n1 osd.1 osd.2 osd.3 osd.4 osd.5
WDC_WD30EFRX-68N32N0_WD-WCC7K1JJXVST p10s:sdd osd.1
WDC_WD30EFRX-68N32N0_WD-WCC7K1VUYPRA p10s:sda osd.6
WDC_WD30EFRX-68N32N0_WD-WCC7K2CKX8NT p10s:sdb osd.7
WDC_WD30EFRX-68N32N0_WD-WCC7K2UD8H74 p10s:sde osd.2
WDC_WD30EFRX-68N32N0_WD-WCC7K2VFTR1F p10s:sdh osd.5
WDC_WD30EFRX-68N32N0_WD-WCC7K3CYKL87 p10s:sdf osd.3
WDC_WD30EFRX-68N32N0_WD-WCC7K6FPZAJP p10s:sdc osd.0
WDC_WD30EFRX-68N32N0_WD-WCC7K7FXSCRN p10s:sdg osd.4
In my last migration, I created a bluestore volume with an external block.db like this:
"ceph-volume lvm prepare --bluestore --data /dev/sdd1 --block.db /dev/nvme0n1p4"
And I can see this metadata by
"ceph-bluestore-tool show-label --dev /dev/ceph-e53b65ba-5eb0-44f5-9160-a2328f787a0f/osd-block-8c6324a3-0364-4fad-9dcb-81a1661ee202":
{
"/dev/ceph-e53b65ba-5eb0-44f5-9160-a2328f787a0f/osd-block-8c6324a3-0364-4fad-9dcb-81a1661ee202": {
"osd_uuid": "8c6324a3-0364-4fad-9dcb-81a1661ee202",
"size": 3000588304384,
"btime": "2020-07-12T11:34:16.579735+0300",
"description": "main",
"bfm_blocks": "45785344",
"bfm_blocks_per_key": "128",
"bfm_bytes_per_block": "65536",
"bfm_size": "3000588304384",
"bluefs": "1",
"ceph_fsid": "49cdfe90-6f6e-4afe-8558-bf14a13aadfa",
"kv_backend": "rocksdb",
"magic": "ceph osd volume v026",
"mkfs_done": "yes",
"osd_key": "AQD9ygpf+7+MABAAqtj4y1YYgxwCaAN/jgDSwg==",
"ready": "ready",
"require_osd_release": "14",
"whoami": "1"
}
}
and by
"ceph-bluestore-tool show-label --dev /dev/nvme0n1p4":
{
"/dev/nvme0n1p4": {
"osd_uuid": "8c6324a3-0364-4fad-9dcb-81a1661ee202",
"size": 128025886720,
"btime": "2020-07-12T11:34:16.592054+0300",
"description": "bluefs db"
}
}
As you can see, their osd_uuid values are equal.
But when I try to start it by hand with "systemctl restart ceph-osd@1",
I get this in the logs ("journalctl -b -u ceph-osd@1"):
-- Logs begin at Tue 2020-10-13 19:09:49 EEST, end at Fri 2020-10-23 16:59:38 EEST. --
жов 23 16:59:36 p10s systemd[1]: Starting Ceph object storage daemon osd.1...
жов 23 16:59:36 p10s systemd[1]: Started Ceph object storage daemon osd.1.
жов 23 16:59:36 p10s ceph-osd[3987]: 2020-10-23T16:59:36.943+0300 7f513cebedc0 -1 auth: unable to find a keyring on /var/lib/ceph/osd/ceph-1/keyring: (2) No
such file or directory
жов 23 16:59:36 p10s ceph-osd[3987]: 2020-10-23T16:59:36.943+0300 7f513cebedc0 -1 auth: unable to find a keyring on /var/lib/ceph/osd/ceph-1/keyring: (2) No
such file or directory
жов 23 16:59:36 p10s ceph-osd[3987]: 2020-10-23T16:59:36.943+0300 7f513cebedc0 -1 AuthRegistry(0x560776222940) no keyring found at
/var/lib/ceph/osd/ceph-1/keyring, disabling cephx
жов 23 16:59:36 p10s ceph-osd[3987]: 2020-10-23T16:59:36.943+0300 7f513cebedc0 -1 AuthRegistry(0x560776222940) no keyring found at
/var/lib/ceph/osd/ceph-1/keyring, disabling cephx
жов 23 16:59:36 p10s ceph-osd[3987]: 2020-10-23T16:59:36.947+0300 7f513cebedc0 -1 auth: unable to find a keyring on /var/lib/ceph/osd/ceph-1/keyring: (2) No
such file or directory
жов 23 16:59:36 p10s ceph-osd[3987]: 2020-10-23T16:59:36.947+0300 7f513cebedc0 -1 auth: unable to find a keyring on /var/lib/ceph/osd/ceph-1/keyring: (2) No
such file or directory
жов 23 16:59:36 p10s ceph-osd[3987]: 2020-10-23T16:59:36.947+0300 7f513cebedc0 -1 AuthRegistry(0x7fff46ea5d80) no keyring found at
/var/lib/ceph/osd/ceph-1/keyring, disabling cephx
жов 23 16:59:36 p10s ceph-osd[3987]: 2020-10-23T16:59:36.947+0300 7f513cebedc0 -1 AuthRegistry(0x7fff46ea5d80) no keyring found at
/var/lib/ceph/osd/ceph-1/keyring, disabling cephx
жов 23 16:59:36 p10s ceph-osd[3987]: failed to fetch mon config (--no-mon-config to skip)
жов 23 16:59:36 p10s systemd[1]: ceph-osd@1.service: Main process exited, code=exited, status=1/FAILURE
жов 23 16:59:36 p10s systemd[1]: ceph-osd@1.service: Failed with result 'exit-code'.
So my question is: how do I make this OSD known to the Ceph cluster again without recreating it from scratch with ceph-volume?
I see that every folder under "/var/lib/ceph/osd/" is a tmpfs mount point filled with the appropriate files and symlinks, except for "/var/lib/ceph/osd/ceph-1",
which is just an empty folder not mounted anywhere.
I tried to run
"ceph-bluestore-tool prime-osd-dir --dev /dev/ceph-e53b65ba-5eb0-44f5-9160-a2328f787a0f/osd-block-8c6324a3-0364-4fad-9dcb-81a1661ee202 --path /var/lib/ceph/osd/ceph-1"
It created some files under /var/lib/ceph/osd/ceph-1, but without a tmpfs mount, and the files were owned by root. I changed the ownership of those files to ceph:ceph and created the appropriate symlinks for block and block.db, but ceph-osd@1 still would not start; only the "unable to find a keyring" messages disappeared.
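What I was thinking of trying next (not sure whether this is the right direction) is to let ceph-volume rebuild the tmpfs mount from the LVM tags, and to pull the OSD's keyring from the cluster if that is what is still missing:
# ceph-volume lvm activate 1 8c6324a3-0364-4fad-9dcb-81a1661ee202
# ceph auth get osd.1 -o /var/lib/ceph/osd/ceph-1/keyring
but I don't know whether activate will cope with the current half-primed state of that directory.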
Any hints on where to go next would be much appreciated.
Thanks in advance for your help.
Dear cephers,
I have a somewhat strange situation. I have the health warning:
# ceph health detail
HEALTH_WARN 3 clients failing to respond to capability release
MDS_CLIENT_LATE_RELEASE 3 clients failing to respond to capability release
mdsceph-12(mds.0): Client sn106.hpc.ait.dtu.dk:con-fs2-hpc failing to respond to capability release client_id: 30716617
mdsceph-12(mds.0): Client sn269.hpc.ait.dtu.dk:con-fs2-hpc failing to respond to capability release client_id: 30717358
mdsceph-12(mds.0): Client sn009.hpc.ait.dtu.dk:con-fs2-hpc failing to respond to capability release client_id: 30749150
However, these clients are not busy right now. Also, they hold almost nothing; see the snippets from "session ls" below. It is possible that a very IO-intensive application was running on these nodes and these release requests got stuck. How do I resolve this issue? Can I just evict the clients?
The version is mimic 13.2.8. Note that we execute a drop-cache command after a job finishes on these clients. It's possible that the clients had already dropped the caps before the MDS request was handled/received.
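For reference, what I am considering (unless there is a gentler way) is to double-check the sessions from the MDS side and then evict the three clients, roughly:
# ceph tell mds.0 client ls
# ceph tell mds.0 client evict id=30716617
My understanding is that eviction forcibly drops the client's caps and blacklists it, so I would rather hear first whether there is a safer way to clear this warning.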
Best regards,
Frank
{
"id": 30717358,
"num_leases": 0,
"num_caps": 44,
"state": "open",
"request_load_avg": 0,
"uptime": 6632206.332307,
"replay_requests": 0,
"completed_requests": 0,
"reconnecting": false,
"inst": "client.30717358 192.168.57.140:0/3212676185",
"client_metadata": {
"features": "00000000000000ff",
"entity_id": "con-fs2-hpc",
"hostname": "sn269.hpc.ait.dtu.dk",
"kernel_version": "3.10.0-957.12.2.el7.x86_64",
"root": "/hpc/home"
}
},
--
{
"id": 30716617,
"num_leases": 0,
"num_caps": 48,
"state": "open",
"request_load_avg": 1,
"uptime": 6632206.336307,
"replay_requests": 0,
"completed_requests": 1,
"reconnecting": false,
"inst": "client.30716617 192.168.56.233:0/2770977433",
"client_metadata": {
"features": "00000000000000ff",
"entity_id": "con-fs2-hpc",
"hostname": "sn106.hpc.ait.dtu.dk",
"kernel_version": "3.10.0-957.12.2.el7.x86_64",
"root": "/hpc/home"
}
},
--
{
"id": 30749150,
"num_leases": 0,
"num_caps": 44,
"state": "open",
"request_load_avg": 0,
"uptime": 6632206.338307,
"replay_requests": 0,
"completed_requests": 0,
"reconnecting": false,
"inst": "client.30749150 192.168.56.136:0/2578719015",
"client_metadata": {
"features": "00000000000000ff",
"entity_id": "con-fs2-hpc",
"hostname": "sn009.hpc.ait.dtu.dk",
"kernel_version": "3.10.0-957.12.2.el7.x86_64",
"root": "/hpc/home"
}
},
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
Hi,
We're running Octopus and we have 3 control-plane nodes (12 cores, 64 GB memory each) that are running mon, mds, and mgr, plus 4 data nodes (12 cores, 256 GB memory, 13x10TB HDDs each). We increased the number of PGs in our pool, which resulted in all OSDs going crazy and constantly reading an average of 900 M/s (based on iotop).
This has resulted in slow ops and a very low recovery speed. Any tips on how to handle this kind of situation? We have osd_recovery_sleep_hdd set to 0.2, osd_recovery_max_active set to 5, and osd_max_backfills set to 4. Some OSDs are reporting slow ops constantly, and iowait on the machines is constantly at 70-80%.
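For now I'm considering throttling recovery further until client I/O recovers, something like the following (I believe these can all be changed at runtime in Octopus; the values are just a first guess):
# ceph config set osd osd_max_backfills 1
# ceph config set osd osd_recovery_max_active 1
# ceph config set osd osd_recovery_sleep_hdd 0.5
# ceph config show osd.0 | grep -e osd_max_backfills -e osd_recovery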
Hi:
I tried to get info from an RBD image, but:
-------------------------------------------------------------------------
root@fond-beagle:/# rbd list --pool cinder-ceph | grep volume-dfcca6c8-cb96-4b79-bc85-b200a061dcda
> volume-dfcca6c8-cb96-4b79-bc85-b200a061dcda
root@fond-beagle:/# rbd info --pool cinder-ceph volume-dfcca6c8-cb96-4b79-bc85-b200a061dcda
> rbd: error opening image volume-dfcca6c8-cb96-4b79-bc85-b200a061dcda: (2) No such file or directory
----------------------------------------------------------------------
Does this mean the metadata still shows the image but the content was removed?
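I'm guessing I could check whether anything is left behind the name with something like this (not sure this is the right way to look at it; for a v2 image the name-to-id object should be called rbd_id.<image name>):
# rados --pool cinder-ceph ls | grep rbd_id.volume-dfcca6c8-cb96-4b79-bc85-b200a061dcda
# rados --pool cinder-ceph listomapvals rbd_directory
If the rbd_id object is gone but the rbd_directory entry is still there, I guess that would explain the listing showing the image while "rbd info" fails.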
(sending this email again as the first time was blocked because my attached
log file was too big)
Hi all,
*Context*: I'm running Ceph Octopus 15.2.5 (the latest as of this email)
using Rook on a toy Kubernetes cluster of two nodes. I've got a single Ceph
mon node running perfectly with 3 OSDs. There are two pools running that
were created as part of a CephFS install.
*Problem*: when I try to add my 4th OSD, the Ceph mon starts crashing on
the OSDMonitor::build_incremental function. I've checked the mailing
lists and searched around in general, and the last instance of this issue seems to have
been 7 years ago, so I'm probably not hitting the same thing!
*Question*: I was wondering if anyone had ideas on what I might be doing
wrong? I'm very new to Ceph so my suspicion is that it's something to do
with my configuration but given I'm literally just adding an OSD and
everything is fine otherwise, I'm not sure what my mistake might be.
Please find the bug I filed on the Ceph tracker here
<https://tracker.ceph.com/issues/48026> where I've provided a mon log file
with log level 20.
Kind regards,
Lalit Maganti