April 2021 - ceph-users - lists.ceph.io

BADAUTHORIZER in Nautilus, unknown PGs, slow peering, very slow client I/O

by Nico Schottelius

Good morning, I've look somewhat intensively through the list and it seems we are rather hard hit by this. Originally yesterday started on a mixed 14.2.9 and 14.2.16 cluster (osds, mons were all 14.2.16). We started phasing in 7 new osds, 6 of them throttled by reweighting to 0.1. Symptoms are many unknown PGs, very long stuck in peering (hours), slow activating and the infamous BADAUTHORIZER message. Client I/O is almost 0, not only of the pool with new OSDs, but also of other pools. We tried restarting all OSDs one by one, which seemed to clear the unknown PGs, however after around 1h they came back to unknown state. Tuning the --osd-max-backfills=.. and --osd-recovery-max-active=... to 1 does not improve the situation, so we are currently running at =7 for both of them. An excerpt from one of the osds which heavily logs this issue is below. Any pointer on how this problem was solved on nautilus before is much appreciated, as the issue started late last night. Best regards, Nico 2021-04-13 07:38:37.275 7f159a469700 4 rocksdb: (Original Log Time 2021/04/13-07:38:37.275255) EVENT_LOG_v1 {"time_micros": 1618292317275255, "job": 5, "event": "compaction_finished", "compaction_time_micros": 5288403, "compaction_time_cpu_micros": 2940370, "output_level": 2, "num_output_files": 6, "total_output_size": 370673432, "num_input_records": 983055, "num_output_records": 641964, "num_subcompactions": 1, "output_compression": "NoCompression", "num_single_delete_mismatches": 0, "num_single_delete_fallthrough": 0, "lsm_state": [0, 5, 35, 0, 0, 0, 0]} 2021-04-13 07:38:37.275 7f159a469700 4 rocksdb: EVENT_LOG_v1 {"time_micros": 1618292317275255, "job": 5, "event": "table_file_deletion", "file_number": 421257} 2021-04-13 07:38:37.275 7f159a469700 4 rocksdb: EVENT_LOG_v1 {"time_micros": 1618292317275255, "job": 5, "event": "table_file_deletion", "file_number": 401289} 2021-04-13 07:38:37.275 7f159a469700 4 rocksdb: EVENT_LOG_v1 {"time_micros": 1618292317275255, "job": 5, "event": "table_file_deletion", "file_number": 401288} 2021-04-13 07:38:37.275 7f159a469700 4 rocksdb: EVENT_LOG_v1 {"time_micros": 1618292317275255, "job": 5, "event": "table_file_deletion", "file_number": 401278} 2021-04-13 07:38:37.275 7f159a469700 4 rocksdb: EVENT_LOG_v1 {"time_micros": 1618292317275255, "job": 5, "event": "table_file_deletion", "file_number": 401277} 2021-04-13 07:38:37.275 7f159a469700 4 rocksdb: EVENT_LOG_v1 {"time_micros": 1618292317275255, "job": 5, "event": "table_file_deletion", "file_number": 401258} 2021-04-13 07:38:37.275 7f159a469700 4 rocksdb: EVENT_LOG_v1 {"time_micros": 1618292317275255, "job": 5, "event": "table_file_deletion", "file_number": 401257} 2021-04-13 07:38:37.275 7f159a469700 4 rocksdb: EVENT_LOG_v1 {"time_micros": 1618292317275255, "job": 5, "event": "table_file_deletion", "file_number": 401256} 2021-04-13 07:38:37.275 7f159a469700 4 rocksdb: [db/compaction_job.cc:1645] [default] [JOB 6] Compacting 1@1 + 5@2 files to L2, score 1.17 2021-04-13 07:38:37.275 7f159a469700 4 rocksdb: [db/compaction_job.cc:1649] [default] Compaction start summary: Base version 5 Base level 1, inputs: [421255(65MB)], [401318(65MB) 401319(65MB) 401320(65MB) 401321(65MB) 401322(65MB)] 2021-04-13 07:38:37.275 7f159a469700 4 rocksdb: EVENT_LOG_v1 {"time_micros": 1618292317275255, "job": 6, "event": "compaction_started", "compaction_reason": "LevelMaxLevelSize", "files_L1": [421255], "files_L2": [401318, 401319, 401320, 401321, 401322], "score": 1.17224, "input_data_size": 413630461} 2021-04-13 07:38:37.307 7f15a0704700 0 --1- [v2:[2a0a:e5c0:2:1:21b:21ff:febc:7cb6]:6816/5534,v1:[2a0a:e5c0:2:1:21b:21ff:febc:7cb6]:6817/5534] >> v1:[2a0a:e5c0:2:1:21b:21ff:febc:7cb6]:6862/8751 conn(0x55d520fbcd80 0x55d520ffa000 :-1 s=CONNECTING_SEND_CONNECT_MSG pgs=0 cs=0 l=0).handle_connect_reply_2 connect got BADAUTHORIZER 2021-04-13 07:38:37.315 7f15a372c700 0 --1- [v2:[2a0a:e5c0:2:1:21b:21ff:febc:7cb6]:6816/5534,v1:[2a0a:e5c0:2:1:21b:21ff:febc:7cb6]:6817/5534] >> v1:[2a0a:e5c0:2:1:21b:21ff:febc:5060]:6835/13651 conn(0x55d521b1bf80 0x55d521b19800 :-1 s=CONNECTING_SEND_CONNECT_MSG pgs=0 cs=0 l=0).handle_connect_reply_2 connect got BADAUTHORIZER 2021-04-13 07:38:37.707 7f15a0704700 0 --1- [v2:[2a0a:e5c0:2:1:21b:21ff:febc:7cb6]:6816/5534,v1:[2a0a:e5c0:2:1:21b:21ff:febc:7cb6]:6817/5534] >> v1:[2a0a:e5c0:2:1:21b:21ff:febc:7cb6]:6862/8751 conn(0x55d520fbcd80 0x55d520ffa000 :-1 s=CONNECTING_SEND_CONNECT_MSG pgs=0 cs=0 l=0).handle_connect_reply_2 connect got BADAUTHORIZER 2021-04-13 07:38:37.715 7f15a372c700 0 --1- [v2:[2a0a:e5c0:2:1:21b:21ff:febc:7cb6]:6816/5534,v1:[2a0a:e5c0:2:1:21b:21ff:febc:7cb6]:6817/5534] >> v1:[2a0a:e5c0:2:1:21b:21ff:febc:5060]:6835/13651 conn(0x55d521b1bf80 0x55d521b19800 :-1 s=CONNECTING_SEND_CONNECT_MSG pgs=0 cs=0 l=0).handle_connect_reply_2 connect got BADAUTHORIZER 2021-04-13 07:38:38.119 7f159a469700 4 rocksdb: [db/compaction_job.cc:1332] [default] [JOB 6] Generated table #421271: 130853 keys, 68923910 bytes 2021-04-13 07:38:38.119 7f159a469700 4 rocksdb: EVENT_LOG_v1 {"time_micros": 1618292318119320, "cf_name": "default", "job": 6, "event": "table_file_creation", "file_number": 421271, "file_size": 68923910, "table_properties": {"data_size": 67111841, "index_size": 1483972, "filter_size": 327237, "raw_key_size": 10909111, "raw_average_key_size": 83, "raw_value_size": 62457370, "raw_average_value_size": 477, "num_data_blocks": 16807, "num_entries": 130853, "filter_policy_name": "rocksdb.BuiltinBloomFilter"}} 2021-04-13 07:38:38.331 7f15a0704700 0 --1- [v2:[2a0a:e5c0:2:1:21b:21ff:febc:7cb6]:6816/5534,v1:[2a0a:e5c0:2:1:21b:21ff:febc:7cb6]:6817/5534] >> v1:[2a0a:e5c0:2:1:21b:21ff:febb:68f0]:6884/29245 conn(0x55d521b1a000 0x55d521b16800 :-1 s=CONNECTING_SEND_CONNECT_MSG pgs=0 cs=0 l=0).handle_connect_reply_2 connect got BADAUTHORIZER 2021-04-13 07:38:38.331 7f15a0704700 0 --1- [v2:[2a0a:e5c0:2:1:21b:21ff:febc:7cb6]:6816/5534,v1:[2a0a:e5c0:2:1:21b:21ff:febc:7cb6]:6817/5534] >> v1:[2a0a:e5c0:2:1:21b:21ff:febb:68f0]:6819/17174 conn(0x55d521b1c400 0x55d521b38000 :-1 s=CONNECTING_SEND_CONNECT_MSG pgs=0 cs=0 l=0).handle_connect_reply_2 connect got BADAUTHORIZER 2021-04-13 07:38:38.507 7f15a0704700 0 --1- [v2:[2a0a:e5c0:2:1:21b:21ff:febc:7cb6]:6816/5534,v1:[2a0a:e5c0:2:1:21b:21ff:febc:7cb6]:6817/5534] >> v1:[2a0a:e5c0:2:1:21b:21ff:febc:7cb6]:6862/8751 conn(0x55d520fbcd80 0x55d520ffa000 :-1 s=CONNECTING_SEND_CONNECT_MSG pgs=0 cs=0 l=0).handle_connect_reply_2 connect got BADAUTHORIZER 2021-04-13 07:38:38.515 7f15a372c700 0 --1- [v2:[2a0a:e5c0:2:1:21b:21ff:febc:7cb6]:6816/5534,v1:[2a0a:e5c0:2:1:21b:21ff:febc:7cb6]:6817/5534] >> v1:[2a0a:e5c0:2:1:21b:21ff:febc:5060]:6835/13651 conn(0x55d521b1bf80 0x55d521b19800 :-1 s=CONNECTING_SEND_CONNECT_MSG pgs=0 cs=0 l=0).handle_connect_reply_2 connect got BADAUTHORIZER 2021-04-13 07:38:38.731 7f15a0704700 0 --1- [v2:[2a0a:e5c0:2:1:21b:21ff:febc:7cb6]:6816/5534,v1:[2a0a:e5c0:2:1:21b:21ff:febc:7cb6]:6817/5534] >> v1:[2a0a:e5c0:2:1:21b:21ff:febb:68f0]:6884/29245 conn(0x55d521b1a000 0x55d521b16800 :-1 s=CONNECTING_SEND_CONNECT_MSG pgs=0 cs=0 l=0).handle_connect_reply_2 connect got BADAUTHORIZER 2021-04-13 07:38:38.731 7f15a0704700 0 --1- [v2:[2a0a:e5c0:2:1:21b:21ff:febc:7cb6]:6816/5534,v1:[2a0a:e5c0:2:1:21b:21ff:febc:7cb6]:6817/5534] >> v1:[2a0a:e5c0:2:1:21b:21ff:febb:68f0]:6819/17174 conn(0x55d521b1c400 0x55d521b38000 :-1 s=CONNECTING_SEND_CONNECT_MSG pgs=0 cs=0 l=0).handle_connect_reply_2 connect got BADAUTHORIZER -- Sustainable and modern Infrastructures by ungleich.ch

3 years, 1 month

1
2
0 0

Enable Dashboard Active Alerts

by E Taka

Hi, this is documented with many links to other documents, which unfortunately only confused me. In our 6-Node-Ceph-Cluster (Pacific) the Dashboard tells me that I should "provide the URL to the API of Prometheus' Alertmanager". We only use Grafana and Prometheus which are deployed by cephadm. We did not configure anything unusual, with own containers or so. We use just the standard cephadm installation. What the documentation writes about how it "should look like" (https://docs.ceph.com/en/pacific/mgr/dashboard/#enabling-prometheus-alerting), seems to exist in the Docker-Container "prom/alertmanager:v0.20.0" in file /etc/alertmanager/alertmanager.yml […] - name: 'ceph-dashboard' webhook_configs: - url: 'https://ceph01:8443/api/prometheus_receiver' - url: 'https://10.149.12.22:8443/api/prometheus_receiver' […] (10.149.12.22 is the IP address for ceph01) Nevertheless I get the message above from the Dashboard. My questions: What do I have to write in which file or which commands, so that I can access the alerts via the dashboard? Of course this should survive reboots and updates. Thanks. Erich

3 years, 1 month

1
0
0 0

HEALTH_WARN - Recovery Stuck?

by Ml Ml

Hello, i kind of ran out of disk space, so i added another host with osd.37. But it does not seem to move much data on it. (85MB in 2h) Any idea why the recovery process seems to be stuck? Should i fix the 4 backfillfull osds first? (by changing the weight)? root@ceph01:~# ceph -s cluster: id: 5436dd5d-83d4-4dc8-a93b-60ab5db145df health: HEALTH_WARN 4 backfillfull osd(s) 9 nearfull osd(s) Low space hindering backfill (add storage if this doesn't resolve itself): 1 pg backfill_toofull 4 pool(s) backfillfull services: mon: 3 daemons, quorum ceph03,ceph01,ceph02 (age 12d) mgr: ceph03(active, since 4M), standbys: ceph02.jwvivm mds: backup:1 {0=backup.ceph06.hdjehi=up:active} 3 up:standby osd: 53 osds: 53 up (since 2h), 53 in (since 2h); 235 remapped pgs task status: scrub status: mds.backup.ceph06.hdjehi: idle data: pools: 4 pools, 1185 pgs objects: 24.69M objects, 45 TiB usage: 149 TiB used, 42 TiB / 191 TiB avail pgs: 5388809/74059569 objects misplaced (7.276%) 950 active+clean 232 active+remapped+backfill_wait 2 active+remapped+backfilling 1 active+remapped+backfill_wait+backfill_toofull io: recovery: 0 B/s, 171 keys/s, 16 objects/s progress: Rebalancing after osd.37 marked in (2h) [............................] (remaining: 6d) root@ceph01:~# ceph health detail HEALTH_WARN 4 backfillfull osd(s); 9 nearfull osd(s); Low space hindering backfill (add storage if this doesn't resolve itself): 1 pg backfill_toofull; 4 pool(s) backfillfull [WRN] OSD_BACKFILLFULL: 4 backfillfull osd(s) osd.28 is backfill full osd.32 is backfill full osd.66 is backfill full osd.68 is backfill full [WRN] OSD_NEARFULL: 9 nearfull osd(s) osd.11 is near full osd.24 is near full osd.27 is near full osd.39 is near full osd.40 is near full osd.42 is near full osd.43 is near full osd.45 is near full osd.69 is near full [WRN] PG_BACKFILL_FULL: Low space hindering backfill (add storage if this doesn't resolve itself): 1 pg backfill_toofull pg 23.295 is active+remapped+backfill_wait+backfill_toofull, acting [8,67,32] [WRN] POOL_BACKFILLFULL: 4 pool(s) backfillfull pool 'backurne-rbd' is backfillfull pool 'device_health_metrics' is backfillfull pool 'cephfs.backup.meta' is backfillfull pool 'cephfs.backup.data' is backfillfull root@ceph01:~# ceph osd df tree ID CLASS WEIGHT REWEIGHT SIZE RAW USE DATA OMAP META AVAIL %USE VAR PGS STATUS TYPE NAME -1 182.59897 - 191 TiB 149 TiB 149 TiB 35 GiB 503 GiB 42 TiB 77.96 1.00 - root default -2 24.62473 - 29 TiB 22 TiB 22 TiB 5.0 GiB 80 GiB 7.1 TiB 75.23 0.96 - host ceph01 0 hdd 2.39999 1.00000 2.7 TiB 2.2 TiB 2.2 TiB 665 MiB 8.0 GiB 480 GiB 82.43 1.06 53 up osd.0 1 hdd 2.29999 1.00000 2.7 TiB 2.1 TiB 2.1 TiB 446 MiB 7.5 GiB 590 GiB 78.44 1.01 49 up osd.1 4 hdd 2.67029 0.91066 2.7 TiB 2.2 TiB 2.2 TiB 484 MiB 7.9 GiB 440 GiB 83.90 1.08 53 up osd.4 8 hdd 2.39999 1.00000 2.7 TiB 2.1 TiB 2.1 TiB 490 MiB 7.9 GiB 533 GiB 80.49 1.03 51 up osd.8 11 hdd 1.71660 1.00000 1.7 TiB 1.5 TiB 1.5 TiB 406 MiB 5.5 GiB 200 GiB 88.60 1.14 36 up osd.11 12 hdd 1.29999 1.00000 2.7 TiB 1.2 TiB 1.2 TiB 366 MiB 4.9 GiB 1.5 TiB 43.89 0.56 28 up osd.12 14 hdd 2.20000 1.00000 2.7 TiB 2.0 TiB 2.0 TiB 418 MiB 7.1 GiB 693 GiB 74.66 0.96 47 up osd.14 18 hdd 2.20000 1.00000 2.7 TiB 2.0 TiB 1.9 TiB 434 MiB 7.3 GiB 737 GiB 73.05 0.94 47 up osd.18 22 hdd 1.00000 1.00000 1.7 TiB 890 GiB 886 GiB 110 MiB 3.6 GiB 868 GiB 50.62 0.65 20 up osd.22 30 hdd 1.50000 1.00000 1.7 TiB 1.4 TiB 1.3 TiB 361 MiB 4.9 GiB 370 GiB 78.93 1.01 32 up osd.30 33 hdd 1.59999 0.97437 1.6 TiB 1.4 TiB 1.4 TiB 397 MiB 5.4 GiB 213 GiB 87.20 1.12 34 up osd.33 64 hdd 3.33789 0.89752 3.3 TiB 2.7 TiB 2.7 TiB 573 MiB 9.9 GiB 647 GiB 81.07 1.04 64 up osd.64 -3 26.79504 - 30 TiB 24 TiB 24 TiB 6.2 GiB 89 GiB 5.4 TiB 81.80 1.05 - host ceph02 2 hdd 1.50000 1.00000 1.7 TiB 1.4 TiB 1.4 TiB 363 MiB 5.3 GiB 359 GiB 79.58 1.02 32 up osd.2 3 hdd 2.50000 1.00000 2.7 TiB 2.2 TiB 2.2 TiB 647 MiB 7.8 GiB 469 GiB 82.85 1.06 53 up osd.3 7 hdd 2.00000 1.00000 2.7 TiB 1.8 TiB 1.8 TiB 453 MiB 7.0 GiB 848 GiB 69.00 0.89 43 up osd.7 9 hdd 2.67029 0.98323 2.7 TiB 2.4 TiB 2.3 TiB 709 MiB 8.8 GiB 322 GiB 88.21 1.13 57 up osd.9 13 hdd 1.79999 1.00000 2.4 TiB 1.7 TiB 1.6 TiB 410 MiB 6.5 GiB 747 GiB 69.41 0.89 40 up osd.13 16 hdd 2.50000 1.00000 2.7 TiB 2.2 TiB 2.2 TiB 637 MiB 7.8 GiB 458 GiB 83.26 1.07 53 up osd.16 19 hdd 1.39999 1.00000 1.7 TiB 1.3 TiB 1.3 TiB 345 MiB 5.1 GiB 465 GiB 73.53 0.94 30 up osd.19 23 hdd 2.00000 1.00000 2.7 TiB 1.9 TiB 1.9 TiB 442 MiB 7.7 GiB 738 GiB 73.02 0.94 43 up osd.23 24 hdd 1.71660 0.95634 1.7 TiB 1.5 TiB 1.5 TiB 426 MiB 5.8 GiB 187 GiB 89.37 1.15 36 up osd.24 28 hdd 2.70000 1.00000 2.7 TiB 2.5 TiB 2.4 TiB 712 MiB 8.4 GiB 219 GiB 92.00 1.18 58 up osd.28 31 hdd 2.67029 0.92993 2.7 TiB 2.3 TiB 2.3 TiB 465 MiB 8.1 GiB 393 GiB 85.62 1.10 54 up osd.31 32 hdd 3.33789 1.00000 3.3 TiB 3.0 TiB 3.0 TiB 693 MiB 11 GiB 306 GiB 91.06 1.17 71 up osd.32 -4 24.52005 - 26 TiB 21 TiB 21 TiB 5.0 GiB 79 GiB 5.1 TiB 80.51 1.03 - host ceph03 5 hdd 1.71660 1.00000 1.7 TiB 1.5 TiB 1.5 TiB 392 MiB 5.6 GiB 223 GiB 87.34 1.12 35 up osd.5 6 hdd 1.71660 1.00000 1.7 TiB 1.5 TiB 1.5 TiB 397 MiB 5.6 GiB 221 GiB 87.41 1.12 35 up osd.6 10 hdd 2.50000 0.97487 2.7 TiB 2.2 TiB 2.2 TiB 497 MiB 7.7 GiB 480 GiB 82.46 1.06 52 up osd.10 15 hdd 2.29999 1.00000 2.7 TiB 2.1 TiB 2.1 TiB 474 MiB 7.6 GiB 586 GiB 78.57 1.01 49 up osd.15 17 hdd 1.39999 1.00000 1.6 TiB 1.2 TiB 1.2 TiB 352 MiB 5.6 GiB 384 GiB 76.88 0.99 30 up osd.17 20 hdd 1.59999 1.00000 1.7 TiB 1.4 TiB 1.4 TiB 234 MiB 5.4 GiB 331 GiB 81.15 1.04 33 up osd.20 21 hdd 2.00000 1.00000 2.7 TiB 1.8 TiB 1.8 TiB 611 MiB 7.0 GiB 868 GiB 68.27 0.88 44 up osd.21 25 hdd 1.70000 0.92348 1.7 TiB 1.4 TiB 1.4 TiB 407 MiB 5.6 GiB 274 GiB 84.41 1.08 35 up osd.25 26 hdd 2.50000 1.00000 2.7 TiB 2.2 TiB 2.2 TiB 464 MiB 7.8 GiB 441 GiB 83.88 1.08 52 up osd.26 27 hdd 2.70000 0.95955 2.7 TiB 2.4 TiB 2.4 TiB 674 MiB 8.3 GiB 318 GiB 88.35 1.13 57 up osd.27 29 hdd 2.67029 0.73337 2.7 TiB 1.8 TiB 1.8 TiB 436 MiB 6.7 GiB 885 GiB 67.63 0.87 43 up osd.29 63 hdd 1.71660 1.00000 1.7 TiB 1.5 TiB 1.5 TiB 226 MiB 5.7 GiB 224 GiB 87.26 1.12 35 up osd.63 -11 24.64297 - 25 TiB 21 TiB 21 TiB 4.9 GiB 66 GiB 3.4 TiB 86.48 1.11 - host ceph04 34 hdd 5.24519 0.85004 5.2 TiB 4.0 TiB 4.0 TiB 1002 MiB 13 GiB 1.2 TiB 76.37 0.98 97 up osd.34 42 hdd 5.24519 1.00000 5.2 TiB 4.7 TiB 4.7 TiB 1.1 GiB 15 GiB 545 GiB 89.86 1.15 113 up osd.42 44 hdd 7.00000 1.00000 7.2 TiB 6.3 TiB 6.3 TiB 1.4 GiB 19 GiB 901 GiB 87.70 1.12 150 up osd.44 45 hdd 7.15259 1.00000 7.2 TiB 6.5 TiB 6.4 TiB 1.5 GiB 19 GiB 718 GiB 90.20 1.16 154 up osd.45 -13 30.04085 - 30 TiB 26 TiB 26 TiB 5.8 GiB 81 GiB 4.2 TiB 86.11 1.10 - host ceph05 39 hdd 7.15259 1.00000 7.2 TiB 6.4 TiB 6.4 TiB 1.5 GiB 19 GiB 751 GiB 89.74 1.15 153 up osd.39 40 hdd 7.15259 1.00000 7.2 TiB 6.4 TiB 6.4 TiB 1.3 GiB 19 GiB 767 GiB 89.53 1.15 153 up osd.40 41 hdd 7.15259 0.90002 7.2 TiB 5.8 TiB 5.7 TiB 1.2 GiB 18 GiB 1.4 TiB 80.54 1.03 138 up osd.41 43 hdd 5.24519 1.00000 5.2 TiB 4.7 TiB 4.7 TiB 1.1 GiB 15 GiB 574 GiB 89.32 1.15 113 up osd.43 60 hdd 3.33789 0.85780 3.3 TiB 2.6 TiB 2.6 TiB 685 MiB 8.9 GiB 754 GiB 77.93 1.00 62 up osd.60 -9 17.64297 - 18 TiB 13 TiB 13 TiB 3.0 GiB 43 GiB 4.4 TiB 74.85 0.96 - host ceph06 35 hdd 7.15259 0.80005 7.2 TiB 5.2 TiB 5.2 TiB 1.0 GiB 16 GiB 2.0 TiB 72.31 0.93 124 up osd.35 36 hdd 5.24519 0.85004 5.2 TiB 4.0 TiB 4.0 TiB 985 MiB 13 GiB 1.2 TiB 76.65 0.98 97 up osd.36 38 hdd 5.24519 0.85004 5.2 TiB 4.0 TiB 4.0 TiB 1.0 GiB 13 GiB 1.2 TiB 76.50 0.98 97 up osd.38 -15 24.79565 - 25 TiB 22 TiB 22 TiB 4.7 GiB 66 GiB 3.1 TiB 87.64 1.12 - host ceph07 66 hdd 7.15259 1.00000 7.2 TiB 6.5 TiB 6.5 TiB 1.5 GiB 19 GiB 670 GiB 90.86 1.17 155 up osd.66 67 hdd 7.15259 0.91141 7.2 TiB 5.8 TiB 5.8 TiB 1.1 GiB 18 GiB 1.3 TiB 81.62 1.05 140 up osd.67 68 hdd 3.33789 1.00000 3.3 TiB 3.0 TiB 3.0 TiB 738 MiB 9.8 GiB 299 GiB 91.24 1.17 71 up osd.68 69 hdd 7.15259 1.00000 7.2 TiB 6.3 TiB 6.3 TiB 1.3 GiB 19 GiB 823 GiB 88.77 1.14 152 up osd.69 -17 9.53670 - 9.5 TiB 1.4 GiB 85 MiB 473 MiB 832 MiB 9.5 TiB 0.01 0 - host ceph08 37 hdd 9.53670 1.00000 9.5 TiB 1.4 GiB 85 MiB 473 MiB 832 MiB 9.5 TiB 0.01 0 2 up osd.37 TOTAL 191 TiB 149 TiB 149 TiB 35 GiB 503 GiB 42 TiB 77.96 MIN/MAX VAR: 0/1.18 STDDEV: 14.73

3 years, 1 month

4
3
0 0

cephadm custom mgr modules

by Rob Haverkamp

Hi there, I'm developing a custom ceph-mgr module and have issues deploying this on a cluster deployed with cephadm. With a cluster deployed with ceph-deploy, I can just put my code under /usr/share/ceph/mgr/ and load the module. This works fine. I think I found 2 options to do this with cephadm: 1. build a custom container image: https://docs.ceph.com/en/octopus/cephadm/install/#deploying-custom-containe… 2. use the --shared_ceph_folder during cephadm bootstrap: 'Development mode. Several folders in containers are volumes mapped to different sub-folders in the ceph source folder' The shared folder method is only meant for development. So that is not an option in a production environment. Building a custom container image should be possible, but I don't think I want to go there. Are there more options? It would be nice if it was possible to deploy the managers with a custom service specification that for example mounts a folder from the host system to /usr/share/ceph/mgr/<module> in the container. Thanks! Rob Haverkamp

3 years, 1 month

3
2
0 0

Ceph osd Reweight command in octopus

by Brent Kennedy

We have a ceph octopus cluster running 15.2.6, its indicating a near full osd which I can see is not weighted equally with the rest of the osds. I tried to do the usual "ceph osd reweight osd.0 0.95" to force it down a little bit, but unlike the nautilus clusters, I see no data movement when issuing the command. If I run a ceph osd tree, it shows the reweight setting, but no data movement appears to be occurring. Is there some new thing in ocotopus I am missing? I looked through the release notes for .7, .8 and .9 and didn't see any fixes that jumped out as resolving a bug related to this. The Octopus cluster was deployed using ceph-ansible and upgraded to 15.2.6. I plan to upgrade to 15.2.9 in the coming month. Any thoughts? Regards, -Brent Existing Clusters: Test: Ocotpus 15.2.5 ( all virtual on nvme ) US Production(HDD): Nautilus 14.2.11 with 11 osd servers, 3 mons, 4 gateways, 2 iscsi gateways UK Production(HDD): Nautilus 14.2.11 with 18 osd servers, 3 mons, 4 gateways, 2 iscsi gateways US Production(SSD): Nautilus 14.2.11 with 6 osd servers, 3 mons, 4 gateways, 2 iscsi gateways UK Production(SSD): Octopus 15.2.6 with 5 osd servers, 3 mons, 4 gateways

3 years, 1 month

2
2
0 0

rbd info error opening image

by Marcel Kuiper

I hope someone can help out. I cannot run 'rbd info' on any image. # rbd ls openstack-volumes volume-628efc47-fc57-4630-8661-a13210a4e02c volume-e4fe1e24-fb26-4abc-a458-f936a4e75715 volume-1ce1439d-767b-4b1d-8217-51464a11c5cc volume-0a01d7e3-2c8f-4fab-9f9f-d84bbc7fa3c7 volume-a4aeb848-7283-4cd0-b5e6-ac2fc7f06dac # rbd info openstack-volumes/volume-a4aeb848-7283-4cd0-b5e6-ac2fc7f06dac rbd: error opening image volume-a4aeb848-7283-4cd0-b5e6-ac2fc7f06dac: (2) No such file or directory We're running nautilus 14.2.16 on ubuntu bionic Marcel

3 years, 1 month

2
2
0 0

Nautilus, Ceph-Ansible, existing OSDs, and ceph.conf updates

by Dave Hall

Hello, A while back I asked about the troubles I was having with Ceph-Ansible when I kept existing OSDs in my inventory file when managing my Nautilus cluster. At the time it was suggested that once the OSDs have been configured they should be excluded from the inventory file. However, when processing certain configuration changes Ceph-Ansible updates ceph.conf on all cluster nodes and clients in the inventory file. Is there an alternative way to keep OSD nodes in the inventory file without listing them as OSD nodes, so they get other updates, but also so Ceph-Ansible doesn't try to do any of the ceph-volume stuff that seems to be failing after the OSDs are configured? Or maybe I just have something odd in my inventory file. I'd be glad to share - either in this list or off line. Thanks. -Dave -- Dave Hall Binghamton University kdhall(a)binghamton.edu

3 years, 1 month

2
1
0 0

RGW failed to start after upgrade to pacific

by 胡玮文

Hi all, I upgraded to the pacific version with cephadm. However, all our RGW daemons cannot start anymore. any help is appreciated. Here are the logs when starting RGW. I set debug_rados and debug_rgw to 20/20 systemd[1]: Started Ceph rgw.smil.b7-1.gpu006.twfefs for e88d509a-f6fc-11ea-b25d-a0423f3ac864. bash[9823]: WARNING: Error loading config file: .dockercfg: $HOME is not defined bash[9823]: debug 2021-04-04T13:01:03.731+0000 7ff80f172440 0 deferred set uid:gid to 167:167 (ceph:ceph) bash[9823]: debug 2021-04-04T13:01:03.731+0000 7ff80f172440 0 ceph version 16.2.0 (0c2054e95bcd9b30fdd908a79ac1d8bbc3394442) pacific (stable), process radosgw, pid 7 bash[9823]: debug 2021-04-04T13:01:03.731+0000 7ff80f172440 0 framework: beast bash[9823]: debug 2021-04-04T13:01:03.731+0000 7ff80f172440 0 framework conf key: port, val: 7480 bash[9823]: debug 2021-04-04T13:01:03.731+0000 7ff80f172440 1 radosgw_Main not setting numa affinity bash[9823]: debug 2021-04-04T13:01:03.735+0000 7ff7f742b700 20 reqs_thread_entry: start bash[9823]: debug 2021-04-04T13:01:03.735+0000 7ff7f6c2a700 10 entry start bash[9823]: debug 2021-04-04T13:01:03.739+0000 7ff80f172440 1 librados: starting msgr at bash[9823]: debug 2021-04-04T13:01:03.739+0000 7ff80f172440 1 librados: starting objecter bash[9823]: debug 2021-04-04T13:01:03.739+0000 7ff80f172440 1 librados: setting wanted keys bash[9823]: debug 2021-04-04T13:01:03.739+0000 7ff80f172440 1 librados: calling monclient init bash[9823]: debug 2021-04-04T13:01:03.743+0000 7ff80f172440 1 librados: init done bash[9823]: debug 2021-04-04T13:01:03.743+0000 7ff80f172440 20 rados->read ofs=0 len=0 bash[9823]: debug 2021-04-04T13:01:03.743+0000 7ff80f172440 10 librados: wait_for_osdmap waiting bash[9823]: debug 2021-04-04T13:01:03.747+0000 7ff80f172440 10 librados: wait_for_osdmap done waiting bash[9823]: debug 2021-04-04T13:01:03.747+0000 7ff80f172440 10 librados: read oid=realms_names.smil nspace= bash[9823]: debug 2021-04-04T13:01:03.747+0000 7ff80f172440 10 librados: Objecter returned from read r=0 bash[9823]: debug 2021-04-04T13:01:03.747+0000 7ff80f172440 20 rados_obj.operate() r=0 bl.length=46 bash[9823]: debug 2021-04-04T13:01:03.747+0000 7ff80f172440 20 rados->read ofs=0 len=0 bash[9823]: debug 2021-04-04T13:01:03.747+0000 7ff80f172440 10 librados: read oid=realms.6c92e1e6-c08b-4f4c-b211-7899893978fa nspace= bash[9823]: debug 2021-04-04T13:01:03.747+0000 7ff80f172440 10 librados: Objecter returned from read r=0 bash[9823]: debug 2021-04-04T13:01:03.747+0000 7ff80f172440 20 rados_obj.operate() r=0 bl.length=104 bash[9823]: debug 2021-04-04T13:01:03.747+0000 7ff80f172440 20 realm smil 6c92e1e6-c08b-4f4c-b211-7899893978fa bash[9823]: debug 2021-04-04T13:01:03.747+0000 7ff80f172440 20 rados->read ofs=0 len=0 bash[9823]: debug 2021-04-04T13:01:03.747+0000 7ff80f172440 10 librados: read oid=realms.6c92e1e6-c08b-4f4c-b211-7899893978fa nspace= bash[9823]: debug 2021-04-04T13:01:03.747+0000 7ff80f172440 10 librados: Objecter returned from read r=0 bash[9823]: debug 2021-04-04T13:01:03.747+0000 7ff80f172440 20 rados_obj.operate() r=0 bl.length=104 bash[9823]: debug 2021-04-04T13:01:03.747+0000 7ff80f172440 20 rados->read ofs=0 len=0 bash[9823]: debug 2021-04-04T13:01:03.747+0000 7ff80f172440 10 librados: read oid=periods.5528a87b-bca7-4936-b348-fcb8f213df20.latest_epoch nspace= bash[9823]: debug 2021-04-04T13:01:03.751+0000 7ff80f172440 10 librados: Objecter returned from read r=0 bash[9823]: debug 2021-04-04T13:01:03.751+0000 7ff80f172440 20 rados_obj.operate() r=0 bl.length=10 bash[9823]: debug 2021-04-04T13:01:03.751+0000 7ff80f172440 20 rados->read ofs=0 len=0 bash[9823]: debug 2021-04-04T13:01:03.751+0000 7ff80f172440 10 librados: read oid=periods.5528a87b-bca7-4936-b348-fcb8f213df20.1 nspace= bash[9823]: debug 2021-04-04T13:01:03.771+0000 7ff80f172440 10 librados: Objecter returned from read r=0 bash[9823]: debug 2021-04-04T13:01:03.771+0000 7ff80f172440 20 rados_obj.operate() r=0 bl.length=877 bash[9823]: debug 2021-04-04T13:01:03.771+0000 7ff80f172440 20 current period 5528a87b-bca7-4936-b348-fcb8f213df20 bash[9823]: debug 2021-04-04T13:01:03.771+0000 7ff80f172440 20 rados->read ofs=0 len=0 bash[9823]: debug 2021-04-04T13:01:03.771+0000 7ff80f172440 10 librados: read oid=converted nspace= bash[9823]: debug 2021-04-04T13:01:03.775+0000 7ff80f172440 10 librados: Objecter returned from read r=-2 bash[9823]: debug 2021-04-04T13:01:03.775+0000 7ff80f172440 20 rados_obj.operate() r=-2 bl.length=0 bash[9823]: debug 2021-04-04T13:01:03.775+0000 7ff80f172440 20 rados->read ofs=0 len=0 bash[9823]: debug 2021-04-04T13:01:03.775+0000 7ff80f172440 10 librados: read oid=realms_names.smil nspace= bash[9823]: debug 2021-04-04T13:01:03.775+0000 7ff80f172440 10 librados: Objecter returned from read r=0 bash[9823]: debug 2021-04-04T13:01:03.775+0000 7ff80f172440 20 rados_obj.operate() r=0 bl.length=46 bash[9823]: debug 2021-04-04T13:01:03.775+0000 7ff80f172440 20 rados->read ofs=0 len=0 bash[9823]: debug 2021-04-04T13:01:03.775+0000 7ff80f172440 10 librados: read oid=realms.6c92e1e6-c08b-4f4c-b211-7899893978fa nspace= bash[9823]: debug 2021-04-04T13:01:03.775+0000 7ff80f172440 10 librados: Objecter returned from read r=0 bash[9823]: debug 2021-04-04T13:01:03.775+0000 7ff80f172440 20 rados_obj.operate() r=0 bl.length=104 bash[9823]: debug 2021-04-04T13:01:03.775+0000 7ff80f172440 20 rados->read ofs=0 len=0 bash[9823]: debug 2021-04-04T13:01:03.775+0000 7ff80f172440 10 librados: read oid=default.region nspace= bash[9823]: debug 2021-04-04T13:01:03.775+0000 7ff80f172440 10 librados: Objecter returned from read r=-2 bash[9823]: debug 2021-04-04T13:01:03.775+0000 7ff80f172440 20 rados_obj.operate() r=-2 bl.length=0 bash[9823]: debug 2021-04-04T13:01:03.943+0000 7ff80f172440 20 RGWRados::pool_iterate: got default.zonegroup.6c92e1e6-c08b-4f4c-b211-7899893978fa bash[9823]: debug 2021-04-04T13:01:03.943+0000 7ff80f172440 20 RGWRados::pool_iterate: got zonegroup_info.c9c9b732-ec1b-474a-afd6-fe95b8a6ac20 bash[9823]: debug 2021-04-04T13:01:03.943+0000 7ff80f172440 20 RGWRados::pool_iterate: got default.realm bash[9823]: debug 2021-04-04T13:01:03.943+0000 7ff80f172440 20 RGWRados::pool_iterate: got periods.bd67eb14-7081-49ab-bfe5-718b29257410.1 bash[9823]: debug 2021-04-04T13:01:03.943+0000 7ff80f172440 20 RGWRados::pool_iterate: got zone_info.ceb42b3a-cf4b-493d-94f9-beb7f3a76f99 bash[9823]: debug 2021-04-04T13:01:03.943+0000 7ff80f172440 20 RGWRados::pool_iterate: got realms.6c92e1e6-c08b-4f4c-b211-7899893978fa.control bash[9823]: debug 2021-04-04T13:01:03.943+0000 7ff80f172440 20 RGWRados::pool_iterate: got periods.5528a87b-bca7-4936-b348-fcb8f213df20.1 bash[9823]: debug 2021-04-04T13:01:03.943+0000 7ff80f172440 20 RGWRados::pool_iterate: got zone_names.b7-1 bash[9823]: debug 2021-04-04T13:01:03.943+0000 7ff80f172440 20 RGWRados::pool_iterate: got periods.6c92e1e6-c08b-4f4c-b211-7899893978fa:staging bash[9823]: debug 2021-04-04T13:01:03.943+0000 7ff80f172440 20 RGWRados::pool_iterate: got realms.6c92e1e6-c08b-4f4c-b211-7899893978fa bash[9823]: debug 2021-04-04T13:01:03.943+0000 7ff80f172440 20 RGWRados::pool_iterate: got default.zone.6c92e1e6-c08b-4f4c-b211-7899893978fa bash[9823]: debug 2021-04-04T13:01:03.943+0000 7ff80f172440 20 RGWRados::pool_iterate: got periods.bd67eb14-7081-49ab-bfe5-718b29257410.latest_epoch bash[9823]: debug 2021-04-04T13:01:03.943+0000 7ff80f172440 20 RGWRados::pool_iterate: got periods.5528a87b-bca7-4936-b348-fcb8f213df20.latest_epoch bash[9823]: debug 2021-04-04T13:01:03.943+0000 7ff80f172440 20 RGWRados::pool_iterate: got realms_names.smil bash[9823]: debug 2021-04-04T13:01:03.943+0000 7ff80f172440 20 RGWRados::pool_iterate: got period_config.6c92e1e6-c08b-4f4c-b211-7899893978fa bash[9823]: debug 2021-04-04T13:01:03.943+0000 7ff80f172440 20 RGWRados::pool_iterate: got zonegroups_names.default bash[9823]: debug 2021-04-04T13:01:03.943+0000 7ff80f172440 20 rados->read ofs=0 len=0 bash[9823]: debug 2021-04-04T13:01:03.943+0000 7ff80f172440 10 librados: read oid=zone_names.default nspace= bash[9823]: debug 2021-04-04T13:01:03.943+0000 7ff80f172440 10 librados: Objecter returned from read r=-2 bash[9823]: debug 2021-04-04T13:01:03.943+0000 7ff80f172440 20 rados_obj.operate() r=-2 bl.length=0 bash[9823]: debug 2021-04-04T13:01:03.943+0000 7ff80f172440 20 rados->read ofs=0 len=0 bash[9823]: debug 2021-04-04T13:01:03.943+0000 7ff80f172440 10 librados: read oid=zonegroups_names.default nspace= bash[9823]: debug 2021-04-04T13:01:03.943+0000 7ff80f172440 10 librados: Objecter returned from read r=0 bash[9823]: debug 2021-04-04T13:01:03.943+0000 7ff80f172440 20 rados_obj.operate() r=0 bl.length=46 bash[9823]: debug 2021-04-04T13:01:03.943+0000 7ff80f172440 20 rados->read ofs=0 len=0 bash[9823]: debug 2021-04-04T13:01:03.943+0000 7ff80f172440 10 librados: read oid=zonegroup_info.c9c9b732-ec1b-474a-afd6-fe95b8a6ac20 nspace= bash[9823]: debug 2021-04-04T13:01:03.943+0000 7ff80f172440 10 librados: Objecter returned from read r=0 bash[9823]: debug 2021-04-04T13:01:03.943+0000 7ff80f172440 20 rados_obj.operate() r=0 bl.length=403 bash[9823]: debug 2021-04-04T13:01:03.943+0000 7ff80f172440 20 rados->read ofs=0 len=0 bash[9823]: debug 2021-04-04T13:01:03.943+0000 7ff80f172440 10 librados: read oid=region_map nspace= bash[9823]: debug 2021-04-04T13:01:03.943+0000 7ff80f172440 10 librados: Objecter returned from read r=-2 bash[9823]: debug 2021-04-04T13:01:03.943+0000 7ff80f172440 20 rados_obj.operate() r=-2 bl.length=0 bash[9823]: debug 2021-04-04T13:01:03.943+0000 7ff80f172440 20 rados->read ofs=0 len=0 bash[9823]: debug 2021-04-04T13:01:03.943+0000 7ff80f172440 10 librados: read oid=realms_names.smil nspace= bash[9823]: debug 2021-04-04T13:01:03.943+0000 7ff80f172440 10 librados: Objecter returned from read r=0 bash[9823]: debug 2021-04-04T13:01:03.943+0000 7ff80f172440 20 rados_obj.operate() r=0 bl.length=46 bash[9823]: debug 2021-04-04T13:01:03.943+0000 7ff80f172440 20 rados->read ofs=0 len=0 bash[9823]: debug 2021-04-04T13:01:03.943+0000 7ff80f172440 10 librados: read oid=realms.6c92e1e6-c08b-4f4c-b211-7899893978fa nspace= bash[9823]: debug 2021-04-04T13:01:03.947+0000 7ff80f172440 10 librados: Objecter returned from read r=0 bash[9823]: debug 2021-04-04T13:01:03.947+0000 7ff80f172440 20 rados_obj.operate() r=0 bl.length=104 bash[9823]: debug 2021-04-04T13:01:03.947+0000 7ff80f172440 20 rados->read ofs=0 len=0 bash[9823]: debug 2021-04-04T13:01:03.947+0000 7ff80f172440 10 librados: read oid=default.zonegroup.6c92e1e6-c08b-4f4c-b211-7899893978fa nspace= bash[9823]: debug 2021-04-04T13:01:03.947+0000 7ff80f172440 10 librados: Objecter returned from read r=0 bash[9823]: debug 2021-04-04T13:01:03.947+0000 7ff80f172440 20 rados_obj.operate() r=0 bl.length=46 bash[9823]: debug 2021-04-04T13:01:03.947+0000 7ff80f172440 20 rados->read ofs=0 len=0 bash[9823]: debug 2021-04-04T13:01:03.947+0000 7ff80f172440 10 librados: read oid=zonegroup_info.c9c9b732-ec1b-474a-afd6-fe95b8a6ac20 nspace= bash[9823]: debug 2021-04-04T13:01:03.947+0000 7ff80f172440 10 librados: Objecter returned from read r=0 bash[9823]: debug 2021-04-04T13:01:03.947+0000 7ff80f172440 20 rados_obj.operate() r=0 bl.length=403 bash[9823]: debug 2021-04-04T13:01:03.947+0000 7ff80f172440 20 period zonegroup init ret 0 bash[9823]: debug 2021-04-04T13:01:03.947+0000 7ff80f172440 20 period zonegroup name default bash[9823]: debug 2021-04-04T13:01:03.947+0000 7ff80f172440 20 using current period zonegroup default bash[9823]: debug 2021-04-04T13:01:03.947+0000 7ff80f172440 20 rados->read ofs=0 len=0 bash[9823]: debug 2021-04-04T13:01:03.947+0000 7ff80f172440 10 librados: read oid=zone_names.b7-1 nspace= bash[9823]: debug 2021-04-04T13:01:03.947+0000 7ff80f172440 10 librados: Objecter returned from read r=0 bash[9823]: debug 2021-04-04T13:01:03.947+0000 7ff80f172440 20 rados_obj.operate() r=0 bl.length=46 bash[9823]: debug 2021-04-04T13:01:03.947+0000 7ff80f172440 20 rados->read ofs=0 len=0 bash[9823]: debug 2021-04-04T13:01:03.947+0000 7ff80f172440 10 librados: read oid=zone_info.ceb42b3a-cf4b-493d-94f9-beb7f3a76f99 nspace= bash[9823]: debug 2021-04-04T13:01:03.947+0000 7ff80f172440 10 librados: Objecter returned from read r=0 bash[9823]: debug 2021-04-04T13:01:03.947+0000 7ff80f172440 20 rados_obj.operate() r=0 bl.length=784 bash[9823]: debug 2021-04-04T13:01:03.947+0000 7ff80f172440 10 Cannot find current period zone using local zone bash[9823]: debug 2021-04-04T13:01:03.947+0000 7ff80f172440 20 rados->read ofs=0 len=0 bash[9823]: debug 2021-04-04T13:01:03.947+0000 7ff80f172440 10 librados: read oid=zone_info.ceb42b3a-cf4b-493d-94f9-beb7f3a76f99 nspace= bash[9823]: debug 2021-04-04T13:01:03.947+0000 7ff80f172440 10 librados: Objecter returned from read r=0 bash[9823]: debug 2021-04-04T13:01:03.947+0000 7ff80f172440 20 rados_obj.operate() r=0 bl.length=784 bash[9823]: debug 2021-04-04T13:01:03.947+0000 7ff80f172440 20 zone b7-1 found bash[9823]: debug 2021-04-04T13:01:03.947+0000 7ff80f172440 20 rados->read ofs=0 len=0 bash[9823]: debug 2021-04-04T13:01:03.947+0000 7ff80f172440 10 librados: call oid=bucket.sync-source-hints. nspace= bash[9823]: debug 2021-04-04T13:01:03.947+0000 7ff80f172440 10 librados: Objecter returned from call r=-2 bash[9823]: debug 2021-04-04T13:01:03.947+0000 7ff80f172440 20 rados_obj.operate() r=-2 bl.length=0 bash[9823]: debug 2021-04-04T13:01:03.947+0000 7ff80f172440 20 rados->read ofs=0 len=0 bash[9823]: debug 2021-04-04T13:01:03.947+0000 7ff80f172440 10 librados: call oid=bucket.sync-target-hints. nspace= bash[9823]: debug 2021-04-04T13:01:03.951+0000 7ff80f172440 10 librados: Objecter returned from call r=-2 bash[9823]: debug 2021-04-04T13:01:03.951+0000 7ff80f172440 20 rados_obj.operate() r=-2 bl.length=0 bash[9823]: debug 2021-04-04T13:01:03.951+0000 7ff80f172440 20 started sync module instance, tier type = bash[9823]: debug 2021-04-04T13:01:03.951+0000 7ff80f172440 20 started zone id=ceb42b3a-cf4b-493d-94f9-beb7f3a76f99 (name=b7-1) with tier type = bash[9823]: debug 2021-04-04T13:01:03.951+0000 7ff80f172440 10 librados: create oid=notify.0 nspace= bash[9823]: debug 2021-04-04T13:01:04.215+0000 7ff80f172440 10 librados: Objecter returned from create r=0 bash[9823]: debug 2021-04-04T13:01:04.219+0000 7ff80f172440 10 librados: create oid=notify.1 nspace= bash[9823]: debug 2021-04-04T13:01:04.255+0000 7ff80f172440 10 librados: Objecter returned from create r=0 bash[9823]: debug 2021-04-04T13:01:04.255+0000 7ff80f172440 10 librados: create oid=notify.2 nspace= bash[9823]: debug 2021-04-04T13:01:04.307+0000 7ff80f172440 10 librados: Objecter returned from create r=0 bash[9823]: debug 2021-04-04T13:01:04.307+0000 7ff80f172440 10 librados: create oid=notify.3 nspace= bash[9823]: debug 2021-04-04T13:01:04.367+0000 7ff80f172440 10 librados: Objecter returned from create r=0 bash[9823]: debug 2021-04-04T13:01:04.367+0000 7ff80f172440 10 librados: create oid=notify.4 nspace= bash[9823]: debug 2021-04-04T13:01:04.423+0000 7ff80f172440 10 librados: Objecter returned from create r=0 bash[9823]: debug 2021-04-04T13:01:04.423+0000 7ff80f172440 10 librados: create oid=notify.5 nspace= bash[9823]: debug 2021-04-04T13:01:04.431+0000 7ff80f172440 10 librados: Objecter returned from create r=0 bash[9823]: debug 2021-04-04T13:01:04.431+0000 7ff80f172440 10 librados: create oid=notify.6 nspace= bash[9823]: debug 2021-04-04T13:01:04.495+0000 7ff80f172440 10 librados: Objecter returned from create r=0 bash[9823]: debug 2021-04-04T13:01:04.495+0000 7ff80f172440 10 librados: create oid=notify.7 nspace= bash[9823]: debug 2021-04-04T13:01:04.515+0000 7ff80f172440 10 librados: Objecter returned from create r=0 bash[9823]: debug 2021-04-04T13:01:04.515+0000 7ff80f172440 20 add_watcher() i=0 bash[9823]: debug 2021-04-04T13:01:04.515+0000 7ff80f172440 20 add_watcher() i=1 bash[9823]: debug 2021-04-04T13:01:04.515+0000 7ff80f172440 20 add_watcher() i=2 bash[9823]: debug 2021-04-04T13:01:04.515+0000 7ff80f172440 20 add_watcher() i=3 bash[9823]: debug 2021-04-04T13:01:04.515+0000 7ff80f172440 20 add_watcher() i=4 bash[9823]: debug 2021-04-04T13:01:04.515+0000 7ff80f172440 20 add_watcher() i=5 bash[9823]: debug 2021-04-04T13:01:04.527+0000 7ff80f172440 20 add_watcher() i=6 bash[9823]: debug 2021-04-04T13:01:04.527+0000 7ff80f172440 20 add_watcher() i=7 bash[9823]: debug 2021-04-04T13:01:04.527+0000 7ff80f172440 2 all 8 watchers are set, enabling cache bash[9823]: debug 2021-04-04T13:01:04.527+0000 7ff80f172440 10 librados: call oid=data_log.0 nspace= bash[9823]: debug 2021-04-04T13:01:04.527+0000 7ff80f172440 10 librados: Objecter returned from call r=0 bash[9823]: debug 2021-04-04T13:01:04.527+0000 7ff80f172440 10 librados: call oid=data_log.1 nspace= bash[9823]: debug 2021-04-04T13:01:04.527+0000 7ff80f172440 10 librados: Objecter returned from call r=-2 bash[9823]: debug 2021-04-04T13:01:04.527+0000 7ff80f172440 10 librados: call oid=data_log.2 nspace= bash[9823]: debug 2021-04-04T13:01:04.531+0000 7ff80f172440 10 librados: Objecter returned from call r=-2 bash[9823]: debug 2021-04-04T13:01:04.531+0000 7ff80f172440 10 librados: call oid=data_log.3 nspace= bash[9823]: debug 2021-04-04T13:01:04.531+0000 7ff80f172440 10 librados: Objecter returned from call r=-2 ... bash[9823]: debug 2021-04-04T13:01:04.791+0000 7ff80f172440 10 librados: call oid=data_log.127 nspace= bash[9823]: debug 2021-04-04T13:01:04.791+0000 7ff80f172440 10 librados: Objecter returned from call r=-2 bash[9823]: debug 2021-04-04T13:01:04.791+0000 7ff80f172440 10 librados: call oid=data_log.0 nspace= bash[9823]: debug 2021-04-04T13:01:04.791+0000 7ff80f172440 10 librados: Objecter returned from call r=-61 bash[9823]: debug 2021-04-04T13:01:04.791+0000 7ff80f172440 10 librados: call oid=data_log.1 nspace= bash[9823]: debug 2021-04-04T13:01:04.795+0000 7ff80f172440 10 librados: Objecter returned from call r=-2 bash[9823]: debug 2021-04-04T13:01:04.795+0000 7ff80f172440 10 librados: call oid=data_log.2 nspace= bash[9823]: debug 2021-04-04T13:01:04.795+0000 7ff80f172440 10 librados: Objecter returned from call r=-2 bash[9823]: debug 2021-04-04T13:01:04.795+0000 7ff80f172440 10 librados: call oid=data_log.3 nspace= bash[9823]: debug 2021-04-04T13:01:04.795+0000 7ff80f172440 10 librados: Objecter returned from call r=-2 ... bash[9823]: debug 2021-04-04T13:01:04.959+0000 7ff80f172440 10 librados: call oid=data_log.127 nspace= bash[9823]: debug 2021-04-04T13:01:04.959+0000 7ff80f172440 10 librados: Objecter returned from call r=-2 bash[9823]: debug 2021-04-04T13:01:04.959+0000 7ff80f172440 10 librados: call oid=data_log.0 nspace= bash[9823]: debug 2021-04-04T13:01:04.995+0000 7ff80f172440 10 librados: Objecter returned from call r=-5 bash[9823]: debug 2021-04-04T13:01:04.995+0000 7ff80f172440 -1 static int rgw::cls::fifo::FIFO::create(librados::v14_2_0::IoCtx, std::__cxx11::string, std::unique_ptr<rgw::cls::fifo::FIFO>*, optional_yield, std::optional<rados::cls::fifo::objv>, std::optional<std::basic_string_view<char> >, bool, uint64_t, uint64_t):925 create_meta failed: r=-5 bash[9823]: debug 2021-04-04T13:01:04.995+0000 7ff80f172440 -1 int RGWDataChangesLog::start(const RGWZone*, const RGWZoneParams&, RGWSI_Cls*, librados::v14_2_0::Rados*): Error when starting backend: Input/output error bash[9823]: debug 2021-04-04T13:01:04.995+0000 7ff80f172440 0 ERROR: failed to start datalog_rados service ((5) Input/output error bash[9823]: debug 2021-04-04T13:01:04.995+0000 7ff80f172440 0 ERROR: failed to init services (ret=(5) Input/output error) bash[9823]: debug 2021-04-04T13:01:05.107+0000 7ff80f172440 10 librados: watch_flush enter bash[9823]: debug 2021-04-04T13:01:05.107+0000 7ff80f172440 10 librados: watch_flush exit bash[9823]: debug 2021-04-04T13:01:05.107+0000 7ff80f172440 20 remove_watcher() i=0 bash[9823]: debug 2021-04-04T13:01:05.107+0000 7ff80f172440 2 removed watcher, disabling cache bash[9823]: debug 2021-04-04T13:01:05.131+0000 7ff80f172440 10 librados: watch_flush enter bash[9823]: debug 2021-04-04T13:01:05.131+0000 7ff80f172440 10 librados: watch_flush exit bash[9823]: debug 2021-04-04T13:01:05.131+0000 7ff80f172440 20 remove_watcher() i=1 bash[9823]: debug 2021-04-04T13:01:05.159+0000 7ff80f172440 10 librados: watch_flush enter bash[9823]: debug 2021-04-04T13:01:05.159+0000 7ff80f172440 10 librados: watch_flush exit ... bash[9823]: debug 2021-04-04T13:01:05.275+0000 7ff80f172440 20 remove_watcher() i=7 bash[9823]: debug 2021-04-04T13:01:05.275+0000 7ff80f172440 10 librados: watch_flush enter bash[9823]: debug 2021-04-04T13:01:05.275+0000 7ff80f172440 10 librados: watch_flush exit bash[9823]: debug 2021-04-04T13:01:05.275+0000 7ff80f172440 1 librados: shutdown bash[9823]: debug 2021-04-04T13:01:05.279+0000 7ff80f172440 -1 Couldn't init storage provider (RADOS) systemd[1]: ceph-e88d509a-f6fc-11ea-b25d-a0423f3ac864(a)rgw.smil.b7-1.gpu006.twfefs.service: Main process exited, code=exited, status=5/NOTINSTALLED systemd[1]: ceph-e88d509a-f6fc-11ea-b25d-a0423f3ac864(a)rgw.smil.b7-1.gpu006.twfefs.service: Failed with result 'exit-code'. systemd[1]: ceph-e88d509a-f6fc-11ea-b25d-a0423f3ac864(a)rgw.smil.b7-1.gpu006.twfefs.service: Service hold-off time over, scheduling restart. systemd[1]: ceph-e88d509a-f6fc-11ea-b25d-a0423f3ac864(a)rgw.smil.b7-1.gpu006.twfefs.service: Scheduled restart job, restart counter is at 4. systemd[1]: Stopped Ceph rgw.smil.b7-1.gpu006.twfefs for e88d509a-f6fc-11ea-b25d-a0423f3ac864. Weiwen Hu

3 years, 1 month

4
4
0 0

Ceph failover claster

by Várkonyi János

Hi All, Use anybody windows file server with ceph storage? Finally I can do the gateways. We've a ceph storage with 3 nodes and we can add this to windows via ceph-iscsi. I'd like to use it with 2 windows 2019 servers in failover cluster. I can connect to the storage each sides. But when I check the MPIO device Details all nods are connected and active, I've not "stand by" node. I'm not for sure it is right or it is a problem. I setup up the deatils from the ceph documment. TimeOutValue = 65; LinkDownTime = 25; SRBTimeoutDelta = 15. I try to validate failover cluster configuration and I get an error: "Failure issuing call to Persistent Reservation REGISTER AND IGNORE EXISTING on Test Disk 0 from node FS102.trafficom.hu when the disk has no existing registration. It is expected to succeed. The device is not ready." Did anybody see this error? jansz0

3 years, 1 month

2
1
0 0

cephadm upgrade to pacific

by Peter Childs

I am attempting to upgrade a Ceph Upgrade cluster that was deployed with Octopus 15.2.8 and upgraded to 15.2.10 successfully. I'm not attempting to upgrade to 16.2.0 Pacific, and it is not going very well. I am using cephadm. It looks to have upgraded the managers and stopped, and not moved on to the monitors or anything else. I've attempted stopping the upgrade and restarting it, with debug on and I'm not seeing anything to say why it is not progressing any further. I've also tried rebooting machines and failing the managers over with no success. I'm currently thinking its stuck attempting to upgrade a manager that does not exist. Its a test cluster of 16 nodes, bit of a proof of concept, so if I've got something terribly wrong I'm happy to look at deploying, (running on top of CentOS 7 but I'm fast heading to using something else) (apart from anything its not really a production ready system yet) Just not sure where cephadm upgrade has crashed in 16.2.0 Thanks in advance Peter

3 years, 1 month

2
2
0 0

2024

2023

2022

2021

2020

2019

ceph-users April 2021