Hey all!
I'm a first-time Ceph user trying to learn how to set up a cluster. I've
created a basic cluster using the following:
```
cephadm bootstrap --mon-ip <SERVER_1_IP>
ceph orch host add server-2 <SERVER_2_IP> _admin
```
I've created and mounted a filesystem on a host and everything is going
well, but I've noticed that an alert has been triggered:
CephMgrPrometheusModuleInactive.
It seems this alert is trying to `curl server-2:9283`. To check whether this
was a network issue, I ran `ceph mgr fail` to move the active mgr to
server-2. After some time I got the same alert, with the instance being
server-1:9283. Running `ss -l -n -p | grep 9283` shows the port is bound on
server-2 and not server-1. If I run `ceph mgr fail` again, the port becomes
bound on server-1 and not server-2.
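For reference, here is roughly the check sequence I have been running on each
host (a sketch of the commands as I understand them):
```
# confirm the prometheus module is enabled and see where it is advertised
ceph mgr module ls | grep prometheus
ceph mgr services

# check whether anything is listening on the exporter port
ss -l -n -p | grep 9283

# fail over the active mgr and repeat the checks on the other host
ceph mgr fail
```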
Is this alert important? Is there a way to remediate this issue? Let me
know if I am missing something here.
Thanks,
- Josh
Hi Anthony, thanks for reaching out.
It's an erasure-coded data pool (K=4, M=2), but I had more than two disk
failures around the same time, and the data had not yet fully recovered
elsewhere in the cluster.
They are big 12TB Exos drives, so it usually takes a few weeks to backfill /
recover, plus I had snaptrimming going on at the same time.
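For context, the pool was created with a profile along these lines (a sketch;
the profile and pool names here are just placeholders):
```
# an EC profile that can tolerate two simultaneous OSD failures
ceph osd erasure-code-profile set ec-4-2 k=4 m=2
ceph osd pool create cephfs_data_ec 128 128 erasure ec-4-2
```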
FYI - The journal's co-located on drive.
Kind regards
Geoff
On Fri, 24 Feb 2023 at 18:30, Anthony D'Atri <aad(a)dreamsnake.net> wrote:
> Are you only doing 2 replicas?
>
> On Feb 24, 2023, at 08:20, Geoffrey Rhodes <geoffrey(a)rhodes.org.za> wrote:
>
> This has caused a PG to go inactive
Hello all, I'd really appreciate some input from the more knowledgeable
here.
Is there a way I can access OSD objects if I have a BlueFS replay error?
This error prevents me from starting the OSD and also throws an error if I
try using the bluestore or objectstore tools. I can, however, run a
ceph-bluestore-tool show-label without issue.
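For reference, these are the sorts of invocations I mean (a sketch; the OSD
path is just an example from my setup):
```
# this one works fine
ceph-bluestore-tool show-label --path /var/lib/ceph/osd/ceph-12

# these hit the same BlueFS replay assertion
ceph-bluestore-tool fsck --path /var/lib/ceph/osd/ceph-12
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-12 --op list
```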
I'm hoping there is another way to access the objects on this OSD, or
possibly a way to purge this log. If deleting this replay log would help
(even with some data loss), I'm happy to try it.
This has caused a PG to go inactive, and I'm considering deleting the PG and
force re-creating it; I saw this mentioned as a last-resort option.
Below is a snippet of where things go wrong. I don't know if there is even a
chance here, or whether this is an unrecoverable state.
2023-01-25T10:05:26.543+0000 7fa773a14240 20 bluefs _replay 0x0:
op_dir_link db/031664.sst to 29549
2023-01-25T10:05:26.543+0000 7fa773a14240 20 bluefs _replay 0x0:
op_dir_link db/031665.sst to 29550
2023-01-25T10:05:26.543+0000 7fa773a14240 20 bluefs _replay 0x0:
op_dir_link db/031666.sst to 29551
2023-01-25T10:05:26.543+0000 7fa773a14240 20 bluefs _replay 0x0:
op_dir_link db/CURRENT to 29543
2023-01-25T10:05:26.543+0000 7fa773a14240 20 bluefs _replay 0x0:
op_dir_link db/IDENTITY to 5
2023-01-25T10:05:26.543+0000 7fa773a14240 20 bluefs _replay 0x0:
op_dir_link db/LOCK to 2
2023-01-25T10:05:26.543+0000 7fa773a14240 20 bluefs _replay 0x0:
op_dir_link db/MANIFEST-031657 to 29542
2023-01-25T10:05:26.543+0000 7fa773a14240 20 bluefs _replay 0x0:
op_dir_link db/OPTIONS-031645 to 29529
2023-01-25T10:05:26.543+0000 7fa773a14240 20 bluefs _replay 0x0:
op_dir_link db/OPTIONS-031660 to 29545
2023-01-25T10:05:26.543+0000 7fa773a14240 20 bluefs _replay 0x0:
op_dir_create db.slow
2023-01-25T10:05:26.543+0000 7fa773a14240 20 bluefs _replay 0x0: op_jump
seq 5204712 offset 0x20000
2023-01-25T10:05:26.543+0000 7fa773a14240 10 bluefs _read h 0x55d2f1cfdb80
0x10000~10000 from file(ino 1 size 0x0 mtime
2022-10-07T17:55:34.189440+0000 allocated 420000 alloc_commit 420000
extents [1:0x1770170000~20000,1:0x53d1e900000~400000])
2023-01-25T10:05:26.543+0000 7fa773a14240 20 bluefs _read left 0x10000 len
0x10000
2023-01-25T10:05:26.543+0000 7fa773a14240 20 bluefs _read got 65536
2023-01-25T10:05:26.543+0000 7fa773a14240 10 bluefs _read h 0x55d2f1cfdb80
0x20000~1000 from file(ino 1 size 0x20000 mtime
2022-10-07T17:55:34.189440+0000 allocated 420000 alloc_commit 420000
extents [1:0x1770170000~20000,1:0x53d1e900000~400000])
2023-01-25T10:05:26.543+0000 7fa773a14240 20 bluefs _read fetching
0x0~100000 of 1:0x53d1e900000~400000
2023-01-25T10:05:26.547+0000 7fa773a14240 20 bluefs _read left 0x100000 len
0x1000
2023-01-25T10:05:26.547+0000 7fa773a14240 20 bluefs _read got 4096
2023-01-25T10:05:26.547+0000 7fa773a14240 10 bluefs _replay 0x20000:
txn(seq 5204713 len 0x55 crc 0x81f48b1c)
2023-01-25T10:05:26.547+0000 7fa773a14240 20 bluefs _replay 0x20000:
op_file_update file(ino 29551 size 0x0 mtime
2022-10-07T17:55:34.151007+0000 allocated 0 alloc_commit 0 extents [])
2023-01-25T10:05:26.547+0000 7fa773a14240 20 bluefs _replay 0x20000:
op_dir_link db/031666.sst to 29551
2023-01-25T10:05:26.555+0000 7fa773a14240 -1
/build/ceph-17.2.5/src/os/bluestore/BlueFS.cc: In function 'int
BlueFS::_replay(bool, bool)' thread 7fa773a14240 time
2023-01-25T10:05:26.551808+0000
/build/ceph-17.2.5/src/os/bluestore/BlueFS.cc: 1419: FAILED ceph_assert(r
== q->second->file_map.end())
Kind regards
Geoff
Dear Ceph community,
for about two or three weeks now, we have had CephFS clients regularly
failing to respond to capability releases, accompanied by OSD slow ops. By
now, this happens daily, every time the clients get more active (e.g. during
nightly backups).
We mostly observe it with a handful of highly active clients, so it
correlates with IO volume. But we have over 250 clients which mount the
CephFS and plan to make them all more active soon. What worries me further
is that it doesn't seem to affect only the clients which fail to respond to
the capability release; other clients also just get stuck accessing data on
the CephFS.
So far I've been tracking down the corresponding OSDs via the client
(`cat /sys/kernel/debug/ceph/*/osdc`) and restarting them one by one. But
since this is now a regular/systemic issue, that is obviously not a
sustainable solution. It is usually a handful of OSDs per client, and I
couldn't observe any particular pattern in the involved OSDs yet.
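For reference, the per-client workaround looks roughly like this (a sketch;
the OSD id is just an example):
```
# on the stuck client: list in-flight requests and the OSDs they are waiting on
cat /sys/kernel/debug/ceph/*/osdc

# on the cluster: restart one of the implicated, cephadm-managed OSDs
ceph orch daemon restart osd.90
```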
Our cluster still runs on CentOS 7 with kernel
3.10.0-1160.42.2.el7.x86_64 using cephadm with ceph version 17.2.1
(ec95624474b1871a821a912b8c3af68f8f8e7aa1) quincy (stable).
Most active clients are currently on kernel versions such as:
4.18.0-348.el8.0.2.x86_64, 4.18.0-348.2.1.el8_5.x86_64,
4.18.0-348.7.1.el8_5.x86_64
I picked up some logging ideas from an older issue with similar symptoms:
https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/CKTIM6LF274…
But that issue has already been fixed in the kernel client, and I don't see
similar things in our logs.
I'm also not sure whether the things I'm digging up in the logs are
actually useful, or whether I'm looking in the right places.
So, I enabled "debug_ms 1" for the OSDs as suggested in the other thread.
But this filled up our host disks pretty fast, leading to, e.g., monitors
crashing.
I disabled the debug messages again and trimmed logs to free up space, but
I kept copies of two OSD log files which were involved in a capability
release / slow requests issue.
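For reference, I toggled the logging roughly like this (a sketch):
```
# enable verbose messenger logging for all OSDs
ceph config set osd debug_ms 1

# and later, revert to the default again
ceph config rm osd debug_ms
```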
The copied log files are quite big now (~3GB), and even if I remove things
like the ping traffic, I still have more than 1 million lines just for the
morning until the disk space was full (around 7 hours).
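(For what it's worth, the pre-filtering is just something like the following;
the file name and pattern are only examples.)
```
# drop the periodic osd_ping chatter before looking at anything else
grep -v osd_ping ceph-osd.90.log > ceph-osd.90.filtered.log
wc -l ceph-osd.90.filtered.log
```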
So now I'm wondering how to filter/look for the right things here.
When I grep for "error", I get a few of these messages:
{"log":"debug 2023-02-22T06:18:08.113+0000 7f15c5fff700 1 --
[v2:192.168.1.13:6881/4149819408,v1:192.168.1.13:6884/4149819408]
\u003c== osd.161 v2:192.168.1.31:6835/1012436344 182573 ====
pg_update_log_missing(3.1a6s2 epoch 646235/644895 rep_tid 1014320
entries 646235'7672108 (0'0) error
3:65836dde:::10016e9b7c8.00000000:head by mds.0.1221974:8515830 0.000000
-2 ObjectCleanRegions clean_offsets: [0~18446744073709551615],
clean_omap: 1, new_object: 0 trim_to 646178'7662340 roll_forward_to
646192'7672106) v3 ==== 261+0+0 (crc 0 0 0) 0x562d55e52380 con
0x562d8a2de400\n","stream":"stderr","time":"2023-02-22T06:18:08.115002765Z"}
And if I grep for "failed", I get a couple of those:
{"log":"debug 2023-02-22T06:15:25.242+0000 7f58bbf7c700 1 --
[v2:172.16.62.11:6829/3509070161,v1:172.16.62.11:6832/3509070161]
\u003e\u003e 172.16.62.10:0/3127362489 conn(0x55ba06bf3c00
msgr2=0x55b9ce07e580 crc :-1 s=STATE_CONNECTION_ESTABLISHED
l=1).read_until read
failed\n","stream":"stderr","time":"2023-02-22T06:15:25.243808392Z"}
{"log":"debug 2023-02-22T06:15:25.242+0000 7f58bbf7c700 1 --2-
[v2:172.16.62.11:6829/3509070161,v1:172.16.62.11:6832/3509070161]
\u003e\u003e 172.16.62.10:0/3127362489 conn(0x55ba06bf3c00
0x55b9ce07e580 crc :-1 s=READY pgs=2096664 cs=0 l=1 rev1=1 crypto rx=0
tx=0 comp rx=0 tx=0).handle_read_frame_preamble_main read frame preamble
failed r=-1 ((1) Operation not
permitted)\n","stream":"stderr","time":"2023-02-22T06:15:25.243813528Z"}
Not sure if they are related to the issue.
In the kernel logs of the client (dmesg, journalctl or /var/log/messages),
there seem to be no errors or any stack traces in the relevant time periods.
The only thing I can see is me restarting the relevant OSDs:
[Mi Feb 22 07:29:59 2023] libceph: osd90 down
[Mi Feb 22 07:30:34 2023] libceph: osd90 up
[Mi Feb 22 07:31:55 2023] libceph: osd93 down
[Mi Feb 22 08:37:50 2023] libceph: osd93 up
I noticed a socket closed for another client, but I assume that's more
related to monitors failing due to full disks:
[Mi Feb 22 05:59:52 2023] libceph: mon2 (1)172.16.62.12:6789 socket
closed (con state OPEN)
[Mi Feb 22 05:59:52 2023] libceph: mon2 (1)172.16.62.12:6789 session
lost, hunting for new mon
[Mi Feb 22 05:59:52 2023] libceph: mon3 (1)172.16.62.13:6789 session
established
I would appreciate it if anybody has a suggestion as to where I should look
next. Thank you for your help.
Best Wishes,
Mathias
Hi,
I'm really lost with my Ceph system. I built a small cluster for home
usage which has two uses for me: I want to replace an old NAS and I want
to learn about Ceph so that I have hands-on experience. We're using it
in our company, but I need some real-life experience without risking any
company or customer data. That's my preferred way of learning.
The cluster consists of 3 Raspberry Pis plus a few VMs running on
Proxmox. I'm not using Proxmox's built-in Ceph because I want to focus on
Ceph itself and not just use it as a preconfigured tool.
All hosts are running Fedora (x86_64 and arm64), and during an upgrade
from F36 to F37 my cluster suddenly showed all PGs as unavailable. I
worked nearly a week to get it back online and I learned a lot about
Ceph management and recovery. The cluster is back but I still can't
access my data. Maybe you can help me?
Here are my versions:
[ceph: root@ceph04 /]# ceph versions
{
    "mon": {
        "ceph version 17.2.5 (98318ae89f1a893a6ded3a640405cdbb33e08757) quincy (stable)": 3
    },
    "mgr": {
        "ceph version 17.2.5 (98318ae89f1a893a6ded3a640405cdbb33e08757) quincy (stable)": 3
    },
    "osd": {
        "ceph version 17.2.5 (98318ae89f1a893a6ded3a640405cdbb33e08757) quincy (stable)": 5
    },
    "mds": {
        "ceph version 17.2.5 (98318ae89f1a893a6ded3a640405cdbb33e08757) quincy (stable)": 4
    },
    "overall": {
        "ceph version 17.2.5 (98318ae89f1a893a6ded3a640405cdbb33e08757) quincy (stable)": 15
    }
}
Here's MDS status output of one MDS:
[ceph: root@ceph04 /]# ceph tell mds.mds01.ceph05.pqxmvt status
2023-01-14T15:30:28.607+0000 7fb9e17fa700 0 client.60986454
ms_handle_reset on v2:192.168.23.65:6800/2680651694
2023-01-14T15:30:28.640+0000 7fb9e17fa700 0 client.60986460
ms_handle_reset on v2:192.168.23.65:6800/2680651694
{
    "cluster_fsid": "ff6e50de-ed72-11ec-881c-dca6325c2cc4",
    "whoami": 0,
    "id": 60984167,
    "want_state": "up:replay",
    "state": "up:replay",
    "fs_name": "cephfs",
    "replay_status": {
        "journal_read_pos": 0,
        "journal_write_pos": 0,
        "journal_expire_pos": 0,
        "num_events": 0,
        "num_segments": 0
    },
    "rank_uptime": 1127.54018615,
    "mdsmap_epoch": 98056,
    "osdmap_epoch": 12362,
    "osdmap_epoch_barrier": 0,
    "uptime": 1127.957307273
}
It's been staying like that for days now. If a counter were moving, I would
just wait, but nothing changes, and all the stats say the MDSs aren't
working at all.
The symptom I have is that the dashboard and all the other tools I use say
it's more or less OK (some old messages about failed daemons and scrubbing
aside). But I can't mount anything. When I try to start a VM that's on
RBD I just get a timeout, and when I try to mount a CephFS, mount just
hangs forever.
Whatever command I give the MDS or the journal, it just hangs. The only
thing I could do was take the CephFS offline, kill the MDSs and do a "ceph
fs reset <fs name> --yes-i-really-mean-it". After that I rebooted all
nodes, just to be sure, but I still have no access to the data.
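Roughly, the sequence I went through was something like this (a sketch; the
filesystem name is mine, the daemon name is just one example):
```
# take the filesystem down and stop the MDS daemons
ceph fs set cephfs down true
ceph orch daemon stop mds.mds01.ceph05.pqxmvt

# reset the filesystem map as a last resort
ceph fs reset cephfs --yes-i-really-mean-it
```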
Could you please help me? I'm kinda desperate. If you need any more
information, just let me know.
Cheers,
Thomas
--
Thomas Widhalm
Lead Systems Engineer
NETWAYS Professional Services GmbH | Deutschherrnstr. 15-19 | D-90429 Nuernberg
Tel: +49 911 92885-0 | Fax: +49 911 92885-77
CEO: Julian Hein, Bernd Erk | AG Nuernberg HRB34510
https://www.netways.de | thomas.widhalm(a)netways.de
Hi Cephers,
We have two Octopus 15.2.17 clusters in a multisite configuration. Every
once in a while we have to perform a bucket reshard (most recently, to 613
shards), and this practically kills our replication for a few days.
Does anyone know of any priority mechanics within sync to give other
buckets priority and/or lower the priority of the resharded one? Are there
any improvements to this in higher versions of Ceph that we could take
advantage of if we upgraded the cluster (I haven't found any)?
And how does one safely increase rgw_data_log_num_shards, given that the
documentation only says: "The values of rgw_data_log_num_shards and
rgw_md_log_max_shards should not be changed after sync has started."? Does
this mean that I should block access to the cluster, wait until sync has
caught up with the source/master, change this value, restart the rgws and
unblock access?
Kind Regards,
Tom
Hi,
today I wanted to increase the PGs from 2k -> 4k and random OSDs went
offline in the cluster.
After some investigation we saw that the OSDs got OOM killed (I've seen a
host go from 90GB of used memory to 190GB before the OOM kills happened).
We have around 24 SSD OSDs per host and 128GB/190GB/265GB of memory in these
hosts. All of them experienced OOM kills.
All hosts are octopus / ubuntu 20.04.
And at every step new OSDs crashed with OOM. (We have now set the
pg_num/pgp_num to 2516 to stop the process.)
The OSD logs do not show anything about why this might happen.
Some OSDs also segfault.
I have now started to stop all OSDs on a host and run a "ceph-bluestore-tool
repair" and a "ceph-kvstore-tool bluestore-kv compact" on all of its OSDs.
This takes around 30 minutes per 8TB OSD. When I start the OSDs again, I
instantly get a lot of slow ops from all the other OSDs as they come up (the
8TB OSDs take around 10 minutes in "load_pgs").
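For reference, the per-host procedure looks roughly like this (a sketch; the
OSD id and paths are examples, and the systemd unit name assumes a
package-based install):
```
# stop the OSD, then repair and compact its store offline
systemctl stop ceph-osd@42
ceph-bluestore-tool repair --path /var/lib/ceph/osd/ceph-42
ceph-kvstore-tool bluestore-kv /var/lib/ceph/osd/ceph-42 compact
systemctl start ceph-osd@42
```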
I am unsure what I can do to restore normal cluster performance. Any ideas
or suggestions, or maybe even known bugs?
Maybe a string I could search for in the logs?
Cheers
Boris
Hi,
Our cluster runs Pacific on Rocky8. We have 3 rgw running on port 7480.
I tried to set up an ingress service with the following YAML service
definition, with no luck:
service_type: ingress
service_id: rgw.myceph.be
placement:
  hosts:
    - ceph001
    - ceph002
    - ceph003
spec:
  backend_service: rgw.myceph.be
  virtual_ip: 192.168.0.10
  frontend_port: 443
  monitor_port: 9000
  ssl_cert: |
    -----BEGIN PRIVATE KEY-----
    ...
    -----END PRIVATE KEY-----
    -----BEGIN CERTIFICATE-----
    ...
    -----END CERTIFICATE-----
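For what it's worth, I applied the spec the usual way (a sketch; the file
name is just an example):
```
ceph orch apply -i ingress-rgw.yaml
```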
I also tried to set up the ingress service via the dashboard... still no
luck. So I started debugging the problem.
1. Even though I entered the certificate and the private key in the form,
Ceph complained about a missing haproxy.pem.key file.
I manually added the file to the container definition folder, and the
haproxy containers started!
2. Looking at the HAProxy monitoring page, I realized that there was no
backend server defined, even though I had manually selected the servers
running the rgw in the form.
In the container definition folder, the backend definition in haproxy.cfg
looks like:
...
backend backend
    option forwardfor
    balance static-rr
    option httpchk HEAD / HTTP/1.0
There is no mention of the servers or port 7480.
Once again, I added the server definitions manually:
    server ceph001 192.168.0.1:7480 check
    server ceph004 192.168.0.2:7480 check
    server ceph008 192.168.0.2:7480 check
and redeployed the containers. It's working.
Any idea?
Patrick