October 2023 - ceph-users

by Rok Jaklič

Hi, yesterday we changed RGW from civetweb to beast and at 04:02 RGW stopped working; we had to restart it in the morning. In one rgw log for previous day we can see: 2023-10-06T04:02:01.105+0200 7fb71d45d700 -1 received signal: Hangup from killall -q -1 ceph-mon ceph-mgr ceph-mds ceph-osd ceph-fuse radosgw rbd-mirror cephfs-mirror (PID: 3202663) UID: 0 and in the next day log we can see: 2023-10-06T04:02:01.133+0200 7fb71d45d700 -1 received signal: Hangup from (PID: 3202664) UID: 0 and after that no requests came. We had to restart rgw. In ceph.conf we have something like [client.radosgw.ctplmon2] host = ctplmon2 log_file = /var/log/ceph/client.radosgw.ctplmon2.log rgw_dns_name = ctplmon2 rgw_frontends = "beast ssl_endpoint=0.0.0.0:4443 ssl_certificate=..." rgw_max_put_param_size = 15728640 We assume it has something to do with logrotate. /etc/logrotate.d/ceph: /var/log/ceph/*.log { rotate 90 daily compress sharedscripts postrotate killall -q -1 ceph-mon ceph-mgr ceph-mds ceph-osd ceph-fuse radosgw rbd-mirror cephfs-mirror || pkill -1 -x "ceph-mon|ceph-mgr|ceph-mds|ceph-osd|ceph-fuse|radosgw|rbd-mirror|cephfs-mirror" || true endscript missingok notifempty su root ceph } ceph version 16.2.14 (238ba602515df21ea7ffc75c88db29f9e5ef12c9) pacific (stable) And ideas why this happend? Kind regards, Rok

7 months, 1 week

1
1
0 0

Ceph 16.2.x excessive logging, how to reduce?

by Zakhar Kirpichenko

Hi, Our Ceph 16.2.x cluster managed by cephadm is logging a lot of very detailed messages, Ceph logs alone on hosts with monitors and several OSDs has already eaten through 50% of the endurance of the flash system drives over a couple of years. Cluster logging settings are default, and it seems that all daemons are writing lots and lots of debug information to the logs, such as for example: https://pastebin.com/ebZq8KZk (it's just a snippet, but there's lots and lots of various information). Is there a way to reduce the amount of logging and, for example, limit the logging to warnings or important messages so that it doesn't include every successful authentication attempt, compaction etc, etc, when the cluster is healthy and operating normally? I would very much appreciate your advice on this. Best regards, Zakhar

7 months, 1 week

4
12
0 0

Manual resharding with multisite

by Yixin Jin

Hi folks, I am aware that dynamic resharding isn't supported before Reef with multisite. However, does manual resharding work? It doesn't seem to be so, either. First of all, running "bucket reshard" has to be in the master zone. But if the objects of that bucket isn't in the master zone, resharding in the master zone seems to render those objects inaccessible in the zone that actually has them. So, what is recommended practice of resharding with multiste? No resharding at all? Thanks, Yixin

7 months, 1 week

3
2
0 0

If you know your cluster is performing as expected？

by Louis Koo

My ceph cluster was made of all-flash nvme, like this: 1. 10*nodes , every node has 21 nvme device was used to rgw data pool, and other 1 nvme was used to index pool ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF -26 1467.11609 root bd-ssd03-data -43 146.71161 host bd-ssd03-node01-f6fe3bdc-a1b6-496f-8fff-adb240afaedf 0 nvme 6.98627 osd.0 up 1.00000 1.00000 4 nvme 6.98627 osd.4 up 1.00000 1.00000 11 nvme 6.98627 osd.11 up 1.00000 1.00000 21 nvme 6.98627 osd.21 up 1.00000 1.00000 31 nvme 6.98627 osd.31 up 1.00000 1.00000 46 nvme 6.98627 osd.46 up 1.00000 1.00000 56 nvme 6.98627 osd.56 up 1.00000 1.00000 67 nvme 6.98627 osd.67 up 1.00000 1.00000 77 nvme 6.98627 osd.77 up 1.00000 1.00000 87 nvme 6.98627 osd.87 up 1.00000 1.00000 97 nvme 6.98627 osd.97 up 1.00000 1.00000 107 nvme 6.98627 osd.107 up 1.00000 1.00000 117 nvme 6.98627 osd.117 up 1.00000 1.00000 127 nvme 6.98627 osd.127 up 1.00000 1.00000 137 nvme 6.98627 osd.137 up 1.00000 1.00000 147 nvme 6.98627 osd.147 up 1.00000 1.00000 157 nvme 6.98627 osd.157 up 1.00000 1.00000 168 nvme 6.98627 osd.168 up 1.00000 1.00000 178 nvme 6.98627 osd.178 up 1.00000 1.00000 188 nvme 6.98627 osd.188 up 1.00000 1.00000 198 nvme 6.98627 osd.198 up 1.00000 1.00000 -31 146.71161 host bd-ssd03-node02-ffedbbd9-3794-4b10-ac1d-bf7595395892 1 nvme 6.98627 osd.1 up 1.00000 1.00000 6 nvme 6.98627 osd.6 up 1.00000 1.00000 14 nvme 6.98627 osd.14 up 1.00000 1.00000 24 nvme 6.98627 osd.24 up 1.00000 1.00000 34 nvme 6.98627 osd.34 up 1.00000 1.00000 45 nvme 6.98627 osd.45 up 1.00000 1.00000 55 nvme 6.98627 osd.55 up 1.00000 1.00000 66 nvme 6.98627 osd.66 up 1.00000 1.00000 76 nvme 6.98627 osd.76 up 1.00000 1.00000 86 nvme 6.98627 osd.86 up 1.00000 1.00000 96 nvme 6.98627 osd.96 up 1.00000 1.00000 106 nvme 6.98627 osd.106 up 1.00000 1.00000 116 nvme 6.98627 osd.116 up 1.00000 1.00000 130 nvme 6.98627 osd.130 up 1.00000 1.00000 141 nvme 6.98627 osd.141 up 1.00000 1.00000 151 nvme 6.98627 osd.151 up 1.00000 1.00000 161 nvme 6.98627 osd.161 up 1.00000 1.00000 171 nvme 6.98627 osd.171 up 1.00000 1.00000 181 nvme 6.98627 osd.181 up 1.00000 1.00000 191 nvme 6.98627 osd.191 up 1.00000 1.00000 201 nvme 6.98627 osd.201 up 1.00000 1.00000 -29 146.71161 host bd-ssd03-node03-637c96a1-6b93-42cc-9b35-44b729bd7a5c 2 nvme 6.98627 osd.2 up 1.00000 1.00000 8 nvme 6.98627 osd.8 up 1.00000 1.00000 17 nvme 6.98627 osd.17 up 1.00000 1.00000 27 nvme 6.98627 osd.27 up 1.00000 1.00000 37 nvme 6.98627 osd.37 up 1.00000 1.00000 48 nvme 6.98627 osd.48 up 1.00000 1.00000 58 nvme 6.98627 osd.58 up 1.00000 1.00000 68 nvme 6.98627 osd.68 up 1.00000 1.00000 79 nvme 6.98627 osd.79 up 1.00000 1.00000 89 nvme 6.98627 osd.89 up 1.00000 1.00000 99 nvme 6.98627 osd.99 up 1.00000 1.00000 109 nvme 6.98627 osd.109 up 1.00000 1.00000 119 nvme 6.98627 osd.119 up 1.00000 1.00000 129 nvme 6.98627 osd.129 up 1.00000 1.00000 140 nvme 6.98627 osd.140 up 1.00000 1.00000 150 nvme 6.98627 osd.150 up 1.00000 1.00000 166 nvme 6.98627 osd.166 up 1.00000 1.00000 176 nvme 6.98627 osd.176 up 1.00000 1.00000 186 nvme 6.98627 osd.186 up 1.00000 1.00000 196 nvme 6.98627 osd.196 up 1.00000 1.00000 206 nvme 6.98627 osd.206 up 1.00000 1.00000 -41 146.71161 host bd-ssd03-node04-87186f77-3df0-4b71-8103-fdc2a104eb17 3 nvme 6.98627 osd.3 up 1.00000 1.00000 10 nvme 6.98627 osd.10 up 1.00000 1.00000 20 nvme 6.98627 osd.20 up 1.00000 1.00000 30 nvme 6.98627 osd.30 up 1.00000 1.00000 40 nvme 6.98627 osd.40 up 1.00000 1.00000 50 nvme 6.98627 osd.50 up 1.00000 1.00000 60 nvme 6.98627 osd.60 up 1.00000 1.00000 70 nvme 6.98627 osd.70 up 1.00000 1.00000 81 nvme 6.98627 osd.81 up 1.00000 1.00000 90 nvme 6.98627 osd.90 up 1.00000 1.00000 100 nvme 6.98627 osd.100 up 1.00000 1.00000 110 nvme 6.98627 osd.110 up 1.00000 1.00000 120 nvme 6.98627 osd.120 up 1.00000 1.00000 132 nvme 6.98627 osd.132 up 1.00000 1.00000 143 nvme 6.98627 osd.143 up 1.00000 1.00000 153 nvme 6.98627 osd.153 up 1.00000 1.00000 162 nvme 6.98627 osd.162 up 1.00000 1.00000 172 nvme 6.98627 osd.172 up 1.00000 1.00000 182 nvme 6.98627 osd.182 up 1.00000 1.00000 192 nvme 6.98627 osd.192 up 1.00000 1.00000 202 nvme 6.98627 osd.202 up 1.00000 1.00000 -25 146.71161 host bd-ssd03-node05-a6f31379-cb27-4bc3-bbda-726023d9ff96 5 nvme 6.98627 osd.5 up 1.00000 1.00000 13 nvme 6.98627 osd.13 up 1.00000 1.00000 23 nvme 6.98627 osd.23 up 1.00000 1.00000 33 nvme 6.98627 osd.33 up 1.00000 1.00000 43 nvme 6.98627 osd.43 up 1.00000 1.00000 53 nvme 6.98627 osd.53 up 1.00000 1.00000 64 nvme 6.98627 osd.64 up 1.00000 1.00000 78 nvme 6.98627 osd.78 up 1.00000 1.00000 88 nvme 6.98627 osd.88 up 1.00000 1.00000 98 nvme 6.98627 osd.98 up 1.00000 1.00000 108 nvme 6.98627 osd.108 up 1.00000 1.00000 118 nvme 6.98627 osd.118 up 1.00000 1.00000 128 nvme 6.98627 osd.128 up 1.00000 1.00000 139 nvme 6.98627 osd.139 up 1.00000 1.00000 149 nvme 6.98627 osd.149 up 1.00000 1.00000 159 nvme 6.98627 osd.159 up 1.00000 1.00000 169 nvme 6.98627 osd.169 up 1.00000 1.00000 179 nvme 6.98627 osd.179 up 1.00000 1.00000 189 nvme 6.98627 osd.189 up 1.00000 1.00000 199 nvme 6.98627 osd.199 up 1.00000 1.00000 209 nvme 6.98627 osd.209 up 1.00000 1.00000 -33 146.71161 host bd-ssd03-node06-e4288b8f-7d64-40c1-a3f1-c60757e1fbb1 7 nvme 6.98627 osd.7 up 1.00000 1.00000 16 nvme 6.98627 osd.16 up 1.00000 1.00000 26 nvme 6.98627 osd.26 up 1.00000 1.00000 36 nvme 6.98627 osd.36 up 1.00000 1.00000 51 nvme 6.98627 osd.51 up 1.00000 1.00000 62 nvme 6.98627 osd.62 up 1.00000 1.00000 71 nvme 6.98627 osd.71 up 1.00000 1.00000 82 nvme 6.98627 osd.82 up 1.00000 1.00000 93 nvme 6.98627 osd.93 up 1.00000 1.00000 103 nvme 6.98627 osd.103 up 1.00000 1.00000 113 nvme 6.98627 osd.113 up 1.00000 1.00000 123 nvme 6.98627 osd.123 up 1.00000 1.00000 134 nvme 6.98627 osd.134 up 1.00000 1.00000 145 nvme 6.98627 osd.145 up 1.00000 1.00000 155 nvme 6.98627 osd.155 up 1.00000 1.00000 165 nvme 6.98627 osd.165 up 1.00000 1.00000 175 nvme 6.98627 osd.175 up 1.00000 1.00000 185 nvme 6.98627 osd.185 up 1.00000 1.00000 195 nvme 6.98627 osd.195 up 1.00000 1.00000 208 nvme 6.98627 osd.208 up 1.00000 1.00000 215 nvme 6.98627 osd.215 up 1.00000 1.00000 -37 146.71161 host bd-ssd03-node07-fabbd3f5-1466-493b-85f4-dac9254720c4 9 nvme 6.98627 osd.9 up 1.00000 1.00000 18 nvme 6.98627 osd.18 up 1.00000 1.00000 28 nvme 6.98627 osd.28 up 1.00000 1.00000 44 nvme 6.98627 osd.44 up 1.00000 1.00000 54 nvme 6.98627 osd.54 up 1.00000 1.00000 65 nvme 6.98627 osd.65 up 1.00000 1.00000 75 nvme 6.98627 osd.75 up 1.00000 1.00000 85 nvme 6.98627 osd.85 up 1.00000 1.00000 95 nvme 6.98627 osd.95 up 1.00000 1.00000 105 nvme 6.98627 osd.105 up 1.00000 1.00000 115 nvme 6.98627 osd.115 up 1.00000 1.00000 125 nvme 6.98627 osd.125 up 1.00000 1.00000 136 nvme 6.98627 osd.136 up 1.00000 1.00000 146 nvme 6.98627 osd.146 up 1.00000 1.00000 156 nvme 6.98627 osd.156 up 1.00000 1.00000 167 nvme 6.98627 osd.167 up 1.00000 1.00000 177 nvme 6.98627 osd.177 up 1.00000 1.00000 187 nvme 6.98627 osd.187 up 1.00000 1.00000 197 nvme 6.98627 osd.197 up 1.00000 1.00000 207 nvme 6.98627 osd.207 up 1.00000 1.00000 214 nvme 6.98627 osd.214 up 1.00000 1.00000 -35 146.71161 host bd-ssd03-node08-65ab0995-22d0-4b4c-b661-4cc53ceb5b61 12 nvme 6.98627 osd.12 up 1.00000 1.00000 22 nvme 6.98627 osd.22 up 1.00000 1.00000 32 nvme 6.98627 osd.32 up 1.00000 1.00000 42 nvme 6.98627 osd.42 up 1.00000 1.00000 52 nvme 6.98627 osd.52 up 1.00000 1.00000 63 nvme 6.98627 osd.63 up 1.00000 1.00000 73 nvme 6.98627 osd.73 up 1.00000 1.00000 84 nvme 6.98627 osd.84 up 1.00000 1.00000 94 nvme 6.98627 osd.94 up 1.00000 1.00000 104 nvme 6.98627 osd.104 up 1.00000 1.00000 114 nvme 6.98627 osd.114 up 1.00000 1.00000 124 nvme 6.98627 osd.124 up 1.00000 1.00000 138 nvme 6.98627 osd.138 up 1.00000 1.00000 148 nvme 6.98627 osd.148 up 1.00000 1.00000 158 nvme 6.98627 osd.158 up 1.00000 1.00000 170 nvme 6.98627 osd.170 up 1.00000 1.00000 180 nvme 6.98627 osd.180 up 1.00000 1.00000 190 nvme 6.98627 osd.190 up 1.00000 1.00000 200 nvme 6.98627 osd.200 up 1.00000 1.00000 210 nvme 6.98627 osd.210 up 1.00000 1.00000 216 nvme 6.98627 osd.216 up 1.00000 1.00000 -39 146.71161 host bd-ssd03-node09-4833dda0-f95f-4116-97fa-635ee8685ba2 15 nvme 6.98627 osd.15 up 1.00000 1.00000 25 nvme 6.98627 osd.25 up 1.00000 1.00000 35 nvme 6.98627 osd.35 up 1.00000 1.00000 47 nvme 6.98627 osd.47 up 1.00000 1.00000 61 nvme 6.98627 osd.61 up 1.00000 1.00000 72 nvme 6.98627 osd.72 up 1.00000 1.00000 83 nvme 6.98627 osd.83 up 1.00000 1.00000 92 nvme 6.98627 osd.92 up 1.00000 1.00000 102 nvme 6.98627 osd.102 up 1.00000 1.00000 112 nvme 6.98627 osd.112 up 1.00000 1.00000 122 nvme 6.98627 osd.122 up 1.00000 1.00000 133 nvme 6.98627 osd.133 up 1.00000 1.00000 144 nvme 6.98627 osd.144 up 1.00000 1.00000 154 nvme 6.98627 osd.154 up 1.00000 1.00000 164 nvme 6.98627 osd.164 up 1.00000 1.00000 174 nvme 6.98627 osd.174 up 1.00000 1.00000 184 nvme 6.98627 osd.184 up 1.00000 1.00000 194 nvme 6.98627 osd.194 up 1.00000 1.00000 204 nvme 6.98627 osd.204 up 1.00000 1.00000 213 nvme 6.98627 osd.213 up 1.00000 1.00000 218 nvme 6.98627 osd.218 up 1.00000 1.00000 -45 146.71161 host bd-ssd03-node10-9cfe177c-2e0f-43d3-a1a9-6524dac0009a 19 nvme 6.98627 osd.19 up 1.00000 1.00000 29 nvme 6.98627 osd.29 up 1.00000 1.00000 39 nvme 6.98627 osd.39 up 1.00000 1.00000 49 nvme 6.98627 osd.49 up 1.00000 1.00000 59 nvme 6.98627 osd.59 up 1.00000 1.00000 69 nvme 6.98627 osd.69 up 1.00000 1.00000 80 nvme 6.98627 osd.80 up 1.00000 1.00000 91 nvme 6.98627 osd.91 up 1.00000 1.00000 101 nvme 6.98627 osd.101 up 1.00000 1.00000 111 nvme 6.98627 osd.111 up 1.00000 1.00000 121 nvme 6.98627 osd.121 up 1.00000 1.00000 131 nvme 6.98627 osd.131 up 1.00000 1.00000 142 nvme 6.98627 osd.142 up 1.00000 1.00000 152 nvme 6.98627 osd.152 up 1.00000 1.00000 163 nvme 6.98627 osd.163 up 1.00000 1.00000 173 nvme 6.98627 osd.173 up 1.00000 1.00000 183 nvme 6.98627 osd.183 up 1.00000 1.00000 193 nvme 6.98627 osd.193 up 1.00000 1.00000 203 nvme 6.98627 osd.203 up 1.00000 1.00000 212 nvme 6.98627 osd.212 up 1.00000 1.00000 217 nvme 6.98627 osd.217 up 1.00000 1.00000 -4 14.55475 root bd-ssd03-index -11 1.45547 host bd-ssd03-node01-0581ef65-8df6-4cfc-8992-08ffb7959914 41 nvme 1.45547 osd.41 up 1.00000 1.00000 -13 1.45547 host bd-ssd03-node02-0025ee25-77d9-4731-bc94-1c541e1fa2e7 126 nvme 1.45547 osd.126 up 1.00000 1.00000 -3 1.45547 host bd-ssd03-node03-a2629c55-9ddc-4c16-ba85-187af70eef23 160 nvme 1.45547 osd.160 up 1.00000 1.00000 -9 1.45547 host bd-ssd03-node04-9e02d968-28b5-43a8-b93a-418cb61af7ee 211 nvme 1.45547 osd.211 up 1.00000 1.00000 -15 1.45547 host bd-ssd03-node05-536f9f5e-cf15-4bb5-abaa-afd313473ab1 74 nvme 1.45547 osd.74 up 1.00000 1.00000 -23 1.45547 host bd-ssd03-node06-43949775-80b1-4297-b23f-d36ffb784d76 205 nvme 1.45547 osd.205 up 1.00000 1.00000 -21 1.45547 host bd-ssd03-node07-11ca9165-47b2-4b1d-84d7-c6b4f62f51eb 38 nvme 1.45547 osd.38 up 1.00000 1.00000 -7 1.45547 host bd-ssd03-node08-d6739f0c-a356-415c-ae4d-323cb4197efe 135 nvme 1.45547 osd.135 up 1.00000 1.00000 -17 1.45547 host bd-ssd03-node09-ece27fca-7711-45f1-b899-6d6f61aad2cf 57 nvme 1.45547 osd.57 up 1.00000 1.00000 -19 1.45547 host bd-ssd03-node10-c53d891c-b393-4663-88bf-f2ec4e9e0a9a 219 nvme 1.45547 osd.219 up 1.00000 1.00000 2. memory 512G [root@bd-ssd03-node01 ~]# free -g total used free shared buff/cache available Mem: 503 105 134 0 262 393 Swap: 0 0 0 3. cpu 128 Architecture: x86_64 CPU op-mode(s): 32-bit, 64-bit Byte Order: Little Endian CPU(s): 256 On-line CPU(s) list: 0-255 Thread(s) per core: 2 Core(s) per socket: 64 Socket(s): 2 NUMA node(s): 2 Vendor ID: AuthenticAMD BIOS Vendor ID: Advanced Micro Devices, Inc. CPU family: 25 Model: 1 Model name: AMD EPYC 7763 64-Core Processor BIOS Model name: AMD EPYC 7763 64-Core Processor Stepping: 1 CPU MHz: 3243.566 CPU max MHz: 3529.0520 CPU min MHz: 1500.0000 BogoMIPS: 4890.70 Virtualization: AMD-V L1d cache: 32K L1i cache: 32K L2 cache: 512K L3 cache: 32768K 4. Network 50Gb/s Settings for bond4: Supported ports: [ ] Supported link modes: Not reported Supported pause frame use: No Supports auto-negotiation: No Supported FEC modes: Not reported Advertised link modes: Not reported Advertised pause frame use: No Advertised auto-negotiation: No Advertised FEC modes: Not reported Speed: 50000Mb/s Duplex: Full Auto-negotiation: off Port: Other PHYAD: 0 Transceiver: internal Link detected: yes I'm using cosbench to test the rgw, the xml config of the cosbench is 10 driver to 10 rgw and every driver with 16 works: <?xml version="1.0" encoding="UTF-8" ?> <workload name="test" description="sample benchmark for s3"> <workflow> <workstage name="create bucket" closuredelay="0" config=""> <auth type="none" config=""/> <work name="rgw1-create" type="init" workers="1" interval="5" division="container" totalOps="2560000" rampup="0" rampdown="0" afr="0" totalBytes="0" config="cprefix=bdssd03-bucket;containers=r(1,10)"> <auth type="none" config=""/> <storage type="s3" config="accesskey=GHWT013TX7ISVKR430Y0;secretkey=u7aCkNfznEztWpVJcARHceVC0ZPjR8XiVQDPmzAR;endpoint=http://10.x.x.143:80;path_style_access=true"/> <operation type="init" ratio="100" division="container" config="containers=r(1,1);objects=r(0,0);sizes=c(0)B" id="none"/> </work> </workstage> <workstage name="write"> <work name="w1" workers="16" totalOps="2560000" driver="driver1"> <operation type="write" ratio="100" config="cprefix=bdssd03-bucket;oprefix=test4m-;containers=c(1);objects=r(1,2000000);sizes=c(4)MB;" /> <storage type="s3" config="path_style_access=true;accesskey=GHWT013TX7ISVKR430Y0;secretkey=u7aCkNfznEztWpVJcARHceVC0ZPjR8XiVQDPmzAR;endpoint=http://10.x.x.144:80;server_side_encryption=false; "/> </work> <work name="w2" workers="16" totalOps="2560000" driver="driver2"> <operation type="write" ratio="100" config="cprefix=bdssd03-bucket;oprefix=test4m-;containers=c(2);objects=r(1,2000000);sizes=c(4)MB;" /> <storage type="s3" config="path_style_access=true;accesskey=GHWT013TX7ISVKR430Y0;secretkey=u7aCkNfznEztWpVJcARHceVC0ZPjR8XiVQDPmzAR;endpoint=http://10.x.x.145:80;server_side_encryption=false;" /> </work> <work name="w3" workers="16" totalOps="2560000" driver="driver3"> <operation type="write" ratio="100" config="cprefix=bdssd03-bucket;oprefix=test4m-;containers=c(3);objects=r(1,2000000);sizes=c(4)MB;" /> <storage type="s3" config="path_style_access=true;accesskey=GHWT013TX7ISVKR430Y0;secretkey=u7aCkNfznEztWpVJcARHceVC0ZPjR8XiVQDPmzAR;endpoint=http://10.x.x.146:80;server_side_encryption=false;" /> </work> <work name="w4" workers="16" totalOps="2560000" driver="driver4"> <operation type="write" ratio="100" config="cprefix=bdssd03-bucket;oprefix=test4m-;containers=c(4);objects=r(1,2000000);sizes=c(4)MB;" /> <storage type="s3" config="path_style_access=true;accesskey=GHWT013TX7ISVKR430Y0;secretkey=u7aCkNfznEztWpVJcARHceVC0ZPjR8XiVQDPmzAR;endpoint=http://10.x.x.147:80;server_side_encryption=false; "/> </work> <work name="w5" workers="16" totalOps="2560000" driver="driver5"> <operation type="write" ratio="100" config="cprefix=bdssd03-bucket;oprefix=test4m-;containers=c(5);objects=r(1,2000000);sizes=c(4)MB;" /> <storage type="s3" config="path_style_access=true;accesskey=GHWT013TX7ISVKR430Y0;secretkey=u7aCkNfznEztWpVJcARHceVC0ZPjR8XiVQDPmzAR;endpoint=http://10.x.x.148:80;server_side_encryption=false;" /> </work> <work name="w6" workers="16" totalOps="2560000" driver="driver6"> <operation type="write" ratio="100" config="cprefix=bdssd03-bucket;oprefix=test4m-;containers=c(6);objects=r(1,2000000);sizes=c(4)MB;" /> <storage type="s3" config="path_style_access=true;accesskey=GHWT013TX7ISVKR430Y0;secretkey=u7aCkNfznEztWpVJcARHceVC0ZPjR8XiVQDPmzAR;endpoint=http://10.x.x.149:80;server_side_encryption=false;" /> </work> <work name="w7" workers="16" totalOps="2560000" driver="driver7"> <operation type="write" ratio="100" config="cprefix=bdssd03-bucket;oprefix=test4m-;containers=c(7);objects=r(1,2000000);sizes=c(4)MB;" /> <storage type="s3" config="path_style_access=true;accesskey=GHWT013TX7ISVKR430Y0;secretkey=u7aCkNfznEztWpVJcARHceVC0ZPjR8XiVQDPmzAR;endpoint=http://10.x.x.150:80;server_side_encryption=false; "/> </work> <work name="w8" workers="16" totalOps="2560000" driver="driver8"> <operation type="write" ratio="100" config="cprefix=bdssd03-bucket;oprefix=test4m-;containers=c(8);objects=r(1,2000000);sizes=c(4)MB;" /> <storage type="s3" config="path_style_access=true;accesskey=GHWT013TX7ISVKR430Y0;secretkey=u7aCkNfznEztWpVJcARHceVC0ZPjR8XiVQDPmzAR;endpoint=http://10.x.x.151:80;server_side_encryption=false;" /> </work> <work name="w9" workers="16" totalOps="2560000" driver="driver9"> <operation type="write" ratio="100" config="cprefix=bdssd03-bucket;oprefix=test4m-;containers=c(9);objects=r(1,2000000);sizes=c(4)MB;" /> <storage type="s3" config="path_style_access=true;accesskey=GHWT013TX7ISVKR430Y0;secretkey=u7aCkNfznEztWpVJcARHceVC0ZPjR8XiVQDPmzAR;endpoint=http://10.x.x.152:80;server_side_encryption=false;" /> </work> <work name="w10" workers="16" totalOps="2560000" driver="driver10"> <operation type="write" ratio="100" config="cprefix=bdssd03-bucket;oprefix=test4m-;containers=c(10);objects=r(1,2000000);sizes=c(4)MB;" /> <storage type="s3" config="path_style_access=true;accesskey=GHWT013TX7ISVKR430Y0;secretkey=u7aCkNfznEztWpVJcARHceVC0ZPjR8XiVQDPmzAR;endpoint=http://10.x.x.143:80;server_side_encryption=false; "/> </work> </workstage> <workstage name="cleanup"> <work type="cleanup" workers="16" config="cprefix=bdssd03-bucket;oprefix=test4m-;containers=r(1,10);objects=r(1,2000000)" /> <storage type="s3" config="path_style_access=true;accesskey=GHWT013TX7ISVKR430Y0;secretkey=u7aCkNfznEztWpVJcARHceVC0ZPjR8XiVQDPmzAR;endpoint=http://10.x.x.143:80;server_side_encryption=false;" /> </workstage> <workstage name="dispose"> <work type="dispose" workers="10" config="cprefix=bdssd03-bucket;containers=r(1,10)" /> <storage type="s3" config="path_style_access=true;accesskey=GHWT013TX7ISVKR430Y0;secretkey=u7aCkNfznEztWpVJcARHceVC0ZPjR8XiVQDPmzAR;endpoint=http://10.x.x.143:80;server_side_encryption=false;" /> </workstage> </workflow> </workload> the performance result is: cluster: id: 075b70fa-6a72-4ce2-8956-d249c7cf10ce health: HEALTH_OK services: mon: 3 daemons, quorum a,b,c (age 17h) mgr: a(active, since 17h), standbys: b osd: 220 osds: 220 up (since 17h), 220 in (since 17h) rgw: 10 daemons active (10 hosts, 1 zones) data: pools: 9 pools, 4273 pgs objects: 17.76M objects, 65 TiB usage: 82 TiB used, 1.4 PiB / 1.4 PiB avail pgs: 4273 active+clean io: client: 11 MiB/s rd, 9.2 GiB/s wr, 15.44k op/s rd, 28.70k op/s wr only can reach to 9.2 GiB with 4MB write, I don't know that whether the performance meet expectations?

7 months, 1 week

1
0
0 0

compounded problems interfering with recovery

by Simon Oosthoek

Hi we're still struggling with our getting our ceph to health_ok. We're having compounded issues interfering with recovery, as I understand it. To summarize, we have a cluster of 22 osd nodes running ceph 16.2.x. About a month back we had one of the OSDs break down (just the OS disk, but we didn't have a cold spare available, it took a week to get it fixed). Since the failure of the node, ceph has been repairing the situation of course, but then it became a problem that our OSDs are really unevenly balanced (lowest below 50%, highest around 85%). So whenever a disk fails (and there were 2 since then), the load spreads over the other OSDs and our fullest OSDs go over the 85% threshold, slowing down recovery, normal use and rebalancing. We had issues with degraded PGs, but they weren't being repaired (because we had turned on the scrubbing during recovery, since we got messages that lots of PGs weren't being scrubbed in time. Now there's still one remaining PG degraded because one object is unfound. The whole error state is taking far too long now and as this is going on, I was wondering how the balancer wasn't doing its job. Turns out this is dependent on the cluster being OK or at least not having any degraded things in it. The balancer hasn't done it's job even though our cluster was OK for a long time before; we added some 8 nodes a few years ago and still the newer nodes are having the lowest used OSDs. Our cluster has about 70-71% usage overall, but with the unbalanced situation we cannot grow any more. The single node issue (though now resolved) and ongoing disk failures (we are seeing a handful of OSDs with read-repaired messages), it looks like we can't get back to health for a while. I'm trying to mitigate this by reweighting the fullest OSDs, but the fuller OSDs keep going over the threshold, while the emptiest OSDs have plenty of space (just 55% full now). If you read this far ;-) I'm wondering, can I force repair a PG around all the restrictions so it doesn't block auto rebalancing? It seems to me, like that would help, but perhaps there are other things I can do as well? (Budget wise, adding more OSD nodes is a bit difficult at the moment...) Thanks for reading! Cheers /Simon

7 months, 2 weeks

1
0
0 0

Introduce: Storage stability testing and DATA consistency verifying tools and system

by 张友加

Dear All, I hope you are all well. I would like to introduce new tools I have developed, named "LBA tools" which including hd_write_verify & hd_write_verify_dump. github: https://github.com/zhangyoujia/hd_write_verify pdf: https://github.com/zhangyoujia/hd_write_verify/DISK&MEMORY stability testing and DATA consistency verifying tools and system.pdf ppt: https://github.com/zhangyoujia/hd_write_verify/存储稳定性测试与数据一致性校验工具和系统.pptx bin: https://github.com/zhangyoujia/hd_write_verify/bin iso: https://github.com/zhangyoujia/hd_write_verify/iso Data is a vital asset for many businesses, making storage stability and data consistency the most fundamental requirements in storage technology scenarios. The purpose of storage stability testing is to ensure that storage devices or systems can operate normally and remain stable over time, while also handling various abnormal situations such as sudden power outages and network failures. This testing typically includes stress testing, load testing, fault tolerance testing, and other evaluations to assess the performance and reliability of the storage system. Data consistency checking is designed to ensure that the data stored in the system is accurate and consistent. This means that whenever data changes occur, all replicas should be updated simultaneously to avoid data inconsistency. Data consistency checking typically involves aspects such as data integrity, accuracy, consistency, and reliability. LBA tools are very useful for testing Storage stability and verifying DATA consistency, there are much better than FIO & vdbench's verifying functions. I believe that LBA tools will have a positive impact on the community and help users handle storage data more effectively. Your feedback and suggestions are greatly appreciated, and I hope you can try using LBA tools and share your experiences and recommendations. Best regards

7 months, 2 weeks

2
1
0 0

cannot repair a handful of damaged pg's

by Simon Oosthoek

Hi we're still in HEALTH_ERR state with our cluster, this is the top of the output of `ceph health detail` HEALTH_ERR 1/846829349 objects unfound (0.000%); 248 scrub errors; Possible data damage: 1 pg recovery_unfound, 2 pgs inconsistent; Degraded data redundancy: 6/7118781559 objects degraded (0.000%), 1 pg degraded, 1 pg undersized; 63 pgs not deep-scrubbed in time; 657 pgs not scrubbed in time [WRN] OBJECT_UNFOUND: 1/846829349 objects unfound (0.000%) pg 26.323 has 1 unfound objects [ERR] OSD_SCRUB_ERRORS: 248 scrub errors [ERR] PG_DAMAGED: Possible data damage: 1 pg recovery_unfound, 2 pgs inconsistent pg 26.323 is active+recovery_unfound+degraded+remapped, acting [92,109,116,70,158,128,243,189,256], 1 unfound pg 26.337 is active+clean+inconsistent, acting [139,137,48,126,165,89,237,199,189] pg 26.3e2 is active+clean+inconsistent, acting [12,27,24,234,195,173,98,32,35] [WRN] PG_DEGRADED: Degraded data redundancy: 6/7118781559 objects degraded (0.000%), 1 pg degraded, 1 pg undersized pg 13.3a5 is stuck undersized for 4m, current state active+undersized+remapped+backfilling, last acting [2,45,32,62,2147483647,55,116,25,225,202,240] pg 26.323 is active+recovery_unfound+degraded+remapped, acting [92,109,116,70,158,128,243,189,256], 1 unfound For the PG_DAMAGED pgs I try the usual `ceph pg repair 26.323` etc., however it fails to get resolved. The osd.116 is already marked out and is beginning to get empty. I've tried restarting the osd processes of the first osd listed for each PG, but that doesn't get it resolved either. I guess we should have enough redundancy to get the correct data back, but how can I tell ceph to fix it in order to get back to a healthy state? Cheers /Simon

7 months, 2 weeks

3
4
0 0

Unable to fix 1 Inconsistent PG

by samdto987＠gmail.com

Hello All, Greetings. We've a Ceph Cluster with the version *ceph version 14.2.16-402-g7d47dbaf4d (7d47dbaf4d0960a2e910628360ae36def84ed913) nautilus (stable) =================================== Issues: 1 pg in inconsistent state and does not recover. # ceph -s cluster: id: 30d6f7ee-fa02-4ab3-8a09-9321c8002794 health: HEALTH_ERR 2 large omap objects 1 pools have many more objects per pg than average 159224 scrub errors Possible data damage: 1 pg inconsistent 2 pgs not deep-scrubbed in time 2 pgs not scrubbed in time # ceph health detail HEALTH_ERR 2 large omap objects; 1 pools have many more objects per pg than average; 159224 scrub errors; Possible data damage: 1 pg inconsistent; 2 pgs not deep-scrubbed in time; 2 pgs not scrubbed in time LARGE_OMAP_OBJECTS 2 large omap objects 2 large objects found in pool 'default.rgw.log' Search the cluster log for 'Large omap object found' for more details. MANY_OBJECTS_PER_PG 1 pools have many more objects per pg than average pool iscsi-images objects per pg (541376) is more than 14.9829 times cluster average (36133) OSD_SCRUB_ERRORS 159224 scrub errors PG_DAMAGED Possible data damage: 1 pg inconsistent pg 15.f4f is active+clean+inconsistent, acting [238,106,402,266,374,498,590,627,684,73,66] PG_NOT_DEEP_SCRUBBED 2 pgs not deep-scrubbed in time pg 1.5c not deep-scrubbed since 2021-04-05 23:20:13.714446 pg 1.55 not deep-scrubbed since 2021-04-11 07:12:37.185074 PG_NOT_SCRUBBED 2 pgs not scrubbed in time pg 1.5c not scrubbed since 2023-07-10 21:15:50.352848 pg 1.55 not scrubbed since 2023-06-24 10:02:10.038311 ====================================== We have implemented below command to resolve it 1. We have ran pg repair command "ceph pg repair 15.f4f 2. We have restarted associated OSDs that is mapped to pg 15.f4f 3. We tuned osd_max_scrubs value and set it to 9. 4. We have done scrub and deep scrub by ceph pg scrub 15.4f4 & ceph pg deep-scrub 15.f4f 5. We also tried to ceph-objectstore-tool command to fix it ============================================== We have checked the logs of the primary OSD of the respective inconsistent PG and found the below errors. [ERR] : 15.f4fs0 shard 402(2) 15:f2f3fff4:::94a51ddb-a94f-47bc-9068-509e8c09af9a.7862003.20_c%2f4%2fd61%2f885%2f49627697%2f192_1.ts:head : missing /var/log/ceph/ceph-osd.238.log:339:2023-10-06 00:37:06.410 7f65024cb700 -1 log_channel(cluster) log [ERR] : 15.f4fs0 shard 266(3) 15:f2f00002:::94a51ddb-a94f-47bc-9068-509e8c09af9a.11432468.3_TN8QHE_04.20.2020_08.41%2fCV_MAGNETIC%2fV_274396%2fCHUNK_2440801%2fSFILE_CONTAINER_031.FOLDER%2f3:head : missing /var/log/ceph/ceph-osd.238.log:340:2023-10-06 00:37:06.410 7f65024cb700 -1 log_channel(cluster) log [ERR] : 15.f4fs0 shard 402(2) 15:f2f00002:::94a51ddb-a94f-47bc-9068-509e8c09af9a.11432468.3_TN8QHE_04.20.2020_08.41%2fCV_MAGNETIC%2fV_274396%2fCHUNK_2440801%2fSFILE_CONTAINER_031.FOLDER%2f3:head : missing /var/log/ceph/ceph-osd.238.log:341:2023-10-06 00:37:06.410 7f65024cb700 -1 log_channel(cluster) log [ERR] : 15.f4fs0 shard 590(6) 15:f2f00002:::94a51ddb-a94f-47bc-9068-509e8c09af9a.11432468.3_TN8QHE_04.20.2020_08.41%2fCV_MAGNETIC%2fV_274396%2fCHUNK_2440801%2fSFILE_CONTAINER_031.FOLDER%2f3:head : missing =============================== and also we noticed that the no. of scrub errors in ceph health status are matching with the ERR log entries in the primary OSD logs of the inconsistent PG as below grep -Hn 'ERR' /var/log/ceph/ceph-osd.238.log|wc -l 159226 ================================ Ceph is cleaning the scrub errors but rate of scrub repair is very slow (avg of 200 scrub errors per day) ,we want to increase the rate of scrub error repair to finish the cleanup of pending 159224 scrub errors. #ceph pg 15.f4f query { "state": "active+clean+inconsistent", "snap_trimq": "[]", "snap_trimq_len": 0, "epoch": 409009, "up": [ 238, 106, 402, 266, 374, 498, 590, 627, 684, 73, 66 ], "acting": [ 238, 106, 402, 266, 374, 498, 590, 627, 684, 73, 66 ], "acting_recovery_backfill": [ "66(10)", "73(9)", "106(1)", "238(0)", "266(3)", "374(4)", "402(2)", "498(5)", "590(6)", "627(7)", "684(8)" ], "info": { "pgid": "15.f4fs0", "last_update": "409009'7998", "last_complete": "409009'7998", "log_tail": "382701'4900", "last_user_version": 592883, "last_backfill": "MAX", "last_backfill_bitwise": 0, "purged_snaps": [], "history": { "epoch_created": 19813, "epoch_pool_created": 16141, "last_epoch_started": 407097, "last_interval_started": 407096, "last_epoch_clean": 407097, "last_interval_clean": 407096, "last_epoch_split": 19849, "last_epoch_marked_full": 0, "same_up_since": 407096, "same_interval_since": 407096, "same_primary_since": 407038, "last_scrub": "408987'7946", "last_scrub_stamp": "2023-10-06 00:53:53.705241", "last_deep_scrub": "408987'7946", "last_deep_scrub_stamp": "2023-10-06 00:53:53.705241", "last_clean_scrub_stamp": "2023-09-28 14:48:47.098935" }, "stats": { "version": "409009'7998", "reported_seq": "379048", "reported_epoch": "409009", "state": "active+clean+inconsistent", "last_fresh": "2023-10-06 15:49:09.174662", "last_change": "2023-10-06 00:53:53.705308", "last_active": "2023-10-06 15:49:09.174662", "last_peered": "2023-10-06 15:49:09.174662", "last_clean": "2023-10-06 15:49:09.174662", "last_became_active": "2023-10-03 19:55:56.742034", "last_became_peered": "2023-10-03 19:55:56.742034", "last_unstale": "2023-10-06 15:49:09.174662", "last_undegraded": "2023-10-06 15:49:09.174662", "last_fullsized": "2023-10-06 15:49:09.174662", "mapping_epoch": 407096, "log_start": "382701'4900", "ondisk_log_start": "382701'4900", "created": 19813, "last_epoch_clean": 407097, "parent": "0.0", "parent_split_bits": 0, "last_scrub": "408987'7946", "last_scrub_stamp": "2023-10-06 00:53:53.705241", "last_deep_scrub": "408987'7946", "last_deep_scrub_stamp": "2023-10-06 00:53:53.705241", "last_clean_scrub_stamp": "2023-09-28 14:48:47.098935", "log_size": 3098, "ondisk_log_size": 3098, "stats_invalid": false, "dirty_stats_invalid": false, "omap_stats_invalid": false, "hitset_stats_invalid": false, "hitset_bytes_stats_invalid": false, "pin_stats_invalid": false, "manifest_stats_invalid": false, "snaptrimq_len": 0, "stat_sum": { "num_bytes": 45852508474, "num_objects": 40234, "num_object_clones": 0, "num_object_copies": 442574, "num_objects_missing_on_primary": 0, "num_objects_missing": 0, "num_objects_degraded": 0, "num_objects_misplaced": 0, "num_objects_unfound": 0, "num_objects_dirty": 40234, "num_whiteouts": 0, "num_read": 23940, "num_read_kb": 2900611, "num_write": 34950, "num_write_kb": 2440513, "num_scrub_errors": 159224, "num_shallow_scrub_errors": 159224, "num_deep_scrub_errors": 0, "num_objects_recovered": 86741, "num_bytes_recovered": 97127524421, "num_keys_recovered": 0, "num_objects_omap": 0, "num_objects_hit_set_archive": 0, "num_bytes_hit_set_archive": 0, "num_flush": 0, "num_flush_kb": 0, "num_evict": 0, "num_evict_kb": 0, "num_promote": 0, "num_flush_mode_high": 0, "num_flush_mode_low": 0, "num_evict_mode_some": 0, "num_evict_mode_full": 0, "num_objects_pinned": 0, "num_legacy_snapsets": 0, "num_large_omap_objects": 0, "num_objects_manifest": 0, "num_omap_bytes": 0, "num_omap_keys": 0, "num_objects_repaired": 42593 }, "up": [ 238, 106, 402, 266, 374, 498, 590, 627, 684, 73, 66 ], "acting": [ 238, 106, 402, 266, 374, 498, 590, 627, 684, 73, 66 ], "avail_no_missing": [], "object_location_counts": [], "blocked_by": [], "up_primary": 238, "acting_primary": 238, "purged_snaps": [] }, "empty": 0, "dne": 0, "incomplete": 0, "last_epoch_started": 407097, "hit_set_history": { "current_last_update": "0'0", "history": [] } }, "peer_info": [ { "peer": "66(10)", "pgid": "15.f4fs10", "last_update": "409009'7998", "last_complete": "407068'7791", "log_tail": "378965'4700", "last_user_version": 592676, "last_backfill": "MAX", "last_backfill_bitwise": 1, "purged_snaps": [], "history": { "epoch_created": 19813, "epoch_pool_created": 16141, "last_epoch_started": 407097, "last_interval_started": 407096, "last_epoch_clean": 407097, "last_interval_clean": 407096, "last_epoch_split": 19849, "last_epoch_marked_full": 0, "same_up_since": 407096, "same_interval_since": 407096, "same_primary_since": 407038, "last_scrub": "408987'7946", "last_scrub_stamp": "2023-10-06 00:53:53.705241", "last_deep_scrub": "408987'7946", "last_deep_scrub_stamp": "2023-10-06 00:53:53.705241", "last_clean_scrub_stamp": "2023-09-28 14:48:47.098935" }, "stats": { "version": "407068'7791", "reported_seq": "376441", "reported_epoch": "407068", "state": "active+clean+inconsistent", "last_fresh": "2023-10-03 19:07:04.483241", "last_change": "2023-10-03 18:02:27.182058", "last_active": "2023-10-03 19:07:04.483241", "last_peered": "2023-10-03 19:07:04.483241", "last_clean": "2023-10-03 19:07:04.483241", "last_became_active": "2023-10-03 18:02:27.181798", "last_became_peered": "2023-10-03 18:02:27.181798", "last_unstale": "2023-10-03 19:07:04.483241", "last_undegraded": "2023-10-03 19:07:04.483241", "last_fullsized": "2023-10-03 19:07:04.483241", "mapping_epoch": 407096, "log_start": "378965'4700", "ondisk_log_start": "378965'4700", "created": 19813, "last_epoch_clean": 407068, "parent": "0.0", "parent_split_bits": 0, "last_scrub": "407037'7783", "last_scrub_stamp": "2023-10-03 17:14:13.232978", "last_deep_scrub": "406839'7692", "last_deep_scrub_stamp": "2023-09-28 14:48:47.098935", "last_clean_scrub_stamp": "2023-09-28 14:48:47.098935", "log_size": 3091, "ondisk_log_size": 3091, "stats_invalid": false, "dirty_stats_invalid": false, "omap_stats_invalid": false, "hitset_stats_invalid": false, "hitset_bytes_stats_invalid": false, "pin_stats_invalid": false, "manifest_stats_invalid": false, "snaptrimq_len": 0, "stat_sum": { "num_bytes": 45930070381, "num_objects": 40379, "num_object_clones": 0, "num_object_copies": 444169, "num_objects_missing_on_primary": 0, "num_objects_missing": 0, "num_objects_degraded": 0, "num_objects_misplaced": 0, "num_objects_unfound": 0, "num_objects_dirty": 40379, "num_whiteouts": 0, "num_read": 23277, "num_read_kb": 2877425, "num_write": 34413, "num_write_kb": 2409735, "num_scrub_errors": 0, "num_shallow_scrub_errors": 0, "num_deep_scrub_errors": 0, "num_objects_recovered": 86740, "num_bytes_recovered": 97127524421, "num_keys_recovered": 0, "num_objects_omap": 0, "num_objects_hit_set_archive": 0, "num_bytes_hit_set_archive": 0, "num_flush": 0, "num_flush_kb": 0, "num_evict": 0, "num_evict_kb": 0, "num_promote": 0, "num_flush_mode_high": 0, "num_flush_mode_low": 0, "num_evict_mode_some": 0, "num_evict_mode_full": 0, "num_objects_pinned": 0, "num_legacy_snapsets": 0, "num_large_omap_objects": 0, "num_objects_manifest": 0, "num_omap_bytes": 0, "num_omap_keys": 0, "num_objects_repaired": 42593 }, "up": [ 238, 106, 402, 266, 374, 498, 590, 627, 684, 73, 66 ], "acting": [ 238, 106, 402, 266, 374, 498, 590, 627, 684, 73, 66 ], "avail_no_missing": [], "object_location_counts": [], "blocked_by": [], "up_primary": 238, "acting_primary": 238, "purged_snaps": [] }, "empty": 0, "dne": 0, "incomplete": 0, "last_epoch_started": 407097, "hit_set_history": { "current_last_update": "0'0", "history": [] } }, { "peer": "73(9)", "pgid": "15.f4fs9", "last_update": "409009'7998", "last_complete": "409009'7998", "log_tail": "378965'4700", "last_user_version": 592677, "last_backfill": "MAX", "last_backfill_bitwise": 0, "purged_snaps": [], "history": { "epoch_created": 19813, "epoch_pool_created": 16141, "last_epoch_started": 407097, "last_interval_started": 407096, "last_epoch_clean": 407097, "last_interval_clean": 407096, "last_epoch_split": 19849, "last_epoch_marked_full": 0, "same_up_since": 407096, "same_interval_since": 407096, "same_primary_since": 407038, "last_scrub": "408987'7946", "last_scrub_stamp": "2023-10-06 00:53:53.705241", "last_deep_scrub": "408987'7946", "last_deep_scrub_stamp": "2023-10-06 00:53:53.705241", "last_clean_scrub_stamp": "2023-09-28 14:48:47.098935" }, "stats": { "version": "407095'7792", "reported_seq": "376677", "reported_epoch": "407095", "state": "active+undersized+degraded+inconsistent", "last_fresh": "2023-10-03 19:54:53.799024", "last_change": "2023-10-03 19:53:22.457623", "last_active": "2023-10-03 19:54:53.799024", "last_peered": "2023-10-03 19:54:53.799024", "last_clean": "2023-10-03 19:49:47.048960", "last_became_active": "2023-10-03 19:53:22.457623", "last_became_peered": "2023-10-03 19:53:22.457623", "last_unstale": "2023-10-03 19:54:53.799024", "last_undegraded": "2023-10-03 19:53:22.379335", "last_fullsized": "2023-10-03 19:53:22.379134", "mapping_epoch": 407096, "log_start": "378965'4700", "ondisk_log_start": "378965'4700", "created": 19813, "last_epoch_clean": 407093, "parent": "0.0", "parent_split_bits": 0, "last_scrub": "407037'7783", "last_scrub_stamp": "2023-10-03 17:14:13.232978", "last_deep_scrub": "406839'7692", "last_deep_scrub_stamp": "2023-09-28 14:48:47.098935", "last_clean_scrub_stamp": "2023-09-28 14:48:47.098935", "log_size": 3092, "ondisk_log_size": 3092, "stats_invalid": false, "dirty_stats_invalid": false, "omap_stats_invalid": false, "hitset_stats_invalid": false, "hitset_bytes_stats_invalid": false, "pin_stats_invalid": false, "manifest_stats_invalid": false, "snaptrimq_len": 0, "stat_sum": { "num_bytes": 45929876177, "num_objects": 40378, "num_object_clones": 0, "num_object_copies": 444158, "num_objects_missing_on_primary": 0, "num_objects_missing": 0, "num_objects_degraded": 40378, "num_objects_misplaced": 0, "num_objects_unfound": 0, "num_objects_dirty": 40378, "num_whiteouts": 0, "num_read": 23280, "num_read_kb": 2877429, "num_write": 34414, "num_write_kb": 2409735, "num_scrub_errors": 159632, "num_shallow_scrub_errors": 159632, "num_deep_scrub_errors": 0, "num_objects_recovered": 86740, "num_bytes_recovered": 97127524421, "num_keys_recovered": 0, "num_objects_omap": 0, "num_objects_hit_set_archive": 0, "num_bytes_hit_set_archive": 0, "num_flush": 0, "num_flush_kb": 0, "num_evict": 0, "num_evict_kb": 0, "num_promote": 0, "num_flush_mode_high": 0, "num_flush_mode_low": 0, "num_evict_mode_some": 0, "num_evict_mode_full": 0, "num_objects_pinned": 0, "num_legacy_snapsets": 0, "num_large_omap_objects": 0, "num_objects_manifest": 0, "num_omap_bytes": 0, "num_omap_keys": 0, "num_objects_repaired": 42593 }, "up": [ 238, 106, 402, 266, 374, 498, 590, 627, 684, 73, 66 ], "acting": [ 238, 106, 402, 266, 374, 498, 590, 627, 684, 73, 66 ], "avail_no_missing": [ "238(0)", "73(9)", "106(1)", "266(3)", "374(4)", "402(2)", "498(5)", "590(6)", "627(7)", "684(8)" ], "object_location_counts": [ { "shards": "73(9),106(1),238(0),266(3),374(4),402(2),498(5),590(6),627(7),684(8)", "objects": 40378 } ], "blocked_by": [], "up_primary": 238, "acting_primary": 238, "purged_snaps": [] }, "empty": 0, "dne": 0, "incomplete": 0, "last_epoch_started": 407097, "hit_set_history": { "current_last_update": "0'0", "history": [] } }, { "peer": "106(1)", "pgid": "15.f4fs1", "last_update": "409009'7998", "last_complete": "409009'7998", "log_tail": "378965'4700", "last_user_version": 592677, "last_backfill": "MAX", "last_backfill_bitwise": 1, "purged_snaps": [], "history": { "epoch_created": 19813, "epoch_pool_created": 16141, "last_epoch_started": 407097, "last_interval_started": 407096, "last_epoch_clean": 407097, "last_interval_clean": 407096, "last_epoch_split": 19849, "last_epoch_marked_full": 0, "same_up_since": 407096, "same_interval_since": 407096, "same_primary_since": 407038, "last_scrub": "408987'7946", "last_scrub_stamp": "2023-10-06 00:53:53.705241", "last_deep_scrub": "408987'7946", "last_deep_scrub_stamp": "2023-10-06 00:53:53.705241", "last_clean_scrub_stamp": "2023-09-28 14:48:47.098935" }, "stats": { "version": "407095'7792", "reported_seq": "376677", "reported_epoch": "407095", "state": "active+undersized+degraded+inconsistent", "last_fresh": "2023-10-03 19:54:53.799024", "last_change": "2023-10-03 19:53:22.457623", "last_active": "2023-10-03 19:54:53.799024", "last_peered": "2023-10-03 19:54:53.799024", "last_clean": "2023-10-03 19:49:47.048960", "last_became_active": "2023-10-03 19:53:22.457623", "last_became_peered": "2023-10-03 19:53:22.457623", "last_unstale": "2023-10-03 19:54:53.799024", "last_undegraded": "2023-10-03 19:53:22.379335", "last_fullsized": "2023-10-03 19:53:22.379134", "mapping_epoch": 407096, "log_start": "378965'4700", "ondisk_log_start": "378965'4700", "created": 19813, "last_epoch_clean": 407093, "parent": "0.0", "parent_split_bits": 0, "last_scrub": "407037'7783", "last_scrub_stamp": "2023-10-03 17:14:13.232978", "last_deep_scrub": "406839'7692", "last_deep_scrub_stamp": "2023-09-28 14:48:47.098935", "last_clean_scrub_stamp": "2023-09-28 14:48:47.098935", "log_size": 3092, "ondisk_log_size": 3092, "stats_invalid": false, "dirty_stats_invalid": false, "omap_stats_invalid": false, "hitset_stats_invalid": false, "hitset_bytes_stats_invalid": false, "pin_stats_invalid": false, "manifest_stats_invalid": false, "snaptrimq_len": 0, "stat_sum": { "num_bytes": 45929876177, "num_objects": 40378, "num_object_clones": 0, "num_object_copies": 444158, "num_objects_missing_on_primary": 0, "num_objects_missing": 0, "num_objects_degraded": 40378, "num_objects_misplaced": 0, "num_objects_unfound": 0, "num_objects_dirty": 40378, "num_whiteouts": 0, "num_read": 23280, "num_read_kb": 2877429, "num_write": 34414, "num_write_kb": 2409735, "num_scrub_errors": 159632, "num_shallow_scrub_errors": 159632, "num_deep_scrub_errors": 0, "num_objects_recovered": 86740, "num_bytes_recovered": 97127524421, "num_keys_recovered": 0, "num_objects_omap": 0, "num_objects_hit_set_archive": 0, "num_bytes_hit_set_archive": 0, "num_flush": 0, "num_flush_kb": 0, "num_evict": 0, "num_evict_kb": 0, "num_promote": 0, "num_flush_mode_high": 0, "num_flush_mode_low": 0, "num_evict_mode_some": 0, "num_evict_mode_full": 0, "num_objects_pinned": 0, "num_legacy_snapsets": 0, "num_large_omap_objects": 0, "num_objects_manifest": 0, "num_omap_bytes": 0, "num_omap_keys": 0, "num_objects_repaired": 42593 }, "up": [ 238, 106, 402, 266, 374, 498, 590, 627, 684, 73, 66 ], "acting": [ 238, 106, 402, 266, 374, 498, 590, 627, 684, 73, 66 ], "avail_no_missing": [ "238(0)", "73(9)", "106(1)", "266(3)", "374(4)", "402(2)", "498(5)", "590(6)", "627(7)", "684(8)" ], "object_location_counts": [ { "shards": "73(9),106(1),238(0),266(3),374(4),402(2),498(5),590(6),627(7),684(8)", "objects": 40378 } ], "blocked_by": [], "up_primary": 238, "acting_primary": 238, "purged_snaps": [] }, "empty": 0, "dne": 0, "incomplete": 0, "last_epoch_started": 407097, "hit_set_history": { "current_last_update": "0'0", "history": [] } }, { "peer": "266(3)", "pgid": "15.f4fs3", "last_update": "409009'7998", "last_complete": "409009'7998", "log_tail": "378965'4700", "last_user_version": 592677, "last_backfill": "MAX", "last_backfill_bitwise": 1, "purged_snaps": [], "history": { "epoch_created": 19813, "epoch_pool_created": 16141, "last_epoch_started": 407097, "last_interval_started": 407096, "last_epoch_clean": 407097, "last_interval_clean": 407096, "last_epoch_split": 19849, "last_epoch_marked_full": 0, "same_up_since": 407096, "same_interval_since": 407096, "same_primary_since": 407038, "last_scrub": "408987'7946", "last_scrub_stamp": "2023-10-06 00:53:53.705241", "last_deep_scrub": "408987'7946", "last_deep_scrub_stamp": "2023-10-06 00:53:53.705241", "last_clean_scrub_stamp": "2023-09-28 14:48:47.098935" }, "stats": { "version": "407095'7792", "reported_seq": "376677", "reported_epoch": "407095", "state": "active+undersized+degraded+inconsistent", "last_fresh": "2023-10-03 19:54:53.799024", "last_change": "2023-10-03 19:53:22.457623", "last_active": "2023-10-03 19:54:53.799024", "last_peered": "2023-10-03 19:54:53.799024", "last_clean": "2023-10-03 19:49:47.048960", "last_became_active": "2023-10-03 19:53:22.457623", "last_became_peered": "2023-10-03 19:53:22.457623", "last_unstale": "2023-10-03 19:54:53.799024", "last_undegraded": "2023-10-03 19:53:22.379335", "last_fullsized": "2023-10-03 19:53:22.379134", "mapping_epoch": 407096, "log_start": "378965'4700", "ondisk_log_start": "378965'4700", "created": 19813, "last_epoch_clean": 407093, "parent": "0.0", "parent_split_bits": 0, "last_scrub": "407037'7783", "last_scrub_stamp": "2023-10-03 17:14:13.232978", "last_deep_scrub": "406839'7692", "last_deep_scrub_stamp": "2023-09-28 14:48:47.098935", "last_clean_scrub_stamp": "2023-09-28 14:48:47.098935", "log_size": 3092, "ondisk_log_size": 3092, "stats_invalid": false, "dirty_stats_invalid": false, "omap_stats_invalid": false, "hitset_stats_invalid": false, "hitset_bytes_stats_invalid": false, "pin_stats_invalid": false, "manifest_stats_invalid": false, "snaptrimq_len": 0, "stat_sum": { "num_bytes": 45929876177, "num_objects": 40378, "num_object_clones": 0, "num_object_copies": 444158, "num_objects_missing_on_primary": 0, "num_objects_missing": 0, "num_objects_degraded": 40378, "num_objects_misplaced": 0, "num_objects_unfound": 0, "num_objects_dirty": 40378, "num_whiteouts": 0, "num_read": 23280, "num_read_kb": 2877429, "num_write": 34414, "num_write_kb": 2409735, "num_scrub_errors": 159632, "num_shallow_scrub_errors": 159632, "num_deep_scrub_errors": 0, "num_objects_recovered": 86740, "num_bytes_recovered": 97127524421, "num_keys_recovered": 0, "num_objects_omap": 0, "num_objects_hit_set_archive": 0, "num_bytes_hit_set_archive": 0, "num_flush": 0, "num_flush_kb": 0, "num_evict": 0, "num_evict_kb": 0, "num_promote": 0, "num_flush_mode_high": 0, "num_flush_mode_low": 0, "num_evict_mode_some": 0, "num_evict_mode_full": 0, "num_objects_pinned": 0, "num_legacy_snapsets": 0, "num_large_omap_objects": 0, "num_objects_manifest": 0, "num_omap_bytes": 0, "num_omap_keys": 0, "num_objects_repaired": 42593 }, "up": [ 238, 106, 402, 266, 374, 498, 590, 627, 684, 73, 66 ], "acting": [ 238, 106, 402, 266, 374, 498, 590, 627, 684, 73, 66 ], "avail_no_missing": [ "238(0)", "73(9)", "106(1)", "266(3)", "374(4)", "402(2)", "498(5)", "590(6)", "627(7)", "684(8)" ], "object_location_counts": [ { "shards": "73(9),106(1),238(0),266(3),374(4),402(2),498(5),590(6),627(7),684(8)", "objects": 40378 } ], "blocked_by": [], "up_primary": 238, "acting_primary": 238, "purged_snaps": [] }, "empty": 0, "dne": 0, "incomplete": 0, "last_epoch_started": 407097, "hit_set_history": { "current_last_update": "0'0", "history": [] } }, { "peer": "374(4)", "pgid": "15.f4fs4", "last_update": "409009'7998", "last_complete": "409009'7998", "log_tail": "378965'4700", "last_user_version": 592677, "last_backfill": "MAX", "last_backfill_bitwise": 1, "purged_snaps": [], "history": { "epoch_created": 19813, "epoch_pool_created": 16141, "last_epoch_started": 407097, "last_interval_started": 407096, "last_epoch_clean": 407097, "last_interval_clean": 407096, "last_epoch_split": 19849, "last_epoch_marked_full": 0, "same_up_since": 407096, "same_interval_since": 407096, "same_primary_since": 407038, "last_scrub": "408987'7946", "last_scrub_stamp": "2023-10-06 00:53:53.705241", "last_deep_scrub": "408987'7946", "last_deep_scrub_stamp": "2023-10-06 00:53:53.705241", "last_clean_scrub_stamp": "2023-09-28 14:48:47.098935" }, "stats": { "version": "407095'7792", "reported_seq": "376677", "reported_epoch": "407095", "state": "active+undersized+degraded+inconsistent", "last_fresh": "2023-10-03 19:54:53.799024", "last_change": "2023-10-03 19:53:22.457623", "last_active": "2023-10-03 19:54:53.799024", "last_peered": "2023-10-03 19:54:53.799024", "last_clean": "2023-10-03 19:49:47.048960", "last_became_active": "2023-10-03 19:53:22.457623", "last_became_peered": "2023-10-03 19:53:22.457623", "last_unstale": "2023-10-03 19:54:53.799024", "last_undegraded": "2023-10-03 19:53:22.379335", "last_fullsized": "2023-10-03 19:53:22.379134", "mapping_epoch": 407096, "log_start": "378965'4700", "ondisk_log_start": "378965'4700", "created": 19813, "last_epoch_clean": 407093, "parent": "0.0", "parent_split_bits": 0, "last_scrub": "407037'7783", "last_scrub_stamp": "2023-10-03 17:14:13.232978", "last_deep_scrub": "406839'7692", "last_deep_scrub_stamp": "2023-09-28 14:48:47.098935", "last_clean_scrub_stamp": "2023-09-28 14:48:47.098935", "log_size": 3092, "ondisk_log_size": 3092, "stats_invalid": false, "dirty_stats_invalid": false, "omap_stats_invalid": false, "hitset_stats_invalid": false, "hitset_bytes_stats_invalid": false, "pin_stats_invalid": false, "manifest_stats_invalid": false, "snaptrimq_len": 0, "stat_sum": { "num_bytes": 45929876177, "num_objects": 40378, "num_object_clones": 0, "num_object_copies": 444158, "num_objects_missing_on_primary": 0, "num_objects_missing": 0, "num_objects_degraded": 40378, "num_objects_misplaced": 0, "num_objects_unfound": 0, "num_objects_dirty": 40378, "num_whiteouts": 0, "num_read": 23280, "num_read_kb": 2877429, "num_write": 34414, "num_write_kb": 2409735, "num_scrub_errors": 159632, "num_shallow_scrub_errors": 159632, "num_deep_scrub_errors": 0, "num_objects_recovered": 86740, "num_bytes_recovered": 97127524421, "num_keys_recovered": 0, "num_objects_omap": 0, "num_objects_hit_set_archive": 0, "num_bytes_hit_set_archive": 0, "num_flush": 0, "num_flush_kb": 0, "num_evict": 0, "num_evict_kb": 0, "num_promote": 0, "num_flush_mode_high": 0, "num_flush_mode_low": 0, "num_evict_mode_some": 0, "num_evict_mode_full": 0, "num_objects_pinned": 0, "num_legacy_snapsets": 0, "num_large_omap_objects": 0, "num_objects_manifest": 0, "num_omap_bytes": 0, "num_omap_keys": 0, "num_objects_repaired": 42593 }, "up": [ 238, 106, 402, 266, 374, 498, 590, 627, 684, 73, 66 ], "acting": [ 238, 106, 402, 266, 374, 498, 590, 627, 684, 73, 66 ], "avail_no_missing": [ "238(0)", "73(9)", "106(1)", "266(3)", "374(4)", "402(2)", "498(5)", "590(6)", "627(7)", "684(8)" ], "object_location_counts": [ { "shards": "73(9),106(1),238(0),266(3),374(4),402(2),498(5),590(6),627(7),684(8)", "objects": 40378 } ], "blocked_by": [], "up_primary": 238, "acting_primary": 238, "purged_snaps": [] }, "empty": 0, "dne": 0, "incomplete": 0, "last_epoch_started": 407097, "hit_set_history": { "current_last_update": "0'0", "history": [] } }, { "peer": "402(2)", "pgid": "15.f4fs2", "last_update": "409009'7998", "last_complete": "409009'7998", "log_tail": "378965'4700", "last_user_version": 592677, "last_backfill": "MAX", "last_backfill_bitwise": 0, "purged_snaps": [], "history": { "epoch_created": 19813, "epoch_pool_created": 16141, "last_epoch_started": 407097, "last_interval_started": 407096, "last_epoch_clean": 407097, "last_interval_clean": 407096, "last_epoch_split": 19849, "last_epoch_marked_full": 0, "same_up_since": 407096, "same_interval_since": 407096, "same_primary_since": 407038, "last_scrub": "408987'7946", "last_scrub_stamp": "2023-10-06 00:53:53.705241", "last_deep_scrub": "408987'7946", "last_deep_scrub_stamp": "2023-10-06 00:53:53.705241", "last_clean_scrub_stamp": "2023-09-28 14:48:47.098935" }, "stats": { "version": "407095'7792", "reported_seq": "376677", "reported_epoch": "407095", "state": "active+undersized+degraded+inconsistent", "last_fresh": "2023-10-03 19:54:53.799024", "last_change": "2023-10-03 19:53:22.457623", "last_active": "2023-10-03 19:54:53.799024", "last_peered": "2023-10-03 19:54:53.799024", "last_clean": "2023-10-03 19:49:47.048960", "last_became_active": "2023-10-03 19:53:22.457623", "last_became_peered": "2023-10-03 19:53:22.457623", "last_unstale": "2023-10-03 19:54:53.799024", "last_undegraded": "2023-10-03 19:53:22.379335", "last_fullsized": "2023-10-03 19:53:22.379134", "mapping_epoch": 407096, "log_start": "378965'4700", "ondisk_log_start": "378965'4700", "created": 19813, "last_epoch_clean": 407093, "parent": "0.0", "parent_split_bits": 0, "last_scrub": "407037'7783", "last_scrub_stamp": "2023-10-03 17:14:13.232978", "last_deep_scrub": "406839'7692", "last_deep_scrub_stamp": "2023-09-28 14:48:47.098935", "last_clean_scrub_stamp": "2023-09-28 14:48:47.098935", "log_size": 3092, "ondisk_log_size": 3092, "stats_invalid": false, "dirty_stats_invalid": false, "omap_stats_invalid": false, "hitset_stats_invalid": false, "hitset_bytes_stats_invalid": false, "pin_stats_invalid": false, "manifest_stats_invalid": false, "snaptrimq_len": 0, "stat_sum": { "num_bytes": 45929876177, "num_objects": 40378, "num_object_clones": 0, "num_object_copies": 444158, "num_objects_missing_on_primary": 0, "num_objects_missing": 0, "num_objects_degraded": 40378, "num_objects_misplaced": 0, "num_objects_unfound": 0, "num_objects_dirty": 40378, "num_whiteouts": 0, "num_read": 23280, "num_read_kb": 2877429, "num_write": 34414, "num_write_kb": 2409735, "num_scrub_errors": 159632, "num_shallow_scrub_errors": 159632, "num_deep_scrub_errors": 0, "num_objects_recovered": 86740, "num_bytes_recovered": 97127524421, "num_keys_recovered": 0, "num_objects_omap": 0, "num_objects_hit_set_archive": 0, "num_bytes_hit_set_archive": 0, "num_flush": 0, "num_flush_kb": 0, "num_evict": 0, "num_evict_kb": 0, "num_promote": 0, "num_flush_mode_high": 0, "num_flush_mode_low": 0, "num_evict_mode_some": 0, "num_evict_mode_full": 0, "num_objects_pinned": 0, "num_legacy_snapsets": 0, "num_large_omap_objects": 0, "num_objects_manifest": 0, "num_omap_bytes": 0, "num_omap_keys": 0, "num_objects_repaired": 42593 }, "up": [ 238, 106, 402, 266, 374, 498, 590, 627, 684, 73, 66 ], "acting": [ 238, 106, 402, 266, 374, 498, 590, 627, 684, 73, 66 ], "avail_no_missing": [ "238(0)", "73(9)", "106(1)", "266(3)", "374(4)", "402(2)", "498(5)", "590(6)", "627(7)", "684(8)" ], "object_location_counts": [ { "shards": "73(9),106(1),238(0),266(3),374(4),402(2),498(5),590(6),627(7),684(8)", "objects": 40378 } ], "blocked_by": [], "up_primary": 238, "acting_primary": 238, "purged_snaps": [] }, "empty": 0, "dne": 0, "incomplete": 0, "last_epoch_started": 407097, "hit_set_history": { "current_last_update": "0'0", "history": [] } }, { "peer": "498(5)", "pgid": "15.f4fs5", "last_update": "409009'7998", "last_complete": "409009'7998", "log_tail": "378965'4700", "last_user_version": 592677, "last_backfill": "MAX", "last_backfill_bitwise": 1, "purged_snaps": [], "history": { "epoch_created": 19813, "epoch_pool_created": 16141, "last_epoch_started": 407097, "last_interval_started": 407096, "last_epoch_clean": 407097, "last_interval_clean": 407096, "last_epoch_split": 19849, "last_epoch_marked_full": 0, "same_up_since": 407096, "same_interval_since": 407096, "same_primary_since": 407038, "last_scrub": "408987'7946", "last_scrub_stamp": "2023-10-06 00:53:53.705241", "last_deep_scrub": "408987'7946", "last_deep_scrub_stamp": "2023-10-06 00:53:53.705241", "last_clean_scrub_stamp": "2023-09-28 14:48:47.098935" }, "stats": { "version": "407095'7792", "reported_seq": "376677", "reported_epoch": "407095", "state": "active+undersized+degraded+inconsistent", "last_fresh": "2023-10-03 19:54:53.799024", "last_change": "2023-10-03 19:53:22.457623", "last_active": "2023-10-03 19:54:53.799024", "last_peered": "2023-10-03 19:54:53.799024", "last_clean": "2023-10-03 19:49:47.048960", "last_became_active": "2023-10-03 19:53:22.457623", "last_became_peered": "2023-10-03 19:53:22.457623", "last_unstale": "2023-10-03 19:54:53.799024", "last_undegraded": "2023-10-03 19:53:22.379335", "last_fullsized": "2023-10-03 19:53:22.379134", "mapping_epoch": 407096, "log_start": "378965'4700", "ondisk_log_start": "378965'4700", "created": 19813, "last_epoch_clean": 407093, "parent": "0.0", "parent_split_bits": 0, "last_scrub": "407037'7783", "last_scrub_stamp": "2023-10-03 17:14:13.232978", "last_deep_scrub": "406839'7692", "last_deep_scrub_stamp": "2023-09-28 14:48:47.098935", "last_clean_scrub_stamp": "2023-09-28 14:48:47.098935", "log_size": 3092, "ondisk_log_size": 3092, "stats_invalid": false, "dirty_stats_invalid": false, "omap_stats_invalid": false, "hitset_stats_invalid": false, "hitset_bytes_stats_invalid": false, "pin_stats_invalid": false, "manifest_stats_invalid": false, "snaptrimq_len": 0, "stat_sum": { "num_bytes": 45929876177, "num_objects": 40378, "num_object_clones": 0, "num_object_copies": 444158, "num_objects_missing_on_primary": 0, "num_objects_missing": 0, "num_objects_degraded": 40378, "num_objects_misplaced": 0, "num_objects_unfound": 0, "num_objects_dirty": 40378, "num_whiteouts": 0, "num_read": 23280, "num_read_kb": 2877429, "num_write": 34414, "num_write_kb": 2409735, "num_scrub_errors": 159632, "num_shallow_scrub_errors": 159632, "num_deep_scrub_errors": 0, "num_objects_recovered": 86740, "num_bytes_recovered": 97127524421, "num_keys_recovered": 0, "num_objects_omap": 0, "num_objects_hit_set_archive": 0, "num_bytes_hit_set_archive": 0, "num_flush": 0, "num_flush_kb": 0, "num_evict": 0, "num_evict_kb": 0, "num_promote": 0, "num_flush_mode_high": 0, "num_flush_mode_low": 0, "num_evict_mode_some": 0, "num_evict_mode_full": 0, "num_objects_pinned": 0, "num_legacy_snapsets": 0, "num_large_omap_objects": 0, "num_objects_manifest": 0, "num_omap_bytes": 0, "num_omap_keys": 0, "num_objects_repaired": 42593 }, "up": [ 238, 106, 402, 266, 374, 498, 590, 627, 684, 73, 66 ], "acting": [ 238, 106, 402, 266, 374, 498, 590, 627, 684, 73, 66 ], "avail_no_missing": [ "238(0)", "73(9)", "106(1)", "266(3)", "374(4)", "402(2)", "498(5)", "590(6)", "627(7)", "684(8)" ], "object_location_counts": [ { "shards": "73(9),106(1),238(0),266(3),374(4),402(2),498(5),590(6),627(7),684(8)", "objects": 40378 } ], "blocked_by": [], "up_primary": 238, "acting_primary": 238, "purged_snaps": [] }, "empty": 0, "dne": 0, "incomplete": 0, "last_epoch_started": 407097, "hit_set_history": { "current_last_update": "0'0", "history": [] } }, { "peer": "590(6)", "pgid": "15.f4fs6", "last_update": "409009'7998", "last_complete": "409009'7998", "log_tail": "378965'4700", "last_user_version": 592677, "last_backfill": "MAX", "last_backfill_bitwise": 0, "purged_snaps": [], "history": { "epoch_created": 19813, "epoch_pool_created": 16141, "last_epoch_started": 407097, "last_interval_started": 407096, "last_epoch_clean": 407097, "last_interval_clean": 407096, "last_epoch_split": 19849, "last_epoch_marked_full": 0, "same_up_since": 407096, "same_interval_since": 407096, "same_primary_since": 407038, "last_scrub": "408987'7946", "last_scrub_stamp": "2023-10-06 00:53:53.705241", "last_deep_scrub": "408987'7946", "last_deep_scrub_stamp": "2023-10-06 00:53:53.705241", "last_clean_scrub_stamp": "2023-09-28 14:48:47.098935" }, "stats": { "version": "407095'7792", "reported_seq": "376677", "reported_epoch": "407095", "state": "active+undersized+degraded+inconsistent", "last_fresh": "2023-10-03 19:54:53.799024", "last_change": "2023-10-03 19:53:22.457623", "last_active": "2023-10-03 19:54:53.799024", "last_peered": "2023-10-03 19:54:53.799024", "last_clean": "2023-10-03 19:49:47.048960", "last_became_active": "2023-10-03 19:53:22.457623", "last_became_peered": "2023-10-03 19:53:22.457623", "last_unstale": "2023-10-03 19:54:53.799024", "last_undegraded": "2023-10-03 19:53:22.379335", "last_fullsized": "2023-10-03 19:53:22.379134", "mapping_epoch": 407096, "log_start": "378965'4700", "ondisk_log_start": "378965'4700", "created": 19813, "last_epoch_clean": 407093, "parent": "0.0", "parent_split_bits": 0, "last_scrub": "407037'7783", "last_scrub_stamp": "2023-10-03 17:14:13.232978", "last_deep_scrub": "406839'7692", "last_deep_scrub_stamp": "2023-09-28 14:48:47.098935", "last_clean_scrub_stamp": "2023-09-28 14:48:47.098935", "log_size": 3092, "ondisk_log_size": 3092, "stats_invalid": false, "dirty_stats_invalid": false, "omap_stats_invalid": false, "hitset_stats_invalid": false, "hitset_bytes_stats_invalid": false, "pin_stats_invalid": false, "manifest_stats_invalid": false, "snaptrimq_len": 0, "stat_sum": { "num_bytes": 45929876177, "num_objects": 40378, "num_object_clones": 0, "num_object_copies": 444158, "num_objects_missing_on_primary": 0, "num_objects_missing": 0, "num_objects_degraded": 40378, "num_objects_misplaced": 0, "num_objects_unfound": 0, "num_objects_dirty": 40378, "num_whiteouts": 0, "num_read": 23280, "num_read_kb": 2877429, "num_write": 34414, "num_write_kb": 2409735, "num_scrub_errors": 159632, "num_shallow_scrub_errors": 159632, "num_deep_scrub_errors": 0, "num_objects_recovered": 86740, "num_bytes_recovered": 97127524421, "num_keys_recovered": 0, "num_objects_omap": 0, "num_objects_hit_set_archive": 0, "num_bytes_hit_set_archive": 0, "num_flush": 0, "num_flush_kb": 0, "num_evict": 0, "num_evict_kb": 0, "num_promote": 0, "num_flush_mode_high": 0, "num_flush_mode_low": 0, "num_evict_mode_some": 0, "num_evict_mode_full": 0, "num_objects_pinned": 0, "num_legacy_snapsets": 0, "num_large_omap_objects": 0, "num_objects_manifest": 0, "num_omap_bytes": 0, "num_omap_keys": 0, "num_objects_repaired": 42593 }, "up": [ 238, 106, 402, 266, 374, 498, 590, 627, 684, 73, 66 ], "acting": [ 238, 106, 402, 266, 374, 498, 590, 627, 684, 73, 66 ], "avail_no_missing": [ "238(0)", "73(9)", "106(1)", "266(3)", "374(4)", "402(2)", "498(5)", "590(6)", "627(7)", "684(8)" ], "object_location_counts": [ { "shards": "73(9),106(1),238(0),266(3),374(4),402(2),498(5),590(6),627(7),684(8)", "objects": 40378 } ], "blocked_by": [], "up_primary": 238, "acting_primary": 238, "purged_snaps": [] }, "empty": 0, "dne": 0, "incomplete": 0, "last_epoch_started": 407097, "hit_set_history": { "current_last_update": "0'0", "history": [] } }, { "peer": "627(7)", "pgid": "15.f4fs7", "last_update": "409009'7998", "last_complete": "409009'7998", "log_tail": "378965'4700", "last_user_version": 592677, "last_backfill": "MAX", "last_backfill_bitwise": 1, "purged_snaps": [], "history": { "epoch_created": 19813, "epoch_pool_created": 16141, "last_epoch_started": 407097, "last_interval_started": 407096, "last_epoch_clean": 407097, "last_interval_clean": 407096, "last_epoch_split": 19849, "last_epoch_marked_full": 0, "same_up_since": 407096, "same_interval_since": 407096, "same_primary_since": 407038, "last_scrub": "408987'7946", "last_scrub_stamp": "2023-10-06 00:53:53.705241", "last_deep_scrub": "408987'7946", "last_deep_scrub_stamp": "2023-10-06 00:53:53.705241", "last_clean_scrub_stamp": "2023-09-28 14:48:47.098935" }, "stats": { "version": "407095'7792", "reported_seq": "376677", "reported_epoch": "407095", "state": "active+undersized+degraded+inconsistent", "last_fresh": "2023-10-03 19:54:53.799024", "last_change": "2023-10-03 19:53:22.457623", "last_active": "2023-10-03 19:54:53.799024", "last_peered": "2023-10-03 19:54:53.799024", "last_clean": "2023-10-03 19:49:47.048960", "last_became_active": "2023-10-03 19:53:22.457623", "last_became_peered": "2023-10-03 19:53:22.457623", "last_unstale": "2023-10-03 19:54:53.799024", "last_undegraded": "2023-10-03 19:53:22.379335", "last_fullsized": "2023-10-03 19:53:22.379134", "mapping_epoch": 407096, "log_start": "378965'4700", "ondisk_log_start": "378965'4700", "created": 19813, "last_epoch_clean": 407093, "parent": "0.0", "parent_split_bits": 0, "last_scrub": "407037'7783", "last_scrub_stamp": "2023-10-03 17:14:13.232978", "last_deep_scrub": "406839'7692", "last_deep_scrub_stamp": "2023-09-28 14:48:47.098935", "last_clean_scrub_stamp": "2023-09-28 14:48:47.098935", "log_size": 3092, "ondisk_log_size": 3092, "stats_invalid": false, "dirty_stats_invalid": false, "omap_stats_invalid": false, "hitset_stats_invalid": false, "hitset_bytes_stats_invalid": false, "pin_stats_invalid": false, "manifest_stats_invalid": false, "snaptrimq_len": 0, "stat_sum": { "num_bytes": 45929876177, "num_objects": 40378, "num_object_clones": 0, "num_object_copies": 444158, "num_objects_missing_on_primary": 0, "num_objects_missing": 0, "num_objects_degraded": 40378, "num_objects_misplaced": 0, "num_objects_unfound": 0, "num_objects_dirty": 40378, "num_whiteouts": 0, "num_read": 23280, "num_read_kb": 2877429, "num_write": 34414, "num_write_kb": 2409735, "num_scrub_errors": 159632, "num_shallow_scrub_errors": 159632, "num_deep_scrub_errors": 0, "num_objects_recovered": 86740, "num_bytes_recovered": 97127524421, "num_keys_recovered": 0, "num_objects_omap": 0, "num_objects_hit_set_archive": 0, "num_bytes_hit_set_archive": 0, "num_flush": 0, "num_flush_kb": 0, "num_evict": 0, "num_evict_kb": 0, "num_promote": 0, "num_flush_mode_high": 0, "num_flush_mode_low": 0, "num_evict_mode_some": 0, "num_evict_mode_full": 0, "num_objects_pinned": 0, "num_legacy_snapsets": 0, "num_large_omap_objects": 0, "num_objects_manifest": 0, "num_omap_bytes": 0, "num_omap_keys": 0, "num_objects_repaired": 42593 }, "up": [ 238, 106, 402, 266, 374, 498, 590, 627, 684, 73, 66 ], "acting": [ 238, 106, 402, 266, 374, 498, 590, 627, 684, 73, 66 ], "avail_no_missing": [ "238(0)", "73(9)", "106(1)", "266(3)", "374(4)", "402(2)", "498(5)", "590(6)", "627(7)", "684(8)" ], "object_location_counts": [ { "shards": "73(9),106(1),238(0),266(3),374(4),402(2),498(5),590(6),627(7),684(8)", "objects": 40378 } ], "blocked_by": [], "up_primary": 238, "acting_primary": 238, "purged_snaps": [] }, "empty": 0, "dne": 0, "incomplete": 0, "last_epoch_started": 407097, "hit_set_history": { "current_last_update": "0'0", "history": [] } }, { "peer": "684(8)", "pgid": "15.f4fs8", "last_update": "409009'7998", "last_complete": "409009'7998", "log_tail": "378965'4700", "last_user_version": 592677, "last_backfill": "MAX", "last_backfill_bitwise": 0, "purged_snaps": [], "history": { "epoch_created": 19813, "epoch_pool_created": 16141, "last_epoch_started": 407097, "last_interval_started": 407096, "last_epoch_clean": 407097, "last_interval_clean": 407096, "last_epoch_split": 19849, "last_epoch_marked_full": 0, "same_up_since": 407096, "same_interval_since": 407096, "same_primary_since": 407038, "last_scrub": "408987'7946", "last_scrub_stamp": "2023-10-06 00:53:53.705241", "last_deep_scrub": "408987'7946", "last_deep_scrub_stamp": "2023-10-06 00:53:53.705241", "last_clean_scrub_stamp": "2023-09-28 14:48:47.098935" }, "stats": { "version": "407095'7792", "reported_seq": "376677", "reported_epoch": "407095", "state": "active+undersized+degraded+inconsistent", "last_fresh": "2023-10-03 19:54:53.799024", "last_change": "2023-10-03 19:53:22.457623", "last_active": "2023-10-03 19:54:53.799024", "last_peered": "2023-10-03 19:54:53.799024", "last_clean": "2023-10-03 19:49:47.048960", "last_became_active": "2023-10-03 19:53:22.457623", "last_became_peered": "2023-10-03 19:53:22.457623", "last_unstale": "2023-10-03 19:54:53.799024", "last_undegraded": "2023-10-03 19:53:22.379335", "last_fullsized": "2023-10-03 19:53:22.379134", "mapping_epoch": 407096, "log_start": "378965'4700", "ondisk_log_start": "378965'4700", "created": 19813, "last_epoch_clean": 407093, "parent": "0.0", "parent_split_bits": 0, "last_scrub": "407037'7783", "last_scrub_stamp": "2023-10-03 17:14:13.232978", "last_deep_scrub": "406839'7692", "last_deep_scrub_stamp": "2023-09-28 14:48:47.098935", "last_clean_scrub_stamp": "2023-09-28 14:48:47.098935", "log_size": 3092, "ondisk_log_size": 3092, "stats_invalid": false, "dirty_stats_invalid": false, "omap_stats_invalid": false, "hitset_stats_invalid": false, "hitset_bytes_stats_invalid": false, "pin_stats_invalid": false, "manifest_stats_invalid": false, "snaptrimq_len": 0, "stat_sum": { "num_bytes": 45929876177, "num_objects": 40378, "num_object_clones": 0, "num_object_copies": 444158, "num_objects_missing_on_primary": 0, "num_objects_missing": 0, "num_objects_degraded": 40378, "num_objects_misplaced": 0, "num_objects_unfound": 0, "num_objects_dirty": 40378, "num_whiteouts": 0, "num_read": 23280, "num_read_kb": 2877429, "num_write": 34414, "num_write_kb": 2409735, "num_scrub_errors": 159632, "num_shallow_scrub_errors": 159632, "num_deep_scrub_errors": 0, "num_objects_recovered": 86740, "num_bytes_recovered": 97127524421, "num_keys_recovered": 0, "num_objects_omap": 0, "num_objects_hit_set_archive": 0, "num_bytes_hit_set_archive": 0, "num_flush": 0, "num_flush_kb": 0, "num_evict": 0, "num_evict_kb": 0, "num_promote": 0, "num_flush_mode_high": 0, "num_flush_mode_low": 0, "num_evict_mode_some": 0, "num_evict_mode_full": 0, "num_objects_pinned": 0, "num_legacy_snapsets": 0, "num_large_omap_objects": 0, "num_objects_manifest": 0, "num_omap_bytes": 0, "num_omap_keys": 0, "num_objects_repaired": 42593 }, "up": [ 238, 106, 402, 266, 374, 498, 590, 627, 684, 73, 66 ], "acting": [ 238, 106, 402, 266, 374, 498, 590, 627, 684, 73, 66 ], "avail_no_missing": [ "238(0)", "73(9)", "106(1)", "266(3)", "374(4)", "402(2)", "498(5)", "590(6)", "627(7)", "684(8)" ], "object_location_counts": [ { "shards": "73(9),106(1),238(0),266(3),374(4),402(2),498(5),590(6),627(7),684(8)", "objects": 40378 } ], "blocked_by": [], "up_primary": 238, "acting_primary": 238, "purged_snaps": [] }, "empty": 0, "dne": 0, "incomplete": 0, "last_epoch_started": 407097, "hit_set_history": { "current_last_update": "0'0", "history": [] } } ], "recovery_state": [ { "name": "Started/Primary/Active", "enter_time": "2023-10-03 19:55:56.164849", "might_have_unfound": [ { "osd": "66(10)", "status": "already probed" }, { "osd": "73(9)", "status": "already probed" }, { "osd": "106(1)", "status": "already probed" }, { "osd": "266(3)", "status": "already probed" }, { "osd": "374(4)", "status": "already probed" }, { "osd": "402(2)", "status": "already probed" }, { "osd": "498(5)", "status": "already probed" }, { "osd": "590(6)", "status": "already probed" }, { "osd": "627(7)", "status": "already probed" }, { "osd": "684(8)", "status": "already probed" } ], "recovery_progress": { "backfill_targets": [], "waiting_on_backfill": [], "last_backfill_started": "MIN", "backfill_info": { "begin": "MIN", "end": "MIN", "objects": [] }, "peer_backfill_info": [], "backfills_in_flight": [], "recovering": [], "pg_backend": { "recovery_ops": [], "read_ops": [] } }, "scrub": { "scrubber.epoch_start": "407096", "scrubber.active": false, "scrubber.state": "INACTIVE", "scrubber.start": "MIN", "scrubber.end": "MIN", "scrubber.max_end": "MIN", "scrubber.subset_last_update": "0'0", "scrubber.deep": false, "scrubber.waiting_on_whom": [] } }, { "name": "Started", "enter_time": "2023-10-03 19:55:55.247621" } ], "agent_state": {} } node1:~ #

7 months, 2 weeks

1
0
0 0

cephadm, cannot use ECDSA key with quincy

by paul.jurco＠gmail.com

Hi ceph users, We have a few clusters with quincy 17.2.6 and we are preparing to migrate from ceph-deploy to cephadm for better management. We are using Ubuntu20 with latest updates (latest openssh). While testing the migration to cephadm on a test cluster with octopus (v16 latest) we had no issues replacing ceph generated cert/key with our own CA signed certs (ECDSA). After upgrading to quincy the test cluster and test again the migration we cannot add hosts due to the errors below, ssh access errors specified a while ago in a tracker. We use the following type of certs: Type: ecdsa-sha2-nistp384-cert-v01(a)openssh.com user certificate The certificate works everytime when using ssh client from shell to connect to all hosts in the cluster. We do a ceph mgr fail every time we replace cert/key so they are restarted. ----- cephadm logs from mgr ------ Oct 06 09:23:27 ceph-m2 bash[1363]: Log: Opening SSH connection to 10.10.10.232, port 22 Oct 06 09:23:27 ceph-m2 bash[1363]: [conn=3] Connected to SSH server at 10.10.10.232, port 22 Oct 06 09:23:27 ceph-m2 bash[1363]: [conn=3] Local address: 10.10.12.160, port 51870 Oct 06 09:23:27 ceph-m2 bash[1363]: [conn=3] Peer address: 10.10.10.232, port 22 Oct 06 09:23:27 ceph-m2 bash[1363]: [conn=3] Beginning auth for user root Oct 06 09:23:27 ceph-m2 bash[1363]: [conn=3] Auth failed for user root Oct 06 09:23:27 ceph-m2 bash[1363]: [conn=3] Connection failure: Permission denied Oct 06 09:23:27 ceph-m2 bash[1363]: [conn=3] Aborting connection Oct 06 09:23:27 ceph-m2 bash[1363]: Traceback (most recent call last): Oct 06 09:23:27 ceph-m2 bash[1363]: File "/usr/share/ceph/mgr/cephadm/ssh.py", line 111, in redirect_log Oct 06 09:23:27 ceph-m2 bash[1363]: yield Oct 06 09:23:27 ceph-m2 bash[1363]: File "/usr/share/ceph/mgr/cephadm/ssh.py", line 90, in _remote_connection Oct 06 09:23:27 ceph-m2 bash[1363]: preferred_auth=['publickey'], options=ssh_options) Oct 06 09:23:27 ceph-m2 bash[1363]: File "/lib/python3.6/site-packages/asyncssh/connection.py", line 6804, in connect Oct 06 09:23:27 ceph-m2 bash[1363]: 'Opening SSH connection to') Oct 06 09:23:27 ceph-m2 bash[1363]: File "/lib/python3.6/site-packages/asyncssh/connection.py", line 303, in _connect Oct 06 09:23:27 ceph-m2 bash[1363]: await conn.wait_established() Oct 06 09:23:27 ceph-m2 bash[1363]: File "/lib/python3.6/site-packages/asyncssh/connection.py", line 2243, in wait_established Oct 06 09:23:27 ceph-m2 bash[1363]: await self._waiter Oct 06 09:23:27 ceph-m2 bash[1363]: asyncssh.misc.PermissionDenied: Permission denied Oct 06 09:23:27 ceph-m2 bash[1363]: During handling of the above exception, another exception occurred: Oct 06 09:23:27 ceph-m2 bash[1363]: Traceback (most recent call last): Oct 06 09:23:27 ceph-m2 bash[1363]: File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 125, in wrapper Oct 06 09:23:27 ceph-m2 bash[1363]: return OrchResult(f(*args, **kwargs)) Oct 06 09:23:27 ceph-m2 bash[1363]: File "/usr/share/ceph/mgr/cephadm/module.py", line 2810, in apply Oct 06 09:23:27 ceph-m2 bash[1363]: results.append(self._apply(spec)) Oct 06 09:23:27 ceph-m2 bash[1363]: File "/usr/share/ceph/mgr/cephadm/module.py", line 2558, in _apply Oct 06 09:23:27 ceph-m2 bash[1363]: return self._add_host(cast(HostSpec, spec)) Oct 06 09:23:27 ceph-m2 bash[1363]: File "/usr/share/ceph/mgr/cephadm/module.py", line 1434, in _add_host Oct 06 09:23:27 ceph-m2 bash[1363]: ip_addr = self._check_valid_addr(spec.hostname, spec.addr) Oct 06 09:23:27 ceph-m2 bash[1363]: File "/usr/share/ceph/mgr/cephadm/module.py", line 1415, in _check_valid_addr Oct 06 09:23:27 ceph-m2 bash[1363]: error_ok=True, no_fsid=True)) Oct 06 09:23:27 ceph-m2 bash[1363]: File "/usr/share/ceph/mgr/cephadm/module.py", line 615, in wait_async Oct 06 09:23:27 ceph-m2 bash[1363]: return self.event_loop.get_result(coro) Oct 06 09:23:27 ceph-m2 bash[1363]: File "/usr/share/ceph/mgr/cephadm/ssh.py", line 56, in get_result Oct 06 09:23:27 ceph-m2 bash[1363]: return asyncio.run_coroutine_threadsafe(coro, self._loop).result() Oct 06 09:23:27 ceph-m2 bash[1363]: File "/lib64/python3.6/concurrent/futures/_base.py", line 432, in result Oct 06 09:23:27 ceph-m2 bash[1363]: return self.__get_result() Oct 06 09:23:27 ceph-m2 bash[1363]: File "/lib64/python3.6/concurrent/futures/_base.py", line 384, in __get_result Oct 06 09:23:27 ceph-m2 bash[1363]: raise self._exception Oct 06 09:23:27 ceph-m2 bash[1363]: File "/usr/share/ceph/mgr/cephadm/serve.py", line 1361, in _run_cephadm Oct 06 09:23:27 ceph-m2 bash[1363]: await self.mgr.ssh._remote_connection(host, addr) Oct 06 09:23:27 ceph-m2 bash[1363]: File "/usr/share/ceph/mgr/cephadm/ssh.py", line 96, in _remote_connection Oct 06 09:23:27 ceph-m2 bash[1363]: raise Oct 06 09:23:27 ceph-m2 bash[1363]: File "/lib64/python3.6/contextlib.py", line 99, in __exit__ Oct 06 09:23:27 ceph-m2 bash[1363]: self.gen.throw(type, value, traceback) Oct 06 09:23:27 ceph-m2 bash[1363]: File "/usr/share/ceph/mgr/cephadm/ssh.py", line 123, in redirect_log Oct 06 09:23:27 ceph-m2 bash[1363]: raise HostConnectionError(msg, host, addr) Oct 06 09:23:27 ceph-m2 bash[1363]: cephadm.ssh.HostConnectionError: Failed to connect to ceph-m1 (10.10.10.232). Permission denied Oct 06 09:23:27 ceph-m2 bash[1363]: Log: Opening SSH connection to 10.10.10.232, port 22 Oct 06 09:23:27 ceph-m2 bash[1363]: [conn=3] Connected to SSH server at 10.10.10.232, port 22 Oct 06 09:23:27 ceph-m2 bash[1363]: [conn=3] Local address: 10.10.12.160, port 51870 Oct 06 09:23:27 ceph-m2 bash[1363]: [conn=3] Peer address: 10.10.10.232, port 22 Oct 06 09:23:27 ceph-m2 bash[1363]: [conn=3] Beginning auth for user root Oct 06 09:23:27 ceph-m2 bash[1363]: [conn=3] Auth failed for user root Oct 06 09:23:27 ceph-m2 bash[1363]: [conn=3] Connection failure: Permission denied Oct 06 09:23:27 ceph-m2 bash[1363]: [conn=3] Aborting connection Oct 06 09:23:27 ceph-m2 bash[1363]: debug 2023-10-06T09:23:27.081+0000 7f78d86d8700 -1 log_channel(cephadm) log [ERR] : Failed to connect to ceph-m1 (10.10.10.232). Permission denied Oct 06 09:23:27 ceph-m2 bash[1363]: Log: Opening SSH connection to 10.10.10.232, port 22 Oct 06 09:23:27 ceph-m2 bash[1363]: [conn=3] Connected to SSH server at 10.10.10.232, port 22 Oct 06 09:23:27 ceph-m2 bash[1363]: [conn=3] Local address: 10.10.12.160, port 51870 Oct 06 09:23:27 ceph-m2 bash[1363]: [conn=3] Peer address: 10.10.10.232, port 22 Oct 06 09:23:27 ceph-m2 bash[1363]: [conn=3] Beginning auth for user root Oct 06 09:23:27 ceph-m2 bash[1363]: [conn=3] Auth failed for user root Oct 06 09:23:27 ceph-m2 bash[1363]: [conn=3] Connection failure: Permission denied Oct 06 09:23:27 ceph-m2 bash[1363]: [conn=3] Aborting connection Oct 06 09:23:27 ceph-m2 bash[1363]: Traceback (most recent call last): Oct 06 09:23:27 ceph-m2 bash[1363]: File "/usr/share/ceph/mgr/cephadm/ssh.py", line 111, in redirect_log Oct 06 09:23:27 ceph-m2 bash[1363]: yield Oct 06 09:23:27 ceph-m2 bash[1363]: File "/usr/share/ceph/mgr/cephadm/ssh.py", line 90, in _remote_connection Oct 06 09:23:27 ceph-m2 bash[1363]: preferred_auth=['publickey'], options=ssh_options) Oct 06 09:23:27 ceph-m2 bash[1363]: File "/lib/python3.6/site-packages/asyncssh/connection.py", line 6804, in connect Oct 06 09:23:27 ceph-m2 bash[1363]: 'Opening SSH connection to') Oct 06 09:23:27 ceph-m2 bash[1363]: File "/lib/python3.6/site-packages/asyncssh/connection.py", line 303, in _connect Oct 06 09:23:27 ceph-m2 bash[1363]: await conn.wait_established() Oct 06 09:23:27 ceph-m2 bash[1363]: File "/lib/python3.6/site-packages/asyncssh/connection.py", line 2243, in wait_established Oct 06 09:23:27 ceph-m2 bash[1363]: await self._waiter Oct 06 09:23:27 ceph-m2 bash[1363]: asyncssh.misc.PermissionDenied: Permission denied Oct 06 09:23:27 ceph-m2 bash[1363]: During handling of the above exception, another exception occurred: Oct 06 09:23:27 ceph-m2 bash[1363]: Traceback (most recent call last): Oct 06 09:23:27 ceph-m2 bash[1363]: File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 125, in wrapper Oct 06 09:23:27 ceph-m2 bash[1363]: return OrchResult(f(*args, **kwargs)) Oct 06 09:23:27 ceph-m2 bash[1363]: File "/usr/share/ceph/mgr/cephadm/module.py", line 2810, in apply Oct 06 09:23:27 ceph-m2 bash[1363]: results.append(self._apply(spec)) Oct 06 09:23:27 ceph-m2 bash[1363]: File "/usr/share/ceph/mgr/cephadm/module.py", line 2558, in _apply Oct 06 09:23:27 ceph-m2 bash[1363]: return self._add_host(cast(HostSpec, spec)) Oct 06 09:23:27 ceph-m2 bash[1363]: File "/usr/share/ceph/mgr/cephadm/module.py", line 1434, in _add_host Oct 06 09:23:27 ceph-m2 bash[1363]: ip_addr = self._check_valid_addr(spec.hostname, spec.addr) Oct 06 09:23:27 ceph-m2 bash[1363]: File "/usr/share/ceph/mgr/cephadm/module.py", line 1415, in _check_valid_addr Oct 06 09:23:27 ceph-m2 bash[1363]: error_ok=True, no_fsid=True)) Oct 06 09:23:27 ceph-m2 bash[1363]: File "/usr/share/ceph/mgr/cephadm/module.py", line 615, in wait_async Oct 06 09:23:27 ceph-m2 bash[1363]: return self.event_loop.get_result(coro) Oct 06 09:23:27 ceph-m2 bash[1363]: File "/usr/share/ceph/mgr/cephadm/ssh.py", line 56, in get_result Oct 06 09:23:27 ceph-m2 bash[1363]: return asyncio.run_coroutine_threadsafe(coro, self._loop).result() Oct 06 09:23:27 ceph-m2 bash[1363]: File "/lib64/python3.6/concurrent/futures/_base.py", line 432, in result Oct 06 09:23:27 ceph-m2 bash[1363]: return self.__get_result() Oct 06 09:23:27 ceph-m2 bash[1363]: File "/lib64/python3.6/concurrent/futures/_base.py", line 384, in __get_result Oct 06 09:23:27 ceph-m2 bash[1363]: raise self._exception Oct 06 09:23:27 ceph-m2 bash[1363]: File "/usr/share/ceph/mgr/cephadm/serve.py", line 1361, in _run_cephadm Oct 06 09:23:27 ceph-m2 bash[1363]: await self.mgr.ssh._remote_connection(host, addr) Oct 06 09:23:27 ceph-m2 bash[1363]: File "/usr/share/ceph/mgr/cephadm/ssh.py", line 96, in _remote_connection Oct 06 09:23:27 ceph-m2 bash[1363]: raise Oct 06 09:23:27 ceph-m2 bash[1363]: File "/lib64/python3.6/contextlib.py", line 99, in __exit__ Oct 06 09:23:27 ceph-m2 bash[1363]: self.gen.throw(type, value, traceback) Oct 06 09:23:27 ceph-m2 bash[1363]: File "/usr/share/ceph/mgr/cephadm/ssh.py", line 123, in redirect_log Oct 06 09:23:27 ceph-m2 bash[1363]: raise HostConnectionError(msg, host, addr) Oct 06 09:23:27 ceph-m2 bash[1363]: cephadm.ssh.HostConnectionError: Failed to connect to ceph-m1 (10.10.10.232). Permission denied Oct 06 09:23:27 ceph-m2 bash[1363]: Log: Opening SSH connection to 10.10.10.232, port 22 Oct 06 09:23:27 ceph-m2 bash[1363]: [conn=3] Connected to SSH server at 10.10.10.232, port 22 Oct 06 09:23:27 ceph-m2 bash[1363]: [conn=3] Local address: 10.10.12.160, port 51870 Oct 06 09:23:27 ceph-m2 bash[1363]: [conn=3] Peer address: 10.10.10.232, port 22 Oct 06 09:23:27 ceph-m2 bash[1363]: [conn=3] Beginning auth for user root Oct 06 09:23:27 ceph-m2 bash[1363]: [conn=3] Auth failed for user root Oct 06 09:23:27 ceph-m2 bash[1363]: [conn=3] Connection failure: Permission denied Oct 06 09:23:27 ceph-m2 bash[1363]: [conn=3] Aborting connection Oct 06 09:23:27 ceph-m2 bash[1363]: debug 2023-10-06T09:23:27.081+0000 7f78d86d8700 -1 mgr handle_command module 'orchestrator' command handler threw exception: __init__() missing 2 required positional arguments: > Oct 06 09:23:27 ceph-m2 bash[1363]: debug 2023-10-06T09:23:27.093+0000 7f78d86d8700 -1 mgr.server reply reply (22) Invalid argument Traceback (most recent call last): Oct 06 09:23:27 ceph-m2 bash[1363]: File "/usr/share/ceph/mgr/mgr_module.py", line 1756, in _handle_command Oct 06 09:23:27 ceph-m2 bash[1363]: return self.handle_command(inbuf, cmd) Oct 06 09:23:27 ceph-m2 bash[1363]: File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 171, in handle_command Oct 06 09:23:27 ceph-m2 bash[1363]: return dispatch[cmd['prefix']].call(self, cmd, inbuf) Oct 06 09:23:27 ceph-m2 bash[1363]: File "/usr/share/ceph/mgr/mgr_module.py", line 462, in call Oct 06 09:23:27 ceph-m2 bash[1363]: return self.func(mgr, **kwargs) Oct 06 09:23:27 ceph-m2 bash[1363]: File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 107, in <lambda> Oct 06 09:23:27 ceph-m2 bash[1363]: wrapper_copy = lambda *l_args, **l_kwargs: wrapper(*l_args, **l_kwargs) # noqa: E731 Oct 06 09:23:27 ceph-m2 bash[1363]: File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 96, in wrapper Oct 06 09:23:27 ceph-m2 bash[1363]: return func(*args, **kwargs) Oct 06 09:23:27 ceph-m2 bash[1363]: File "/usr/share/ceph/mgr/orchestrator/module.py", line 356, in _add_host Oct 06 09:23:27 ceph-m2 bash[1363]: return self._apply_misc([s], False, Format.plain) Oct 06 09:23:27 ceph-m2 bash[1363]: File "/usr/share/ceph/mgr/orchestrator/module.py", line 1092, in _apply_misc Oct 06 09:23:27 ceph-m2 bash[1363]: raise_if_exception(completion) Oct 06 09:23:27 ceph-m2 bash[1363]: File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 225, in raise_if_exception Oct 06 09:23:27 ceph-m2 bash[1363]: e = pickle.loads(c.serialized_exception) Oct 06 09:23:27 ceph-m2 bash[1363]: TypeError: __init__() missing 2 required positional arguments: 'hostname' and 'addr' ----- cephadm logs from mgr ------ ----- sshd logs DEBUG3 level ------ Oct 6 09:33:09 ceph-m1 sshd[57168]: debug2: input_userauth_request: try method publickey [preauth] Oct 6 09:33:09 ceph-m1 sshd[57168]: debug2: userauth_pubkey: valid user root querying public key ecdsa-sha2-nistp384 AAAAE2VjZHNhLXNoYTItbmlzdHAzO------------ [preauth] Oct 6 09:33:09 ceph-m1 sshd[57168]: debug1: userauth_pubkey: test pkalg ecdsa-sha2-nistp384 pkblob ECDSA SHA256:m6Q0ZQVjjDLWxbmCn0hcGQ2---------- [preauth] Oct 6 09:33:09 ceph-m1 sshd[57168]: debug3: mm_key_allowed entering [preauth] Oct 6 09:33:09 ceph-m1 sshd[57168]: debug3: mm_request_send entering: type 22 [preauth] Oct 6 09:33:09 ceph-m1 sshd[57168]: debug3: mm_key_allowed: waiting for MONITOR_ANS_KEYALLOWED [preauth] Oct 6 09:33:09 ceph-m1 sshd[57168]: debug3: mm_request_receive_expect entering: type 23 [preauth] Oct 6 09:33:09 ceph-m1 sshd[57168]: debug3: mm_request_receive entering [preauth] Oct 6 09:33:09 ceph-m1 sshd[57168]: debug3: mm_request_receive entering Oct 6 09:33:09 ceph-m1 sshd[57168]: debug3: monitor_read: checking request 22 Oct 6 09:33:09 ceph-m1 sshd[57168]: debug3: mm_answer_keyallowed entering Oct 6 09:33:09 ceph-m1 sshd[57168]: debug3: mm_answer_keyallowed: key_from_blob: 0x5568f0aa7880 Oct 6 09:33:09 ceph-m1 sshd[57168]: debug1: temporarily_use_uid: 0/0 (e=0/0) Oct 6 09:33:09 ceph-m1 sshd[57168]: debug1: trying public key file /etc/ssh/fake_authorized_keys Oct 6 09:33:09 ceph-m1 sshd[57168]: debug1: fd 5 clearing O_NONBLOCK Oct 6 09:33:09 ceph-m1 sshd[57168]: debug1: restore_uid: 0/0 Oct 6 09:33:09 ceph-m1 sshd[57168]: debug3: mm_answer_keyallowed: publickey authentication test: ECDSA key is not allowed Oct 6 09:33:09 ceph-m1 sshd[57168]: Failed publickey for root from 10.10.12.160 port 40854 ssh2: ECDSA SHA256:m6Q0ZQVjjDLWxbmCn0hcGQ24gbpk------------- Oct 6 09:33:09 ceph-m1 sshd[57168]: debug3: mm_request_send entering: type 23 Oct 6 09:33:09 ceph-m1 sshd[57168]: debug2: userauth_pubkey: authenticated 0 pkalg ecdsa-sha2-nistp384 [preauth] Oct 6 09:33:09 ceph-m1 sshd[57168]: debug3: user_specific_delay: user specific delay 0.000ms [preauth] Oct 6 09:33:09 ceph-m1 sshd[57168]: debug3: ensure_minimum_time_since: elapsed 8.263ms, delaying 8.080ms (requested 8.171ms) [preauth] Oct 6 09:33:09 ceph-m1 sshd[57168]: debug3: userauth_finish: failure partial=0 next methods="publickey" [preauth] Oct 6 09:33:09 ceph-m1 sshd[57168]: debug3: send packet: type 51 [preauth] Oct 6 09:33:09 ceph-m1 sshd[57168]: Connection closed by authenticating user root 10.10.12.160 port 40854 [preauth] Oct 6 09:33:09 ceph-m1 sshd[57168]: debug1: do_cleanup [preauth] Oct 6 09:33:09 ceph-m1 sshd[57168]: debug3: PAM: sshpam_thread_cleanup entering [preauth] Oct 6 09:33:09 ceph-m1 sshd[57168]: debug1: monitor_read_log: child log fd closed Oct 6 09:33:09 ceph-m1 sshd[57168]: debug3: mm_request_receive entering Oct 6 09:33:09 ceph-m1 sshd[57168]: debug1: do_cleanup Oct 6 09:33:09 ceph-m1 sshd[57168]: debug1: PAM: cleanup Oct 6 09:33:09 ceph-m1 sshd[57168]: debug3: PAM: sshpam_thread_cleanup entering Oct 6 09:33:09 ceph-m1 sshd[57168]: debug1: Killing privsep child 57169 Oct 6 09:33:09 ceph-m1 sshd[57168]: debug1: audit_event: unhandled event 12 Oct 6 09:33:09 ceph-m1 sshd[757]: debug1: main_sigchld_handler: Child exited --------------- I get "ECDSA key is not allowed" above. From sshd logs, it looks like the client is not sending what is required or in the expected format. Now, what was changed in quincy/mgr on ssh client? Is anyone else using ECDSA keys and it works with quincy? I could not find in PRs something specific to this that could block the access, but it might be. Any suggestion? Thank you! Paul

7 months, 2 weeks

1
0
0 0

is the rbd mirror journal replayed on primary after a crash?

by Scheurer François

Hello Short question regarding journal-based rbd mirroring. ▪IO path with journaling w/o cache: a. Create an event to describe the update b. Asynchronously append event to journal object c. Asynchronously update image once event is safe d. Complete IO to client once update is safe [cf. https://events.static.linuxfound.org/sites/events/files/slides/Disaster%20R…] If a client crashes between b. and c., is there a mechanism to replay the IO from the journal on the primary image? If not, then the primary and secondary images would get out-of-sync (because of the extra write(s) on secondary) and subsequent writes to the primary would corrupt the secondary. Is that correct? Cheers Francois Scheurer -- EveryWare AG François Scheurer Senior Systems Engineer Zurlindenstrasse 52a CH-8003 Zürich tel: +41 44 466 60 00 fax: +41 44 466 60 10 mail: francois.scheurer(a)everyware.ch web: http://www.everyware.ch

7 months, 2 weeks

1
1
0 0

2024

2023

2022

2021

2020

2019

ceph-users October 2023