I'm trying to understand what and where radosgw listens.
There is a lot of contradictory or redundant information about that.
First, the contradictory information about the socket.
At https://docs.ceph.com/en/pacific/radosgw/config-ref/ it says rgw_socket_path, but at https://docs.ceph.com/en/pacific/man/8/radosgw/ it says 'rgw socket path'.
That problem is quite common in the Ceph documentation. Are both forms accepted?
Next, some questions about naming and the binding IP. Where are these defined, and how?
You have:
rgw_frontends = "beast ssl_endpoint=0.0.0.0:443 port=443 ..."
rgw_host =
rgw_port =
rgw_dns_name =
That's a lot of redundancy, or contradictory information. What is the purpose of each one? What is the difference between
rgw_frontends = ".. port = ..."
and
rgw_port =
?
Or between rgw_host and rgw_dns_name? What is the difference?
The documentation provides no help at all:
rgw_dns_name
Description: The DNS name of the served domain. See also the hostnames setting within regions.
The description says nothing new; it just repeats the field name.
Is one of them used by the manager for communication? I already had this problem with the entry in the certificate used by the frontend: it used an IP that came from nowhere.
If fcgi is used, how does the manager find the endpoint?
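For what it's worth, here is a hypothetical ceph.conf fragment illustrating how these options are commonly combined (the section name, certificate path, and DNS name are invented). As far as I understand, with the beast frontend the endpoint given in rgw_frontends is what the daemon actually binds, while rgw_socket_path belongs to the legacy fcgi frontend:

```ini
# Hypothetical example; section name, certificate path and DNS name invented.
[client.rgw.gateway1]
# With beast, the bind address/port comes from this line:
rgw_frontends = beast ssl_endpoint=0.0.0.0:443 ssl_certificate=/etc/ceph/rgw.pem
# DNS name clients use, e.g. for virtual-hosted-style S3 bucket names:
rgw_dns_name = s3.example.com
# rgw_socket_path / rgw_port / rgw_host relate to the old fcgi/socket-based
# frontends and, as far as I can tell, are ignored by beast.
```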
I'm trying to set up a new Ceph cluster, and I've hit a bit of a blank.
I started off with CentOS 7 and cephadm. It worked fine to a point; I
had to upgrade podman, but it mostly worked with Octopus.
Since this is a fresh cluster and hence no data is at risk, I decided to jump
straight into Pacific when it came out and upgrade. Which is where my
trouble began, mostly because Pacific needs a newer version of lvm than
what's in CentOS 7.
I can't upgrade to CentOS 8, as my boot drives are not supported there
due to the way Red Hat disabled lots of disk drivers. I think I'm looking at
Ubuntu or Debian.
Given cephadm has a very limited set of dependencies, it would be good to have a
support matrix. It would also be good to have a check in cephadm on
upgrade that says "no, I won't upgrade if the version of lvm2 is too low on
any host" and lets the admin fix the issue and try again.
I was thinking of upgrading to CentOS 8 for this project anyway until I
realised that CentOS 8 can't support the hardware I've inherited. But
currently I've got a broken cluster unless I can work out some way to
upgrade lvm on CentOS 7.
Peter.
Hello,
I noticed a couple of unanswered questions on this topic from a while back.
It seems worth asking, however, whether adjusting either or both of the
subject attributes could improve performance with large HDD OSDs (mine are
12 TB SAS).
In the previous posts on this topic the writers indicated that they had
experimented with increasing either or both of osd_op_num_shards and
osd_op_num_threads_per_shard and had seen performance improvements. Like
myself, the writers wondered about any limitations or pitfalls relating to
such adjustments.
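For context, the experiments described above amount to a config change along these lines; the values are invented for illustration, not a recommendation (and as far as I know these options only take effect after an OSD restart):

```ini
[osd]
# Invented example values, NOT recommendations; check your release's
# defaults before changing anything on a production cluster.
osd_op_num_shards = 8
osd_op_num_threads_per_shard = 2
```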
Since I would rather not take chances with a 500TB production cluster I am
asking for guidance from this list.
BTW, my cluster is currently running Nautilus 14.2.6 (stock Debian
packages).
Thank you.
-Dave
--
Dave Hall
Binghamton University
kdhall(a)binghamton.edu
Have you checked for disk failure? dmesg, smartctl, etc.?
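Spelled out, the checks suggested here might look like this (the device path /dev/nvme0n1 is an assumption; smartctl comes from the smartmontools package):

```shell
# Recent kernel messages mentioning disk trouble (pattern is a rough filter)
dmesg 2>/dev/null | grep -iE 'error|fail|reset' | tail -n 20
# SMART health verdict and the device's error log, if smartctl is available
if command -v smartctl >/dev/null 2>&1; then
    smartctl -H /dev/nvme0n1
    smartctl -l error /dev/nvme0n1
fi
```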
Quoting "Robert W. Eckert" <rob(a)rob.eckert.name>:
> I worked through that workflow, but it seems like the one monitor
> will run for a while - anywhere from an hour to a day, then just stop.
>
> This machine is running on AMD hardware (3600X CPU on X570 chipset)
> while my other two are running on old intel.
>
> I did find this in the service logs
>
> 2021-04-30T16:02:40.135+0000 7f5d0a94f700 -1 rocksdb: submit_common
> error: Corruption: block checksum mismatch: expected 395334538, got
> 4289108204 in /var/lib/ceph/mon/ceph-cube/store.db/073501.sst
> offset 36769734 size 84730 code = 2 Rocksdb transaction:
>
> I am attaching the output of
> journalctl -u ceph-fe3a7cb0-69ca-11eb-8d45-c86000d08867(a)mon.cube.service
>
> The error appears to be here:
> Apr 30 12:02:40 cube.robeckert.us conmon[41474]: debug -61>
> 2021-04-30T16:02:38.700+0000 7f5d21332700 4 mon.cube(a)-1(???).mgr
> e702 active server:
> [v2:192.168.2.199:6834/1641928541,v1:192.168.2.199:6835/1641928541](2184157)
> Apr 30 12:02:40 cube.robeckert.us conmon[41474]: debug -60>
> 2021-04-30T16:02:38.700+0000 7f5d21332700 4 mon.cube(a)-1(???).mgr
> e702 mkfs or daemon transitioned to available, loading commands
> Apr 30 12:02:40 cube.robeckert.us conmon[41474]: debug -59>
> 2021-04-30T16:02:38.701+0000 7f5d21332700 4 set_mon_vals no
> callback set
> Apr 30 12:02:40 cube.robeckert.us conmon[41474]: debug -58>
> 2021-04-30T16:02:38.701+0000 7f5d21332700 10 set_mon_vals
> client_cache_size = 32768
> Apr 30 12:02:40 cube.robeckert.us conmon[41474]: debug -57>
> 2021-04-30T16:02:38.701+0000 7f5d21332700 10 set_mon_vals
> container_image =
> docker.io/ceph/ceph@sha256:15b15fb7a708970f1b734285ac08aef45dcd76e86866af37412d041e00853743
> Apr 30 12:02:40 cube.robeckert.us conmon[41474]: debug -56>
> 2021-04-30T16:02:38.701+0000 7f5d21332700 10 set_mon_vals
> log_to_syslog = true
> Apr 30 12:02:40 cube.robeckert.us conmon[41474]: debug -55>
> 2021-04-30T16:02:38.701+0000 7f5d21332700 10 set_mon_vals
> mon_data_avail_warn = 10
> Apr 30 12:02:40 cube.robeckert.us conmon[41474]: debug -54>
> 2021-04-30T16:02:38.701+0000 7f5d21332700 10 set_mon_vals
> mon_warn_on_insecure_global_id_reclaim_allowed = true
> Apr 30 12:02:40 cube.robeckert.us conmon[41474]: debug -53>
> 2021-04-30T16:02:38.701+0000 7f5d21332700 4 set_mon_vals no
> callback set
> Apr 30 12:02:40 cube.robeckert.us conmon[41474]: debug -52>
> 2021-04-30T16:02:38.702+0000 7f5d21332700 2 auth: KeyRing::load:
> loaded key file /var/lib/ceph/mon/ceph-cube/keyring
> Apr 30 12:02:40 cube.robeckert.us conmon[41474]: debug -51>
> 2021-04-30T16:02:38.702+0000 7f5d1095b700 3 rocksdb:
> [db_impl/db_impl_compaction_flush.cc:2808] Compaction error:
> Corruption: block checksum mismatch: expected 395334538, got
> 4289108204 in /var/lib/ceph/mon/ceph-cube/store.db/073501.sst
> offset 36769734 size 84730
> Apr 30 12:02:40 cube.robeckert.us conmon[41474]: debug -50>
> 2021-04-30T16:02:38.702+0000 7f5d21332700 5 asok(0x56327d226000)
> register_command compact hook 0x56327e028700
> Apr 30 12:02:40 cube.robeckert.us conmon[41474]: debug -49>
> 2021-04-30T16:02:38.702+0000 7f5d1095b700 4 rocksdb: (Original Log
> Time 2021/04/30-16:02:38.703267) [compaction/compaction_job.cc:760]
> [default] compacted to: base level 6 level multiplier 10.00 max
> bytes base 268435456 files[5 0 0 0 0 0 2] max score 0.00, MB/sec:
> 11035.6 rd, 0.0 wr, level 6, files in(5, 2) out(1) MB in(32.1,
> 126.7) out(0.0), read-write-amplify(5.0) write-amplify(0.0)
> Corruption: block checksum mismatch: expected 395334538, got
> 4289108204 in /var/lib/ceph/mon/ceph-cube/store.db/073501.sst
> offset 36769734 size 84730, records in: 7670, records dropped: 6759
> output_compres
> Apr 30 12:02:40 cube.robeckert.us conmon[41474]: debug -48>
> 2021-04-30T16:02:38.702+0000 7f5d1095b700 4 rocksdb: (Original Log
> Time 2021/04/30-16:02:38.703283) EVENT_LOG_v1 {"time_micros":
> 1619798558703277, "job": 3, "event": "compaction_finished",
> "compaction_time_micros": 15085, "compaction_time_cpu_micros":
> 11937, "output_level": 6, "num_output_files": 1,
> "total_output_size": 12627499, "num_input_records": 7670,
> "num_output_records": 911, "num_subcompactions": 1,
> "output_compression": "NoCompression",
> "num_single_delete_mismatches": 0, "num_single_delete_fallthrough":
> 0, "lsm_state": [5, 0, 0, 0, 0, 0, 2]}
> Apr 30 12:02:40 cube.robeckert.us conmon[41474]: debug -47>
> 2021-04-30T16:02:38.702+0000 7f5d1095b700 2 rocksdb:
> [db_impl/db_impl_compaction_flush.cc:2344] Waiting after background
> compaction error: Corruption: block checksum mismatch: expected
> 395334538, got 4289108204 in
> /var/lib/ceph/mon/ceph-cube/store.db/073501.sst offset 36769734
> size 84730, Accumulated background error counts: 1
> Apr 30 12:02:40 cube.robeckert.us conmon[41474]: debug -46>
> 2021-04-30T16:02:38.702+0000 7f5d21332700 5 asok(0x56327d226000)
> register_command smart hook 0x56327e028700
>
>
> This is running the latest pacific container, but I was seeing the
> same issue in octopus.
>
> The container runs under podman on rhel 8, and the
> /var/lib/ceph/mon/ceph-cube is mapped to
> /var/lib/ceph/fe3a7cb0-69ca-11eb-8d45-c86000d08867/mon.cube.service
> on the nvme boot drive, which has plenty of space.
>
> To recover I run a script that stops the monitor on another
> host, copies the store.db directory, then starts it back up, and it
> syncs right up.
>
>
>
> Thanks,
> Rob
>
>
>
>
>
> -----Original Message-----
> From: Sebastian Wagner <sewagner(a)redhat.com>
> Sent: Thursday, April 29, 2021 7:44 AM
> To: Eugen Block <eblock(a)nde.ag>; ceph-users(a)ceph.io
> Subject: [ceph-users] Re: one of 3 monitors keeps going down
>
> Right, here are the docs for that workflow:
>
> https://docs.ceph.com/en/latest/cephadm/mon/#mon-service
>
> On 29.04.21 at 13:13, Eugen Block wrote:
>> Hi,
>>
>> instead of copying MON data to this one did you also try to redeploy
>> the MON container entirely so it gets a fresh start?
>>
>>
>> Quoting "Robert W. Eckert" <rob(a)rob.eckert.name>:
>>
>>> Hi,
>>> On a daily basis, one of my monitors goes down
>>>
>>> [root@cube ~]# ceph health detail
>>> HEALTH_WARN 1 failed cephadm daemon(s); 1/3 mons down, quorum
>>> rhel1.robeckert.us,story [WRN] CEPHADM_FAILED_DAEMON: 1 failed
>>> cephadm daemon(s)
>>> daemon mon.cube on cube.robeckert.us is in error state [WRN]
>>> MON_DOWN: 1/3 mons down, quorum rhel1.robeckert.us,story
>>> mon.cube (rank 2) addr
>>> [v2:192.168.2.142:3300/0,v1:192.168.2.142:6789/0] is down (out of
>>> quorum) [root@cube ~]# ceph --version ceph version 15.2.11
>>> (e3523634d9c2227df9af89a4eac33d16738c49cb)
>>> octopus (stable)
>>>
>>> I have a script that will copy the mon data from another server and
>>> it restarts and runs well for a while.
>>>
>>> It is always the same monitor, and when I look at the logs the only
>>> thing I really see is the cephadm log showing it down
>>>
>>> 2021-04-28 10:07:26,173 DEBUG Running command: /usr/bin/podman
>>> --version
>>> 2021-04-28 10:07:26,217 DEBUG /usr/bin/podman: stdout podman version
>>> 2.2.1
>>> 2021-04-28 10:07:26,222 DEBUG Running command: /usr/bin/podman
>>> inspect --format
>>> {{.Id}},{{.Config.Image}},{{.Image}},{{.Created}},{{index
>>> .Config.Labels "io.ceph.version"}}
>>> ceph-fe3a7cb0-69ca-11eb-8d45-c86000d08867-osd.2
>>> 2021-04-28 10:07:26,326 DEBUG /usr/bin/podman: stdout
>>> fab17e5242eb4875e266df19ca89b596a2f2b1d470273a99ff71da2ae81eeb3c,dock
>>> er.io/ceph/ceph:v15,5b724076c58f97872fc2f7701e8405ec809047d71528f79da
>>> 452188daf2af72e,2021-04-26
>>> 17:13:15.54183375 -0400 EDT,
>>> 2021-04-28 10:07:26,328 DEBUG Running command: systemctl is-enabled
>>> ceph-fe3a7cb0-69ca-11eb-8d45-c86000d08867(a)mon.cube<mailto:ceph-fe3a7c
>>> b0-69ca-11eb-8d45-c86000d08867(a)mon.cube>
>>>
>>> 2021-04-28 10:07:26,334 DEBUG systemctl: stdout enabled
>>> 2021-04-28 10:07:26,335 DEBUG Running command: systemctl is-active
>>> ceph-fe3a7cb0-69ca-11eb-8d45-c86000d08867(a)mon.cube<mailto:ceph-fe3a7c
>>> b0-69ca-11eb-8d45-c86000d08867(a)mon.cube>
>>>
>>> 2021-04-28 10:07:26,340 DEBUG systemctl: stdout failed
>>> 2021-04-28 10:07:26,340 DEBUG Running command: /usr/bin/podman
>>> --version
>>> 2021-04-28 10:07:26,395 DEBUG /usr/bin/podman: stdout podman version
>>> 2.2.1
>>> 2021-04-28 10:07:26,402 DEBUG Running command: /usr/bin/podman
>>> inspect --format
>>> {{.Id}},{{.Config.Image}},{{.Image}},{{.Created}},{{index
>>> .Config.Labels "io.ceph.version"}}
>>> ceph-fe3a7cb0-69ca-11eb-8d45-c86000d08867-mon.cube
>>> 2021-04-28 10:07:26,526 DEBUG /usr/bin/podman: stdout
>>> 04e7c673cbacf5160427b0c3eb2f0948b2f15d02c58bd1d9dd14f975a84cfc6f,dock
>>> er.io/ceph/ceph:v15,5b724076c58f97872fc2f7701e8405ec809047d71528f79da
>>> 452188daf2af72e,2021-04-28
>>> 08:54:57.614847512 -0400 EDT,
>>>
>>> I don't know if it matters, but this server is an AMD 3600XT while
>>> my other two servers which have had no issues are intel based.
>>>
>>> The root file system was originally on an SSD, and I switched to NVMe,
>>> so I eliminated controller or drive issues. (I didn't see anything
>>> in dmesg anyway.)
>>>
>>> If someone could point me in the right direction on where to
>>> troubleshoot next, I would appreciate it.
>>>
>>> Thanks,
>>> Rob Eckert
>>> _______________________________________________
>>> ceph-users mailing list -- ceph-users(a)ceph.io To unsubscribe send an
>>> email to ceph-users-leave(a)ceph.io
>>
>>
>>
Hello folks,
I am new to Ceph and at the moment I am doing some performance tests with a 4-node Ceph cluster (Pacific, 16.2.1).
Node hardware (4 identical nodes):
* DELL 3620 workstation
* Intel Quad-Core i7-6700(a)3.4 GHz
* 8 GB RAM
* Debian Buster (base system, installed on a dedicated Patriot Burst 120 GB SATA SSD)
* HP 530SPF+ 10 GBit dual-port NIC (tested with iperf at 9.4 GBit/s from node to node)
* 1 x Kingston KC2500 M2 NVMe PCIe SSD (500 GB, NO power loss protection !)
* 3 x Seagate Barracuda SATA disk drives (7200 rpm, 500 GB)
After bootstrapping a containerized (Docker) Ceph cluster, I did some performance tests on the NVMe storage by creating a storage pool called "ssdpool", consisting of four OSDs, one per node on each node's NVMe device. A first write-performance test yields
=============
root@ceph1:~# rados bench -p ssdpool 10 write -b 4M -t 16 --no-cleanup
hints = 1
Maintaining 16 concurrent writes of 4194304 bytes to objects of size 4194304 for up to 10 seconds or 0 objects
Object prefix: benchmark_data_ceph1_78
sec Cur ops started finished avg MB/s cur MB/s last lat(s) avg lat(s)
0 0 0 0 0 0 - 0
1 16 30 14 55.997 56 0.0209977 0.493427
2 16 53 37 73.9903 92 0.0264305 0.692179
3 16 76 60 79.9871 92 0.559505 0.664204
4 16 99 83 82.9879 92 0.609332 0.721016
5 16 116 100 79.9889 68 0.686093 0.698084
6 16 132 116 77.3224 64 1.19715 0.731808
7 16 153 137 78.2741 84 0.622646 0.755812
8 16 171 155 77.486 72 0.25409 0.764022
9 16 192 176 78.2076 84 0.968321 0.775292
10 16 214 198 79.1856 88 0.401339 0.766764
11 1 214 213 77.4408 60 0.969693 0.784002
Total time run: 11.0698
Total writes made: 214
Write size: 4194304
Object size: 4194304
Bandwidth (MB/sec): 77.3272
Stddev Bandwidth: 13.7722
Max bandwidth (MB/sec): 92
Min bandwidth (MB/sec): 56
Average IOPS: 19
Stddev IOPS: 3.44304
Max IOPS: 23
Min IOPS: 14
Average Latency(s): 0.785372
Stddev Latency(s): 0.49011
Max latency(s): 2.16532
Min latency(s): 0.0144995
=============
... and I think that 80 MB/s throughput is a very poor result for NVMe devices and 10 GBit NICs.
A bare write test of the NVMe drives (with the fsync=0 option) yields a write throughput of roughly 800 MB/s per device; a second test (with fsync=1) drops performance to 200 MB/s.
=============
root@ceph1:/home/mschmid# fio --rw=randwrite --name=IOPS-write --bs=1024k --direct=1 --filename=/dev/nvme0n1 --numjobs=4 --ioengine=libaio --iodepth=32 --refill_buffers --group_reporting --runtime=30 --time_based --fsync=0
IOPS-write: (g=0): rw=randwrite, bs=(R) 1024KiB-1024KiB, (W) 1024KiB-1024KiB, (T) 1024KiB-1024KiB, ioengine=libaio, iodepth=32...
fio-3.12
Starting 4 processes
Jobs: 4 (f=4): [w(4)][100.0%][w=723MiB/s][w=722 IOPS][eta 00m:00s]
IOPS-write: (groupid=0, jobs=4): err= 0: pid=31585: Thu Apr 29 15:15:03 2021
write: IOPS=740, BW=740MiB/s (776MB/s)(21.8GiB/30206msec); 0 zone resets
slat (usec): min=16, max=810, avg=106.48, stdev=30.48
clat (msec): min=7, max=1110, avg=172.09, stdev=120.18
lat (msec): min=7, max=1110, avg=172.19, stdev=120.18
clat percentiles (msec):
| 1.00th=[ 32], 5.00th=[ 48], 10.00th=[ 53], 20.00th=[ 63],
| 30.00th=[ 115], 40.00th=[ 161], 50.00th=[ 169], 60.00th=[ 178],
| 70.00th=[ 190], 80.00th=[ 220], 90.00th=[ 264], 95.00th=[ 368],
| 99.00th=[ 667], 99.50th=[ 751], 99.90th=[ 894], 99.95th=[ 986],
| 99.99th=[ 1036]
bw ( KiB/s): min=22528, max=639744, per=25.02%, avg=189649.94, stdev=113845.69, samples=240
iops : min= 22, max= 624, avg=185.11, stdev=111.18, samples=240
lat (msec) : 10=0.01%, 20=0.19%, 50=6.43%, 100=20.29%, 250=61.52%
lat (msec) : 500=8.21%, 750=2.85%, 1000=0.47%
cpu : usr=11.87%, sys=2.05%, ctx=13141, majf=0, minf=45
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.3%, 32=99.4%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.1%, 64=0.0%, >=64=0.0%
issued rwts: total=0,22359,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=32
Run status group 0 (all jobs):
WRITE: bw=740MiB/s (776MB/s), 740MiB/s-740MiB/s (776MB/s-776MB/s), io=21.8GiB (23.4GB), run=30206-30206msec
Disk stats (read/write):
nvme0n1: ios=0/89150, merge=0/0, ticks=0/15065724, in_queue=15118720, util=99.75%
=============
Furthermore, an IOPS test on the NVMe device with a 4k block size shows roughly 1000 IOPS with fsync=1 and 35000 IOPS with fsync=0.
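For reference, the 4k sync-write test described here corresponds to a fio job file roughly like the following (a sketch; the device path is an assumption, and writing to a raw device destroys its contents):

```ini
# Sketch of the 4k sync-write test; /dev/nvme0n1 is an assumption, and
# writing to a raw device destroys its contents.
[iops-sync-write]
rw=randwrite
bs=4k
direct=1
fsync=1
ioengine=libaio
iodepth=1
numjobs=1
filename=/dev/nvme0n1
runtime=30
time_based
```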
To my question: as CPU and network load seem to be low during my tests, I would like to know which bottleneck can cause such a huge performance drop between the bare hardware performance of the NVMe drives and the write speeds in the rados benchmark. Could the missing power loss protection (fsync=1) be the problem, or what throughput should one expect to be normal in such a setup?
Thanks for every advice!
Best regards,
Michael
Dear gents,
to get familiar with the cephadm upgrade path and with cephadm in general
(we heavily use old-style "ceph-deploy" Octopus-based production
clusters), we decided to do some tests with a vanilla cluster running
15.2.11 on CentOS 8 on top of vSphere. Deployment of the Octopus cluster
went very well and we are excited about this new technique and all the
possibilities. No errors, no clues... :-)
Unfortunately, the upgrade to Pacific (16.2.0 or 16.2.1) fails every
time, with either the original Docker images or
quay.ceph.io/ceph-ci/ceph:pacific. We use a small setup (3 mons, 2 mgrs,
some OSDs). This is the upgrade behaviour:
Upgrade of both MGRs seems to be OK, but we get this:
2021-04-29T15:35:19.903111+0200 mgr.c0n00.vnxaqu [DBG] daemon
mgr.c0n00.vnxaqu container digest correct
2021-04-29T15:35:19.903206+0200 mgr.c0n00.vnxaqu [DBG] daemon
mgr.c0n00.vnxaqu deployed by correct version
2021-04-29T15:35:19.903298+0200 mgr.c0n00.vnxaqu [DBG] daemon
mgr.c0n01.gstlmw container digest correct
2021-04-29T15:35:19.903378+0200 mgr.c0n00.vnxaqu [DBG] daemon
mgr.c0n01.gstlmw *not deployed by correct version*
After this the upgrade process gets stuck completely, although you still
have a running cluster (minus one monitor daemon):
[root@c0n00 ~]# ceph -s
cluster:
id: 5541c866-a8fe-11eb-b604-005056b8f1bf
health: HEALTH_WARN
* 3 hosts fail cephadm check*
services:
mon: 2 daemons, quorum c0n00,c0n02 (age 68m)
mgr: c0n00.bmtvpr(active, since 68m), standbys: c0n01.jwfuca
osd: 4 osds: 4 up (since 63m), 4 in (since 62m)
[..]
progress:
Upgrade to 16.2.1-257-g717ce59b (0s)
[=...........................]
{
"target_image": "
quay.ceph.io/ceph-ci/ceph@sha256:d0f624287378fe63fc4c30bccc9f82bfe0e42e62381c0a3d0d3d86d985f5d788",
"in_progress": true,
"services_complete": [
"mgr"
],
"progress": "2/19 ceph daemons upgraded",
"message": "Error: UPGRADE_EXCEPTION: Upgrade: failed due to an
unexpected exception"
[root@c0n00 ~]# ceph orch ps
NAME HOST PORTS STATUS REFRESHED AGE
VERSION IMAGE ID CONTAINER ID
alertmanager.c0n00 c0n00 running (56m) 4m ago 16h
0.20.0 0881eb8f169f 30d9eff06ce2
crash.c0n00 c0n00 running (56m) 4m ago 16h
15.2.11 9d01da634b8f 91d3e4d0e14d
crash.c0n01 c0n01 host is offline 16h ago 16h
15.2.11 9d01da634b8f 0ff4a20021df
crash.c0n02 c0n02 host is offline 16h ago 16h
15.2.11 9d01da634b8f 0253e6bb29a0
crash.c0n03 c0n03 host is offline 16h ago 16h
15.2.11 9d01da634b8f 291ce4f8b854
grafana.c0n00 c0n00 running (56m) 4m ago 16h
6.7.4 80728b29ad3f 46d77b695da5
mgr.c0n00.bmtvpr c0n00 *:8443,9283 running (56m) 4m ago 16h
16.2.1-257-g717ce59b 3be927f015dd 94a7008ccb4f
mgr.c0n01.jwfuca c0n01 host is offline 16h ago 16h
16.2.1-257-g717ce59b 3be927f015dd 766ada65efa9
mon.c0n00 c0n00 running (56m) 4m ago 16h
15.2.11 9d01da634b8f b9f270cd99e2
mon.c0n02 c0n02 host is offline 16h ago 16h
15.2.11 9d01da634b8f a90c21bfd49e
node-exporter.c0n00 c0n00 running (56m) 4m ago 16h
0.18.1 e5a616e4b9cf eb1306811c6c
node-exporter.c0n01 c0n01 host is offline 16h ago 16h
0.18.1 e5a616e4b9cf 093a72542d3e
node-exporter.c0n02 c0n02 host is offline 16h ago 16h
0.18.1 e5a616e4b9cf 785531f5d6cf
node-exporter.c0n03 c0n03 host is offline 16h ago 16h
0.18.1 e5a616e4b9cf 074fac77e17c
osd.0 c0n02 host is offline 16h ago 16h
15.2.11 9d01da634b8f c075bd047c0a
osd.1 c0n01 host is offline 16h ago 16h
15.2.11 9d01da634b8f 616aeda28504
osd.2 c0n03 host is offline 16h ago 16h
15.2.11 9d01da634b8f b36453730c83
osd.3 c0n00 running (56m) 4m ago 16h
15.2.11 9d01da634b8f e043abf53206
prometheus.c0n00 c0n00 running (56m) 4m ago 16h
2.18.1 de242295e225 7cb50c04e26a
After some digging into the daemon logs we found tracebacks (please see below).
We also noticed that we can successfully reach each host via ssh -F ... !!!
We've done tcpdumps while upgrading and every SYN gets its SYN-ACK... ;-)
Because we get no errors when deploying a fresh Octopus cluster with
cephadm (from
https://github.com/ceph/ceph/raw/octopus/src/cephadm/cephadm, and cephadm
prepare host is always OK), might it be a missing Python library or
something that cephadm itself doesn't check?
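One quick way to probe the missing-library hypothesis from inside the mgr container is to check whether the modules named in the tracebacks below are importable (a sketch; assumes python3 is on PATH):

```shell
# Check, for each module the tracebacks mention, whether the Python
# interpreter in this environment can find it at all.
for mod in execnet remoto; do
    if python3 -c "import importlib.util, sys; sys.exit(0 if importlib.util.find_spec('$mod') else 1)"; then
        echo "$mod: ok"
    else
        echo "$mod: MISSING"
    fi
done
```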
Thank you for any hint.
Christoph Ackermann
Traceback:
Traceback (most recent call last):
File "/lib/python3.6/site-packages/execnet/gateway_bootstrap.py", line
48, in bootstrap_exec
s = io.read(1)
File "/lib/python3.6/site-packages/execnet/gateway_base.py", line 402,
in read
raise EOFError("expected %d bytes, got %d" % (numbytes, len(buf)))
EOFError: expected 1 bytes, got 0
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/share/ceph/mgr/cephadm/serve.py", line 1166, in
_remote_connection
conn, connr = self.mgr._get_connection(addr)
File "/usr/share/ceph/mgr/cephadm/module.py", line 1202, in
_get_connection
sudo=True if self.ssh_user != 'root' else False)
File "/lib/python3.6/site-packages/remoto/backends/__init__.py", line
34, in __init__
self.gateway = self._make_gateway(hostname)
File "/lib/python3.6/site-packages/remoto/backends/__init__.py", line
44, in _make_gateway
self._make_connection_string(hostname)
File "/lib/python3.6/site-packages/execnet/multi.py", line 134, in
makegateway
gw = gateway_bootstrap.bootstrap(io, spec)
File "/lib/python3.6/site-packages/execnet/gateway_bootstrap.py", line
102, in bootstrap
bootstrap_exec(io, spec)
File "/lib/python3.6/site-packages/execnet/gateway_bootstrap.py", line
53, in bootstrap_exec
raise HostNotFound(io.remoteaddress)
execnet.gateway_bootstrap.HostNotFound: -F /tmp/cephadm-conf-61otabz_ -i
/tmp/cephadm-identity-rt2nm0t4 root@c0n02
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/usr/share/ceph/mgr/cephadm/utils.py", line 73, in do_work
return f(*arg)
File "/usr/share/ceph/mgr/cephadm/services/osd.py", line 60, in
create_from_spec_one
replace_osd_ids=osd_id_claims.get(host, []), env_vars=env_vars
File "/usr/share/ceph/mgr/cephadm/services/osd.py", line 75, in
create_single_host
out, err, code = self._run_ceph_volume_command(host, cmd,
env_vars=env_vars)
File "/usr/share/ceph/mgr/cephadm/services/osd.py", line 295, in
_run_ceph_volume_command
error_ok=True)
File "/usr/share/ceph/mgr/cephadm/serve.py", line 1003, in _run_cephadm
with self._remote_connection(host, addr) as tpl:
File "/lib64/python3.6/contextlib.py", line 81, in __enter__
return next(self.gen)
File "/usr/share/ceph/mgr/cephadm/serve.py", line 1197, in
_remote_connection
raise OrchestratorError(msg) from e
orchestrator._interface.OrchestratorError: Failed to connect to c0n02
(c0n02).
Please make sure that the host is reachable and accepts connections using
the cephadm SSH key
To add the cephadm SSH key to the host:
> ceph cephadm get-pub-key > ~/ceph.pub
> ssh-copy-id -f -i ~/ceph.pub root@c0n02
To check that the host is reachable:
> ceph cephadm get-ssh-config > ssh_config
> ceph config-key get mgr/cephadm/ssh_identity_key > ~/cephadm_private_key
> chmod 0600 ~/cephadm_private_key
> ssh -F ssh_config -i ~/cephadm_private_key root@c0n02
Hello,
I upgraded my Octopus test cluster, which has 5 hosts, because one of the nodes (a mon/mgr node) was still on version 15.2.10 while all the others were on 15.2.11.
For the upgrade I used the following command:
ceph orch upgrade start --ceph-version 15.2.11
The upgrade worked correctly and I did not see any errors in the logs, but the host version in the Ceph dashboard (under Cluster -> Hosts) still shows 15.2.10 for that specific node.
The output of "ceph versions" shows that every component is on 15.2.11, as you can see below:
{
"mon": {
"ceph version 15.2.11 (e3523634d9c2227df9af89a4eac33d16738c49cb) octopus (stable)": 3
},
"mgr": {
"ceph version 15.2.11 (e3523634d9c2227df9af89a4eac33d16738c49cb) octopus (stable)": 2
},
"osd": {
"ceph version 15.2.11 (e3523634d9c2227df9af89a4eac33d16738c49cb) octopus (stable)": 2
},
"mds": {},
"overall": {
"ceph version 15.2.11 (e3523634d9c2227df9af89a4eac33d16738c49cb) octopus (stable)": 7
}
}
So why is it still stuck on 15.2.10 in the dashboard?
Best regards,
Mabi
Good thought. The storage for the monitor data is a RAID-0 over three
NVMe devices. Watching iostat, they are completely idle, maybe 0.8% to
1.4% for a second every minute or so.
----------------
Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904 C70E E654 3BB2 FA62 B9F1
On Thu, Apr 8, 2021 at 7:48 PM Zizon Qiu <zzdtsv(a)gmail.com> wrote:
>
> Could it be related to some kind of disk issue on the host that mon is
> located on, which may occasionally slow down IO and, in turn, rocksdb?
>
>
> On Fri, Apr 9, 2021 at 4:29 AM Robert LeBlanc <robert(a)leblancnet.us> wrote:
>>
>> I found this thread that matches a lot of what I'm seeing. I see the
>> ms_dispatch thread going to 100%, but I'm at a single MON, the
>> recovery is done and the rocksdb MON database is ~300MB. I've tried
>> all the settings mentioned in that thread with no noticeable
>> improvement. I was hoping that once the recovery was done (backfills
>> to reformatted OSDs) that it would clear up, but not yet. So any other
>> ideas would be really helpful. Our MDS is functioning, but stalls a
>> lot because the mons miss heartbeats.
>>
>> mon_compact_on_start = true
>> rocksdb_cache_size = 1342177280
>> mon_lease = 30
>> mon_osd_cache_size = 200000
>> mon_sync_max_payload_size = 4096
>>
>> ----------------
>> Robert LeBlanc
>> PGP Fingerprint 79A2 9CA4 6CC4 45DD A904 C70E E654 3BB2 FA62 B9F1
>>
>> On Thu, Apr 8, 2021 at 1:11 PM Stefan Kooman <stefan(a)bit.nl> wrote:
>> >
>> > On 4/8/21 6:22 PM, Robert LeBlanc wrote:
>> > > I upgraded our Luminous cluster to Nautilus a couple of weeks ago and
>> > > converted the last batch of FileStore OSDs to BlueStore about 36 hours
>> > > ago. Yesterday our monitor cluster went nuts and started constantly
>> > > calling elections because monitor nodes were at 100% and wouldn't
>> > > respond to heartbeats. I reduced the monitor cluster to one to prevent
>> > > the constant elections and that let the system limp along until the
>> > > backfills finished. There are large amounts of time where ceph commands
>> > > hang with the CPU is at 100%, when the CPU drops I see a lot of work
>> > > getting done in the monitor logs which stops as soon as the CPU is at
>> > > 100% again.
>> >
>> >
>> > Try reducing mon_sync_max_payload_size=4096. I have seen Frank Schilder
>> > advise this several times because of monitor issues. Also recently for a
>> > cluster that got upgraded from Luminous -> Mimic -> Nautilus.
>> >
>> > Worth a shot.
>> >
>> > Otherwise I'll try to look in depth and see if I can come up with
>> > something smart (for now I need to go catch some sleep).
>> >
>> > Gr. Stefan