I am looking to create a new pool backed by a particular set of drives:
larger NVMe SSDs (Intel SSDPF2NV153TZ, 15TB drives). Specifically, I am
wondering about the best way to move devices out of one pool and direct
them to a new pool that I will create. In this case, the documentation
suggests I should assign them to a new device class and have a placement
rule that targets that device class in the new pool.
Currently the Ceph cluster has two device classes, 'hdd' and 'ssd', and the
larger 15TB drives were automatically assigned to the 'ssd' device class,
which is in use by a different pool: an existing placement rule targets the
'ssd' class.
The documentation describes that I could set a device class for an OSD with
a command like:
`ceph osd crush set-device-class CLASS OSD_ID [OSD_ID ..]`
Class names can be arbitrary strings like 'big_nvme'. Before setting a new
device class on an OSD that already has an assigned device class, I should
use `ceph osd crush rm-device-class osd.XX`.
Can I proceed to directly remove these OSDs from the current device class
and assign them to a new one? Should they be moved one by one? And what is
the safest way to protect the data in the existing pool they are currently
mapped to?
Thanks,
Matt
--
Matt Larson, PhD
Madison, WI 53705 U.S.A.
Hi all,
we had a bunch of large omap object warnings after a user deleted a lot of files on a CephFS with snapshots. After the snapshots were rotated out, all but one of these warnings disappeared over time. However, one warning is stuck and I wonder if it's something else.
Is there a reasonable way (say, a one-liner of no more than 120 characters) to get ceph to tell me which PG this is coming from? I just want to issue a deep scrub to check whether the warning disappears. Going through the logs and querying every single object for its key count seems a bit of a hassle for something that ought to be part of "ceph health detail".
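(What I have in mind is something along these lines, assuming the default
cluster log location on a mon host; the object name in the warning starts
with the PG id, and <pgid> below is a placeholder:

  grep 'Large omap object found' /var/log/ceph/ceph.log | tail -n 1
  ceph pg deep-scrub <pgid>

But I don't know if that is the intended way.)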
Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
Dear Ceph Users,
I am requesting a backport of the changes related to nats_adapter.lua.
This feature exists in versions newer than Pacific, but we don't have it
in the Pacific version.
I would greatly appreciate it if someone from the Ceph development
team could backport this change to the Pacific version.
Best regards,
Vahideh Alinouri
Hey everyone,
On 20/10/2022 10:12, Christian Rohmann wrote:
> 1) May I bring up again my remarks about the timing:
>
> On 19/10/2022 11:46, Christian Rohmann wrote:
>
>> I believe the upload of a new release to the repo prior to the
>> announcement happens quite regularly - it might just be due to the
>> technical process of releasing.
>> But I agree it would be nice to have a more "bit flip" approach to
>> new releases in the repo and not have the packages appear as updates
>> prior to the announcement and final release and update notes.
> By my observations sometimes there are packages available on the
> download servers via the "last stable" folders such as
> https://download.ceph.com/debian-quincy/ quite some time before the
> announcement of a release is out.
> I know it's hard to time this right with mirrors requiring some time
> to sync files, but would be nice to not see the packages or have
> people install them before there are the release notes and potential
> pointers to changes out.
Today's 16.2.11 release shows the exact issue I described above:
1) 16.2.11 packages are already available via e.g.
https://download.ceph.com/debian-pacific
2) release notes not yet merged
(https://github.com/ceph/ceph/pull/49839), thus
https://ceph.io/en/news/blog/2022/v16-2-11-pacific-released/ shows a 404 :-)
3) No announcement like
https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/message/QOCU563UD3…
to the ML yet.
Regards
Christian
I use Ceph 17.2.6. When I deploy two separate RGW realms, each with its own
zonegroup and zone, the dashboard enables access for both object gateways
and I can create users, buckets, and so on. But when I try to create a
bucket in one of the object gateways, I get the error below:
------------
debug 2023-10-29T12:19:50.697+0000 7fd203a26700 0 [dashboard ERROR
rest_client] RGW REST API failed PUT req status: 400
debug 2023-10-29T12:19:50.697+0000 7fd203a26700 0 [dashboard ERROR
exception] Dashboard Exception
Traceback (most recent call last):
File "/usr/share/ceph/mgr/dashboard/controllers/rgw.py", line 304, in
create
lock_enabled)
File "/usr/share/ceph/mgr/dashboard/rest_client.py", line 534, in
func_wrapper
**kwargs)
File "/usr/share/ceph/mgr/dashboard/services/rgw_client.py", line 563, in
create_bucket
return request(data=data, headers=headers)
File "/usr/share/ceph/mgr/dashboard/rest_client.py", line 323, in __call__
data, raw_content, headers)
File "/usr/share/ceph/mgr/dashboard/rest_client.py", line 452, in
do_request
resp.content)
dashboard.rest_client.RequestException: RGW REST API failed request with
status code 400
(b'{"Code":"InvalidLocationConstraint","Message":"The specified
location-constr'
b'aint is not
valid","BucketName":"farhad2","RequestId":"tx000003fa9d80c50a79d'
b'b6-00653e4de6-285b3-test","HostId":"285b3-test-test"}')
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/share/ceph/mgr/dashboard/services/exception.py", line 47, in
dashboard_exception_handler
return handler(*args, **kwargs)
File "/lib/python3.6/site-packages/cherrypy/_cpdispatch.py", line 54, in
__call__
return self.callable(*self.args, **self.kwargs)
File "/usr/share/ceph/mgr/dashboard/controllers/_base_controller.py",
line 258, in inner
ret = func(*args, **kwargs)
File "/usr/share/ceph/mgr/dashboard/controllers/_rest_controller.py",
line 191, in wrapper
return func(*vpath, **params)
File "/usr/share/ceph/mgr/dashboard/controllers/rgw.py", line 315, in
create
raise DashboardException(e, http_status_code=500, component='rgw')
dashboard.exceptions.DashboardException: RGW REST API failed request with
status code 400
(b'{"Code":"InvalidLocationConstraint","Message":"The specified
location-constr'
b'aint is not
valid","BucketName":"farhad2","RequestId":"tx000003fa9d80c50a79d'
b'b6-00653e4de6-285b3-test","HostId":"285b3-test-test"}')
debug 2023-10-29T12:19:50.701+0000 7fd203a26700 0 [dashboard INFO request]
[192.168.0.1:55833] [POST] [500] [0.031s] [admin] [252.0B] /api/rgw/bucket
debug 2023-10-29T12:19:50.713+0000 7fd204a28700 0 [dashboard ERROR
rest_client] RGW REST API failed GET req status: 404
debug 2023-10-29T12:19:50.715+0000 7fd204a28700 0 [dashboard ERROR
exception] Dashboard Exception
Traceback (most recent call last):
File "/usr/share/ceph/mgr/dashboard/controllers/rgw.py", line 145, in
proxy
result = instance.proxy(method, path, params, None)
File "/usr/share/ceph/mgr/dashboard/services/rgw_client.py", line 513, in
proxy
params, data)
File "/usr/share/ceph/mgr/dashboard/rest_client.py", line 534, in
func_wrapper
**kwargs)
File "/usr/share/ceph/mgr/dashboard/services/rgw_client.py", line 507, in
_proxy_request
raw_content=True)
File "/usr/share/ceph/mgr/dashboard/rest_client.py", line 323, in __call__
data, raw_content, headers)
File "/usr/share/ceph/mgr/dashboard/rest_client.py", line 452, in
do_request
resp.content)
dashboard.rest_client.RequestException: RGW REST API failed request with
status code 404
(b'{"Code":"NoSuchBucket","RequestId":"tx0000086cdcd9547b301e2-00653e4de6-285b3'
b'-test","HostId":"285b3-test-test"}')
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/share/ceph/mgr/dashboard/services/exception.py", line 47, in
dashboard_exception_handler
return handler(*args, **kwargs)
File "/lib/python3.6/site-packages/cherrypy/_cpdispatch.py", line 54, in
__call__
return self.callable(*self.args, **self.kwargs)
File "/usr/share/ceph/mgr/dashboard/controllers/_base_controller.py",
line 258, in inner
ret = func(*args, **kwargs)
File "/usr/share/ceph/mgr/dashboard/controllers/_rest_controller.py",
line 191, in wrapper
return func(*vpath, **params)
File "/usr/share/ceph/mgr/dashboard/controllers/rgw.py", line 275, in get
result = self.proxy(daemon_name, 'GET', 'bucket', {'bucket': bucket})
File "/usr/share/ceph/mgr/dashboard/controllers/rgw.py", line 151, in
proxy
raise DashboardException(e, http_status_code=http_status_code,
component='rgw')
dashboard.exceptions.DashboardException: RGW REST API failed request with
status code 404
(b'{"Code":"NoSuchBucket","RequestId":"tx0000086cdcd9547b301e2-00653e4de6-285b3'
b'-test","HostId":"285b3-test-test"}')
--------------------------------------------------
My cluster has two realms:
1) rgw-realm=test, rgw-zonegroup=test (master), zone=test (master)
2) rgw-realm=velero (default), rgw-zonegroup=velero (master), zone=velero
(master)
And another question: is it possible to create users and buckets with the
same name in each rgw-realm? For example, user Yan on both.
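(For the user question, what I have in mind is roughly the following,
using radosgw-admin's standard realm/zonegroup/zone selection options; the
uid 'yan' is only an example:

  radosgw-admin user create --uid=yan --display-name="Yan" \
    --rgw-realm=test --rgw-zonegroup=test --rgw-zone=test
  radosgw-admin user create --uid=yan --display-name="Yan" \
    --rgw-realm=velero --rgw-zonegroup=velero --rgw-zone=velero
)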
So this is a new host (you didn't provide the osd tree)? In that case
I would compare the ceph.conf files between a working host and this failing
one, and paste them here (mask sensitive data). It looks like the
connection to the MONs is successful, though, and "ceph-volume lvm create"
worked as well. You could try to avoid a crush update on start:
[osd]
osd crush update on start = false
Or you could also try to manually assign the location:
[osd.301]
osd crush location = "root=ssd"
Try one option at a time to see which one works (if at all).
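(Once the OSD is up, standard commands like these should confirm where it
landed in the CRUSH tree; osd.301 is the id from your log:

  ceph osd find 301
  ceph osd tree | grep -B 2 'osd.301'
)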
Zitat von Pardhiv Karri <meher4india(a)gmail.com>:
> Hi Eugen,
>
> Thank you for the reply. For some reason I'm not getting individual replies,
> only the digest. Below is the ceph -s output (hostnames renamed) and the
> command I am using to create a bluestore OSD. It should create the OSD under
> its host bucket and bring it up, but instead the host is not created and I am
> left with just a rogue OSD which is down.
>
> [root@hbmon1 ~]# ceph -s
> cluster:
> id: f1579737-d2c9-49ab-a6fa-8ca952488120
> health: HEALTH_WARN
> 116896/167701779 objects misplaced (0.070%)
>
> services:
> mon: 3 daemons, quorum hbmon1,hbmon2,hbmon3
> mgr: hbmon2(active), standbys: hbmon1, hbmon3
> osd: 721 osds: 717 up, 716 in; 60 remapped pgs
> rgw: 1 daemon active
>
> data:
> pools: 13 pools, 32384 pgs
> objects: 55.90M objects, 324TiB
> usage: 973TiB used, 331TiB / 1.27PiB avail
> pgs: 116896/167701779 objects misplaced (0.070%)
> 32294 active+clean
> 59 active+remapped+backfill_wait
> 27 active+clean+scrubbing+deep
> 3 active+clean+scrubbing
> 1 active+remapped+backfilling
>
> io:
> client: 237MiB/s rd, 635MiB/s wr, 10.66kop/s rd, 6.98kop/s wr
> recovery: 12.9MiB/s, 1objects/s
>
> [root@hbmon1 ~]#
>
>
> Command used to create the OSD: "ceph-volume lvm create --data /dev/sda"
>
>
>
> Debug log output of OSD creation command.
>
> [root@dra1361 ~]# ceph-volume lvm create --data /dev/sda
> Running command: /usr/bin/ceph-authtool --gen-print-key
> Running command: /usr/bin/ceph --cluster ceph --name client.bootstrap-osd
> --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring -i - osd new
> 21e9a327-ada5-4734-ab5d-7be333d4f3cf
> Running command: vgcreate --force --yes
> ceph-81236ab2-f6e0-4cc3-9815-95c8dd16c6ef /dev/sda
> stdout: Physical volume "/dev/sda" successfully created.
> stdout: Volume group "ceph-81236ab2-f6e0-4cc3-9815-95c8dd16c6ef"
> successfully created
> Running command: lvcreate --yes -l 100%FREE -n
> osd-block-21e9a327-ada5-4734-ab5d-7be333d4f3cf
> ceph-81236ab2-f6e0-4cc3-9815-95c8dd16c6ef
> stdout: Logical volume "osd-block-21e9a327-ada5-4734-ab5d-7be333d4f3cf"
> created.
> Running command: /usr/bin/ceph-authtool --gen-print-key
> Running command: mount -t tmpfs tmpfs /var/lib/ceph/osd/ceph-301
> --> Absolute path not found for executable: restorecon
> --> Ensure $PATH environment variable contains common executable locations
> Running command: chown -h ceph:ceph
> /dev/ceph-81236ab2-f6e0-4cc3-9815-95c8dd16c6ef/osd-block-21e9a327-ada5-4734-ab5d-7be333d4f3cf
> Running command: chown -R ceph:ceph /dev/dm-0
> Running command: ln -s
> /dev/ceph-81236ab2-f6e0-4cc3-9815-95c8dd16c6ef/osd-block-21e9a327-ada5-4734-ab5d-7be333d4f3cf
> /var/lib/ceph/osd/ceph-301/block
> Running command: ceph --cluster ceph --name client.bootstrap-osd --keyring
> /var/lib/ceph/bootstrap-osd/ceph.keyring mon getmap -o
> /var/lib/ceph/osd/ceph-301/activate.monmap
> stderr: 2023-10-27 19:48:57.789631 7ff36a340700 2 Event(0x7ff3640e2950
> nevent=5000 time_id=1).set_owner idx=0 owner=140683435575040
> 2023-10-27 19:48:57.789713 7ff369b3f700 2 Event(0x7ff36410f670 nevent=5000
> time_id=1).set_owner idx=1 owner=140683427182336
> 2023-10-27 19:48:57.789771 7ff36933e700 2 Event(0x7ff36413c4e0 nevent=5000
> time_id=1).set_owner idx=2 owner=140683418789632
> stderr: 2023-10-27 19:48:57.790044 7ff36c135700 1 Processor -- start
> 2023-10-27 19:48:57.790100 7ff36c135700 1 -- - start start
> 2023-10-27 19:48:57.790352 7ff36c135700 1 -- - --> 10.51.228.32:6789/0 --
> auth(proto 0 38 bytes epoch 0) v1 -- 0x7ff364175e70 con 0
> 2023-10-27 19:48:57.790368 7ff36c135700 1 -- - --> 10.51.228.33:6789/0 --
> auth(proto 0 38 bytes epoch 0) v1 -- 0x7ff3641762b0 con 0
> stderr: 2023-10-27 19:48:57.791313 7ff369b3f700 1 --
> 10.51.228.213:0/2678799534 learned_addr learned my addr
> 10.51.228.213:0/2678799534
> stderr: 2023-10-27 19:48:57.791740 7ff36933e700 2 --
> 10.51.228.213:0/2678799534 >> 10.51.228.32:6789/0 conn(0x7ff36417f4e0 :-1
> s=STATE_CONNECTING_WAIT_ACK_SEQ pgs=0 cs=0 l=1)._process_connection got
> newly_acked_seq 0 vs out_seq 0
> 2023-10-27 19:48:57.791763 7ff369b3f700 2 -- 10.51.228.213:0/2678799534 >>
> 10.51.228.33:6789/0 conn(0x7ff36417be80 :-1 s=STATE_CONNECTING_WAIT_ACK_SEQ
> pgs=0 cs=0 l=1)._process_connection got newly_acked_seq 0 vs out_seq 0
> stderr: 2023-10-27 19:48:57.792414 7ff353fff700 1 --
> 10.51.228.213:0/2678799534 <== mon.1 10.51.228.33:6789/0 1 ==== mon_map
> magic: 0 v1 ==== 442+0+0 (171445244 0 0) 0x7ff360001690 con 0x7ff36417be80
> 2023-10-27 19:48:57.792544 7ff353fff700 1 -- 10.51.228.213:0/2678799534
> <== mon.1 10.51.228.33:6789/0 2 ==== auth_reply(proto 2 0 (0) Success) v1
> ==== 33+0+0 (2209822748 0 0) 0x7ff3641762b0 con 0x7ff36417be80
> 2023-10-27 19:48:57.792686 7ff353fff700 1 -- 10.51.228.213:0/2678799534
> --> 10.51.228.33:6789/0 -- auth(proto 2 32 bytes epoch 0) v1 --
> 0x7ff34c001880 con 0
> 2023-10-27 19:48:57.792722 7ff353fff700 1 -- 10.51.228.213:0/2678799534
> <== mon.0 10.51.228.32:6789/0 1 ==== mon_map magic: 0 v1 ==== 442+0+0
> (171445244 0 0) 0x7ff354001710 con 0x7ff36417f4e0
> stderr: 2023-10-27 19:48:57.792776 7ff353fff700 1 --
> 10.51.228.213:0/2678799534 <== mon.0 10.51.228.32:6789/0 2 ====
> auth_reply(proto 2 0 (0) Success) v1 ==== 33+0+0 (782387931 0 0)
> 0x7ff354001c10 con 0x7ff36417f4e0
> 2023-10-27 19:48:57.792832 7ff353fff700 1 -- 10.51.228.213:0/2678799534
> --> 10.51.228.32:6789/0 -- auth(proto 2 32 bytes epoch 0) v1 --
> 0x7ff34c0035b0 con 0
> stderr: 2023-10-27 19:48:57.793541 7ff353fff700 1 --
> 10.51.228.213:0/2678799534 <== mon.1 10.51.228.33:6789/0 3 ====
> auth_reply(proto 2 0 (0) Success) v1 ==== 222+0+0 (713751175 0 0)
> 0x7ff3600022d0 con 0x7ff36417be80
> 2023-10-27 19:48:57.793740 7ff353fff700 1 -- 10.51.228.213:0/2678799534
> --> 10.51.228.33:6789/0 -- auth(proto 2 181 bytes epoch 0) v1 --
> 0x7ff34c0024d0 con 0
> 2023-10-27 19:48:57.793774 7ff353fff700 1 -- 10.51.228.213:0/2678799534
> <== mon.0 10.51.228.32:6789/0 3 ==== auth_reply(proto 2 0 (0) Success) v1
> ==== 222+0+0 (1047601458 0 0) 0x7ff3540022d0 con 0x7ff36417f4e0
> stderr: 2023-10-27 19:48:57.793868 7ff353fff700 1 --
> 10.51.228.213:0/2678799534 --> 10.51.228.32:6789/0 -- auth(proto 2 181
> bytes epoch 0) v1 -- 0x7ff34c005f10 con 0
> stderr: 2023-10-27 19:48:57.794682 7ff353fff700 1 --
> 10.51.228.213:0/2678799534 <== mon.1 10.51.228.33:6789/0 4 ====
> auth_reply(proto 2 0 (0) Success) v1 ==== 612+0+0 (1210392221 0 0)
> 0x7ff360002cd0 con 0x7ff36417be80
> stderr: 2023-10-27 19:48:57.794875 7ff353fff700 1 --
> 10.51.228.213:0/2678799534 >> 10.51.228.32:6789/0 conn(0x7ff36417f4e0 :-1
> s=STATE_OPEN pgs=316443717 cs=1 l=1).mark_down
> 2023-10-27 19:48:57.794897 7ff353fff700 2 -- 10.51.228.213:0/2678799534 >>
> 10.51.228.32:6789/0 conn(0x7ff36417f4e0 :-1 s=STATE_OPEN pgs=316443717 cs=1
> l=1)._stop
> 2023-10-27 19:48:57.794955 7ff353fff700 1 -- 10.51.228.213:0/2678799534
> --> 10.51.228.33:6789/0 -- mon_subscribe({monmap=0+}) v2 -- 0x7ff364176570
> con 0
> 2023-10-27 19:48:57.795071 7ff36c135700 1 -- 10.51.228.213:0/2678799534
> --> 10.51.228.33:6789/0 -- mon_subscribe({mgrmap=0+}) v2 -- 0x7ff364176b70
> con 0
> stderr: 2023-10-27 19:48:57.795186 7ff36c135700 1 --
> 10.51.228.213:0/2678799534 --> 10.51.228.33:6789/0 --
> mon_subscribe({osdmap=0}) v2 -- 0x7ff3641843c0 con 0
> stderr: 2023-10-27 19:48:57.795919 7ff353fff700 1 --
> 10.51.228.213:0/2678799534 <== mon.1 10.51.228.33:6789/0 5 ==== mon_map
> magic: 0 v1 ==== 442+0+0 (171445244 0 0) 0x7ff3600032f0 con 0x7ff36417be80
> 2023-10-27 19:48:57.796020 7ff353fff700 1 -- 10.51.228.213:0/2678799534
> <== mon.1 10.51.228.33:6789/0 6 ==== mgrmap(e 255) v1 ==== 580+0+0
> (3748818868 0 0) 0x7ff3600037e0 con 0x7ff36417be80
> stderr: 2023-10-27 19:48:57.797732 7ff353fff700 1 --
> 10.51.228.213:0/2678799534 <== mon.1 10.51.228.33:6789/0 7 ====
> osd_map(1497788..1497788 src has 1496148..1497788) v3 ==== 383089+0+0
> (4062048124 0 0) 0x7ff34c0024d0 con 0x7ff36417be80
> stderr: 2023-10-27 19:48:57.797968 7ff36933e700 2 --
> 10.51.228.213:0/2678799534 >> 10.51.228.33:6800/5258 conn(0x7ff34c00ebb0
> :-1 s=STATE_CONNECTING_WAIT_ACK_SEQ pgs=0 cs=0 l=1)._process_connection got
> newly_acked_seq 0 vs out_seq 0
> stderr: 2023-10-27 19:48:57.804679 7ff36c135700 1 --
> 10.51.228.213:0/2678799534 --> 10.51.228.33:6789/0 --
> mon_command({"prefix": "get_command_descriptions"} v 0) v1 --
> 0x7ff364099090 con 0
> stderr: 2023-10-27 19:48:57.807820 7ff353fff700 1 --
> 10.51.228.213:0/2678799534 <== mon.1 10.51.228.33:6789/0 8 ====
> mon_command_ack([{"prefix": "get_command_descriptions"}]=0 v0) v1 ====
> 72+0+66166 (1092875540 0 105479317) 0x7ff360072010 con 0x7ff36417be80
> stderr: 2023-10-27 19:48:57.894128 7ff36c135700 1 --
> 10.51.228.213:0/2678799534 --> 10.51.228.33:6789/0 --
> mon_command({"prefix": "mon getmap"} v 0) v1 -- 0x7ff3640d9070 con 0
> stderr: 2023-10-27 19:48:57.894988 7ff353fff700 1 --
> 10.51.228.213:0/2678799534 <== mon.1 10.51.228.33:6789/0 9 ====
> mon_command_ack([{"prefix": "mon getmap"}]=0 got monmap epoch 4 v4) v1 ====
> 76+0+438 (3852220838 0 1414311087) 0x7ff360062000 con 0x7ff36417be80
> stderr: got monmap epoch 4
> stderr: 2023-10-27 19:48:57.899563 7ff36c135700 1 --
> 10.51.228.213:0/2678799534 >> 10.51.228.33:6800/5258 conn(0x7ff34c00ebb0
> :-1 s=STATE_OPEN pgs=915966 cs=1 l=1).mark_down
> 2023-10-27 19:48:57.899603 7ff36c135700 2 -- 10.51.228.213:0/2678799534 >>
> 10.51.228.33:6800/5258 conn(0x7ff34c00ebb0 :-1 s=STATE_OPEN pgs=915966 cs=1
> l=1)._stop
> 2023-10-27 19:48:57.899637 7ff36c135700 1 -- 10.51.228.213:0/2678799534 >>
> 10.51.228.33:6789/0 conn(0x7ff36417be80 :-1 s=STATE_OPEN pgs=494490687 cs=1
> l=1).mark_down
> 2023-10-27 19:48:57.899644 7ff36c135700 2 -- 10.51.228.213:0/2678799534 >>
> 10.51.228.33:6789/0 conn(0x7ff36417be80 :-1 s=STATE_OPEN pgs=494490687 cs=1
> l=1)._stop
> stderr: 2023-10-27 19:48:57.900080 7ff36c135700 1 --
> 10.51.228.213:0/2678799534 shutdown_connections
> stderr: 2023-10-27 19:48:57.900797 7ff36c135700 1 --
> 10.51.228.213:0/2678799534 shutdown_connections
> stderr: 2023-10-27 19:48:57.901041 7ff36c135700 1 --
> 10.51.228.213:0/2678799534 wait complete.
> 2023-10-27 19:48:57.901079 7ff36c135700 1 -- 10.51.228.213:0/2678799534 >>
> 10.51.228.213:0/2678799534 conn(0x7ff3641698a0 :-1 s=STATE_NONE pgs=0 cs=0
> l=0).mark_down
> 2023-10-27 19:48:57.901090 7ff36c135700 2 -- 10.51.228.213:0/2678799534 >>
> 10.51.228.213:0/2678799534 conn(0x7ff3641698a0 :-1 s=STATE_NONE pgs=0 cs=0
> l=0)._stop
> Running command: ceph-authtool /var/lib/ceph/osd/ceph-301/keyring
> --create-keyring --name osd.301 --add-key
> AQAnFDxlSUVdERAAvv6P4q/MGml/tPx9Kka77w==
> stdout: creating /var/lib/ceph/osd/ceph-301/keyring
> added entity osd.301 auth auth(auid = 18446744073709551615
> key=AQAnFDxlSUVdERAAvv6P4q/MGml/tPx9Kka77w== with 0 caps)
> Running command: chown -R ceph:ceph /var/lib/ceph/osd/ceph-301/keyring
> Running command: chown -R ceph:ceph /var/lib/ceph/osd/ceph-301/
> Running command: /usr/bin/ceph-osd --cluster ceph --osd-objectstore
> bluestore --mkfs -i 301 --monmap /var/lib/ceph/osd/ceph-301/activate.monmap
> --keyfile - --osd-data /var/lib/ceph/osd/ceph-301/ --osd-uuid
> 21e9a327-ada5-4734-ab5d-7be333d4f3cf --setuser ceph --setgroup ceph
> --> ceph-volume lvm prepare successful for: /dev/sda
> Running command: chown -R ceph:ceph /var/lib/ceph/osd/ceph-301
> Running command: ceph-bluestore-tool --cluster=ceph prime-osd-dir --dev
> /dev/ceph-81236ab2-f6e0-4cc3-9815-95c8dd16c6ef/osd-block-21e9a327-ada5-4734-ab5d-7be333d4f3cf
> --path /var/lib/ceph/osd/ceph-301
> Running command: ln -snf
> /dev/ceph-81236ab2-f6e0-4cc3-9815-95c8dd16c6ef/osd-block-21e9a327-ada5-4734-ab5d-7be333d4f3cf
> /var/lib/ceph/osd/ceph-301/block
> Running command: chown -h ceph:ceph /var/lib/ceph/osd/ceph-301/block
> Running command: chown -R ceph:ceph /dev/dm-0
> Running command: chown -R ceph:ceph /var/lib/ceph/osd/ceph-301
> Running command: systemctl enable
> ceph-volume@lvm-301-21e9a327-ada5-4734-ab5d-7be333d4f3cf
> stderr: Created symlink
> /etc/systemd/system/multi-user.target.wants/ceph-volume(a)lvm-301-21e9a327-ada5-4734-ab5d-7be333d4f3cf.service
> → /lib/systemd/system/ceph-volume@.service.
> Running command: systemctl enable --runtime ceph-osd@301
> Running command: systemctl start ceph-osd@301
> --> ceph-volume lvm activate successful for osd ID: 301
> --> ceph-volume lvm create successful for: /dev/sda
> [root@dra1361 ~]#
>
>
> Log file of OSD 301 (modified the IP address for security reasons):
>
>
> 2023-10-27 20:02:25.597254 7fbe3eee5e40 0 osd.301 0 crush map has features
> 288232575208783872, adjusting msgr requires for osds
> 2023-10-27 20:02:25.597293 7fbe3eee5e40 0 osd.301 0 load_pgs
> 2023-10-27 20:02:25.597301 7fbe3eee5e40 0 osd.301 0 load_pgs opened 0 pgs
> 2023-10-27 20:02:25.597303 7fbe3eee5e40 2 osd.301 0 superblock: I am
> osd.301
> 2023-10-27 20:02:25.597304 7fbe3eee5e40 0 osd.301 0 using weightedpriority
> op queue with priority op cut off at 64.
> 2023-10-27 20:02:25.597390 7fbe3eee5e40 1 Processor -- start
> 2023-10-27 20:02:25.597594 7fbe3eee5e40 1 Processor -- start
> 2023-10-27 20:02:25.597933 7fbe3eee5e40 1 Processor -- start
> 2023-10-27 20:02:25.598011 7fbe3eee5e40 1 Processor -- start
> 2023-10-27 20:02:25.598086 7fbe3eee5e40 1 Processor -- start
> 2023-10-27 20:02:25.598224 7fbe3eee5e40 1 Processor -- start
> 2023-10-27 20:02:25.598467 7fbe3eee5e40 1 Processor -- start
> 2023-10-27 20:02:25.598833 7fbe3eee5e40 -1 osd.301 0 log_to_monitors
> {default=true}
> 2023-10-27 20:02:25.599904 7fbe3eee5e40 1 -- 10.10.21.213:6800/20559 -->
> 10.10.21.32:6789/0 -- auth(proto 0 28 bytes epoch 0) v1 -- 0x55cf49fcd180
> con 0
> 2023-10-27 20:02:25.599922 7fbe3eee5e40 1 -- 10.10.21.213:6800/20559 -->
> 10.10.21.33:6789/0 -- auth(proto 0 28 bytes epoch 0) v1 -- 0x55cf49fcd400
> con 0
> 2023-10-27 20:02:25.601479 7fbe3de99700 2 -- 10.10.21.213:6800/20559 >>
> 10.10.21.33:6789/0 conn(0x55cf4a291800 :-1 s=STATE_CONNECTING_WAIT_ACK_SEQ
> pgs=0 cs=0 l=1)._process_connection got newly_acked_seq 0 vs out_seq 0
> 2023-10-27 20:02:25.601712 7fbe3d698700 2 -- 10.10.21.213:6800/20559 >>
> 10.10.21.32:6789/0 conn(0x55cf4a293000 :-1 s=STATE_CONNECTING_WAIT_ACK_SEQ
> pgs=0 cs=0 l=1)._process_connection got newly_acked_seq 0 vs out_seq 0
> 2023-10-27 20:02:25.602181 7fbe3508b700 1 -- 10.10.21.213:6800/20559 <==
> mon.1 10.10.21.33:6789/0 1 ==== mon_map magic: 0 v1 ==== 442+0+0 (171445244
> 0 0) 0x55cf4a29efc0 con 0x55cf4a291800
> 2023-10-27 20:02:25.602305 7fbe3508b700 1 -- 10.10.21.213:6800/20559 <==
> mon.1 10.10.21.33:6789/0 2 ==== auth_reply(proto 2 0 (0) Success) v1 ====
> 33+0+0 (1293857214 0 0) 0x55cf49fcd400 con 0x55cf4a291800
> 2023-10-27 20:02:25.602475 7fbe3508b700 1 -- 10.10.21.213:6800/20559 -->
> 10.10.21.33:6789/0 -- auth(proto 2 32 bytes epoch 0) v1 -- 0x55cf49fcd900
> con 0
> 2023-10-27 20:02:25.602507 7fbe3508b700 1 -- 10.10.21.213:6800/20559 <==
> mon.0 10.10.21.32:6789/0 1 ==== mon_map magic: 0 v1 ==== 442+0+0 (171445244
> 0 0) 0x55cf4a29efc0 con 0x55cf4a293000
> 2023-10-27 20:02:25.602557 7fbe3508b700 1 -- 10.10.21.213:6800/20559 <==
> mon.0 10.10.21.32:6789/0 2 ==== auth_reply(proto 2 0 (0) Success) v1 ====
> 33+0+0 (2698054748 0 0) 0x55cf49fcd180 con 0x55cf4a293000
> 2023-10-27 20:02:25.602627 7fbe3508b700 1 -- 10.10.21.213:6800/20559 -->
> 10.10.21.32:6789/0 -- auth(proto 2 32 bytes epoch 0) v1 -- 0x55cf49fcd400
> con 0
> 2023-10-27 20:02:25.603288 7fbe3508b700 1 -- 10.10.21.213:6800/20559 <==
> mon.1 10.10.21.33:6789/0 3 ==== auth_reply(proto 2 0 (0) Success) v1 ====
> 206+0+0 (3831096342 0 0) 0x55cf49fcd900 con 0x55cf4a291800
> 2023-10-27 20:02:25.603488 7fbe3508b700 1 -- 10.10.21.213:6800/20559 -->
> 10.10.21.33:6789/0 -- auth(proto 2 165 bytes epoch 0) v1 -- 0x55cf49fcd180
> con 0
> 2023-10-27 20:02:25.603707 7fbe3508b700 1 -- 10.10.21.213:6800/20559 <==
> mon.0 10.10.21.32:6789/0 3 ==== auth_reply(proto 2 0 (0) Success) v1 ====
> 206+0+0 (1800961675 0 0) 0x55cf49fcd400 con 0x55cf4a293000
> 2023-10-27 20:02:25.603889 7fbe3508b700 1 -- 10.10.21.213:6800/20559 -->
> 10.10.21.32:6789/0 -- auth(proto 2 165 bytes epoch 0) v1 -- 0x55cf49fcd900
> con 0
> 2023-10-27 20:02:25.604366 7fbe3508b700 1 -- 10.10.21.213:6800/20559 <==
> mon.1 10.10.21.33:6789/0 4 ==== auth_reply(proto 2 0 (0) Success) v1 ====
> 596+0+0 (1171153923 0 0) 0x55cf49fcd180 con 0x55cf4a291800
> 2023-10-27 20:02:25.604560 7fbe3508b700 1 -- 10.10.21.213:6800/20559 >>
> 10.10.21.32:6789/0 conn(0x55cf4a293000 :-1 s=STATE_OPEN pgs=316446787 cs=1
> l=1).mark_down
> 2023-10-27 20:02:25.604572 7fbe3508b700 2 -- 10.10.21.213:6800/20559 >>
> 10.10.21.32:6789/0 conn(0x55cf4a293000 :-1 s=STATE_OPEN pgs=316446787 cs=1
> l=1)._stop
> 2023-10-27 20:02:25.604622 7fbe3508b700 1 monclient: found mon.or1dra1301
> 2023-10-27 20:02:25.604663 7fbe3508b700 1 -- 10.10.21.213:6800/20559 -->
> 10.10.21.33:6789/0 -- mon_subscribe({monmap=0+}) v2 -- 0x55cf49dc9680 con 0
> 2023-10-27 20:02:25.604712 7fbe3508b700 1 -- 10.10.21.213:6800/20559 -->
> 10.10.21.33:6789/0 -- auth(proto 2 2 bytes epoch 0) v1 -- 0x55cf49fcd400
> con 0
> 2023-10-27 20:02:25.604792 7fbe3eee5e40 5 monclient: authenticate success,
> global_id 1528606696
> 2023-10-27 20:02:25.605306 7fbe3508b700 1 -- 10.10.21.213:6800/20559 <==
> mon.1 10.10.21.33:6789/0 5 ==== mon_map magic: 0 v1 ==== 442+0+0 (171445244
> 0 0) 0x55cf4a29f200 con 0x55cf4a291800
> 2023-10-27 20:02:25.605398 7fbe3508b700 1 -- 10.10.21.213:6800/20559 <==
> mon.1 10.10.21.33:6789/0 6 ==== auth_reply(proto 2 0 (0) Success) v1 ====
> 194+0+0 (2618733096 0 0) 0x55cf49fcd400 con 0x55cf4a291800
> 2023-10-27 20:02:25.605807 7fbe3eee5e40 1 -- 10.10.21.213:6800/20559 -->
> 10.10.21.33:6789/0 -- mon_command({"prefix": "osd crush set-device-class",
> "class": "ssd", "ids": ["301"]} v 0) v1 -- 0x55cf49dc98c0 con 0
> 2023-10-27 20:02:25.608362 7fbe3508b700 1 -- 10.10.21.213:6800/20559 <==
> mon.1 10.10.21.33:6789/0 7 ==== mon_command_ack([{"prefix": "osd crush
> set-device-class", "class": "ssd", "ids": ["301"]}]=0 osd.301 already set
> to class ssdset-device-class item id 301 name 'osd.301' device_class 'ssd':
> no change v1497811) v1 ==== 211+0+0 (3030640430 0 0) 0x55cf49dc98c0 con
> 0x55cf4a291800
> 2023-10-27 20:02:25.608668 7fbe3eee5e40 1 -- 10.10.21.213:6800/20559 -->
> 10.10.21.33:6789/0 -- mon_command({"prefix": "osd crush create-or-move",
> "id": 301, "weight":3.4931, "args": ["host=or1dra1361", "root=default"]} v
> 0) v1 -- 0x55cf49dc9b00 con 0
> 2023-10-27 20:02:25.611784 7fbe3508b700 1 -- 10.10.21.213:6800/20559 <==
> mon.1 10.10.21.33:6789/0 8 ==== mon_command_ack([{"prefix": "osd crush
> create-or-move", "id": 301, "weight":3.4931, "args": ["host=or1dra1361",
> "root=default"]}]=-34 (34) Numerical result out of range v1497811) v1 ====
> 179+0+0 (1380436622 0 0) 0x55cf49dc9b00 con 0x55cf4a291800
> 2023-10-27 20:02:25.612011 7fbe3eee5e40 -1 osd.301 0
> mon_cmd_maybe_osd_create fail: '(34) Numerical result out of range': (34)
> Numerical result out of range
> 2023-10-27 20:02:25.612070 7fbe3eee5e40 -1 osd.301 0 init unable to
> update_crush_location: (34) Numerical result out of range
> [root@dra1361 /var/log/ceph]#
>
>
>
> Thanks,
> Pardhiv Karri
>
>
>
>
>
>
> On Fri, Oct 27, 2023 at 7:00 AM <ceph-users-request(a)ceph.io> wrote:
>
>> Send ceph-users mailing list submissions to
>> ceph-users(a)ceph.io
>>
>> To subscribe or unsubscribe via email, send a message with subject or
>> body 'help' to
>> ceph-users-request(a)ceph.io
>>
>> You can reach the person managing the list at
>> ceph-users-owner(a)ceph.io
>>
>> When replying, please edit your Subject line so it is more specific
>> than "Re: Contents of ceph-users digest..."
>>
>> Today's Topics:
>>
>> 1. Re: Ceph - Error ERANGE: (34) Numerical result out of range
>> (Eugen Block)
>> 2. Re: [EXTERNAL] [Pacific] ceph orch device ls do not returns any HDD
>> (Patrick Begou)
>> 3. Re: [ext] CephFS pool not releasing space after data deletion
>> (Kuhring, Mathias)
>> 4. Re: "cephadm version" in reef returns "AttributeError:
>> 'CephadmContext' object has no attribute 'fsid'"
>> (John Mulligan)
>>
>>
>> ----------------------------------------------------------------------
>>
>> Date: Fri, 27 Oct 2023 11:56:38 +0000
>> From: Eugen Block <eblock(a)nde.ag>
>> Subject: [ceph-users] Re: Ceph - Error ERANGE: (34) Numerical result
>> out of range
>> To: ceph-users(a)ceph.io
>> Message-ID:
>> <20231027115638.Horde.48HDZ8Azsv-ho_0pQNe0p-s(a)webmail.nde.ag>
>> Content-Type: text/plain; charset=utf-8; format=flowed; DelSp=Yes
>>
>> Hi,
>>
>> please provide more information about your cluster, like 'ceph -s',
>> 'ceph osd tree' and the exact procedure you used to create the OSDs.
>> From your last post it seems like the OSD creation failed and this
>> might be just a consequence of that? Do you have the logs from the OSD
>> creation as well? Not just the logs from the failing OSD start.
>>
>> Thanks,
>> Eugen
>>
>> Zitat von Pardhiv Karri <meher4india(a)gmail.com>:
>>
>> > Hi,
>> > Trying to move a node/host under a new SSD root and getting below error.
>> > Has anyone seen it and know the fix? The pg_num and pgp_num are the same
>> > for all pools, so that is not the issue.
>> >
>> > [root@hbmon1 ~]# ceph osd crush move hbssdhost1 root=ssd
>> > Error ERANGE: (34) Numerical result out of range
>> > [root@hbmon1 ~]#
>> >
>> > Thanks,
>> > Pardhiv
>> > _______________________________________________
>> > ceph-users mailing list -- ceph-users(a)ceph.io
>> > To unsubscribe send an email to ceph-users-leave(a)ceph.io
>>
>>
>>
>> ------------------------------
>>
>> Date: Fri, 27 Oct 2023 15:35:37 +0200
>> From: Patrick Begou <Patrick.Begou(a)univ-grenoble-alpes.fr>
>> Subject: [ceph-users] Re: [EXTERNAL] [Pacific] ceph orch device ls do
>> not returns any HDD
>> To: ceph-users(a)ceph.io
>> Message-ID:
>> <1fe5ff89-5b96-20e6-988d-ab2dd514ca2f(a)univ-grenoble-alpes.fr>
>> Content-Type: text/plain; charset=UTF-8; format=flowed
>>
>> Hi all,
>>
>> First of all I apologize if I've not done things correctly, but here are
>> some test results.
>>
>> 1) I've compiled the main branch in a fresh podman container (Alma Linux
>> 8) and installed it. Successful!
>> 2) I copied the /etc/ceph directory of the host (a member of the Ceph
>> cluster on Pacific 16.2.14) into this container (good or bad idea?)
>> 3) "ceph-volume inventory" works but with some error messages:
>>
>> [root@74285dcfa91f etc]# ceph-volume inventory
>> stderr: Unknown device, --name=, --path=, or absolute path in /dev/ or
>> /sys expected.
>> stderr: Unknown device, --name=, --path=, or absolute path in /dev/ or
>> /sys expected.
>> stderr: Unknown device, --name=, --path=, or absolute path in /dev/ or
>> /sys expected.
>> stderr: Unknown device, --name=, --path=, or absolute path in /dev/ or
>> /sys expected.
>> stderr: Unknown device, --name=, --path=, or absolute path in /dev/ or
>> /sys expected.
>>
>> Device Path Size Device nodes rotates available
>> Model name
>> /dev/sdc 232.83 GB sdc True True
>> SAMSUNG HE253GJ
>> /dev/sda 232.83 GB sda True False
>> SAMSUNG HE253GJ
>> /dev/sdb 465.76 GB sdb True False
>> WDC WD5003ABYX-1
>> 4) ceph version shows:
>> [root@74285dcfa91f etc]# ceph -v
>> ceph version 18.0.0-6846-g2706ecac4a9
>> (2706ecac4a90447420904e42d6e0445134dff2be) reef (dev)
>>
>>
>> 5) lsblk works (container launched with "--privileged" flag)
>> [root@74285dcfa91f etc]# lsblk
>> NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
>> sda 8:0 1 232.9G 0 disk
>> |-sda1 8:1 3.9G 0 part
>> |-sda2 8:2 1 3.9G 0 part [SWAP]
>> `-sda3 8:3 1 225G 0 part
>> sdb 8:16 1 465.8G 0 disk
>> sdc 8:32 1 232.9G 0 disk
>>
>> But some commands do not work (my setup or ceph?)
>>
>> [root@74285dcfa91f etc]# ceph orch device zap
>> mostha1.legi.grenoble-inp.fr /dev/sdc --force
>> Error EINVAL: Device path '/dev/sdc' not found on host
>> 'mostha1.legi.grenoble-inp.fr'
>> [root@74285dcfa91f etc]#
>>
>> [root@74285dcfa91f etc]# ceph orch device ls
>> [root@74285dcfa91f etc]#
>>
>> Patrick
>>
>>
>> Le 24/10/2023 à 22:43, Zack Cerza a écrit :
>> > That's correct - it's the removable flag that's causing the disks to
>> > be excluded.
>> >
>> > I actually just merged this PR last week:
>> > https://github.com/ceph/ceph/pull/49954
>> >
>> > One of the changes it made was to enable removable (but not USB)
>> > devices, as there are vendors that report hot-swappable drives as
>> > removable. Patrick, it looks like this may resolve your issue as well.
>> >
>> >
>> > On Tue, Oct 24, 2023 at 5:57 AM Eugen Block <eblock(a)nde.ag> wrote:
>> >> Hi,
>> >>
>> >>> May be because they are hot-swappable hard drives.
>> >> yes, that's my assumption as well.
>> >>
>> >>
>> >> Zitat von Patrick Begou <Patrick.Begou(a)univ-grenoble-alpes.fr>:
>> >>
>> >>> Hi Eugen,
>> >>>
>> >>> Yes Eugen, all the devices /dev/sd[abc] have the removable flag set
>> >>> to 1. May be because they are hot-swappable hard drives.
>> >>>
>> >>> I have contacted the commit author Zack Cerza and he asked me for
>> >>> some additional tests too this morning. I add him in copy to this
>> >>> mail.
>> >>>
>> >>> Patrick
>> >>>
>> >>> Le 24/10/2023 à 12:57, Eugen Block a écrit :
>> >>>> Hi,
>> >>>>
>> >>>> just to confirm, could you check that the disk which is *not*
>> >>>> discovered by 16.2.11 has a "removable" flag?
>> >>>>
>> >>>> cat /sys/block/sdX/removable
>> >>>>
>> >>>> I could reproduce it as well on a test machine with a USB thumb
>> >>>> drive (live distro) which is excluded in 16.2.11 but is shown in
>> >>>> 16.2.10. Although I'm not a developer I tried to understand what
>> >>>> changes were made in
>> >>>>
>> https://github.com/ceph/ceph/pull/46375/files#diff-330f9319b0fe352dff0486f6…
>> and there's this
>> >>>> line:
>> >>>>
>> >>>>> if get_file_contents(os.path.join(_sys_block_path, dev,
>> >>>>> 'removable')) == "1":
>> >>>>> continue
>> >>>> The thumb drive is removable, of course, apparently that is filtered
>> here.
>> >>>>
>> >>>> Regards,
>> >>>> Eugen
>> >>>>
>> >>>> Zitat von Patrick Begou <Patrick.Begou(a)univ-grenoble-alpes.fr>:
>> >>>>
>> >>>>> Le 23/10/2023 à 03:04, 544463199(a)qq.com a écrit :
>> >>>>>> I think you can try to roll back this part of the python code and
>> >>>>>> wait for your good news :)
>> >>>>>
>> >>>>> Not so easy 😕
>> >>>>>
>> >>>>>
>> >>>>> [root@e9865d9a7f41 ceph]# git revert
>> >>>>> 4fc6bc394dffaf3ad375ff29cbb0a3eb9e4dbefc
>> >>>>> Auto-merging src/ceph-volume/ceph_volume/tests/util/test_device.py
>> >>>>> CONFLICT (content): Merge conflict in
>> >>>>> src/ceph-volume/ceph_volume/tests/util/test_device.py
>> >>>>> Auto-merging src/ceph-volume/ceph_volume/util/device.py
>> >>>>> CONFLICT (content): Merge conflict in
>> >>>>> src/ceph-volume/ceph_volume/util/device.py
>> >>>>> Auto-merging src/ceph-volume/ceph_volume/util/disk.py
>> >>>>> CONFLICT (content): Merge conflict in
>> >>>>> src/ceph-volume/ceph_volume/util/disk.py
>> >>>>> error: could not revert 4fc6bc394df... ceph-volume: Optionally
>> >>>>> consume loop devices
>> >>>>>
>> >>>>> Patrick
>> >>>>> _______________________________________________
>> >>>>> ceph-users mailing list -- ceph-users(a)ceph.io
>> >>>>> To unsubscribe send an email to ceph-users-leave(a)ceph.io
>> >>>>
>> >>>> _______________________________________________
>> >>>> ceph-users mailing list -- ceph-users(a)ceph.io
>> >>>> To unsubscribe send an email to ceph-users-leave(a)ceph.io
>> >>> _______________________________________________
>> >>> ceph-users mailing list -- ceph-users(a)ceph.io
>> >>> To unsubscribe send an email to ceph-users-leave(a)ceph.io
>> >>
>> >>
>> > _______________________________________________
>> > ceph-users mailing list -- ceph-users(a)ceph.io
>> > To unsubscribe send an email to ceph-users-leave(a)ceph.io
>>
>>
>> ------------------------------
>>
>> Date: Fri, 27 Oct 2023 13:52:03 +0000
>> From: "Kuhring, Mathias" <mathias.kuhring(a)bih-charite.de>
>> Subject: [ceph-users] Re: [ext] CephFS pool not releasing space after
>> data deletion
>> To: "ceph-users(a)ceph.io" <ceph-users(a)ceph.io>, "frans(a)dtu.dk"
>> <frans(a)dtu.dk>
>> Message-ID: <a5bb6a0a-aab5-402d-8ee3-68eccabb7b6b(a)bih-charite.de>
>> Content-Type: text/plain; charset="utf-8"
>>
>> Dear ceph users,
>>
>> We are wondering if this might be the same issue as this bug:
>> https://tracker.ceph.com/issues/52581
>>
>> Except that we seem to have had snapshots dangling on the old pool,
>> while the bug report has snapshots dangling on the new pool.
>> But maybe it's both?
>>
>> I mean, once the global root layout was pointed to a new pool, the new
>> pool became responsible for snapshotting at least the new data, right?
>> What about data which is overwritten? Is there a conflict of
>> responsibility?
>>
>> We do have similar listings of snaps with "ceph osd pool ls detail", I
>> think:
>>
>> 0|0[root@osd-1 ~]# ceph osd pool ls detail | grep -B 1 removed_snaps_queue
>> pool 1 'cephfs_data' replicated size 3 min_size 2 crush_rule 1
>> object_hash rjenkins pg_num 115 pgp_num 107 pg_num_target 32
>> pgp_num_target 32 autoscale_mode on last_change 803558 lfor
>> 0/803250/803248 flags hashpspool,selfmanaged_snaps stripe_width 0
>> expected_num_objects 1 application cephfs
>> removed_snaps_queue
>>
>> [3541~1,36e4~1,379f~2,3862~1,3876~1,387d~1,388b~1,389a~1,38a6~1,38bc~1,3993~1,3999~1,39a0~1,39a7~1,39ae~1,39b5~3,39be~1,39c5~1,39cc~1]
>> --
>> pool 3 'hdd_ec' erasure profile hdd_ec size 3 min_size 2 crush_rule 3
>> object_hash rjenkins pg_num 2048 pgp_num 2048 autoscale_mode off
>> last_change 803558 lfor 0/87229/87229 flags
>> hashpspool,ec_overwrites,selfmanaged_snaps stripe_width 8192 application
>> cephfs
>> removed_snaps_queue
>>
>> [3541~1,36e4~1,379f~2,3862~1,3876~1,387d~1,388b~1,389a~1,38a6~1,38bc~1,3993~1,3999~1,39a0~1,39a7~1,39ae~1,39b5~3,39be~1,39c5~1,39cc~1]
>> --
>> pool 20 'hdd_ec_8_2_pool' erasure profile hdd_ec_8_2_profile size 10
>> min_size 9 crush_rule 5 object_hash rjenkins pg_num 8192 pgp_num 8192
>> autoscale_mode off last_change 803558 lfor 0/0/681917 flags
>> hashpspool,ec_overwrites,selfmanaged_snaps stripe_width 32768
>> application cephfs
>> removed_snaps_queue
>>
>> [3541~1,36e4~1,379f~2,3862~1,3876~1,387d~1,388b~1,389a~1,38a6~1,38bc~1,3993~1,3999~1,39a0~1,39a7~1,39ae~1,39b5~3,39be~1,39c5~1,39cc~1]
>>
>>
>> Here, pool hdd_ec_8_2_pool is the one we recently assigned to the root
>> layout.
>> Pool hdd_ec is the one which was assigned before and which won't release
>> space (as far as I know).
>>
>> Is this removed_snaps_queue the same as removed_snaps in the bug issue
>> (i.e. the label was renamed)?
>> And is it normal that all queues list the same info or should this be
>> different per pool?
>> Might this be related to pools now sharing responsibility over some
>> snaps due to layout changes?
>>
>> And for the big question:
>> How can I actually trigger/speedup the removal of those snaps?
>> I find the removed_snaps/removed_snaps_queue mentioned a few times in
>> the user list.
>> But never with some conclusive answer how to deal with them.
>> And the only mentions in the docs are just change logs.
>>
>> I also looked into and started cephfs stray scrubbing:
>>
>> https://docs.ceph.com/en/latest/cephfs/scrub/#evaluate-strays-using-recursi…
>> But according to the status output, no scrubbing is actually active.
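>>
>> (For reference, per that doc page the commands would be along these
>> lines, with <fs_name> as a placeholder for the file system name:
>> ceph tell mds.<fs_name>:0 scrub start ~mdsdir recursive
>> ceph tell mds.<fs_name>:0 scrub status
>> )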
>>
>> I would appreciate any further ideas. Thanks a lot.
>>
>> Best Wishes,
>> Mathias
>>
>> On 10/23/2023 12:42 PM, Kuhring, Mathias wrote:
>> > Dear Ceph users,
>> >
>> > Our CephFS is not releasing/freeing up space after deleting hundreds of
>> > terabytes of data.
>> > By now, this drives us in a "nearfull" osd/pool situation and thus
>> > throttles IO.
>> >
>> > We are on ceph version 17.2.6 (d7ff0d10654d2280e08f1ab989c7cdf3064446a5)
>> > quincy (stable).
>> >
>> > Recently, we moved a bunch of data to a new pool with better EC.
>> > This was done by adding a new EC pool to the FS.
>> > Then assigning the FS root to the new EC pool via the directory layout
>> xattr
>> > (so all new data is written to the new pool).
>> > And finally copying old data to new folders.
>> >
>> > I swapped the data as follows to retain the old directory structures.
>> > I also made snapshots for validation purposes.
>> >
>> > So basically:
>> > cp -r mymount/mydata/ mymount/new/ # this creates copy on new pool
>> > mkdir mymount/mydata/.snap/tovalidate
>> > mkdir mymount/new/mydata/.snap/tovalidate
>> > mv mymount/mydata/ mymount/old/
>> > mv mymount/new/mydata mymount/
>> >
>> > I could see the increase of data in the new pool as expected (ceph df).
>> > I compared the snapshots with hashdeep to make sure the new data is
>> alright.
>> >
>> > Then I went ahead deleting the old data, basically:
>> > rmdir mymount/old/mydata/.snap/* # this also included a bunch of other
>> > older snapshots
>> > rm -r mymount/old/mydata
>> >
>> > At first we had a bunch of PGs with snaptrim/snaptrim_wait.
>> > But they are done for quite some time now.
>> > And now, already two weeks later the size of the old pool still hasn't
>> > really decreased.
>> > I'm still waiting for around 500 TB to be released (and much more is
>> > planned).
>> >
>> > I honestly have no clue, where to go from here.
>> > From my point of view (i.e. the CephFS mount), the data is gone.
>> > I also never hard/soft-linked it anywhere.
>> >
>> > This doesn't seem to be a regular issue.
>> > At least I couldn't find anything related or resolved in the docs or
>> > user list, yet.
>> > If anybody has an idea how to resolve this, I would highly appreciate it.
>> >
>> > Best Wishes,
>> > Mathias
>> >
>> >
>> > _______________________________________________
>> > ceph-users mailing list -- ceph-users(a)ceph.io
>> > To unsubscribe send an email to ceph-users-leave(a)ceph.io
>>
>> --
>> Mathias Kuhring
>>
>> Dr. rer. nat.
>> Bioinformatician
>> HPC & Core Unit Bioinformatics
>> Berlin Institute of Health at Charité (BIH)
>>
>> E-Mail: mathias.kuhring(a)bih-charite.de
>> Mobile: +49 172 3475576
>>
>>
>> ------------------------------
>>
>> Date: Fri, 27 Oct 2023 09:57:31 -0400
>> From: John Mulligan <phlogistonjohn(a)asynchrono.us>
>> Subject: [ceph-users] Re: "cephadm version" in reef returns
>> "AttributeError: 'CephadmContext' object has no attribute 'fsid'"
>> To: ceph-users(a)ceph.io
>> Message-ID:
>> <
>> 7411686.LvFx2qVVIh(a)li-241d88cc-27c5-11b2-a85c-c640472b3c85.ibm.com>
>> Content-Type: text/plain; charset="us-ascii"
>>
>> On Friday, October 27, 2023 2:40:17 AM EDT Eugen Block wrote:
>> > Are the issues you refer to the same as before? I don't think this
>> > version issue is the root cause, I do see it as well in my test
>> > cluster(s) but the rest works properly except for the tag issue I
>> > already reported which you can easily fix by setting the config value
>> > for the default image
>> > (
>> https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/LASBJCSPFGD
>> > YAWPVE2YLV2ZLF3HC5SLS/#LASBJCSPFGDYAWPVE2YLV2ZLF3HC5SLS). Or are there
>> new
>> > issues you encountered?
>>
>>
>> I concur. That `cephadm version` failure is/was a known issue but should
>> not
>> be the cause of any other issues. On the main branch `cephadm version` no
>> longer fails this way - rather, it reports the version of a cephadm build
>> and
>> no longer inspects a container image. We can look into backporting this
>> before the next reef release.
>>
>> The issue related to the container image tag that Eugen filed has also
>> been
>> fixed on reef. Thanks for filing that.
>>
>> Martin you may want to retry things after the next reef release.
>> Unfortunately, I don't know when that is planned but I think it's soonish.
>>
>> >
>> > Zitat von Martin Conway <martin.conway(a)anu.edu.au>:
>> > > I just had another look through the issues tracker and found this
>> > > bug already listed.
>> > > https://tracker.ceph.com/issues/59428
>> > >
>> > > I need to go back to the other issues I am having and figure out if
>> > > they are related or something different.
>> > >
>> > >
>> > > Hi
>> > >
>> > > I wrote before about issues I was having with cephadm in 18.2.0
>> > > Sorry, I didn't see the helpful replies because my mail service
>> > > binned the responses.
>> > >
>> > > I still can't get the reef version of cephadm to work properly.
>> > >
>> > > I had updated the system rpm to reef (ceph repo) and also upgraded
>> > > the containerised ceph daemons to reef before my first email.
>> > >
>> > > Both the system package cephadm and the one found at
>> > > /var/lib/ceph/${fsid}/cephadm.* return the same error when running
>> > > "cephadm version"
>> > >
>> > > Traceback (most recent call last):
>> > > File
>> > >
>> > >
>> "./cephadm.059bfc99f5cf36ed881f2494b104711faf4cbf5fc86a9594423cc105cafd9b4
>> > > e", line 9468, in <module>
>> > >
>> > > main()
>> > >
>> > > File
>> > >
>> > >
>> "./cephadm.059bfc99f5cf36ed881f2494b104711faf4cbf5fc86a9594423cc105cafd9b4
>> > > e", line 9456, in main
>> > >
>> > > r = ctx.func(ctx)
>> > >
>> > > File
>> > >
>> > >
>> "./cephadm.059bfc99f5cf36ed881f2494b104711faf4cbf5fc86a9594423cc105cafd9b4
>> > > e", line 2108, in _infer_image
>> > >
>> > > ctx.image = infer_local_ceph_image(ctx, ctx.container_engine.path)
>> > >
>> > > File
>> > >
>> > >
>> "./cephadm.059bfc99f5cf36ed881f2494b104711faf4cbf5fc86a9594423cc105cafd9b4
>> > > e", line 2191, in infer_local_ceph_image
>> > >
>> > > container_info = get_container_info(ctx, daemon, daemon_name is not
>> > > None)
>> > >
>> > > File
>> > >
>> > >
>> "./cephadm.059bfc99f5cf36ed881f2494b104711faf4cbf5fc86a9594423cc105cafd9b4
>> > > e", line 2154, in get_container_info
>> > >
>> > > matching_daemons = [d for d in daemons if daemon_name_or_type(d)
>> > >
>> > > == daemon_filter and d['fsid'] == ctx.fsid]
>> > >
>> > > File
>> > >
>> > >
>> "./cephadm.059bfc99f5cf36ed881f2494b104711faf4cbf5fc86a9594423cc105cafd9b4
>> > > e", line 2154, in <listcomp>
>> > >
>> > > matching_daemons = [d for d in daemons if daemon_name_or_type(d)
>> > >
>> > > == daemon_filter and d['fsid'] == ctx.fsid]
>> > >
>> > > File
>> > >
>> > >
>> "./cephadm.059bfc99f5cf36ed881f2494b104711faf4cbf5fc86a9594423cc105cafd9b4
>> > > e", line 217, in __getattr__
>> > >
>> > > return super().__getattribute__(name)
>> > >
>> > > AttributeError: 'CephadmContext' object has no attribute 'fsid'
>> > >
>> > > _______________________________________________
>> > > ceph-users mailing list -- ceph-users(a)ceph.io
>> > > To unsubscribe send an email to ceph-users-leave(a)ceph.io
>> >
>> > _______________________________________________
>> > ceph-users mailing list -- ceph-users(a)ceph.io
>> > To unsubscribe send an email to ceph-users-leave(a)ceph.io
>>
>>
>>
>>
>> ------------------------------
>>
>> Subject: Digest Footer
>>
>> _______________________________________________
>> ceph-users mailing list -- ceph-users(a)ceph.io
>> To unsubscribe send an email to ceph-users-leave(a)ceph.io
>>
>>
>> ------------------------------
>>
>> End of ceph-users Digest, Vol 112, Issue 119
>> ********************************************
>>
>
>
> --
> *Pardhiv Karri*
> "Rise and Rise again until LAMBS become LIONS"
I'm trying to upgrade our 3-monitor cluster from CentOS 7 and Nautilus to
Rocky 9 and Quincy. This has been a very slow process of upgrading one
thing, running the cluster for a while, then upgrading the next thing. I
first upgraded to the last CentOS 7 release and then to Octopus. That worked
fine. Then I was going to upgrade the OS to Rocky 9 while staying on
Octopus, but found out that Octopus is not available for Rocky 9. So I
broke my own rule and upgraded one of the monitor (and manager) nodes to
Rocky 9 and Pacific, then rejoined it to the cluster. That seemed to work
just fine. Feeling bold, I upgraded the second monitor and manager node to
Rocky 9 and Pacific. That also seemed to work fine, with the cluster
showing all the monitors and managers running. But now, if I shut down the
last "Octopus" monitor, the cluster becomes unresponsive. This only happens
when I shut down the Octopus monitor. If I shut down one of the Pacific
monitors, the cluster keeps responding with the expected:
"HEALTH_WARN 1/3 mons down"
and then goes back to normal when the monitor process is started again.
Is this expected? What am I missing? Thanks for any pointers!
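(In case it helps to suggest diagnostics: the standard commands I know to
compare with and without the Octopus mon running would be

  ceph versions
  ceph mon dump
  ceph quorum_status -f json-pretty

but I'm not sure what to look for in their output.)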
Hi Ceph users and developers,
You are invited to join us at the User + Dev meeting tomorrow at 10:00 AM
EST! See below for more meeting details.
We have two guest speakers joining us tomorrow:
1. "CRUSH Changes at Scale" by Joshua Baergen, Digital Ocean
In this talk, Joshua Baergen will discuss the problems that operators
encounter with CRUSH changes at scale and how DigitalOcean built
pg-remapper to control and speed up CRUSH-induced backfill.
2. "CephFS Management with Ceph Dashboard" by Pedro Gonzalez Gomez, IBM
This talk will demonstrate new Dashboard behavior regarding CephFS
management.
The last part of the meeting will be dedicated to open discussion. Feel
free to add questions for the speakers or additional topics under the "Open
Discussion" section on the agenda:
https://pad.ceph.com/p/ceph-user-dev-monthly-minutes
If you have an idea for a focus topic you'd like to present at a future
meeting, you are welcome to submit it to this Google Form:
https://docs.google.com/forms/d/e/1FAIpQLSdboBhxVoBZoaHm8xSmeBoemuXoV_rmh4v…
Any Ceph user or developer is eligible to submit!
Thanks,
Laura Flores
Meeting link: https://meet.jit.si/ceph-user-dev-monthly
Time conversions:
UTC: Thursday, October 19, 14:00 UTC
Mountain View, CA, US: Thursday, October 19, 7:00 PDT
Phoenix, AZ, US: Thursday, October 19, 7:00 MST
Denver, CO, US: Thursday, October 19, 8:00 MDT
Huntsville, AL, US: Thursday, October 19, 9:00 CDT
Raleigh, NC, US: Thursday, October 19, 10:00 EDT
London, England: Thursday, October 19, 15:00 BST
Paris, France: Thursday, October 19, 16:00 CEST
Helsinki, Finland: Thursday, October 19, 17:00 EEST
Tel Aviv, Israel: Thursday, October 19, 17:00 IDT
Pune, India: Thursday, October 19, 19:30 IST
Brisbane, Australia: Friday, October 20, 0:00 AEST
Singapore, Asia: Thursday, October 19, 22:00 +08
Auckland, New Zealand: Friday, October 20, 3:00 NZDT
--
Laura Flores
She/Her/Hers
Software Engineer, Ceph Storage <https://ceph.io>
Chicago, IL
lflores(a)ibm.com | lflores(a)redhat.com <lflores(a)redhat.com>
M: +17087388804