Patrick,
I can only say that I would not expect a specific problem due to your
hardware. Upgrading the firmware is generally a good idea, but I wouldn't
expect it to help in your case if the OS (lsblk) sees the disk.
As for starting with Octopus, I don't know if it will help... But we are
also using the same OS as you (CentOS Stream in fact, but basically the
same). We have been running Octopus (with cephadm) on this OS version
without problem and have since upgraded to Pacific and Quincy. In fact one
of our clusters started on Infernalis, the other on Luminous, and they have
been upgraded without problems since then...
Michel
Sent from my mobile
On 26 May 2023 at 18:34:22, Patrick Begou
<Patrick.Begou(a)univ-grenoble-alpes.fr> wrote:
Hi Michel,
I do not notice anything strange in the log files (looking for errors or
warnings).
The hardware is a DELL C6100 sled (from 2011) running an up-to-date
AlmaLinux 8. It uses 3 SATA disks.
Is there a way to force OSD installation by hand, providing the device
(/dev/sdc for example)? A "do what I say" approach...
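Something like this maybe (just a guess from the cephadm docs on my side,
not tested yet)?

# from within "cephadm shell", point the orchestrator at the device directly:
ceph orch daemon add osd mostha1.legi.grenoble-inp.fr:/dev/sdc
# or bypass the orchestrator entirely with ceph-volume:
ceph-volume lvm create --data /dev/sdc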
Would it be worth deploying Octopus on the nodes, configuring the OSDs (even
if podman 4.2.0 is not validated for Octopus) and then upgrading to Pacific?
Could this be a workaround for this sort of regression from Octopus to
Pacific?
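If that path works, I suppose the Octopus-to-Pacific step itself would be
something like this (a sketch based on the cephadm upgrade documentation,
untested here):

# start the rolling upgrade to a Pacific image and watch its progress
ceph orch upgrade start --ceph-version 16.2.13
ceph orch upgrade status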
Maybe updating the BIOS from 1.7.1 to 1.8.1?
All this is a little bit confusing for me as I'm trying to discover Ceph 😁
Thanks
Patrick
On 26/05/2023 at 17:19, Michel Jouvin wrote:
> Hi Patrick,
>
> It is weird, we have a couple of clusters with cephadm running Pacific
> or Quincy and ceph orch device ls works well. Have you looked at the cephadm
> logs (ceph log last cephadm)?
>
> Unless you are using very specific hardware, I suspect Ceph is
> suffering from a problem outside of it...
>
> Cheers,
>
> Michel
> Sent from my mobile
>
> On 26 May 2023 at 17:02:50, Patrick Begou
> <Patrick.Begou(a)univ-grenoble-alpes.fr> wrote:
>
>> Hi,
>>
>> I'm back working on this problem.
>>
>> First of all, I saw that I had a hardware memory error so I had to solve
>> this first. It's done.
>>
>> I've tested some different Ceph deployments, each time starting with a
>> full OS re-install (it requires some time for each test).
>>
>> Using Octopus, the devices are found:
>>
>> dnf -y install \
>>   https://download.ceph.com/rpm-15.2.12/el8/noarch/cephadm-15.2.12-0.el8.noar…
>> monip=$(getent ahostsv4 mostha1 | head -n 1 | awk '{ print $1 }')
>> cephadm bootstrap --mon-ip $monip --initial-dashboard-password xxxxx \
>>     --allow-fqdn-hostname
>>
>> [ceph: root@mostha1 /]# ceph orch device ls
>> Hostname                      Path      Type  Serial           Size  Health   Ident  Fault  Available
>> mostha1.legi.grenoble-inp.fr  /dev/sda  hdd   S2B5J90ZA02494   250G  Unknown  N/A    N/A    Yes
>> mostha1.legi.grenoble-inp.fr  /dev/sdc  hdd   WD-WMAYP0982329  500G  Unknown  N/A    N/A    Yes
>>
>>
>> But with Pacific or Quincy the command returns nothing.
>>
>> With Pacific:
>>
>> dnf -y install \
>>   https://download.ceph.com/rpm-16.2.13/el8/noarch/cephadm-16.2.13-0.el8.noar…
>> monip=$(getent ahostsv4 mostha1 | head -n 1 | awk '{ print $1 }')
>> cephadm bootstrap --mon-ip $monip --initial-dashboard-password xxxxx \
>>     --allow-fqdn-hostname
>>
>>
>> "ceph orch device ls" doesn't return anything but "cephadm
shell lsmcli
>> ldl" list all the devices.
>>
>> [ceph: root@mostha1 /]# ceph orch device ls --wide
>> [ceph: root@mostha1 /]# lsblk
>> NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
>> sda 8:0 1 232.9G 0 disk
>> |-sda1 8:1 1 3.9G 0 part /rootfs/boot
>> |-sda2 8:2 1 78.1G 0 part
>> | `-osvg-rootvol 253:0 0 48.8G 0 lvm /rootfs
>> |-sda3 8:3 1 3.9G 0 part [SWAP]
>> `-sda4 8:4 1 146.9G 0 part
>> |-secretvg-homevol 253:1 0 9.8G 0 lvm /rootfs/home
>> |-secretvg-tmpvol 253:2 0 9.8G 0 lvm /rootfs/tmp
>> `-secretvg-varvol 253:3 0 9.8G 0 lvm /rootfs/var
>> sdb 8:16 1 232.9G 0 disk
>> sdc 8:32 1 465.8G 0 disk
>> [ceph: root@mostha1 /]# exit
>> [root@mostha1 ~]# cephadm ceph-volume inventory
>> Inferring fsid 2e3e85a8-fbcf-11ed-84e5-00266cf8869c
>> Using ceph image with id '0dc91bca92c2' and tag 'v17' created on
>> 2023-05-25 16:26:31 +0000 UTC
>>
>> quay.io/ceph/ceph@sha256:b8df01a568f4dec7bac6d5040f9391dcca14e00ec7f4de8a3dcf3f2a6502d3a9
>>
>> Device Path Size Device nodes rotates
>> available Model name
>>
>> [root@mostha1 ~]# cephadm shell lsmcli ldl
>> Inferring fsid 4d54823c-fb05-11ed-aecf-00266cf8869c
>> Inferring config
>> /var/lib/ceph/4d54823c-fb05-11ed-aecf-00266cf8869c/mon.mostha1/config
>> Using ceph image with id 'c9a1062f7289' and tag 'v17' created on
>> 2023-04-25 16:04:33 +0000 UTC
>>
>> quay.io/ceph/ceph@sha256:af79fedafc42237b7612fe2d18a9c64ca62a0b38ab362e614ad671efa4a0547e
>> Path     | SCSI VPD 0x83    | Link Type | Serial Number   | Health Status
>> -------------------------------------------------------------------------
>> /dev/sda | 50024e92039e4f1c | PATA/SATA | S2B5J90ZA10142  | Good
>> /dev/sdc | 50014ee0ad5953c9 | PATA/SATA | WD-WMAYP0982329 | Good
>> /dev/sdb | 50024e920387fa2c | PATA/SATA | S2B5J90ZA02494  | Good
>>
>>
>> Could it be a bug in ceph-volume?
>> Adam suggests looking at the underlying commands (lsblk, blkid, udevadm,
>> lvs, or pvs) but I'm not very comfortable with blkid and udevadm. Is
>> there a "debug flag" to make Ceph more verbose?
>>
>> Thanks
>>
>> Patrick
>>
>> On 15/05/2023 at 21:20, Adam King wrote:
>>> As you already seem to have figured out, "ceph orch device ls" is
>>> populated with the results from "ceph-volume inventory". My best guess
>>> to try and debug this would be to manually run "cephadm ceph-volume --
>>> inventory" (the same as "cephadm ceph-volume inventory", I just like
>>> to separate the ceph-volume command from cephadm itself with the " --
>>> ") and then check /var/log/ceph/<fsid>/ceph-volume.log from when you
>>> ran the command onward to try and see why it isn't seeing your
>>> devices. For example I can see a line like
>>>
>>> [2023-05-15 19:11:58,048][ceph_volume.main][INFO ] Running command:
>>> ceph-volume inventory
>>>
>>> in there. Then if I look onward from there I can see it ran things like
>>>
>>> lsblk -P -o NAME,KNAME,PKNAME,MAJ:MIN,FSTYPE,MOUNTPOINT,LABEL,UUID,RO,RM,MODEL,SIZE,STATE,OWNER,GROUP,MODE,ALIGNMENT,PHY-SEC,LOG-SEC,ROTA,SCHED,TYPE,DISC-ALN,DISC-GRAN,DISC-MAX,DISC-ZERO,PKNAME,PARTLABEL
>>>
>>> as part of getting my device list. So if I was having issues I would
>>> try running that directly and see what I got. Will note that
>>> ceph-volume on certain more recent versions (not sure about octopus)
>>> runs commands through nsenter, so you'd have to look past that part in
>>> the log lines to the underlying command being used, typically
>>> something with lsblk, blkid, udevadm, lvs, or pvs.
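>>> If you want to run the usual suspects by hand, it would look roughly
>>> like this (a sketch; the exact lsblk column list varies between
>>> releases, I trimmed it here):
>>>
>>> # a trimmed-down version of the lsblk call ceph-volume makes
>>> lsblk -P -o NAME,KNAME,PKNAME,FSTYPE,MOUNTPOINT,MODEL,SIZE,ROTA,TYPE
>>> # low-level metadata probes for one of the disks
>>> blkid -p /dev/sdc
>>> udevadm info --query=property /dev/sdc
>>> # LVM state, in case stale PVs/LVs are hiding the devices
>>> pvs
>>> lvs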
>>>
>>> Also, if you want to see if it's an issue with a certain version of
>>> ceph-volume, you can use different versions by passing the image flag
>>> to cephadm. E.g.
>>>
>>> cephadm --image quay.io/ceph/ceph:v17.2.6 ceph-volume -- inventory
>>>
>>> would use the 17.2.6 version of ceph-volume for the inventory. It
>>> works by running ceph-volume through the container, so you don't have
>>> to worry about installing different packages to try them out, and
>>> it should pull the container image on its own if it isn't on the
>>> machine already (but note that means the command will take longer as
>>> it pulls the image the first time).
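>>> So a side-by-side comparison against the octopus ceph-volume that
>>> worked for you could be (assuming that tag is still published on
>>> quay.io):
>>>
>>> cephadm --image quay.io/ceph/ceph:v15.2.17 ceph-volume -- inventory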
>>>
>>>
>>>
>>> On Sat, May 13, 2023 at 4:34 AM Patrick Begou
>>> <Patrick.Begou(a)univ-grenoble-alpes.fr> wrote:
>>>
>>> Hi Joshua,
>>>
>>> I've tried these commands but it looks like Ceph is unable to see and
>>> configure these HDDs.
>>> [root@mostha1 ~]# cephadm ceph-volume inventory
>>>
>>> Inferring fsid 4b7a6504-f0be-11ed-be1a-00266cf8869c
>>> Using recent ceph image
>>> quay.io/ceph/ceph@sha256:e6919776f0ff8331a8e9c4b18d36c5e9eed31e1a80da62ae8454e42d10e95544
>>>
>>> Device Path Size Device nodes rotates
>>> available Model name
>>>
>>> [root@mostha1 ~]# cephadm shell
>>>
>>> [ceph: root@mostha1 /]# ceph orch apply osd --all-available-devices
>>>
>>> Scheduled osd.all-available-devices update...
>>>
>>> [ceph: root@mostha1 /]# ceph orch device ls
>>> [ceph: root@mostha1 /]# ceph-volume lvm zap /dev/sdb
>>>
>>> --> Zapping: /dev/sdb
>>> --> --destroy was not specified, but zapping a whole device will
>>> remove the partition table
>>> Running command: /usr/bin/dd if=/dev/zero of=/dev/sdb bs=1M count=10 conv=fsync
>>> stderr: 10+0 records in
>>> 10+0 records out
>>> 10485760 bytes (10 MB, 10 MiB) copied, 0.10039 s, 104 MB/s
>>> --> Zapping successful for: <Raw Device: /dev/sdb>
>>>
>>> I can check that /dev/sdb1 has been erased, so the previous command was
>>> successful:
>>> [ceph: root@mostha1 ceph]# lsblk
>>> NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
>>> sda 8:0 1 232.9G 0 disk
>>> |-sda1 8:1 1 3.9G 0 part /rootfs/boot
>>> |-sda2 8:2 1 78.1G 0 part
>>> | `-osvg-rootvol 253:0 0 48.8G 0 lvm /rootfs
>>> |-sda3 8:3 1 3.9G 0 part [SWAP]
>>> `-sda4 8:4 1 146.9G 0 part
>>> |-secretvg-homevol 253:1 0 9.8G 0 lvm /rootfs/home
>>> |-secretvg-tmpvol 253:2 0 9.8G 0 lvm /rootfs/tmp
>>> `-secretvg-varvol 253:3 0 9.8G 0 lvm /rootfs/var
>>> sdb 8:16 1 465.8G 0 disk
>>> sdc 8:32 1 232.9G 0 disk
>>>
>>> But still no visible HDD:
>>>
>>> [ceph: root@mostha1 ceph]# ceph orch apply osd --all-available-devices
>>>
>>> Scheduled osd.all-available-devices update...
>>>
>>> [ceph: root@mostha1 ceph]# ceph orch device ls
>>> [ceph: root@mostha1 ceph]#
>>>
>>> Maybe I have done something bad at install time, as in the container
>>> I unintentionally ran:
>>>
>>> dnf -y install
>>>
https://download.ceph.com/rpm-16.2.13/el8/noarch/cephadm-16.2.13-0.el8.noar…
>>>
>>> (an awful copy/paste that launched the command). Can this break the
>>> container? I do not know which ceph packages should be present in the
>>> container, so I don't know how to cleanly remove this install (there is
>>> no dnf.log file in the container).
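>>> To see what that accidental install actually pulled in, I suppose I can
>>> list the packages inside the container; and if I understand correctly
>>> the "cephadm shell" container is discarded on exit anyway, so it should
>>> not persist:
>>>
>>> # inside "cephadm shell": list the ceph-related RPMs present in the container
>>> rpm -qa | grep -i ceph | sort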
>>>
>>> Patrick
>>>
>>>
>>> On 12/05/2023 at 21:38, Beaman, Joshua wrote:
>>>> The most significant point I see there is that you have no OSD service
>>>> spec to tell orchestrator how to deploy OSDs. The easiest fix for
>>>> that would be "ceph orch apply osd --all-available-devices"
>>>>
>>>> This will create a simple spec that should work for a test
>>>> environment. Most likely it will collocate the block, block.db, and
>>>> WAL all on the same device. Not ideal for prod environments, but fine
>>>> for practice and testing.
>>>>
>>>> The other command I should have had you try is "cephadm ceph-volume
>>>> inventory". That should show you the devices available for OSD
>>>> deployment, and hopefully matches up to what your "lsblk" shows. If
>>>> you need to zap HDDs and orchestrator is still not seeing them, you
>>>> can try "cephadm ceph-volume lvm zap /dev/sdb"
>>>>
>>>> Thank you,
>>>>
>>>> Josh Beaman
>>>>
>>>> *From: *Patrick Begou <Patrick.Begou(a)univ-grenoble-alpes.fr>
>>>> *Date: *Friday, May 12, 2023 at 2:22 PM
>>>> *To: *Beaman, Joshua <Joshua_Beaman(a)comcast.com>, ceph-users
>>>> <ceph-users(a)ceph.io>
>>>> *Subject: *Re: [EXTERNAL] [ceph-users] [Pacific] ceph orch device ls
>>>> do not returns any HDD
>>>>
>>>> Hi Joshua and thanks for this quick reply.
>>>>
>>>> At this step I have only one node. I was checking what ceph was
>>>> returning with different commands on this host before adding new
>>>> hosts. Just to compare with my first Octopus install. As this hardware
>>>> is for testing only, it remains easy for me to break everything and
>>>> reinstall again.
>>>>
>>>> [root@mostha1 ~]# cephadm check-host
>>>>
>>>> podman (/usr/bin/podman) version 4.2.0 is present
>>>> systemctl is present
>>>> lvcreate is present
>>>> Unit chronyd.service is enabled and running
>>>> Host looks OK
>>>>
>>>> [ceph: root@mostha1 /]# ceph -s
>>>>
>>>> cluster:
>>>> id: 4b7a6504-f0be-11ed-be1a-00266cf8869c
>>>> health: HEALTH_WARN
>>>> OSD count 0 < osd_pool_default_size 3
>>>>
>>>> services:
>>>> mon: 1 daemons, quorum mostha1.legi.grenoble-inp.fr (age 5h)
>>>> mgr: mostha1.legi.grenoble-inp.fr.hogwuz(active, since 5h)
>>>> osd: 0 osds: 0 up, 0 in
>>>>
>>>> data:
>>>> pools: 0 pools, 0 pgs
>>>> objects: 0 objects, 0 B
>>>> usage: 0 B used, 0 B / 0 B avail
>>>> pgs:
>>>>
>>>> [ceph: root@mostha1 /]# ceph orch ls
>>>>
>>>> NAME PORTS RUNNING REFRESHED AGE PLACEMENT
>>>> alertmanager ?:9093,9094 1/1 6m ago 6h count:1
>>>> crash 1/1 6m ago 6h *
>>>> grafana ?:3000 1/1 6m ago 6h count:1
>>>> mgr 1/2 6m ago 6h count:2
>>>> mon 1/5 6m ago 6h count:5
>>>> node-exporter ?:9100 1/1 6m ago 6h *
>>>> prometheus ?:9095 1/1 6m ago 6h count:1
>>>>
>>>> [ceph: root@mostha1 /]# ceph orch ls osd -export
>>>>
>>>> No services reported
>>>>
>>>> [ceph: root@mostha1 /]# ceph orch host ls
>>>>
>>>> HOST ADDR LABELS STATUS
>>>> mostha1.legi.grenoble-inp.fr  194.254.66.34  _admin
>>>> 1 hosts in cluster
>>>>
>>>> [ceph: root@mostha1 /]# ceph log last cephadm
>>>>
>>>> ...
>>>> 2023-05-12T15:19:58.754655+0000
>>>> mgr.mostha1.legi.grenoble-inp.fr.hogwuz (mgr.44098) 1876 : cephadm
>>>> [INF] Zap device mostha1.legi.grenoble-inp.fr:/dev/sdb
>>>> 2023-05-12T15:19:58.756639+0000
>>>> mgr.mostha1.legi.grenoble-inp.fr.hogwuz (mgr.44098) 1877 : cephadm
>>>> [ERR] Device path '/dev/sdb' not found on host
>>>> 'mostha1.legi.grenoble-inp.fr'
>>>> Traceback (most recent call last):
>>>>   File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 125, in wrapper
>>>>     return OrchResult(f(*args, **kwargs))
>>>>   File "/usr/share/ceph/mgr/cephadm/module.py", line 2275, in zap_device
>>>>     f"Device path '{path}' not found on host '{host}'")
>>>> orchestrator._interface.OrchestratorError: Device path '/dev/sdb'
>>>> not found on host 'mostha1.legi.grenoble-inp.fr'
>>>> ....
>>>>
>>>> [ceph: root@mostha1 /]# ls -l /dev/sdb
>>>>
>>>> brw-rw---- 1 root disk 8, 16 May 12 15:16 /dev/sdb
>>>>
>>>> [ceph: root@mostha1 /]# lsblk /dev/sdb
>>>>
>>>> NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
>>>> sdb 8:16 1 465.8G 0 disk
>>>> `-sdb1 8:17 1 465.8G 0 part
>>>>
>>>> I have created a full partition on /dev/sdb (for testing) and /dev/sdc
>>>> has no partition table (removed).
>>>>
>>>> But all seems fine with these commands.
>>>>
>>>> Patrick
>>>>
>>>> On 12/05/2023 at 20:19, Beaman, Joshua wrote:
>>>>
>>>> I don't quite understand why that zap would not work. But, here's
>>>> where I'd start.
>>>>
>>>> 1. cephadm check-host
>>>>    - Run this on each of your hosts to make sure cephadm, podman and
>>>>      all other prerequisites are installed and recognized (a combined
>>>>      sketch of all five steps follows this list)
>>>>
>>>> 2. ceph orch ls
>>>>    - This should show at least a mon, mgr, and osd spec deployed
>>>>
>>>> 3. ceph orch ls osd --export
>>>>    - This will show the OSD placement service specifications that
>>>>      orchestrator uses to identify devices to deploy as OSDs
>>>>
>>>> 4. ceph orch host ls
>>>>    - This will list the hosts that have been added to orchestrator's
>>>>      inventory, and what labels are applied which correlate to the
>>>>      service placement labels
>>>>
>>>> 5. ceph log last cephadm
>>>>    - This will show you what orchestrator has been trying to do, and
>>>>      how it may be failing
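>>>> Putting items 1-5 together, a quick pass would look something like
>>>> this:
>>>>
>>>> # on each host, check the prerequisites
>>>> cephadm check-host
>>>> # then from a "cephadm shell" on the admin node, inspect the cluster state
>>>> ceph orch ls
>>>> ceph orch ls osd --export
>>>> ceph orch host ls
>>>> ceph log last cephadm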
>>>>
>>>> Also, it's never un-helpful to have a look at "ceph -s" and "ceph
>>>> health detail", particularly for any people trying to help you
>>>> without access to your systems.
>>>>
>>>> Best of luck,
>>>>
>>>> Josh Beaman
>>>>
>>>> *From: *Patrick Begou <Patrick.Begou(a)univ-grenoble-alpes.fr>
>>>> *Date: *Friday, May 12, 2023 at 10:45 AM
>>>> *To: *ceph-users <ceph-users(a)ceph.io>
>>>> *Subject: *[EXTERNAL] [ceph-users] [Pacific] ceph orch device ls
>>>> do not returns any HDD
>>>>
>>>> Hi everyone
>>>>
>>>> I'm new to Ceph; a 4-day French training session with Octopus on VMs
>>>> convinced me to build my first cluster.
>>>>
>>>> At this time I have 4 old identical nodes for testing, each with 3 HDDs
>>>> and 2 network interfaces, running AlmaLinux 8 (el8). I tried to replay
>>>> the training session but it failed, breaking the web interface because
>>>> of some problems with podman 4.2 not being compatible with Octopus.
>>>>
>>>> So I tried to deploy Pacific with the cephadm tool on my first node
>>>> (mostha1), to also allow testing an upgrade later.
>>>>
>>>> dnf -y install \
>>>>   https://download.ceph.com/rpm-16.2.13/el8/noarch/cephadm-16.2.13-0.el8.noarch.rpm
>>>>
>>>>
>>>> monip=$(getent ahostsv4 mostha1 | head -n 1 | awk '{ print $1 }')
>>>> cephadm bootstrap --mon-ip $monip --initial-dashboard-password xxxxx \
>>>>     --initial-dashboard-user admceph \
>>>>     --allow-fqdn-hostname --cluster-network 10.1.0.0/16
>>>>
>>>> This was successful.
>>>>
>>>> But running "ceph orch device ls" does not show any HDD even though
>>>> I have /dev/sda (used by the OS), /dev/sdb and /dev/sdc.
>>>>
>>>> The web interface shows a raw capacity which is an aggregate of the
>>>> sizes of the 3 HDDs for the node.
>>>>
>>>> I've also tried to reset /dev/sdb but cephadm does not see it:
>>>>
>>>> [ceph: root@mostha1 /]# ceph orch device zap mostha1.legi.grenoble-inp.fr /dev/sdb --force
>>>> Error EINVAL: Device path '/dev/sdb' not found on host
>>>> 'mostha1.legi.grenoble-inp.fr'
>>>>
>>>> On my first attempt with Octopus, I was able to list the available
>>>> HDDs with this command line. Before moving to Pacific, the OS on this
>>>> node was reinstalled from scratch.
>>>>
>>>> Any advice for a Ceph beginner?
>>>>
>>>> Thanks
>>>>
>>>> Patrick
>> _______________________________________________
>> ceph-users mailing list -- ceph-users(a)ceph.io
>> To unsubscribe send an email to ceph-users-leave(a)ceph.io