Hi,
Installing ceph from the Debian unstable repository (ceph version 14.2.6
(f0aa067ac7a02ee46ea48aa26c6e298b5ea272e9) nautilus (stable), Debian
package: 14.2.6-6) has fixed things for me.
(See also the bug report, its duplicate, and the changelog of
14.2.6-6.)
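
In case it's useful, this is roughly how I pulled the fixed packages in
(the sources.list entry and the package selection are just my setup,
adjust as needed):

  echo 'deb http://deb.debian.org/debian unstable main' > /etc/apt/sources.list.d/unstable.list
  apt update
  apt install -t unstable ceph-osd    # ceph-volume ships in ceph-osd on Debian, I believe
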
- bauen1
On 1/30/20 3:31 AM, Dave Hall wrote:
> Jan,
>
> In trying to recover my OSDs after the upgrade from Nautilus described
> earlier, I eventually managed to make things worse to the point where I'm
> going to scrub and fully reinstall. So I zapped all of the devices on one
> of my nodes and reproduced the ceph-volume lvm create error I mentioned
> earlier, using the procedure from
> https://docs.ceph.com/docs/mimic/rados/configuration/bluestore-config-ref/ to
> lay out the LVs and issue ceph-volume lvm create. As I was concerned that
> maybe it was a size thing, I only created a 4TB block LV for my first
> attempt, and used the full 12TB drive for my second attempt.
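>
> Roughly, the LVs were laid out following the doc's example, something
> like this (device names and db size are approximations from my hardware):
>
>    vgcreate ceph-block-0 /dev/sdc                  # one of the 12TB drives
>    lvcreate -l 100%FREE -n block-0 ceph-block-0    # or -L 4T for the first attempt
>    vgcreate ceph-db-0 /dev/nvme0n1
>    lvcreate -L 124G -n db-0 ceph-db-0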
>
> The output is:
>
> root@ceph01:~# ceph-volume lvm create --bluestore --data
> ceph-block-0/block-0 --block.db ceph-db-0/db-0
> Running command: /usr/bin/ceph-authtool --gen-print-key
> Running command: /usr/bin/ceph --cluster ceph --name client.bootstrap-osd
> --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring -i - osd new
> 6441f236-8694-46b9-9c6a-bf82af89765d
> Running command: /usr/bin/ceph-authtool --gen-print-key
> Running command: /bin/mount -t tmpfs tmpfs /var/lib/ceph/osd/ceph-24
> --> Absolute path not found for executable: selinuxenabled
> --> Ensure $PATH environment variable contains common executable locations
> Running command: /bin/chown -h ceph:ceph /dev/ceph-block-0/block-0
> Running command: /bin/chown -R ceph:ceph /dev/dm-0
> Running command: /bin/ln -s /dev/ceph-block-0/block-0
> /var/lib/ceph/osd/ceph-24/block
> Running command: /usr/bin/ceph --cluster ceph --name client.bootstrap-osd
> --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring mon getmap -o
> /var/lib/ceph/osd/ceph-24/activate.monmap
> stderr: got monmap epoch 4
> Running command: /usr/bin/ceph-authtool /var/lib/ceph/osd/ceph-24/keyring
> --create-keyring --name osd.24 --add-key
> AQAuMjJe5OGHBRAAP94+1E7CzV5Rv9HFj9WVqA==
> stdout: creating /var/lib/ceph/osd/ceph-24/keyring
> added entity osd.24 auth(key=AQAuMjJe5OGHBRAAP94+1E7CzV5Rv9HFj9WVqA==)
> Running command: /bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-24/keyring
> Running command: /bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-24/
> Running command: /bin/chown -h ceph:ceph /dev/ceph-db-0/db-0
> Running command: /bin/chown -R ceph:ceph /dev/dm-1
> Running command: /usr/bin/ceph-osd --cluster ceph --osd-objectstore
> bluestore --mkfs -i 24 --monmap /var/lib/ceph/osd/ceph-24/activate.monmap
> --keyfile - --bluestore-block-db-path /dev/ceph-db-0/db-0 --osd-data
> /var/lib/ceph/osd/ceph-24/ --osd-uuid 6441f236-8694-46b9-9c6a-bf82af89765d
> --setuser ceph --setgroup ceph
> stderr: 2020-01-29 20:32:33.054 7ff4c24abc80 -1
> bluestore(/var/lib/ceph/osd/ceph-24/) _read_fsid unparsable uuid
> stderr: terminate called after throwing an instance of
> 'boost::exception_detail::clone_impl<boost::exception_detail::error_info_injector<boost::bad_get> >'
> stderr: what(): boost::bad_get: failed value get using boost::get
> stderr: *** Caught signal (Aborted) **
> stderr: in thread 7ff4c24abc80 thread_name:ceph-osd
> stderr: ceph version 14.2.6 (f0aa067ac7a02ee46ea48aa26c6e298b5ea272e9)
> nautilus (stable)
> stderr: 1: (()+0x12730) [0x7ff4c2f54730]
> stderr: 2: (gsignal()+0x10b) [0x7ff4c2a377bb]
> stderr: 3: (abort()+0x121) [0x7ff4c2a22535]
> stderr: 4: (()+0x8c983) [0x7ff4c2dea983]
> stderr: 5: (()+0x928c6) [0x7ff4c2df08c6]
> stderr: 6: (()+0x92901) [0x7ff4c2df0901]
> stderr: 7: (()+0x92b34) [0x7ff4c2df0b34]
> stderr: 8: (()+0x5a3f53) [0x564eed1c4f53]
> stderr: 9: (Option::size_t const
> md_config_t::get_val<Option::size_t>(ConfigValues const&,
> std::__cxx11::basic_string<char, std::char_traits<char>,
> std::allocator<char> > const&) const+0x81) [0x564eed1cac91]
> stderr: 10: (BlueStore::_set_cache_sizes()+0x15a) [0x564eed645d8a]
> stderr: 11: (BlueStore::_open_bdev(bool)+0x173) [0x564eed648b23]
> stderr: 12: (BlueStore::mkfs()+0x42b) [0x564eed6adeab]
> stderr: 13: (OSD::mkfs(CephContext*, ObjectStore*, uuid_d, int)+0xd5)
> [0x564eed1e4bf5]
> stderr: 14: (main()+0x1796) [0x564eed191366]
> stderr: 15: (__libc_start_main()+0xeb) [0x7ff4c2a2409b]
> stderr: 16: (_start()+0x2a) [0x564eed1c4c6a]
> stderr: 2020-01-29 20:32:33.062 7ff4c24abc80 -1 *** Caught signal
> (Aborted) **
> stderr: in thread 7ff4c24abc80 thread_name:ceph-osd
> stderr: ceph version 14.2.6 (f0aa067ac7a02ee46ea48aa26c6e298b5ea272e9)
> nautilus (stable)
> stderr: 1: (()+0x12730) [0x7ff4c2f54730]
> stderr: 2: (gsignal()+0x10b) [0x7ff4c2a377bb]
> stderr: 3: (abort()+0x121) [0x7ff4c2a22535]
> stderr: 4: (()+0x8c983) [0x7ff4c2dea983]
> stderr: 5: (()+0x928c6) [0x7ff4c2df08c6]
> stderr: 6: (()+0x92901) [0x7ff4c2df0901]
> stderr: 7: (()+0x92b34) [0x7ff4c2df0b34]
> stderr: 8: (()+0x5a3f53) [0x564eed1c4f53]
> stderr: 9: (Option::size_t const
> md_config_t::get_val<Option::size_t>(ConfigValues const&,
> std::__cxx11::basic_string<char, std::char_traits<char>,
> std::allocator<char> > const&) const+0x81) [0x564eed1cac91]
> stderr: 10: (BlueStore::_set_cache_sizes()+0x15a) [0x564eed645d8a]
> stderr: 11: (BlueStore::_open_bdev(bool)+0x173) [0x564eed648b23]
> stderr: 12: (BlueStore::mkfs()+0x42b) [0x564eed6adeab]
> stderr: 13: (OSD::mkfs(CephContext*, ObjectStore*, uuid_d, int)+0xd5)
> [0x564eed1e4bf5]
> stderr: 14: (main()+0x1796) [0x564eed191366]
> stderr: 15: (__libc_start_main()+0xeb) [0x7ff4c2a2409b]
> stderr: 16: (_start()+0x2a) [0x564eed1c4c6a]
> stderr: NOTE: a copy of the executable, or `objdump -rdS <executable>` is
> needed to interpret this.
> stderr: -5> 2020-01-29 20:32:33.054 7ff4c24abc80 -1
> bluestore(/var/lib/ceph/osd/ceph-24/) _read_fsid unparsable uuid
> stderr: 0> 2020-01-29 20:32:33.062 7ff4c24abc80 -1 *** Caught signal
> (Aborted) **
> stderr: in thread 7ff4c24abc80 thread_name:ceph-osd
> stderr: ceph version 14.2.6 (f0aa067ac7a02ee46ea48aa26c6e298b5ea272e9)
> nautilus (stable)
> stderr: 1: (()+0x12730) [0x7ff4c2f54730]
> stderr: 2: (gsignal()+0x10b) [0x7ff4c2a377bb]
> stderr: 3: (abort()+0x121) [0x7ff4c2a22535]
> stderr: 4: (()+0x8c983) [0x7ff4c2dea983]
> stderr: 5: (()+0x928c6) [0x7ff4c2df08c6]
> stderr: 6: (()+0x92901) [0x7ff4c2df0901]
> stderr: 7: (()+0x92b34) [0x7ff4c2df0b34]
> stderr: 8: (()+0x5a3f53) [0x564eed1c4f53]
> stderr: 9: (Option::size_t const
> md_config_t::get_val<Option::size_t>(ConfigValues const&,
> std::__cxx11::basic_string<char, std::char_traits<char>,
> std::allocator<char> > const&) const+0x81) [0x564eed1cac91]
> stderr: 10: (BlueStore::_set_cache_sizes()+0x15a) [0x564eed645d8a]
> stderr: 11: (BlueStore::_open_bdev(bool)+0x173) [0x564eed648b23]
> stderr: 12: (BlueStore::mkfs()+0x42b) [0x564eed6adeab]
> stderr: 13: (OSD::mkfs(CephContext*, ObjectStore*, uuid_d, int)+0xd5)
> [0x564eed1e4bf5]
> stderr: 14: (main()+0x1796) [0x564eed191366]
> stderr: 15: (__libc_start_main()+0xeb) [0x7ff4c2a2409b]
> stderr: 16: (_start()+0x2a) [0x564eed1c4c6a]
> stderr: NOTE: a copy of the executable, or `objdump -rdS <executable>` is
> needed to interpret this.
> stderr: -5> 2020-01-29 20:32:33.054 7ff4c24abc80 -1
> bluestore(/var/lib/ceph/osd/ceph-24/) _read_fsid unparsable uuid
> stderr: 0> 2020-01-29 20:32:33.062 7ff4c24abc80 -1 *** Caught signal
> (Aborted) **
> stderr: in thread 7ff4c24abc80 thread_name:ceph-osd
> stderr: ceph version 14.2.6 (f0aa067ac7a02ee46ea48aa26c6e298b5ea272e9)
> nautilus (stable)
> stderr: 1: (()+0x12730) [0x7ff4c2f54730]
> stderr: 2: (gsignal()+0x10b) [0x7ff4c2a377bb]
> stderr: 3: (abort()+0x121) [0x7ff4c2a22535]
> stderr: 4: (()+0x8c983) [0x7ff4c2dea983]
> stderr: 5: (()+0x928c6) [0x7ff4c2df08c6]
> stderr: 6: (()+0x92901) [0x7ff4c2df0901]
> stderr: 7: (()+0x92b34) [0x7ff4c2df0b34]
> stderr: 8: (()+0x5a3f53) [0x564eed1c4f53]
> stderr: 9: (Option::size_t const
> md_config_t::get_val<Option::size_t>(ConfigValues const&,
> std::__cxx11::basic_string<char, std::char_traits<char>,
> std::allocator<char> > const&) const+0x81) [0x564eed1cac91]
> stderr: 10: (BlueStore::_set_cache_sizes()+0x15a) [0x564eed645d8a]
> stderr: 11: (BlueStore::_open_bdev(bool)+0x173) [0x564eed648b23]
> stderr: 12: (BlueStore::mkfs()+0x42b) [0x564eed6adeab]
> stderr: 13: (OSD::mkfs(CephContext*, ObjectStore*, uuid_d, int)+0xd5)
> [0x564eed1e4bf5]
> stderr: 14: (main()+0x1796) [0x564eed191366]
> stderr: 15: (__libc_start_main()+0xeb) [0x7ff4c2a2409b]
> stderr: 16: (_start()+0x2a) [0x564eed1c4c6a]
> stderr: NOTE: a copy of the executable, or `objdump -rdS <executable>` is
> needed to interpret this.
> --> Was unable to complete a new OSD, will rollback changes
> Running command: /usr/bin/ceph --cluster ceph --name client.bootstrap-osd
> --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring osd purge-new osd.24
> --yes-i-really-mean-it
> stderr: purged osd.24
> --> RuntimeError: Command failed with exit code 250: /usr/bin/ceph-osd
> --cluster ceph --osd-objectstore bluestore --mkfs -i 24 --monmap
> /var/lib/ceph/osd/ceph-24/activate.monmap --keyfile -
> --bluestore-block-db-path /dev/ceph-db-0/db-0 --osd-data
> /var/lib/ceph/osd/ceph-24/ --osd-uuid 6441f236-8694-46b9-9c6a-bf82af89765d
> --setuser ceph --setgroup ceph
> root@ceph01:~#
>
> Dave Hall
> Binghamton University    kdhall(a)binghamton.edu
> 607-760-2328 (Cell)
> 607-777-4641 (Office)
>
>
>
> On 1/29/2020 3:15 AM, Jan Fajerski wrote:
>
> On Tue, Jan 28, 2020 at 08:03:35PM +0100, bauen1 wrote:
>
> Hi,
>
> I've run into the same issue while testing:
>
> ceph version 14.2.6 (f0aa067ac7a02ee46ea48aa26c6e298b5ea272e9)
> nautilus (stable)
>
> debian bullseye
>
> Ceph was installed using ceph-ansible on a VM from the repo
> http://download.ceph.com/debian-nautilus
>
> The output of `sudo sh -c 'CEPH_VOLUME_DEBUG=true ceph-volume
> --cluster test lvm batch --bluestore /dev/vdb'` has been attached.
>
> Thx, I opened
> https://tracker.ceph.com/issues/43868.
> This looks like a bluestore/osd issue to me, though it might end up being
> ceph-volume's fault.
>
> Also worth noting might be that '/var/lib/ceph/osd/test-0/fsid' is
> empty (but I don't know too much about the internals)
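> (i.e. the file exists but is zero bytes; `ls -l /var/lib/ceph/osd/test-0/fsid` confirms it)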
>
> - bauen1
>
> On 1/28/20 4:54 PM, Dave Hall wrote:
>
> Jan,
>
> Unfortunately I'm under immense pressure right now to get some form
> of Ceph into production, so it's going to be Luminous for now, or
> maybe a live upgrade to Nautilus without recreating the OSDs (if
> that's possible).
>
> The good news is that in the next couple months I expect to add more
> hardware that should be nearly identical. I will gladly give it a
> go at that time and see if I can recreate. (Or, if I manage to
> thoroughly crash my current fledgling cluster, I'll give it another
> go on one node while I'm up all night recovering.)
>
> If you could tell me where to look I'd gladly read some code and see
> if I can find anything that way. Or if there's any sort of design
> document describing the deep internals I'd be glad to scan it to see
> if I've hit a corner case of some sort. Actually, I'd be interested
> in reading those documents anyway if I could.
>
> Thanks.
>
> -Dave
>
> Dave Hall
>
> On 1/28/2020 3:05 AM, Jan Fajerski wrote:
>
> On Mon, Jan 27, 2020 at 03:23:55PM -0500, Dave Hall wrote:
>
> All,
>
> I've just spent a significant amount of time unsuccessfully chasing
> the _read_fsid unparsable uuid error on Debian 10 / Nautilus 14.2.6.
> Since this is a brand new cluster, last night I gave up and moved back
> to Debian 9 / Luminous 12.2.11. In both cases I'm using the packages
> from Debian Backports with ceph-ansible as my deployment tool.
>
> Note that above I said 'the _read_fsid unparsable uuid' error. I've
> searched around a bit and found some previously reported issues, but I
> did not see any conclusive resolutions.
>
> I would like to get to Nautilus as quickly as possible, so I'd gladly
> provide additional information to help track down the cause of this
> symptom. I can confirm that, looking at the ceph-volume.log on the
> OSD host, I see no difference between the ceph-volume lvm batch command
> generated by the ceph-ansible versions associated with these two Ceph
> releases:
>
> ceph-volume --cluster ceph lvm batch --bluestore --yes
> --block-db-size 133358734540 /dev/sdc /dev/sdd /dev/sde /dev/sdf
> /dev/sdg /dev/sdh /dev/sdi /dev/sdj /dev/nvme0n1
>
> Note that I'm using --block-db-size to divide my NVMe into 12 segments
> as I have 4 empty drive bays on my OSD servers that I may eventually
> be able to fill.
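>
> (For the arithmetic: 1600321314816 bytes / 12 = 133360109568, so the
> 133358734540 figure is just slightly under an even twelfth of the NVMe.)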
>
> My OSD hardware is:
>
> Disk /dev/nvme0n1: 1.5 TiB, 1600321314816 bytes, 3125627568 sectors
> Disk /dev/sdc: 10.9 TiB, 12000138625024 bytes, 23437770752 sectors
> Disk /dev/sdd: 10.9 TiB, 12000138625024 bytes, 23437770752 sectors
> Disk /dev/sde: 10.9 TiB, 12000138625024 bytes, 23437770752 sectors
> Disk /dev/sdf: 10.9 TiB, 12000138625024 bytes, 23437770752 sectors
> Disk /dev/sdg: 10.9 TiB, 12000138625024 bytes, 23437770752 sectors
> Disk /dev/sdh: 10.9 TiB, 12000138625024 bytes, 23437770752 sectors
> Disk /dev/sdi: 10.9 TiB, 12000138625024 bytes, 23437770752 sectors
> Disk /dev/sdj: 10.9 TiB, 12000138625024 bytes, 23437770752 sectors
>
> I'd send the output of ceph-volume inventory on Luminous, but I'm
> getting -->: KeyError: 'human_readable_size'.
>
> Please let me know if I can provide any further information.
>
> Mind re-running your ceph-volume command with debug output
> enabled:
> CEPH_VOLUME_DEBUG=true ceph-volume --cluster ceph lvm batch
> --bluestore ...
>
> Ideally you could also open a bug report here:
> https://tracker.ceph.com/projects/ceph-volume/issues/new
>
> Thanks!
>
> Thanks.
>
> -Dave
>