December 2023 - Dev - lists.ceph.io

by Casey Bodley

hey Gal and Eric, in today's standup, we discussed the version of our apache arrow submodule. it's currently pinned at 6.0.1, which was tagged in nov. 2021. the centos9 builds are using the system package libarrow-devel-9.0.0. arrow's upstream recently tagged an 11.0.0 release as far as i know, there still aren't any system packages for ubuntu, so we're likely to be stuck with the submodule for quite a while. how do guys want to handle these updates? is it worth trying to update before the reef release?

2 months, 2 weeks

4
7
0 0

CLT Meeting Minutes 2023-12-20

by Laura Flores

(Zac Dover) "make check" fails now even for some docs builds. For example: https://github.com/ceph/ceph/pull/54970, which is a simple edit of ReStructured Text in doc/radosgw/compression.rst. Greg Farnum and Dan Mick have already done preliminary investigation of this matter here: https://ceph-storage.slack.com/archives/C1HFJ4VTN/p1703048785756359. - Follow Slack thread for updates; we'll continue looking into it Still 38 PRs to scrub for 16.2.15: https://github.com/ceph/ceph/pulls?q=is%3Aopen+is%3Apr+milestone%3Apacific - Looking for PRs that are necessary in the release, as well as non-trivial PRs that have been open for a while -- Laura Flores She/Her/Hers Software Engineer, Ceph Storage <https://ceph.io> Chicago, IL lflores(a)ibm.com | lflores(a)redhat.com <lflores(a)redhat.com> M: +17087388804

4 months, 1 week

1
0
0 0

Fwd: Can not activate some OSDs after upgrade (bad crc on label)

by Huseyin Cotuk

Hi Cephers, Any idea about this? Best regards, Huseyin Cotuk hcotuk(a)gmail.com Begin forwarded message: > > From: Huseyin Cotuk <hcotuk(a)gmail.com> > Subject: Can not activate some OSDs after upgrade (bad crc on label) > Date: 19 December 2023 at 14:09:20 GMT+3 > To: ceph-users(a)ceph.io > > Hello Cephers, > > I have two identical Ceph clusters with 32 OSDs each, running radosgw with EC. They were running Octopus on Ubuntu 20.04. > > On one of these clusters, I have upgraded OS to Ubuntu 22.04 and Ceph version is upgraded to Quincy 17.2.6. This cluster completed the process without any issue and it works as expected. > > On the second cluster, I followed the same procedure and upgraded the cluster. After upgrade 9 of 32 OSDs can not be activated. AFAIU, the label of these OSDs can not be read. ceph-volume lvm activate {osd.id} {osd_fsid} command fails as below: > > stderr: failed to read label for /dev/ceph-block-13/block-13: (5) Input/output error > stderr: 2023-12-19T11:46:25.310+0300 7f088cd7ea80 -1 bluestore(/dev/ceph-block-13/block-13) _read_bdev_label bad crc on label, expected 2340927273 != actual 2067505886 > > All ceph-bluestore-tool and ceph-object-storetool commands fail with the same message, so I can not try repair, fsck or migrate. > > # ceph-bluestore-tool repair --deep yes --path /var/lib/ceph/osd/ceph-13/ > failed to load os-type: (2) No such file or directory > 2023-12-19T13:57:06.551+0300 7f39b1635a80 -1 bluestore(/var/lib/ceph/osd/ceph-13/block) _read_bdev_label bad crc on label, expected 2340927273 != actual 2067505886 > > I also tried show label with bluestore-tool without success. > > # ceph-bluestore-tool show-label --dev /dev/ceph-block-13/block-13 > unable to read label for /dev/ceph-block-13/block-13: (5) Input/output error > 2023-12-19T14:01:19.668+0300 7fdcdd111a80 -1 bluestore(/dev/ceph-block-13/block-13) _read_bdev_label bad crc on label, expected 2340927273 != actual 2067505886 > > I can get the information including osd_fsif, block_uuid of all failed OSDs via ceph-volume lvm list like below. > > ====== osd.13 ====== > > [block] /dev/ceph-block-13/block-13 > > block device /dev/ceph-block-13/block-13 > block uuid jFaTba-ln5r-muQd-7Ef9-3tWe-JwvO-qW9nqi > cephx lockbox secret > cluster fsid 4e7e7d1c-22db-49c7-9f24-5a75cd3a3b9f > cluster name ceph > crush device class None > encrypted 0 > osd fsid c9ee3ef6-73d7-4029-9cd6-086cc95d2f27 > osd id 13 > osdspec affinity > type block > vdo 0 > devices /dev/mapper/mpathb > > All vgs and lvs look healthy. > > # lvdisplay ceph-block-13/block-13 > --- Logical volume --- > LV Path /dev/ceph-block-13/block-13 > LV Name block-13 > VG Name ceph-block-13 > LV UUID jFaTba-ln5r-muQd-7Ef9-3tWe-JwvO-qW9nqi > LV Write Access read/write > LV Creation host, time ank-backup01, 2023-11-29 10:41:53 +0300 > LV Status available > # open 0 > LV Size <7.28 TiB > Current LE 1907721 > Segments 1 > Allocation inherit > Read ahead sectors auto > - currently set to 256 > Block device 253:33 > > This is a single node cluster running only radosgw. The environment is as follows: > > # ceph -v > ceph version 17.2.6 (d7ff0d10654d2280e08f1ab989c7cdf3064446a5) quincy (stable) > > # lsb_release -a > No LSB modules are available. > Distributor ID: Ubuntu > Description: Ubuntu 22.04.3 LTS > Release: 22.04 > Codename: jammy > > # ceph osd crush rule dump > [ > { > "rule_id": 0, > "rule_name": "osd_replicated_rule", > "type": 1, > "steps": [ > { > "op": "take", > "item": -2, > "item_name": "default~hdd" > }, > { > "op": "choose_firstn", > "num": 0, > "type": "osd" > }, > { > "op": "emit" > } > ] > }, > { > "rule_id": 2, > "rule_name": "default.rgw.buckets.data", > "type": 3, > "steps": [ > { > "op": "set_chooseleaf_tries", > "num": 5 > }, > { > "op": "set_choose_tries", > "num": 100 > }, > { > "op": "take", > "item": -1, > "item_name": "default" > }, > { > "op": "choose_indep", > "num": 0, > "type": "osd" > }, > { > "op": "emit" > } > ] > } > ] > > ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF > -1 226.29962 root default > -3 226.29962 host ank-backup01 > 0 hdd 7.29999 osd.0 up 1.00000 1.00000 > 1 hdd 7.29999 osd.1 up 1.00000 1.00000 > 2 hdd 7.29999 osd.2 up 1.00000 1.00000 > 3 hdd 7.29999 osd.3 up 1.00000 1.00000 > 4 hdd 7.29999 osd.4 up 1.00000 1.00000 > 5 hdd 7.29999 osd.5 up 1.00000 1.00000 > 6 hdd 7.29999 osd.6 up 1.00000 1.00000 > 7 hdd 7.29999 osd.7 up 1.00000 1.00000 > 8 hdd 7.29999 osd.8 up 1.00000 1.00000 > 9 hdd 7.29999 osd.9 up 1.00000 1.00000 > 10 hdd 7.29999 osd.10 up 1.00000 1.00000 > 11 hdd 7.29999 osd.11 up 1.00000 1.00000 > 12 hdd 7.29999 osd.12 down 0 1.00000 > 13 hdd 7.29999 osd.13 down 0 1.00000 > 14 hdd 7.29999 osd.14 down 0 1.00000 > 15 hdd 7.29999 osd.15 down 0 1.00000 > 16 hdd 7.29999 osd.16 down 0 1.00000 > 17 hdd 7.29999 osd.17 down 0 1.00000 > 18 hdd 7.29999 osd.18 down 0 1.00000 > 19 hdd 7.29999 osd.19 down 0 1.00000 > 20 hdd 7.29999 osd.20 down 0 1.00000 > 21 hdd 7.29999 osd.21 up 1.00000 1.00000 > 22 hdd 7.29999 osd.22 up 1.00000 1.00000 > 23 hdd 7.29999 osd.23 up 1.00000 1.00000 > 24 hdd 7.29999 osd.24 up 1.00000 1.00000 > 25 hdd 7.29999 osd.25 up 1.00000 1.00000 > 26 hdd 7.29999 osd.26 up 1.00000 1.00000 > 27 hdd 7.29999 osd.27 up 1.00000 1.00000 > 28 hdd 7.29999 osd.28 up 1.00000 1.00000 > 29 hdd 7.29999 osd.29 up 1.00000 1.00000 > 30 hdd 7.29999 osd.30 up 1.00000 1.00000 > 31 hdd 7.29999 osd.31 up 1.00000 1.00000 > > Does anybody have any idea why the labels of these OSDs can not be read? Any help would be appreciated. > > Best Regards, > Huseyin Cotuk > hcotuk(a)gmail.com > > > >

4 months, 1 week

2
1
0 0

No User + Dev Monthly Meetup this week - Happy Holidays!

by Laura Flores

Hi all, A quick reminder that the User + Dev Monthly Meetup that was scheduled for this week December 21 is cancelled due to the holidays. The User + Dev Monthly Meetup will resume in the new year on January 18. If you have a topic you'd like to present at an upcoming meetup, you're welcome to submit it here: https://docs.google.com/forms/d/e/1FAIpQLSdboBhxVoBZoaHm8xSmeBoemuXoV_rmh4v… Wishing everyone a happy holiday season! Laura Flores -- Laura Flores She/Her/Hers Software Engineer, Ceph Storage <https://ceph.io> Chicago, IL lflores(a)ibm.com | lflores(a)redhat.com <lflores(a)redhat.com> M: +17087388804

4 months, 1 week

1
0
0 0

v18.2.1 Reef released

by Yuri Weinstein

We're happy to announce the 1st backport release in the Reef series. This is the first backport release in the Reef series, and the first with Debian packages, for Debian Bookworm. We recommend all users update to this release. https://ceph.io/en/news/blog/2023/v18-2-1-reef-released/ Notable Changes --------------- * RGW: S3 multipart uploads using Server-Side Encryption now replicate correctly in multi-site. Previously, the replicas of such objects were corrupted on decryption. A new tool, ``radosgw-admin bucket resync encrypted multipart``, can be used to identify these original multipart uploads. The ``LastModified`` timestamp of any identified object is incremented by 1ns to cause peer zones to replicate it again. For multi-site deployments that make any use of Server-Side Encryption, we recommended running this command against every bucket in every zone after all zones have upgraded. * CEPHFS: MDS evicts clients which are not advancing their request tids which causes a large buildup of session metadata resulting in the MDS going read-only due to the RADOS operation exceeding the size threshold. `mds_session_metadata_threshold` config controls the maximum size that a (encoded) session metadata can grow. * RGW: New tools have been added to radosgw-admin for identifying and correcting issues with versioned bucket indexes. Historical bugs with the versioned bucket index transaction workflow made it possible for the index to accumulate extraneous "book-keeping" olh entries and plain placeholder entries. In some specific scenarios where clients made concurrent requests referencing the same object key, it was likely that a lot of extra index entries would accumulate. When a significant number of these entries are present in a single bucket index shard, they can cause high bucket listing latencies and lifecycle processing failures. To check whether a versioned bucket has unnecessary olh entries, users can now run ``radosgw-admin bucket check olh``. If the ``--fix`` flag is used, the extra entries will be safely removed. A distinct issue from the one described thus far, it is also possible that some versioned buckets are maintaining extra unlinked objects that are not listable from the S3/ Swift APIs. These extra objects are typically a result of PUT requests that exited abnormally, in the middle of a bucket index transaction - so the client would not have received a successful response. Bugs in prior releases made these unlinked objects easy to reproduce with any PUT request that was made on a bucket that was actively resharding. Besides the extra space that these hidden, unlinked objects consume, there can be another side effect in certain scenarios, caused by the nature of the failure mode that produced them, where a client of a bucket that was a victim of this bug may find the object associated with the key to7fe91d5d5842e04be3b4f514d6dd990c54b29c76 be in an inconsistent state. To check whether a versioned bucket has unlinked entries, users can now run ``radosgw-admin bucket check unlinked``. If the ``--fix`` flag is used, the unlinked objects will be safely removed. Finally, a third issue made it possible for versioned bucket index stats to be accounted inaccurately. The tooling for recalculating versioned bucket stats also had a bug, and was not previously capable of fixing these inaccuracies. This release resolves those issues and users can now expect that the existing ``radosgw-admin bucket check`` command will produce correct results. We recommend that users with versioned buckets, especially those that existed on prior releases, use these new tools to check whether their buckets are affected and to clean them up accordingly. * mgr/snap-schedule: For clusters with multiple CephFS file systems, all the snap-schedule commands now expect the '--fs' argument. * RADOS: A POOL_APP_NOT_ENABLED health warning will now be reported if the application is not enabled for the pool irrespective of whether the pool is in use or not. Always add ``application`` label to a pool to avoid reporting of POOL_APP_NOT_ENABLED health warning for that pool. The user might temporarilty mute this warning using ``ceph health mute POOL_APP_NOT_ENABLED``. Getting Ceph ------------ * Git at git://github.com/ceph/ceph.git * Tarball at https://download.ceph.com/tarballs/ceph-18.2.1.tar.gz * Containers at https://quay.io/repository/ceph/ceph * For packages, see https://docs.ceph.com/en/latest/install/get-packages/ * Release git sha1: 7fe91d5d5842e04be3b4f514d6dd990c54b29c76

4 months, 1 week

1
0
0 0

pacific 16.2.15

by Yuri Weinstein

Hello We are finalizing the remaining issues for the last pacific release 16.2.15. Dev leads, please reply with issues that you plan to add. TIA

4 months, 2 weeks

2
1
0 0

Ceph Leadership Team meeting notes 2023-12-13

by Neha Ojha

Highlights from this week's CLT meeting Release updates: 18.2.1 - QE validation done, gibba lab upgrade complete without issue, LRC ran into issues with unexpected cephadm exception and one monitor down - being investigated with no known impact on services yet 16.2.15 - https://github.com/ceph/ceph/milestone/17 in progress General updates: - Squid kickoff PR(https://github.com/ceph/ceph/pull/53191) going through QE validation across suites, will be merged before the holidays - MDSMap decode issue (18.2.0/1 user-space cephfs clients broken with squid+) has been discovered, CephFS team will work on a plan of action to address it (Venky to send a detailed email to the dev list about the issue) - FYI, a minor change to vstart is being merged https://github.com/ceph/ceph/pull/53393 - Ceph Days NYC being planned by Bloomberg - it will be nice to announce the Squid release at the event (current ETA April 2024) Thanks, Neha

4 months, 2 weeks

1
0
0 0

mds.0.journaler.pq(ro) _finish_read got error -2

by Eugen Block

Hi, I'm trying to help someone with a broken CephFS. We managed to recover basic ceph functionality but the CephFS is still inaccessible (currently read-only). We went through the disaster recovery steps but to no avail. Here's a snippet from the startup logs: ---snip--- mds.0.41 Booting: 2: waiting for purge queue recovered mds.0.journaler.pq(ro) _finish_probe_end write_pos = 14797504512 (header had 14789452521). recovered. mds.0.purge_queue operator(): open complete mds.0.purge_queue operator(): recovering write_pos monclient: get_auth_request con 0x55c280bc5c00 auth_method 0 monclient: get_auth_request con 0x55c280ee0c00 auth_method 0 mds.0.journaler.pq(ro) _finish_read got error -2 mds.0.purge_queue _recover: Error -2 recovering write_pos mds.0.purge_queue _go_readonly: going readonly because internal IO failed: No such file or directory mds.0.journaler.pq(ro) set_readonly mds.0.41 unhandled write error (2) No such file or directory, force readonly... mds.0.cache force file system read-only force file system read-only ---snip--- I've added the dev mailing list, maybe someone can give some advice how to continue from here (we could try to recover with an empty metadata pool). Or is this FS lost? Thanks! Eugen

4 months, 2 weeks

2
2
0 0

Announcing go-ceph v0.25.0

by Anoop C S

We are happy to announce another release of the go-ceph API library. This is a regular release following our every-two-months release cadence. https://github.com/ceph/go-ceph/releases/tag/v0.25.0 More details are available at the link above. The library includes bindings that aim to play a similar role to the "pybind" python bindings in the ceph tree but for the Go language. The library also includes additional APIs that can be used to administer cephfs, rbd, and rgw subsystems. There are already a few consumers of this library in the wild, including the ceph-csi project. Anoop C S

4 months, 2 weeks

1
0
0 0

Re: MDS recovery with existing pools

by Eugen Block

Update: apparently, we did it! We walked through the disaster recovery steps where one of the steps was to reset the journal. I was under the impression that the specified command 'cephfs-journal-tool [--rank=N] journal reset' would simply reset all the journals (mdlog and purge_queue), but it seems like it doesn't. After Mykola (once again, thank you so much for your input) pointed towards running the command for the purge_queue specifically, the filesystem got out of the read-only mode and was mountable again. the exact command was: cephfs-journal-tool --rank=cephfs:0 --journal=purge_queue journal reset We didn't have to walk through the recovery with an empty pool, which is nice. I have a suggestion to include the "journal inspect" command to the docs for both mdlog and purge_queue to understand that both journals might need a reset. Thanks again, Mykola! Eugen Zitat von Eugen Block <eblock(a)nde.ag>: > So we did walk through the advanced recovery page but didn't really > succeed. The CephFS is still going to readonly because of the > purge_queue error. Is there any chance to recover from that or > should we try to recover with an empty metadata pool next? > I'd still appreciate any comments. ;-) > > Zitat von Eugen Block <eblock(a)nde.ag>: > >> Some more information on the damaged CephFS, apparently the journal >> is damaged: >> >> ---snip--- >> # cephfs-journal-tool --rank=storage:0 --journal=mdlog journal inspect >> >> 2023-12-08T15:35:22.922+0200 7f834d0320c0 -1 Missing object 200.000527c4 >> >> 2023-12-08T15:35:22.938+0200 7f834d0320c0 -1 Bad entry start ptr >> (0x149f140067f) at 0x149f1174595 >> >> 2023-12-08T15:35:22.942+0200 7f834d0320c0 -1 Bad entry start ptr >> (0x149f1400e66) at 0x149f1174d7c >> >> 2023-12-08T15:35:22.954+0200 7f834d0320c0 -1 Bad entry start ptr >> (0x149f1401642) at 0x149f1175558 >> >> 2023-12-08T15:35:22.970+0200 7f834d0320c0 -1 Bad entry start ptr >> (0x149f1401e29) at 0x149f1175d3f >> >> 2023-12-08T15:35:22.974+0200 7f834d0320c0 -1 Bad entry start ptr >> (0x149f1402610) at 0x149f1176526 >> >> 2023-12-08T15:35:22.978+0200 7f834d0320c0 -1 Missing object 200.000527ca >> >> 2023-12-08T15:35:22.978+0200 7f834d0320c0 -1 Missing object 200.000527cb >> >> 2023-12-08T15:35:22.994+0200 7f834d0320c0 -1 Bad entry start ptr >> (0x149f30008f4) at 0x149f2d7480a >> >> 2023-12-08T15:35:22.998+0200 7f834d0320c0 -1 Bad entry start ptr >> (0x149f3000ced) at 0x149f2d74c03 >> >> Overall journal integrity: DAMAGED >> Objects missing: >> 0x527c4 >> 0x527ca >> 0x527cb >> Corrupt regions: >> 0x149f0d73f16-149f1174595 >> 0x149f1174595-149f1174d7c >> 0x149f1174d7c-149f1175558 >> 0x149f1175558-149f1175d3f >> 0x149f1175d3f-149f1176526 >> 0x149f1176526-149f2d7480a >> 0x149f2d7480a-149f2d74c03 >> 0x149f2d74c03-ffffffffffffffff >> >> # cephfs-journal-tool --rank=storage:0 --journal=purge_queue journal inspect >> >> 2023-12-08T15:35:57.691+0200 7f331621e0c0 -1 Missing object 500.00000dc6 >> >> Overall journal integrity: DAMAGED >> Objects missing: >> 0xdc6 >> Corrupt regions: >> 0x3718522e9-ffffffffffffffff >> ---snip--- >> >> A backup isn't possible: >> >> ---snip--- >> # cephfs-journal-tool --rank=storage:0 journal export backup.bin >> 2023-12-08T15:42:07.643+0200 7fde6a24f0c0 -1 Missing object 200.000527c4 >> >> 2023-12-08T15:42:07.659+0200 7fde6a24f0c0 -1 Bad entry start ptr >> (0x149f140067f) at 0x149f1174595 >> >> 2023-12-08T15:42:07.667+0200 7fde6a24f0c0 -1 Bad entry start ptr >> (0x149f1400e66) at 0x149f1174d7c >> >> 2023-12-08T15:42:07.675+0200 7fde6a24f0c0 -1 Bad entry start ptr >> (0x149f1401642) at 0x149f1175558 >> >> 2023-12-08T15:42:07.687+0200 7fde6a24f0c0 -1 Bad entry start ptr >> (0x149f1401e29) at 0x149f1175d3f >> >> 2023-12-08T15:42:07.699+0200 7fde6a24f0c0 -1 Bad entry start ptr >> (0x149f1402610) at 0x149f1176526 >> >> 2023-12-08T15:42:07.699+0200 7fde6a24f0c0 -1 Missing object 200.000527ca >> >> 2023-12-08T15:42:07.699+0200 7fde6a24f0c0 -1 Missing object 200.000527cb >> >> 2023-12-08T15:42:07.707+0200 7fde6a24f0c0 -1 Bad entry start ptr >> (0x149f30008f4) at 0x149f2d7480a >> >> 2023-12-08T15:42:07.707+0200 7fde6a24f0c0 -1 Bad entry start ptr >> (0x149f3000ced) at 0x149f2d74c03 >> >> 2023-12-08T15:42:07.707+0200 7fde6a24f0c0 -1 journal_export: >> Journal not readable, attempt object-by-object dump with `rados` >> >> Error ((5) Input/output error) >> ---snip--- >> >> Does it make sense to continue with the advanced disaster recovery >> [3] bei running (all of) these steps: >> >> cephfs-journal-tool event recover_dentries summary >> cephfs-journal-tool [--rank=N] journal reset >> cephfs-table-tool all reset session >> ceph fs reset <fs name> --yes-i-really-mean-it >> cephfs-table-tool 0 reset session >> cephfs-table-tool 0 reset snap >> cephfs-table-tool 0 reset inode >> cephfs-journal-tool --rank=0 journal reset >> cephfs-data-scan init >> >> Fortunately, I didn't have to run through this procedure too often, >> so I'd appreciate any comments what the best approach would be here. >> >> Thanks! >> Eugen >> >> [3] >> https://docs.ceph.com/en/latest/cephfs/disaster-recovery-experts/#disaster-… >> >> >> Zitat von Eugen Block <eblock(a)nde.ag>: >> >>> I was able to (almost) reproduce the issue in a (Pacific) test >>> cluster. I rebuilt the monmap from the OSDs, brought everything >>> back up, started the mds recovery like described in [1]: >>> >>> ceph fs new <fs_name> <metadata_pool> <data_pool> --force --recover >>> >>> Then I added two mds daemons which went into standby: >>> >>> ---snip--- >>> Started Ceph mds.cephfs.pacific.uexvvq for >>> 1b0afda4-2221-11ee-87be-fa163eed040c. >>> Dez 08 12:51:53 pacific conmon[100493]: debug >>> 2023-12-08T11:51:53.086+0000 7ff5f589b900 0 set uid:gid to >>> 167:167 (ceph:ceph) >>> Dez 08 12:51:53 pacific conmon[100493]: debug >>> 2023-12-08T11:51:53.086+0000 7ff5f589b900 0 ceph version 16.2.14 >>> (238ba602515df21ea7ffc75c88db29f9e5ef12c9) pacific (stable), >>> process ceph-md> >>> Dez 08 12:51:53 pacific conmon[100493]: debug >>> 2023-12-08T11:51:53.086+0000 7ff5f589b900 1 main not setting numa >>> affinity >>> Dez 08 12:51:53 pacific conmon[100493]: debug >>> 2023-12-08T11:51:53.086+0000 7ff5f589b900 0 pidfile_write: ignore >>> empty --pid-file >>> Dez 08 12:51:53 pacific conmon[100493]: starting >>> mds.cephfs.pacific.uexvvq at >>> Dez 08 12:51:53 pacific conmon[100493]: debug >>> 2023-12-08T11:51:53.102+0000 7ff5e37be700 1 >>> mds.cephfs.pacific.uexvvq Updating MDS map to version 2 from mon.0 >>> Dez 08 12:51:53 pacific conmon[100493]: debug >>> 2023-12-08T11:51:53.802+0000 7ff5e37be700 1 >>> mds.cephfs.pacific.uexvvq Updating MDS map to version 3 from mon.0 >>> Dez 08 12:51:53 pacific conmon[100493]: debug >>> 2023-12-08T11:51:53.802+0000 7ff5e37be700 1 >>> mds.cephfs.pacific.uexvvq Monitors have assigned me to become a >>> standby. >>> ---snip--- >>> >>> >>> But as soon as I ran >>> >>> pacific:~ # ceph fs set cephfs joinable true >>> cephfs marked joinable; MDS may join as newly active. >>> >>> one MDS daemon became active and the FS is available now. So >>> apparently the "Advanced" steps from [2] usually weren't >>> necessary, but are they in this case? I'm still trying to find an >>> explanation for the purge_queue errors. >>> >>> Zitat von Eugen Block <eblock(a)nde.ag>: >>> >>>> Hi, >>>> >>>> following up on the previous thread (After hardware failure tried >>>> to recover ceph and followed instructions for recovery using >>>> OSDS), we were able to get ceph back into a healthy state >>>> (including the unfound object). Now the CephFS needs to be >>>> recovered and I'm having trouble to fully understand the docs [1] >>>> which the next steps would be. We ran the following which >>>> according to [1] sets the state to existing but failed: >>>> >>>> ceph fs new <fs_name> <metadata_pool> <data_pool> --force --recover >>>> >>>> But how to continue from here? Should we expect an active MDS at >>>> this point or not? Because the "ceph fs status" output still >>>> shows rank 0 as failed. We then tried: >>>> >>>> ceph fs set <fs_name> joinable true >>>> >>>> But apparently it was already joinable, nothing changed. Before >>>> doing anything (destructive) from the advanced options [2] I >>>> wanted to ask the community, how to get on from here. I pasted >>>> the mds logs at the bottom, I'm not really sure if the current >>>> state is expected or not. Apparently, the journal recovers but >>>> the purge_queue does not: >>>> >>>> mds.0.41 Booting: 2: waiting for purge queue recovered >>>> mds.0.journaler.pq(ro) _finish_probe_end write_pos = 14797504512 >>>> (header had 14789452521). recovered. >>>> mds.0.purge_queue operator(): open complete >>>> mds.0.purge_queue operator(): recovering write_pos >>>> monclient: get_auth_request con 0x55c280bc5c00 auth_method 0 >>>> monclient: get_auth_request con 0x55c280ee0c00 auth_method 0 >>>> mds.0.journaler.pq(ro) _finish_read got error -2 >>>> mds.0.purge_queue _recover: Error -2 recovering write_pos >>>> mds.0.purge_queue _go_readonly: going readonly because internal >>>> IO failed: No such file or directory >>>> mds.0.journaler.pq(ro) set_readonly >>>> mds.0.41 unhandled write error (2) No such file or directory, >>>> force readonly... >>>> mds.0.cache force file system read-only >>>> force file system read-only >>>> >>>> Is this expected because the "--recover" flag prevents an active >>>> MDS or not? Before running "ceph mds rmfailed ..." and/or "ceph >>>> fs reset <file system name>" with the --yes-i-really-mean-it flag >>>> I'd like to ask for your input. In which case should we run those >>>> commands? The docs are not really clear to me. Any input is >>>> highly appreciated! >>>> >>>> Thanks! >>>> Eugen >>>> >>>> [1] >>>> https://docs.ceph.com/en/latest/cephfs/recover-fs-after-mon-store-loss/ >>>> [2] >>>> https://docs.ceph.com/en/latest/cephfs/administration/#advanced-cephfs-admi… >>>> >>>> ---snip--- >>>> Dec 07 15:35:48 node02 bash[692598]: debug -90> >>>> 2023-12-07T13:35:47.730+0000 7f4cd855f700 1 >>>> mds.storage.node02.hemalk Updating MDS map to version 41 from mon.0 >>>> Dec 07 15:35:48 node02 bash[692598]: debug -89> >>>> 2023-12-07T13:35:47.730+0000 7f4cd855f700 4 mds.0.purge_queue >>>> operator(): data pool 3 not found in OSDMap >>>> Dec 07 15:35:48 node02 bash[692598]: debug -88> >>>> 2023-12-07T13:35:47.730+0000 7f4cd855f700 5 asok(0x55c27fe86000) >>>> register_command objecter_requests hook 0x55c27fe16310 >>>> Dec 07 15:35:48 node02 bash[692598]: debug -87> >>>> 2023-12-07T13:35:47.730+0000 7f4cd855f700 10 monclient: _renew_subs >>>> Dec 07 15:35:48 node02 bash[692598]: debug -86> >>>> 2023-12-07T13:35:47.730+0000 7f4cd855f700 10 monclient: >>>> _send_mon_message to mon.node02 at v2:10.40.99.12:3300/0 >>>> Dec 07 15:35:48 node02 bash[692598]: debug -85> >>>> 2023-12-07T13:35:47.730+0000 7f4cd855f700 10 log_channel(cluster) >>>> update_config to_monitors: true to_syslog: false syslog_facility: >>>> prio: info to_graylog: false graylog_host: 127.0.0.1 >>>> graylog_port: 12201) >>>> Dec 07 15:35:48 node02 bash[692598]: debug -84> >>>> 2023-12-07T13:35:47.730+0000 7f4cd855f700 4 mds.0.purge_queue >>>> operator(): data pool 3 not found in OSDMap >>>> Dec 07 15:35:48 node02 bash[692598]: debug -83> >>>> 2023-12-07T13:35:47.730+0000 7f4cd855f700 4 mds.0.0 >>>> apply_blocklist: killed 0, blocklisted sessions (0 blocklist >>>> entries, 0) >>>> Dec 07 15:35:48 node02 bash[692598]: debug -82> >>>> 2023-12-07T13:35:47.730+0000 7f4cd855f700 1 mds.0.41 >>>> handle_mds_map i am now mds.0.41 >>>> Dec 07 15:35:48 node02 bash[692598]: debug -81> >>>> 2023-12-07T13:35:47.734+0000 7f4cd855f700 1 mds.0.41 >>>> handle_mds_map state change up:standby --> up:replay >>>> Dec 07 15:35:48 node02 bash[692598]: debug -80> >>>> 2023-12-07T13:35:47.734+0000 7f4cd855f700 5 >>>> mds.beacon.storage.node02.hemalk set_want_state: up:standby -> >>>> up:replay >>>> Dec 07 15:35:48 node02 bash[692598]: debug -79> >>>> 2023-12-07T13:35:47.734+0000 7f4cd855f700 1 mds.0.41 replay_start >>>> Dec 07 15:35:48 node02 bash[692598]: debug -78> >>>> 2023-12-07T13:35:47.734+0000 7f4cd855f700 2 mds.0.41 Booting: 0: >>>> opening inotable >>>> Dec 07 15:35:48 node02 bash[692598]: debug -77> >>>> 2023-12-07T13:35:47.734+0000 7f4cd855f700 10 monclient: >>>> _send_mon_message to mon.node02 at v2:10.40.99.12:3300/0 >>>> Dec 07 15:35:48 node02 bash[692598]: debug -76> >>>> 2023-12-07T13:35:47.734+0000 7f4cd855f700 2 mds.0.41 Booting: 0: >>>> opening sessionmap >>>> Dec 07 15:35:48 node02 bash[692598]: debug -75> >>>> 2023-12-07T13:35:47.734+0000 7f4cd855f700 10 monclient: >>>> _send_mon_message to mon.node02 at v2:10.40.99.12:3300/0 >>>> Dec 07 15:35:48 node02 bash[692598]: debug -74> >>>> 2023-12-07T13:35:47.734+0000 7f4cd855f700 2 mds.0.41 Booting: 0: >>>> opening mds log >>>> Dec 07 15:35:48 node02 bash[692598]: debug -73> >>>> 2023-12-07T13:35:47.734+0000 7f4cd855f700 5 mds.0.log open >>>> discovering log bounds >>>> Dec 07 15:35:48 node02 bash[692598]: debug -72> >>>> 2023-12-07T13:35:47.734+0000 7f4cd855f700 2 mds.0.41 Booting: 0: >>>> opening purge queue (async) >>>> Dec 07 15:35:48 node02 bash[692598]: debug -71> >>>> 2023-12-07T13:35:47.734+0000 7f4cd855f700 4 mds.0.purge_queue >>>> open: opening >>>> Dec 07 15:35:48 node02 bash[692598]: debug -70> >>>> 2023-12-07T13:35:47.734+0000 7f4cd855f700 1 >>>> mds.0.journaler.pq(ro) recover start >>>> Dec 07 15:35:48 node02 bash[692598]: debug -69> >>>> 2023-12-07T13:35:47.734+0000 7f4cd855f700 1 >>>> mds.0.journaler.pq(ro) read_head >>>> Dec 07 15:35:48 node02 bash[692598]: debug -68> >>>> 2023-12-07T13:35:47.734+0000 7f4cd855f700 10 monclient: >>>> _send_mon_message to mon.node02 at v2:10.40.99.12:3300/0 >>>> Dec 07 15:35:48 node02 bash[692598]: debug -67> >>>> 2023-12-07T13:35:47.734+0000 7f4cd855f700 2 mds.0.41 Booting: 0: >>>> loading open file table (async) >>>> Dec 07 15:35:48 node02 bash[692598]: debug -66> >>>> 2023-12-07T13:35:47.734+0000 7f4cd855f700 10 monclient: >>>> _send_mon_message to mon.node02 at v2:10.40.99.12:3300/0 >>>> Dec 07 15:35:48 node02 bash[692598]: debug -65> >>>> 2023-12-07T13:35:47.734+0000 7f4cd855f700 2 mds.0.41 Booting: 0: >>>> opening snap table >>>> Dec 07 15:35:48 node02 bash[692598]: debug -64> >>>> 2023-12-07T13:35:47.734+0000 7f4cd855f700 10 monclient: >>>> _send_mon_message to mon.node02 at v2:10.40.99.12:3300/0 >>>> Dec 07 15:35:48 node02 bash[692598]: debug -63> >>>> 2023-12-07T13:35:47.734+0000 7f4cd1d52700 4 mds.0.journalpointer >>>> Reading journal pointer '400.00000000' >>>> Dec 07 15:35:48 node02 bash[692598]: debug -62> >>>> 2023-12-07T13:35:47.734+0000 7f4cd1d52700 10 monclient: >>>> _send_mon_message to mon.node02 at v2:10.40.99.12:3300/0 >>>> Dec 07 15:35:48 node02 bash[692598]: debug -61> >>>> 2023-12-07T13:35:47.734+0000 7f4cd4557700 2 mds.0.cache Memory >>>> usage: total 316452, rss 43088, heap 198940, baseline 198940, 0 >>>> / 0 inodes have caps, 0 caps, 0 caps per inode >>>> Dec 07 15:35:48 node02 bash[692598]: debug -60> >>>> 2023-12-07T13:35:47.734+0000 7f4cd855f700 10 monclient: _renew_subs >>>> Dec 07 15:35:48 node02 bash[692598]: debug -59> >>>> 2023-12-07T13:35:47.734+0000 7f4cd855f700 10 monclient: >>>> _send_mon_message to mon.node02 at v2:10.40.99.12:3300/0 >>>> Dec 07 15:35:48 node02 bash[692598]: debug -58> >>>> 2023-12-07T13:35:47.734+0000 7f4cd855f700 10 monclient: >>>> handle_get_version_reply finishing 1 version 10835 >>>> Dec 07 15:35:48 node02 bash[692598]: debug -57> >>>> 2023-12-07T13:35:47.734+0000 7f4cd855f700 10 monclient: >>>> handle_get_version_reply finishing 2 version 10835 >>>> Dec 07 15:35:48 node02 bash[692598]: debug -56> >>>> 2023-12-07T13:35:47.734+0000 7f4cd855f700 10 monclient: >>>> handle_get_version_reply finishing 3 version 10835 >>>> Dec 07 15:35:48 node02 bash[692598]: debug -55> >>>> 2023-12-07T13:35:47.734+0000 7f4cd855f700 10 monclient: >>>> handle_get_version_reply finishing 4 version 10835 >>>> Dec 07 15:35:48 node02 bash[692598]: debug -54> >>>> 2023-12-07T13:35:47.734+0000 7f4cd855f700 10 monclient: >>>> handle_get_version_reply finishing 5 version 10835 >>>> Dec 07 15:35:48 node02 bash[692598]: debug -53> >>>> 2023-12-07T13:35:47.734+0000 7f4cd855f700 10 monclient: >>>> handle_get_version_reply finishing 6 version 10835 >>>> Dec 07 15:35:48 node02 bash[692598]: debug -52> >>>> 2023-12-07T13:35:47.734+0000 7f4cdb565700 10 monclient: >>>> get_auth_request con 0x55c280bc5800 auth_method 0 >>>> Dec 07 15:35:48 node02 bash[692598]: debug -51> >>>> 2023-12-07T13:35:47.734+0000 7f4cdbd66700 10 monclient: >>>> get_auth_request con 0x55c280dc6800 auth_method 0 >>>> Dec 07 15:35:48 node02 bash[692598]: debug -50> >>>> 2023-12-07T13:35:47.734+0000 7f4cdad64700 10 monclient: >>>> get_auth_request con 0x55c280dc7800 auth_method 0 >>>> Dec 07 15:35:48 node02 bash[692598]: debug -49> >>>> 2023-12-07T13:35:47.734+0000 7f4cd3555700 1 >>>> mds.0.journaler.pq(ro) _finish_read_head loghead(trim >>>> 14789115904, expire 14789452521, write 14789452521, stream_format >>>> 1). probing for end of log (from 14789452521)... >>>> Dec 07 15:35:48 node02 bash[692598]: debug -48> >>>> 2023-12-07T13:35:47.734+0000 7f4cd3555700 1 >>>> mds.0.journaler.pq(ro) probing for end of the log >>>> Dec 07 15:35:48 node02 bash[692598]: debug -47> >>>> 2023-12-07T13:35:47.738+0000 7f4cd1d52700 1 >>>> mds.0.journaler.mdlog(ro) recover start >>>> Dec 07 15:35:48 node02 bash[692598]: debug -46> >>>> 2023-12-07T13:35:47.738+0000 7f4cd1d52700 1 >>>> mds.0.journaler.mdlog(ro) read_head >>>> Dec 07 15:35:48 node02 bash[692598]: debug -45> >>>> 2023-12-07T13:35:47.738+0000 7f4cd1d52700 4 mds.0.log Waiting >>>> for journal 0x200 to recover... >>>> Dec 07 15:35:48 node02 bash[692598]: debug -44> >>>> 2023-12-07T13:35:47.738+0000 7f4cdbd66700 10 monclient: >>>> get_auth_request con 0x55c280dc7c00 auth_method 0 >>>> Dec 07 15:35:48 node02 bash[692598]: debug -43> >>>> 2023-12-07T13:35:47.738+0000 7f4cd2553700 1 >>>> mds.0.journaler.mdlog(ro) _finish_read_head loghead(trim >>>> 1416940748800, expire 1416947000701, write 1417125359769, >>>> stream_format 1). probing for end of log (from 1417125359769)... >>>> Dec 07 15:35:48 node02 bash[692598]: debug -42> >>>> 2023-12-07T13:35:47.738+0000 7f4cd2553700 1 >>>> mds.0.journaler.mdlog(ro) probing for end of the log >>>> Dec 07 15:35:48 node02 bash[692598]: debug -41> >>>> 2023-12-07T13:35:47.738+0000 7f4cdb565700 10 monclient: >>>> get_auth_request con 0x55c280e2fc00 auth_method 0 >>>> Dec 07 15:35:48 node02 bash[692598]: debug -40> >>>> 2023-12-07T13:35:47.738+0000 7f4cdad64700 10 monclient: >>>> get_auth_request con 0x55c280ee0400 auth_method 0 >>>> Dec 07 15:35:48 node02 bash[692598]: debug -39> >>>> 2023-12-07T13:35:47.738+0000 7f4cd2553700 1 >>>> mds.0.journaler.mdlog(ro) _finish_probe_end write_pos = >>>> 1417129492480 (header had 1417125359769). recovered. >>>> Dec 07 15:35:48 node02 bash[692598]: debug -38> >>>> 2023-12-07T13:35:47.738+0000 7f4cd1d52700 4 mds.0.log Journal >>>> 0x200 recovered. >>>> Dec 07 15:35:48 node02 bash[692598]: debug -37> >>>> 2023-12-07T13:35:47.738+0000 7f4cd1d52700 4 mds.0.log Recovered >>>> journal 0x200 in format 1 >>>> Dec 07 15:35:48 node02 bash[692598]: debug -36> >>>> 2023-12-07T13:35:47.738+0000 7f4cd1d52700 2 mds.0.41 Booting: 1: >>>> loading/discovering base inodes >>>> Dec 07 15:35:48 node02 bash[692598]: debug -35> >>>> 2023-12-07T13:35:47.738+0000 7f4cd1d52700 0 mds.0.cache creating >>>> system inode with ino:0x100 >>>> Dec 07 15:35:48 node02 bash[692598]: debug -34> >>>> 2023-12-07T13:35:47.738+0000 7f4cd1d52700 0 mds.0.cache creating >>>> system inode with ino:0x1 >>>> Dec 07 15:35:48 node02 bash[692598]: debug -33> >>>> 2023-12-07T13:35:47.742+0000 7f4cdbd66700 10 monclient: >>>> get_auth_request con 0x55c280dc7400 auth_method 0 >>>> Dec 07 15:35:48 node02 bash[692598]: debug -32> >>>> 2023-12-07T13:35:47.742+0000 7f4cd2553700 2 mds.0.41 Booting: 2: >>>> replaying mds log >>>> Dec 07 15:35:48 node02 bash[692598]: debug -31> >>>> 2023-12-07T13:35:47.742+0000 7f4cd2553700 2 mds.0.41 Booting: 2: >>>> waiting for purge queue recovered >>>> Dec 07 15:35:48 node02 bash[692598]: debug -30> >>>> 2023-12-07T13:35:47.742+0000 7f4cd3555700 1 >>>> mds.0.journaler.pq(ro) _finish_probe_end write_pos = 14797504512 >>>> (header had 14789452521). recovered. >>>> Dec 07 15:35:48 node02 bash[692598]: debug -29> >>>> 2023-12-07T13:35:47.742+0000 7f4cd3555700 4 mds.0.purge_queue >>>> operator(): open complete >>>> Dec 07 15:35:48 node02 bash[692598]: debug -28> >>>> 2023-12-07T13:35:47.742+0000 7f4cd3555700 4 mds.0.purge_queue >>>> operator(): recovering write_pos >>>> Dec 07 15:35:48 node02 bash[692598]: debug -27> >>>> 2023-12-07T13:35:47.742+0000 7f4cdb565700 10 monclient: >>>> get_auth_request con 0x55c280bc5c00 auth_method 0 >>>> Dec 07 15:35:48 node02 bash[692598]: debug -26> >>>> 2023-12-07T13:35:47.742+0000 7f4cdad64700 10 monclient: >>>> get_auth_request con 0x55c280ee0c00 auth_method 0 >>>> Dec 07 15:35:48 node02 bash[692598]: debug -25> >>>> 2023-12-07T13:35:47.746+0000 7f4cd3555700 0 >>>> mds.0.journaler.pq(ro) _finish_read got error -2 >>>> Dec 07 15:35:48 node02 bash[692598]: debug -24> >>>> 2023-12-07T13:35:47.746+0000 7f4cd3555700 -1 mds.0.purge_queue >>>> _recover: Error -2 recovering write_pos >>>> Dec 07 15:35:48 node02 bash[692598]: debug -23> >>>> 2023-12-07T13:35:47.746+0000 7f4cd3555700 1 mds.0.purge_queue >>>> _go_readonly: going readonly because internal IO failed: No such >>>> file or directory >>>> Dec 07 15:35:48 node02 bash[692598]: debug -22> >>>> 2023-12-07T13:35:47.746+0000 7f4cd3555700 1 >>>> mds.0.journaler.pq(ro) set_readonly >>>> Dec 07 15:35:48 node02 bash[692598]: debug -21> >>>> 2023-12-07T13:35:47.746+0000 7f4cd3555700 -1 mds.0.41 unhandled >>>> write error (2) No such file or directory, force readonly... >>>> Dec 07 15:35:48 node02 bash[692598]: debug -20> >>>> 2023-12-07T13:35:47.746+0000 7f4cd3555700 1 mds.0.cache force >>>> file system read-only >>>> Dec 07 15:35:48 node02 bash[692598]: debug -19> >>>> 2023-12-07T13:35:47.746+0000 7f4cd3555700 0 log_channel(cluster) >>>> log [WRN] : force file system read-only >>>> ---snip---

4 months, 2 weeks

2
1
0 0

2024

2023

2022

2021

2020

2019

Dev December 2023