Hi all, I have observed that in a Nautilus (14.2.6) cluster the mds
process on the MDS server consumes a large amount of memory; for
example, on an MDS server with 128 GB of RAM I have observed the mds
process consuming ~80 GB:
ceph 20 0 78,8g 77,1g 13772 S 4,0 61,5 28:37.62 ceph-mds
Sometimes I have observed consumption of 127 GB, at which point the MDS
starts to give slow request errors. In theory the configuration limits
mds_cache_memory_limit to 64 GB:
[mds]
mds_cache_memory_limit = 68719476736
client_oc_size = 104857600
mds_min_caps_per_client = 200
mds_log_max_expiring = 200
mds_log_max_segments = 250
What could be the reason for this excessive memory consumption?
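A quick sanity check might be to compare the configured limit with what the daemon itself reports (a sketch; daemon name "mds01" assumed):

$ ceph daemon mds.mds01 config get mds_cache_memory_limit   # limit the running daemon actually uses
$ ceph daemon mds.mds01 cache status                        # cache size as the MDS accounts it
$ ceph tell mds.mds01 heap stats                            # tcmalloc view: in-use vs. freed-but-unreturned
$ ceph tell mds.mds01 heap release                          # ask tcmalloc to return free pages to the OS

The limit bounds only the metadata cache itself; allocator fragmentation and pages tcmalloc has not yet returned can push the resident size well beyond it.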
--
*******************************************************
Andrés Rojas Guerrero
Unidad Sistemas Linux
Area Arquitectura Tecnológica
Secretaría General Adjunta de Informática
Consejo Superior de Investigaciones Científicas (CSIC)
Pinar 19
28006 - Madrid
Tel: +34 915680059 -- Ext. 990059
email: a.rojas(a)csic.es
ID comunicate.csic.es: @50852720l:matrix.csic.es
*******************************************************
Hi,
We decided to delete the pool after waiting four days for the snaptrim to finish.
Now we have a bigger issue: many OSDs started to flap, and two of them cannot even restart afterwards.
I ran a BlueStore fsck on the OSDs that won't start, and it has many messages like this inside:
2021-05-17 18:37:07.176203 7f416d20bec0 10 stupidalloc 0x0x564e4e804f50 init_add_free 0x482d0778000~4000
2021-05-17 18:37:07.176204 7f416d20bec0 10 freelist enumerate_next 0x482d0784000~4000
2021-05-17 18:37:07.176204 7f416d20bec0 10 stupidalloc 0x0x564e4e804f50 init_add_free 0x482d0784000~4000
2021-05-17 18:37:07.176205 7f416d20bec0 10 freelist enumerate_next 0x482d078c000~c000
2021-05-17 18:37:07.176206 7f416d20bec0 10 stupidalloc 0x0x564e4e804f50 init_add_free 0x482d078c000~c000
[root@hk-cephosd-2002 ~]# tail -f /tmp/ceph-osd-44-fsck.log
2021-05-17 18:39:16.466967 7f416d20bec0 20 bluefs _read_random read buffered 0x2cd6e8f~ed6 of 1:0x372e0700000+4200000
2021-05-17 18:39:16.467154 7f416d20bec0 20 bluefs _read_random got 3798
2021-05-17 18:39:16.467179 7f416d20bec0 10 bluefs _read_random h 0x564e4e658500 0x24d6e35~ee2 from file(ino 216551 size 0x43a382d mtime 2021-05-17 13:21:19.839668 bdev 1 allocated 4400000 extents [1:0x35bc7c00000+4400000])
2021-05-17 18:39:16.467186 7f416d20bec0 20 bluefs _read_random read buffered 0x24d6e35~ee2 of 1:0x35bc7c00000+4400000
2021-05-17 18:39:16.467409 7f416d20bec0 20 bluefs _read_random got 3810
and
uh oh, missing shared_blob
I've set buffered_io back to false, because when restarting the OSDs I always had to wait for degraded PGs to be fixed.
Many of the SSDs are being hammered at 100% at the moment, and I don't really know what to do to stop the process and bring back the 2 dead OSDs :/
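For reference, a deep fsck with a higher debug level can be run on a stopped OSD like this (a sketch; OSD id and paths assumed):

$ systemctl stop ceph-osd@44
$ ceph-bluestore-tool fsck --deep 1 --path /var/lib/ceph/osd/ceph-44 --log-file /tmp/ceph-osd-44-fsck.log --log-level 10
$ ceph-bluestore-tool repair --path /var/lib/ceph/osd/ceph-44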
A paste: https://justpaste.it/9bj3a
Some metrics (each column is one server's metrics, 3 servers total):
How it is hammering the SSDs: https://i.ibb.co/x3xm0Rj/ssds.png
Iowait is super high due to SSD utilization: https://i.ibb.co/683TR9y/iowait.png
Capacity seems to be coming back: https://i.ibb.co/mz4Lq2r/space.png
Thank you for the help.
Hi *,
I tried a minor update (14.2.9 --> 14.2.20) on our Ceph cluster today
and ended up with a damaged CephFS. It's rather urgent since no one can
really work right now, so any quick help is highly appreciated.
As for the update process, I followed the usual procedure; when all
MONs were finished I started to restart the OSDs, but suddenly our
CephFS became unresponsive (and still is).
I believe these lines are the critical ones:
---snap---
-12> 2021-05-18 09:53:01.488 7f7e9ed82700 5 mds.beacon.mds01
received beacon reply up:replay seq 906 rtt 0
-11> 2021-05-18 09:53:01.624 7f7e9f583700 10 monclient:
get_auth_request con 0x5608a5171600 auth_method 0
-10> 2021-05-18 09:53:03.732 7f7e94d6e700 -1
mds.0.journaler.mdlog(ro) try_read_entry: decode error from _is_readable
-9> 2021-05-18 09:53:03.732 7f7e94d6e700 0 mds.0.log _replay
journaler got error -22, aborting
-8> 2021-05-18 09:53:03.732 7f7e94d6e700 -1 log_channel(cluster)
log [ERR] : Error loading MDS rank 0: (22) Invalid argument
-7> 2021-05-18 09:53:03.732 7f7e94d6e700 5 mds.beacon.mds01
set_want_state: up:replay -> down:damaged
-6> 2021-05-18 09:53:03.732 7f7e94d6e700 10 log_client log_queue
is 1 last_log 1 sent 0 num 1 unsent 1 sending 1
-5> 2021-05-18 09:53:03.732 7f7e94d6e700 10 log_client will send
2021-05-18 09:53:03.735824 mds.mds01 (mds.0) 1 : cluster [ERR] Error
loading MDS rank 0: (22) Invalid argument
-4> 2021-05-18 09:53:03.732 7f7e94d6e700 10 monclient:
_send_mon_message to mon.ceph01 at v2:XXX.XXX.XXX.XXX:3300/0
-3> 2021-05-18 09:53:03.732 7f7e94d6e700 5 mds.beacon.mds01
Sending beacon down:damaged seq 907
-2> 2021-05-18 09:53:03.732 7f7e94d6e700 10 monclient:
_send_mon_message to mon.ceph01 at v2:XXX.XXX.XXX.XXX:3300/0
-1> 2021-05-18 09:53:03.908 7f7e9ed82700 5 mds.beacon.mds01
received beacon reply down:damaged seq 907 rtt 0.176001
0> 2021-05-18 09:53:03.908 7f7e94d6e700 1 mds.mds01 respawn!
---snap---
These logs are from the attempt to bring the MDS rank back up with
ceph mds repaired 0
I attached a longer excerpt of the log files in case it helps. Before
trying anything from the disaster recovery steps I'd like to ask for
your input, since one can easily damage things even more. The current
status is below; please let me know if more information is required.
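If it helps the diagnosis, a read-only inspection plus a backup export of the journal should be safe to run before anything destructive (a sketch, assuming rank 0 on a filesystem named "cephfs"):

cephfs-journal-tool --rank=cephfs:0 journal inspect
cephfs-journal-tool --rank=cephfs:0 journal export /root/mds-journal-backup.bin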
Thanks!
Eugen
ceph01:~ # ceph -s
  cluster:
    id:     655cb05a-435a-41ba-83d9-8549f7c36167
    health: HEALTH_ERR
            1 filesystem is degraded
            1 filesystem is offline
            1 mds daemon damaged
            noout flag(s) set
            Some pool(s) have the nodeep-scrub flag(s) set

  services:
    mon: 3 daemons, quorum ceph01,ceph02,ceph03 (age 116m)
    mgr: ceph03(active, since 118m), standbys: ceph02, ceph01
    mds: cephfs:0/1 3 up:standby, 1 damaged
    osd: 32 osds: 32 up (since 64m), 32 in (since 8w)
         flags noout

  data:
    pools:   14 pools, 512 pgs
    objects: 5.08M objects, 8.6 TiB
    usage:   27 TiB used, 33 TiB / 59 TiB avail
    pgs:     512 active+clean
Hi,
sorry for replying to this old thread:
I tried to add a block.db to an OSD, but now the OSD cannot start, with this
error:
Mai 17 09:50:38 s3db10.fra2.gridscale.it ceph-osd[26038]: -7> 2021-05-17
09:50:38.362 7fc48ec94a80 -1 rocksdb: Corruption: CURRENT file does not end
with newline
Mai 17 09:50:38 s3db10.fra2.gridscale.it ceph-osd[26038]: -6> 2021-05-17
09:50:38.362 7fc48ec94a80 -1 bluestore(/var/lib/ceph/osd/ceph-68) _open_db
erroring opening db:
Mai 17 09:50:38 s3db10.fra2.gridscale.it ceph-osd[26038]: -1> 2021-05-17
09:50:38.866 7fc48ec94a80 -1
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/gigantic/release/14.2.21/rpm/el7/BUILD/ceph-14.2.21/src/os/bluestore/BlueStore.cc:
In function 'int BlueStore::_upgrade_super()' thread 7fc48ec94a80 time
2021-05-17 09:50:38.865204
Mai 17 09:50:38 s3db10.fra2.gridscale.it ceph-osd[26038]:
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/gigantic/release/14.2.21/rpm/el7/BUILD/ceph-14.2.21/src/os/bluestore/BlueStore.cc:
10647: FAILED ceph_assert(ondisk_format > 0)
I tried to run an fsck/repair on the disk:
[root@s3db10 osd]# ceph-bluestore-tool --path ceph-68 repair
2021-05-17 10:05:25.695 7f714dea3ec0 -1 rocksdb: Corruption: CURRENT file
does not end with newline
2021-05-17 10:05:25.695 7f714dea3ec0 -1 bluestore(ceph-68) _open_db
erroring opening db:
error from fsck: (5) Input/output error
[root@s3db10 osd]# ceph-bluestore-tool --path ceph-68 fsck
2021-05-17 10:05:35.012 7fb8f22e6ec0 -1 rocksdb: Corruption: CURRENT file
does not end with newline
2021-05-17 10:05:35.012 7fb8f22e6ec0 -1 bluestore(ceph-68) _open_db
erroring opening db:
error from fsck: (5) Input/output error
These are the steps I did to add the disk:
$ CEPH_ARGS="--bluestore-block-db-size 53687091200
--bluestore_block_db_create=true" ceph-bluestore-tool bluefs-bdev-new-db
--path /var/lib/ceph/osd/ceph-68 --dev-target /dev/sdj1
$ chown -h ceph:ceph /var/lib/ceph/osd/ceph-68/block.db
$ lvchange --addtag ceph.db_device=/dev/sdj1
/dev/ceph-3bbfd168-2a54-4593-a037-80d0d7e97afd/osd-block-aaeaea54-eb6a-480c-b2fd-d938e336c0f6
$ lvchange --addtag ceph.db_uuid=463dd37c-fd49-4ccb-849f-c5827d3d9df2
/dev/ceph-3bbfd168-2a54-4593-a037-80d0d7e97afd/osd-block-aaeaea54-eb6a-480c-b2fd-d938e336c0f6
$ ceph-volume lvm activate --all
The UUIDs
Later I tried this:
$ ceph-bluestore-tool --path /var/lib/ceph/osd/ceph-68 --devs-source
/var/lib/ceph/osd/ceph-68/block --dev-target
/var/lib/ceph/osd/ceph-68/block.db bluefs-bdev-migrate
Any ideas how I can get the RocksDB fixed?
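One diagnostic that might help is exporting the BlueFS contents and inspecting the RocksDB CURRENT file directly (a sketch; the output directory is arbitrary):

$ ceph-bluestore-tool bluefs-export --path /var/lib/ceph/osd/ceph-68 --out-dir /tmp/osd-68-bluefs
$ cat /tmp/osd-68-bluefs/db/CURRENT
# a healthy CURRENT holds a single line naming the active manifest,
# e.g. "MANIFEST-000123", terminated by a newline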
--
This time, as an exception, the "UTF-8 problems" self-help group will meet
in the large hall.
I am increasing the size of my pool from 24 TiB to 28 TiB. I execute the
change via the Ceph dashboard and it reports that everything is OK, but the
real value does not change; it stays at the same 24 TiB. Could this be a bug?
I had managed to increase it up to 24 TiB before. I should be able to increase
it to 34 TiB, which is the size of the cluster with replica 2, but it is not
working.
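If the value being changed in the dashboard is the pool quota, the equivalent CLI to set and verify it would be roughly this (pool name "mypool" assumed):

$ ceph osd pool set-quota mypool max_bytes $((28 * 1024**4))   # 28 TiB
$ ceph osd pool get-quota mypool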
Any idea?
Tks
Hi,
A pool has been deleted which contained an image related to images in other pools, and now I can't remove the image because it has snapshots. I can't list the snapshots, can't purge them, can't flatten; I can't really do anything.
Is it possible to remove the image without deleting the pool?
2021-05-18 14:55:26.710 7f8bef381c80 -1 librbd::image::PreRemoveRequest: 0x56301203ac00 check_image_snaps: image has snapshots - not removing
Removing image: 0% complete...failed.
2021-05-18 14:55:26.722 7f8bccff9700 -1 librbd::image::RefreshRequest: failed to refresh parent image: (2) No such file or directory
2021-05-18 14:55:26.723 7f8bc7fff700 -1 librbd::image::OpenRequest: failed to refresh image: (2) No such file or directory
rbd: error opening image clone_prod_financedb_lv1_20210513_160225: (2) No such file or directory
rbd: image has snapshots with linked clones - these must be deleted or flattened before the image can be removed.
Or do we need to find the original image and upload it back?
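For context, this is the sequence that would normally clear linked clones before removing an image, though every step fails here because the parent pool is gone (pool, snapshot and child names are illustrative):

$ rbd snap ls mypool/clone_prod_financedb_lv1_20210513_160225
$ rbd children mypool/clone_prod_financedb_lv1_20210513_160225@snap1
$ rbd flatten mypool/child_image                 # detach each clone from its parent
$ rbd snap unprotect mypool/clone_prod_financedb_lv1_20210513_160225@snap1
$ rbd snap purge mypool/clone_prod_financedb_lv1_20210513_160225
$ rbd rm mypool/clone_prod_financedb_lv1_20210513_160225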
Thx
I have a ceph cluster with 6 OSD servers.
2 are running 16.2.4 and logrotate failed with this message:
/etc/cron.daily/logrotate:
error: Compressing program wrote following message to stderr when compressing log /var/log/ceph/ceph-osd.37.log-20210518:
gzip: stdin: file size changed while zipping
The four 16.2.3 OSD servers went fine.
The /etc/logrotate.d/ceph file is the same between versions. I think there might be a bug in the handling of the SIGHUP signal (-1).
I am seeing these messages when booting from RBD and booting hangs there.
libceph: get_reply osd2 tid 1459933 data 3248128 > preallocated
131072, skipping
However, Ceph health is OK, so I have no idea what is going on. I
reboot my 3-node cluster and it works again for about two weeks.
How can I find out more about this issue, and how can I dig deeper?
There has also been at least one report about this issue on this
mailing list before - "[ceph-users] Strange Data Issue - Unexpected client
hang on OSD I/O Error" - but no solution was presented.
That report was from 2018, so I have no idea whether this is still an
issue for Dyweni, the original reporter. If you read this, I would be
happy to hear how you solved the problem.
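One way to dig deeper might be the kernel's dynamic debug facility for the rbd/libceph modules (very verbose; requires debugfs):

# mount -t debugfs none /sys/kernel/debug 2>/dev/null
# echo 'module libceph +p' > /sys/kernel/debug/dynamic_debug/control
# echo 'module rbd +p' > /sys/kernel/debug/dynamic_debug/control

Then reproduce the hang, check dmesg for the context around the get_reply message, and switch the logging back off with '-p'.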
Cheers
Markus
Hi,
The user deleted 20-30 snapshots and clones from the cluster, and it seems to slow down the whole system.
I've set the snaptrim parameters as low as possible and set buffered_io to true so the user at least gets some speed, and I can see that object removal from the cluster is still happening (it started at 45 million objects and is now at 19 million), but what I don't understand is that many OSDs are getting fuller :(
And the snaptrim is super slow: 195 PGs are in snaptrim_wait and 36 in snaptrim, but only 1 completes every 5 hours :/
What can I do? One of the OSDs was at 62% and is now at 75% after 2 days, and it is still growing. Should I set the snap options back, or something else?
The cluster has 3 servers, running Luminous 12.2.8.
Some paste: https://jpst.it/2vw4H
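For reference, these are the snaptrim throttling knobs that can be tuned at runtime (values illustrative; option names as in Luminous):

$ ceph tell 'osd.*' injectargs '--osd_snap_trim_sleep 0'      # seconds to sleep between trim ops; 0 = no delay
$ ceph tell 'osd.*' injectargs '--osd_max_trimming_pgs 2'     # PGs trimming concurrently per OSD
$ ceph tell 'osd.*' injectargs '--osd_snap_trim_priority 5'   # scheduling priority of trim work
$ ceph daemon osd.0 config show | grep snap_trim              # verify on one OSD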
Thank you
Hi all:
ceph version: 15.2.7 (88e41c6c49beb18add4fdb6b4326ca466d931db8)
I have a strange question. I just created a multisite setup for a Ceph
cluster, but I notice that the old data in the source cluster is not synced;
only new data is synced to the second-zone cluster.
Is there anything I need to do to enable a full sync for the buckets, or is
this a bug?
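If it's a sync-state issue rather than a bug, re-initializing the data sync on the secondary zone is one thing to try (a sketch; source zone name "primary" assumed, run on the secondary side):

$ radosgw-admin sync status                          # look for stuck or behind shards first
$ radosgw-admin data sync init --source-zone=primary
$ systemctl restart ceph-radosgw@<instance>          # sync resumes with a full-sync pass

A bucket-scoped alternative might be "radosgw-admin bucket sync run --bucket=<name> --source-zone=primary".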
Thanks