Hi all, I have observed that in a Nautilus (14.2.6) cluster the mds
process on the MDS server consumes a large amount of memory; for
example, on an MDS server with 128 GB of RAM I have observed the mds
process consuming ~80 GB:
ceph 20 0 78,8g 77,1g 13772 S 4,0 61,5 28:37.62 ceph-mds
Sometimes I have observed consumption of 127 GB, at which point the MDS
starts to give slow request errors. In theory the configuration limits
mds_cache_memory_limit to 64 GB:
[mds]
mds_cache_memory_limit = 68719476736
client_oc_size = 104857600
mds_min_caps_per_client = 200
mds_log_max_expiring = 200
mds_log_max_segments = 250
What could be the reason for this excessive memory consumption?
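A quick sanity check might be to compare the configured limit with what the daemon itself reports (a sketch; daemon name "mds01" assumed):

$ ceph daemon mds.mds01 config get mds_cache_memory_limit   # limit the running daemon actually uses
$ ceph daemon mds.mds01 cache status                        # cache size as the MDS accounts it
$ ceph tell mds.mds01 heap stats                            # tcmalloc view: in-use vs. freed-but-unreturned
$ ceph tell mds.mds01 heap release                          # ask tcmalloc to return free pages to the OS

The limit bounds only the metadata cache itself; allocator fragmentation and pages tcmalloc has not yet returned can push the resident size well beyond it.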
--
*******************************************************
Andrés Rojas Guerrero
Unidad Sistemas Linux
Area Arquitectura Tecnológica
Secretaría General Adjunta de Informática
Consejo Superior de Investigaciones Científicas (CSIC)
Pinar 19
28006 - Madrid
Tel: +34 915680059 -- Ext. 990059
email: a.rojas(a)csic.es
ID comunicate.csic.es: @50852720l:matrix.csic.es
*******************************************************
Hi,
We decided to delete the pool after waiting four days for the snaptrim to finish.
Now we have a bigger issue: many OSDs started to flap, and two of them cannot even restart afterwards.
I ran a BlueStore fsck on the OSDs that won't start, and it has many messages like this inside:
2021-05-17 18:37:07.176203 7f416d20bec0 10 stupidalloc 0x0x564e4e804f50 init_add_free 0x482d0778000~4000
2021-05-17 18:37:07.176204 7f416d20bec0 10 freelist enumerate_next 0x482d0784000~4000
2021-05-17 18:37:07.176204 7f416d20bec0 10 stupidalloc 0x0x564e4e804f50 init_add_free 0x482d0784000~4000
2021-05-17 18:37:07.176205 7f416d20bec0 10 freelist enumerate_next 0x482d078c000~c000
2021-05-17 18:37:07.176206 7f416d20bec0 10 stupidalloc 0x0x564e4e804f50 init_add_free 0x482d078c000~c000
[root@hk-cephosd-2002 ~]# tail -f /tmp/ceph-osd-44-fsck.log
2021-05-17 18:39:16.466967 7f416d20bec0 20 bluefs _read_random read buffered 0x2cd6e8f~ed6 of 1:0x372e0700000+4200000
2021-05-17 18:39:16.467154 7f416d20bec0 20 bluefs _read_random got 3798
2021-05-17 18:39:16.467179 7f416d20bec0 10 bluefs _read_random h 0x564e4e658500 0x24d6e35~ee2 from file(ino 216551 size 0x43a382d mtime 2021-05-17 13:21:19.839668 bdev 1 allocated 4400000 extents [1:0x35bc7c00000+4400000])
2021-05-17 18:39:16.467186 7f416d20bec0 20 bluefs _read_random read buffered 0x24d6e35~ee2 of 1:0x35bc7c00000+4400000
2021-05-17 18:39:16.467409 7f416d20bec0 20 bluefs _read_random got 3810
and
uh oh, missing shared_blob
I've set buffered_io back to false, because when restarting the OSDs I always had to wait for degraded PGs to be fixed.
Many of the SSDs are being hammered at 100% at the moment, and I don't really know what to do to stop the process and bring back the 2 dead OSDs :/
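For reference, a deep fsck with a higher debug level can be run on a stopped OSD like this (a sketch; OSD id and paths assumed):

$ systemctl stop ceph-osd@44
$ ceph-bluestore-tool fsck --deep 1 --path /var/lib/ceph/osd/ceph-44 --log-file /tmp/ceph-osd-44-fsck.log --log-level 10
$ ceph-bluestore-tool repair --path /var/lib/ceph/osd/ceph-44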
A paste: https://justpaste.it/9bj3a
Some metrics (each column is one server's metrics, 3 servers total):
How it is hammering the SSDs: https://i.ibb.co/x3xm0Rj/ssds.png
Iowait is super high due to SSD utilization: https://i.ibb.co/683TR9y/iowait.png
Capacity seems to be coming back: https://i.ibb.co/mz4Lq2r/space.png
Thank you for the help.
Hi *,
I tried a minor update (14.2.9 --> 14.2.20) on our Ceph cluster today
and ended up with a damaged CephFS. It's rather urgent since no one can
really work right now, so any quick help is highly appreciated.
As for the update process, I followed the usual procedure; when all
MONs were finished I started to restart the OSDs, but suddenly our
CephFS became unresponsive (and still is).
I believe these lines are the critical ones:
---snap---
-12> 2021-05-18 09:53:01.488 7f7e9ed82700 5 mds.beacon.mds01
received beacon reply up:replay seq 906 rtt 0
-11> 2021-05-18 09:53:01.624 7f7e9f583700 10 monclient:
get_auth_request con 0x5608a5171600 auth_method 0
-10> 2021-05-18 09:53:03.732 7f7e94d6e700 -1
mds.0.journaler.mdlog(ro) try_read_entry: decode error from _is_readable
-9> 2021-05-18 09:53:03.732 7f7e94d6e700 0 mds.0.log _replay
journaler got error -22, aborting
-8> 2021-05-18 09:53:03.732 7f7e94d6e700 -1 log_channel(cluster)
log [ERR] : Error loading MDS rank 0: (22) Invalid argument
-7> 2021-05-18 09:53:03.732 7f7e94d6e700 5 mds.beacon.mds01
set_want_state: up:replay -> down:damaged
-6> 2021-05-18 09:53:03.732 7f7e94d6e700 10 log_client log_queue
is 1 last_log 1 sent 0 num 1 unsent 1 sending 1
-5> 2021-05-18 09:53:03.732 7f7e94d6e700 10 log_client will send
2021-05-18 09:53:03.735824 mds.mds01 (mds.0) 1 : cluster [ERR] Error
loading MDS rank 0: (22) Invalid argument
-4> 2021-05-18 09:53:03.732 7f7e94d6e700 10 monclient:
_send_mon_message to mon.ceph01 at v2:XXX.XXX.XXX.XXX:3300/0
-3> 2021-05-18 09:53:03.732 7f7e94d6e700 5 mds.beacon.mds01
Sending beacon down:damaged seq 907
-2> 2021-05-18 09:53:03.732 7f7e94d6e700 10 monclient:
_send_mon_message to mon.ceph01 at v2:XXX.XXX.XXX.XXX:3300/0
-1> 2021-05-18 09:53:03.908 7f7e9ed82700 5 mds.beacon.mds01
received beacon reply down:damaged seq 907 rtt 0.176001
0> 2021-05-18 09:53:03.908 7f7e94d6e700 1 mds.mds01 respawn!
---snap---
These logs are from the attempt to bring the MDS rank back up with
ceph mds repaired 0
I attached a longer excerpt of the log files in case it helps. Before
trying anything from the disaster recovery steps I'd like to ask for
your input, since one can easily damage things even more. The current
status is below; please let me know if more information is required.
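If it helps the diagnosis, a read-only inspection plus a backup export of the journal should be safe to run before anything destructive (a sketch, assuming rank 0 on a filesystem named "cephfs"):

cephfs-journal-tool --rank=cephfs:0 journal inspect
cephfs-journal-tool --rank=cephfs:0 journal export /root/mds-journal-backup.bin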
Thanks!
Eugen
ceph01:~ # ceph -s
  cluster:
    id:     655cb05a-435a-41ba-83d9-8549f7c36167
    health: HEALTH_ERR
            1 filesystem is degraded
            1 filesystem is offline
            1 mds daemon damaged
            noout flag(s) set
            Some pool(s) have the nodeep-scrub flag(s) set

  services:
    mon: 3 daemons, quorum ceph01,ceph02,ceph03 (age 116m)
    mgr: ceph03(active, since 118m), standbys: ceph02, ceph01
    mds: cephfs:0/1 3 up:standby, 1 damaged
    osd: 32 osds: 32 up (since 64m), 32 in (since 8w)
         flags noout

  data:
    pools:   14 pools, 512 pgs
    objects: 5.08M objects, 8.6 TiB
    usage:   27 TiB used, 33 TiB / 59 TiB avail
    pgs:     512 active+clean
Hi,
sorry for replying to this old thread:
I tried to add a block.db to an OSD, but now the OSD cannot start, with this
error:
Mai 17 09:50:38 s3db10.fra2.gridscale.it ceph-osd[26038]: -7> 2021-05-17
09:50:38.362 7fc48ec94a80 -1 rocksdb: Corruption: CURRENT file does not end
with newline
Mai 17 09:50:38 s3db10.fra2.gridscale.it ceph-osd[26038]: -6> 2021-05-17
09:50:38.362 7fc48ec94a80 -1 bluestore(/var/lib/ceph/osd/ceph-68) _open_db
erroring opening db:
Mai 17 09:50:38 s3db10.fra2.gridscale.it ceph-osd[26038]: -1> 2021-05-17
09:50:38.866 7fc48ec94a80 -1
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/gigantic/release/14.2.21/rpm/el7/BUILD/ceph-14.2.21/src/os/bluestore/BlueStore.cc:
In function 'int BlueStore::_upgrade_super()' thread 7fc48ec94a80 time
2021-05-17 09:50:38.865204
Mai 17 09:50:38 s3db10.fra2.gridscale.it ceph-osd[26038]:
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/gigantic/release/14.2.21/rpm/el7/BUILD/ceph-14.2.21/src/os/bluestore/BlueStore.cc:
10647: FAILED ceph_assert(ondisk_format > 0)
I tried to run an fsck/repair on the disk:
[root@s3db10 osd]# ceph-bluestore-tool --path ceph-68 repair
2021-05-17 10:05:25.695 7f714dea3ec0 -1 rocksdb: Corruption: CURRENT file
does not end with newline
2021-05-17 10:05:25.695 7f714dea3ec0 -1 bluestore(ceph-68) _open_db
erroring opening db:
error from fsck: (5) Input/output error
[root@s3db10 osd]# ceph-bluestore-tool --path ceph-68 fsck
2021-05-17 10:05:35.012 7fb8f22e6ec0 -1 rocksdb: Corruption: CURRENT file
does not end with newline
2021-05-17 10:05:35.012 7fb8f22e6ec0 -1 bluestore(ceph-68) _open_db
erroring opening db:
error from fsck: (5) Input/output error
These are the steps I did to add the disk:
$ CEPH_ARGS="--bluestore-block-db-size 53687091200
--bluestore_block_db_create=true" ceph-bluestore-tool bluefs-bdev-new-db
--path /var/lib/ceph/osd/ceph-68 --dev-target /dev/sdj1
$ chown -h ceph:ceph /var/lib/ceph/osd/ceph-68/block.db
$ lvchange --addtag ceph.db_device=/dev/sdj1
/dev/ceph-3bbfd168-2a54-4593-a037-80d0d7e97afd/osd-block-aaeaea54-eb6a-480c-b2fd-d938e336c0f6
$ lvchange --addtag ceph.db_uuid=463dd37c-fd49-4ccb-849f-c5827d3d9df2
/dev/ceph-3bbfd168-2a54-4593-a037-80d0d7e97afd/osd-block-aaeaea54-eb6a-480c-b2fd-d938e336c0f6
$ ceph-volume lvm activate --all
The UUIDs
Later I tried this:
$ ceph-bluestore-tool --path /var/lib/ceph/osd/ceph-68 --devs-source
/var/lib/ceph/osd/ceph-68/block --dev-target
/var/lib/ceph/osd/ceph-68/block.db bluefs-bdev-migrate
Any ideas how I can get the RocksDB fixed?
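One diagnostic that might help is exporting the BlueFS contents and inspecting the RocksDB CURRENT file directly (a sketch; the output directory is arbitrary):

$ ceph-bluestore-tool bluefs-export --path /var/lib/ceph/osd/ceph-68 --out-dir /tmp/osd-68-bluefs
$ cat /tmp/osd-68-bluefs/db/CURRENT
# a healthy CURRENT holds a single line naming the active manifest,
# e.g. "MANIFEST-000123", terminated by a newline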
--
This time, as an exception, the "UTF-8 problems" self-help group will meet
in the large hall.
I am increasing the size of my pool from 24 TiB to 28 TiB. I execute the
change via the Ceph dashboard and it reports that everything is OK, but the
real value does not change; it stays at the same 24 TiB. Could this be a bug?
I had managed to increase it up to 24 TiB before. I should be able to increase
it to 34 TiB, which is the size of the cluster with replica 2, but it is not
working.
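If the value being changed in the dashboard is the pool quota, the equivalent CLI to set and verify it would be roughly this (pool name "mypool" assumed):

$ ceph osd pool set-quota mypool max_bytes $((28 * 1024**4))   # 28 TiB
$ ceph osd pool get-quota mypool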
Any idea?
Tks
Hi,
A pool has been deleted which contained an image related to images in other pools, and now I can't remove the image because it has snapshots. I can't list the snapshots, can't purge them, can't flatten; I can't really do anything.
Is it possible to remove the image without deleting the pool?
2021-05-18 14:55:26.710 7f8bef381c80 -1 librbd::image::PreRemoveRequest: 0x56301203ac00 check_image_snaps: image has snapshots - not removing
Removing image: 0% complete...failed.
2021-05-18 14:55:26.722 7f8bccff9700 -1 librbd::image::RefreshRequest: failed to refresh parent image: (2) No such file or directory
2021-05-18 14:55:26.723 7f8bc7fff700 -1 librbd::image::OpenRequest: failed to refresh image: (2) No such file or directory
rbd: error opening image clone_prod_financedb_lv1_20210513_160225: (2) No such file or directory
rbd: image has snapshots with linked clones - these must be deleted or flattened before the image can be removed.
Or do we need to find the original image and upload it back?
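For context, this is the sequence that would normally clear linked clones before removing an image, though every step fails here because the parent pool is gone (pool, snapshot and child names are illustrative):

$ rbd snap ls mypool/clone_prod_financedb_lv1_20210513_160225
$ rbd children mypool/clone_prod_financedb_lv1_20210513_160225@snap1
$ rbd flatten mypool/child_image                 # detach each clone from its parent
$ rbd snap unprotect mypool/clone_prod_financedb_lv1_20210513_160225@snap1
$ rbd snap purge mypool/clone_prod_financedb_lv1_20210513_160225
$ rbd rm mypool/clone_prod_financedb_lv1_20210513_160225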
Thx
I have a ceph cluster with 6 OSD servers.
2 are running 16.2.4 and logrotate failed with this message:
/etc/cron.daily/logrotate:
error: Compressing program wrote following message to stderr when compressing log /var/log/ceph/ceph-osd.37.log-20210518:
gzip: stdin: file size changed while zipping
The four 16.2.3 OSD servers went fine.
The /etc/logrotate.d/ceph file is the same between versions. I think there might be a bug in the handling of the SIGHUP signal (-1).
I am seeing these messages when booting from RBD and booting hangs there.
libceph: get_reply osd2 tid 1459933 data 3248128 > preallocated
131072, skipping
However, Ceph health is OK, so I have no idea what is going on. I
reboot my 3-node cluster and it works again for about two weeks.
How can I find out more about this issue, and how can I dig deeper?
There has also been at least one report about this issue on this
mailing list before - "[ceph-users] Strange Data Issue - Unexpected client
hang on OSD I/O Error" - but no solution was presented.
That report was from 2018, so I have no idea whether this is still an
issue for Dyweni, the original reporter. If you read this, I would be
happy to hear how you solved the problem.
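One way to dig deeper might be the kernel's dynamic debug facility for the rbd/libceph modules (very verbose; requires debugfs):

# mount -t debugfs none /sys/kernel/debug 2>/dev/null
# echo 'module libceph +p' > /sys/kernel/debug/dynamic_debug/control
# echo 'module rbd +p' > /sys/kernel/debug/dynamic_debug/control

Then reproduce the hang, check dmesg for the context around the get_reply message, and switch the logging back off with '-p'.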
Cheers
Markus
Hi,
The user deleted 20-30 snapshots and clones from the cluster, and it seems to slow down the whole system.
I've set the snaptrim parameters as low as possible and set buffered_io to true so the user at least gets some speed, and I can see that object removal from the cluster is still happening (it started at 45 million objects and is now at 19 million), but what I don't understand is that many OSDs are getting fuller :(
And the snaptrim is super slow: 195 PGs are in snaptrim_wait and 36 in snaptrim, but only 1 completes every 5 hours :/
What can I do? One of the OSDs was at 62% and is now at 75% after 2 days, and it is still growing. Should I set the snap options back, or something else?
The cluster has 3 servers, running Luminous 12.2.8.
Some paste: https://jpst.it/2vw4H
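For reference, these are the snaptrim throttling knobs that can be tuned at runtime (values illustrative; option names as in Luminous):

$ ceph tell 'osd.*' injectargs '--osd_snap_trim_sleep 0'      # seconds to sleep between trim ops; 0 = no delay
$ ceph tell 'osd.*' injectargs '--osd_max_trimming_pgs 2'     # PGs trimming concurrently per OSD
$ ceph tell 'osd.*' injectargs '--osd_snap_trim_priority 5'   # scheduling priority of trim work
$ ceph daemon osd.0 config show | grep snap_trim              # verify on one OSD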
Thank you
Hi all:
ceph version: 15.2.7 (88e41c6c49beb18add4fdb6b4326ca466d931db8)
I have a strange question. I just created a multisite setup for a Ceph
cluster, but I notice that the old data in the source cluster is not synced;
only new data is synced to the second-zone cluster.
Is there anything I need to do to enable a full sync for the buckets, or is
this a bug?
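If it's a sync-state issue rather than a bug, re-initializing the data sync on the secondary zone is one thing to try (a sketch; source zone name "primary" assumed, run on the secondary side):

$ radosgw-admin sync status                          # look for stuck or behind shards first
$ radosgw-admin data sync init --source-zone=primary
$ systemctl restart ceph-radosgw@<instance>          # sync resumes with a full-sync pass

A bucket-scoped alternative might be "radosgw-admin bucket sync run --bucket=<name> --source-zone=primary".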
Thanks