I recently upgraded from 13.2.2 to 13.2.8 and observe two changes that I struggle with:
- from release notes: The bluestore_cache_* options are no longer needed. They are replaced by osd_memory_target, defaulting to 4GB.
- the default for bluestore_allocator has changed from stupid to bitmap,
which seem to conflict with each other, or at least I seem unable to achieve what I want.
I have a number of OSDs for which I would like to increase the cache size. In the past I used bluestore_cache_size=8G and it worked like a charm. I have now changed that to osd_memory_target=8G, without any effect: the usage stays at 4G and the virtual size is about 5G, whereas I would expect both to be close to 8G. The read cache for these OSDs usually fills up within a few hours, and the cluster has now been running with the new config for a few days, to no avail.
The documentation of osd_memory_target refers to tcmalloc a lot. Is this in conflict with allocator=bitmap? If so, what is the way to tune cache sizes (say, if tcmalloc is not used, and how does one check?)? Are the bluestore_cache_* options indeed obsolete, as the release notes above suggest, or is that not true?
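For reference, here is how I set and checked the value (a sketch; osd.0 stands for any of the affected OSDs, and ceph daemon must be run on the host where that OSD lives):

# Set the target cluster-wide; the value is in bytes (8 GiB here)
ceph config set osd osd_memory_target 8589934592

# Verify the value the running daemon actually uses
# (a daemon restart may be needed for the new value to take effect)
ceph daemon osd.0 config get osd_memory_target

# See how the memory budget is currently spread over the caches
ceph daemon osd.0 dump_mempools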
Many thanks for your help.
Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
Hello everyone,
For the second time now, we have a warning on a Ceph cluster about a large
omap object.
This object is supposed to be in default.rgw.log, except after running
listomapkeys on every object of that pool, we get 0 for every object:

for i in $(rados --cluster=ceph-par -p default.rgw.log ls); do
    echo -n "$i: "
    rados --cluster=ceph-par -p default.rgw.log listomapkeys "$i" | wc -l
done
As I understand it, objects in that pool are short-lived so the object
which triggered the warning does not even exist anymore. Am I mistaken?
The first time, we tried quite a lot of things: triggering a deep scrub
on the relevant PG, and even looking into other pools (just in case) for
objects with large omap counts (we found none).
The warning ended up going away on its own after a bit less than a week
iirc.
Is this a bug? Is there a way to clear this warning (if it is indeed about
a non-existing object)?
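For anyone else chasing this, the steps we used to try to locate the offending object (a sketch; the PG ID 5.1f is a placeholder):

# The health detail output names the pool when the warning fires
ceph health detail

# The OSD that detected it logs a line mentioning the object and key count
grep -i 'large omap' /var/log/ceph/ceph-osd.*.log

# A deep scrub of the PG in question refreshes the omap statistics
ceph pg deep-scrub 5.1f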
Thanks,
Hi,
I'd like to fix the CRUSH tree and CRUSH rules, and would like to know the
correct steps, plus the worst-case scenario of what can happen during the
maintenance.
The steps should be something like this (see the sketch after the list):
1. Create the rack-structured CRUSH tree under root default
2. Create the replicated CRUSH rules
3. Move the nodes (with SSDs and HDDs mixed) under the new CRUSH tree
4. Apply the new replicated rules
...
??
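For what it's worth, a minimal sketch of those steps with placeholder names (rack1, node1, and mypool are assumptions; the device-class rules require Luminous or later):

# 1. Create a rack bucket and place it under the default root
ceph osd crush add-bucket rack1 rack
ceph osd crush move rack1 root=default

# 2. Create device-class-aware replicated rules (failure domain: rack)
ceph osd crush rule create-replicated replicated_hdd default rack hdd
ceph osd crush rule create-replicated replicated_ssd default rack ssd

# 3. Move a host into the rack
ceph osd crush move node1 rack=rack1

# 4. Point each pool at its new rule
ceph osd pool set mypool crush_rule replicated_hdd

Both the host moves and the rule switch trigger data movement, which is the main worst-case risk during the maintenance.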
Here is the tree:
https://pastebin.com/raw/CuuzuBsz
Thank you
Dear all
Running nautilus 14.2.7. The data in the FS are important and cannot be
lost.
Today I increased the PGs of the volume pool from 8k to 16k. The active
MDS started reporting slow ops. (The filesystem is not in the volume
pool.) After a few hours the FS was very slow; I reduced the backfill to 1
and, since the situation was not improving, I restarted the MDS (no other
standby MDSs; it was a single MDS).
After that, the crash. The MDS does not go back up, with this error:
2020-02-07 07:03:32.477 7fbf69647700 -1 NetHandler create_socket couldn't
create socket (97) Address family not supported by protocol
2020-02-07 07:03:32.541 7fbf65e6a700 1 mds.ceph-mon-01 Updating MDS map
to version 48461 from mon.2
2020-02-07 07:03:37.613 7fbf65e6a700 1 mds.ceph-mon-01 Updating MDS map
to version 48462 from mon.2
2020-02-07 07:03:37.613 7fbf65e6a700 1 mds.ceph-mon-01 Map has assigned
me to become a standby
2020-02-07 07:14:11.789 7fbf66e42700 -1 received signal: Terminated
from /sbin/init (PID: 1) UID: 0
2020-02-07 07:14:11.789 7fbf66e42700 -1 mds.ceph-mon-01 *** got signal
Terminated ***
2020-02-07 07:14:11.789 7fbf66e42700 1 mds.ceph-mon-01 suicide! Wanted
state up:standby
2020-02-07 07:14:12.565 7fbf65e6a700 0 ms_deliver_dispatch: unhandled
message 0x563fcb438d00 mdsmap(e 48465) v1 from mon.2 v1:10.3.78.32:6789/0
2020-02-07 07:25:16.782 7f26c39de2c0 0 set uid:gid to 64045:64045
(ceph:ceph)
2020-02-07 07:25:16.782 7f26c39de2c0 0 ceph version 14.2.7
(3d58626ebeec02d8385a4cefb92c6cbc3a45bfe8) nautilus (stable), process
ceph-mds, pid 3724
2020-02-07 07:25:16.782 7f26c39de2c0 0 pidfile_write: ignore empty
--pid-file
2020-02-07 07:25:16.786 7f26b5326700 -1 NetHandler create_socket
couldn't create socket (97) Address family not supported by protocol
2020-02-07 07:25:16.790 7f26b1b49700 1 mds.ceph-mon-01 Updating MDS map
to version 48472 from mon.0
2020-02-07 07:25:17.691 7f26b1b49700 1 mds.ceph-mon-01 Updating MDS map
to version 48473 from mon.0
2020-02-07 07:25:17.691 7f26b1b49700 1 mds.ceph-mon-01 Map has assigned
me to become a standby
2020-02-07 07:29:50.306 7f26b2b21700 -1 received signal: Terminated
from /sbin/init (PID: 1) UID: 0
2020-02-07 07:29:50.306 7f26b2b21700 -1 mds.ceph-mon-01 *** got signal
Terminated ***
2020-02-07 07:29:50.306 7f26b2b21700 1 mds.ceph-mon-01 suicide! Wanted
state up:standby
2020-02-07 07:29:50.526 7f26b5b27700 1 mds.beacon.ceph-mon-01
discarding unexpected beacon reply down:dne seq 70 dne
2020-02-07 07:29:52.802 7f26b1b49700 0 ms_deliver_dispatch: unhandled
message 0x55ef110ab200 mdsmap(e 48474) v1 from mon.0 v1:10.3.78.22:6789/0
Rebooting did not help.
I asked in #ceph on OFTC and they suggested bringing up another "fresh"
MDS. I did that, and it does not start either, going to standby. Logs:
2020-02-07 07:12:46.696 7fe4b388b2c0 0 set uid:gid to 64045:64045
(ceph:ceph)
2020-02-07 07:12:46.696 7fe4b388b2c0 0 ceph version 14.2.7
(3d58626ebeec02d8385a4cefb92c6cbc3a45bfe8) nautilus (stable), process
ceph-mds, pid 74742
2020-02-07 07:12:46.696 7fe4b388b2c0 0 pidfile_write: ignore empty
--pid-file
2020-02-07 07:12:46.704 7fe4a19f6700 1 mds.ceph-mon-02 Updating MDS map
to version 48462 from mon.0
2020-02-07 07:12:47.456 7fe4a19f6700 1 mds.ceph-mon-02 Updating MDS map
to version 48463 from mon.0
2020-02-07 07:12:47.456 7fe4a19f6700 1 mds.ceph-mon-02 Map has assigned
me to become a standby
2020-02-07 07:14:16.615 7fe4a29ce700 -1 received signal: Terminated
from /sbin/init (PID: 1) UID: 0
2020-02-07 07:14:16.615 7fe4a29ce700 -1 mds.ceph-mon-02 *** got signal
Terminated ***
2020-02-07 07:14:16.615 7fe4a29ce700 1 mds.ceph-mon-02 suicide! Wanted
state up:standby
2020-02-07 07:14:16.947 7fe4a51d3700 1 mds.beacon.ceph-mon-02
discarding unexpected beacon reply down:dne seq 24 dne
2020-02-07 07:14:18.715 7fe4a19f6700 0 ms_deliver_dispatch: unhandled
message 0x5602fbc6df80 mdsmap(e 48466) v1 from mon.0 v2:10.3.78.22:3300/0
2020-02-07 07:25:02.093 7f3c2f92a2c0 0 set uid:gid to 64045:64045
(ceph:ceph)
2020-02-07 07:25:02.093 7f3c2f92a2c0 0 ceph version 14.2.7
(3d58626ebeec02d8385a4cefb92c6cbc3a45bfe8) nautilus (stable), process
ceph-mds, pid 75471
2020-02-07 07:25:02.093 7f3c2f92a2c0 0 pidfile_write: ignore empty
--pid-file
2020-02-07 07:25:02.097 7f3c1da95700 1 mds.ceph-mon-02 Updating MDS map
to version 48471 from mon.2
2020-02-07 07:25:06.413 7f3c1da95700 1 mds.ceph-mon-02 Updating MDS map
to version 48472 from mon.2
2020-02-07 07:25:06.413 7f3c1da95700 1 mds.ceph-mon-02 Map has assigned
me to become a standby
2020-02-07 07:29:56.869 7f3c1ea6d700 -1 received signal: Terminated
from /sbin/init (PID: 1) UID: 0
2020-02-07 07:29:56.869 7f3c1ea6d700 -1 mds.ceph-mon-02 *** got signal
Terminated ***
2020-02-07 07:29:56.869 7f3c1ea6d700 1 mds.ceph-mon-02 suicide! Wanted
state up:standby
2020-02-07 07:29:58.113 7f3c1da95700 0 ms_deliver_dispatch: unhandled
message 0x563c5df33f80 mdsmap(e 48475) v1 from mon.2 v2:10.3.78.32:3300/0
Here is the ceph status output:
  cluster:
    id:     a8dde71d-ca7b-4cf5-bd38-8989c6a27011
    health: HEALTH_ERR
            1 filesystem is degraded
            1 filesystem is offline
            1 mds daemon damaged
            2 daemons have recently crashed

  services:
    mon: 3 daemons, quorum ceph-mon-01,ceph-mon-02,ceph-mon-03 (age 41m)
    mgr: ceph-mon-02(active, since 41m), standbys: ceph-mon-03, ceph-mon-01
    mds: pawsey-sync-fs:0/1, 1 damaged
    osd: 925 osds: 715 up (since 2h), 715 in (since 23h)
    rgw: 3 daemons active (radosgw-01, radosgw-02, radosgw-03)

  data:
    pools:   24 pools, 26569 pgs
    objects: 52.64M objects, 199 TiB
    usage:   685 TiB used, 6.7 PiB / 7.3 PiB avail
    pgs:     26513 active+clean
             54    active+clean+scrubbing+deep
             2     active+clean+scrubbing
Ceph osd ls detail: https://pastebin.com/raw/bxi4HSa5
The metadata pool is on NVMe.
Can anyone give me some help?
Any commands I run, such as journal repairs, do not work, as they expect
the MDS to be up.
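For completeness, the usual first offline steps for a rank marked damaged, which I have been looking at (a sketch, not a recipe; the filesystem name comes from the status output above, and any repair should only follow a successful inspect and backup):

# Inspect the journal while the MDS is down (the tool runs offline)
cephfs-journal-tool --rank=pawsey-sync-fs:0 journal inspect

# Back up the journal before touching anything
cephfs-journal-tool --rank=pawsey-sync-fs:0 journal export backup.bin

# Once the underlying damage is addressed, clear the damaged flag
ceph mds repaired pawsey-sync-fs:0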
Thanks
Cheers
--
Luca Cervigni
Infrastructure Architect
Tel. +61864368802
Pawsey Supercomputing Centre
1 Bryce Ave, Kensington WA 6151
Australia
Hi Ceph Community.
We currently have a luminous cluster running and some machines still on Ubuntu 14.04
We are looking to upgrade these machines to 18.04 but the only upgrade path for luminous with the ceph repo is through 16.04.
Getting to Mimic is doable, but it means upgrading all those machines to 16.04 first and then upgrading again to 18.04 once we are on Mimic; it is becoming a huge time sink.
I did notice that the Ubuntu repos have added 12.2.12 in the 18.04.4 release. Is this a reliable build we can use?
https://ubuntu.pkgs.org/18.04/ubuntu-proposed-main-amd64/ceph_12.2.12-0ubun…
If so then we can go straight to 18.04.4 and not waste so much time.
Best
Thanks for your feedback
The Ganglia graphs are available here:
https://cernbox.cern.ch/index.php/s/0xBDVwNkRqcoGdF
Replying to the other questions:
- Free Memory in Ganglia is derived from "MemFree" in /proc/meminfo
- Memory Buffers in Ganglia is derived from "Buffers" in /proc/meminfo
- On this host the OSDs are 6 TB. On other hosts we have 10 TB OSDs
- "osd memory target" is set to ~4.5 GB (actually, while debugging this
issue, I have just lowered the value to 3.2 GB)
- "ceph tell osd.x heap stats" basically always reports 0 (or a very low
value) for "Bytes in page heap freelist", and a heap release doesn't change
the memory usage (exact commands below)
- I can agree that swap is antiquated, but so far it was simply not used
and didn't cause any problems. At any rate, I am now going to remove the
swap (or set the swappiness to 0).
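For completeness, the exact commands behind the numbers above (a sketch; osd.0 stands for any OSD on the host):

# Ask tcmalloc how much freed memory it is still holding
ceph tell osd.0 heap stats

# Return any free-listed pages to the OS
ceph tell osd.0 heap release

# Break down where the OSD's memory budget is actually going
ceph daemon osd.0 dump_mempools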
Thanks again !
Cheers, Massimo
On Thu, Feb 6, 2020 at 6:28 PM Anthony D'Atri <aad(a)dreamsnake.net> wrote:
> Attachments are usually filtered by mailing lists. Yours did not come
> through. A URL to Skitch or some other hosting works better.
>
> Your kernel version sounds like RHEL / CentOS? I can say that memory
> accounting definitely did change between upstream 3.19 and 4.9
>
>
> osd04-cephstorage1-gsc:~ # head /proc/meminfo
> MemTotal: 197524684 kB
> MemFree: 80388504 kB
> MemAvailable: 86055708 kB
> Buffers: 633768 kB
> Cached: 4705408 kB
> SwapCached: 0 kB
>
> Specifically, node_memory_Active as reported by node_exporter changes
> dramatically, and MemAvailable is the more meaningful metric. What is your
> “FreeMem” metric actually derived from?
>
> 64GB for 10 OSDs might be on the light side, how large are those OSDs?
>
> For sure swap is antiquated. If your systems have any swap provisioned at
> all, you’re doing it wrong. I’ve had good results setting vm.swappiness to 1.
>
> Do `ceph daemon osd.xx heap stats`, see if your OSD processes have much
> unused memory that has not been released to the OS. If they do, “heap
> release” can be useful.
>
>
>
> > On Feb 6, 2020, at 9:08 AM, Massimo Sgaravatto <
> massimo.sgaravatto(a)gmail.com> wrote:
> >
> > Dear all
> >
> > In mid-January I updated my ceph cluster from Luminous to Nautilus.
> >
> > Attached you can see the memory metrics collected on one OSD node (I see
> > the very same behavior on all OSD hosts) graphed via Ganglia
> > This is a CentOS 7 node, with 64 GB of RAM, hosting 10 OSDs.
> >
> > So before the update there were about 20 GB of FreeMem.
> > Now FreeMem is basically 0, but I see 20 GB of Buffers.
> >
> > I guess this triggered some swapping, probably because I forgot to
> > set vm.swappiness to 0 (it was set to 60, the default value).
> >
> > I was wondering if this is the expected behavior.
> >
> > PS: Actually, besides updating ceph, I also updated all the other
> > packages (yum update), so I am not sure that this different memory usage
> > is because of the ceph update.
> > For the record in this update the kernel was updated from 3.10.0-1062.1.2
> > to 3.10.0-1062.9.1
> >
> > Thanks, Massimo
> > _______________________________________________
> > ceph-users mailing list -- ceph-users(a)ceph.io
> > To unsubscribe send an email to ceph-users-leave(a)ceph.io
>
>
I'm trying to set up a cephx key to mount RBD images read-only. I have
the following two keys:
[client.rbd]
key = xxx
caps mgr = "profile rbd"
caps mon = "profile rbd"
caps osd = "profile rbd pool=rbd_vm"
[client.rbd-ro]
key = xxx
caps mgr = "profile rbd-read-only"
caps mon = "profile rbd"
caps osd = "profile rbd-read-only pool=rbd_vm"
The following works:
# rbd map --pool rbd_vm andras_test --name client.rbd
/dev/rbd0
and so does this:
# rbd map --pool rbd_vm andras_test --name client.rbd --read-only
/dev/rbd0
but using the rbd-ro key doesn't work:
# rbd map --pool rbd_vm andras_test --name client.rbd-ro --read-only
rbd: sysfs write failed
In some cases useful info is found in syslog - try "dmesg | tail".
rbd: map failed: (1) Operation not permitted
The logs only have the following:
[1281776.788709] libceph: mon4 10.128.150.14:6789 session established
[1281776.801747] libceph: client88900164 fsid
d7b33135-0940-4e48-8aa6-1d2026597c2f
The back end is mimic 13.2.8; the kernel is the CentOS kernel
3.10.0-957.27.2.el7.x86_64.
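One way to narrow this down (a diagnostic sketch using the pool and image from above): test the same credentials through librbd, which bypasses the kernel client, and double-check the caps the cluster actually stored.

# Show the caps as the cluster has them
ceph auth get client.rbd-ro

# Userspace read test via librbd; if this works, the caps allow
# reads and the problem is on the krbd side
rbd --name client.rbd-ro -p rbd_vm info andras_test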
Any ideas what I'm doing wrong here?
Andras
Hello,
if I have a pool with replica 3, what happens when one replica is corrupted?
I suppose Ceph detects the bad replica using checksums and replaces it with
a good one.
If I have a pool with replica 2, what happens?
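For context, a sketch of the commands involved when a scrub does flag an inconsistency (the PG ID 2.5 is a placeholder):

# Scrub errors surface as inconsistent PGs
ceph health detail

# List exactly which object and which shard failed its checksum
rados list-inconsistent-obj 2.5 --format=json-pretty

# Ask the primary to repair the PG from the authoritative copies
ceph pg repair 2.5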
Thanks,
Mario
Hello
I can't find a way to resolve my problem.
I lost an iSCSI gateway in a group of 4 gateways; there are 3 left. I can't delete the lost gateway from the host, and I can't change the owner of the resources owned by the lost gateway.
As a result, I have resources which are inaccessible from clients, and I can't reconfigure them because of the lost gateway.
Please tell me there is a way to remove a lost gateway and that I won't be stuck forever.
If I do
delete compute04.adm.local
it answers
Failed : Gateway deletion failed, gateway(s) unavailable:compute04.adm.local(UNKNOWN state)
I saw a reference to my problem in the thread "Error in add new ISCSI gateway", but unfortunately no answer seems to be available.
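In case it helps whoever answers: the last-resort approach I have seen discussed is editing the stored configuration directly, since ceph-iscsi keeps its state in a RADOS object. This is only a sketch under the assumption that the object is the default gateway.conf in the rbd pool; take backups first and stop the rbd-target-api services while editing.

# Back up the current ceph-iscsi configuration object
rados -p rbd get gateway.conf /tmp/gateway.conf.bak

# Edit a copy: remove the dead gateway from the gateways section and
# reassign the owner of its LUNs, then write the result back
cp /tmp/gateway.conf.bak /tmp/gateway.conf.new
vi /tmp/gateway.conf.new
rados -p rbd put gateway.conf /tmp/gateway.conf.new

# Restart the API on the surviving gateways to pick up the change
systemctl restart rbd-target-api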
Thanks for any help