I recently upgraded from 13.2.2 to 13.2.8 and observe two changes that I struggle with:
- from release notes: The bluestore_cache_* options are no longer needed. They are replaced by osd_memory_target, defaulting to 4GB.
- the default for bluestore_allocator has changed from stupid to bitmap,
which seem to conflict with each other, or at least I seem unable to achieve what I want.
I have a number of OSDs for which I would like to increase the cache size. In the past I used bluestore_cache_size=8G and it worked like a charm. I have now changed that to osd_memory_target=8G, without any effect: the usage stays at 4G and the virtual size is about 5G, whereas I would expect both to be close to 8G. The read cache for these OSDs usually fills up within a few hours, and the cluster has now been running with the new config for a few days, to no avail.
The documentation of osd_memory_target refers to tcmalloc a lot. Is this in conflict with allocator=bitmap? If so, what is the way to tune cache sizes (say, if tcmalloc is not used, and how does one check?)? Are the bluestore_cache_* options indeed obsolete, as the release notes above suggest, or is that not true?
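For reference, here is how I set and checked the value (a sketch; osd.0 stands for any of the affected OSDs, and ceph daemon must be run on the host where that OSD lives):

# Set the target cluster-wide; the value is in bytes (8 GiB here)
ceph config set osd osd_memory_target 8589934592

# Verify the value the running daemon actually uses
# (a daemon restart may be needed for the new value to take effect)
ceph daemon osd.0 config get osd_memory_target

# See how the memory budget is currently spread over the caches
ceph daemon osd.0 dump_mempools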
Many thanks for your help.
Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
Hello everyone,
For the second time now, we have a warning on a Ceph cluster about a large
omap object.
This object is supposed to be in default.rgw.log, except after running
listomapkeys on every object of that pool, we get 0 for every object:

for i in $(rados --cluster=ceph-par -p default.rgw.log ls); do
    echo -n "$i: "
    rados --cluster=ceph-par -p default.rgw.log listomapkeys "$i" | wc -l
done
As I understand it, objects in that pool are short-lived so the object
which triggered the warning does not even exist anymore. Am I mistaken?
The first time, we tried quite a lot of things: triggering a deep scrub
on the relevant PG, and even looking into other pools (just in case) for
objects with large omap counts (we found none).
The warning ended up going away on its own after a bit less than a week
iirc.
Is this a bug? Is there a way to clear this warning (if it is indeed about
a non-existing object)?
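For anyone else chasing this, the steps we used to try to locate the offending object (a sketch; the PG ID 5.1f is a placeholder):

# The health detail output names the pool when the warning fires
ceph health detail

# The OSD that detected it logs a line mentioning the object and key count
grep -i 'large omap' /var/log/ceph/ceph-osd.*.log

# A deep scrub of the PG in question refreshes the omap statistics
ceph pg deep-scrub 5.1f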
Thanks,
Hi,
I'd like to fix the CRUSH tree and CRUSH rules, and would like to know the
correct steps, plus the worst-case scenario of what can happen during the
maintenance.
The steps should be something like this (see the sketch after the list):
1. Create the rack-structured CRUSH tree under root default
2. Create the replicated CRUSH rules
3. Move the nodes (with SSDs and HDDs mixed) under the new CRUSH tree
4. Apply the new replicated rules
...
??
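For what it's worth, a minimal sketch of those steps with placeholder names (rack1, node1, and mypool are assumptions; the device-class rules require Luminous or later):

# 1. Create a rack bucket and place it under the default root
ceph osd crush add-bucket rack1 rack
ceph osd crush move rack1 root=default

# 2. Create device-class-aware replicated rules (failure domain: rack)
ceph osd crush rule create-replicated replicated_hdd default rack hdd
ceph osd crush rule create-replicated replicated_ssd default rack ssd

# 3. Move a host into the rack
ceph osd crush move node1 rack=rack1

# 4. Point each pool at its new rule
ceph osd pool set mypool crush_rule replicated_hdd

Both the host moves and the rule switch trigger data movement, which is the main worst-case risk during the maintenance.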
Here is the tree:
https://pastebin.com/raw/CuuzuBsz
Thank you
Dear all
Running nautilus 14.2.7. The data in the FS are important and cannot be
lost.
Today I increased the PGs of the volume pool from 8k to 16k. The active
MDS started reporting slow ops. (The filesystem is not in the volume
pool.) After a few hours the FS was very slow; I reduced the backfill to 1
and, since the situation was not improving, I restarted the MDS (no other
standby MDSs; it was a single MDS).
After that, the crash. The MDS does not go back up, with this error:
2020-02-07 07:03:32.477 7fbf69647700 -1 NetHandler create_socket couldn't
create socket (97) Address family not supported by protocol
2020-02-07 07:03:32.541 7fbf65e6a700 1 mds.ceph-mon-01 Updating MDS map
to version 48461 from mon.2
2020-02-07 07:03:37.613 7fbf65e6a700 1 mds.ceph-mon-01 Updating MDS map
to version 48462 from mon.2
2020-02-07 07:03:37.613 7fbf65e6a700 1 mds.ceph-mon-01 Map has assigned
me to become a standby
2020-02-07 07:14:11.789 7fbf66e42700 -1 received signal: Terminated
from /sbin/init (PID: 1) UID: 0
2020-02-07 07:14:11.789 7fbf66e42700 -1 mds.ceph-mon-01 *** got signal
Terminated ***
2020-02-07 07:14:11.789 7fbf66e42700 1 mds.ceph-mon-01 suicide! Wanted
state up:standby
2020-02-07 07:14:12.565 7fbf65e6a700 0 ms_deliver_dispatch: unhandled
message 0x563fcb438d00 mdsmap(e 48465) v1 from mon.2 v1:10.3.78.32:6789/0
2020-02-07 07:25:16.782 7f26c39de2c0 0 set uid:gid to 64045:64045
(ceph:ceph)
2020-02-07 07:25:16.782 7f26c39de2c0 0 ceph version 14.2.7
(3d58626ebeec02d8385a4cefb92c6cbc3a45bfe8) nautilus (stable), process
ceph-mds, pid 3724
2020-02-07 07:25:16.782 7f26c39de2c0 0 pidfile_write: ignore empty
--pid-file
2020-02-07 07:25:16.786 7f26b5326700 -1 NetHandler create_socket
couldn't create socket (97) Address family not supported by protocol
2020-02-07 07:25:16.790 7f26b1b49700 1 mds.ceph-mon-01 Updating MDS map
to version 48472 from mon.0
2020-02-07 07:25:17.691 7f26b1b49700 1 mds.ceph-mon-01 Updating MDS map
to version 48473 from mon.0
2020-02-07 07:25:17.691 7f26b1b49700 1 mds.ceph-mon-01 Map has assigned
me to become a standby
2020-02-07 07:29:50.306 7f26b2b21700 -1 received signal: Terminated
from /sbin/init (PID: 1) UID: 0
2020-02-07 07:29:50.306 7f26b2b21700 -1 mds.ceph-mon-01 *** got signal
Terminated ***
2020-02-07 07:29:50.306 7f26b2b21700 1 mds.ceph-mon-01 suicide! Wanted
state up:standby
2020-02-07 07:29:50.526 7f26b5b27700 1 mds.beacon.ceph-mon-01
discarding unexpected beacon reply down:dne seq 70 dne
2020-02-07 07:29:52.802 7f26b1b49700 0 ms_deliver_dispatch: unhandled
message 0x55ef110ab200 mdsmap(e 48474) v1 from mon.0 v1:10.3.78.22:6789/0
Rebooting did not help.
I asked in #ceph on OFTC and they suggested bringing up another "fresh"
MDS. I did that, and it does not start either, going to standby. Logs:
2020-02-07 07:12:46.696 7fe4b388b2c0 0 set uid:gid to 64045:64045
(ceph:ceph)
2020-02-07 07:12:46.696 7fe4b388b2c0 0 ceph version 14.2.7
(3d58626ebeec02d8385a4cefb92c6cbc3a45bfe8) nautilus (stable), process
ceph-mds, pid 74742
2020-02-07 07:12:46.696 7fe4b388b2c0 0 pidfile_write: ignore empty
--pid-file
2020-02-07 07:12:46.704 7fe4a19f6700 1 mds.ceph-mon-02 Updating MDS map
to version 48462 from mon.0
2020-02-07 07:12:47.456 7fe4a19f6700 1 mds.ceph-mon-02 Updating MDS map
to version 48463 from mon.0
2020-02-07 07:12:47.456 7fe4a19f6700 1 mds.ceph-mon-02 Map has assigned
me to become a standby
2020-02-07 07:14:16.615 7fe4a29ce700 -1 received signal: Terminated
from /sbin/init (PID: 1) UID: 0
2020-02-07 07:14:16.615 7fe4a29ce700 -1 mds.ceph-mon-02 *** got signal
Terminated ***
2020-02-07 07:14:16.615 7fe4a29ce700 1 mds.ceph-mon-02 suicide! Wanted
state up:standby
2020-02-07 07:14:16.947 7fe4a51d3700 1 mds.beacon.ceph-mon-02
discarding unexpected beacon reply down:dne seq 24 dne
2020-02-07 07:14:18.715 7fe4a19f6700 0 ms_deliver_dispatch: unhandled
message 0x5602fbc6df80 mdsmap(e 48466) v1 from mon.0 v2:10.3.78.22:3300/0
2020-02-07 07:25:02.093 7f3c2f92a2c0 0 set uid:gid to 64045:64045
(ceph:ceph)
2020-02-07 07:25:02.093 7f3c2f92a2c0 0 ceph version 14.2.7
(3d58626ebeec02d8385a4cefb92c6cbc3a45bfe8) nautilus (stable), process
ceph-mds, pid 75471
2020-02-07 07:25:02.093 7f3c2f92a2c0 0 pidfile_write: ignore empty
--pid-file
2020-02-07 07:25:02.097 7f3c1da95700 1 mds.ceph-mon-02 Updating MDS map
to version 48471 from mon.2
2020-02-07 07:25:06.413 7f3c1da95700 1 mds.ceph-mon-02 Updating MDS map
to version 48472 from mon.2
2020-02-07 07:25:06.413 7f3c1da95700 1 mds.ceph-mon-02 Map has assigned
me to become a standby
2020-02-07 07:29:56.869 7f3c1ea6d700 -1 received signal: Terminated
from /sbin/init (PID: 1) UID: 0
2020-02-07 07:29:56.869 7f3c1ea6d700 -1 mds.ceph-mon-02 *** got signal
Terminated ***
2020-02-07 07:29:56.869 7f3c1ea6d700 1 mds.ceph-mon-02 suicide! Wanted
state up:standby
2020-02-07 07:29:58.113 7f3c1da95700 0 ms_deliver_dispatch: unhandled
message 0x563c5df33f80 mdsmap(e 48475) v1 from mon.2 v2:10.3.78.32:3300/0
Here is the ceph status output:
  cluster:
    id:     a8dde71d-ca7b-4cf5-bd38-8989c6a27011
    health: HEALTH_ERR
            1 filesystem is degraded
            1 filesystem is offline
            1 mds daemon damaged
            2 daemons have recently crashed

  services:
    mon: 3 daemons, quorum ceph-mon-01,ceph-mon-02,ceph-mon-03 (age 41m)
    mgr: ceph-mon-02(active, since 41m), standbys: ceph-mon-03, ceph-mon-01
    mds: pawsey-sync-fs:0/1, 1 damaged
    osd: 925 osds: 715 up (since 2h), 715 in (since 23h)
    rgw: 3 daemons active (radosgw-01, radosgw-02, radosgw-03)

  data:
    pools:   24 pools, 26569 pgs
    objects: 52.64M objects, 199 TiB
    usage:   685 TiB used, 6.7 PiB / 7.3 PiB avail
    pgs:     26513 active+clean
             54    active+clean+scrubbing+deep
             2     active+clean+scrubbing
Ceph osd ls detail: https://pastebin.com/raw/bxi4HSa5
The metadata pool is on NVMe.
Can anyone give me some help?
Any commands I run, such as journal repairs, do not work, as they expect
the MDS to be up.
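For completeness, the usual first offline steps for a rank marked damaged, which I have been looking at (a sketch, not a recipe; the filesystem name comes from the status output above, and any repair should only follow a successful inspect and backup):

# Inspect the journal while the MDS is down (the tool runs offline)
cephfs-journal-tool --rank=pawsey-sync-fs:0 journal inspect

# Back up the journal before touching anything
cephfs-journal-tool --rank=pawsey-sync-fs:0 journal export backup.bin

# Once the underlying damage is addressed, clear the damaged flag
ceph mds repaired pawsey-sync-fs:0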
Thanks
Cheers
--
Luca Cervigni
Infrastructure Architect
Tel. +61864368802
Pawsey Supercomputing Centre
1 Bryce Ave, Kensington WA 6151
Australia
Hi Ceph Community.
We currently have a luminous cluster running and some machines still on Ubuntu 14.04
We are looking to upgrade these machines to 18.04 but the only upgrade path for luminous with the ceph repo is through 16.04.
Getting to Mimic is doable, but it means upgrading all those machines to 16.04 first and then upgrading again to 18.04 once we are on Mimic; it is becoming a huge time sink.
I did notice that the Ubuntu repos have added 12.2.12 in the 18.04.4 release. Is this a reliable build we can use?
https://ubuntu.pkgs.org/18.04/ubuntu-proposed-main-amd64/ceph_12.2.12-0ubun…
If so then we can go straight to 18.04.4 and not waste so much time.
Best
Thanks for your feedback
The Ganglia graphs are available here:
https://cernbox.cern.ch/index.php/s/0xBDVwNkRqcoGdF
Replying to the other questions:
- Free Memory in Ganglia is derived from "MemFree" in /proc/meminfo
- Memory Buffers in Ganglia is derived from "Buffers" in /proc/meminfo
- On this host the OSDs are 6 TB. On other hosts we have 10 TB OSDs
- "osd memory target" is set to ~4.5 GB (actually, while debugging this
issue, I have just lowered the value to 3.2 GB)
- "ceph tell osd.x heap stats" basically always reports 0 (or a very low
value) for "Bytes in page heap freelist", and a heap release doesn't change
the memory usage (exact commands below)
- I can agree that swap is antiquated, but so far it was simply not used
and didn't cause any problems. At any rate, I am now going to remove the
swap (or set the swappiness to 0).
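For completeness, the exact commands behind the numbers above (a sketch; osd.0 stands for any OSD on the host):

# Ask tcmalloc how much freed memory it is still holding
ceph tell osd.0 heap stats

# Return any free-listed pages to the OS
ceph tell osd.0 heap release

# Break down where the OSD's memory budget is actually going
ceph daemon osd.0 dump_mempools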
Thanks again !
Cheers, Massimo
On Thu, Feb 6, 2020 at 6:28 PM Anthony D'Atri <aad(a)dreamsnake.net> wrote:
> Attachments are usually filtered by mailing lists. Yours did not come
> through. A URL to Skitch or some other hosting works better.
>
> Your kernel version sounds like RHEL / CentOS? I can say that memory
> accounting definitely did change between upstream 3.19 and 4.9
>
>
> osd04-cephstorage1-gsc:~ # head /proc/meminfo
> MemTotal: 197524684 kB
> MemFree: 80388504 kB
> MemAvailable: 86055708 kB
> Buffers: 633768 kB
> Cached: 4705408 kB
> SwapCached: 0 kB
>
> Specifically, node_memory_Active as reported by node_exporter changes
> dramatically, and MemAvailable is the more meaningful metric. What is your
> “FreeMem” metric actually derived from?
>
> 64GB for 10 OSDs might be on the light side, how large are those OSDs?
>
> For sure swap is antiquated. If your systems have any swap provisioned at
> all, you’re doing it wrong. I’ve had good results setting vm.swappiness to 1.
>
> Do `ceph daemon osd.xx heap stats`, see if your OSD processes have much
> unused memory that has not been released to the OS. If they do, “heap
> release” can be useful.
>
>
>
> > On Feb 6, 2020, at 9:08 AM, Massimo Sgaravatto <
> massimo.sgaravatto(a)gmail.com> wrote:
> >
> > Dear all
> >
> > In mid-January I updated my ceph cluster from Luminous to Nautilus.
> >
> > Attached you can see the memory metrics collected on one OSD node (I see
> > the very same behavior on all OSD hosts) graphed via Ganglia
> > This is a CentOS 7 node, with 64 GB of RAM, hosting 10 OSDs.
> >
> > So before the update there were about 20 GB of FreeMem.
> > Now FreeMem is basically 0, but I see 20 GB of Buffers.
> >
> > I guess this triggered some swapping, probably because I forgot to
> > set vm.swappiness to 0 (it was set to 60, the default value).
> >
> > I was wondering if this is the expected behavior.
> >
> > PS: Actually, besides updating ceph, I also updated all the other
> > packages (yum update), so I am not sure that this different memory usage
> > is because of the ceph update.
> > For the record in this update the kernel was updated from 3.10.0-1062.1.2
> > to 3.10.0-1062.9.1
> >
> > Thanks, Massimo
> > _______________________________________________
> > ceph-users mailing list -- ceph-users(a)ceph.io
> > To unsubscribe send an email to ceph-users-leave(a)ceph.io
>
>
I'm trying to set up a cephx key to mount RBD images read-only. I have
the following two keys:
[client.rbd]
key = xxx
caps mgr = "profile rbd"
caps mon = "profile rbd"
caps osd = "profile rbd pool=rbd_vm"
[client.rbd-ro]
key = xxx
caps mgr = "profile rbd-read-only"
caps mon = "profile rbd"
caps osd = "profile rbd-read-only pool=rbd_vm"
The following works:
# rbd map --pool rbd_vm andras_test --name client.rbd
/dev/rbd0
and so does this:
# rbd map --pool rbd_vm andras_test --name client.rbd --read-only
/dev/rbd0
but using the rbd-ro key doesn't work:
# rbd map --pool rbd_vm andras_test --name client.rbd-ro --read-only
rbd: sysfs write failed
In some cases useful info is found in syslog - try "dmesg | tail".
rbd: map failed: (1) Operation not permitted
The logs only have the following:
[1281776.788709] libceph: mon4 10.128.150.14:6789 session established
[1281776.801747] libceph: client88900164 fsid
d7b33135-0940-4e48-8aa6-1d2026597c2f
The back end is mimic 13.2.8; the kernel is the CentOS kernel
3.10.0-957.27.2.el7.x86_64.
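One way to narrow this down (a diagnostic sketch using the pool and image from above): test the same credentials through librbd, which bypasses the kernel client, and double-check the caps the cluster actually stored.

# Show the caps as the cluster has them
ceph auth get client.rbd-ro

# Userspace read test via librbd; if this works, the caps allow
# reads and the problem is on the krbd side
rbd --name client.rbd-ro -p rbd_vm info andras_test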
Any ideas what I'm doing wrong here?
Andras
Hello,
if I have a pool with replica 3, what happens when one replica is corrupted?
I suppose Ceph detects the bad replica using checksums and replaces it with
a good one.
If I have a pool with replica 2, what happens?
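For context, a sketch of the commands involved when a scrub does flag an inconsistency (the PG ID 2.5 is a placeholder):

# Scrub errors surface as inconsistent PGs
ceph health detail

# List exactly which object and which shard failed its checksum
rados list-inconsistent-obj 2.5 --format=json-pretty

# Ask the primary to repair the PG from the authoritative copies
ceph pg repair 2.5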
Thanks,
Mario
Hello
I can't find a way to resolve my problem.
I lost an iSCSI gateway in a group of 4 gateways; there are 3 left. I can't delete the lost gateway from the host, and I can't change the owner of the resources owned by the lost gateway.
As a result, I have resources which are inaccessible from clients, and I can't reconfigure them because of the lost gateway.
Please tell me there is a way to remove a lost gateway and that I won't be stuck forever.
If I do
delete compute04.adm.local
it answers
Failed : Gateway deletion failed, gateway(s) unavailable:compute04.adm.local(UNKNOWN state)
I saw a reference to my problem in the thread "Error in add new ISCSI gateway", but unfortunately no answer seems to be available.
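In case it helps whoever answers: the last-resort approach I have seen discussed is editing the stored configuration directly, since ceph-iscsi keeps its state in a RADOS object. This is only a sketch under the assumption that the object is the default gateway.conf in the rbd pool; take backups first and stop the rbd-target-api services while editing.

# Back up the current ceph-iscsi configuration object
rados -p rbd get gateway.conf /tmp/gateway.conf.bak

# Edit a copy: remove the dead gateway from the gateways section and
# reassign the owner of its LUNs, then write the result back
cp /tmp/gateway.conf.bak /tmp/gateway.conf.new
vi /tmp/gateway.conf.new
rados -p rbd put gateway.conf /tmp/gateway.conf.new

# Restart the API on the surviving gateways to pick up the change
systemctl restart rbd-target-api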
Thanks for any help