The OOM-killer is on the rampage and striking down hapless OSDs when
the cluster is under heavy client IO.
The memory target does not seem to be much of a limit; is this intentional?
root@cnx-11:~# ceph-conf --show-config|fgrep osd_memory_target
osd_memory_target = 4294967296
osd_memory_target_cgroup_limit_ratio = 0.800000
root@cnx-31:~# pmap 4327|fgrep total
total 6794892K
Are there any tips for controlling the OSD memory consumption?
The hosts involved have 128 GB or 192 GB of memory and 12 SATA OSDs each,
so even at 4 GB per OSD there should be a large amount of free memory.
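For reference, this is how we would try to lower the target further
(assuming the centralized config store available from Mimic onward; the
3 GiB value and the OSD id are only examples):

ceph config set osd osd_memory_target 3221225472
ceph daemon osd.0 config get osd_memory_target   # verify on a running daemon

As far as we understand, osd_memory_target is a best-effort target for the
BlueStore cache autotuner, not a hard limit on the process RSS.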
hi there,
recently, we've come across a lot of advice to only use replicated rados
pools as default (i.e. root) data pools for cephfs¹.
unfortunately, we either skipped or blatantly ignored this advice while
creating our cephfs, so our default data pool is an erasure-coded one
with k=2 and m=4, which _should_ be fine availability-wise. could anyone
elaborate on the impact this has on the performance of the whole setup?
if a migration to a replicated pool is recommended: would a simple
ceph osd pool set $default_data crush_rule $something_replicated
suffice, or would you recommend a more elaborate approach, something
along the lines of taking the cephfs down, copying the contents of
default_pool to default_new, renaming default_new to default_pool, and
bringing the cephfs up again?
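for illustration, these are the kind of commands we imagine the more
elaborate variant would involve (pool, fs and directory names are
placeholders; note that a changed layout only affects newly created files):

# create a replicated pool and attach it as an additional data pool
ceph osd pool create cephfs_data_rep 128 replicated
ceph fs add_data_pool cephfs cephfs_data_rep
# steer new files under a directory to the new pool via file layouts
setfattr -n ceph.dir.layout.pool -v cephfs_data_rep /mnt/cephfs/somedir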
thank you very much & with kind regards,
t.
¹ - see, for instance, https://tracker.ceph.com/issues/42450 .
ceph version 14.2.5 (ad5bd132e1492173c85fda2cc863152730b16a92) nautilus
(stable)
I made a bucket named "test_lc" and ran `s3cmd expire
--expiry-date=2019-01-01 s3://test_lc` to set the lifecycle (2019-01-01 is
earlier than the current date, so every object will be removed).
Then I ran `radosgw-admin lc process`; the objects were deleted as expected,
and the status from `radosgw-admin lc list` was "completed". However, if I
upload some objects and run `radosgw-admin lc process` again, the objects
are not deleted.
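For reference, the full sequence was roughly the following (s3cmd is
configured against the RGW endpoint; the local file names are illustrative):

s3cmd mb s3://test_lc
s3cmd expire --expiry-date=2019-01-01 s3://test_lc
s3cmd put ./obj1 s3://test_lc/
radosgw-admin lc process    # first run: obj1 gets deleted
radosgw-admin lc list       # status: "completed"
s3cmd put ./obj2 s3://test_lc/
radosgw-admin lc process    # second run: obj2 is NOT deleted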
Could you please tell me what the reason is and what I should do in this
case? Thanks in advance!
Hi,
is it possible to run the MDS on a newer version than the monitor nodes?
I mean we run the monitors on 12.2.10 and would like to upgrade
the MDS to 12.2.13. Is this possible?
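For reference, I would verify the mixed-version state afterwards with:

ceph versions

which reports the running release per daemon type (available since Luminous).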
Best,
Martin
Hi all,
today, all of a sudden, our standby-replay metadata server started to
continuously write the following log messages:
2020-02-13 11:56:50.216102 7fd2ad229700 1 heartbeat_map is_healthy
'MDSRank' had timed out after 15
2020-02-13 11:56:50.287699 7fd2ad229700 0 mds.beacon.dcucmds401
Skipping beacon heartbeat to monitors (last acked 100.836s ago); MDS
internal heartbeat is not healthy!
and its memory keeps growing until no memory is available any more; the
service then gets restarted and stops. The funny thing is that on the
active MDS we see neither these log messages nor any increase in memory
usage.
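In case it helps with diagnosis, this is what we can collect on the
affected standby (admin socket access on that host assumed; the daemon
name is taken from the log above):

ceph daemon mds.dcucmds401 cache status   # cache memory usage
ceph daemon mds.dcucmds401 perf dump      # includes the mds_mem counters
ceph tell mds.dcucmds401 heap stats       # tcmalloc heap statistics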
We are running ceph version 12.2.10 on all nodes of our Ceph cluster.
Any suggestions?
Best,
Martin
Hi,
The Ceph Berlin MeetUp is a community-organized group that has met
every two months in the past years: https://www.meetup.com/Ceph-Berlin/
The meetups start at 6 pm and consist of a presentation or talk followed
by a discussion. The discussion often continues over dinner in a nearby
restaurant.
The next date would be March 23rd, four weeks from now, which leaves
enough time to organize a meetup. Before fixing that date I would like to
ask whether someone is willing to host our MeetUp. This is quite
uncomplicated: we just need a room for up to 20 people and a projector for
a short talk. Catering is completely optional but very welcome. If March
23rd does not fit your schedule, please suggest another day.
So if you or your company in Berlin is able and willing to host the next
Ceph meetup, please contact me.
If you have done something with Ceph in the last year and want to talk
about it, please also do not hesitate to contact me.
Kindest Regards
--
Robert Sander
Heinlein Support GmbH
Schwedter Str. 8/9b, 10119 Berlin
https://www.heinlein-support.de
Tel: 030 / 405051-43
Fax: 030 / 405051-19
Amtsgericht Berlin-Charlottenburg - HRB 93818 B
Geschäftsführer: Peer Heinlein - Sitz: Berlin
Hello Team,
I am getting frequent LARGE_OMAP_OBJECTS warnings ("1 large omap objects")
in one of my CephFS metadata pools. Can anyone explain why this pool keeps
getting into this state and how I could prevent it in the future?
# ceph health detail
HEALTH_WARN 1 large omap objects
LARGE_OMAP_OBJECTS 1 large omap objects
1 large objects found in pool 'cephfs01-metadata'
Search the cluster log for 'Large omap object found' for more details.
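I assume the offending object can be identified like this (the cluster log
path on the mon host may vary; <object> is a placeholder for the name
reported there):

zgrep -i 'large omap object' /var/log/ceph/ceph.log*
rados -p cephfs01-metadata listomapkeys <object> | wc -l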
Thanks,
Uday