July 2020 - ceph-users - lists.ceph.io

by Frank Schilder

Hi all,

on a mimic 13.2.8 cluster I observe a gradual increase of memory usage by OSD daemons, in particular under heavy load. For our spinners I use osd_memory_target=2G. The daemons overrun the 2G in virt size rather quickly and grow to something like 4G virtual. The real memory consumption stays more or less around the 2G of the target. There are some overshoots, but these go down again during periods with less load.

What I observe now is that the actual memory consumption slowly grows and OSDs start using more than 2G virtual memory. I see this as slowly growing swap usage despite having more RAM available (swappiness=10). This indicates allocated but unused memory, or memory not accessed for a long time, which usually means a leak.

Here are some heap stats:

Before restart:

osd.101 tcmalloc heap stats:------------------------------------------------
MALLOC:     3438940768 ( 3279.6 MiB) Bytes in use by application
MALLOC: +      5611520 (    5.4 MiB) Bytes in page heap freelist
MALLOC: +    257307352 (  245.4 MiB) Bytes in central cache freelist
MALLOC: +       357376 (    0.3 MiB) Bytes in transfer cache freelist
MALLOC: +      6727368 (    6.4 MiB) Bytes in thread cache freelists
MALLOC: +     25559040 (   24.4 MiB) Bytes in malloc metadata
MALLOC:   ------------
MALLOC: =   3734503424 ( 3561.5 MiB) Actual memory used (physical + swap)
MALLOC: +    575946752 (  549.3 MiB) Bytes released to OS (aka unmapped)
MALLOC:   ------------
MALLOC: =   4310450176 ( 4110.8 MiB) Virtual address space used
MALLOC:
MALLOC:         382884               Spans in use
MALLOC:             35               Thread heaps in use
MALLOC:           8192               Tcmalloc page size
------------------------------------------------

# ceph daemon osd.101 dump_mempools
{
    "mempool": {
        "by_pool": {
            "bloom_filter": { "items": 0, "bytes": 0 },
            "bluestore_alloc": { "items": 4691828, "bytes": 37534624 },
            "bluestore_cache_data": { "items": 0, "bytes": 0 },
            "bluestore_cache_onode": { "items": 51, "bytes": 28968 },
            "bluestore_cache_other": { "items": 5761276, "bytes": 46292425 },
            "bluestore_fsck": { "items": 0, "bytes": 0 },
            "bluestore_txc": { "items": 67, "bytes": 46096 },
            "bluestore_writing_deferred": { "items": 208, "bytes": 26037057 },
            "bluestore_writing": { "items": 52, "bytes": 6789398 },
            "bluefs": { "items": 9478, "bytes": 183720 },
            "buffer_anon": { "items": 291450, "bytes": 28093473 },
            "buffer_meta": { "items": 546, "bytes": 34944 },
            "osd": { "items": 98, "bytes": 1139152 },
            "osd_mapbl": { "items": 78, "bytes": 8204276 },
            "osd_pglog": { "items": 341944, "bytes": 120607952 },
            "osdmap": { "items": 10687217, "bytes": 186830528 },
            "osdmap_mapping": { "items": 0, "bytes": 0 },
            "pgmap": { "items": 0, "bytes": 0 },
            "mds_co": { "items": 0, "bytes": 0 },
            "unittest_1": { "items": 0, "bytes": 0 },
            "unittest_2": { "items": 0, "bytes": 0 }
        },
        "total": { "items": 21784293, "bytes": 461822613 }
    }
}

Right after restart + health_ok:

osd.101 tcmalloc heap stats:------------------------------------------------
MALLOC:     1173996280 ( 1119.6 MiB) Bytes in use by application
MALLOC: +      3727360 (    3.6 MiB) Bytes in page heap freelist
MALLOC: +     25493688 (   24.3 MiB) Bytes in central cache freelist
MALLOC: +     17101824 (   16.3 MiB) Bytes in transfer cache freelist
MALLOC: +     20301904 (   19.4 MiB) Bytes in thread cache freelists
MALLOC: +      5242880 (    5.0 MiB) Bytes in malloc metadata
MALLOC:   ------------
MALLOC: =   1245863936 ( 1188.1 MiB) Actual memory used (physical + swap)
MALLOC: +     20488192 (   19.5 MiB) Bytes released to OS (aka unmapped)
MALLOC:   ------------
MALLOC: =   1266352128 ( 1207.7 MiB) Virtual address space used
MALLOC:
MALLOC:          54160               Spans in use
MALLOC:             33               Thread heaps in use
MALLOC:           8192               Tcmalloc page size
------------------------------------------------

Am I looking at a memory leak here, or are these heap stats expected?

I don't mind the swap usage; it has no impact. I'm just wondering if I need to restart OSDs regularly. The "leakage" above occurred within only 2 months.

Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

3 years, 5 months
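The two snapshots in Frank Schilder's post can be compared with a little arithmetic. The following is a minimal sketch, not part of the original message, that uses only the numbers quoted there; the variable names are illustrative. It assumes that dump_mempools accounts only for allocations made through mempool-aware containers, so the gap it computes is a hint about where to look rather than proof of a leak.

```python
MIB = 1024 * 1024

# Figures copied from the post (osd.101).
heap_in_use_before = 3438940768    # tcmalloc "Bytes in use by application", before restart
mempool_total_before = 461822613   # dump_mempools "total" bytes, before restart
heap_in_use_after = 1173996280     # tcmalloc "Bytes in use by application", after restart


def to_mib(num_bytes):
    """Convert a byte count to MiB."""
    return num_bytes / MIB


# Heap memory the application holds that no mempool accounts for.
unaccounted = heap_in_use_before - mempool_total_before

# Growth of in-use heap between a fresh start and the two-month-old process.
growth = heap_in_use_before - heap_in_use_after

print(f"tracked by mempools : {to_mib(mempool_total_before):8.1f} MiB")
print(f"not tracked         : {to_mib(unaccounted):8.1f} MiB")
print(f"growth since restart: {to_mib(growth):8.1f} MiB")
```

Most of the in-use heap before the restart is not attributed to any mempool, which is why the mempool dump alone cannot explain the growth.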


NoSuchKey on key that is visible in s3 list/radosgw bk

by Mariusz Gronczewski

Hi,

I've got a problem on an Octopus (15.2.3, Debian packages) install. The S3 bucket index shows a file:

s3cmd ls s3://upvid/255/38355 --recursive
2020-07-27 17:48  50584342   s3://upvid/255/38355/juz_nie_zyjesz_sezon_2___oficjalny_zwiastun___netflix_mp4

radosgw-admin bi list also shows it:

{
    "type": "plain",
    "idx": "255/38355/juz_nie_zyjesz_sezon_2___oficjalny_zwiastun___netflix_mp4",
    "entry": {
        "name": "255/38355/juz_nie_zyjesz_sezon_2___oficjalny_zwiastun___netflix_mp4",
        "instance": "",
        "ver": { "pool": 11, "epoch": 853842 },
        "locator": "",
        "exists": "true",
        "meta": {
            "category": 1,
            "size": 50584342,
            "mtime": "2020-07-27T17:48:27.203008Z",
            "etag": "2b31cc8ce8b1fb92a5f65034f2d12581-7",
            "storage_class": "",
            "owner": "filmweb-app",
            "owner_display_name": "filmweb app user",
            "content_type": "",
            "accounted_size": 50584342,
            "user_data": "",
            "appendable": "false"
        },
        "tag": "_3ubjaztglHXfZr05wZCFCPzebQf-ZFP",
        "flags": 0,
        "pending_map": [],
        "versioned_epoch": 0
    }
},

but trying to download it via curl (I've set permissions to public) only gets me

<?xml version="1.0" encoding="UTF-8"?><Error><Code>NoSuchKey</Code><BucketName>upvid</BucketName><RequestId>tx0000000000000000e716d-005f1f14cb-e478a-pl-war1</RequestId><HostId>e478a-pl-war1-pl</HostId></Error>

(files that actually do not exist return access denied in the same context)

It is the same with other tools:

$ s3cmd get s3://upvid/255/38355/juz_nie_zyjesz_sezon_2___oficjalny_zwiastun___netflix_mp4 /tmp
download: 's3://upvid/255/38355/juz_nie_zyjesz_sezon_2___oficjalny_zwiastun___netflix_mp4' -> '/tmp/juz_nie_zyjesz_sezon_2___oficjalny_zwiastun___netflix_mp4'  [1 of 1]
ERROR: S3 error: 404 (NoSuchKey)

Cluster health is OK.

Any ideas what is happening here?

--
Mariusz Gronczewski, Administrator
Efigence S. A.
ul. Wołoska 9a, 02-583 Warszawa
T: [+48] 22 380 13 13
NOC: [+48] 22 380 10 20
E: admin(a)efigence.com
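One detail in the index entry above may help narrow things down: the etag ends in "-7". By S3 convention, an ETag of the form <32 hex digits>-<N> denotes an object uploaded in N multipart parts. The sketch below is not from the original post; the helper name is hypothetical and it only parses the string. It says nothing about why the object cannot be read.

```python
import re


def multipart_part_count(etag):
    """Return the part count encoded in a multipart-style ETag, or None.

    Assumes the S3 convention '<32 hex digits>-<parts>' for multipart uploads;
    a plain 32-hex-digit ETag is treated as a single-part upload.
    """
    match = re.fullmatch(r"[0-9a-f]{32}-(\d+)", etag.strip('"'))
    return int(match.group(1)) if match else None


# ETag copied from the 'radosgw-admin bi list' output in the post.
etag = "2b31cc8ce8b1fb92a5f65034f2d12581-7"
parts = multipart_part_count(etag)
print(f"multipart upload with {parts} parts" if parts else "single-part upload")
```

If the object was indeed a multipart upload, checking whether its head object and parts still exist in the data pool would be a natural next step.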

3 years, 5 months


Octopus OSDs dropping out of cluster: _check_auth_rotating possible clock skew, rotating keys expired way too early

by Wido den Hollander

Hi,

On a recently deployed Octopus (15.2.2) cluster (240 OSDs) we are seeing OSDs randomly drop out of the cluster.

Usually it's 2 to 4 OSDs spread out over different nodes. Each node has 16 OSDs and not all the failing OSDs are on the same node.

The OSDs are marked as down and all they keep printing in their logs is:

monclient: _check_auth_rotating possible clock skew, rotating keys expired way too early (before 2020-06-04T07:57:17.706529-0400)

Looking at their status through the admin socket:

{
    "cluster_fsid": "68653193-9b84-478d-bc39-1a811dd50836",
    "osd_fsid": "87231b5d-ae5f-4901-93c5-18034381e5ec",
    "whoami": 206,
    "state": "active",
    "oldest_map": 73697,
    "newest_map": 75795,
    "num_pgs": 19
}

The message brought me to my own ticket I created 2 years ago: https://tracker.ceph.com/issues/23460

The first thing I checked was NTP/time. I double and triple checked this. All the times are in sync on the cluster. Nothing wrong there.

Again, it's not all the OSDs on a node failing. Just 1 or 2 dropping out.

Restarting them brings them back right away, and then within 24h some other OSDs will drop out.

Has anybody seen this behavior with Octopus as well?

Wido

3 years, 6 months



Nautilus: rbd image stuck unaccessible after VM restart

by islepnev＠gmail.com

Hello,

I'm running KVM virtualization with rbd storage. Some images on the rbd pool become effectively unusable after a VM restart. All I/O to a problematic rbd image blocks indefinitely. I checked that it is not a permission or locking problem.

The bug was silent until we performed a planned restart of a few VMs and some of them failed to start (the kvm process timed out). It could be related to the recent upgrades from luminous to nautilus or from proxmox 5 to 6.

The Ceph backend is clean, with no observable problems; all mons/mgrs/osds are up and running. The network is OK. There is nothing in the logs relevant to the problem.

ceph version 14.2.6 (ba51347bdbe28c7c0e2e9172fa2983111137bb60) nautilus (stable)
kernel 5.3.13-2-pve #1 SMP PVE 5.3.13-2 (Fri, 24 Jan 2020 09:49:36 +0100) x86_64 GNU/Linux
HEALTH_OK

No locks:

# rbd status rbd-technet/vm-402-disk-0
Watchers: none
# rbd status rbd-technet/vm-402-disk-1
Watchers: none

Normal image vs problematic:

# rbd object-map check rbd-technet/vm-402-disk-0
Object Map Check: 100% complete…done.
# rbd object-map check rbd-technet/vm-402-disk-1
^C

disk-0 is good while disk-1 is effectively lost. The command hangs for many minutes with no visible activity, so I interrupted it.

rbd export runs without problems; however, some data is lost after being imported back (ext4 errors). rbd deep copy worked for me. The copy looks good, no errors.

# rbd info rbd-technet/vm-402-disk-1
rbd image 'vm-402-disk-1':
    size 16 GiB in 4096 objects
    order 22 (4 MiB objects)
    snapshot_count: 0
    id: c600d06b8b4567
    block_name_prefix: rbd_data.c600d06b8b4567
    format: 2
    features: layering, exclusive-lock, object-map, fast-diff, deep-flatten, journaling
    op_features:
    flags:
    create_timestamp: Fri Jan 31 17:50:50 2020
    access_timestamp: Sat Mar 7 00:30:53 2020
    modify_timestamp: Sat Mar 7 00:33:35 2020
    journal: c600d06b8b4567
    mirroring state: disabled

What can be done to debug this problem?

Thanks,
Ilia.
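The geometry reported by rbd info above (order 22, 16 GiB, 4096 objects) follows from simple arithmetic, and the same arithmetic maps a byte offset in the image to a backing RADOS object. This is a minimal sketch that is not from the original post. It assumes default striping and the usual format 2 naming scheme of block_name_prefix followed by a 16-digit hexadecimal object number; the function name is hypothetical.

```python
# Values copied from the 'rbd info' output in the post.
ORDER = 22
IMAGE_SIZE = 16 * 1024**3                 # 16 GiB
BLOCK_NAME_PREFIX = "rbd_data.c600d06b8b4567"

object_size = 1 << ORDER                  # 2**22 bytes = 4 MiB
object_count = IMAGE_SIZE // object_size  # number of backing objects


def object_name_for_offset(offset):
    """Name of the RADOS object holding the given image byte offset.

    Assumption: default striping, objects named '<prefix>.<index as 16 hex digits>'.
    """
    if not 0 <= offset < IMAGE_SIZE:
        raise ValueError("offset outside image")
    return f"{BLOCK_NAME_PREFIX}.{offset // object_size:016x}"


print(object_size, object_count)
print(object_name_for_offset(0))
print(object_name_for_offset(IMAGE_SIZE - 1))
```

Knowing the object names makes it possible to probe individual objects in the pool when a whole-image command hangs.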

3 years, 7 months


Choosing suitable SSD for Ceph cluster

by Hermann Himmelbauer

Hi,

I am running a nice ceph (proxmox 4 / debian-8 / ceph 0.94.3) cluster on 3 nodes (supermicro X8DTT-HIBQF), 2 OSDs each (2TB SATA hard disks), interconnected via Infiniband 40.

The problem is that the ceph performance is quite bad (approx. 30 MiB/s reading, 3-4 MiB/s writing), so I thought about plugging a PCIe to NVMe/M.2 adapter into each node and installing SSDs. The idea is to have faster ceph storage and also some storage extension.

The question is now which SSDs I should use. If I understand it right, not every SSD is suitable for ceph, as is noted at the links below:

https://www.sebastien-han.fr/blog/2014/10/10/ceph-how-to-test-if-your-ssd-i…
or here:
https://www.proxmox.com/en/downloads/item/proxmox-ve-ceph-benchmark

In the first link, the Samsung SSD 950 PRO 512GB NVMe is listed as a fast SSD for ceph. As the 950 is not available anymore, I ordered a Samsung 970 1TB for testing; unfortunately, the "EVO" instead of the PRO.

Before equipping all nodes with these SSDs, I did some tests with "fio" as recommended, e.g. like this:

fio --filename=/dev/DEVICE --direct=1 --sync=1 --rw=write --bs=4k --numjobs=1 --iodepth=1 --runtime=60 --time_based --group_reporting --name=journal-test

The results are as follows:

-----------------------
1) Samsung 970 EVO NVMe M.2 with PCIe adapter

Jobs: 1:
read : io=26706MB, bw=445MiB/s, iops=113945, runt= 60001msec
write: io=252576KB, bw=4.1MiB/s, iops=1052, runt= 60001msec

Jobs: 4:
read : io=21805MB, bw=432.7MiB/s, iops=93034, runt= 60001msec
write: io=422204KB, bw=6.8MiB/s, iops=1759, runt= 60002msec

Jobs: 10:
read : io=26921MB, bw=448MiB/s, iops=114859, runt= 60001msec
write: io=435644KB, bw=7MiB/s, iops=1815, runt= 60004msec
-----------------------

So the read speed is impressive, but the write speed is really bad.

Therefore I ordered the Samsung 970 PRO (1TB), as it has faster NAND chips (MLC instead of TLC). The results are, however, even worse for writing:

-----------------------
2) Samsung 970 PRO NVMe M.2 with PCIe adapter

Jobs: 1:
read : io=15570MB, bw=259.4MiB/s, iops=66430, runt= 60001msec
write: io=199436KB, bw=3.2MiB/s, iops=830, runt= 60001msec

Jobs: 4:
read : io=48982MB, bw=816.3MiB/s, iops=208986, runt= 60001msec
write: io=327800KB, bw=5.3MiB/s, iops=1365, runt= 60002msec

Jobs: 10:
read : io=91753MB, bw=1529.3MiB/s, iops=391474, runt= 60001msec
write: io=343368KB, bw=5.6MiB/s, iops=1430, runt= 60005msec
-----------------------

I did some research and found out that the "--sync" option sets the "O_DSYNC" flag, which seems to disable the SSD write cache, and this leads to these horrid write speeds.

It seems the write cache is only left enabled on SSDs that implement some kind of battery buffer (power-loss protection) guaranteeing a flush of the data to flash in case of a power loss.

However, it seems impossible to find out which SSDs have this power-loss protection. Moreover, these enterprise SSDs are crazy expensive compared to the SSDs above, and it is unclear whether power-loss protection is even available in the NVMe form factor. So building a 1 or 2 TB cluster does not seem really affordable or viable.

So, can anyone please give me hints on what to do? Is it possible to ensure in some way that the write cache is not disabled (my server is situated in a data center, so there will probably never be a loss of power)?

Or is the link above already outdated, as newer ceph releases somehow deal with this problem? Or maybe a later Debian release (10) will handle the O_DSYNC flag differently?

Perhaps I should simply invest in faster (and bigger) hard disks and forget the SSD cluster idea?

Thank you in advance for any help,

Best Regards,
Hermann

--
hermann(a)qwer.tk
PGP/GPG: 299893C7 (on keyservers)
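The write figures above are easier to interpret once converted into per-write latency. With 4 KiB blocks, bandwidth is simply IOPS times block size, and with numjobs=1 and iodepth=1 only one write is in flight at a time, so the mean time per synchronous write is roughly 1/IOPS. The sketch below is not from the original post; it uses only the single-job write IOPS quoted there, and the function names are illustrative.

```python
BLOCK_SIZE = 4 * 1024   # --bs=4k
MIB = 1024 * 1024

# Single-job write IOPS copied from the fio results in the post.
write_iops = {"970 EVO": 1052, "970 PRO": 830}


def bandwidth_mib_s(iops):
    """Bandwidth implied by an IOPS figure at the fixed 4 KiB block size."""
    return iops * BLOCK_SIZE / MIB


def latency_ms(iops):
    """Approximate time per write, valid when only one I/O is in flight."""
    return 1000.0 / iops


for model, iops in write_iops.items():
    print(f"{model}: {bandwidth_mib_s(iops):.1f} MiB/s, ~{latency_ms(iops):.2f} ms per sync write")
```

The computed bandwidths agree with the bw values fio reported, and each synchronous 4 KiB write takes on the order of a millisecond on both drives.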

3 years, 7 months


cephadm - How to deploy ceph cluster with a partition on SSD for block.db

by klemen＠psi-net.si

I'm trying to deploy a ceph cluster with the cephadm tool. I've already successfully completed all steps except adding OSDs.

My test equipment consists of three hosts. Each host has SSD storage on which the OS is installed. On that storage I created a partition that can be used as a ceph block.db. The hosts also have 2 additional HDs (spinning drives) for OSD data.

I couldn't find in the docs how to deploy such a configuration. Do you have any hints on how to do that?

Thanks for your help!

3 years, 7 months



add debian buster stable support for ceph-deploy

by Jelle de Jong

Hello everybody,

Can somebody add support for Debian buster to ceph-deploy?

https://tracker.ceph.com/issues/42870

Highly appreciated,

Regards,
Jelle de Jong

3 years, 7 months


[Ceph Octopus 15.2.3 ] MDS crashed suddenly

by carlimeunier＠gmail.com

Hi,

I made a fresh install of Ceph Octopus 15.2.3 recently. After a few days, the 2 standby MDS suddenly crashed with a segmentation fault. I tried to restart them but they do not start.

Here is the error:

-20> 2020-07-17T13:50:27.888+0000 7fc8c6c51700 10 monclient: _renew_subs
-19> 2020-07-17T13:50:27.888+0000 7fc8c6c51700 10 monclient: _send_mon_message to mon.2 at v1:172.31.36.98:6789/0
-18> 2020-07-17T13:50:27.888+0000 7fc8c6c51700 10 monclient: handle_get_version_reply finishing 0x559dcf9530c0 version 269
-17> 2020-07-17T13:50:27.888+0000 7fc8c6c51700 10 monclient: handle_get_version_reply finishing 0x559dcfa87520 version 269
-16> 2020-07-17T13:50:27.888+0000 7fc8c6c51700 10 monclient: handle_get_version_reply finishing 0x559dcfa875c0 version 269
-15> 2020-07-17T13:50:27.888+0000 7fc8c6c51700 10 monclient: handle_get_version_reply finishing 0x559dcfa871c0 version 269
-14> 2020-07-17T13:50:27.888+0000 7fc8c8c55700 10 monclient: get_auth_request con 0x559dcfada000 auth_method 0
-13> 2020-07-17T13:50:27.888+0000 7fc8c9456700 10 monclient: get_auth_request con 0x559dcfada800 auth_method 0
-12> 2020-07-17T13:50:27.892+0000 7fc8bfc43700 1 mds.282966.journaler.mdlog(ro) recover start
-11> 2020-07-17T13:50:27.892+0000 7fc8bfc43700 1 mds.282966.journaler.mdlog(ro) read_head
-10> 2020-07-17T13:50:27.892+0000 7fc8bfc43700 4 mds.0.log Waiting for journal 0x200 to recover...
-9> 2020-07-17T13:50:27.893+0000 7fc8c0444700 1 mds.282966.journaler.mdlog(ro) _finish_read_head loghead(trim 4194304, expire 4231216, write 4329405, stream_format 1). probing for end of log (from 4329405)...
-8> 2020-07-17T13:50:27.893+0000 7fc8c0444700 1 mds.282966.journaler.mdlog(ro) probing for end of the log
-7> 2020-07-17T13:50:27.893+0000 7fc8c0444700 1 mds.282966.journaler.mdlog(ro) _finish_probe_end write_pos = 4329949 (header had 4329405). recovered.
-6> 2020-07-17T13:50:27.893+0000 7fc8bfc43700 4 mds.0.log Journal 0x200 recovered.
-5> 2020-07-17T13:50:27.893+0000 7fc8bfc43700 4 mds.0.log Recovered journal 0x200 in format 1
-4> 2020-07-17T13:50:27.893+0000 7fc8bfc43700 2 mds.0.0 Booting: 1: loading/discovering base inodes
-3> 2020-07-17T13:50:27.893+0000 7fc8bfc43700 0 mds.0.cache creating system inode with ino:0x100
-2> 2020-07-17T13:50:27.894+0000 7fc8bfc43700 0 mds.0.cache creating system inode with ino:0x1
-1> 2020-07-17T13:50:27.894+0000 7fc8c0444700 2 mds.0.0 Booting: 2: replaying mds log
0> 2020-07-17T13:50:27.896+0000 7fc8bec41700 -1 *** Caught signal (Segmentation fault) **
in thread 7fc8bec41700 thread_name:md_log_replay

Here is the cluster information:

# ceph status
  cluster:
    id:     dd024fe1-4996-4fed-ba57-03090e53724d
    health: HEALTH_WARN
            20 daemons have recently crashed

  services:
    mon: 3 daemons, quorum 2,0,1 (age 2d)
    mgr: mgr.0(active, since 9d), standbys: mgr.2, mgr.1
    mds: cephfs:1 {0=node0=up:active} 1 up:standby-replay 1 up:standby
    osd: 3 osds: 3 up (since 28h), 3 in (since 9d)

  task status:
    scrub status:
        mds.node0: idle
        mds.node2: idle

  data:
    pools:   3 pools, 49 pgs
    objects: 29 objects, 170 KiB
    usage:   3.0 GiB used, 41 TiB / 41 TiB avail
    pgs:     49 active+clean

  io:
    client:   853 B/s rd, 1 op/s rd, 0 op/s wr

There is only 1 client connected to the cluster.

Does anyone have any idea, please?

Thanks

3 years, 7 months

