Hi list,
I'm in the middle of an OpenStack migration (obviously Ceph-backed) and
have stumbled upon some huge virtual machines.
To keep downtime to a minimum, I'm thinking of using Ceph's snapshot
features together with rbd export-diff and import-diff.
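The workflow I have in mind is roughly the following (just a sketch; pool, image
and snapshot names are placeholders, and the destination image would need to
exist already):
rbd snap create rbd/huge-vm@snap1
rbd export-diff rbd/huge-vm@snap1 - | ssh dest-host 'rbd import-diff - rbd/huge-vm'
# later, while the VM keeps running on the source, ship only the delta:
rbd snap create rbd/huge-vm@snap2
rbd export-diff --from-snap snap1 rbd/huge-vm@snap2 - | ssh dest-host 'rbd import-diff - rbd/huge-vm'
# a final pass would be done with the VM stopped, then switch over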
However, is it safe (or even supported) to do this across versions?
The source cluster is running 10.2.11 and the destination is 12.2.11.
Thanks in advance!
Regards,
Kees
--
https://nefos.nl/contact
Nefos IT bv
Ambachtsweg 25 (industrienummer 4217)
5627 BZ Eindhoven
Nederland
KvK 66494931
/Available on Monday, Tuesday, Wednesday and Friday/
Hello,
We run a Nautilus 14.2.8 Ceph cluster.
After a big crash in which we lost some disks, we had a PG down (erasure-coded
3+2 pool), and while trying to fix it we followed this guide:
https://medium.com/opsops/recovering-ceph-from-reduced-data-availability-3-…
As the PG was reported with 0 objects, we first marked a shard as complete
with ceph-objectstore-tool and restarted the OSD.
The PG then went active but reported lost objects!
As we consider the data on this PG lost, we tried to get rid of them with
ceph pg 30.3 mark_unfound_lost delete.
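For reference, the sequence we ran was roughly the following (a sketch from
memory; the OSD id and data path are illustrative):
systemctl stop ceph-osd@103
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-103 --pgid 30.3s0 --op mark-complete
systemctl start ceph-osd@103
ceph pg 30.3 mark_unfound_lost delete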
This produced some logs like (~3 lines/hour):
2020-05-12 14:45:05.251830 osd.103 (osd.103) 886 : cluster [ERR] 30.3s0 Unexpected Error: recovery ending with 41: {
30:c000e27d:::rbd_data.34.c963b6314efb84.0000000000000100:head=435293'2 flags = delete,
30:c01f1248:::rbd_data.34.7f0c0d1df22f45.0000000000000325:head=435293'3 flags = delete,
30:c05e82b2:::rbd_data.34.674d063bdc66d2.0000000000000015:head=435293'4 flags = delete,
30:c0b2d8e7:::rbd_data.34.6bc88749c741cb.00000000000007d0:head=435293'5 flags = delete,
30:c0c3e20e:::rbd_data.34.674d063bdc66d2.00000000000000fb:head=435293'6 flags = delete,
30:c0c89740:::rbd_data.34.a7f2202210bb39.0000000000000bbc:head=435293'7 flags = delete,
30:c0e59ffa:::rbd_data.34.7f0c0d1df22f45.00000000000002fb:head=435293'8 flags = delete,
30:c0e72bf4:::rbd_data.34.7f0c0d1df22f45.00000000000000fa:head=435293'9 flags = delete,
30:c10ab507:::rbd_data.34.80695c646d9535.0000000000000327:head=435293'10 flags = delete,
30:c219e412:::rbd_data.34.a7f2202210bb39.0000000000000fa0:head=435293'11 flags = delete,
30:c29aeba3:::rbd_data.34.8038585a0eb9f6.0000000000000eb2:head=435293'12 flags = delete,
30:c29fae09:::rbd_data.34.674d063bdc66d2.000000000000148a:head=435293'13 flags = delete,
30:c2b77a99:::rbd_data.34.7f0c0d1df22f45.000000000000031d:head=435293'14 flags = delete,
30:c2c8598f:::rbd_data.34.674d063bdc66d2.00000000000002f5:head=435293'15 flags = delete,
30:c2dd39fe:::rbd_data.34.6494fb1b0f88bf.000000000000030b:head=435293'16 flags = delete,
30:c2f6ce39:::rbd_data.34.806ab864459ae5.0000000000000109:head=435293'17 flags = delete,
30:c2f8a62f:::rbd_data.34.ed0c58ebdc770f.000000000000002a:head=435293'18 flags = delete,
30:c306cd86:::rbd_data.34.ed0c58ebdc770f.0000000000000205:head=435293'19 flags = delete,
30:c30f5230:::rbd_data.34.7f0c0d1df22f45.00000000000002f5:head=435293'20 flags = delete,
30:c32b81df:::rbd_data.34.c79f6d1f78a707.0000000000000100:head=435293'21 flags = delete,
30:c3374080:::rbd_data.34.7f217e33dd742c.00000000000007d0:head=435293'22 flags = delete,
30:c3cdbeb5:::rbd_data.34.674dcefe97f606.0000000000000109:head=435293'23 flags = delete,
30:c3cdd149:::rbd_data.34.674dcefe97f606.0000000000000019:head=435293'24 flags = delete,
30:c40946c0:::rbd_data.34.ded8d21a9d3d8f.00000000000002a8:head=435293'25 flags = delete,
30:c42ed4fd:::rbd_data.34.a6985314ad8dad.0000000000000200:head=435293'26 flags = delete,
30:c483a99b:::rbd_data.34.ed0c58ebdc770f.0000000000000a00:head=435293'27 flags = delete,
30:c49f09d6:::rbd_data.34.7e1c1abf436885.0000000000000bb8:head=435293'28 flags = delete,
30:c515a4e8:::rbd_data.34.ed0c58ebdc770f.0000000000000106:head=435293'29 flags = delete,
30:c5181a8e:::rbd_data.34.9385d45172fa0f.000000000000020c:head=435293'30 flags = delete,
30:c531de44:::rbd_data.34.6bc88749c741cb.0000000000000102:head=435293'31 flags = delete,
30:c5427518:::rbd_data.34.806ab864459ae5.00000000000006db:head=435293'32 flags = delete,
30:c5693b53:::rbd_data.34.6494fb1b0f88bf.000000000000148a:head=435293'33 flags = delete,
30:c5804bc9:::rbd_data.34.ed0cb8730e020c.0000000000000105:head=435293'34 flags = delete,
30:c598117e:::rbd_data.34.7f0811fbac0b9d.0000000000000327:head=435293'35 flags = delete,
30:c5a64fbd:::rbd_data.34.c963b6314efb84.0000000000000010:head=435293'36 flags = delete,
30:c5f9e0e5:::rbd_data.34.ed0c58ebdc770f.0000000000000f01:head=435293'37 flags = delete,
30:c5ffe1d8:::rbd_data.34.6bc88749c741cb.0000000000000abe:head=435293'38 flags = delete,
30:c6ecfaa1:::rbd_data.34.9385d45172fa0f.0000000000000002:head=435293'39 flags = delete,
30:c755550f:::rbd_data.34.6494fb1b0f88bf.0000000000000106:head=435293'40 flags = delete,
30:c7a730f4:::rbd_data.34.7f217e33dd742c.00000000000006e1:head=435293'41 flags = delete,
30:c7aa79f7:::rbd_data.34.674dcefe97f606.0000000000000108:head=435293'42 flags = delete}
But yesterday it started to flood the logs (~9 GB of logs/day!) with
lines like:
2020-05-14 10:36:03.851258 osd.29 [ERR] Error -2 reading object
30:c24a0173:::rbd_data.34.806ab864459ae5.000000000000022d:head
2020-05-14 10:36:03.851333 osd.29 [ERR] Error -2 reading object
30:c4a41972:::rbd_data.34.6bc88749c741cb.0000000000000320:head
2020-05-14 10:36:03.851382 osd.29 [ERR] Error -2 reading object
30:c543da6f:::rbd_data.34.80695c646d9535.0000000000000dce:head
2020-05-14 10:36:03.859900 osd.29 [ERR] Error -2 reading object
30:c24a0173:::rbd_data.34.806ab864459ae5.000000000000022d:head
2020-05-14 10:36:03.859979 osd.29 [ERR] Error -2 reading object
30:c4a41972:::rbd_data.34.6bc88749c741cb.0000000000000320:head
We think the best option would probably be to completely delete this PG. Is
that possible without totally breaking the pool? How?
Do we need to recreate the PG manually, or will Ceph do it automatically?
Thanks for your help.
F.
Coincidentally Adam on our core team just reported this morning that he
saw extremely high bluestore_cache_other memory usage while running
compression performance tests as well. That may indicate we have a
memory leak related to the compression code. I doubt setting the
memory_target to 3GiB will help in the long run as that will just
attempt to compensate by decreasing the other caches until nothing else
can be shrunk. Adam said he's planning to investigate so hopefully we
will know more soon.
Mark
On 5/13/20 10:52 AM, Rafał Wądołowski wrote:
> Mark,
> Unfortunately I closed the terminal with the mempool output. But there were a
> lot of bytes used by bluestore_cache_other; that was the highest value (about
> 85%). The onode cache takes about 10%. PGlog and osdmaps were okay, low
> values. I saw some ideas that maybe compression_mode force on a pool can
> make a mess.
> One more thing: we are running the stupid allocator. Right now I am
> decreasing osd_memory_target to 3GiB and will wait to see if the RAM problem
> occurs again.
>
>
>
> Regards,
>
> */Rafał Wądołowski/*
>
> ------------------------------------------------------------------------
> *From:* Mark Nelson <mnelson(a)redhat.com>
> *Sent:* Wednesday, May 13, 2020 3:30 PM
> *To:* ceph-users(a)ceph.io <ceph-users(a)ceph.io>
> *Subject:* [ceph-users] Re: Memory usage of OSD
> On 5/13/20 12:43 AM, Rafał Wądołowski wrote:
> > Hi,
> > I noticed a strange situation in one of our clusters. The OSD daemons
> > are taking too much RAM.
> > We are running 12.2.12 and have the default configuration of
> > osd_memory_target (4GiB).
> > Heap dump shows:
> >
> > osd.2969 dumping heap profile now.
> > ------------------------------------------------
> > MALLOC: 6381526944 ( 6085.9 MiB) Bytes in use by application
> > MALLOC: + 0 ( 0.0 MiB) Bytes in page heap freelist
> > MALLOC: + 173373288 ( 165.3 MiB) Bytes in central cache freelist
> > MALLOC: + 17163520 ( 16.4 MiB) Bytes in transfer cache freelist
> > MALLOC: + 95339512 ( 90.9 MiB) Bytes in thread cache freelists
> > MALLOC: + 28995744 ( 27.7 MiB) Bytes in malloc metadata
> > MALLOC: ------------
> > MALLOC: = 6696399008 ( 6386.2 MiB) Actual memory used (physical +
> swap)
> > MALLOC: + 218267648 ( 208.2 MiB) Bytes released to OS (aka unmapped)
> > MALLOC: ------------
> > MALLOC: = 6914666656 ( 6594.3 MiB) Virtual address space used
> > MALLOC:
> > MALLOC: 408276 Spans in use
> > MALLOC: 75 Thread heaps in use
> > MALLOC: 8192 Tcmalloc page size
> > ------------------------------------------------
> > Call ReleaseFreeMemory() to release freelist memory to the OS (via
> madvise()).
> > Bytes released to the OS take up virtual address space but no
> physical memory.
> >
> > IMO "Bytes in use by application" should be less than
> osd_memory_target. Am I correct?
> > I checked heap dump with google-pprof and got following results.
> > Total: 149.4 MB
> > 60.5 40.5% 40.5% 60.5 40.5%
> rocksdb::UncompressBlockContentsForCompressionType
> > 34.2 22.9% 63.4% 34.2 22.9%
> ceph::buffer::create_aligned_in_mempool
> > 11.9 7.9% 71.3% 12.1 8.1%
> std::_Rb_tree::_M_emplace_hint_unique
> > 10.7 7.1% 78.5% 71.2 47.7% rocksdb::ReadBlockContents
> >
> > Does it mean that most of RAM is used by rocksdb?
>
>
> It looks like your heap dump is only accounting for 149.4MB of the
> memory so probably not representative across the whole ~6.5G. Instead
> could you try dumping the mempools via "ceph daemon osd.2969
> dump_mempools"?
>
>
> >
> > How can I take a deeper look into memory usage ?
>
>
> Beyond looking at the mempools, you can see the bluestore cache
> allocation information by either enabling debug bluestore and debug
> priority_cache_manager 5, or potentially looking at the PCM perf
> counters (I'm not sure if those were in 12.2.12 though). Between the
> heap data, mempool data, and priority cache records, it should become
> clearer what's going on.
>
>
> Mark
>
>
> >
> >
> > Regards,
> >
> > Rafał Wądołowski
> >
> >
> >
Hi everyone,
My Ceph version is 12.2.12. I want to set require-min-compat-client to
luminous, so I used the command:
#ceph osd set-require-min-compat-client luminous
but Ceph reported:
Error EPERM: cannot set require_min_compat_client to luminous: 4 connected
client(s) look like jewel (missing 0xa00000000200000); add
--yes-i-really-mean-it to do it anyway
[root@node-1 ~]# ceph features
{
    "mon": {
        "group": {
            "features": "0x3ffddff8eeacfffb",
            "release": "luminous",
            "num": 3
        }
    },
    "osd": {
        "group": {
            "features": "0x3ffddff8eeacfffb",
            "release": "luminous",
            "num": 15
        }
    },
    "client": {
        "group": {
            "features": "0x40106b84a842a52",
            "release": "jewel",
            "num": 4
        },
        "group": {
            "features": "0x3ffddff8eeacfffb",
            "release": "luminous",
            "num": 168
        }
    }
}
So I ran the command:
[root@node-1 gyt]# ceph osd set-require-min-compat-client luminous
--yes-i-really-mean-it
set require_min_compat_client to luminous
But now I want to set require-min-compat-client back to jewel, so I use the command:
[root@node-1 gyt]# ceph osd set-require-min-compat-client jewel
Error EPERM: osdmap current utilizes features that require luminous;
cannot set require_min_compat_client below that to jewel
What is the way to change it back from luminous to jewel?
Hi,
I deployed a multi-site setup in order to sync data from one cluster
to another. The data is fully synced (I suppose) and the cluster
has no traffic at present. Everything seems fine.
However, the sync status is not what I expected. Is there any step
needed after the data transfer? Can I change the master zone
to my new zone? Can I stop the old cluster?
sudo radosgw-admin sync status
          realm bde4bb56-fbca-4ef8-a979-935dbf109b78 (cn)
      zonegroup d25ae683-cdb8-4227-be45-ebaf0aed6050 (beijing)
           zone 313c8244-fe4d-4d46-bf9b-0e33e46be041 (newzone)
  metadata sync syncing
                full sync: 0/64 shards
                incremental sync: 64/64 shards
                metadata is caught up with master
      data sync source: f70a5eb9-d88d-42fd-ab4e-d300e97094de (oldzone)
                        syncing
                        full sync: 1/128 shards
                        full sync: 0 buckets to sync
                        incremental sync: 127/128 shards
                        data is behind on 14 shards
                        behind shards: [3,21,42,54,55,62,71,75,92,95,104,106,108,122]
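For context, if promoting the new zone ever becomes necessary, the failover
steps described in the multisite documentation look roughly like this (untested
on my side; the zone name is just ours):
radosgw-admin zone modify --rgw-zone=newzone --master --default
radosgw-admin period update --commit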
Hi,
I have created an erasure-coded pool, and the default parameters below,
related to stripe sizes, are present:
"osd_pool_erasure_code_stripe_width": "4096" --> 4KB
"rgw_obj_stripe_size": "4194304" --> 4MB
Let's say the k+m values are 10+5 for the erasure pool, and we upload one
object of size <4MB and another object of size >4MB. How will Ceph
break the objects into chunks and store them, and how many stripes will be
created?
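For reference, this is how I have been checking what the pool actually ends up
with (just the commands I believe are relevant; pool/profile names are placeholders):
ceph osd pool get <pool-name> erasure_code_profile
ceph osd erasure-code-profile get <profile-name>
ceph osd pool ls detail    # shows the computed stripe_width per pool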
Regards,
Biswajeet
Hi
On one of our Ceph clusters, some OSDs have been marked as full. Since this is a staging cluster that does not have much data on it, this is strange.
Looking at the full OSDs through “ceph osd df” I figured out that the space is mostly used by metadata:
SIZE: 122 GiB
USE: 118 GiB
DATA: 2.4 GiB
META: 116 GiB
We run mimic, and for the affected OSDs we use a db device (nvme) in addition to the primary device (hdd).
In the logs we see the following errors:
2020-05-12 17:10:26.089 7f183f604700 1 bluefs _allocate failed to allocate 0x400000 on bdev 1, free 0x0; fallback to bdev 2
2020-05-12 17:10:27.113 7f183f604700 1 bluestore(/var/lib/ceph/osd/ceph-8) _balance_bluefs_freespace gifting 0x180a000000~400000 to bluefs
2020-05-12 17:10:27.153 7f183f604700 1 bluefs add_block_extent bdev 2 0x180a000000~400000
We assume it is an issue with RocksDB, as the following call will quickly fix the problem:
ceph daemon osd.8 compact
The question is: why is this happening? I would think that “compact” is something that runs automatically from time to time, but I’m not sure.
Is it on us to run this regularly?
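For now, what we do by hand when an OSD fills up is roughly the following (run
on the host that carries the OSD; the ids are just examples):
for id in 8 9 10; do
    ceph daemon osd.$id compact
done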
Any pointers are welcome. I’m quite new to Ceph :)
Cheers,
Denis