February 2020 - ceph-users

by Raymond Clotfelter

I have 30 or so OSDs on a cluster with 240 that just keep crashing. Below is the last part of one of the log files showing the crash, can anyone please help me read this to figure out what is going on and how to correct it? When I start the OSDs they generally seem to work for 5-30 minutes, and then one by one they will start dropping out with logs similar to this. Thanks. -29> 2020-02-04 06:00:23.447 7fe300d41700 5 osd.168 pg_epoch: 459335 pg[6.1217s2( v 443432'5 (0'0,443432'5] local-lis/les=459328/459329 n=1 ec=260267/6574 lis/c 459331/428950 les/c/f 459332/440468/290442 459333/459334/459294) [2147483647,107,168,2147483647,102]/[81,107,168,89,102]p81(0) r=2 lpr=459334 pi=[428950,459334)/82 crt=443432'5 lcod 0'0 remapped NOTIFY mbc={}] exit Started/Stray 1.017145 6 0.000323 -28> 2020-02-04 06:00:23.447 7fe300d41700 5 osd.168 pg_epoch: 459335 pg[6.1217s2( v 443432'5 (0'0,443432'5] local-lis/les=459328/459329 n=1 ec=260267/6574 lis/c 459331/428950 les/c/f 459332/440468/290442 459333/459334/459294) [2147483647,107,168,2147483647,102]/[81,107,168,89,102]p81(0) r=2 lpr=459334 pi=[428950,459334)/82 crt=443432'5 lcod 0'0 remapped NOTIFY mbc={}] enter Started/ReplicaActive -27> 2020-02-04 06:00:23.447 7fe300d41700 5 osd.168 pg_epoch: 459335 pg[6.1217s2( v 443432'5 (0'0,443432'5] local-lis/les=459328/459329 n=1 ec=260267/6574 lis/c 459331/428950 les/c/f 459332/440468/290442 459333/459334/459294) [2147483647,107,168,2147483647,102]/[81,107,168,89,102]p81(0) r=2 lpr=459334 pi=[428950,459334)/82 crt=443432'5 lcod 0'0 remapped NOTIFY mbc={}] enter Started/ReplicaActive/RepNotRecovering -26> 2020-02-04 06:00:23.455 7fe309d53700 3 osd.168 459335 handle_osd_map epochs [459335,459335], i have 459335, src has [403399,459335] -25> 2020-02-04 06:00:23.455 7fe309d53700 3 osd.168 459335 handle_osd_map epochs [459335,459335], i have 459335, src has [403399,459335] -24> 2020-02-04 06:00:23.455 7fe309d53700 3 osd.168 459335 handle_osd_map epochs [459335,459335], i have 459335, src has [403399,459335] -23> 2020-02-04 06:00:23.459 7fe309d53700 3 osd.168 459335 handle_osd_map epochs [459335,459335], i have 459335, src has [403399,459335] -22> 2020-02-04 06:00:23.459 7fe309d53700 3 osd.168 459335 handle_osd_map epochs [459335,459335], i have 459335, src has [403399,459335] -21> 2020-02-04 06:00:23.459 7fe309d53700 3 osd.168 459335 handle_osd_map epochs [459335,459335], i have 459335, src has [403399,459335] -20> 2020-02-04 06:00:23.459 7fe309d53700 3 osd.168 459335 handle_osd_map epochs [459335,459335], i have 459335, src has [403399,459335] -19> 2020-02-04 06:00:23.463 7fe309d53700 3 osd.168 459335 handle_osd_map epochs [459335,459335], i have 459335, src has [403399,459335] -18> 2020-02-04 06:00:23.463 7fe309d53700 3 osd.168 459335 handle_osd_map epochs [459335,459335], i have 459335, src has [403399,459335] -17> 2020-02-04 06:00:23.471 7fe309d53700 3 osd.168 459335 handle_osd_map epochs [459335,459335], i have 459335, src has [403399,459335] -16> 2020-02-04 06:00:23.471 7fe309d53700 3 osd.168 459335 handle_osd_map epochs [459335,459335], i have 459335, src has [403399,459335] -15> 2020-02-04 06:00:23.471 7fe300d41700 5 osd.168 pg_epoch: 459335 pg[6.1217s2( v 443432'5 (0'0,443432'5] local-lis/les=459334/459335 n=1 ec=260267/6574 lis/c 459334/428950 les/c/f 459335/440468/290442 459333/459334/459294) [2147483647,107,168,2147483647,102]/[81,107,168,89,102]p81(0) r=2 lpr=459334 pi=[428950,459334)/82 luod=0'0 crt=443432'5 lcod 0'0 active+remapped mbc={}] exit Started/ReplicaActive/RepNotRecovering 0.021923 2 0.000098 -14> 2020-02-04 06:00:23.471 7fe300d41700 5 osd.168 pg_epoch: 459335 pg[6.1217s2( v 443432'5 (0'0,443432'5] local-lis/les=459334/459335 n=1 ec=260267/6574 lis/c 459334/428950 les/c/f 459335/440468/290442 459333/459334/459294) [2147483647,107,168,2147483647,102]/[81,107,168,89,102]p81(0) r=2 lpr=459334 pi=[428950,459334)/82 luod=0'0 crt=443432'5 lcod 0'0 active+remapped mbc={}] enter Started/ReplicaActive/RepWaitRecoveryReserved -13> 2020-02-04 06:00:23.471 7fe300d41700 5 osd.168 pg_epoch: 459335 pg[6.1217s2( v 443432'5 (0'0,443432'5] local-lis/les=459334/459335 n=1 ec=260267/6574 lis/c 459334/428950 les/c/f 459335/440468/290442 459333/459334/459294) [2147483647,107,168,2147483647,102]/[81,107,168,89,102]p81(0) r=2 lpr=459334 pi=[428950,459334)/82 luod=0'0 crt=443432'5 lcod 0'0 active+remapped mbc={}] exit Started/ReplicaActive/RepWaitRecoveryReserved 0.000137 1 0.000080 -12> 2020-02-04 06:00:23.471 7fe300d41700 5 osd.168 pg_epoch: 459335 pg[6.1217s2( v 443432'5 (0'0,443432'5] local-lis/les=459334/459335 n=1 ec=260267/6574 lis/c 459334/428950 les/c/f 459335/440468/290442 459333/459334/459294) [2147483647,107,168,2147483647,102]/[81,107,168,89,102]p81(0) r=2 lpr=459334 pi=[428950,459334)/82 luod=0'0 crt=443432'5 lcod 0'0 active+remapped mbc={}] enter Started/ReplicaActive/RepRecovering -11> 2020-02-04 06:00:23.471 7fe309d53700 3 osd.168 459335 handle_osd_map epochs [459335,459335], i have 459335, src has [403399,459335] -10> 2020-02-04 06:00:23.475 7fe309d53700 3 osd.168 459335 handle_osd_map epochs [459335,459335], i have 459335, src has [403399,459335] -9> 2020-02-04 06:00:23.483 7fe300540700 5 osd.168 pg_epoch: 459335 pg[6.1961s0( v 436281'1 (0'0,436281'1] local-lis/les=459334/459335 n=0 ec=260267/6574 lis/c 459334/406589 les/c/f 459335/436403/290442 459333/459334/459334) [168,2147483647,2147483647,196,151]/[168,116,60,196,151]p168(0) r=0 lpr=459334 pi=[436277,459334)/33 crt=436281'1 lcod 0'0 mlcod 0'0 active+degraded+remapped m=30 mbc={0={(0+0)=30},1={(0+2)=30},2={(0+3)=30},3={(1+1)=30},4={(0+0)=30}}] exit Started/Primary/Active/Activating 0.034400 28 0.002950 -8> 2020-02-04 06:00:23.483 7fe300540700 5 osd.168 pg_epoch: 459335 pg[6.1961s0( v 436281'1 (0'0,436281'1] local-lis/les=459334/459335 n=0 ec=260267/6574 lis/c 459334/406589 les/c/f 459335/436403/290442 459333/459334/459334) [168,2147483647,2147483647,196,151]/[168,116,60,196,151]p168(0) r=0 lpr=459334 pi=[436277,459334)/33 crt=436281'1 lcod 0'0 mlcod 0'0 active+degraded+remapped m=30 mbc={0={(0+0)=30},1={(0+2)=30},2={(0+3)=30},3={(1+1)=30},4={(0+0)=30}}] enter Started/Primary/Active/WaitLocalRecoveryReserved -7> 2020-02-04 06:00:23.483 7fe300540700 5 osd.168 pg_epoch: 459335 pg[6.1961s0( v 436281'1 (0'0,436281'1] local-lis/les=459334/459335 n=0 ec=260267/6574 lis/c 459334/406589 les/c/f 459335/436403/290442 459333/459334/459334) [168,2147483647,2147483647,196,151]/[168,116,60,196,151]p168(0) r=0 lpr=459334 pi=[436277,459334)/33 crt=436281'1 lcod 0'0 mlcod 0'0 active+recovery_wait+degraded+remapped m=30 mbc={0={(0+0)=30},1={(0+2)=30},2={(0+3)=30},3={(1+1)=30},4={(0+0)=30}}] exit Started/Primary/Active/WaitLocalRecoveryReserved 0.000213 1 0.000221 -6> 2020-02-04 06:00:23.483 7fe300540700 5 osd.168 pg_epoch: 459335 pg[6.1961s0( v 436281'1 (0'0,436281'1] local-lis/les=459334/459335 n=0 ec=260267/6574 lis/c 459334/406589 les/c/f 459335/436403/290442 459333/459334/459334) [168,2147483647,2147483647,196,151]/[168,116,60,196,151]p168(0) r=0 lpr=459334 pi=[436277,459334)/33 crt=436281'1 lcod 0'0 mlcod 0'0 active+recovery_wait+degraded+remapped m=30 mbc={0={(0+0)=30},1={(0+2)=30},2={(0+3)=30},3={(1+1)=30},4={(0+0)=30}}] enter Started/Primary/Active/WaitRemoteRecoveryReserved -5> 2020-02-04 06:00:23.483 7fe300540700 5 osd.168 pg_epoch: 459335 pg[6.1961s0( v 436281'1 (0'0,436281'1] local-lis/les=459334/459335 n=0 ec=260267/6574 lis/c 459334/406589 les/c/f 459335/436403/290442 459333/459334/459334) [168,2147483647,2147483647,196,151]/[168,116,60,196,151]p168(0) r=0 lpr=459334 pi=[436277,459334)/33 crt=436281'1 lcod 0'0 mlcod 0'0 active+recovery_wait+degraded+remapped m=30 mbc={0={(0+0)=30},1={(0+2)=30},2={(0+3)=30},3={(1+1)=30},4={(0+0)=30}}] exit Started/Primary/Active/WaitRemoteRecoveryReserved 0.002796 5 0.000231 -4> 2020-02-04 06:00:23.483 7fe300540700 5 osd.168 pg_epoch: 459335 pg[6.1961s0( v 436281'1 (0'0,436281'1] local-lis/les=459334/459335 n=0 ec=260267/6574 lis/c 459334/406589 les/c/f 459335/436403/290442 459333/459334/459334) [168,2147483647,2147483647,196,151]/[168,116,60,196,151]p168(0) r=0 lpr=459334 pi=[436277,459334)/33 crt=436281'1 lcod 0'0 mlcod 0'0 active+recovery_wait+degraded+remapped m=30 mbc={0={(0+0)=30},1={(0+2)=30},2={(0+3)=30},3={(1+1)=30},4={(0+0)=30}}] enter Started/Primary/Active/Recovering -3> 2020-02-04 06:00:23.491 7fe309d53700 3 osd.168 459335 handle_osd_map epochs [459335,459335], i have 459335, src has [403399,459335] -2> 2020-02-04 06:00:23.519 7fe309d53700 3 osd.168 459335 handle_osd_map epochs [459335,459335], i have 459335, src has [403399,459335] -1> 2020-02-04 06:00:23.779 7fe300540700 -1 /build/ceph-14.2.7/src/osd/ECBackend.cc: In function 'void ECBackend::do_read_op(ECBackend::ReadOp&)' thread 7fe300540700 time 2020-02-04 06:00:23.774430 /build/ceph-14.2.7/src/osd/ECBackend.cc: 1742: FAILED ceph_assert(!need_attrs) ceph version 14.2.7 (3d58626ebeec02d8385a4cefb92c6cbc3a45bfe8) nautilus (stable) 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x152) [0x5579b9ecce4c] 2: (ceph::__ceph_assertf_fail(char const*, char const*, int, char const*, char const*, ...)+0) [0x5579b9ecd027] 3: (ECBackend::do_read_op(ECBackend::ReadOp&)+0xf76) [0x5579ba295106] 4: (ECBackend::send_all_remaining_reads(hobject_t const&, ECBackend::ReadOp&)+0x4d1) [0x5579ba2a6cf1] 5: (ECBackend::handle_sub_read_reply(pg_shard_t, ECSubReadReply&, RecoveryMessages*, ZTracer::Trace const&)+0xcf6) [0x5579ba2a7f26] 6: (ECBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0xbc) [0x5579ba2a8a8c] 7: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x97) [0x5579ba17f757] 8: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, ThreadPool::TPHandle&)+0x705) [0x5579ba12dee5] 9: (OSD::dequeue_op(boost::intrusive_ptr<PG>, boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x1bf) [0x5579b9f4ff2f] 10: (PGOpItem::run(OSD*, OSDShard*, boost::intrusive_ptr<PG>&, ThreadPool::TPHandle&)+0x62) [0x5579ba206e82] 11: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0xbf5) [0x5579b9f6ea05] 12: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x4ac) [0x5579ba5714cc] 13: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x5579ba574690] 14: (()+0x7fa3) [0x7fe31ff5cfa3] 15: (clone()+0x3f) [0x7fe31fb0c4cf] 0> 2020-02-04 06:00:23.787 7fe300540700 -1 *** Caught signal (Aborted) ** in thread 7fe300540700 thread_name:tp_osd_tp ceph version 14.2.7 (3d58626ebeec02d8385a4cefb92c6cbc3a45bfe8) nautilus (stable) 1: (()+0x12730) [0x7fe31ff67730] 2: (gsignal()+0x10b) [0x7fe31fa4a7bb] 3: (abort()+0x121) [0x7fe31fa35535] 4: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1a3) [0x5579b9ecce9d] 5: (ceph::__ceph_assertf_fail(char const*, char const*, int, char const*, char const*, ...)+0) [0x5579b9ecd027] 6: (ECBackend::do_read_op(ECBackend::ReadOp&)+0xf76) [0x5579ba295106] 7: (ECBackend::send_all_remaining_reads(hobject_t const&, ECBackend::ReadOp&)+0x4d1) [0x5579ba2a6cf1] 8: (ECBackend::handle_sub_read_reply(pg_shard_t, ECSubReadReply&, RecoveryMessages*, ZTracer::Trace const&)+0xcf6) [0x5579ba2a7f26] 9: (ECBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0xbc) [0x5579ba2a8a8c] 10: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x97) [0x5579ba17f757] 11: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, ThreadPool::TPHandle&)+0x705) [0x5579ba12dee5] 12: (OSD::dequeue_op(boost::intrusive_ptr<PG>, boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x1bf) [0x5579b9f4ff2f] 13: (PGOpItem::run(OSD*, OSDShard*, boost::intrusive_ptr<PG>&, ThreadPool::TPHandle&)+0x62) [0x5579ba206e82] 14: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0xbf5) [0x5579b9f6ea05] 15: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x4ac) [0x5579ba5714cc] 16: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x5579ba574690] 17: (()+0x7fa3) [0x7fe31ff5cfa3] 18: (clone()+0x3f) [0x7fe31fb0c4cf] NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this. --- logging levels --- 0/ 5 none 0/ 1 lockdep 0/ 1 context 1/ 1 crush 1/ 5 mds 1/ 5 mds_balancer 1/ 5 mds_locker 1/ 5 mds_log 1/ 5 mds_log_expire 1/ 5 mds_migrator 0/ 1 buffer 0/ 1 timer 0/ 1 filer 0/ 1 striper 0/ 1 objecter 0/ 5 rados 0/ 5 rbd 0/ 5 rbd_mirror 0/ 5 rbd_replay 0/ 5 journaler 0/ 5 objectcacher 0/ 5 client 1/ 5 osd 0/ 5 optracker 0/ 5 objclass 1/ 3 filestore 1/ 3 journal 0/ 0 ms 1/ 5 mon 0/10 monc 1/ 5 paxos 0/ 5 tp 1/ 5 auth 1/ 5 crypto 1/ 1 finisher 1/ 1 reserver 1/ 5 heartbeatmap 1/ 5 perfcounter 1/ 5 rgw 1/ 5 rgw_sync 1/10 civetweb 1/ 5 javaclient 1/ 5 asok 1/ 1 throttle 0/ 0 refs 1/ 5 xio 1/ 5 compressor 1/ 5 bluestore 1/ 5 bluefs 1/ 3 bdev 1/ 5 kstore 4/ 5 rocksdb 4/ 5 leveldb 4/ 5 memdb 1/ 5 kinetic 1/ 5 fuse 1/ 5 mgr 1/ 5 mgrc 1/ 5 dpdk 1/ 5 eventtrace 1/ 5 prioritycache -2/-2 (syslog threshold) -1/-1 (stderr threshold) max_recent 10000 max_new 1000 log_file /var/log/ceph/ceph-osd.168.log --- end dump of recent events --- -- ray

4 years, 2 months

1
0
0 0

recovery_unfound

by Jake Grimmett

Dear All, Due to a mistake in my "rolling restart" script, one of our ceph clusters now has a number of unfound objects: There is an 8+2 erasure encoded data pool, 3x replicated metadata pool, all data is stored as cephfs. root@ceph7 ceph-archive]# ceph health HEALTH_ERR 24/420880027 objects unfound (0.000%); Possible data damage: 14 pgs recovery_unfound; Degraded data redundancy: 64/4204261148 objects degraded (0.000%), 14 pgs degraded "ceph health detail" gives me a handle on which pgs are affected. e.g: pg 5.f2f has 2 unfound objects pg 5.5c9 has 2 unfound objects pg 5.4c1 has 1 unfound objects and so on... plus more entries of this type: pg 5.6d is active+recovery_unfound+degraded, acting [295,104,57,442,240,338,219,33,150,382], 1 unfound pg 5.3fa is active+recovery_unfound+degraded, acting [343,147,21,131,315,63,214,365,264,437], 2 unfound pg 5.41d is active+recovery_unfound+degraded, acting [20,104,190,377,52,141,418,358,240,289], 1 unfound Digging deeper into one of the bad pg, we see the oid for the two unfound objects: root@ceph7 ceph-archive]# ceph pg 5.f2f list_unfound { "num_missing": 4, "num_unfound": 2, "objects": [ { "oid": { "oid": "1000ba25e49.00000207", "key": "", "snapid": -2, "hash": 854007599, "max": 0, "pool": 5, "namespace": "" }, "need": "22541'3088478", "have": "0'0", "flags": "none", "locations": [ "189(8)", "263(9)" ] }, { "oid": { "oid": "1000bb25a5b.00000091", "key": "", "snapid": -2, "hash": 3637976879, "max": 0, "pool": 5, "namespace": "" }, "need": "22541'3088476", "have": "0'0", "flags": "none", "locations": [ "189(8)", "263(9)" ] } ], "more": false } While it would be nice to recover the data, this cluster is only used for storing backups. As all OSD are up and running, presumably the data blocks are permanently lost? If it's hard / impossible to recover the data, presumably we should now consider using "ceph pg 5.f2f mark_unfound_lost delete" on each affected pg? Finally, can we use the oid to identify the affected files? best regards, Jake -- Jake Grimmett MRC Laboratory of Molecular Biology Francis Crick Avenue, Cambridge CB2 0QH, UK.

4 years, 2 months

2
2
0 0

ceph positions

by Frank R

Hi all, I really hope this isn't seen as spam. I am looking to find a position where I can focus on Linux storage/Ceph. If anyone is currently looking please let me know. Linkedin profile frankritchie. Thanks, Frank

4 years, 2 months

2
1
0 0

cephf_metadata: Large omap object found

by Yoann Moulin

Hello, I have this message on my new ceph cluster in Nautilus. I have a cephfs with a copy of ~100TB in progress. > /var/log/ceph/artemis.log:2020-02-03 16:22:49.970437 osd.66 (osd.66) 1137 : cluster [WRN] Large omap object found. Object: 8:579bf162:::mds3_openfiles.0:head PG: 8.468fd9ea (8.2a) Key count: 206548 Size (bytes): 6691941 > /var/log/ceph/artemis-osd.66.log:2020-02-03 16:22:49.966 7fe77af62700 0 log_channel(cluster) log [WRN] : Large omap object found. Object: 8:579bf162:::mds3_openfiles.0:head PG: 8.468fd9ea (8.2a) Key count: 206548 Size (bytes): 6691941 I found this thread about a similar issue in the archives of the list https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/JUFYDCQ2AHF… But I'm not sure what I can do in my situation, can I increase osd_deep_scrub_large_omap_object_key_threshold or it's a bad idea? Thanks for your help. Here some useful (I guess) information: > Filesystem Size Used Avail Use% Mounted on > 10.90.37.4,10.90.37.6,10.90.37.8:/ 329T 32T 297T 10% /artemis > artemis@icitsrv5:~$ ceph -s > cluster: > id: 815ea021-7839-4a63-9dc1-14f8c5feecc6 > health: HEALTH_WARN > 1 large omap objects > > services: > mon: 3 daemons, quorum iccluster003,iccluster005,iccluster007 (age 2w) > mgr: iccluster021(active, since 7h), standbys: iccluster009, iccluster023 > mds: cephfs:5 5 up:active > osd: 120 osds: 120 up (since 5d), 120 in (since 5d) > rgw: 8 daemons active (iccluster003.rgw0, iccluster005.rgw0, iccluster007.rgw0, iccluster013.rgw0, iccluster015.rgw0, iccluster019.rgw0, iccluster021.rgw0, iccluster023.rgw0) > > data: > pools: 10 pools, 2161 pgs > objects: 72.02M objects, 125 TiB > usage: 188 TiB used, 475 TiB / 662 TiB avail > pgs: 2157 active+clean > 4 active+clean+scrubbing+deep > > io: > client: 31 KiB/s rd, 803 KiB/s wr, 31 op/s rd, 184 op/s wr > artemis@icitsrv5:~$ ceph health detail > HEALTH_WARN 1 large omap objects > LARGE_OMAP_OBJECTS 1 large omap objects > 1 large objects found in pool 'cephfs_metadata' > Search the cluster log for 'Large omap object found' for more details. > artemis@icitsrv5:~$ ceph fs status > cephfs - 3 clients > ====== > +------+--------+--------------+---------------+-------+-------+ > | Rank | State | MDS | Activity | dns | inos | > +------+--------+--------------+---------------+-------+-------+ > | 0 | active | iccluster015 | Reqs: 0 /s | 251k | 251k | > | 1 | active | iccluster001 | Reqs: 3 /s | 20.2k | 19.1k | > | 2 | active | iccluster017 | Reqs: 1 /s | 116k | 112k | > | 3 | active | iccluster019 | Reqs: 0 /s | 263k | 263k | > | 4 | active | iccluster013 | Reqs: 123 /s | 16.3k | 16.3k | > +------+--------+--------------+---------------+-------+-------+ > +-----------------+----------+-------+-------+ > | Pool | type | used | avail | > +-----------------+----------+-------+-------+ > | cephfs_metadata | metadata | 13.9G | 135T | > | cephfs_data | data | 51.3T | 296T | > +-----------------+----------+-------+-------+ > +-------------+ > | Standby MDS | > +-------------+ > +-------------+ > MDS version: ceph version 14.2.6 (f0aa067ac7a02ee46ea48aa26c6e298b5ea272e9) nautilus (stable) > root@iccluster019:~# ceph --cluster artemis daemon osd.13 config show | grep large_omap > "osd_deep_scrub_large_omap_object_key_threshold": "200000", > "osd_deep_scrub_large_omap_object_value_sum_threshold": "1073741824", > artemis@icitsrv5:~$ rados -p cephfs_metadata listxattr mds3_openfiles.0 > artemis@icitsrv5:~$ rados -p cephfs_metadata getomapheader mds3_openfiles.0 > header (42 bytes) : > 00000000 13 00 00 00 63 65 70 68 20 66 73 20 76 6f 6c 75 |....ceph fs volu| > 00000010 6d 65 20 76 30 31 31 01 01 0d 00 00 00 14 63 00 |me v011.......c.| > 00000020 00 00 00 00 00 01 00 00 00 00 |..........| > 0000002a Best regards, -- Yoann Moulin EPFL IC-IT

4 years, 2 months

2
1
0 0

v14.2.7 Nautilus released

by Abhishek Lekshmanan

This is the seventh update to the Ceph Nautilus release series. This is a hotfix release primarily fixing a couple of security issues. We recommend that all users upgrade to this release. Notable Changes --------------- * CVE-2020-1699: Fixed a path traversal flaw in Ceph dashboard that could allow for potential information disclosure (Ernesto Puerta) * CVE-2020-1700: Fixed a flaw in RGW beast frontend that could lead to denial of service from an unauthenticated client (Or Friedmann) Blog Link: https://ceph.io/releases/v14-2-7-nautilus-released/ Getting Ceph ------------ * Git at git://github.com/ceph/ceph.git * Tarball at http://download.ceph.com/tarballs/ceph-14.2.7.tar.gz * For packages, see http://docs.ceph.com/docs/master/install/get-packages/ * Release git sha1: 3d58626ebeec02d8385a4cefb92c6cbc3a45bfe8 -- Abhishek Lekshmanan SUSE Software Solutions Germany GmbH

4 years, 2 months

1
0
0 0

v12.2.13 Luminous released

by Abhishek Lekshmanan

We're happy to announce 13th bug fix release of the Luminous v12.2.x long term stable release series. We recommend that all users upgrade to this release. Many thanks to all the contributors, in particular Yuri & Nathan, in getting this release out of the door. This shall be the last release of the Luminous series. For a detailed release notes, please check out the official blog entry at https://ceph.io/releases/v12-2-13-luminous-released/ Notable Changes --------------- * Ceph now packages python bindings for python3.6 instead of python3.4, because EPEL7 recently switched from python3.4 to python3.6 as the native python3. see the announcement[1] for more details on the background of this change. * We now have telemetry support via a ceph-mgr module. The telemetry module is absolutely on an opt-in basis, and is meant to collect generic cluster information and push it to a central endpoint. By default, we're pushing it to a project endpoint at https://telemetry.ceph.com/report, but this is customizable using by setting the 'url' config option with:: ceph telemetry config-set url '<your url>' You will have to opt-in on sharing your information with:: ceph telemetry on You can view exactly what information will be reported first with:: ceph telemetry show Should you opt-in, your information will be licensed under the Community Data License Agreement - Sharing - Version 1.0, which you can read at https://cdla.io/sharing-1-0/ The telemetry module reports information about CephFS file systems, including: - how many MDS daemons (in total and per file system) - which features are (or have been) enabled - how many data pools - approximate file system age (year + month of creation) - how much metadata is being cached per file system As well as: - whether IPv4 or IPv6 addresses are used for the monitors - whether RADOS cache tiering is enabled (and which mode) - whether pools are replicated or erasure coded, and which erasure code profile plugin and parameters are in use - how many RGW daemons, zones, and zonegroups are present; which RGW frontends are in use - aggregate stats about the CRUSH map, like which algorithms are used, how big buckets are, how many rules are defined, and what tunables are in use * A health warning is now generated if the average osd heartbeat ping time exceeds a configurable threshold for any of the intervals computed. The OSD computes 1 minute, 5 minute and 15 minute intervals with average, minimum and maximum values. New configuration option `mon_warn_on_slow_ping_ratio` specifies a percentage of `osd_heartbeat_grace` to determine the threshold. A value of zero disables the warning. New configuration option `mon_warn_on_slow_ping_time` specified in milliseconds over-rides the computed value, causes a warning when OSD heartbeat pings take longer than the specified amount. New admin command `ceph daemon mgr.# dump_osd_network [threshold]` command will list all connections with a ping time longer than the specified threshold or value determined by the config options, for the average for any of the 3 intervals. New admin command `ceph daemon osd.# dump_osd_network [threshold]` will do the same but only including heartbeats initiated by the specified OSD. * The configuration value `osd_calc_pg_upmaps_max_stddev` used for upmap balancing has been removed. Instead use the mgr balancer config `upmap_max_deviation` which now is an integer number of PGs of deviation from the target PGs per OSD. This can be set with a command like `ceph config set mgr mgr/balancer/upmap_max_deviation 2`. The default `upmap_max_deviation` is 1. There are situations where crush rules would not allow a pool to ever have completely balanced PGs. For example, if crush requires 1 replica on each of 3 racks, but there are fewer OSDs in 1 of the racks. In those cases, the configuration value can be increased. Getting Ceph ------------ * Git at git://github.com/ceph/ceph.git * Tarball at http://download.ceph.com/tarballs/ceph-12.2.13.tar.gz * For packages, see http://docs.ceph.com/docs/master/install/get-packages/ * Release git sha1: 584a20eb0237c657dc0567da126be145106aa47e [1]: https://lists.fedoraproject.org/archives/list/epel-announce@lists.fedorapro… -- Abhishek Lekshmanan SUSE Software Solutions Germany GmbH GF: Felix Imendörffer HRB 21284 (AG Nürnberg)

4 years, 2 months

1
0
0 0

cpu and memory for OSD server

by Wyatt Chun

We have 18 Sata disks (each 2TB) on a physical server, each disk with an OSD deployed. I am not sure how much CPU and memory resources should be prepared for this server. Does each OSD require a physical CPU? and how to calculate memory usage? Thanks.

4 years, 2 months

1
0
0 0

ceph fs dir-layouts and sub-directory mounts

by Frank Schilder

I would like to (in this order) - set the data pool for the root "/" of a ceph-fs to a custom value, say "P" (not the initial data pool used in fs new) - create a sub-directory of "/", for example "/a" - mount the sub-directory "/a" with a client key with access restricted to "/a" The client will not be able to see the dir layout attribute set at "/", its not mounted. Will the data of this client still go to the pool "P", that is, does "/a" inherit the dir layout transparently to the client when following the steps above? Best regards, ================= Frank Schilder AIT Risø Campus Bygning 109, rum S14

4 years, 2 months

3
6
0 0

TR: Understand ceph df details

by CUZA Frédéric

Turns out it is probably orphans. We are running ceph luminous : 12.2.12 And the orphans find is stuck in the stage : "iterate_bucket_index" on shard "0" for 2 days now. Anyone is facing this issue ? Regards, De : ceph-users <ceph-users-bounces(a)lists.ceph.com<mailto:ceph-users-bounces@lists.ceph.com>> Envoyé : 21 January 2020 10:10 À : ceph-users(a)lists.ceph.com<mailto:ceph-users@lists.ceph.com> Objet : [ceph-users] Understand ceph df details Hi everyone, I'm trying to understand where is the difference between the command : ceph df details And the result I'm getting when I run this script : total_bytes=0 while read user; do echo $user bytes=$(radosgw-admin user stats --uid=${user} | grep total_bytes_rounded | tr -dc "0-9") if [ ! -z ${bytes} ]; then total_bytes=$((total_bytes + bytes)) pretty_bytes=$(echo "scale=2; $bytes / 1000^4" | bc) echo " ($bytes B) $pretty_bytes TiB" fi pretty_total_bytes=$(echo "scale=2; $total_bytes / 1000^4" | bc) done <<< "$(radosgw-admin user list | jq -r .[])" echo "" echo "Total : ($total_bytes B) $pretty_total_bytes TiB" When I run df I get this : default.rgw.buckets.data 70 N/A N/A 226TiB 89.23 27.2TiB 61676992 61.68M 2.05GiB 726MiB 677TiB And when I use my script I don't have the same result : Total : (207579728699392 B) 207.57 TiB It means that I have 20 TiB somewhere but I can't find and must of all understand where this 20 TiB. Does anyone have an explanation ? Fi : [root@ceph_monitor01 ~]# radosgw-admin gc list -include-all | grep oid | wc -l 23

4 years, 2 months

2
1
0 0

Questions on Erasure Coding

by Dave Hall

Hello. Thanks to advice from bauen1 I now have OSDs on Debian/Nautilus and have been able to move on to MDS and CephFS. Also, looking around in the Dashboard I noticed the options for Crush Failure Domain and further that it's possible to select 'OSD'. As I mentioned earlier our cluster is fairly small at this point (3 hosts, 24 OSDs) , but we want to get as much usable storage as possible until we can get more nodes. SInce the nodes are brand new we are probably more concerned about disk failures than about node failures for the next few months. If I interpret Crush Failure Domain = OSD, this means it's possible to create pools that behave somewhat similar to RAID 6 - something like 8 + 2 except dispersed across multiple nodes. With the pool spread around like this loosing any one disk shouldn't put the cluster into read-only mode - if a disk did fail, would the cluster re-balance and reconstruct the lost data until the failed OSD was replaced. Does this make sense? Or is it just wishful thinking. Thanks. -Dave -- Dave Hall Binghamton University

4 years, 2 months

2
1
0 0

2024

2023

2022

2021

2020

2019

ceph-users February 2020