December 2019 - ceph-users

by Karsten Nielsen

Thank you Yan, Zheng for the help to get my cephfs back in working order by providing a source version that had the fix in it to get the root inode fixed. (https://tracker.ceph.com/issues/42675) - Karsten -----Original message----- From: Yan, Zheng <ukernel(a)gmail.com> Sent: Tue 12-11-2019 11:55 Subject: Re: [ceph-users] Re: mds crash loop To: Karsten Nielsen <karsten(a)foo-bar.dk>; CC: ceph-users(a)ceph.io; > On Tue, Nov 12, 2019 at 6:18 PM Karsten Nielsen <karsten(a)foo-bar.dk> wrote: > > > > -----Original message----- > > From: Karsten Nielsen <karsten(a)foo-bar.dk> > > Sent: Tue 12-11-2019 10:30 > > Subject: [ceph-users] Re: mds crash loop > > To: Yan, Zheng <ukernel(a)gmail.com>; > > CC: ceph-users(a)ceph.io; > > > -----Original message----- > > > From: Yan, Zheng <ukernel(a)gmail.com> > > > Sent: Mon 11-11-2019 15:09 > > > Subject: Re: [ceph-users] Re: mds crash loop > > > To: Karsten Nielsen <karsten(a)foo-bar.dk>; > > > CC: ceph-users(a)ceph.io; > > > > On Mon, Nov 11, 2019 at 5:09 PM Karsten Nielsen <karsten(a)foo-bar.dk> > wrote: > > > > > > > > > > I started a job that moved some files around in the cephfs cluster that > > > > resulted in the mds to go back into the crash loop. > > > > > Logs are here: > > > > > http://s3.foo-bar.dk/mds-dumps/mds.log-20191111 > > > > > > > > > > Any help would be appriciated. > > > > > > > > > > > > > looks like snaptable is corrupted. > > > > > > > > nautilus version (14.2.2) of ‘cephfs-data-scan scan_links’ can fix > > > > snaptable. hopefully it will fix your issue. > > > > > > > > you don't need to upgrade whole cluster. Just install nautilus in a > > > > temp machine or compile ceph from source. > > > > > > I did run the command that you suggested, it did not unfortunately fix the > > > problem. > > > > > > http://s3.foo-bar.dk/mds-dumps/mds.log-20191112 > > > > > > > > > The output from the command is this: > > > > sudo docker exec -it rgw2 cephfs-data-scan scan_links > > 2019-11-12 08:46:27.025 7fe775dd7d80 -1 datascan.scan_links: Remove > duplicated ino 0x0x100026d17d4 from 0x100013b0d3d/latest.log > > 2019-11-12 08:46:28.665 7fe775dd7d80 -1 datascan.load_table: unable to read > mds table 'mds1_inotable': (2) No such file or directory > > 2019-11-12 08:46:28.665 7fe775dd7d80 -1 mds.1.inotable: erasing 0x20000000000 > to 0x2000000d665 > > 2019-11-12 08:46:28.793 7fe775dd7d80 -1 datascan.load_table: unable to read > mds table 'mds2_inotable': (2) No such file or directory > > 2019-11-12 08:46:28.793 7fe775dd7d80 -1 mds.2.inotable: erasing 0x30000000000 > to 0x300000228f5 > > 2019-11-12 08:46:29.345 7fe775dd7d80 -1 mds.0.snap updating last_snap 1 -> 3 > > > > > > please run ceph-mds with debug_mds=20, and send the crash log to me. > > Thanks > Yan, Zheng > > > > > > > > > > > > > > > > > > > > > > > > - Karsten > > > > > > > > > > -----Original message----- > > > > > From: Yan, Zheng <ukernel(a)gmail.com> > > > > > Sent: Thu 07-11-2019 14:20 > > > > > Subject: Re: [ceph-users] Re: mds crash loop > > > > > To: Karsten Nielsen <karsten(a)foo-bar.dk>; > > > > > CC: ceph-users(a)ceph.io; > > > > > > On Thu, Nov 7, 2019 at 6:40 PM Karsten Nielsen <karsten(a)foo-bar.dk> > wrote: > > > > > > > > > > > > > > That is awesome. > > > > > > > > > > > > > > Now I just need to figure out where the lost+found files needs to > go. > > > > > > > And what happened to the missing objects for the dirs. > > > > > > > > > > > > > > > > > > > lost+found files are likely files that were deleted. you can keep the > > > > > > lost+found dir for a while, then delete the 'lost+found' directory. > > > > > > > > > > > > for 'missing object' dirs, mv all of them to a temp directory, such as > > > > > > /mnt/cephfs/missing_obj_dirs. > > > > > > Then run command 'ceph daemon mds.x scrub_patch /missing_obj_dirs > > > > > > force recursive repair'. wait a minute, the rm -rf > > > > > > /mnt/cephfs/missing_obj_dirs > > > > > > > > > > > > > Any tool that is able to do that ? > > > > > > > > > > > > > > Thanks > > > > > > > - Karsten > > > > > > > > > > > > > > -----Original message----- > > > > > > > From: Yan, Zheng <ukernel(a)gmail.com> > > > > > > > Sent: Thu 07-11-2019 09:22 > > > > > > > Subject: Re: [ceph-users] Re: mds crash loop > > > > > > > To: Karsten Nielsen <karsten(a)foo-bar.dk>; > > > > > > > CC: ceph-users(a)ceph.io; > > > > > > > > I have tracked down the root cause. See > > > > > > https://tracker.ceph.com/issues/42675 > > > > > > > > > > > > > > > > Regards > > > > > > > > Yan, Zheng > > > > > > > > > > > > > > > > On Thu, Nov 7, 2019 at 4:01 PM Karsten Nielsen > <karsten(a)foo-bar.dk> > > > > wrote: > > > > > > > > > > > > > > > > > > -----Original message----- > > > > > > > > > From: Yan, Zheng <ukernel(a)gmail.com> > > > > > > > > > Sent: Thu 07-11-2019 07:21 > > > > > > > > > Subject: Re: [ceph-users] Re: mds crash loop > > > > > > > > > To: Karsten Nielsen <karsten(a)foo-bar.dk>; > > > > > > > > > CC: ceph-users(a)ceph.io; > > > > > > > > > > On Thu, Nov 7, 2019 at 5:50 AM Karsten Nielsen > > > <karsten(a)foo-bar.dk> > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > > -----Original message----- > > > > > > > > > > > From: Yan, Zheng <ukernel(a)gmail.com> > > > > > > > > > > > Sent: Wed 06-11-2019 14:16 > > > > > > > > > > > Subject: Re: [ceph-users] mds crash loop > > > > > > > > > > > To: Karsten Nielsen <karsten(a)foo-bar.dk>; > > > > > > > > > > > CC: ceph-users(a)ceph.io; > > > > > > > > > > > > On Wed, Nov 6, 2019 at 4:42 PM Karsten Nielsen > > > > <karsten(a)foo-bar.dk> > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > > > > > > -----Original message----- > > > > > > > > > > > > > From: Yan, Zheng <ukernel(a)gmail.com> > > > > > > > > > > > > > Sent: Wed 06-11-2019 08:15 > > > > > > > > > > > > > Subject: Re: [ceph-users] mds crash loop > > > > > > > > > > > > > To: Karsten Nielsen <karsten(a)foo-bar.dk>; > > > > > > > > > > > > > CC: ceph-users(a)ceph.io; > > > > > > > > > > > > > > On Tue, Nov 5, 2019 at 5:29 PM Karsten Nielsen > > > > > > <karsten(a)foo-bar.dk> > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Hi, > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Last week I upgraded my ceph cluster from luminus to > > > mimic > > > > > > 13.2.6 > > > > > > > > > > > > > > > It was running fine for a while but yesterday my mds > > > went > > > > > > into a > > > > > > > > crash > > > > > > > > > > > > loop. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > I have 1 active and 1 standby mds for my cephfs > both of > > > > which > > > > > > is > > > > > > > > > > running > > > > > > > > > > > > the > > > > > > > > > > > > > > same crash loop. > > > > > > > > > > > > > > > I am running ceph based on > > > > > > https://hub.docker.com/r/ceph/daemon > > > > > > > > > > version > > > > > > > > > > > > > > v3.2.7-stable-3.2-minic-centos-7-x86_64 with a etcd kv > > > > store. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Log details are: https://paste.debian.net/1113943/ > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > please try again with debug_mds=20. Thanks > > > > > > > > > > > > > > > > > > > > > > > > > > > > Yan, Zheng > > > > > > > > > > > > > > > > > > > > > > > > > > Yes I have set that and had to move to pastebin.com as > > > debian > > > > > > > > apperently > > > > > > > > > > only > > > > > > > > > > > > supports 150k > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > https://pastebin.com/Gv7c5h54 > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Looks like on-disk root inode is corrupted. have you > > > > encountered any > > > > > > > > > > > > unusually things during the upgrade? > > > > > > > > > > > > > > > > > > > > > > > > please run 'rados -p <cephfs metadata pool> stat > > > > 1.00000000.inode' , > > > > > > > > > > > > check if the object is modified before or after the > 'luminous > > > -> > > > > > > > > > > > > 13.2.6' upgrade. > > > > > > > > > > > > To fix the corrupted object. Run 'cephfs-data-scan init > > > > > > > > > > > > --force-init'. Then restart mds. After mds become active, > run > > > > 'ceph > > > > > > > > > > > > daemon mds.x scrub_path / force repair' > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > I followed the steps I got the mds started but now a lot of > > > files > > > > are > > > > > > in > > > > > > > > > > lost+found 24283 and I have these errors in the mds log > > > > > > > > > > > > > > > > > > > > > 'cephfs-data-scan init --force-init' does not move files into > > > > > > > > > > lost+found. have you ever run other 'cephfs-data-scan foo' > > > command > > > > or > > > > > > > > > > 'cephfs-journal-tool foo' command? > > > > > > > > > > > > > > > > > > I have had a similar problem with the cluster before where I > went > > > > through > > > > > > the > > > > > > > > cycle of: > > > > > > > > > https://docs.ceph.com/docs/mimic/cephfs/disaster-recovery/ -> > Using > > > an > > > > > > > > alternate metadata pool for recovery > > > > > > > > > > > > > > > > > > I did run the cephfs-journal-tool journal reset command, mostly > > > > because > > > > > > > > cephfs is not that utilized so I thought it was safe to do as > after > > > the > > > > > > upgrade > > > > > > > > the cluster has not been used much, so data lose would be minimal > - > > > > > > apparently > > > > > > > > I was wrong. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > 2019-11-06 20:20:18.215 7f0bd9090700 1 mds.0.32011 cluster > > > > recovered. > > > > > > > > > > > 2019-11-06 20:20:19.019 7f0bd2dfa700 0 > > > > mds.0.cache.dir(0x100013acfcb) > > > > > > > > > > _fetched missing object for [dir 0x100013acfcb > > > > > > > > /nextcloud/custom_apps/carnet/ > > > > > > > > > > [2,head] auth v=0 cv=0/0 ap=1+0+0 state=1073741888|fetching > f() > > > n() > > > > > > > > > > hs=0+0,ss=0+0 | waiter=1 authpin=1 0x55d4dc4f5100] > > > > > > > > > > > 2019-11-06 20:20:19.019 7f0bd2dfa700 -1 > log_channel(cluster) log > > > > > > [ERR] : > > > > > > > > dir > > > > > > > > > > 0x100013acfcb object missing on disk; some files may be lost > > > > > > > > > > (/nextcloud/custom_apps/carnet) > > > > > > > > > > > 2019-11-06 20:20:19.275 7f0bd2dfa700 0 > > > > mds.0.cache.dir(0x100013a3156) > > > > > > > > > > _fetched missing object for [dir 0x100013a3156 > > > > > > /nextcloud/custom_apps/mail/ > > > > > > > > > > [2,head] auth v=0 cv=0/0 ap=1+0+0 state=1073741888|fetching > f() > > > n() > > > > > > > > > > hs=0+0,ss=0+0 | waiter=1 authpin=1 0x55d4dcc40000] > > > > > > > > > > > 2019-11-06 20:20:19.275 7f0bd2dfa700 -1 > log_channel(cluster) log > > > > > > [ERR] : > > > > > > > > dir > > > > > > > > > > 0x100013a3156 object missing on disk; some files may be lost > > > > > > > > > > (/nextcloud/custom_apps/mail) > > > > > > > > > > > 2019-11-06 20:20:19.371 7f0bd2dfa700 0 > > > > mds.0.cache.dir(0x100013abb3c) > > > > > > > > > > _fetched missing object for [dir 0x100013abb3c > > > > > > > > > > /nextcloud/custom_apps/passwords/ [2,head] auth v=0 cv=0/0 > > > ap=1+0+0 > > > > > > > > > > state=1073741888|fetching f() n() hs=0+0,ss=0+0 | waiter=1 > > > authpin=1 > > > > > > > > > > 0x55d4dcc40700] > > > > > > > > > > > 2019-11-06 20:20:19.371 7f0bd2dfa700 -1 > log_channel(cluster) log > > > > > > [ERR] : > > > > > > > > dir > > > > > > > > > > 0x100013abb3c object missing on disk; some files may be lost > > > > > > > > > > (/nextcloud/custom_apps/passwords) > > > > > > > > > > > 2019-11-06 20:20:19.383 7f0bd2dfa700 0 > > > > mds.0.cache.dir(0x100013a9b9b) > > > > > > > > > > _fetched missing object for [dir 0x100013a9b9b > > > > > > > > > > /nextcloud/custom_apps/phonetrack/ [2,head] auth v=0 cv=0/0 > > > ap=1+0+0 > > > > > > > > > > state=1073741888|fetching f() n() hs=0+0,ss=0+0 | waiter=1 > > > authpin=1 > > > > > > > > > > 0x55d4dcc40e00] > > > > > > > > > > > 2019-11-06 20:20:19.383 7f0bd2dfa700 -1 > log_channel(cluster) log > > > > > > [ERR] : > > > > > > > > dir > > > > > > > > > > 0x100013a9b9b object missing on disk; some files may be lost > > > > > > > > > > (/nextcloud/custom_apps/phonetrack) > > > > > > > > > > > 2019-11-06 20:20:19.431 7f0bd2dfa700 0 > > > > mds.0.cache.dir(0x100013a2659) > > > > > > > > > > _fetched missing object for [dir 0x100013a2659 > > > > > > > > > > /nextcloud/custom_apps/richdocuments/ [2,head] auth v=0 cv=0/0 > > > > ap=1+0+0 > > > > > > > > > > state=1073741888|fetching f() n() hs=0+0,ss=0+0 | waiter=1 > > > authpin=1 > > > > > > > > > > 0x55d4dcc41500] > > > > > > > > > > > 2019-11-06 20:20:19.431 7f0bd2dfa700 -1 > log_channel(cluster) log > > > > > > [ERR] : > > > > > > > > dir > > > > > > > > > > 0x100013a2659 object missing on disk; some files may be lost > > > > > > > > > > (/nextcloud/custom_apps/richdocuments) > > > > > > > > > > > 2019-11-06 20:20:22.360 7f0bd9090700 1 mds.k8s-node-01 > > > Updating > > > > MDS > > > > > > map > > > > > > > > to > > > > > > > > > > version 32015 from mon.1 > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > - Karsten > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Thanks for any hints > > > > > > > > > > > > > > > - Karsten > > > > > > > > > > > > > > > _______________________________________________ > > > > > > > > > > > > > > > ceph-users mailing list -- ceph-users(a)ceph.io > > > > > > > > > > > > > > > To unsubscribe send an email to > ceph-users-leave(a)ceph.io > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > _______________________________________________ > > > > > > > > > > > ceph-users mailing list -- ceph-users(a)ceph.io > > > > > > > > > > > To unsubscribe send an email to ceph-users-leave(a)ceph.io > > > > > > > > > > > > > > > > > > > > > > > > > > > > > _______________________________________________ > > > > > > > > > ceph-users mailing list -- ceph-users(a)ceph.io > > > > > > > > > To unsubscribe send an email to ceph-users-leave(a)ceph.io > > > > > > > > > > > > > > > > > > > > > > > _______________________________________________ > > > > > > > ceph-users mailing list -- ceph-users(a)ceph.io > > > > > > > To unsubscribe send an email to ceph-users-leave(a)ceph.io > > > > > > > > > > > > > > > > > > > > > > > _______________________________________________ > > > ceph-users mailing list -- ceph-users(a)ceph.io > > > To unsubscribe send an email to ceph-users-leave(a)ceph.io > > > > >

4 years, 4 months

1
0
0 0

Recommended procedure to modify Crush Map

by Thomas Schneider

Hi, my current Crush Map includes multiple roots representing different disks. There are multiple crush rules, one for each pool. And a pool represents a disk type: hdd, ssd, nvme Question: What is the recommended procedure to modify the Crush Map in order to define only one root and "transfer" all other roots to additional disk types? THX

4 years, 4 months

1
0
0 0

Error in add new ISCSI gateway

by Gesiel Galvão Bernardes

Hi everyone, I have a problem trying to add an ISCSI gateway. The following error is generated when trying to add the new gateway: iscsi-target...-igw/gateways> create ceph-iscsi3 192.168.201.3 Adding gateway, sync'ing 3 disk(s) and 2 client(s) Failed : /etc/ceph/iscsi-gateway.cfg on ceph-iscsi3 does not match the local version. Correct and retry request But the file iscsi-gateway.cfg is exactly the same in all gateways. Selinux is disabled, permissios are OK too. I using ceph 13.2.6. Can anyone help-me? Regards Gesiel

4 years, 4 months

3
10
0 0

Building a petabyte cluster from scratch

by Fabien Sirjean

Hi Ceph users ! After years of using Ceph, we plan to build soon a new cluster bigger than what we've done in the past. As the project is still in reflection, I'd like to have your thoughts on our planned design : any feedback is welcome :) ## Requirements * ~1 PB usable space for file storage, extensible in the future * The files are mostly "hot" data, no cold storage * Purpose : storage for big files being essentially used on windows workstations (10G access) * Performance is better :) ## Global design * 8+3 Erasure Coded pool * ZFS on RBD, exposed via samba shares (cluster with failover) ## Hardware * 1 rack (multi-site would be better, of course...) * OSD nodes : 14 x supermicro servers * 24 usable bays in 2U rackspace * 16 x 10 TB nearline SAS HDD (8 bays for future needs) * 2 x Xeon Silver 4212 (12C/24T) * 128 GB RAM * 4 x 40G QSFP+ * Networking : 2 x Cisco N3K 3132Q or 3164Q * 2 x 40G per server for ceph network (LACP/VPC for HA) * 2 x 40G per server for public network (LACP/VPC for HA) * QSFP+ DAC cables ## Sizing If we've done the maths well, we expect to have : * 2.24 PB of raw storage, extensible to 3.36 PB by adding HDD * 1.63 PB expected usable space with 8+3 EC, extensible to 2.44 PB * ~1 PB of usable space if we want to keep the OSD use under 66% to allow loosing nodes without problem, extensible to 1.6 PB (same condition) ## Reflections * We're used to run mons and mgrs daemons on a few of our OSD nodes, without any issue so far : is this a bad idea for a big cluster ? * We thought using cache tiering on an SSD pool, but a large part of the PB is used on a daily basis, so we expect the cache to be not so effective and really expensive ? * Could a 2x10G network be enough ? * ZFS on Ceph ? Any thoughts ? * What about CephFS ? We'd like to use RBD diff for backups but it looks impossible to use snapshot diff with Cephfs ? Thanks for reading, and sharing your experiences ! F.

4 years, 4 months

12
16
0 0

Re: Can min_read_recency_for_promote be -1

by Romit Misra

Hi Robert, I am not quite sure if I get your question correct, but what I understand is that you want the inbound writes to land on the cache tier, which presumably would be on a faster media, possibily a ssd. From there you would want it to trickle down to the base tier, which is a EC pool hosted on HDD. Some of the pointers I have :- It is better to have seperate media for base and cache , HDD and SSD respectively. If the intent is never to promote to cache tier on Read, you could set it to a high number such as 3, and at the same time, Make the bloom filter window small.( This basically translates into if the object has been read X number of times in past y seconds) Keep in mind the larger the window, the more the size of the bloom filter, and hence you would see a increase in osd memory usgae. I have patch somewhere lurking which disables the promotes, let me check on the same, if this is for a specific case. If your intent is to have a constant decay rate from the Cache tier to the base tier, here is what you could do.:- 1.Set the Max Objects on the Cache tier TO X 2.Set the Max Size to say Y, this would be normally 60-70 percent of the total cache tier capacity. 3.The flushes would start happening on the first trigger of the above thresholds. 4. You could set the evict age roughly double the time, you expect the data will hit the base tier. 5.Lastly have you tried running cosbench or any related tool, to qualify the IOPS of your base tier with EC enabled, you may. Or require the cache tier at all. 6. There are substantial overheads of a cache tier maintenance, the major being absence of throttles on how the flush happens. 7.A thundering herd of write requests can cause a huge amount of flush to happen to the base tier. 8.IMHO it is suitable and predictable for loads where number of ingress requests can be predicted and there is some kind of rate limiting on the same. Hope this helps Thanks Romit On Tue, 3 Dec 2019, 04:11 , <ceph-users-request(a)ceph.io> wrote: > Send ceph-users mailing list submissions to > ceph-users(a)ceph.io > > To subscribe or unsubscribe via email, send a message with subject or > body 'help' to > ceph-users-request(a)ceph.io > > You can reach the person managing the list at > ceph-users-owner(a)ceph.io > > When replying, please edit your Subject line so it is more specific > than "Re: Contents of ceph-users digest..." > > Today's Topics: > > 1. Re: ceph node crashed with these errors "kernel: ceph: > build_snap_context" (maybe now it is urgent?) > (Ilya Dryomov) > 2. Re: ceph node crashed with these errors "kernel: ceph: > build_snap_context" (maybe now it is urgent?) > (Marc Roos) > 3. Re: ceph node crashed with these errors "kernel: ceph: > build_snap_context" (maybe now it is urgent?) > (Marc Roos) > 4. Re: Possible data corruption with 14.2.3 and 14.2.4 > (Simon Ironside) > 5. Re: ceph node crashed with these errors "kernel: ceph: > build_snap_context" (maybe now it is urgent?) > (Marc Roos) > 6. Can min_read_recency_for_promote be -1 (Robert LeBlanc) > > > ---------------------------------------------------------------------- > > Date: Mon, 2 Dec 2019 14:59:05 +0100 > From: Ilya Dryomov <idryomov(a)gmail.com> > Subject: [ceph-users] Re: ceph node crashed with these errors "kernel: > ceph: build_snap_context" (maybe now it is urgent?) > To: Marc Roos <M.Roos(a)f1-outsourcing.eu> > Cc: ceph-users <ceph-users(a)ceph.io>, jlayton <jlayton(a)kernel.org> > Message-ID: > < > CAOi1vP-uyxeaKvuxUQbe2nsuXH9-f6_QxcggOCv6LrCBzugJOw(a)mail.gmail.com> > Content-Type: text/plain; charset="UTF-8" > > On Mon, Dec 2, 2019 at 1:23 PM Marc Roos <M.Roos(a)f1-outsourcing.eu> wrote: > > > > > > > > I guess this is related? kworker 100% > > > > > > [Mon Dec 2 13:05:27 2019] SysRq : Show backtrace of all active CPUs > > [Mon Dec 2 13:05:27 2019] sending NMI to all CPUs: > > [Mon Dec 2 13:05:27 2019] NMI backtrace for cpu 0 skipped: idling at pc > > 0xffffffffb0581e94 > > [Mon Dec 2 13:05:27 2019] NMI backtrace for cpu 1 skipped: idling at pc > > 0xffffffffb0581e94 > > [Mon Dec 2 13:05:27 2019] NMI backtrace for cpu 2 skipped: idling at pc > > 0xffffffffb0581e94 > > [Mon Dec 2 13:05:27 2019] NMI backtrace for cpu 3 skipped: idling at pc > > 0xffffffffb0581e94 > > [Mon Dec 2 13:05:27 2019] NMI backtrace for cpu 4 > > [Mon Dec 2 13:05:27 2019] CPU: 4 PID: 426200 Comm: kworker/4:2 Not > > tainted 3.10.0-1062.4.3.el7.x86_64 #1 > > [Mon Dec 2 13:05:27 2019] Hardware name: Supermicro > > X9DRi-LN4+/X9DR3-LN4+/X9DRi-LN4+/X9DR3-LN4+, BIOS 3.0b 05/27/2014 > > [Mon Dec 2 13:05:27 2019] Workqueue: ceph-msgr ceph_con_workfn > > [libceph] > > [Mon Dec 2 13:05:27 2019] task: ffffa0c8e1240000 ti: ffffa0ccb6364000 > > task.ti: ffffa0ccb6364000 > > [Mon Dec 2 13:05:27 2019] RIP: 0010:[<ffffffffc08d7db9>] > > [<ffffffffc08d7db9>] cmpu64_rev+0x19/0x20 [ceph] > > [Mon Dec 2 13:05:27 2019] RSP: 0018:ffffa0ccb6367a20 EFLAGS: 00000202 > > [Mon Dec 2 13:05:27 2019] RAX: 0000000000000001 RBX: 0000000000000038 > > RCX: 0000000000000008 > > [Mon Dec 2 13:05:27 2019] RDX: 0000000000025c33 RSI: ffffa0cbbe380050 > > RDI: ffffa0cbbe380030 > > [Mon Dec 2 13:05:27 2019] RBP: ffffa0ccb6367a20 R08: 0000000000000018 > > R09: 00000000000013ed > > [Mon Dec 2 13:05:27 2019] R10: 0000000000000002 R11: ffffe94994f8e000 > > R12: ffffa0cbbe380030 > > [Mon Dec 2 13:05:27 2019] R13: ffffffffc08d7da0 R14: ffffa0cbbe380018 > > R15: ffffa0cbbe380050 > > [Mon Dec 2 13:05:27 2019] FS: 0000000000000000(0000) > > GS:ffffa0d2cfb00000(0000) knlGS:0000000000000000 > > [Mon Dec 2 13:05:27 2019] CS: 0010 DS: 0000 ES: 0000 CR0: > > 0000000080050033 > > [Mon Dec 2 13:05:27 2019] CR2: 000055a7c413fcb9 CR3: 0000001813010000 > > CR4: 00000000000607e0 > > [Mon Dec 2 13:05:27 2019] Call Trace: > > [Mon Dec 2 13:05:27 2019] [<ffffffffb019303f>] sort+0x1af/0x260 > > [Mon Dec 2 13:05:27 2019] [<ffffffffb0192e60>] ? u32_swap+0x10/0x10 > > [Mon Dec 2 13:05:27 2019] [<ffffffffc08d807b>] > > build_snap_context+0x12b/0x290 [ceph] > > [Mon Dec 2 13:05:27 2019] [<ffffffffc08d820c>] > > rebuild_snap_realms+0x2c/0x90 [ceph] > > [Mon Dec 2 13:05:27 2019] [<ffffffffc08d822b>] > > rebuild_snap_realms+0x4b/0x90 [ceph] > > [Mon Dec 2 13:05:27 2019] [<ffffffffc08d91fc>] > > ceph_update_snap_trace+0x3ec/0x530 [ceph] > > [Mon Dec 2 13:05:27 2019] [<ffffffffc08e2239>] > > handle_reply+0x359/0xc60 [ceph] > > [Mon Dec 2 13:05:27 2019] [<ffffffffc08e48ba>] dispatch+0x11a/0xb00 > > [ceph] > > [Mon Dec 2 13:05:27 2019] [<ffffffffb042e56a>] ? > > kernel_recvmsg+0x3a/0x50 > > [Mon Dec 2 13:05:27 2019] [<ffffffffc05fcff4>] try_read+0x544/0x1300 > > [libceph] > > [Mon Dec 2 13:05:27 2019] [<ffffffffafee13ce>] ? > > account_entity_dequeue+0xae/0xd0 > > [Mon Dec 2 13:05:27 2019] [<ffffffffafee4d5c>] ? > > dequeue_entity+0x11c/0x5e0 > > [Mon Dec 2 13:05:27 2019] [<ffffffffb042e417>] ? > > kernel_sendmsg+0x37/0x50 > > [Mon Dec 2 13:05:27 2019] [<ffffffffc05fdfb4>] > > ceph_con_workfn+0xe4/0x1530 [libceph] > > [Mon Dec 2 13:05:27 2019] [<ffffffffb057f568>] ? > > __schedule+0x448/0x9c0 > > [Mon Dec 2 13:05:27 2019] [<ffffffffafebe21f>] > > process_one_work+0x17f/0x440 > > [Mon Dec 2 13:05:27 2019] [<ffffffffafebf336>] > > worker_thread+0x126/0x3c0 > > [Mon Dec 2 13:05:27 2019] [<ffffffffafebf210>] ? > > manage_workers.isra.26+0x2a0/0x2a0 > > [Mon Dec 2 13:05:27 2019] [<ffffffffafec61f1>] kthread+0xd1/0xe0 > > [Mon Dec 2 13:05:27 2019] [<ffffffffafec6120>] ? > > insert_kthread_work+0x40/0x40 > > [Mon Dec 2 13:05:27 2019] [<ffffffffb058cd37>] > > ret_from_fork_nospec_begin+0x21/0x21 > > [Mon Dec 2 13:05:27 2019] [<ffffffffafec6120>] ? > > insert_kthread_work+0x40/0x40 > > [Mon Dec 2 13:05:27 2019] Code: 87 c8 fc ff ff 5d 0f 94 c0 0f b6 c0 c3 > > 0f 1f 44 00 00 66 66 66 66 90 48 8b 16 48 39 17 b8 01 00 00 00 55 48 89 > > e5 72 08 0f 97 c0 <0f> b6 c0 f7 d8 5d c3 66 66 66 66 90 55 f6 05 ed 92 > > 02 00 04 48 > > [Mon Dec 2 13:05:27 2019] NMI backtrace for cpu 5 > > Yes, seems related. I'm not sure how it relates to an upgrade to > nautilus, but as I mentioned in a different message, with thousands of > snapshots you are in a dangerous territory anyway. > > Thanks, > > Ilya > > ------------------------------ > > Date: Mon, 2 Dec 2019 15:06:54 +0100 > From: "Marc Roos" <M.Roos(a)f1-outsourcing.eu> > Subject: [ceph-users] Re: ceph node crashed with these errors "kernel: > ceph: build_snap_context" (maybe now it is urgent?) > To: idryomov <idryomov(a)gmail.com> > Cc: ceph-users <ceph-users(a)ceph.io>, jlayton <jlayton(a)kernel.org> > Message-ID: <"H000007100158998.1575295614.sx.f1-outsourcing.eu*"@MHS> > Content-Type: text/plain; charset="UTF-8" > > > > >> > > >> >ISTR there were some anti-spam measures put in place. Is your > account > >> >waiting for manual approval? If so, David should be able to help. > >> > >> Yes if I remember correctly I get waiting approval when I try to log > in. > >> > >> >> > >> >> > >> >> > >> >> Dec 1 03:14:36 c04 kernel: ceph: build_snap_context 100020c9287 > >> >> ffff911a9a26bd00 fail -12 > >> >> Dec 1 03:14:36 c04 kernel: ceph: build_snap_context 100020c9283 > >> > > >> > > >> >It is failing to allocate memory. "low load" isn't very specific, > >> >can you describe the setup and the workload in more detail? > >> > >> 4 nodes (osd, mon combined), the 4th node has local cephfs mount, > which > >> is rsync'ing some files from vm's. 'low load' I have sort of test > setup, > >> going to production. Mostly the nodes are below a load of 1 (except > when > >> the concurrent rsync starts) > >> > >> >How many snapshots do you have? > >> > >> Don't know how to count them. I have script running on a 2000 dirs. > If > >> one of these dirs is not empty it creates a snapshot. So in theory I > >> could have 2000 x 7 days = 14000 snapshots. > >> (btw the cephfs snapshots are in a different tree than the rsync is > >> using) > > > >Is there a reason you are snapshotting each directory individually > >instead of just snapshotting a common parent? > > Yes because I am not sure the snapshot frequency on all folders is going > to be the same. > > >If you have thousands of snapshots, you may eventually hit a different > >bug: > > > >https://tracker.ceph.com/issues/21420 > >https://docs.ceph.com/docs/master/cephfs/experimental-features/#snapsh > ots > > > >Be aware that each set of 512 snapshots amplify your writes by 4K in > >terms of network consumption. With 14000 snapshots, a 4K write would > >need to transfer ~109K worth of snapshot metadata to carry itself out. > > > > Also when I am not even writing to a tree with snapshots enabled? I am > rsyncing to dir3 > > . > ├── dir1 > │ ├── dira > │ │ └── .snap > │ ├── dirb > │ ├── dirc > │ │ └── .snap > │ └── dird > │ └── .snap > ├── dir2 > └── dir3 > > ------------------------------ > > Date: Mon, 2 Dec 2019 16:29:07 +0100 > From: "Marc Roos" <M.Roos(a)f1-outsourcing.eu> > Subject: [ceph-users] Re: ceph node crashed with these errors "kernel: > ceph: build_snap_context" (maybe now it is urgent?) > To: idryomov <idryomov(a)gmail.com> > Cc: ceph-users <ceph-users(a)ceph.io>, jlayton <jlayton(a)kernel.org> > Message-ID: <"H000007100158aca.1575300547.sx.f1-outsourcing.eu*"@MHS> > Content-Type: text/plain; charset="UTF-8" > > > I can confirm that removing all the snapshots seems to resolve the > problem. > > A - I would propose a redesign of something like that snapshots from > below the mountpoint are only taken into account and not snapshots in > the entire filesystem. That should fix a lot of issues > > B - That reminds me about this mv command, that does not move data > across different pools in the fs. I would like to see this. Because it > is the logical thing to expect. > > > > > > > >> > > >> >ISTR there were some anti-spam measures put in place. Is your > account >> >waiting for manual approval? If so, David should be able > to help. > >> > >> Yes if I remember correctly I get waiting approval when I try to log > in. > >> > >> >> > >> >> > >> >> > >> >> Dec 1 03:14:36 c04 kernel: ceph: build_snap_context 100020c9287 > >> >> ffff911a9a26bd00 fail -12 >> >> Dec 1 03:14:36 c04 kernel: > ceph: build_snap_context 100020c9283 >> > >> > >> >It is failing > to allocate memory. "low load" isn't very specific, >> >can you > describe the setup and the workload in more detail? > >> > >> 4 nodes (osd, mon combined), the 4th node has local cephfs mount, > which >> is rsync'ing some files from vm's. 'low load' I have sort of > test setup, >> going to production. Mostly the nodes are below a load > of 1 (except when >> the concurrent rsync starts) >> >> >How many > snapshots do you have? > >> > >> Don't know how to count them. I have script running on a 2000 dirs. > If > >> one of these dirs is not empty it creates a snapshot. So in theory I > >> could have 2000 x 7 days = 14000 snapshots. > >> (btw the cephfs snapshots are in a different tree than the rsync is > >> using) > >Is there a reason you are snapshotting each directory > individually >instead of just snapshotting a common parent? > > Yes because I am not sure the snapshot frequency on all folders is going > to be the same. > > >If you have thousands of snapshots, you may eventually hit a different > >bug: > > > >https://tracker.ceph.com/issues/21420 > >https://docs.ceph.com/docs/master/cephfs/experimental-features/#snapsh > ots > > > >Be aware that each set of 512 snapshots amplify your writes by 4K in > >terms of network consumption. With 14000 snapshots, a 4K write would > >need to transfer ~109K worth of snapshot metadata to carry itself out. > > > > Also when I am not even writing to a tree with snapshots enabled? I am > rsyncing to dir3 > > . > ├── dir1 > │ ├── dira > │ │ └── .snap > │ ├── dirb > │ ├── dirc > │ │ └── .snap > │ └── dird > │ └── .snap > ├── dir2 > └── dir3 > _______________________________________________ > ceph-users mailing list -- ceph-users(a)ceph.io To unsubscribe send an > email to ceph-users-leave(a)ceph.io > > > ------------------------------ > > Date: Mon, 2 Dec 2019 15:54:54 +0000 > From: Simon Ironside <sironside(a)caffetine.org> > Subject: [ceph-users] Re: Possible data corruption with 14.2.3 and > 14.2.4 > To: ceph-users(a)ceph.io > Message-ID: <21d057e9-0088-4847-6d40-19cf2c848395(a)caffetine.org> > Content-Type: text/plain; charset=utf-8; format=flowed > > Any word on 14.2.5? Nervously waiting here . . . > > Thanks, > Simon. > > On 18/11/2019 11:29, Simon Ironside wrote: > > > I will sit tight and wait for 14.2.5. > > > > Thanks again, > > Simon. > > ------------------------------ > > Date: Mon, 2 Dec 2019 19:32:03 +0100 > From: "Marc Roos" <M.Roos(a)f1-outsourcing.eu> > Subject: [ceph-users] Re: ceph node crashed with these errors "kernel: > ceph: build_snap_context" (maybe now it is urgent?) > To: ceph-users <ceph-users(a)ceph.io>, lhenriques <lhenriques(a)suse.com> > Message-ID: <"H000007100158b41.1575311519.sx.f1-outsourcing.eu*"@MHS> > Content-Type: text/plain; charset="ISO-8859-1" > > > Yes Luis, good guess!! ;) > > > > -----Original Message----- > Cc: ceph-users > Subject: Re: [ceph-users] ceph node crashed with these errors "kernel: > ceph: build_snap_context" (maybe now it is urgent?) > > On Mon, Dec 02, 2019 at 10:27:21AM +0100, Marc Roos wrote: > > > > I have been asking before[1]. Since Nautilus upgrade I am having > > these, with a total node failure as a result(?). Was not expecting > > this in my 'low load' setup. Maybe now someone can help resolving > > this? I am also waiting quite some time to get access at > > https://tracker.ceph.com/issues. > > Just a wild guess: do you have a lot of snapshots (> ~400)? If so, > that's probably the problem. See [1] and [2]. > > [1] > https://docs.ceph.com/docs/master/cephfs/experimental-features/#snapshots > [2] https://tracker.ceph.com/issues/21420 > > Cheers, > -- > Luís > > > > > > > Dec 1 03:14:36 c04 kernel: ceph: build_snap_context 100020c9287 > > ffff911a9a26bd00 fail -12 Dec 1 03:14:36 c04 kernel: ceph: > > build_snap_context 100020c9283 ffff911d34e69d00 fail -12 Dec 1 > > 03:14:36 c04 kernel: ceph: build_snap_context 100020c9276 > > ffff911d34e69c00 fail -12 Dec 1 03:14:36 c04 kernel: ceph: > > build_snap_context 100020c926c ffff912068b92c00 fail -12 Dec 1 > > 03:14:36 c04 kernel: ceph: build_snap_context 100020c9268 > > ffff912068b93000 fail -12 Dec 1 03:14:36 c04 kernel: ceph: > > build_snap_context 100020c926d ffff912068b92900 fail -12 Dec 1 > > 03:14:36 c04 kernel: ceph: build_snap_context 100020c928a > > ffff912118e5be00 fail -12 Dec 1 03:14:36 c04 kernel: ceph: > > build_snap_context 100020c9272 ffff9119950d9500 fail -12 Dec 1 > > 03:14:36 c04 kernel: ceph: build_snap_context 100020c9269 > > ffff911940f3d000 fail -12 Dec 1 03:14:36 c04 kernel: ceph: > > build_snap_context 100020c9270 ffff911748427c00 fail -12 Dec 1 > > 03:14:36 c04 kernel: ceph: build_snap_context 100020c926b > > ffff91169b000600 fail -12 Dec 1 03:14:36 c04 kernel: ceph: > > build_snap_context 100020c9281 ffff91169b000500 fail -12 Dec 1 > > 03:14:36 c04 kernel: ceph: build_snap_context 100020c9288 > > ffff9115844d2500 fail -12 Dec 1 03:14:36 c04 kernel: ceph: > > build_snap_context 100020c927d ffff9115844d2e00 fail -12 Dec 1 > > 03:14:36 c04 kernel: ceph: build_snap_context 100020c9280 > > ffff91186401b000 fail -12 Dec 1 03:14:36 c04 kernel: ceph: > > build_snap_context 100020c9267 ffff9121535ecc00 fail -12 Dec 1 > > 03:14:36 c04 kernel: ceph: build_snap_context 100020c927c > > ffff9121cecb1e00 fail -12 Dec 1 03:14:36 c04 kernel: ceph: > > build_snap_context 100020c9271 ffff9121cecb0400 fail -12 Dec 1 > > 03:14:36 c04 kernel: ceph: build_snap_context 100020c9279 > > ffff911d26646300 fail -12 Dec 1 03:14:36 c04 kernel: ceph: > > build_snap_context 100020c927f ffff911d26646900 fail -12 Dec 1 > > 03:14:36 c04 kernel: ceph: build_snap_context 100020c9275 > > ffff9121cecb1700 fail -12 Dec 1 03:14:36 c04 kernel: ceph: > > build_snap_context 100020c9259 ffff91170c9f6600 fail -12 Dec 1 > > 03:14:36 c04 kernel: ceph: build_snap_context 100020c9257 > > ffff9118ef2a8000 fail -12 Dec 1 03:14:36 c04 kernel: ceph: > > build_snap_context 100020c924e ffff911a1e091800 fail -12 Dec 1 > > 03:14:36 c04 kernel: ceph: build_snap_context 100020c9262 > > ffff911a1e090c00 fail -12 Dec 1 03:14:36 c04 kernel: ceph: > > build_snap_context 100020c9266 ffff9115e3859500 fail -12 Dec 1 > > 03:14:36 c04 kernel: ceph: build_snap_context 100020c924f > > ffff9118aefd1300 fail -12 Dec 1 03:14:36 c04 kernel: ceph: > > build_snap_context 100020c925f ffff91170c9f6100 fail -12 Dec 1 > > 03:14:36 c04 kernel: ceph: build_snap_context 100020c9252 > > ffff9115e3859800 fail -12 Dec 1 03:14:36 c04 kernel: ceph: > > build_snap_context 100020c9256 ffff912045dc5300 fail -12 Dec 1 > > 03:14:36 c04 kernel: ceph: build_snap_context 100020c9254 > > ffff91170c9f6900 fail -12 Dec 1 03:14:36 c04 kernel: ceph: > > build_snap_context 100020c9261 ffff91170c9f7100 fail -12 Dec 1 > > 03:14:36 c04 kernel: ceph: build_snap_context 100020d4ec4 > > ffff9118aefd0000 fail -12 > > > > [1] > > https://www.mail-archive.com/ceph-users@ceph.io/msg01088.html > > https://www.mail-archive.com/ceph-users@ceph.io/msg00969.html > > _______________________________________________ > > ceph-users mailing list -- ceph-users(a)ceph.io To unsubscribe send an > > email to ceph-users-leave(a)ceph.io > > > ------------------------------ > > Date: Mon, 2 Dec 2019 14:39:26 -0800 > From: Robert LeBlanc <robert(a)leblancnet.us> > Subject: [ceph-users] Can min_read_recency_for_promote be -1 > To: ceph-users <ceph-users(a)ceph.io> > Message-ID: > < > CAANLjFoecdW7oBh78L3dNO83C-DpDmqXw-kKtT+ShNKXjsqKJg(a)mail.gmail.com> > Content-Type: multipart/alternative; > boundary="00000000000024d4be0598c041cd" > > --00000000000024d4be0598c041cd > Content-Type: text/plain; charset="UTF-8" > > I'd like to configure a cache tier to act as a write buffer, so that if > writes come in, it promotes objects, but reads never promote an object. We > have a lot of cold data so we would like to tier down to an EC pool > (CephFS) after a period of about 30 days to save space. The storage tier > and the 'cache' tier would be on the same spindles, so the only performance > improvement would be from the faster writes with replication. So we don't > want to really move data between tiers. > > The idea would be to not promote on read since EC read performance is good > enough and have writes go to the cache tier where the data may be 'hot' for > a week or so, then get cold. > > It seems that we would only need one hit_set and if -1 can't be set for > min_read_recency_for_promote, I could probably use 2 which would never hit > because there is only one set, but that may error too. The follow up is how > big a set should be as it only really tells if an object "may" be in cache > and does not determine when things are flushed, so it really only matters > how out-of-date we are okay with the bloom filter being out of date, right? > So we could have it be a day long if we are okay with that stale rate? Is > there any advantage to having a longer period for a bloom filter? Now, I'm > starting to wonder if I even need a bloom filter for this use case, can I > get tiering to work without it and only use > cache_min_flush_age/cach_min_evict_age since I don't care about promoting > when there are X hits in Y time? > > Thanks > ---------------- > Robert LeBlanc > PGP Fingerprint 79A2 9CA4 6CC4 45DD A904 C70E E654 3BB2 FA62 B9F1 > > --00000000000024d4be0598c041cd > Content-Type: text/html; charset="UTF-8" > Content-Transfer-Encoding: quoted-printable > > <div dir=3D"ltr">I'd like to configure a cache tier to act as a write > b= > uffer, so that if writes come in, it promotes objects, but reads never > prom= > ote an object. We have a lot of cold data so we would like to tier down to > = > an EC pool (CephFS) after a period of about 30 days to save space. The > stor= > age tier and the 'cache' tier would be on the same spindles, so > the= > only performance improvement would be from the faster writes with > replicat= > ion. So we don't want to really move data between > tiers.<div><br></div>= > <div>The idea would be to not promote on read since EC read performance is > = > good enough and have writes go to the cache tier where the data may be > &#39= > ;hot' for a week or so, then get cold.</div><div><br></div><div>It > seem= > s that we would only need one hit_set and if -1 can't be set for > min_re= > ad_recency_for_promote, I could probably use 2 which would never hit > becaus= > e there is only one set, but that may error too. The follow up is how big > a= > set should be as it only really tells if an object "may" be in > c= > ache and does not determine when things are flushed, so it really only > matt= > ers how out-of-date we are okay with the bloom filter being out of date, > ri= > ght? So we could have it be a day long if we are okay with that stale > rate?= > Is there any advantage to having a longer period for a bloom filter? Now, > = > I'm starting to wonder if I even need a bloom filter for this use > case,= > can I get tiering to work without it and only use > cache_min_flush_age/cach= > _min_evict_age since I don't care about promoting when there are X > hits= > in Y time?</div><div><br></div><div>Thanks<br clear=3D"all"><div><div dir= > =3D"ltr" class=3D"gmail_signature" > data-smartmail=3D"gmail_signature">-----= > -----------<br>Robert LeBlanc<br>PGP Fingerprint 79A2 9CA4 6CC4 45DD A904 = > =C2=A0C70E E654 3BB2 FA62 B9F1</div></div></div></div> > > --00000000000024d4be0598c041cd-- > > ------------------------------ > > Subject: Digest Footer > > _______________________________________________ > ceph-users mailing list -- ceph-users(a)ceph.io > To unsubscribe send an email to ceph-users-leave(a)ceph.io > %(web_page_url)slistinfo%(cgiext)s/%(_internal_name)s > > > ------------------------------ > > End of ceph-users Digest, Vol 83, Issue 5 > ***************************************** > -- *-----------------------------------------------------------------------------------------* *This email and any files transmitted with it are confidential and intended solely for the use of the individual or entity to whom they are addressed. If you have received this email in error, please notify the system manager. This message contains confidential information and is intended only for the individual named. If you are not the named addressee, you should not disseminate, distribute or copy this email. Please notify the sender immediately by email if you have received this email by mistake and delete this email from your system. If you are not the intended recipient, you are notified that disclosing, copying, distributing or taking any action in reliance on the contents of this information is strictly prohibited.***** **** *Any views or opinions presented in this email are solely those of the author and do not necessarily represent those of the organization. Any information on shares, debentures or similar instruments, recommended product pricing, valuations and the like are for information purposes only. It is not meant to be an instruction or recommendation, as the case may be, to buy or to sell securities, products, services nor an offer to buy or sell securities, products or services unless specifically stated to be so on behalf of the Flipkart group. Employees of the Flipkart group of companies are expressly required not to make defamatory statements and not to infringe or authorise any infringement of copyright or any other legal right by email communications. Any such communication is contrary to organizational policy and outside the scope of the employment of the individual concerned. The organization will not accept any liability in respect of such communication, and the employee responsible will be personally liable for any damages or other liability arising.***** **** *Our organization accepts no liability for the content of this email, or for the consequences of any actions taken on the basis of the information *provided,* unless that information is subsequently confirmed in writing. If you are not the intended recipient, you are notified that disclosing, copying, distributing or taking any action in reliance on the contents of this information is strictly prohibited.* _-----------------------------------------------------------------------------------------_

4 years, 4 months

2
1
0 0

Behavior of EC pool when a host goes offline

by majia xiao

Hi all, We have a Ceph（version 12.2.4）cluster that adopts EC pools, and it consists of 10 hosts for OSDs. The corresponding commands to create the EC pool are listed as follows: ceph osd erasure-code-profile set profile_jerasure_4_3_reed_sol_van \ plugin=jerasure \ k=4 \ m=3 \ technique=reed_sol_van \ packetsize=2048 \ crush-device-class=hdd \ crush-failure-domain=host ceph osd pool create pool_jerasure_4_3_reed_sol_van 2048 2048 erasure profile_jerasure_4_3_reed_sol_van Since that the EC pool's crush-failure-domain is configured to be "host", we just disable the network interfaces of some hosts (using "ifdown" command) to verify the functionality of the EC pool. And here are the phenomena we have observed: First of all, the IO rate (of "rados bench", which we used for benchmark) drops immediately to 0 when one host goes offline. Secondly, it takes a lot of time (around 100 seconds) for Ceph to detect corresponding OSDs on that host are down. Finally, once the Ceph has detected all offline OSDs, the EC pool seems to act normally and it is ready for IO operations again. So, here are my questions: 1. Is this normal that the IO rate drops to 0 immediately even though there is only one host goes offline? 2. How to make Ceph reduce the time needed to detect failed OSDs? Thanks for any help. Best regards, Majia Xiao

4 years, 4 months

2
1
0 0

how to speed up mount a ceph fs when a node unusual down in ceph cluster

by hfx＠portsip.cn

I created one ceph cluster. node-1: mon, mgr, osd.0, mds node-2: mon, mgr, osd.1, mds node-3: mon, mgr, osd.2, mds When the cluster is working normally, using command "mount -t ceph <node-*-ip:6789>:/ /mnt -o name=admin,secret=<admin client secret>" to mount is ok. But when a node unusual down(like poweroff), and using same command to mount, it will be hanging long time(maybe more then 1 minutes). I trying to configure "mds reconnect timeout = 0" in ceph.conf, mount time has been shortened. My Question: What are the configurations that affect the ceph file system mount in this scenario? BRs. hfx(a)portsip.cn

4 years, 4 months

1
0
0 0

Possible data corruption with 14.2.3 and 14.2.4

by Sage Weil

Hi everyone, We've identified a data corruption bug[1], first introduced[2] (by yours truly) in 14.2.3 and affecting both 14.2.3 and 14.2.4. The corruption appears as a rocksdb checksum error or assertion that looks like os/bluestore/fastbmap_allocator_impl.h: 750: FAILED ceph_assert(available >= allocated) or in some cases a rocksdb checksum error. It only affects BlueStore OSDs that have a separate 'db' or 'wal' device. We have a fix[3] that is working its way through testing, and will expedite the next Nautilus point release (14.2.5) once it is ready. If you are running 14.2.2 or 14.2.1 and use BlueStore OSDs with separate 'db' volumes, you should consider waiting to upgrade until 14.2.5 is released. A big thank you to Igor Fedotov and several *extremely* helpful users who managed to reproduce and track down this problem! sage [1] https://tracker.ceph.com/issues/42223 [2] https://github.com/ceph/ceph/commit/096033b9d931312c0688c2eea7e14626bfde0ad… [3] https://github.com/ceph/ceph/pull/31621

4 years, 4 months

5
8
0 0

Can min_read_recency_for_promote be -1

by Robert LeBlanc

I'd like to configure a cache tier to act as a write buffer, so that if writes come in, it promotes objects, but reads never promote an object. We have a lot of cold data so we would like to tier down to an EC pool (CephFS) after a period of about 30 days to save space. The storage tier and the 'cache' tier would be on the same spindles, so the only performance improvement would be from the faster writes with replication. So we don't want to really move data between tiers. The idea would be to not promote on read since EC read performance is good enough and have writes go to the cache tier where the data may be 'hot' for a week or so, then get cold. It seems that we would only need one hit_set and if -1 can't be set for min_read_recency_for_promote, I could probably use 2 which would never hit because there is only one set, but that may error too. The follow up is how big a set should be as it only really tells if an object "may" be in cache and does not determine when things are flushed, so it really only matters how out-of-date we are okay with the bloom filter being out of date, right? So we could have it be a day long if we are okay with that stale rate? Is there any advantage to having a longer period for a bloom filter? Now, I'm starting to wonder if I even need a bloom filter for this use case, can I get tiering to work without it and only use cache_min_flush_age/cach_min_evict_age since I don't care about promoting when there are X hits in Y time? Thanks ---------------- Robert LeBlanc PGP Fingerprint 79A2 9CA4 6CC4 45DD A904 C70E E654 3BB2 FA62 B9F1

4 years, 4 months

2
1
0 0

ceph node crashed with these errors "kernel: ceph: build_snap_context" (maybe now it is urgent?)

by Marc Roos

I have been asking before[1]. Since Nautilus upgrade I am having these, with a total node failure as a result(?). Was not expecting this in my 'low load' setup. Maybe now someone can help resolving this? I am also waiting quite some time to get access at https://tracker.ceph.com/issues. Dec 1 03:14:36 c04 kernel: ceph: build_snap_context 100020c9287 ffff911a9a26bd00 fail -12 Dec 1 03:14:36 c04 kernel: ceph: build_snap_context 100020c9283 ffff911d34e69d00 fail -12 Dec 1 03:14:36 c04 kernel: ceph: build_snap_context 100020c9276 ffff911d34e69c00 fail -12 Dec 1 03:14:36 c04 kernel: ceph: build_snap_context 100020c926c ffff912068b92c00 fail -12 Dec 1 03:14:36 c04 kernel: ceph: build_snap_context 100020c9268 ffff912068b93000 fail -12 Dec 1 03:14:36 c04 kernel: ceph: build_snap_context 100020c926d ffff912068b92900 fail -12 Dec 1 03:14:36 c04 kernel: ceph: build_snap_context 100020c928a ffff912118e5be00 fail -12 Dec 1 03:14:36 c04 kernel: ceph: build_snap_context 100020c9272 ffff9119950d9500 fail -12 Dec 1 03:14:36 c04 kernel: ceph: build_snap_context 100020c9269 ffff911940f3d000 fail -12 Dec 1 03:14:36 c04 kernel: ceph: build_snap_context 100020c9270 ffff911748427c00 fail -12 Dec 1 03:14:36 c04 kernel: ceph: build_snap_context 100020c926b ffff91169b000600 fail -12 Dec 1 03:14:36 c04 kernel: ceph: build_snap_context 100020c9281 ffff91169b000500 fail -12 Dec 1 03:14:36 c04 kernel: ceph: build_snap_context 100020c9288 ffff9115844d2500 fail -12 Dec 1 03:14:36 c04 kernel: ceph: build_snap_context 100020c927d ffff9115844d2e00 fail -12 Dec 1 03:14:36 c04 kernel: ceph: build_snap_context 100020c9280 ffff91186401b000 fail -12 Dec 1 03:14:36 c04 kernel: ceph: build_snap_context 100020c9267 ffff9121535ecc00 fail -12 Dec 1 03:14:36 c04 kernel: ceph: build_snap_context 100020c927c ffff9121cecb1e00 fail -12 Dec 1 03:14:36 c04 kernel: ceph: build_snap_context 100020c9271 ffff9121cecb0400 fail -12 Dec 1 03:14:36 c04 kernel: ceph: build_snap_context 100020c9279 ffff911d26646300 fail -12 Dec 1 03:14:36 c04 kernel: ceph: build_snap_context 100020c927f ffff911d26646900 fail -12 Dec 1 03:14:36 c04 kernel: ceph: build_snap_context 100020c9275 ffff9121cecb1700 fail -12 Dec 1 03:14:36 c04 kernel: ceph: build_snap_context 100020c9259 ffff91170c9f6600 fail -12 Dec 1 03:14:36 c04 kernel: ceph: build_snap_context 100020c9257 ffff9118ef2a8000 fail -12 Dec 1 03:14:36 c04 kernel: ceph: build_snap_context 100020c924e ffff911a1e091800 fail -12 Dec 1 03:14:36 c04 kernel: ceph: build_snap_context 100020c9262 ffff911a1e090c00 fail -12 Dec 1 03:14:36 c04 kernel: ceph: build_snap_context 100020c9266 ffff9115e3859500 fail -12 Dec 1 03:14:36 c04 kernel: ceph: build_snap_context 100020c924f ffff9118aefd1300 fail -12 Dec 1 03:14:36 c04 kernel: ceph: build_snap_context 100020c925f ffff91170c9f6100 fail -12 Dec 1 03:14:36 c04 kernel: ceph: build_snap_context 100020c9252 ffff9115e3859800 fail -12 Dec 1 03:14:36 c04 kernel: ceph: build_snap_context 100020c9256 ffff912045dc5300 fail -12 Dec 1 03:14:36 c04 kernel: ceph: build_snap_context 100020c9254 ffff91170c9f6900 fail -12 Dec 1 03:14:36 c04 kernel: ceph: build_snap_context 100020c9261 ffff91170c9f7100 fail -12 Dec 1 03:14:36 c04 kernel: ceph: build_snap_context 100020d4ec4 ffff9118aefd0000 fail -12 [1] https://www.mail-archive.com/ceph-users@ceph.io/msg01088.html https://www.mail-archive.com/ceph-users@ceph.io/msg00969.html

4 years, 4 months

2
2
0 0

2024

2023

2022

2021

2020

2019

ceph-users December 2019