OK, where to start. I have been debugging intensively for the last two
days, but I can't seem to wrap my head around the performance issues we
are seeing in one of our two hyperconverged (Ceph) Proxmox clusters.
Let me introduce the two clusters and some of the debugging results.
*1. Cluster for internal purposes (performs as expected)*
3 x Supermicro servers with identical specs:
CPU: 1 x i7-7700K CPU @ 4.20GHz (1 Socket) 4 cores / 8 threads
RAM: 64 GB RAM
OSDs: 4 per node. 1 per SSD (Intel S4610) (12 OSDs in all)
1 x 10GbE RJ45 NIC. MTU 9000, no bonding
_In total: 3 servers with 12 OSDs_
Network:
1 x Unifi Switch 16 XG
*2. Cluster for customers' VPSs (performs much worse than internal)*
3 x Dell R630 with the following specs:
CPU: 2 x E5-2697 v3 @ 2.60GHz (2 Sockets) 28 cores / 56 threads
RAM 256GB
OSDs: 10 per node. 1 per SSD (Intel S4610)
1 x 10GbE SFP+ NIC with 2 ports bonded via LACP (bond-xmit-hash-policy
layer3+4). MTU 9000
2 x Supermicro X11SRM-VF with the following specs:
CPU: 1 x Xeon W-2145 @ 3.70GHz (1 Socket) 8 cores / 16 threads
RAM: 256 GB
OSDs: 8 per node. 1 per SSD (Intel S4610)
1 x 10GbE SFP+ NIC with 2 ports bonded via LACP (bond-xmit-hash-policy
layer3+4). MTU 9000
1 x Dell R630 with the following specs:
CPU: 2 x E5-2696 v4 @ 2.20GHz (2 Sockets) 44 cores / 88 threads
RAM: 256 GB
OSDs: 8 per node. 1 per SSD (Intel S4610)
1 x 10GbE SFP+ NIC with 2 ports bonded via LACP (bond-xmit-hash-policy
layer3+4). MTU 9000
_In total: 6 servers with 54 OSDs_
Network:
2 x Dell N4032F 10GbE SFP+ Switch connected with MLAG. Each node is
connected to each switch.
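For completeness, the bond on each node is configured along these lines
in /etc/network/interfaces (a sketch; the slave interface names
ens1f0/ens1f1 are placeholders, not the exact names on our hosts):

auto bond0
iface bond0 inet manual
        bond-slaves ens1f0 ens1f1
        bond-miimon 100
        bond-mode 802.3ad
        bond-xmit-hash-policy layer3+4
        mtu 9000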
To get a fair comparison, I ran the following fio test on one host in
each cluster, against an RBD block device that I created:
*Cluster 1:*
fio --randrepeat=1 --ioengine=libaio --sync=1 --direct=1 \
    --gtod_reduce=1 --name=test --filename=test --bs=4k --iodepth=64 \
    --size=4G --readwrite=randwrite
test: (g=0): rw=randwrite, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T)
4096B-4096B, ioengine=libaio, iodepth=64
fio-3.12
Starting 1 process
test: Laying out IO file (1 file / 4096MiB)
Jobs: 1 (f=1): [w(1)][100.0%][w=25.0MiB/s][w=6409 IOPS][eta 00m:00s]
test: (groupid=0, jobs=1): err= 0: pid=869177: Wed Nov 13 11:15:59 2019
write: IOPS=4158, BW=16.2MiB/s (17.0MB/s)(4096MiB/252126msec); 0 zone resets
bw ( KiB/s): min= 2075, max=32968, per=99.96%, avg=16627.60,
stdev=9635.42, samples=504
iops : min= 518, max= 8242, avg=4156.88, stdev=2408.86, samples=504
cpu : usr=0.53%, sys=3.81%, ctx=109599, majf=0, minf=7
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
issued rwts: total=0,1048576,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=64
Run status group 0 (all jobs):
WRITE: bw=16.2MiB/s (17.0MB/s), 16.2MiB/s-16.2MiB/s (17.0MB/s-17.0MB/s),
io=4096MiB (4295MB), run=252126-252126msec
Disk stats (read/write):
rbd0: ios=46/1221898, merge=0/1870438, ticks=25/4654920,
in_queue=1980016, util=84.70%
*Cluster 2:*
fio --randrepeat=1 --ioengine=libaio --sync=1 --direct=1 \
    --gtod_reduce=1 --name=test --filename=test --bs=4k --iodepth=64 \
    --size=4G --readwrite=randwrite
test: (g=0): rw=randwrite, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T)
4096B-4096B, ioengine=libaio, iodepth=64
fio-3.12
Starting 1 process
Jobs: 1 (f=1): [w(1)][99.9%][w=7024KiB/s][w=1756 IOPS][eta 00m:01s]
test: (groupid=0, jobs=1): err= 0: pid=794096: Wed Nov 13 11:25:56 2019
write: IOPS=1353, BW=5415KiB/s (5545kB/s)(4096MiB/774601msec); 0 zone resets
bw ( KiB/s): min= 40, max=30600, per=100.00%, avg=5420.24,
stdev=3710.17, samples=1547
iops : min= 10, max= 7650, avg=1355.06, stdev=927.54, samples=1547
cpu : usr=0.16%, sys=1.19%, ctx=100028, majf=0, minf=8
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
issued rwts: total=0,1048576,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=64
Run status group 0 (all jobs):
WRITE: bw=5415KiB/s (5545kB/s), 5415KiB/s-5415KiB/s (5545kB/s-5545kB/s),
io=4096MiB (4295MB), run=774601-774601msec
Disk stats (read/write):
rbd0: ios=0/1222639, merge=0/1784089, ticks=0/12124812,
in_queue=9514280, util=45.14%
I also ran identical rados bench tests on both clusters:
Cluster 1: https://i.imgur.com/AdARCA6.png
Cluster 2: https://i.imgur.com/Di7mYQh.png
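For reference, a typical write/read run of that benchmark looks like
this (the pool name and parameters here are illustrative, not
necessarily the exact ones behind the screenshots):

rados bench -p <pool> 60 write -b 4M -t 16 --no-cleanup
rados bench -p <pool> 60 seq -t 16
rados -p <pool> cleanup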
I have fio-tested all the disks individually and I have tested the
network. I still can't find the reason why performance on the second
cluster is so poor compared to the first.
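Those checks were along these lines (illustrative commands; the device
and host names are placeholders):

# Per-SSD sync write test (destructive -- only on a disk without data)
fio --filename=/dev/sdX --direct=1 --sync=1 --rw=write --bs=4k \
    --numjobs=1 --iodepth=1 --runtime=60 --time_based --name=ssd-test

# Raw throughput between two nodes
iperf3 -s                 # on the first node
iperf3 -c <node1> -P 4    # on the second node, 4 parallel streams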
--
Dennis Højgaard
Powerhosting Support
*t:* +45 7222 4457 |*e:* dh(a)powerhosting.dk |*w:* https://powerhosting.dk
Hi Folks
I'm writing a backup/sync process for our Ceph cluster. The process
takes a snapshot of the system that's being backed up and rsyncs from
that snapshot to another Ceph cluster.
I was hoping to use the snapshot xattrs to verify a successful backup by
comparing ceph.dir.rbytes and ceph.dir.rentries between the snapshot and
backup target.
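Roughly, the verification I had in mind looks like this (a sketch; the
mount points and snapshot name are made up):

SRC=/mnt/cephfs/data/.snap/backupsnap   # placeholder paths
DST=/mnt/backup/data
src_bytes=$(getfattr -n ceph.dir.rbytes --only-values "$SRC")
dst_bytes=$(getfattr -n ceph.dir.rbytes --only-values "$DST")
src_entries=$(getfattr -n ceph.dir.rentries --only-values "$SRC")
dst_entries=$(getfattr -n ceph.dir.rentries --only-values "$DST")
if [ "$src_bytes" = "$dst_bytes" ] && [ "$src_entries" = "$dst_entries" ]
then
    echo "backup verified"
else
    echo "mismatch"
fi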
However, while files added to the source directory after the snapshot is
taken don't appear in the snapshot, ceph.dir.rbytes and ceph.dir.rctime
are updated in the xattrs of both.
$ fallocate -l 1K test1
$ getfattr -n ceph.dir.rbytes .
# file: .
ceph.dir.rbytes="1024"
$ mkdir .snap/testsnap
$ fallocate -l 2K test2
$ getfattr -n ceph.dir.rbytes . .snap/testsnap
# file: .
ceph.dir.rbytes="3072"
# file: .snap/testsnap
ceph.dir.rbytes="3072"
$ getfattr -n ceph.dir.rctime . .snap/testsnap
# file: .
ceph.dir.rctime="1573558448.09736131125"
# file: .snap/testsnap
ceph.dir.rctime="1573558448.09736131125"
$ tree -a --si . .snap
.
|-- [1.0k] test1
`-- [2.0k] test2
.snap
`-- [3.1k] testsnap
`-- [1.0k] test1
1 directory, 3 files
Have I misunderstood? Is this expected behaviour?
Cheers
Toby
--
Toby Darling, Scientific Computing (2N249)
MRC Laboratory of Molecular Biology
Francis Crick Avenue
Cambridge Biomedical Campus
Cambridge CB2 0QH
Phone 01223 267070