Hello Marco,
On Thu, Aug 29, 2019 at 12:55:56PM +0200, Marco Gaiarin wrote:
>
> I've just finished a double upgrade on my ceph (PVE-based) from hammer
> to jewel and from jewel to luminous.
>
> All went well, apart from one thing: the OSDs do not restart
> automatically, because of permission troubles on the journal:
>
> Aug 28 14:41:55 capitanmarvel ceph-osd[6645]: starting osd.2 at - osd_data /var/lib/ceph/osd/ceph-2 /var/lib/ceph/osd/ceph-2/journal
> Aug 28 14:41:55 capitanmarvel ceph-osd[6645]: 2019-08-28 14:41:55.449886 7fa505a43e00 -1 filestore(/var/lib/ceph/osd/ceph-2) mount(1822): failed to open journal /var/lib/ceph/osd/ceph-2/journal: (13) Permission denied
> Aug 28 14:41:55 capitanmarvel ceph-osd[6645]: 2019-08-28 14:41:55.453524 7fa505a43e00 -1 osd.2 0 OSD:init: unable to mount object store
> Aug 28 14:41:55 capitanmarvel ceph-osd[6645]: 2019-08-28 14:41:55.453535 7fa505a43e00 -1 #033[0;31m ** ERROR: osd init failed: (13) Permission denied#033[0m
>
>
> A quick rewind: when I set up the cluster I used some 'old' servers,
> with a couple of SSD disks serving as OS and journal devices.
> Because the servers were old, I was forced to partition the boot disk
> in DOS (MBR), not GPT, mode.
>
> While creating the OSDs, I received some warnings:
>
> WARNING:ceph-disk:Journal /dev/sdaX was not prepared with ceph-disk. Symlinking directly.
>
>
> Looking at the cluster now, it seems to me that the OSD init scripts
> try to identify the journal based on GPT partition labels/info, and so
> they clearly fail.
>
>
> Note that if I run, on the servers that hold OSDs:
>
> for l in $(readlink -f /var/lib/ceph/osd/ceph-*/journal); do chown ceph: $l; done
>
> the OSDs start flawlessly.
>
>
> Is there something I can do? Thanks.
Did you go through our upgrade guide(s)? See the link [0] below for the
permission changes; they are needed when upgrading from Hammer to
Jewel.
On the wiki you can also find the upgrade guides for PVE 5.x -> 6.x and
Luminous -> Nautilus.
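For reference, the permission change that guide describes is essentially
handing /var/lib/ceph to the ceph user introduced in Jewel. A sketch (the
journal loop is the one from the message quoted above; the rest is an
assumption based on the guide, run as root with the OSDs stopped):

```shell
# Stop the OSDs, then give the Jewel-era ceph user ownership of the
# data directories.
systemctl stop ceph-osd.target
chown -R ceph:ceph /var/lib/ceph

# Journals that ceph-disk did not prepare are only symlinked, so their
# targets need the same ownership change. Without GPT type codes, udev
# will not re-own the devices at boot, so this may need repeating (or a
# custom udev rule).
for l in $(readlink -f /var/lib/ceph/osd/ceph-*/journal); do
    chown ceph:ceph "$l"
done

systemctl start ceph-osd.target
```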
--
Cheers,
Alwin
[0] https://pve.proxmox.com/wiki/Ceph_Hammer_to_Jewel#Set_permission
Hi,
I'm running a small nautilus cluster (14.2.2) which was recently
upgraded from mimic (13.2.6). After the upgrade I enabled the
pg_autoscaler which resulted in most of the pools having their pg count
changed. All the remapping has completed but the cluster is still
reporting a HEALTH_WARN. I have adjusted the target ratios such that
sum < 1.0 but this didn't help. What else can I look at?
Thanks,
James
# ceph -s
  cluster:
    id:     ...
    health: HEALTH_WARN
            1 subtrees have overcommitted pool target_size_bytes
            1 subtrees have overcommitted pool target_size_ratio

  services:
    mon: 3 daemons, quorum ceph-00,ceph-01,ceph-02 (age 3d)
    mgr: ceph-01(active, since 6d), standbys: ceph-02, ceph-00
    osd: 32 osds: 32 up (since 2d), 32 in (since 2d)
    rgw: 1 daemon active (rgw-00)

  data:
    pools:   14 pools, 1512 pgs
    objects: 4.17M objects, 16 TiB
    usage:   47 TiB used, 69 TiB / 116 TiB avail
    pgs:     1510 active+clean
             2    active+clean+scrubbing+deep
# ceph osd pool autoscale-status (this might wrap horribly...):
POOL                    SIZE    TARGET SIZE  RATE  RAW CAPACITY  RATIO   TARGET RATIO  BIAS  PG_NUM  NEW PG_NUM  AUTOSCALE
loc.rgw.buckets.index   0                    3.0   116.1T        0.0000                1.0   4                   on
vms1                    5318G                3.0   116.1T        0.1341  0.2000        1.0   256                 on
vms2                    3419G                3.0   116.1T        0.0862  0.0200        1.0   64                  on
.rgw.root               3648k                3.0   116.1T        0.0000                1.0   4                   on
default.rgw.meta        384.0k               3.0   116.1T        0.0000                1.0   4                   on
lov.rgw.log             384.0k               3.0   116.1T        0.0000                1.0   4                   on
vms3                    35799G               3.0   116.1T        0.9028  0.6000        1.0   1024                on
default.rgw.control     0                    3.0   116.1T        0.0000                1.0   4                   on
loc.rgw.meta            768.5k               3.0   116.1T        0.0000                1.0   4                   on
vms4                    2306G                3.0   116.1T        0.0582  0.1000        1.0   128                 on
loc.rgw.buckets.non-ec  200.4k               3.0   116.1T        0.0000                1.0   4                   on
loc.rgw.buckets.data    56390M               3.0   116.1T        0.0014                1.0   4                   on
loc.rgw.control         0                    3.0   116.1T        0.0000                1.0   4                   on
default.rgw.log         0                    3.0   116.1T        0.0000                1.0   4                   on
Hi,
I have a cluster running on Ubuntu Bionic, with stock Ubuntu Ceph packages. When upgrading, I always try to follow the procedure as documented here: https://docs.ceph.com/docs/master/install/upgrading-ceph/
However, the Ubuntu packages restart all daemons upon upgrade, per node. So if I upgrade the first node, it will restart mon, osds, rgw, and mds'es on that node, even though the rest of the cluster is running the old version.
I tried upgrading a single package, to see how that goes, but due to dependencies in dpkg, all other packages are upgraded as well.
How should I proceed?
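A commonly used Debian mechanism for this (a sketch, not part of this
message) is a temporary policy-rc.d that forbids service actions while
the packages upgrade, followed by restarting one daemon type at a time:

```shell
# invoke-rc.d treats exit status 101 as "action forbidden", so
# maintainer scripts cannot restart daemons during the upgrade.
cat > /usr/sbin/policy-rc.d <<'EOF'
#!/bin/sh
exit 101
EOF
chmod +x /usr/sbin/policy-rc.d

apt-get install --only-upgrade ceph ceph-base ceph-mon ceph-osd

# Remove the override, then restart in the recommended order: mons
# first, then OSDs, then the remaining daemons.
rm /usr/sbin/policy-rc.d
systemctl restart ceph-mon.target
```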
Thanks,
--
Mark Schouten <mark(a)tuxis.nl>
Tuxis, Ede, https://www.tuxis.nl
T: +31 318 200208
Hello,
I've been facing some issues with a single-node Ceph cluster (Mimic). I
know an environment like this shouldn't be in production, but the server
ended up dealing with operational workloads for the last 2 years.
Some users detected issues in CephFS: some files were not accessible,
and listing the content of the affected folders hung the node.
I also noticed a heavy memory load on the server; main memory was
largely consumed by cache, and a considerable amount of swap was in use.
The command "ceph health detail" reported some inactive PGs. Those PGs
didn't exist.
After rebooting the node, an fsck was run on the 3 affected OSDs:
ceph-bluestore-tool fsck --deep yes --path /var/lib/ceph/osd/ceph-1/
Unfortunately, all of them crashed with a core dump and now they don't
start anymore.
The logs report messages like:
2019-08-28 03:00:12.999 7f21d787c240  4 rocksdb: [/build/ceph-13.2.1/src/rocksdb/db/version_set.cc:3088] Recovering from manifest file: MANIFEST-004059
2019-08-28 03:00:12.999 7f21d787c240  4 rocksdb: [/build/ceph-13.2.1/src/rocksdb/db/db_impl.cc:252] Shutdown: canceling all background work
2019-08-28 03:00:12.999 7f21d787c240  4 rocksdb: [/build/ceph-13.2.1/src/rocksdb/db/db_impl.cc:397] Shutdown complete
2019-08-28 03:00:12.999 7f21d787c240 -1 rocksdb: NotFound:
2019-08-28 03:00:12.999 7f21d787c240 -1 bluestore(/var/lib/ceph/osd/ceph-0) _open_db erroring opening db:
2019-08-28 03:00:12.999 7f21d787c240  1 bluefs umount
2019-08-28 03:00:12.999 7f21d787c240  1 stupidalloc 0x0x5650c5255800 shutdown
2019-08-28 03:00:12.999 7f21d787c240  1 bdev(0x5650c5604a80 /var/lib/ceph/osd/ceph-0/block) close
2019-08-28 03:00:13.247 7f21d787c240  1 bdev(0x5650c5604700 /var/lib/ceph/osd/ceph-0/block) close
2019-08-28 03:00:13.479 7f21d787c240 -1 osd.0 0 OSD:init: unable to mount object store
2019-08-28 03:00:13.479 7f21d787c240 -1 ** ERROR: osd init failed: (5) Input/output error
I'm not sure if the fsck has introduced additional damage.
After that, I tried to mark unfound as lost with the following commands:
ceph pg 4.1e mark_unfound_lost revert
ceph pg 9.1d mark_unfound_lost revert
ceph pg 13.3 mark_unfound_lost revert
ceph pg 13.e mark_unfound_lost revert
Currently, since there are 3 OSDs down, there are:
316 unclean PGs
76 inactive PGs
root@ceph-s01:~# ceph osd tree
ID CLASS WEIGHT   TYPE NAME             STATUS REWEIGHT PRI-AFF
-2        0.43599 root ssd
-4        0.43599     disktype ssd_disk
12   ssd  0.43599         osd.12            up  1.00000 1.00000
-1       60.03792 root default
-5       60.03792     disktype hdd_disk
 0   hdd        0         osd.0           down  1.00000 1.00000
 1   hdd  5.45799         osd.1           down        0 1.00000
 2   hdd  5.45799         osd.2             up  1.00000 1.00000
 3   hdd  5.45799         osd.3             up  1.00000 1.00000
 4   hdd  5.45799         osd.4             up  1.00000 1.00000
 5   hdd  5.45799         osd.5             up  1.00000 1.00000
 6   hdd  5.45799         osd.6             up  1.00000 1.00000
 7   hdd  5.45799         osd.7           down        0 1.00000
 8   hdd  5.45799         osd.8             up  1.00000 1.00000
 9   hdd  5.45799         osd.9             up  1.00000 1.00000
10   hdd  5.45799         osd.10            up  1.00000 1.00000
11   hdd  5.45799         osd.11            up  1.00000 1.00000
Running the following command, a MANIFEST file appeared in the folder
db/lost; I guess the repair moved it there.
# ceph-bluestore-tool bluefs-export --path /var/lib/ceph/osd/ceph-7 --out-dir osd7/
...
db/LOCK
db/MANIFEST-000001
db/OPTIONS-018543
db/OPTIONS-018581
db/lost/
db/lost/MANIFEST-018578
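Not part of what was tried above, just a hedged thought: with a
block-level backup of the OSD device safely taken first, the built-in
repair subcommand would be the next standard step, though given that
fsck itself crashed, the outcome here is uncertain:

```shell
# Back up the raw device before this; repair may well crash the same
# way fsck did on this damage (sketch only).
ceph-bluestore-tool repair --path /var/lib/ceph/osd/ceph-7
```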
Any ideas? Suggestions?
Thank you.
Regards,
Jordi
I have an OSD that is throwing sense errors; it's at its end of life and needs to be replaced.
The server is in the datacentre and I won't get there for a few weeks, so I've stopped the service (systemctl stop ceph-osd@208) and let the cluster rebalance; all is well.
My thinking is that if for some reason the host that OSD208 resides within was to reboot, that OSD would start and become part of the cluster again.
So I'd like to prevent this OSD from ever starting again, given that I can't physically remove it from the server yet.
I was thinking that deleting its key from the auth list might work, so: ceph osd purge 208
Then when the service tries to start, it'll fail with an auth error.
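The steps above might be sketched as follows (illustrative; purge
permanently removes the OSD's auth key and CRUSH entry, so it is only
appropriate if the OSD will never return):

```shell
# Keep the dying OSD from rejoining after a host reboot (sketch).
systemctl disable ceph-osd@208               # don't start on boot
ceph osd purge 208 --yes-i-really-mean-it    # drop auth key + CRUSH entry
```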
Any other suggestions?
Cheers,
Cory
Sorry to post this to the list, but does this lists.ceph.io password
reset work for anyone?
https://lists.ceph.io/accounts/password/reset/
For my accounts which are getting mail I have "The e-mail address is
not assigned to any user account".
Best Regards, Dan
Hi Dominic,
I just created a feature ticket in the Ceph tracker to keep track of
this issue.
Here's the ticket: https://tracker.ceph.com/issues/41537
Cheers,
Ricardo Dias
On 17/07/19 20:06, DHilsbos(a)performair.com wrote:
> All;
>
> I'm trying to firm up my understanding of how Ceph works, and ease of management tools and capabilities.
>
> I stumbled upon this: http://docs.ceph.com/docs/nautilus/rados/configuration/mon-lookup-dns/
>
> It got me wondering; how do you convey protocol version 2 capabilities in this format?
>
> The examples all list port 6789, which is the port for protocol version 1. Would I add SRV records for port 3300? How does the client distinguish v1 from v2 in this case?
>
> Thank you,
>
> Dominic L. Hilsbos, MBA
> Director - Information Technology
> Perform Air International, Inc.
> DHilsbos(a)PerformAir.com
> www.PerformAir.com
>
>
> _______________________________________________
> ceph-users mailing list
> ceph-users(a)lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
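For illustration only (hostnames are hypothetical): the v1-style records
from the linked mon-lookup-dns docs look roughly like the zone fragment
below; advertising the v2 port (3300) through the same mechanism is what
the ticket tracks.

```
; _ceph-mon._tcp SRV records, v1 messenger port (illustrative hosts)
_ceph-mon._tcp.example.com. 60 IN SRV 10 20 6789 mon1.example.com.
_ceph-mon._tcp.example.com. 60 IN SRV 10 30 6789 mon2.example.com.
_ceph-mon._tcp.example.com. 60 IN SRV 10 50 6789 mon3.example.com.
```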
--
Ricardo Dias
Senior Software Engineer - Storage Team
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton,
HRB 21284
(AG Nürnberg)
Hi,
We have all SSD disks as ceph's backend storage.
Considering the cost factor, can we set up the cluster to have only two
replicas for objects?
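For reference, the replica count is a per-pool setting. A sketch
("mypool" is a placeholder; note that size 2 is widely considered risky,
because a single failure during recovery can lose data):

```shell
# Two replicas per object on an existing pool.
ceph osd pool set mypool size 2
# min_size 1 keeps I/O going with one copy left, at real risk of loss.
ceph osd pool set mypool min_size 1
```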
thanks & regards
Wesley
It seems that with Linux kernel 4.16.10, krbd clients are seen as Jewel
rather than Luminous. Can someone tell me which kernel version will be
seen as Luminous, as I want to enable the upmap balancer?
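For context (not from this message): kernel clients advertise feature
bits rather than a release name, and the kernel client has understood
upmap since roughly 4.13, so the usual approach is to check what the
connected clients report and then force the requirement. A sketch:

```shell
# Inspect the feature bits the connected clients actually advertise.
ceph features
# Only if every client is known to understand upmap:
ceph osd set-require-min-compat-client luminous --yes-i-really-mean-it
```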