October 2019 - ceph-users

by Konstantin Shalygin

On 10/27/19 6:01 AM, Frank R wrote: > I hate to be a pain but I have one more question. > > After I run > > radosgw-admin reshard stale-instances rm > > if I run > > radosgw-admin reshard stale-instances list > > some new entries appear for a bucket that no longer exists. Is there a > way to cancel the operation on the old bucket? > `radosgw-admin reshard stale-instances rm` should fix all your issues, if this not you should debug this. k

4 years, 5 months

2
3
0 0

Dirlisting hangs with cephfs

by Kári Bertilsson

This seems to happen mostly when listing folders containing 10k+ folders. The dirlisting hangs indefinitely or until i restart the active MDS and then the hanging "ls" command will finish running. Every time restarting the active MDS fixes the problem for a while.

4 years, 5 months

4
14
0 0

Correct Migration Workflow Replicated -> Erasure Code

by Mac Wynkoop

Hi Everyone, So, I'm in the process of trying to migrate our rgw.buckets.data pool from a replicated rule pool to an erasure coded pool. I've gotten the EC pool set up, good EC profile and crush ruleset, pool created successfully, but when I go to "rados cppool xxx.rgw.buckets.data xxx.rgw.buckets.data.new", I get this error after it transfers 4GB of data: error copying object: (2) No such file or directory error copying pool xxx.rgw.buckets.data => xxx.rgw.buckets.data.new: (2) No such file or directory Is "rados cppool" still the blessed way to do the migration, or has something better/not deprecated been developed that I can use? Thanks, Mac

4 years, 5 months

3
3
0 0

Lower mem radosgw config?

by Dan van der Ster

Hi all, Does anyone have a good config for lower memory radosgw machines? We have 16GB VMs and our radosgw's go OOM when we have lots of parallel clients (e.g. I see around 500 objecter_ops via the rgw asok). Maybe lowering rgw_thread_pool_size from 512 would help? (This is running latest luminous). Thanks, Dan

4 years, 5 months

2
1
0 0

ceph: build_snap_context 100020859dd ffff911cca33b800 fail -12

by Marc Roos

I am getting these since Nautilus upgrade [Wed Oct 30 01:32:09 2019] ceph: build_snap_context 100020859dd ffff911cca33b800 fail -12 [Wed Oct 30 01:32:09 2019] ceph: build_snap_context 100020859d2 ffff911d3eef5a00 fail -12 [Wed Oct 30 01:32:09 2019] ceph: build_snap_context 100020859d9 ffff9117b63e3900 fail -12 [Wed Oct 30 01:32:09 2019] ceph: build_snap_context 100020859de ffff9121cf0c3c00 fail -12 [Wed Oct 30 01:32:09 2019] ceph: build_snap_context 100020859da ffff911993e84700 fail -12 [Wed Oct 30 01:32:09 2019] ceph: build_snap_context 100020859e6 ffff911993e85500 fail -12 [Wed Oct 30 01:32:09 2019] ceph: build_snap_context 100020859e8 ffff911993e85e00 fail -12 [Wed Oct 30 01:32:09 2019] ceph: build_snap_context 100020859db ffff911993e85800 fail -12 [Wed Oct 30 01:32:09 2019] ceph: build_snap_context 100020859df ffff9119697ca700 fail -12 [Wed Oct 30 01:32:09 2019] ceph: build_snap_context 100020859e7 ffff91203fbdb600 fail -12 [Wed Oct 30 01:32:09 2019] ceph: build_snap_context 100020859e5 ffff9121cf0c2b00 fail -12 [Wed Oct 30 01:32:09 2019] ceph: build_snap_context 100020859dd ffff911cca33b800 fail -12 [Wed Oct 30 01:32:09 2019] ceph: build_snap_context 100020859d2 ffff911d3eef5a00 fail -12 [Wed Oct 30 01:32:09 2019] ceph: build_snap_context 100020859d9 ffff9117b63e3900 fail -12 [Wed Oct 30 01:32:09 2019] ceph: build_snap_context 100020859de ffff9121cf0c3c00 fail -12 [Wed Oct 30 01:32:09 2019] ceph: build_snap_context 100020859da ffff911993e84700 fail -12 [Wed Oct 30 01:32:09 2019] ceph: build_snap_context 100020859e6 ffff911993e85500 fail -12 [Wed Oct 30 01:32:09 2019] ceph: build_snap_context 100020859e8 ffff911993e85e00 fail -12 [Wed Oct 30 01:32:09 2019] ceph: build_snap_context 100020859db ffff911993e85800 fail -12 [Wed Oct 30 01:32:09 2019] ceph: build_snap_context 100020859df ffff9119697ca700 fail -12 [Wed Oct 30 01:32:09 2019] ceph: build_snap_context 100020859e7 ffff91203fbdb600 fail -12 [Wed Oct 30 01:32:09 2019] ceph: build_snap_context 100020859e5 ffff9121cf0c2b00 fail -12 [Wed Oct 30 01:32:09 2019] ceph: build_snap_context 100020859dd ffff911cca33b800 fail -12 [Wed Oct 30 01:32:09 2019] ceph: build_snap_context 100020859d2 ffff911d3eef5a00 fail -12 [Wed Oct 30 01:32:09 2019] ceph: build_snap_context 100020859de ffff9121cf0c3c00 fail -12 [Wed Oct 30 01:32:09 2019] ceph: build_snap_context 100020859e8 ffff911993e85e00 fail -12 [Wed Oct 30 01:32:09 2019] ceph: build_snap_context 100020859e7 ffff91203fbdb600 fail -12 [Wed Oct 30 01:32:09 2019] ceph: build_snap_context 100020859e5 ffff9121cf0c2b00 fail -12 [Wed Oct 30 01:32:09 2019] ceph: build_snap_context 100020859dd ffff911cca33b800 fail -12 [Wed Oct 30 01:32:09 2019] ceph: build_snap_context 100020859d2 ffff911d3eef5a00 fail -12 [Wed Oct 30 01:32:09 2019] ceph: build_snap_context 100020859de ffff9121cf0c3c00 fail -12 [Wed Oct 30 01:32:09 2019] ceph: build_snap_context 100020859e8 ffff911993e85e00 fail -12 [Wed Oct 30 01:32:09 2019] ceph: build_snap_context 100020859e7 ffff91203fbdb600 fail -12 [Wed Oct 30 01:32:09 2019] ceph: build_snap_context 100020859e5 ffff9121cf0c2b00 fail -12 [Wed Oct 30 01:32:09 2019] ceph: build_snap_context 100020859dd ffff911cca33b800 fail -12 [Wed Oct 30 01:32:09 2019] ceph: build_snap_context 100020859d2 ffff911d3eef5a00 fail -12

4 years, 5 months

1
0
0 0

V/v Log IP clinet in rados gateway log

by tuan dung

Hi all, I want to log IP client to rados gateway log to check information about loadbalancing and other things. I using LB before rados gateway nodes, what need to be done for configuration in rados gateway? thank you very much. Br, ---------------------------------------------- Dương Tuấn Dũng Email: dungdt.aicgroup(a)gmail.com ĐT: 0986153686

4 years, 5 months

2
1
0 0

pg stays in unknown states for a long time

by 潘东元

I have 104 pg stays in unknown states for a long time [root@node-1 /]# ceph -s cluster: id: 653c6c1a-607e-4a62-bb92-dfe2f0d7afb6 health: HEALTH_ERR 1 osds down Reduced data availability: 104 pgs inactive 24 slow requests are blocked > 32 sec. Implicated osds 0,1,2,8,9,10 14 stuck requests are blocked > 4096 sec. Implicated osds 5,6 services: mon: 3 daemons, quorum node-1,node-2,node-3 mgr: node-1(active), standbys: node-2, node-3 osd: 12 osds: 11 up, 12 in flags nodeep-scrub rbd-mirror: 1 daemon active data: pools: 7 pools, 360 pgs objects: 1.80k objects, 3.91GiB usage: 17.6GiB used, 7.96TiB / 7.98TiB avail pgs: 28.889% pgs unknown 256 active+clean 104 unknown io: client: 1.56MiB/s wr, 0op/s rd, 83op/s wr [root@node-1 /]# ceph health detail HEALTH_ERR Reduced data availability: 104 pgs inactive; 30 slow requests are blocked > 32 sec. Implicated osds 0,1,2,4,8,9,10; 14 stuck requests are blocked > 4096 sec. Implicated osds 5,6 PG_AVAILABILITY Reduced data availability: 104 pgs inactive pg 1.0 is stuck inactive for 2857.069686, current state unknown, last acting [] pg 1.1 is stuck inactive for 2857.069686, current state unknown, last acting [] pg 1.2 is stuck inactive for 2857.069686, current state unknown, last acting [] pg 1.3 is stuck inactive for 2857.069686, current state unknown, last acting [] pg 1.4 is stuck inactive for 2857.069686, current state unknown, last acting [] pg 1.5 is stuck inactive for 2857.069686, current state unknown, last acting [] pg 1.6 is stuck inactive for 2857.069686, current state unknown, last acting [] pg 1.7 is stuck inactive for 2857.069686, current state unknown, last acting [] pg 2.0 is stuck inactive for 2857.069686, current state unknown, last acting [] ...... [root@node-1 /]# ceph pg dump_stuck inactive ok PG_STAT STATE UP UP_PRIMARY ACTING ACTING_PRIMARY 3.1d unknown [] -1 [] -1 3.1c unknown [] -1 [] -1 3.1b unknown [] -1 [] -1 3.1a unknown [] -1 [] -1 3.19 unknown [] -1 [] -1 ...... my pool size = 3 [root@node-1 /]# ceph pg 3.1d query Error ENOENT: i don't have pgid 3.1d

4 years, 5 months

1
0
0 0

Several ceph osd commands hang

by Thomas Schneider

Hi, in my unhealthy cluster I cannot run several ceph osd command because they hang, e.g. ceph osd df ceph osd pg dump Also, ceph balancer status hangs. How can I fix this issue? THX

4 years, 5 months

3
5
0 0

After delete 8.5M Objects in a bucket still 500K left

by EDH - Manuel Rios Fernandez

Hi Ceph's! We started deteling a bucket several days ago. Total size 47TB / 8.5M objects. Now we see the cli bucket rm stucked and by console drop this messages. [root@ceph-rgw03 ~]# 2019-10-28 13:55:43.880 7f0dd92c9700 0 abort_bucket_multiparts WARNING : aborted 1000 incomplete multipart uploads 2019-10-28 13:56:24.021 7f0dd92c9700 0 abort_bucket_multiparts WARNING : aborted 2000 incomplete multipart uploads 2019-10-28 13:57:04.726 7f0dd92c9700 0 abort_bucket_multiparts WARNING : aborted 3000 incomplete multipart uploads 2019-10-28 13:57:45.424 7f0dd92c9700 0 abort_bucket_multiparts WARNING : aborted 4000 incomplete multipart uploads 2019-10-28 13:58:25.905 7f0dd92c9700 0 abort_bucket_multiparts WARNING : aborted 5000 incomplete multipart uploads 2019-10-28 13:59:06.898 7f0dd92c9700 0 abort_bucket_multiparts WARNING : aborted 6000 incomplete multipart uploads 2019-10-28 13:59:47.829 7f0dd92c9700 0 abort_bucket_multiparts WARNING : aborted 7000 incomplete multipart uploads 2019-10-28 14:00:42.102 7f0dd92c9700 0 abort_bucket_multiparts WARNING : aborted 8000 incomplete multipart uploads 2019-10-28 14:01:23.829 7f0dd92c9700 0 abort_bucket_multiparts WARNING : aborted 9000 incomplete multipart uploads 2019-10-28 14:02:06.028 7f0dd92c9700 0 abort_bucket_multiparts WARNING : aborted 10000 incomplete multipart uploads 2019-10-28 14:02:48.648 7f0dd92c9700 0 abort_bucket_multiparts WARNING : aborted 11000 incomplete multipart uploads 2019-10-28 14:03:29.807 7f0dd92c9700 0 abort_bucket_multiparts WARNING : aborted 12000 incomplete multipart uploads 2019-10-28 14:04:11.180 7f0dd92c9700 0 abort_bucket_multiparts WARNING : aborted 13000 incomplete multipart uploads 2019-10-28 14:04:52.396 7f0dd92c9700 0 abort_bucket_multiparts WARNING : aborted 14000 incomplete multipart uploads 2019-10-28 14:05:33.050 7f0dd92c9700 0 abort_bucket_multiparts WARNING : aborted 15000 incomplete multipart uploads 2019-10-28 14:06:13.652 7f0dd92c9700 0 abort_bucket_multiparts WARNING : aborted 16000 incomplete multipart uploads 2019-10-28 14:06:54.806 7f0dd92c9700 0 abort_bucket_multiparts WARNING : aborted 17000 incomplete multipart uploads 2019-10-28 14:07:35.867 7f0dd92c9700 0 abort_bucket_multiparts WARNING : aborted 18000 incomplete multipart uploads 2019-10-28 14:08:16.886 7f0dd92c9700 0 abort_bucket_multiparts WARNING : aborted 19000 incomplete multipart uploads 2019-10-28 14:08:57.711 7f0dd92c9700 0 abort_bucket_multiparts WARNING : aborted 20000 incomplete multipart uploads 2019-10-28 14:09:38.032 7f0dd92c9700 0 abort_bucket_multiparts WARNING : aborted 21000 incomplete multipart uploads 2019-10-28 14:10:18.377 7f0dd92c9700 0 abort_bucket_multiparts WARNING : aborted 22000 incomplete multipart uploads 2019-10-28 14:10:58.833 7f0dd92c9700 0 abort_bucket_multiparts WARNING : aborted 23000 incomplete multipart uploads 2019-10-28 14:11:39.078 7f0dd92c9700 0 abort_bucket_multiparts WARNING : aborted 24000 incomplete multipart uploads 2019-10-28 14:12:24.731 7f0dd92c9700 0 abort_bucket_multiparts WARNING : aborted 25000 incomplete multipart uploads 2019-10-28 14:13:12.176 7f0dd92c9700 0 abort_bucket_multiparts WARNING : aborted 26000 incomplete multipart uploads Bucket stats show 500K objects left. Looks like bucket rm is trying to abort all incompleted mutipart. But in bucket stats this operation is not reflected removing objects from stats. May be wait to get up 500K or it's a bug? Regards Manuel

4 years, 5 months

2
2
0 0

Compression on existing RGW buckets

by Bryan Stillwell

I'm wondering if it's possible to enable compression on existing RGW buckets? The cluster is running Luminous 12.2.12 with FileStore as the backend (no BlueStore compression then). We have a cluster that recently started to rapidly fill up with compressible content (qcow2 images) and I would like to enable compression for new uploads to slow the growth. The documentation seems to imply that changing zone placement rules can only be done at creation time. Is there something I'm missing that would allow me to enable compression on a per-bucket or even a per-user basis after a cluster has been used for quite a while? Thanks, Bryan

4 years, 5 months

3
3
0 0

2024

2023

2022

2021

2020

2019

ceph-users October 2019