Hello ceph-{users,devel},
most of you might not know me, but I've been around a while.
I've been on these lists since probably 2012, when my company, Filoo,
started operating one of the early Ceph clusters for VM images. Much has
changed since then, and today we shut down our biggest Ceph cluster.
This was not for technical reasons, but purely for operational and
commercial reasons. I still think Ceph is a superior concept and I'm
sure it will continue to evolve and mature.
The interactions on and off the Ceph lists, especially with Sage and many
of the original Inktank team, have helped us tremendously. Thanks a lot
for that!
I wish you all the best for the future of Ceph and will unsubscribe now.
See you around!
Best regards,
Chris
Hi All,
I ran into an issue while deploying radosgw on CentOS 7. The monitor and data
nodes deploy successfully, but radosgw fails with the error below. I set:
ceph config set mon auth_cluster_required cephx
ceph config set mon auth_service_required cephx
ceph config set mon auth_client_required cephx
Error:
$ /usr/bin/radosgw -f --cluster ceph --name client.radosgw.masifgw01
--setuser ceph --setgroup ceph --debug_rgw 20
2020-04-19 09:05:05.421 7fe2eeea9700 -1 monclient(hunting):
handle_auth_bad_method server allowed_methods [1] but i only support [2,1]
failed to fetch mon config (--no-mon-config to skip)
mon dump:
$ ceph mon dump
dumped monmap epoch 4
epoch 4
fsid 10122b0b-8375-4a5a-9769-ff25a2d1d8b4
last_changed 2020-04-18 20:48:29.497461
created 2020-04-18 19:42:20.346011
min_mon_release 14 (nautilus)
0: [v2:10.44.127.xxx:3300/0,v1:10.44.127.xxx:6789/0] mon.node01
1: [v2:10.44.127.xxx:3300/0,v1:10.44.127.xxx:6789/0] mon.node02
2: [v2:10.44.127.xxx:3300/0,v1:10.44.127.xxx:6789/0] mon.node03
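My understanding (which may be wrong) is that these auth settings also have to
be in the local ceph.conf on the radosgw node, because the client needs them
before it can fetch the rest of the config from the mons. Roughly like this,
where the keyring path is a guess from my setup:

[global]
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx

[client.radosgw.masifgw01]
keyring = /var/lib/ceph/radosgw/ceph-radosgw.masifgw01/keyring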
Has anyone faced the same issue? Please help.
Thanks,
Amit G
Hi Khodayar,
That rados purge command did not actually seem to delete anything; the pool is
still using 90 GiB like before. I have 4 pools on these OSDs, and the
"fast" pool, which should be empty, still holds 90 GiB.
Looking at the PGs in `ceph pg ls`, LOG accounts for very little space in the
pool. Are there any other logs?
# ceph osd pool ls detail
pool 42 'cephfs_metadata' replicated size 2 min_size 1 crush_rule 0
object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode warn last_change
186707 lfor 0/124366/185624 flags hashpspool stripe_width 0 application
cephfs
pool 51 'rbd' replicated size 2 min_size 1 crush_rule 0 object_hash
rjenkins pg_num 32 pgp_num 32 autoscale_mode warn last_change 186709 lfor
0/186698/186696 flags hashpspool,selfmanaged_snaps stripe_width 0
compression_algorithm lz4 compression_mode aggressive
compression_required_ratio 0.9 application rbd
removed_snaps [1~9]
pool 53 'kube' replicated size 2 min_size 1 crush_rule 0 object_hash
rjenkins pg_num 32 pgp_num 32 autoscale_mode warn last_change 186711 lfor
0/185831/185829 flags hashpspool,selfmanaged_snaps stripe_width 0
application rbd
removed_snaps [1~3]
pool 54 'fast' replicated size 2 min_size 1 crush_rule 0 object_hash
rjenkins pg_num 64 pgp_num 64 autoscale_mode warn last_change 186713 flags
hashpspool,selfmanaged_snaps stripe_width 0 compression_algorithm lz4
compression_mode aggressive compression_required_ratio 0.9 application
cephfs
removed_snaps [52~2,55~2,59~2]
# ceph osd df tree
ID CLASS WEIGHT REWEIGHT SIZE RAW USE DATA OMAP META AVAIL
%USE VAR PGS STATUS TYPE NAME
175 nvme 1.81898 1.00000 1.8 TiB 361 GiB 342 GiB 9.7 GiB 9.8 GiB 1.5
TiB 19.38 0.23 160 up osd.175
176 nvme 1.81898 1.00000 1.8 TiB 361 GiB 342 GiB 9.5 GiB 9.6 GiB 1.5
TiB 19.36 0.23 160 up osd.176
# ceph df
POOLS:
POOL ID STORED OBJECTS USED %USED
MAX AVAIL QUOTA OBJECTS QUOTA BYTES DIRTY USED COMPR
UNDER COMPR
cephfs_metadata 42 320 MiB 2.82M 320 MiB 0.01
1.4 TiB N/A N/A 2.82M 0 B
0 B
rbd 51 417 GiB 106.93k 417 GiB 12.32
1.4 TiB N/A N/A 106.93k 83 GiB
167 GiB
kube 53 38 B 3 38 B 0
1.4 TiB N/A N/A 3 0 B
0 B
fast 54 90 GiB 5.25M 90 GiB 2.93
1.4 TiB N/A N/A 5.25M 0 B
0 B
# ceph pg ls
PG OBJECTS DEGRADED MISPLACED UNFOUND BYTES OMAP_BYTES*
OMAP_KEYS* LOG STATE SINCE VERSION REPORTED UP...
...
54.3a 81396 0 0 0 1512202599 0
0 3020 active+clean 9m 186704'339296
186722:615571
54.3b 82612 0 0 0 1556478139 0
0 3073 active+clean 9m 186706'340432
186722:609005
54.3c 82526 0 0 0 1553816829 0
0 3031 active+clean 9m 186704'806269
186722:1175811
54.3d 82144 0 0 0 1454278881 0
0 3016 active+clean 9m 186706'339167
186722:743880
54.3e 82740 0 0 0 1501081654 0
0 3056 active+clean 9m 186704'343212
186722:621323
54.3f 80846 0 0 0 1474357106 0
0 3048 active+clean 9m 186706'333741
186722:733149
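I have not yet checked the OSD metadata you mentioned; I assume something like
this is what you mean (a guess on my part):

# ceph osd df detail
# ceph osd metadata 175 | grep -i bluefs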
On Sun, Apr 19, 2020 at 4:00 AM Khodayar Doustar <doustar(a)rayanexon.ir>
wrote:
> Hi Kári,
>
> You are purging only 2.6M objects with your purge command, so I guess that
> 5.25M objects would be something else like logs.
> Have you checked the osd df detail and osd metadata?
> I have a case in which OSD BlueStore logs are eating up my whole space,
> maybe you are facing a similar one.
>
> Regards,
> Khodayar
>
> On Sun, Apr 19, 2020 at 6:47 AM Kári Bertilsson <karibertils(a)gmail.com>
> wrote:
>
>> Hello
>>
>> Running ceph v14.2.8 on everything. Pool is using replicated_rule with
>> size/min_size 2 with 2 OSDs. I have scrubbed and deep scrubbed the OSDs.
>>
>> This pool was attached as data pool to cephfs containing a lot of small
>> files. I have since removed all files and detached the pool from the fs.
>> Somehow the pool is still using 90GB.
>>
>> POOLS:
>> POOL ID STORED OBJECTS USED %USED
>> MAX AVAIL QUOTA OBJECTS QUOTA BYTES DIRTY USED COMPR
>> UNDER COMPR
>> fast 54 90 GiB 5.25M 90 GiB 2.93
>> 1.4 TiB N/A N/A 5.25M 0 B
>> 0 B
>>
>> # rados -p fast ls|wc -l
>> 1312030
>>
>> # rados -p fast stat 100073acce3.00000000
>> error stat-ing fast/100073acce3.00000000: (2) No such file or directory
>>
>> # rados -p fast rm 100073acce3.00000000
>> error removing fast>100073acce3.00000000: (2) No such file or directory
>>
>> # rados -p fast get 100073acce3.00000000 test
>> error getting fast/100073acce3.00000000: (2) No such file or directory
>>
>> I get the same error for every single object in the pool
>>
>> # rados purge fast --yes-i-really-really-mean-it
>> Warning: using slow linear search
>> Removed 2625749 objects
>> successfully purged pool fast
>>
>> There were 5.25M objects in the pool before and after running this command.
>> No change in ceph df.
>>
>> Any ideas how to reclaim the free space? I can remove and recreate the
>> pool, but I would like to know why, and how to deal with this situation
>> when I don't have that privilege.
>>
>> Best regards
>> Kári Bertilsson
>> _______________________________________________
>> ceph-users mailing list -- ceph-users(a)ceph.io
>> To unsubscribe send an email to ceph-users-leave(a)ceph.io
>>
>
Hello everyone,
We want to implement a 3-node Ceph cluster with Nautilus. I have already tested some Ceph installations in our test environment and I have some general questions.
At the end of this month we will have three physical servers with 256 GB RAM, two CPUs, and nearly 48 x 6 TB disks.
I am a little confused about calculating the PGs with the pgcalc tool on the Ceph site.
What exactly does the "OSDs" field mean: the 3 physical OSD nodes, or the disks that will be used for the OSDs?
What can you recommend? We later want to connect Ceph with RADOS, OpenStack and LVM.
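For what it's worth, here is my rough reading of the pgcalc formula as a shell
sketch (assuming one OSD per disk and replica size 3; please correct me if
this is wrong):

osds=144; replicas=3             # 3 nodes x 48 disks, one OSD per disk
raw=$(( osds * 100 / replicas )) # target of ~100 PGs per OSD
pg=1; while [ $pg -lt $raw ]; do pg=$(( pg * 2 )); done
echo "suggested pg_num: $pg"     # rounds 4800 up to 8192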
Thanks in advance,
hfreidhof
Hi Everyone,
We want to deploy Ceph in a CentOS 8/RHEL 8 environment; is it supported now?
From the link below, I see only CentOS 7/RHEL 7 supported, right?
https://docs.ceph.com/docs/master/start/os-recommendations/
--
FuLong Wang
Hi,
I am trying to understand how the PG peering count could increase between
two OSDMAP epochs when none of the OSDs went down or up, nor were there
changes in their weights.
This behaviour was seen on a Hammer cluster, but I guess it should be no
different in recent releases.
I'm pasting the relevant piece of the monitor logs below and attaching the
relevant osdmap dumps.
As can be seen, the changes to the osdmap between dump_e1053237
and dump_e10532398 seem to be restricted to a few pg_temp additions and
changes to up_thru, while the OSD weights and OSD up/down states have remained
the same.
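For reference, I extracted the dumps roughly like this (epoch numbers shown
as examples):

$ ceph osd getmap 1053237 -o osdmap.1053237
$ ceph osd getmap 1053238 -o osdmap.1053238
$ osdmaptool --print osdmap.1053237 > osdmap.1053237.txt
$ osdmaptool --print osdmap.1053238 > osdmap.1053238.txt
$ diff osdmap.1053237.txt osdmap.1053238.txt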
Kindly let me know what I'm missing here in my understanding.
Thanks,
K.Prasad
Monitor logs
--------------------
2020-04-15 08:06:10.956681 mon.1 10.33.89.159:6789/0 52232955 : cluster
[INF] pgmap v107549051: 18432 pgs: 18332 active+clean, 1 peering, 99
active+remapped+backfill_toofull; 330 TB data, 518 TB used, 596 TB / 1114
TB avail; 34776/1684702815 objects degraded (0.002%); 1766213/1684702815
objects misplaced (0.105%)
2020-04-15 08:06:11.862613 mon.1 10.33.89.159:6789/0 52232956 : cluster
[INF] osdmap e1053236: 388 osds: 315 up, 308 in
2020-04-15 08:06:11.868380 mon.1 10.33.89.159:6789/0 52232957 : cluster
[INF] pgmap v107549052: 18432 pgs: 18332 active+clean, 1 peering, 99
active+remapped+backfill_toofull; 330 TB data, 518 TB used, 596 TB / 1114
TB avail; 34776/1684702815 objects degraded (0.002%); 1766213/1684702815
objects misplaced (0.105%)
2020-04-15 08:06:10.460979 osd.57 10.33.249.161:6811/4066471 200448 :
cluster [INF] 5.356s0 restarting backfill on osd.382(0) from
(936954'62108,945456'62308] 0//0//-1 to 1053229'64769
2020-04-15 08:06:12.696839 mon.1 10.33.89.159:6789/0 52232959 : cluster
[INF] osdmap e1053237: 388 osds: 315 up, 308 in
2020-04-15 08:06:12.703424 mon.1 10.33.89.159:6789/0 52232960 : cluster
[INF] pgmap v107549053: 18432 pgs: 18332 active+clean, 1 peering, 99
active+remapped+backfill_toofull; 330 TB data, 518 TB used, 596 TB / 1114
TB avail; 34776/1684702815 objects degraded (0.002%); 1766213/1684702815
objects misplaced (0.105%)
2020-04-15 08:06:13.745554 mon.1 10.33.89.159:6789/0 52232961 : cluster
[INF] osdmap e1053238: 388 osds: 315 up, 308 in
2020-04-15 08:06:13.899273 mon.1 10.33.89.159:6789/0 52232962 : cluster
[INF] pgmap v107549054: 18432 pgs: 1 inactive, 1
active+remapped+wait_backfill+backfill_toofull, 2 activating, 17078
active+clean, 1251 peering, 2 activating+remapped, 5 remapped+peering, 1
activating+degraded, 91 active+remapped+backfill_toofull; 329 TB data, 518
TB used, 600 TB / 1118 TB avail; 32298/1678972195 objects degraded
(0.002%); 1666716/1678972195 objects misplaced (0.099%)
2020-04-15 08:06:14.766039 mon.1 10.33.89.159:6789/0 52232963 : cluster
[INF] osdmap e1053239: 388 osds: 315 up, 308 in
2020-04-15 08:06:14.833707 mon.1 10.33.89.159:6789/0 52232964 : cluster
[INF] pgmap v107549055: 18432 pgs: 2 inactive, 1
active+remapped+wait_backfill+backfill_toofull, 3 activating, 16496
active+clean, 1794 peering, 4 activating+remapped, 42 remapped+peering, 3
activating+degraded, 87 active+remapped+backfill_toofull; 329 TB data, 518
TB used, 600 TB / 1118 TB avail; 30842/1678959760 objects degraded
(0.002%); 1576530/1678959760 objects misplaced (0.094%)
2020-04-15 08:06:15.730301 mon.1 10.33.89.159:6789/0 52232965 : cluster
[INF] osdmap e1053240: 388 osds: 315 up, 308 in
2020-04-15 08:06:15.774021 mon.1 10.33.89.159:6789/0 52232966 : cluster
[INF] pgmap v107549056: 18432 pgs: 2 inactive, 1
active+remapped+wait_backfill+backfill_toofull, 3 activating, 16139
active+clean, 2103 peering, 6 activating+remapped, 92 remapped+peering, 3
activating+degraded, 83 active+remapped+backfill_toofull; 329 TB data, 518
TB used, 600 TB / 1118 TB avail; 29414/1678947337 objects degraded
(0.002%); 1490753/1678947337 objects misplaced (0.089%)
2020-04-15 08:06:09.424940 osd.146 10.33.129.28:6800/2689 125912 : cluster
[INF] 5.252fs0 restarting backfill on osd.382(3) from
(935430'62507,945456'62707] 0//0//-1 to 1053229'65086
2020-04-15 08:06:16.713440 mon.1 10.33.89.159:6789/0 52232967 : cluster
[INF] osdmap e1053241: 388 osds: 315 up, 308 in
Hi,
I upgraded Ceph from 14.2.7 to the new version 14.2.8, and bucket notifications no longer work: I can't create a topic.
I used Postman to send a POST following https://docs.ceph.com/docs/master/radosgw/notifications/#create-a-topic
REQUEST:
POST
http://rgw1:7480/?Action=CreateTopic&Name=webno&push-endpoint=https://192.1…
RESPONSE:
<?xml version="1.0" encoding="UTF-8"?>
<Error>
<Code>MethodNotAllowed</Code>
<RequestId>tx000000000000000000008-005e6a0eab-cbcad-bj</RequestId>
<HostId>cbcad-bj-bjz</HostId>
</Error>
The debug info from the node is below:
2020-03-12 18:49:24.684 7fdde1e1d700 1 ====== starting new request req=0x55c91a51e8f0 =====
2020-03-12 18:49:24.684 7fdde1e1d700 2 req 14 0.000s initializing for trans_id = tx00000000000000000000e-005e6a13b4-cbc6a-bj
2020-03-12 18:49:24.684 7fdde1e1d700 10 rgw api priority: s3=8 s3website=7
2020-03-12 18:49:24.684 7fdde1e1d700 10 host=192.168.3.250
2020-03-12 18:49:24.684 7fdde1e1d700 20 subdomain= domain= in_hosted_domain=0 in_hosted_domain_s3website=0
2020-03-12 18:49:24.684 7fdde1e1d700 20 final domain/bucket subdomain= domain= in_hosted_domain=0 in_hosted_domain_s3website=0 s->info.domain= s->info.request_uri=/
2020-03-12 18:49:24.684 7fdde1e1d700 10 meta>> HTTP_X_AMZ_CONTENT_SHA256
2020-03-12 18:49:24.684 7fdde1e1d700 10 meta>> HTTP_X_AMZ_DATE
2020-03-12 18:49:24.684 7fdde1e1d700 10 x>> x-amz-content-sha256:e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
2020-03-12 18:49:24.684 7fdde1e1d700 10 x>> x-amz-date:20200312T104924Z
2020-03-12 18:49:24.684 7fdde1e1d700 20 get_handler handler=26RGWHandler_REST_Service_S3
2020-03-12 18:49:24.684 7fdde1e1d700 10 handler=26RGWHandler_REST_Service_S3
2020-03-12 18:49:24.684 7fdde1e1d700 2 req 14 0.000s getting op 4
2020-03-12 18:49:24.684 7fdde1e1d700 20 handler->ERRORHANDLER: err_no=-2003 new_err_no=-2003
2020-03-12 18:49:24.684 7fdde1e1d700 2 req 14 0.000s http status=405
2020-03-12 18:49:24.684 7fdde1e1d700 1 ====== req done req=0x55c91a51e8f0 op status=0 http_status=405 latency=0s ======
2020-03-12 18:49:25.502 7fde0fe79700 2 RGWDataChangesLog::ChangesRenewThread: start
The same POST works in version 14.2.7.
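One guess on my side is that the parameters may now need to go in the POST body rather than the query string, for example:

curl -X POST http://rgw1:7480/ -d 'Action=CreateTopic&Name=webno'

but I have not confirmed that this is the intended behaviour.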
What is the correct way to create a topic in version 14.2.8?
> On Apr 16, 2020, at 1:58 PM, EDH - Manuel Rios <mriosfer(a)easydatahost.com> wrote:
>
> Hi Eric,
>
> Are there any ETA for get those script backported maybe in 14.2.10?
>
> Regards
> Manuel
There is a nautilus backport PR where the code works. It’s waiting on the added testing to be complete on master, so that can be backported as well.
See: https://github.com/ceph/ceph/pull/34127 <https://github.com/ceph/ceph/pull/34127>
--
J. Eric Ivancich
he / him / his
Red Hat Storage
Ann Arbor, Michigan, US
I'm running into a problem that I've found around the Internet, but for which I'm unable to find a solution:
$ sudo radosgw-admin user info
could not fetch user info: no user info saved
I've seen others report this issue, but the troubleshooting has either failed to pinpoint the issue, or has gone from "I have a problem" to "I solved it" with no information as to how it was solved. Unfortunately, due to the nature of our network, I'm unable to provide debug information en masse. I can try to provide individual lines of output if someone knows what I should look for.
I am able to run the user creation command:
$ sudo radosgw-admin user create --uid="testuser2" --display-name="testuser2"
{
"user_id": "testuser2",
"display_name": "testuser2",
"email": "",
"suspended": 0,
"max_buckets": 1000,
...<snip>....
}
After doing this, I'm still unable to see the user information, instead getting the same "no user info saved" output. That said, I know the user is added:
$ sudo radosgw-admin metadata list user
[
"devtest1",
"testuser",
"testuser2",
...<snip>...
]
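One thing I am not sure about is whether `user info` needs the user spelled out explicitly, for example:

$ sudo radosgw-admin user info --uid=testuser2

If that flag is required here, that may be all I'm missing.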
I'm pretty new to this system and Ceph in general. I looked at /etc/ceph/ceph.conf but I don't see anything that would have anything to do with users. I also don't have an account on the Ansible server that controls the configurations so I can't see if anything is defined there.
I know this isn't particularly detailed in its description of the issue, but I can't find any other useful help via search.
Thank you,
Mathew