Hi,
I have 3 pools, which I use exclusively for RBD images. Two of them are mirrored and one is erasure-coded. Today I received a warning that a PG in the erasure-coded pool was inconsistent, so I ran
"ceph pg repair <pg>". After that the entire cluster became extremely slow, to the point that no VM works.
This is the output of "ceph -s":
# ceph -s
  cluster:
    id:     4ea72929-6f9e-453a-8cd5-bb0712f6b874
    health: HEALTH_ERR
            1 scrub errors
            Possible data damage: 1 pg inconsistent, 1 pg repair

  services:
    mon:         2 daemons, quorum cmonitor,cmonitor2
    mgr:         cmonitor (active), standbys: cmonitor2
    osd:         87 osds: 87 up, 87 in
    tcmu-runner: 10 active daemons

  data:
    pools:   7 pools, 3072 pgs
    objects: 30.00M objects, 113 TiB
    usage:   304 TiB used, 218 TiB / 523 TiB avail
    pgs:     3063 active+clean
             8    active+clean+scrubbing+deep
             1    active+clean+scrubbing+deep+inconsistent+repair

  io:
    client: 24 MiB/s rd, 23 MiB/s wr, 629 op/s rd, 519 op/s wr
    cache:  5.9 MiB/s flush, 35 MiB/s evict, 9 op/s promote
Does anyone have any idea how to make it available again?
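In case it helps to narrow this down, these are the commands I understand can be used to inspect the inconsistency and the stuck requests (taken from the docs, so please correct me if they are wrong; <pg> and <id> are placeholders):
ceph health detail
rados list-inconsistent-obj <pg> --format=json-pretty
ceph daemon osd.<id> dump_ops_in_flight    # on the OSDs serving that PG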
Regards,
Gesiel
Hello,
Thank you very much for picking up the question, and sorry for the late response.
Yes, we are sending it in cleartext, but over HTTPS; how should it be sent if not like this?
Also somewhat related to this issue: when we subscribe a bucket to a topic backed by a non-ACL Kafka topic, any operation (PUT or DELETE) simply blocks and does not return, not even with an error response.
$ s3cmd -c ~/.s3cfg put --add-header x-amz-meta-foo:bar3 certificate.pdf s3://vig-test
WARNING: certificate.pdf: Owner groupname not known. Storing GID=1354917867 instead.
WARNING: Module python-magic is not available. Guessing MIME types based on file extensions.
upload: 'certificate.pdf' -> 's3://vig-test/certificate.pdf' [1 of 1]
65536 of 91224 71% in 0s 291.17 KB/s
Istvan Szabo
Senior Infrastructure Engineer
---------------------------------------------------
Agoda Services Co., Ltd.
e: istvan.szabo(a)agoda.com
---------------------------------------------------
From: Yuval Lifshitz <ylifshit(a)redhat.com>
Sent: Wednesday, April 21, 2021 10:34 PM
To: Szabo, Istvan (Agoda) <Istvan.Szabo(a)agoda.com>
Cc: ceph-users(a)ceph.io
Subject: Re: [ceph-users] Getting `InvalidInput` when trying to create a notification topic with Kafka endpoint
Hi Istvan,
Can you please share the relevant part for the radosgw log, indicating which input was invalid?
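(In case it helps: my assumption is that the relevant lines can be captured by raising the RGW debug level on the gateway that handles the request, roughly like this, where the instance name and log path are placeholders:
ceph config set client.rgw.<instance> debug_rgw 20
# reproduce the CreateTopic request, then search the gateway log, e.g.:
grep -i "invalidinput" /var/log/ceph/*rgw*.log
and then revert debug_rgw afterwards.)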
The only way I managed to reproduce that error is by sending the request to a non-HTTPS radosgw (which does not seem to be your case). In such a case it replies with "InvalidInput" because we are trying to send user/password in cleartext.
I used curl, similarly to what you did, against a vstart cluster based off of master: https://paste.sh/SQ_8IrB5#BxBYbh1kTh15n7OKvjB5wEOM
Yuval
On Wed, Apr 21, 2021 at 11:23 AM Szabo, Istvan (Agoda) <Istvan.Szabo(a)agoda.com> wrote:
Hi Ceph Users,
Here is the latest request I tried, but it is still not working:
curl -v -H 'Date: Tue, 20 Apr 2021 16:05:47 +0000' -H 'Authorization: AWS <accessid>:<signature>' -L -H 'content-type: application/x-www-form-urlencoded' -k -X POST https://servername -d 'Action=CreateTopic&Name=test-ceph-event-replication&Attributes.entry.8.key=push-endpoint&Attributes.entry.8.value=kafka://<username>:<password>@servername2:9093&Attributes.entry.5.key=use-ssl&Attributes.entry.5.value=true'
And the response I get is still InvalidInput:
<?xml version="1.0" encoding="UTF-8"?><Error><Code>InvalidInput</Code><RequestId>tx000000000000007993081-00607efbdd-1c7e96b-hkg</RequestId><HostId>1c7e96b-hkg-data</HostId></Error>
Can someone please help with this?
Istvan Szabo
Senior Infrastructure Engineer
---------------------------------------------------
Agoda Services Co., Ltd.
e: istvan.szabo(a)agoda.com
---------------------------------------------------
________________________________
Hi,
Protection against infinite BlueFS log growth was recently added [1]. I am hitting this assert on 14.2.19:
/build/ceph-14.2.19/src/os/bluestore/BlueFS.cc: 2404: FAILED ceph_assert(bl.length() <= runway)
After that the OSD is dead. Is there a tracker for this already, and would logs be of interest for this case?
[1] https://github.com/ceph/ceph/pull/37948
Thanks,
k
Hi all,
I've installed the latest Pacific version, 16.2.1, using cephadm. I am trying to use
multiple public networks with this setting:
ceph config set mon public_network "100.90.1.0/24,100.90.2.0/24"
The networks seem to be successfully passed to /etc/ceph/ceph.conf on
the daemon hosts; however, I constantly see the following messages in the log:
4/26/21 11:40:05 PM [INF] Filtered out host mon-1: could not verify host allowed virtual ips
4/26/21 11:40:05 PM [INF] Filtered out host mon-2: could not verify host allowed virtual ips
4/26/21 11:40:05 PM [INF] Filtered out host mon-3: could not verify host allowed virtual ips
As a result, ceph orch doesn't deploy any monitors. Eventually I was
able to deploy the monitors by first setting public_network to
100.90.1.0/24 and deploying the monitors in that subnet, then setting
public_network to 100.90.2.0/24 and deploying the rest of the monitors.
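For reference, the sequence that eventually worked looked roughly like this (from memory, so the exact placement syntax may differ, and which host sits in which subnet is only for illustration):
ceph config set mon public_network 100.90.1.0/24
ceph orch apply mon --placement="mon-1"
ceph config set mon public_network 100.90.2.0/24
ceph orch apply mon --placement="mon-1,mon-2,mon-3"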
Is this some kind of bug, or am I missing something?
Thanks for the replies in advance.
Dear Cephers,
I encountered a strange issue when using rbd map (Luminous 12.2.13). rbd map does not always fail, but it fails occasionally, with the following dmesg output:
[16818.700000] module libceph: Relocation (type 6) overflow vs section 4
[16857.460000] module libceph: Relocation (type 6) overflow vs section 4
[16891.850000] module libceph: Relocation (type 6) overflow vs section 4
What could be wrong, and how can I fix it?
Any help or suggestions would be highly appreciated,
thanks a lot in advance,
samuel
huxiaoyu(a)horebdata.cn
Hello,
I have an Octopus cluster and want to change some values, but I cannot find any documentation on how to set multiple values with
bluestore_rocksdb_options_annex
Could someone give me some examples?
I would like to do this via ceph config set ...
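From the option name, my guess (untested) is that it takes a comma-separated list of RocksDB key=value pairs, something like:
ceph config set osd bluestore_rocksdb_options_annex "compaction_readahead_size=2097152,max_background_compactions=4"
but I am not sure whether that is the right syntax, or whether the OSDs need a restart to pick it up, hence the question.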
Thanks in advance
Mehmet
Hi,
I’m trying to get ceph-csi working on OpenShift (I followed this guide: https://docs.ceph.com/en/latest/rbd/rbd-kubernetes/).
On OpenShift it seems you can’t run privileged containers by default and can’t use HostPath volumes etc. For these you need to create a (custom) security context constraint.
I’d like to make this easier for the next person who searches for this, so I contacted Red Hat through our support plan, and they suggested:
kind: SecurityContextConstraints
apiVersion: v1
metadata:
  name: custom-scc
allowPrivilegedContainer: true
allowHostDirVolumePlugin: true
allowHostIPC: false
allowHostNetwork: false
allowHostPID: false
allowHostPorts: false
allowPrivilegeEscalation: true
allowedCapabilities:
  - KILL
  - NET_ADMIN
  - SYS_ADMIN
  - SYS_BOOT
  - SYS_TIME
runAsUser:
  type: RunAsAny
seLinuxContext:
  type: RunAsAny
fsGroup:
  type: RunAsAny
supplementalGroups:
  type: RunAsAny
users:
  - <your-user-for-which-the-privileges-are-required>
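My understanding (an assumption on my part, not something Red Hat spelled out) is that the SCC then has to be created and granted to the ceph-csi service accounts, roughly:
oc apply -f custom-scc.yaml
oc adm policy add-scc-to-user custom-scc -z rbd-csi-nodeplugin -n <your-namespace>
oc adm policy add-scc-to-user custom-scc -z rbd-csi-provisioner -n <your-namespace>
where rbd-csi-nodeplugin and rbd-csi-provisioner are the service account names I saw in the ceph-csi RBD manifests; adjust if yours differ.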
It didn’t work when I created this after going through the guide, so I’ll run through it again, but I wanted to ask if anyone else has already done this and also if someone could add it to the Ceph wiki.
Kr,
Nino
Hi Anthony,
yes, we are using replication; the lost space is calculated before it is replicated.
RAW STORAGE:
    CLASS     SIZE        AVAIL       USED        RAW USED     %RAW USED
    hdd       1.1 PiB     191 TiB     968 TiB     968 TiB          83.55
    TOTAL     1.1 PiB     191 TiB     968 TiB     968 TiB          83.55

POOLS:
    POOL                               ID     PGS      STORED      OBJECTS     USED        %USED     MAX AVAIL
    rbd                                 0     64       0 B         0           0 B             0        13 TiB
    .rgw.root                           1     64       99 KiB      119         99 KiB          0        13 TiB
    eu-central-1.rgw.control            2     64       0 B         8           0 B             0        13 TiB
    eu-central-1.rgw.data.root          3     64       947 KiB     2.82k       947 KiB         0        13 TiB
    eu-central-1.rgw.gc                 4     64       101 MiB     128         101 MiB         0        13 TiB
    eu-central-1.rgw.log                5     64       267 MiB     500         267 MiB         0        13 TiB
    eu-central-1.rgw.users.uid          6     64       2.9 MiB     6.91k       2.9 MiB         0        13 TiB
    eu-central-1.rgw.users.keys         7     64       263 KiB     6.73k       263 KiB         0        13 TiB
    eu-central-1.rgw.meta               8     64       384 KiB     1k          384 KiB         0        13 TiB
    eu-central-1.rgw.users.email        9     64       40 B        1           40 B            0        13 TiB
    eu-central-1.rgw.buckets.index     10     64       10 GiB      67.28k      10 GiB       0.03        13 TiB
    eu-central-1.rgw.buckets.data      11     2048     313 TiB     151.71M     313 TiB     89.25        13 TiB
...
The EC profile is pretty standard:
[root@s3db16 ~]# ceph osd erasure-code-profile ls
default
[root@s3db16 ~]# ceph osd erasure-code-profile get default
k=2
m=1
plugin=jerasure
technique=reed_sol_van
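As a rough sanity check (my own arithmetic, so please double-check): 313 TiB stored in eu-central-1.rgw.buckets.data would be roughly 313 x 1.5 = ~470 TiB raw with this 2+1 profile, or 313 x 3 = ~939 TiB raw with 3x replication. The 968 TiB RAW USED above is close to the replicated figure, so the gap to the ~158 TB of bucket contents seems to be in the STORED number itself rather than in the replication overhead.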
We mainly use Ceph 14.2.18; there is one OSD host with 14.2.19 and one with 14.2.20.
The object population is mixed, but most of the data is in huge files; we store our
platform's RBD snapshots in it.
Cheers
Boris
On Tue, 27 Apr 2021 at 06:49, Anthony D'Atri <anthony.datri(a)gmail.com> wrote:
> Are you using Replication? EC? How many copies / which profile?
> On which Ceph release were your OSDs built? BlueStore? Filestore?
> What is your RGW object population like? Lots of small objects? Mostly
> large objects? Average / median object size?
>
> > On Apr 26, 2021, at 9:32 PM, Boris Behrens <bb(a)kervyn.de> wrote:
> >
> > Hi,
> >
> > we still have the problem that our RGW eats more disk space than it should.
> > Summing up the "size_kb_actual" of all buckets shows only half of the used
> > disk space.
> >
> > There are 312 TiB stored according to "ceph df", but we only need around
> > 158 TB.
> >
> > I already wrote to this ML about the problem, but there were no solutions
> > that would help.
> > I've dug through the ML archive and found some interesting threads
> > regarding orphan objects and this kind of issue.
> >
> > Did someone ever solve this problem?
> > Or do you just add more disk space?
> >
> > We tried to:
> > * use the "radosgw-admin orphan find/finish" tool (didn't work)
> > * manually trigger the GC (didn't work)
> >
> > Currently running (since yesterday evening):
> > * rgw-orphan-list, which produced 270 GB of text output and is not done
> > yet (I have 60 GB of disk space left)
> >
> > --
> > Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend im
> > groüen Saal.
>
>
--
Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend im
groüen Saal.
2x10G for cluster + Public
2x10G for Users
lacp = 802.3ad
On Mon, 26 Apr 2021 at 17:25, Smart Weblications GmbH <info(a)smart-weblications.de> wrote:
>
> Hi,
>
>
> On 25.04.2021 at 03:58, by morphin wrote:
> > Hello.
> >
> > We're running 1000 VMs on 28 nodes with 6 SSDs (no separate DB device), and
> > these VMs are mostly Win10.
> >
> > 2 LVM OSDs per 4 TB device, 288 OSDs in total, and one RBD pool with 8192 PGs.
> > Replication 3.
> >
> > Ceph version: Nautilus 14.2.16
> >
> > I'm looking for all-flash RBD tuning.
> > This is a good test environment and tomorrow it goes to production. In the VMs I see 1800 MB/s
> > read and 900 MB/s write (qemu = writeback).
> > With good tuning I believe this cluster can go further.
> > Do you have any special suggestions?
> >
>
> What is the network bandwidth?
>
>
> --
>
> Mit freundlichen Grüßen,
>
>
> Smart Weblications GmbH
> Martinsberger Str. 1
> D-95119 Naila
>
> fon.: +49 9282 9638 200
> fax.: +49 9282 9638 205
> 24/7: +49 900 144 000 00 - 0,99 EUR/Min*
> http://www.smart-weblications.de
>
> --
> Sitz der Gesellschaft: Naila
> Geschäftsführer: Florian Wiessner
> HRB-Nr.: HRB 3840 Amtsgericht Hof
> *aus dem dt. Festnetz, ggf. abweichende Preise aus dem Mobilfunknetz
Hi Team,
We have set up a two-node Ceph cluster using the *Native CephFS driver*, with details as follows:
- 3-node / 2-node MDS cluster
- 3-node monitor quorum
- 2-node OSD
- 2 nodes for manager
cephnode3 has only mon and MDS (only for test cases 4-7); the other two nodes,
i.e. cephnode1 and cephnode2, have mgr, mds, mon and rgw.
We have tested the following failover scenarios for the Native CephFS driver by
mounting any one sub-volume on a VM or client with continuous I/O
operations (directory creation every 1 second):
[image: image.png — table of failover test cases and measured failover times]
Regarding the table above, we have a few queries:
- Refer to test case 2 and test case 7: both are similar test cases, differing only
in the number of Ceph MDS daemons, yet the measured time differs between them.
It should be zero, but the time comes out as 17 seconds for test case 7.
- Is there any configurable parameter or configuration we need to make in the
Ceph cluster to reduce the failover time to a few seconds (we sketch our current guesses below)?
With the current default deployment we are getting around 35-40 seconds.
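In case it helps to frame the question: the settings we suspect are relevant (an assumption on our side; we may well be looking at the wrong knobs) are the MDS beacon grace and standby-replay, e.g.:
ceph config set global mds_beacon_grace 15          # default, as far as we know; lowering it should make failure detection faster
ceph fs set <fs_name> allow_standby_replay true     # keep a standby-replay MDS to shorten journal replay
Is this the right direction, or is there something else that dominates the 35-40 second failover time?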
Best Regards,
--
~ Lokendra
www.inertiaspeaks.com
www.inertiagroups.com
skype: lokendrarathour