March 2021 - ceph-users - lists.ceph.io

Re: How to reset and configure replication on multiple RGW servers from scratch?

by Scheurer François

Dear All We have the same question here, if anyone can help ... Thank you! We did not find any documentation about the steps to reset & restart the sync. Especially the implications of 'bilog trim', 'mdlog trim' and 'datalog trim'. Our secondary zone is read-only. Both master and secondary zone on Nautilus (master 14.2.9 and secondary 14.2.12). Can someone also clarify following points? Many thanks in advance! 1) Is it safe to use these 3 commands (bilog trim, mdlog trim, datalog trim) on the master ? Are the bi logs exclusively used for the sync or are they needed even without multi-site? (mdlog/datalog are obviously only for multi-site) 2) Can we run these 3 commands during the sync or do we need first to stop all instances on the secondary zone ? In the latter case, do we need to stop the client traffic and wait on md/data sync to catch up prior to stop the secondary zone instances? 3) Can we then restart the instances on the secondary zone and expect rgw sync to run correctly ? Or do we need first to run 'metadata sync init' and 'data sync init' on the secondary zone ? (to trigger a full sync) Or is it necessary to delete all rgw pools on the secondary zone ? 4) And regarding the full sync, is it verifying the full object data, or only object size and mtime? If we update the secondary zone to Nautilus 14.2.18 and enable rgw_sync_obj_etag_verify, does a full sync will also detect ETag mismatches on objects that are already present on the secondary zone? Cheers Francois ________________________________ From: ceph-users on behalf of Osiński Piotr <Piotr.Osinski(a)grupawp.pl> Sent: Saturday, June 22, 2019 11:44 AM To: ceph-users(a)lists.ceph.com Subject: [ceph-users] How to reset and configure replication on multiple RGW servers from scratch? Hi, For testing purposes, I configured RGW multisite synchronization between two ceph mimic 13.2.6 clusters (I also tried: 13.2.5). Now I want to reset all current settings and configure replication from scratch. Data(pools, buckets) on the master zone will not be deleted. What has been done: 1) Deleted the secondary zone # radosgw-admin zone delete --rgw-zone=dc2_zone 2) Removed the secondary zone from zonegroup # radosgw-admin zonegroup remove --rgw-zonegroup=master_zonegroup --rgw-zone=dc2_zone 3) Commited changes # radosgw-admin period update --commit 4) Trimmed all datalogs on master zone # radosgw-admin datalog trim --start-date="2019-06-12 12:01:54" --end-date="2019-06-22 12:01:56" 5) Trimmed all error sync on master zone # radosgw-admin sync error trim --start-date="2019-06-07 07:19:26" --end-date="2019-06-22 15:59:00" 6) Deleted and recreated empty pools on secondary cluster: dc2_zone.rgw.control dc2_zone.rgw.meta dc2_zone.rgw.log dc2_zone.rgw.buckets.index dc2_zone.rgw.buckets.data Should I clear any other data / metadata in the master zone? Can data be kept somewhere in the master zone that may affect the new replication statement? I'm trying to track down a problem with blocked shards synchronization. Thank you in advance for your help. Best regards, Piotr Osiński << ATT00001.txt (0.4KB) (0.4KB) >>

3 years, 1 month

1
0
0 0

New Issue - Mapping Block Devices

by duluxoz

Hi All, I've got a new issue (hopefully this one will be the last). I have a working Ceph (Octopus) cluster with a replicated pool (my-pool), an erasure-coded pool (my-pool-data), and an image (my-image) created - all *seems* to be working correctly. I also have the correct Keyring specified (ceph.client.my-id.keyring). ceph -s is reporting all healthy. The ec profile (my-ec-profile) was created with: ceph osd erasure-code-profile set my-ec-profile k=4 m=2 crush-failure-domain=host The replicated pool was created with: ceph osd pool create my-pool 100 100 replicated Followed by: rbd pool init my-pool The ec pool was created with: ceph osd pool create my-pool-data 100 100 erasure my-ec-profile --autoscale-mode=on Followed by: rbd pool init my-pool-data The image was created with: rbd create -s 1T --data-pool my-pool-data my-pool/my-image The Keyring was created with: ceph auth get-or-create client.my-id mon 'profile rbd' osd 'profile rbd pool=my-pool' mgr 'profile rbd pool=my-pool' -o /etc/ceph/ceph.client.my-id.keyring On a centos8 client machine I have installed ceph-common, placed the Keyring file into /etc/ceph/, and run the command: rbd device map my-pool/my-image --id my-id All *seems* AOK. However - and here's my issue - when I try to create a partition on /dev/rbd0 and/or try to mount it, the client reports: fdisk: cannot open /dev/rbd0: Input/output error OR mount: /my-rbd-bloc-device: special device /dev/rbd0 does not exist (respectively). What am I doing wrong? Thanks in advance for the help Matthew J -- Peregrine IT Signature *Matthew J BLACK* M.Inf.Tech.(Data Comms) MBA B.Sc. MACS (Snr), CP, IP3P When you want it done /right/ ‒ the first time! Phone: +61 4 0411 0089 Email: matthew(a)peregrineit.net <mailto:matthew@peregrineit.net> Web: www.peregrineit.net <http://www.peregrineit.net> View Matthew J BLACK's profile on LinkedIn <http://au.linkedin.com/in/mjblack> This Email is intended only for the addressee. Its use is limited to that intended by the author at the time and it is not to be distributed without the author’s consent. You must not use or disclose the contents of this Email, or add the sender’s Email address to any database, list or mailing list unless you are expressly authorised to do so. Unless otherwise stated, Peregrine I.T. Pty Ltd accepts no liability for the contents of this Email except where subsequently confirmed in writing. The opinions expressed in this Email are those of the author and do not necessarily represent the views of Peregrine I.T. Pty Ltd. This Email is confidential and may be subject to a claim of legal privilege. If you have received this Email in error, please notify the author and delete this message immediately.

3 years, 1 month

4
4
0 0

How to know which client hold the lock of a file

by Norman.Kern

Hi, Anyone knows how to know which client hold lock of a file in Ceph fs? I met a dead lock problem that a client holding on get the lock, but I don't kown which client held it.

3 years, 1 month

2
1
0 0

Re: Multisite RGW - Large omap objects related with bilogs

by Scheurer François

Dear All We have the same question here, if anyone can help ... Thank you! Cheers Francois ________________________________ From: ceph-users on behalf of P. O. <posdub(a)gmail.com> Sent: Friday, August 9, 2019 11:05 AM To: ceph-users(a)lists.ceph.com Subject: [ceph-users] Multisite RGW - Large omap objects related with bilogs Hi all, I have two ceph clusters in RGW multisite environment, with ~1500 bucketes ( 500M objects, 70TB ). Some of the buckets are very dynamic (objects are constantly changing). I have problems with large omap objects in bucket indexes, related with "dynamic buckets". For example: [root@rgw ~]# radosgw-admin bucket stats --bucket bucket_s3d33 |grep num_objects "num_objects": 564 In /var/log/ceph/ceph.log: cluster [WRN] Large omap object found. Object: 10:297646ca:::.dir.86a05ec8-9982-429b-9f94-28363610a95c.12546d0.17892:head Key count: 5307523 Size (bytes): 748792509 I found, this is because of bucket index logs: [root@rgw-1 ~]# rados -p default.rgw.buckets.index listomapkeys .dir.86a05ec8-9982-429b-9f94-28363610a95c.12546d0.17892 | wc -l 5307523 There are a lot of keys: �0_00013758656.71188336.4 �0_00013758657.71188337.5 �0_00013758658.71188338.4 �0_00013758659.71188339.5 �0_00013758660.71188342.4 �0_00013758661.71188343.5 �0_00013758662.71188344.4 [root@rgw-1 ~]# radosgw-admin bilog list --bucket bucket_s3d33 --max-entries 6000000 |grep op_id | wc -l 5307523 I have configured parameters in my ceph.conf: rgw sync log trim concurrent buckets = 32 rgw sync log trim max buckets = 64 rgw sync log trim interval = 1200 rgw sync log trim min cold buckets = 4 But from two weeks, the omap key count is still growing. How can I safely clean these bilogs (with no bucket damage and no replication damage)? I found two radosgw-admin parameters related with bilogs trimming: 1) radosgw-admin bilog trim --bucket=bucket_s3d33 --start-marker XXXX --end-marker YYYY I dont know what values should be in: --start-marker XXXX --end-marker YYYY. Is it safe to use "bilog trim" on bucket with replication in progress? If yes, should i run this on both sites? 2) radosgw-admin bilog autotrim Is this command safe? Can I use autotrim on selected bucket? Maybe there is some other way, to delete bilogs? Best regards, P.O. << ATT00001.txt (0.4KB) (0.4KB) >>

3 years, 1 month

1
0
0 0

DocuBetter Meeting -- APAC 25 Mar 2021 0100 UTC

by John Zachary Dover

There will be a DocuBetter meeting on Thursday, 25 Mar 2021 at 0100 UTC. We will discuss the Google Season of Docs proposal (the Comprehensive Contribution Guide), the rewriting of the cephadm documentation and the new sectin of the Teuthology Guide. DocuBetter Meeting -- APAC 25 Mar 2021 0100 UTC https://bluejeans.com/908675367 https://pad.ceph.com/p/Ceph_Documentation

3 years, 1 month

1
0
0 0

March 2021 Tech Talk and Code Walk-through

by Mike Perez

Hi everyone! I'm excited to announce two talks we have on the schedule for March 2021: Persistent Bucket Notifications By Yuval Lifshitz https://ceph.io/ceph-tech-talks/ The stream starts on March 25th at 17:00 UTC / 18:00 CET / 1:00 PM EST / 10:00 AM PST Persistent bucket notifications are going to be introduced in Ceph "Pacific." The idea behind them is to allow for reliable and asynchronous delivery of notifications from the RADOS gateway (RGW) to the endpoint configured at the topic. Regular notifications could also be considered reliable since the delivery to the endpoint is performed synchronously during the request. However, this reliability is only from the RGW perspective, meaning that the client will not get an ACK until an ACK is received from the endpoint but does not retry if the endpoint is down or disconnected. Also, note that, with regular notifications, if the endpoint sends back a NACK, the operation is still considered successful (since there is no way to rollback the RADOS operations that happened before the notification was tried). When the endpoint is down but failed to push, the notification is only timeout base; using regular notifications will slow down the operation of the RGW and may bring it to a complete halt. With persistent notifications, we allow the RGW to retry sending notifications even if the endpoint is down or a network disconnect with it during the operation (notifications are retried if not successfully delivered to the endpoint). Also, the operation is a-synchronous, so during the operation, the notifications are just pushed into a queue (see below), and the actual sending to the endpoint is happening a-synchronously. The queuing operation is done in 2 phases (reserve and commit or abort) to guarantee the queuing operation's atomicity with the other operations. In case you missed the last Tech Talk, see Sage Weil's presentation on What's new in Pacific: https://www.youtube.com/watch?v=PVtn53MbxTc -------------- Samuel Just will be giving a code walkthrough on RADOS Snapshots. https://tracker.ceph.com/projects/ceph/wiki/Code_Walkthroughs The stream starts on March 23rd at 17:00 UTC / 18:00 CET / 1:00 PM EST / 10:00 AM PST In case you missed it, watch Part 2 of LibRBD I/O Flow by Jason Dillaman: https://www.youtube.com/watch?v=nVjYVmqNClM All live streams will be recorded. -- Mike Perez (thingee)

3 years, 1 month

1
0
0 0

how to disable write-back mode in ceph octopus

by 无名万剑归宗

I tried cache tier in write-back mode in my cluster, but because my ssd drive is home used, can not satisfy the needs of IOPS. Now I want disable write-back mode , I founded office documents,but the doc was outdated https://docs.ceph.com/en/latest/rados/operations/cache-tiering/?highlight=c… ）. root@e9000-22:~# ceph osd tier cache-mode cache proxy > Invalid command: proxy not in writeback|readproxy|readonly|none > osd tier cache-mode <pool> writeback|readproxy|readonly|none > [--yes-i-really-mean-it] : specify the caching mode for cache tier <pool> > Error EINVAL: invalid command Anyone can tell my how to disable this mode.

3 years, 1 month

1
0
0 0

Question about migrating from iSCSI to RBD

by Justin Goetz

Hello! I was hoping to inquire if anyone here has attempted similar operations, and if they ran into any issues. To give a brief overview of my situation, I have a standard octopus cluster running 15.2.2, with ceph-iscsi installed via ansible. The original scope of a project we were working on changed, and we no longer need the iSCSI overhead added to the project (the machine using CEPH is Linux, so we would like to use native RBD block devices instead). Ideally we would create some new pools and migrate the data from the iSCSI pools over to the new pools, however, due to the massive amount of data (close to 200 TB), we lack the physical resources necessary to copy the files. Digging a bit on the backend of the pools utilized by ceph-iscsi, it appears that the iSCSI utility uses standard RBD images on the actual backend: ~]# rbd info iscsi/pool-name rbd image 'pool-name': size 200 TiB in 52428800 objects order 22 (4 MiB objects) snapshot_count: 0 id: 137b45a37ad84a block_name_prefix: rbd_data.137b45a37ad84a format: 2 features: layering, exclusive-lock, object-map, fast-diff, deep-flatten op_features: flags: object map invalid, fast diff invalid create_timestamp: Thu Nov 12 16:14:31 2020 access_timestamp: Tue Mar 16 16:13:41 2021 modify_timestamp: Tue Mar 16 16:15:36 2021 And I can also see that, like a standard rbd image, our 1st iSCSI gateway currently holds the lock on the image: ]# rbd lock ls --pool iscsi pool-name There is 1 exclusive lock on this image. Locker ID Address client.3618592 auto 259361792 10.101.12.61:0/1613659642 Theoretically speaking, would I be able to simply stop & disable the tcmu-runner processes on all iSCSI gateways in our cluster, which would release the lock on the RBD image, then create another user with rwx permissions to the iscsi pool? Would this work, or am I missing something that would come back to bite me later on? Looking for any advice on this topic. Thanks in advance for reading! -- Justin Goetz Systems Engineer, TeraSwitch Inc. jgoetz(a)teraswitch.com 412-945-7045 (NOC) | 412-459-7945 (Direct)

3 years, 1 month

2
2
0 0

How to sizing nfs-ganesha.

by Quang Lê

Hi guys, I'm using manila-openstack to provide a filesystem service using backend CEPHFS. My design use nfs-ganesha as the gateway for the VM in openstack mount to CephFS. I am having problems with sizing the ganesha-servers. Can anyone suggest me *what are the hardware requirements of the ganesha server are* or *what parameters are needed to consider when sizing a ganesha server* ? My simple topology in link: *https://i.imgur.com/xrYqxAh.png <https://i.imgur.com/xrYqxAh.png>* Thank you guys.

3 years, 1 month

2
1
0 0

Incomplete pg , any chance to to make it survive or data loss :( ?

by Szabo, Istvan (Agoda)

Hi, What can I do with this pg to make it work? We lost and don't have the osds 61,122 but we have the 32,33,70. I've exported the pg chunk from them, but they are very small and when I imported back to another osd that osd never started again so I had to remove that chunk (44.1aas2, 44.1aas3) to be able to start the osd. [WRN] PG_AVAILABILITY: Reduced data availability: 1 pg incomplete pg 44.1aa is incomplete, acting [59,128,127,43] (reducing pool cephfs1-data01-pool min_size from 3 may help; search ceph.com/docs for 'incomplete') [WRN] PG_NOT_DEEP_SCRUBBED: 1 pgs not deep-scrubbed in time pg 44.1aa not deep-scrubbed since 2021-01-14T05:50:23.852626+0100 [WRN] PG_NOT_SCRUBBED: 1 pgs not scrubbed in time pg 44.1aa not scrubbed since 2021-01-14T05:50:23.852626+0100 [WRN] SLOW_OPS: 96 slow ops, oldest one blocked for 228287 sec, osd.59 has slow ops This is the pg query and pg map important parts: "probing_osds": [ "29(3)", "34(3)", "43(3)", "56(1)", "59(0)", "72(2)", "73(2)", "74(2)", "127(2)", "128(1)", "131(2)" ], "down_osds_we_would_probe": [ 32, 33, 61, 70, 122 ], "peering_blocked_by": [], "peering_blocked_by_detail": [ { "detail": "peering_blocked_by_history_les_bound" osdmap e5666778 pg 44.1aa (44.1aa) -> up [59,128,127,43] acting [59,128,127,43] Internal ________________________________ This message is confidential and is for the sole use of the intended recipient(s). It may also be privileged or otherwise protected by copyright or other legal rules. If you have received it by mistake please let us know by reply email and delete it from your system. It is prohibited to copy this message or disclose its content to anyone. Any confidentiality or privilege is not waived or lost by any mistaken delivery or unauthorized disclosure of the message. All messages sent to and from Agoda may be monitored to ensure compliance with company policies, to protect the company's interests and to remove potential malware. Electronic messages may be intercepted, amended, lost or deleted, or contain viruses.

3 years, 1 month

1
2
0 0

2024

2023

2022

2021

2020

2019

ceph-users March 2021