Dear All
We have the same question here, if anyone can help ... Thank you!
We did not find any documentation about the steps to reset & restart the sync,
especially about the implications of 'bilog trim', 'mdlog trim' and 'datalog trim'.
Our secondary zone is read-only. Both master and secondary zones are on Nautilus (master 14.2.9, secondary 14.2.12).
Can someone also clarify the following points? Many thanks in advance!
1) Is it safe to use these 3 commands (bilog trim, mdlog trim, datalog trim) on the master?
Are the bilogs exclusively used for the sync, or are they needed even without multi-site? (mdlog/datalog are obviously only for multi-site)
2) Can we run these 3 commands during the sync, or do we need to stop all instances on the secondary zone first?
In the latter case, do we need to stop the client traffic and wait for the metadata/data sync to catch up before stopping the secondary zone instances?
3) Can we then restart the instances on the secondary zone and expect the rgw sync to run correctly?
Or do we first need to run 'metadata sync init' and 'data sync init' on the secondary zone to trigger a full sync?
Or is it even necessary to delete all rgw pools on the secondary zone?
4) And regarding the full sync: does it verify the full object data, or only object size and mtime?
If we update the secondary zone to Nautilus 14.2.18 and enable rgw_sync_obj_etag_verify,
will a full sync also detect ETag mismatches on objects that are already present on the secondary zone?
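For context, this is the sequence we would tentatively run on the secondary zone to force a full resync (a sketch pieced together from list archives, untested on our side; the systemd unit name depends on the deployment, and <master-zone-name> is a placeholder):
# systemctl stop ceph-radosgw.target          (on the secondary zone gateways)
# radosgw-admin metadata sync init
# radosgw-admin data sync init --source-zone=<master-zone-name>
# systemctl start ceph-radosgw.target
# radosgw-admin sync status                   (to watch the full sync progress)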
Cheers
Francois
________________________________
From: ceph-users on behalf of Osiński Piotr <Piotr.Osinski(a)grupawp.pl>
Sent: Saturday, June 22, 2019 11:44 AM
To: ceph-users(a)lists.ceph.com
Subject: [ceph-users] How to reset and configure replication on multiple RGW servers from scratch?
Hi,
For testing purposes, I configured RGW multisite synchronization between two Ceph Mimic 13.2.6 clusters (I also tried 13.2.5).
Now I want to reset all current settings and configure replication from scratch.
Data (pools, buckets) on the master zone will not be deleted.
What has been done:
1) Deleted the secondary zone
# radosgw-admin zone delete --rgw-zone=dc2_zone
2) Removed the secondary zone from zonegroup
# radosgw-admin zonegroup remove --rgw-zonegroup=master_zonegroup --rgw-zone=dc2_zone
3) Committed changes
# radosgw-admin period update --commit
4) Trimmed all datalogs on master zone
# radosgw-admin datalog trim --start-date="2019-06-12 12:01:54" --end-date="2019-06-22 12:01:56"
5) Trimmed all error sync on master zone
# radosgw-admin sync error trim --start-date="2019-06-07 07:19:26" --end-date="2019-06-22 15:59:00"
6) Deleted and recreated empty pools on secondary cluster:
dc2_zone.rgw.control
dc2_zone.rgw.meta
dc2_zone.rgw.log
dc2_zone.rgw.buckets.index
dc2_zone.rgw.buckets.data
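For verification, I assume something like the following would reveal any leftover multisite state on the master (untested):
# radosgw-admin sync status
# radosgw-admin period get
# radosgw-admin mdlog list
# radosgw-admin datalog list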
Should I clear any other data/metadata in the master zone?
Could data remain somewhere in the master zone that might affect the new replication setup?
I'm trying to track down a problem with blocked shard synchronization.
Thank you in advance for your help.
Best regards,
Piotr Osiński
Hi All,
I've got a new issue (hopefully this one will be the last).
I have a working Ceph (Octopus) cluster with a replicated pool
(my-pool), an erasure-coded pool (my-pool-data), and an image (my-image)
created - all *seems* to be working correctly. I also have the correct
Keyring specified (ceph.client.my-id.keyring).
ceph -s is reporting all healthy.
The EC profile (my-ec-profile) was created with:
ceph osd erasure-code-profile set my-ec-profile k=4 m=2 crush-failure-domain=host
The replicated pool was created with:
ceph osd pool create my-pool 100 100 replicated
followed by:
rbd pool init my-pool
The EC pool was created with:
ceph osd pool create my-pool-data 100 100 erasure my-ec-profile --autoscale-mode=on
followed by:
rbd pool init my-pool-data
The image was created with:
rbd create -s 1T --data-pool my-pool-data my-pool/my-image
The keyring was created with:
ceph auth get-or-create client.my-id mon 'profile rbd' osd 'profile rbd pool=my-pool' mgr 'profile rbd pool=my-pool' -o /etc/ceph/ceph.client.my-id.keyring
On a CentOS 8 client machine I have installed ceph-common, placed the keyring file into /etc/ceph/, and run:
rbd device map my-pool/my-image --id my-id
All *seems* AOK.
However - and here's my issue - when I try to create a partition on /dev/rbd0 and/or try to mount it, the client reports (respectively):
fdisk: cannot open /dev/rbd0: Input/output error
mount: /my-rbd-bloc-device: special device /dev/rbd0 does not exist
What am I doing wrong?
Thanks in advance for the help
Matthew J
--
*Matthew J BLACK*
M.Inf.Tech.(Data Comms)
MBA
B.Sc.
MACS (Snr), CP, IP3P
When you want it done /right/ ‒ the first time!
Phone: +61 4 0411 0089
Email: matthew(a)peregrineit.net <mailto:matthew@peregrineit.net>
Web: www.peregrineit.net <http://www.peregrineit.net>
Hi,
Does anyone know how to find out which client holds the lock on a file in CephFS?
I've run into a deadlock: a client is blocked waiting to acquire the lock, but I don't know which client holds it.
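For reference, my assumption is that the MDS op tracker and session list are the places to look, e.g. (a sketch; <name> is the MDS daemon name):
# ceph daemon mds.<name> session ls
# ceph daemon mds.<name> dump_blocked_ops
# ceph daemon mds.<name> dump_ops_in_flight
But I am not sure their output identifies the lock holder.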
Dear All
We have the same question here, if anyone can help ... Thank you!
Cheers
Francois
________________________________
From: ceph-users on behalf of P. O. <posdub(a)gmail.com>
Sent: Friday, August 9, 2019 11:05 AM
To: ceph-users(a)lists.ceph.com
Subject: [ceph-users] Multisite RGW - Large omap objects related with bilogs
Hi all,
I have two ceph clusters in an RGW multisite environment, with ~1500 buckets (500M objects, 70 TB).
Some of the buckets are very dynamic (objects are constantly changing).
I have problems with large omap objects in the bucket indexes, related to these "dynamic buckets".
For example:
[root@rgw ~]# radosgw-admin bucket stats --bucket bucket_s3d33 |grep num_objects
"num_objects": 564
In /var/log/ceph/ceph.log:
cluster [WRN] Large omap object found. Object: 10:297646ca:::.dir.86a05ec8-9982-429b-9f94-28363610a95c.12546d0.17892:head Key count: 5307523 Size (bytes): 748792509
I found that this is because of the bucket index logs:
[root@rgw-1 ~]# rados -p default.rgw.buckets.index listomapkeys .dir.86a05ec8-9982-429b-9f94-28363610a95c.12546d0.17892 | wc -l
5307523
There are a lot of keys:
<0x80>0_00013758656.71188336.4
<0x80>0_00013758657.71188337.5
<0x80>0_00013758658.71188338.4
<0x80>0_00013758659.71188339.5
<0x80>0_00013758660.71188342.4
<0x80>0_00013758661.71188343.5
<0x80>0_00013758662.71188344.4
[root@rgw-1 ~]# radosgw-admin bilog list --bucket bucket_s3d33 --max-entries 6000000 |grep op_id | wc -l
5307523
I have configured parameters in my ceph.conf:
rgw sync log trim concurrent buckets = 32
rgw sync log trim max buckets = 64
rgw sync log trim interval = 1200
rgw sync log trim min cold buckets = 4
But for the past two weeks, the omap key count has kept growing.
How can I safely clean these bilogs (with no bucket damage and no replication damage)?
I found two radosgw-admin commands related to bilog trimming:
1) radosgw-admin bilog trim --bucket=bucket_s3d33 --start-marker XXXX --end-marker YYYY
I don't know what values should go in --start-marker XXXX and --end-marker YYYY.
Is it safe to use "bilog trim" on a bucket with replication in progress? If yes, should I run it on both sites?
2) radosgw-admin bilog autotrim
Is this command safe? Can I use autotrim on a selected bucket?
Or maybe there is some other way to delete these bilogs?
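For question (1), my assumption is that the marker values are the op_id fields of the bilog entries themselves, e.g. (untested):
# radosgw-admin bilog list --bucket bucket_s3d33 --max-entries 1 | grep op_id
# radosgw-admin bucket sync status --bucket bucket_s3d33
the idea being to trim only up to a point that both sites have already synced.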
Best regards,
P.O.
There will be a DocuBetter meeting on Thursday, 25 Mar 2021 at 0100 UTC.
We will discuss the Google Season of Docs proposal (the Comprehensive
Contribution Guide), the rewriting of the cephadm documentation and the new
section of the Teuthology Guide.
DocuBetter Meeting -- APAC
25 Mar 2021
0100 UTC
https://bluejeans.com/908675367
https://pad.ceph.com/p/Ceph_Documentation
Hi everyone!
I'm excited to announce two talks we have on the schedule for March 2021:
Persistent Bucket Notifications By Yuval Lifshitz
https://ceph.io/ceph-tech-talks/
The stream starts on March 25th at 17:00 UTC / 18:00 CET / 1:00 PM EDT / 10:00 AM PDT
Persistent bucket notifications are going to be introduced in Ceph
"Pacific." The idea behind them is to allow for reliable and
asynchronous delivery of notifications from the RADOS gateway (RGW) to
the endpoint configured at the topic. Regular notifications could also
be considered reliable since the delivery to the endpoint is performed
synchronously during the request.
However, this reliability is only from the RGW perspective: the client will not get an ACK until an ACK is received from the endpoint, but RGW does not retry if the endpoint is down or disconnected.
Also, note that, with regular notifications, if the endpoint sends
back a NACK, the operation is still considered successful (since there
is no way to rollback the RADOS operations that happened before the
notification was tried).
When the endpoint is down, a failed push is only detected by timeout; with regular notifications this slows down the operation of the RGW and may bring it to a complete halt.
With persistent notifications, the RGW retries sending notifications even if the endpoint is down or the network disconnects during the operation (notifications are retried until successfully delivered to the endpoint). The operation is also asynchronous: during the request, the notifications are just pushed into a queue (see below), and the actual sending to the endpoint happens asynchronously. The queuing operation is done in 2 phases (reserve, then commit or abort) to guarantee the atomicity of the queuing operation with the other operations.
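As a teaser, creating a persistent topic is expected to look roughly like this (a sketch; the attribute name follows the pending Pacific docs, and the endpoint URL and credentials are placeholders):
aws --endpoint-url http://<rgw-host>:8000 sns create-topic --name=mytopic --attributes='{"push-endpoint": "amqp://<broker>:5672", "persistent": "true"}'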
In case you missed the last Tech Talk, see Sage Weil's presentation on
What's new in Pacific:
https://www.youtube.com/watch?v=PVtn53MbxTc
--------------
Samuel Just will be giving a code walkthrough on RADOS Snapshots.
https://tracker.ceph.com/projects/ceph/wiki/Code_Walkthroughs
The stream starts on March 23rd at 17:00 UTC / 18:00 CET / 1:00 PM EDT / 10:00 AM PDT
In case you missed it, watch Part 2 of LibRBD I/O Flow by Jason Dillaman:
https://www.youtube.com/watch?v=nVjYVmqNClM
All live streams will be recorded.
--
Mike Perez (thingee)
I tried a cache tier in write-back mode in my cluster, but because my SSD drives are consumer-grade, they cannot satisfy the IOPS requirements. Now I want to disable write-back mode. I found the official documentation, but it was outdated (https://docs.ceph.com/en/latest/rados/operations/cache-tiering/?highlight=c…).
root@e9000-22:~# ceph osd tier cache-mode cache proxy
> Invalid command: proxy not in writeback|readproxy|readonly|none
> osd tier cache-mode <pool> writeback|readproxy|readonly|none
> [--yes-i-really-mean-it] : specify the caching mode for cache tier <pool>
> Error EINVAL: invalid command
Can anyone tell me how to disable this mode?
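From the error message I guess the old 'proxy' mode is now called 'readproxy'. Would this removal sequence be correct (a sketch, assuming the cache pool is 'cache' layered over a base pool 'base')?
# ceph osd tier cache-mode cache readproxy
# rados -p cache cache-flush-evict-all
# ceph osd tier remove-overlay base
# ceph osd tier remove base cache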
Hello!
I was hoping to inquire if anyone here has attempted similar operations,
and if they ran into any issues. To give a brief overview of my
situation, I have a standard octopus cluster running 15.2.2, with
ceph-iscsi installed via ansible. The original scope of a project we
were working on changed, and we no longer need the iSCSI overhead added
to the project (the machine using CEPH is Linux, so we would like to use
native RBD block devices instead).
Ideally we would create some new pools and migrate the data from the
iSCSI pools over to the new pools, however, due to the massive amount of
data (close to 200 TB), we lack the physical resources necessary to copy
the files.
Digging a bit on the backend of the pools utilized by ceph-iscsi, it
appears that the iSCSI utility uses standard RBD images on the actual
backend:
~]# rbd info iscsi/pool-name
rbd image 'pool-name':
size 200 TiB in 52428800 objects
order 22 (4 MiB objects)
snapshot_count: 0
id: 137b45a37ad84a
block_name_prefix: rbd_data.137b45a37ad84a
format: 2
features: layering, exclusive-lock, object-map, fast-diff, deep-flatten
op_features:
flags: object map invalid, fast diff invalid
create_timestamp: Thu Nov 12 16:14:31 2020
access_timestamp: Tue Mar 16 16:13:41 2021
modify_timestamp: Tue Mar 16 16:15:36 2021
And I can also see that, as with a standard RBD image, our 1st iSCSI gateway currently holds the lock on the image:
]# rbd lock ls --pool iscsi pool-name
There is 1 exclusive lock on this image.
Locker ID Address
client.3618592 auto 259361792 10.101.12.61:0/1613659642
Theoretically speaking, would I be able to simply stop & disable the tcmu-runner processes on all iSCSI gateways in our cluster, which would release the lock on the RBD image, and then create another user with rwx permissions on the iscsi pool? Would this work, or am I missing something that would come back to bite me later on?
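If so, I assume the new user would be created roughly like this (a sketch; 'client.rbd-direct' is just a name I made up):
# ceph auth get-or-create client.rbd-direct mon 'profile rbd' osd 'profile rbd pool=iscsi' -o /etc/ceph/ceph.client.rbd-direct.keyring
# rbd device map iscsi/pool-name --id rbd-direct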
Looking for any advice on this topic. Thanks in advance for reading!
--
Justin Goetz
Systems Engineer, TeraSwitch Inc.
jgoetz(a)teraswitch.com
412-945-7045 (NOC) | 412-459-7945 (Direct)
Hi guys,
I'm using OpenStack Manila to provide a filesystem service with a CephFS backend. My design uses nfs-ganesha as the gateway through which the VMs in OpenStack mount CephFS. I am having problems with sizing the ganesha servers.
Can anyone suggest *what the hardware requirements of a ganesha server are* or *what parameters need to be considered when sizing a ganesha server*?
My simple topology: https://i.imgur.com/xrYqxAh.png
Thank you guys.
Hi,
What can I do with this pg to make it work?
We no longer have OSDs 61 and 122, but we still have 32, 33, and 70. I exported the PG chunks from them, but they are very small, and when I imported one back into another OSD, that OSD never started again, so I had to remove the chunks (44.1aas2, 44.1aas3) to be able to start the OSD.
[WRN] PG_AVAILABILITY: Reduced data availability: 1 pg incomplete
pg 44.1aa is incomplete, acting [59,128,127,43] (reducing pool cephfs1-data01-pool min_size from 3 may help; search ceph.com/docs for 'incomplete')
[WRN] PG_NOT_DEEP_SCRUBBED: 1 pgs not deep-scrubbed in time
pg 44.1aa not deep-scrubbed since 2021-01-14T05:50:23.852626+0100
[WRN] PG_NOT_SCRUBBED: 1 pgs not scrubbed in time
pg 44.1aa not scrubbed since 2021-01-14T05:50:23.852626+0100
[WRN] SLOW_OPS: 96 slow ops, oldest one blocked for 228287 sec, osd.59 has slow ops
This is the pg query and pg map important parts:
"probing_osds": [
"29(3)",
"34(3)",
"43(3)",
"56(1)",
"59(0)",
"72(2)",
"73(2)",
"74(2)",
"127(2)",
"128(1)",
"131(2)"
],
"down_osds_we_would_probe": [
32,
33,
61,
70,
122
],
"peering_blocked_by": [],
"peering_blocked_by_detail": [
{
"detail": "peering_blocked_by_history_les_bound"
osdmap e5666778 pg 44.1aa (44.1aa) -> up [59,128,127,43] acting [59,128,127,43]
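The only workaround I have found mentioned for 'peering_blocked_by_history_les_bound' is the osd_find_best_info_ignore_history_les option, which as far as I understand can cause data loss if misapplied (a sketch, assuming the primary is osd.59):
# ceph config set osd.59 osd_find_best_info_ignore_history_les true
# systemctl restart ceph-osd@59          (on the host of osd.59)
# ceph config rm osd.59 osd_find_best_info_ignore_history_les
Would that be the right direction here?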