Hi Sebastien!!
(solution below)
This is weird, because we had previously tested the ceph-volume
refactor and it looked ok.
Anyway, here is the inventory output: https://pastebin.com/ADFeuNZi
And the ceph-volume log is here: https://termbin.com/i8mk
I couldn't figure out why it was rejected.
I believe I'm using ceph-volume as intended...
https://docs.ceph.com/en/latest/ceph-volume/lvm/batch/#idempotency-and-disk…
Wait -- I solved my problem.
The OSDs were originally created like this:
ceph-volume lvm batch /dev/sd[g-z] /dev/sda[a-d] --db-devices /dev/sd[c-f]
Now, in order to recreate the OSD on sdg, I had tried:
ceph-volume lvm batch /dev/sdg --db-devices /dev/sdf --osd-ids 1
That doesn't work.
But, if I use all devices again, it works!
# ceph-volume lvm batch /dev/sd[g-z] /dev/sda[a-d] --db-devices /dev/sd[c-f] --osd-ids 1
--> passed data devices: 24 physical, 0 LVM
--> relative data size: 1.0
--> passed block_db devices: 4 physical, 0 LVM

Total OSDs: 1

  Type            Path                               LV Size         % of device
----------------------------------------------------------------------------------------------------
  OSD id          1
  data            /dev/sdg                           5.46 TB         100.00%
  block_db        /dev/sdf                           37.26 GB        16.67%
--> The above OSDs would be created if the operation continues
--> do you want to proceed? (yes/no)
There are several interesting behaviours: if I pass fewer db-devices or
fewer HDDs, it doesn't work as expected.
So, lesson learned: in recent Ceph releases one must re-run ceph-volume
lvm batch with exactly the same device list that was used originally.
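For anyone who hits the same thing, the full replacement sequence that
ended up working for me looks roughly like this (device names and the OSD
id are from my setup, so adjust them to yours):

# take the failed OSD out and zap it (this frees the HDD and its db LV)
systemctl stop ceph-osd@1
ceph osd out 1
ceph-volume lvm zap --osd-id 1 --destroy

# re-run the *original* batch command with the full device list,
# passing --osd-ids so that only the missing OSD is recreated
ceph-volume lvm batch /dev/sd[g-z] /dev/sda[a-d] --db-devices /dev/sd[c-f] --osd-ids 1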
Best Regards,
Dan
On Tue, Apr 27, 2021 at 2:16 PM Sebastien Han <shan(a)redhat.com> wrote:
>
> Hi Dan,
>
> I believe either the ceph-volume logs or the "ceph-volume inventory
> /dev/sdf" command should give you the reason why the device was
> rejected.
> If not legit that's probably a bug...
>
> Thanks!
> –––––––––
> Sébastien Han
> Senior Principal Software Engineer, Storage Architect
>
> "Always give 100%. Unless you're giving blood."
>
> On Tue, Apr 27, 2021 at 2:02 PM Dan van der Ster <dan(a)vanderster.com> wrote:
> >
> > Hi all,
> >
> > In 14.2.20, when re-creating a mixed OSD after device replacement,
> > ceph-volume batch is no longer able to find any available space for a
> > block_db.
> >
> > Below I have shown a zap [1] which frees up the HDD and one LV on the
> > block-dbs VG.
> > But then we try to recreate, and none of the block-dbs are available
> > [2], even though there is free space on the VG:
> >
> > VG #PV #LV #SN Attr VSize VFree
> > ceph-8dfd7f83-b60c-485b-9517-12203301a914 1 5 0 wz--n- <223.57g 37.26g
> >
> > This bug looks similar: https://tracker.ceph.com/issues/49096
> >
> > Is there something wrong with my procedure? Or does someone have an
> > idea how to make this work again?
> >
> > Best Regards,
> >
> > Dan
> >
> >
> > [1]
> >
> > # systemctl stop ceph-osd@1
> > # ceph osd out 1
> > marked out osd.1.
> > # ceph-volume lvm zap --osd-id=1 --destroy
> > --> Zapping: /dev/ceph-15daeeaa-b6d9-46a6-b955-fd7197341334/osd-block-ef46b9bf-c85f-49d8-9db3-9ed164f5cc61
> > --> Unmounting /var/lib/ceph/osd/ceph-1
> > Running command: /usr/bin/umount -v /var/lib/ceph/osd/ceph-1
> > stderr: umount: /var/lib/ceph/osd/ceph-1 unmounted
> > Running command: /usr/bin/dd if=/dev/zero
> > of=/dev/ceph-15daeeaa-b6d9-46a6-b955-fd7197341334/osd-block-ef46b9bf-c85f-49d8-9db3-9ed164f5cc61
> > bs=1M count=10 conv=fsync
> > stderr: 10+0 records in
> > 10+0 records out
> > stderr: 10485760 bytes (10 MB, 10 MiB) copied, 0.0849557 s, 123 MB/s
> > --> Only 1 LV left in VG, will proceed to destroy volume group
> > ceph-15daeeaa-b6d9-46a6-b955-fd7197341334
> > Running command: /usr/sbin/vgremove -v -f
> > ceph-15daeeaa-b6d9-46a6-b955-fd7197341334
> > stderr: Removing
> > ceph--15daeeaa--b6d9--46a6--b955--fd7197341334-osd--block--ef46b9bf--c85f--49d8--9db3--9ed164f5cc61
> > (253:0)
> > stderr: Archiving volume group
> > "ceph-15daeeaa-b6d9-46a6-b955-fd7197341334" metadata (seqno 5).
> > stderr: Releasing logical volume
> > "osd-block-ef46b9bf-c85f-49d8-9db3-9ed164f5cc61"
> > stderr: Creating volume group backup
> > "/etc/lvm/backup/ceph-15daeeaa-b6d9-46a6-b955-fd7197341334" (seqno 6).
> > stdout: Logical volume
> > "osd-block-ef46b9bf-c85f-49d8-9db3-9ed164f5cc61" successfully removed
> > stderr: Removing physical volume "/dev/sdg" from volume group
> > "ceph-15daeeaa-b6d9-46a6-b955-fd7197341334"
> > stdout: Volume group "ceph-15daeeaa-b6d9-46a6-b955-fd7197341334"
> > successfully removed
> > --> Zapping: /dev/ceph-8dfd7f83-b60c-485b-9517-12203301a914/osd-db-d84267a8-057a-4b54-b1d4-7894e3eabec0
> > Running command: /usr/bin/dd if=/dev/zero
> > of=/dev/ceph-8dfd7f83-b60c-485b-9517-12203301a914/osd-db-d84267a8-057a-4b54-b1d4-7894e3eabec0
> > bs=1M count=10 conv=fsync
> > stderr: 10+0 records in
> > 10+0 records out
> > stderr: 10485760 bytes (10 MB, 10 MiB) copied, 0.0310497 s, 338 MB/s
> > --> More than 1 LV left in VG, will proceed to destroy LV only
> > --> Removing LV because --destroy was given:
> > /dev/ceph-8dfd7f83-b60c-485b-9517-12203301a914/osd-db-d84267a8-057a-4b54-b1d4-7894e3eabec0
> > Running command: /usr/sbin/lvremove -v -f
> > /dev/ceph-8dfd7f83-b60c-485b-9517-12203301a914/osd-db-d84267a8-057a-4b54-b1d4-7894e3eabec0
> > stdout: Logical volume "osd-db-d84267a8-057a-4b54-b1d4-7894e3eabec0"
> > successfully removed
> > stderr: Removing
> > ceph--8dfd7f83--b60c--485b--9517--12203301a914-osd--db--d84267a8--057a--4b54--b1d4--7894e3eabec0
> > (253:1)
> > stderr: Archiving volume group
> > "ceph-8dfd7f83-b60c-485b-9517-12203301a914" metadata (seqno 25).
> > stderr: Releasing logical volume "osd-db-d84267a8-057a-4b54-b1d4-7894e3eabec0"
> > stderr: Creating volume group backup
> > "/etc/lvm/backup/ceph-8dfd7f83-b60c-485b-9517-12203301a914" (seqno
> > 26).
> > --> Zapping successful for OSD: 1
> > #
> >
> > [2]
> >
> > # ceph-volume lvm batch /dev/sdg --db-devices /dev/sdf --osd-ids 1
> > --> passed data devices: 1 physical, 0 LVM
> > --> relative data size: 1.0
> > --> passed block_db devices: 1 physical, 0 LVM
> > --> 1 fast devices were passed, but none are available
> >
> > Total OSDs: 0
> >
> > Type            Path                               LV Size         % of device
> > --> The above OSDs would be created if the operation continues
> > --> do you want to proceed? (yes/no) no
> > _______________________________________________
> > ceph-users mailing list -- ceph-users(a)ceph.io
> > To unsubscribe send an email to ceph-users-leave(a)ceph.io
> >
>
Hi friends,
We've recently deployed a few all-flash OSD nodes to improve both bandwidth
and IOPS for active data processing in CephFS. Before taking them into
active production we've been tuning them to see how far we can push the
performance in practice. It would be interesting to hear your experience,
both about the bandwidth that's realistic to expect and about any remaining
profiling we could do to identify the bottlenecks. Is it possible for rados
(or even CephFS) on a single host to reach anywhere close to line rate on
50Gb networking?
Our setup:
* We use four dedicated SSD-only OSD nodes (Dell R7515, EPYC 7302), each
with 16x Samsung PM883 7.68TB enterprise SSDs connected to an H740P RAID
controller where each disk is configured as a single-drive RAID0 (we have
tested HBA mode too, and as expected the battery-backed write-back caching
significantly improves latency for small writes).
* The client node is a slightly older Supermicro dual Xeon E5-2620v4. Both
the OSD nodes and the clients have 128GB RAM, and CPU throttling has been
disabled.
* We use Mellanox 50Gb network cards, and with iperf2 we get very close to
line-rate throughput between all servers (~46 Gb/s) after doing the usual
sysctl tuning to increase network buffers and increasing the NIC ring
buffers to at least 4096 (see the sketch after this list).
* All nodes have Ceph Pacific (16.2.0) installed through cephadm, and Linux
kernel 5.8.0 as part of Ubuntu 20.04.2. All storage is BlueStore.
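For reference, the network tuning mentioned above is roughly the following
(the exact values and the interface name are simply what we happened to
use, not a recommendation):

# larger socket buffers (illustrative values)
sysctl -w net.core.rmem_max=268435456
sysctl -w net.core.wmem_max=268435456
sysctl -w net.ipv4.tcp_rmem="4096 87380 268435456"
sysctl -w net.ipv4.tcp_wmem="4096 65536 268435456"
# NIC ring buffers bumped to 4096 (interface name is a placeholder)
ethtool -G enp65s0f0 rx 4096 tx 4096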
To start with plain Rados benchmarking (rados bench), the write performance
for 4M blocks is quite decent with a 3-fold-replicated pool. At 16 threads
we get 2.3GB/s; bumping it to 32 threads increases that to roughly 2.8GB/s.
The client load remains low during writing, and if we reduce the replicated
pool size to 2 instead of 3, these numbers improve to ~3.5GB/s and
~4.2GB/s, so I assume the remaining overhead is due to latencies for the
extra copies. However, those numbers are good enough that we don't really
worry about it :-)
However, when it comes to reading, we seem to be stuck at around 2GB/s no
matter what we try. The load on the client is also quite high, with the
"rados bench" process using ~300% CPU.
To test things, we decided to shut down one of the four OSD servers - which
hardly has any effect on writing throughput, and none whatsoever on the
read throughput. In other words, it seems the bottleneck is somewhere on
the client side?
Second, when we add CephFS, we lose quite a bit more performance. If we
copy a single large (5GB) file between CephFS and /dev/shm (dropping page
caches between trials), we see write performance of roughly 1.8GB/s, while
the read performance is just 1GB/s.
(For CephFS clients, we use the kernel client in Linux-5.8 with mount
options
noatime,nowsync,rsize=67108864,wsize=67108864,readdir_max_entries=8192,readdir_max_bytes=4194304,rasize=1073741824).
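(Concretely, the copy test is essentially the following; the mount point
and file names are placeholders:)

sync; echo 3 > /proc/sys/vm/drop_caches
cp /dev/shm/bigfile /mnt/cephfs/bigfile         # write test, ~1.8GB/s
sync; echo 3 > /proc/sys/vm/drop_caches
cp /mnt/cephfs/bigfile /dev/shm/bigfile.copy    # read test, ~1GB/s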
While the absolute performance is quite OK, it seems a bit sad to only
achieve ~35% of line rate for writes and as little as 10% for reads, so we
want to make sure we're not leaving anything on the table here.
Any suggestions what we could do to identify the bottlenecks would be
welcome; we'd be quite happy to invest in additional hardware if necessary,
but right now we're not quite sure what could be done to improve things :-)
All the best,
Erik
--
Erik Lindahl <erik.lindahl(a)gmail.com>
Science for Life Laboratory, Box 1031, 17121 Solna, Sweden
Hi!
After upgrading MONs and MGRs successfully, the first OSD host I upgraded on Ubuntu Bionic from 14.2.16 to 15.2.10
shredded all OSDs on it by corrupting RocksDB, and they now refuse to boot.
RocksDB complains "Corruption: unknown WriteBatch tag".
The initial crash/corruption occurred when the automatic fsck was run and it committed the changes for a lot of "zombie spanning blobs".
Tracker issue with logs: https://tracker.ceph.com/issues/50017
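(For context, and purely as my own assumption: the automatic fsck/repair on
OSD start appears to be controlled by the bluestore fsck options, so the
remaining hosts could presumably be started without the quick-fix pass via
something like the following; I have not verified this.)

ceph config set osd bluestore_fsck_quick_fix_on_mount false
ceph config set osd bluestore_fsck_on_mount false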
Anyone else encountered this error? I've "suspended" the upgrade for now :)
-- Jonas
Hi,
we have a problem with rgw bilogs not being trimmed. This is a multisite setup with 2 sites on Nautilus 14.2.18; sync status is fine. radosgw-admin bilog list grows, especially for buckets with lots of deletes.
So from time to time we get this "large OMAP" warnings, which we easily fix with bilog trim on the affected buckets and deep-scrubbing the PG.
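(For reference, the manual fix is roughly the following; the bucket name
and PG id are placeholders:)

radosgw-admin bilog trim --bucket=<tenant/bucket>
ceph pg deep-scrub <pg-id>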
As this is annoying I dug deeper and found these errors occurring 16 times in a batch for random buckets. It seems to affect every bucket. I think this is related to autotrimming, which is not working in our case. The command "radosgw-admin bilog autotrim" shows the same messages (and does no trimming):
ERROR: failed to get bucket instance info for .bucket.meta.57d082821ddc45a097b2e3401388e93d/mybucket:feec71ec-408e-4a2b-b3bd-4d214e7c8685.7724460.7
But these are buckets/instances that exist (no shards, btw):
radosgw-admin metadata get bucket.instance:57d082821ddc45a097b2e3401388e93d/mybucket:feec71ec-408e-4a2b-b3bd-4d214e7c8685.7724460.7
{
    "key": "bucket.instance:57d082821ddc45a097b2e3401388e93d/mybucket:feec71ec-408e-4a2b-b3bd-4d214e7c8685.7724460.7",
    "ver": {
        "tag": "_2pmP8x8r0exeEyQ7pn2r0nG",
        "ver": 1
    },
    "mtime": "2020-01-20 08:52:34.220312Z",
    "data": {
        "bucket_info": {
            "bucket": {
                "name": "mybucket",
                "marker": "feec71ec-408e-4a2b-b3bd-4d214e7c8685.7724460.7",
                "bucket_id": "feec71ec-408e-4a2b-b3bd-4d214e7c8685.7724460.7",
                "tenant": "57d082821ddc45a097b2e3401388e93d",
Does that look familiar to anyone? Any ideas where to dig further?
Best regards,
Björn
Hi,
we still have the problem that our rgw eats more disk space than it should.
Summing up the "size_kb_actual" of all buckets shows only half of the used
disk space.
There are 312TiB stored according to "ceph df", but we only need around 158TB.
I've already written to this ML about the problem, but there were no
solutions that helped.
I've dug through the ML archive and found some interesting threads
regarding orphan objects and these kinds of issues.
Did anyone ever solve this problem?
Or do you just add more disk space?
We tried to:
* use the "radosgw-admin orphan find/finish" tool (didn't work)
* manually trigger the GC (didn't work)
Currently running (since yesterday evening):
* rgw-orphan-list, which has produced 270GB of text output so far, and it's
not done yet (I have 60GB of disk space left)
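(For clarity, by "manually triggering the GC" I mean something like the
following:)

radosgw-admin gc list --include-all
radosgw-admin gc process --include-all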
--
The "UTF-8 problems" self-help group will, as an exception, meet in the
large hall this time.
A DocuBetter meeting is scheduled for later this week at 11AM AEST
Thursday, which is 6PM PDT Wednesday. This meeting is not much attended,
though, so unless I get responses to this email thread, I'm not going to
hold it.
This email is a sincere request for documentation complaints. If anything
about the documentation irritates you, now's the time to tell me. If
anything about the documentation is incomplete or incorrect, I'm the guy
you should tell. You don't have to attend the DocuBetter Meetings to get
changes into the documentation, you just have to ask me and explain clearly
what needs to be changed.
A couple of documentation initiatives are underway right now: I am cleaning
up the syntax of the whole documentation suite (a multi-month project that
is as tedious as it sounds; I am currently in the middle of the cephadm
documentation), and I am consolidating the Intro Guide with part of the
Developer Guide, beefing up that material as it is consolidated.
So: If someone responds to this email and says that they will be at the
DocuBetter meeting, then I will hold the meeting. However, if by twelve
hours before the time of the meeting no one has responded, there will be no
DocuBetter meeting. Remember: even if no one wants to have the meeting this
week, this does not mean that you can't get changes into the docs. Write to
me anytime with your complaints. No complaint is too crude, no irritation
will be summarily dismissed.
That's it.
Here are the links relevant to DocuBetter Meetings:
Meeting: https://bluejeans.com/908675367
Etherpad: https://pad.ceph.com/p/Ceph_Documentation
Hi Amit:
Both clusters have a lot of recovering shards. Actually I do not know if
that's normal or not. 🙁
The rgw_rados_hander setting is at the default value; I have not touched
this parameter. Do I need to increase it?
Thanks
Amit Ghadge <amitg.b14(a)gmail.com> wrote on Mon, Apr 26, 2021 at 10:42 PM:
> Do both clusters show that the sync status is up to date, or are there any
> shards behind? How many gw endpoints have you configured, and what
> rgw_rados_hander parameters are set?
>
> On Mon, Apr 26, 2021 at 8:59 AM 特木勒 <twl007(a)gmail.com> wrote:
>
>> Another problem I noticed: for a new bucket, the first object in the bucket
>> will not be synced; the sync starts with the second object. I tried to
>> fix the index on the bucket and manually rerun the bucket sync, but the first
>> object still does not sync to the secondary cluster.
>>
>> Do you have any ideas for this issue?
>>
>> Thanks
>>
>> 特木勒 <twl007(a)gmail.com> wrote on Mon, Apr 26, 2021 at 11:16 AM:
>>
>>> Hi Istvan:
>>>
>>> Thanks for Amit's suggestion.
>>>
>>> I followed his suggestion to fix the bucket index and re-do the sync on the
>>> buckets, but it still did not work for me.
>>>
>>> Then I tried the bucket rewrite command to rewrite all the objects in the
>>> buckets, and it worked for me. I think the reason is that there was something
>>> wrong with the bucket index and the rewrite rebuilt it.
>>>
>>> Here's the command I use:
>>> `sudo radosgw-admin bucket rewrite -b BUCKET-NAME --min-rewrite-size 0`
>>>
>>> Maybe you can try this to fix the sync issues.
>>>
>>> @Amit Ghadge <amitg.b14(a)gmail.com> Thanks for your suggestions. Without
>>> them, I would not have noticed that something was wrong with the index.
>>>
>>> Thanks :)
>>>
>>> Szabo, Istvan (Agoda) <Istvan.Szabo(a)agoda.com> wrote on Mon, Apr 26, 2021 at 9:57 AM:
>>>
>>>> Hi,
>>>>
>>>>
>>>>
>>>> No, it doesn't work; now we will write our own sync app for Ceph. I gave
>>>> up.
>>>>
>>>>
>>>>
>>>> Istvan Szabo
>>>> Senior Infrastructure Engineer
>>>> ---------------------------------------------------
>>>> Agoda Services Co., Ltd.
>>>> e: istvan.szabo(a)agoda.com
>>>> ---------------------------------------------------
>>>>
>>>>
>>>>
>>>> *From:* 特木勒 <twl007(a)gmail.com>
>>>> *Sent:* Friday, April 23, 2021 7:50 PM
>>>> *To:* Szabo, Istvan (Agoda) <Istvan.Szabo(a)agoda.com>
>>>> *Cc:* ceph-users(a)ceph.io
>>>> *Subject:* Re: [Suspicious newsletter] [ceph-users] RGW: Multiple Site
>>>> does not sync olds data
>>>>
>>>>
>>>>
>>>> Hi Istvan:
>>>>
>>>>
>>>>
>>>> We just upgraded the whole cluster to 15.2.10 and the multisite still
>>>> cannot sync all objects to the secondary cluster. 🙁
>>>>
>>>>
>>>>
>>>> Do you have any suggestions on this? I also opened another issue on the Ceph
>>>> tracker site:
>>>>
>>>> https://tracker.ceph.com/issues/50474
>>>>
>>>>
>>>>
>>>> I hope someone can take a look at this issue.
>>>>
>>>>
>>>>
>>>> Thanks
>>>>
>>>>
>>>>
>>>> 特木勒 <twl007(a)gmail.com> wrote on Mon, Mar 22, 2021 at 9:08 PM:
>>>>
>>>> Thank you~
>>>>
>>>>
>>>>
>>>> I will try to upgrade the cluster too. Seems like this is the only way for
>>>> now. 😭
>>>>
>>>>
>>>>
>>>> I will let you know once I complete testing. :)
>>>>
>>>>
>>>>
>>>> Have a good day
>>>>
>>>>
>>>>
>>>> Szabo, Istvan (Agoda) <Istvan.Szabo(a)agoda.com> wrote on Mon, Mar 22, 2021 at 3:38 PM:
>>>>
>>>> Yeah, it doesn't work. Last week they fixed my problem ticket for the bug
>>>> that caused the crashes (and the crashes had stopped the replication). I'll
>>>> give it another try this week after the update; if the daemon doesn't crash,
>>>> maybe it will work, because when no crash happened, the data was synced.
>>>> Fingers crossed ;) Don't give up 😄
>>>> ------------------------------
>>>>
>>>> *From:* 特木勒 <twl007(a)gmail.com>
>>>> *Sent:* Monday, March 22, 2021 1:38 PM
>>>> *To:* Szabo, Istvan (Agoda) <Istvan.Szabo(a)agoda.com>
>>>> *Cc:* ceph-users(a)ceph.io <ceph-users(a)ceph.io>
>>>>
>>>>
>>>> *Subject:* Re: [Suspicious newsletter] [ceph-users] RGW: Multiple Site
>>>> does not sync olds data
>>>>
>>>>
>>>>
>>>> Hi Istvan:
>>>>
>>>>
>>>>
>>>> Do you have any update on directional sync?
>>>>
>>>>
>>>>
>>>> I am trying to upgrade cluster to 15.2.10 to see if the problem is
>>>> solved. :(
>>>>
>>>>
>>>>
>>>> Thanks
>>>>
>>>>
>>>>
>>>> Szabo, Istvan (Agoda) <Istvan.Szabo(a)agoda.com> wrote on Mon, Mar 1, 2021 at 10:01 AM:
>>>>
>>>> So-so. I had some interruption so it failed on one site, but the other
>>>> is kind of working. This is the first time I have seen data caught up in the
>>>> radosgw-admin data sync status on one side.
>>>>
>>>> Today will finish the other problematic site, I’ll let you know the
>>>> result is it working or not.
>>>>
>>>>
>>>>
>>>> Istvan Szabo
>>>> Senior Infrastructure Engineer
>>>> ---------------------------------------------------
>>>> Agoda Services Co., Ltd.
>>>> e: istvan.szabo(a)agoda.com
>>>> ---------------------------------------------------
>>>>
>>>>
>>>>
>>>> *From:* 特木勒 <twl007(a)gmail.com>
>>>> *Sent:* Sunday, February 28, 2021 1:34 PM
>>>> *To:* Szabo, Istvan (Agoda) <Istvan.Szabo(a)agoda.com>
>>>> *Cc:* ceph-users(a)ceph.io
>>>> *Subject:* Re: [Suspicious newsletter] [ceph-users] RGW: Multiple Site
>>>> does not sync olds data
>>>>
>>>>
>>>>
>>>>
>>>> Hi Istvan:
>>>>
>>>>
>>>>
>>>> Thanks for your reply.
>>>>
>>>>
>>>>
>>>> Does directional sync solve the problem? I tried to run `radosgw-admin
>>>> sync init`, but it still did not work. :(
>>>>
>>>>
>>>>
>>>> Thanks
>>>>
>>>>
>>>>
>>>> Szabo, Istvan (Agoda) <Istvan.Szabo(a)agoda.com> wrote on Fri, Feb 26, 2021 at 7:47 AM:
>>>>
>>>> Same for me, on 15.2.8 as well.
>>>> I'm trying directional sync now; it looks like symmetrical sync has an issue.
>>>>
>>>> Istvan Szabo
>>>> Senior Infrastructure Engineer
>>>> ---------------------------------------------------
>>>> Agoda Services Co., Ltd.
>>>> e: istvan.szabo(a)agoda.com
>>>> ---------------------------------------------------
>>>>
>>>> On 2021. Feb 26., at 1:03, 特木勒 <twl007(a)gmail.com> wrote:
>>>>
>>>>
>>>> Hi all:
>>>>
>>>> ceph version: 15.2.7 (88e41c6c49beb18add4fdb6b4326ca466d931db8)
>>>>
>>>> I have a strange question: I just created a multisite setup for a Ceph
>>>> cluster, but I notice that the old data on the source cluster is not synced.
>>>> Only new data is synced to the second-zone cluster.
>>>>
>>>> Is there anything I need to do to enable a full sync for the bucket, or is
>>>> this a bug?
>>>>
>>>> Thanks
>>>> _______________________________________________
>>>> ceph-users mailing list -- ceph-users(a)ceph.io
>>>> To unsubscribe send an email to ceph-users-leave(a)ceph.io
>>>>
Hi
I have a ceph cluster running Nautilus. The ceph services are hosted on
CentOS7
servers.
Right now I have:
- 3 servers, each one running MON+MGR
- 10 servers running OSDs
- 2 servers running RGW
I need to update this cluster to CentOS8 (actually CentOS stream 8) and
Pacific.
What is the update path that you would suggest?
I was thinking:
CentOS7-Nautilus --> CentOS8-Nautilus --> CentOS8-Pacific
but I don't know if this is really the best solution.
For example, as far as I understand, ceph-deploy isn't available on CentOS8,
and there is no documentation on how to manually (i.e. without using
ceph-deploy) re-deploy RGW on CentOS8.
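From what I can piece together, a manual RGW deployment would look roughly
like the following, but the gateway name and frontend settings are just
placeholders and I'm not sure this is complete:

# create a keyring for the gateway (the name "rgw.gw1" is only an example)
mkdir -p /var/lib/ceph/radosgw/ceph-rgw.gw1
ceph auth get-or-create client.rgw.gw1 mon 'allow rw' osd 'allow rwx' \
    -o /var/lib/ceph/radosgw/ceph-rgw.gw1/keyring
chown -R ceph:ceph /var/lib/ceph/radosgw/ceph-rgw.gw1

# ceph.conf section on the gateway host
[client.rgw.gw1]
rgw_frontends = "beast port=7480"

# start the daemon (requires the ceph-radosgw package)
systemctl enable --now ceph-radosgw@rgw.gw1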
Thanks, Massimo
Hi,
I have a PG on which the following command has been run:
ceph pg 44.1aa mark_unfound_lost delete
Afterwards the cluster no longer reported the unknown PGs, which was actually the goal of running this.
However, this PG is now inconsistent and can't be deep-scrubbed.
ceph health detail
HEALTH_ERR 214275 scrub errors; Possible data damage: 1 pg inconsistent; 1 pgs not deep-scrubbed in time
[ERR] OSD_SCRUB_ERRORS: 214275 scrub errors
[ERR] PG_DAMAGED: Possible data damage: 1 pg inconsistent
pg 44.1aa is active+clean+inconsistent, acting [59,128,127,43]
[WRN] PG_NOT_DEEP_SCRUBBED: 1 pgs not deep-scrubbed in time
pg 44.1aa not deep-scrubbed since 2021-01-14T05:50:23.852626+0100
ceph pg dump pgs_brief|grep 'ACTING_PRIMARY\|44.1aa'
dumped pgs_brief
PG_STAT STATE UP UP_PRIMARY ACTING ACTING_PRIMARY
44.1aa active+clean+inconsistent [59,128,127,43] 59 [59,128,127,43] 59
Any idea what to do with it?
________________________________
This message is confidential and is for the sole use of the intended recipient(s). It may also be privileged or otherwise protected by copyright or other legal rules. If you have received it by mistake please let us know by reply email and delete it from your system. It is prohibited to copy this message or disclose its content to anyone. Any confidentiality or privilege is not waived or lost by any mistaken delivery or unauthorized disclosure of the message. All messages sent to and from Agoda may be monitored to ensure compliance with company policies, to protect the company's interests and to remove potential malware. Electronic messages may be intercepted, amended, lost or deleted, or contain viruses.